“You get small lessons and can practice skills right away. It’s so convenient. Committing to Dataquest was an easy decision.”

Huyen Vu

Data Engineer @Solita

Path overview

In this path, you’ll master the essential technical skills, including Python programming, data pipelines, and data processing. You’ll also learn how to implement algorithms, how to work with multi-table databases using SQL, and how to use key tools like pandas, NumPy, SQLite, MapReduce, and PostgreSQL.

Best of all, you’ll learn by doing — you’ll write code and get feedback directly in the browser. You’ll apply your skills to several guided projects involving realistic business scenarios to build your portfolio and prepare for your next interview.

Key skills

  • Programming with Python and building complex data architectures to support organizations’ data strategy
  • Managing data pipelines and data processes to ensure correct implementation of your data architecture
  • Using data wrangling to clean, reshape, and unify multiple large datasets so they are organized for analysis
  • Automating tasks to optimize the entire data workflow

Path outline

Part 1: Introduction to Python [4 courses]

Introduction to Python for Data Engineering 4h

  • Define the fundamentals of programming in Python
  • Employ Jupyter Notebook
  • Build a portfolio project

Dictionaries and Functions in Python 7h

  • Create and update dictionaries
  • Create your own functions
  • Employ Jupyter Notebook
  • Build a portfolio project

Intermediate Python for Data Engineering 6h

  • Clean text data
  • Define object-oriented programming in Python
  • Process dates and times

Programming Concepts in Python 4h

  • Define how Python represents data
  • Define encodings
  • Process text files
  • Optimize data usage

Part 2: Introduction to Algorithms [1 course]

Introduction to Algorithms 8h

  • Analyze the time complexity of an algorithm
  • Analyze the space complexity of an algorithm
  • Trade memory for speed
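As a rough illustration of what "trading memory for speed" can look like, here is a minimal sketch, not taken from the course itself. The inventory rows, the `id_…` identifiers, and the function names are all hypothetical. It compares a linear scan with a lookup against a dictionary that is built once and kept in memory.

```python
# Hypothetical inventory rows: (laptop_id, price). All names and values are invented.
rows = [("id_%d" % i, 500 + i) for i in range(1000)]


def find_price_scan(rows, laptop_id):
    """O(n) time, no extra memory: check each row until a match is found."""
    for row_id, price in rows:
        if row_id == laptop_id:
            return price
    return None


# Spend O(n) extra memory once to build an index keyed by laptop_id.
price_by_id = {row_id: price for row_id, price in rows}


def find_price_indexed(index, laptop_id):
    """O(1) average time per lookup, at the cost of keeping the dict in memory."""
    return index.get(laptop_id)


print(find_price_scan(rows, "id_42"))
print(find_price_indexed(price_by_id, "id_42"))
```

Both functions return the same answer. The indexed version pays its cost up front in memory, which tends to be worthwhile when the same data is queried many times.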

Part 3: The Command Line and Git [4 courses]

Command Line for Data Science 4h

  • Employ the command line for data science
  • Define important command line concepts
  • Modify the behavior of commands with options
  • Navigate the filesystem
  • Employ glob patterns and wildcards
  • Manage users and permissions

Text Processing for Data Science 4h

  • Read and explore documentation
  • Inspect files
  • Perform basic text processing
  • Define different kinds of output
  • Redirect and pipe output
  • Employ streams and file descriptors

Intermediate Command Line for Data Science 3h

  • Employ Jupyter console
  • Process data from the command line

Introduction to Git and Version Control 4h

  • Organize your code using version control
  • Employ Git and GitHub to collaborate with others
  • Resolve conflicts in version control

Part 4: Working with Data Sources [2 courses]

SQL Fundamentals 5h

  • Analyze data using SQL
  • Organize data using SQL
  • Write SQL queries to estimate summary statistics

Intermediate SQL for Data Engineering 7h

  • Query data across multiple tables
  • Answer business questions using SQL
  • Define table relations

Part 5: Production Databases [2 courses]

PostgreSQL for Data Engineering 8h

  • Identify how Postgres improves data sharing
  • Create tables using Postgres from a CSV file
  • Implement a database

Optimizing PostgreSQL Databases 5h

  • Debug Postgres queries
  • Apply the fundamentals of Postgres's internal tooling
  • Speed up Postgres querying using indexes

Part 6: Handling Large Data Sets in Python [5 courses]

NumPy for Data Engineering 4h

  • Manipulate n-dimensional arrays
  • Perform numeric calculations with n-dimensional arrays
  • Identify the differences between NumPy and pure Python

Processing Large Datasets In Pandas 5h

  • Reduce the memory footprint of a pandas DataFrame
  • Process large DataFrames in chunks using SQLite

Parallel Processing for Data Engineering 5h

  • Process data in parallel
  • Implement MapReduce
  • Solve problems using MapReduce
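To give a feel for the MapReduce pattern named above, here is a small word-count sketch using only the Python standard library. It is an illustration under assumptions rather than the course's own implementation: the sample lines and the helper names (`make_chunks`, `map_chunk`, `reduce_counts`) are made up, and the map step runs sequentially here, although each chunk could in principle be handed to a separate worker process.

```python
from collections import Counter
from functools import reduce


def make_chunks(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_chunk(lines):
    """Map step: count the words in one chunk, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(left, right):
    """Reduce step: merge two partial results into one."""
    return left + right


# Hypothetical input data.
lines = ["a b a", "b c", "a c c"]

chunks = make_chunks(lines, 2)
partials = map(map_chunk, chunks)  # one partial Counter per chunk
total = reduce(reduce_counts, partials, Counter())

print(total)
```

Because each chunk is mapped without looking at any other chunk, the map step is the part that can be parallelized. The reduce step only needs a way to combine two partial results.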

Introduction to Data Structures 4h

  • Implement linked lists, queues, stacks, and dictionaries
  • Employ inheritance
  • Apply data structures to solve problems

Recursion and Trees for Data Engineering 6h

  • Traverse tree data structures using recursion
  • Identify the different types of tree data structures
  • Implement different types of tree data structures

Part 7: Data Pipelines [1 course]

Building a Data Pipeline 4h

  • Define functional programming
  • Define advanced Python concepts such as closures and decorators
  • Write a robust data pipeline with a scheduler in Python
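The sketch below hints at how closures and decorators can fit into a simple pipeline. It is an assumption-laden example, not the course's code: `count_calls`, `strip_blank`, `to_upper`, and `run_pipeline` are hypothetical names, and no scheduler is included.

```python
from functools import wraps


def count_calls(func):
    """Decorator: the inner wrapper closes over `calls` and updates it on each call."""
    calls = 0

    @wraps(func)
    def wrapper(*args, **kwargs):
        nonlocal calls
        calls += 1
        wrapper.calls = calls  # expose the count for inspection
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def strip_blank(lines):
    """Pipeline step: drop empty or whitespace-only lines."""
    return [line for line in lines if line.strip()]


@count_calls
def to_upper(lines):
    """Pipeline step: uppercase every line."""
    return [line.upper() for line in lines]


def run_pipeline(data, steps):
    """Feed the output of each step into the next one, in order."""
    for step in steps:
        data = step(data)
    return data


result = run_pipeline(["a", " ", "b"], [strip_blank, to_upper])
print(result, strip_blank.calls, to_upper.calls)
```

Each step is a plain function that takes data and returns new data, so steps can be reordered or tested in isolation. The decorator adds bookkeeping without changing the steps themselves.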

The Dataquest guarantee


Dataquest has helped thousands of people start new careers in data. If you put in the work and follow our path, you’ll master data skills and grow your career.


We believe so strongly in our paths that we offer a full satisfaction guarantee. If you complete a career path on Dataquest and aren’t satisfied with your outcome, we’ll give you a refund.

Master skills faster with Dataquest

Go from zero to job-ready

Learn exactly what you need to achieve your goal. Don’t waste time on unrelated lessons.

Build your project portfolio

Build confidence with our in-depth projects, and show off your data skills.

Challenge yourself with exercises

Work with real data from day one with interactive lessons and hands-on exercises.

Showcase your path certification

Impress employers by completing a capstone project and certifying it with an expert review.

Projects in this path

Project: Learn and Install Jupyter Notebook

Learn the basics of Jupyter Notebook

Guided Project: Profitable App Profiles for the App Store and Google Play Markets

Learn to combine the skills you learned in this course to perform practical data analysis.

Guided Project: Exploring Hacker News Posts

Practice using loops, cleaning strings, and working with dates in Python.

Guided Project: Building Fast Queries on a CSV

Apply what you have learned to implement an inventory system for a laptop store with efficient queries.

Project: Git Installation and GitHub Integration

Learn how to install Git and authenticate with GitHub.

Plus 10 more projects

Build your project portfolio with this path.



Aaron Melton

Business Analyst at Aditi Consulting

“Dataquest starts at the most basic level, so a beginner can understand the concepts. I tried learning to code before, using Codecademy and Coursera. I struggled because I had no background in coding, and I was spending a lot of time Googling. Dataquest helped me actually learn.”


Jessica Ko

Machine Learning Engineer at Twitter

“I liked the interactive environment on Dataquest. The material was clear and well organized. I spent more time practicing than watching videos and it made me want to keep learning.”


Victoria E. Guzik

Associate Data Scientist at Callisto Media

“I really love learning on Dataquest. I looked into a couple of other options and I found that they were much too handhold-y and fill in the blank relative to Dataquest’s method. The projects on Dataquest were key to getting my job. I doubled my income!”

Join 1M+ data learners on Dataquest


Sign up for a free account


Choose a course or path


Learn with hands-on exercises


Apply your skills

Start learning with a free account today.