Path overview
In this path, you’ll master the essential technical skills, including Python programming, data pipelines, and data processing. You’ll also learn how to implement algorithms, work with multi-table databases using SQL, and use key tools like pandas, NumPy, SQLite, MapReduce, and PostgreSQL.
Best of all, you’ll learn by doing — you’ll write code and get feedback directly in the browser. You’ll apply your skills to several guided projects involving realistic business scenarios to build your portfolio and prepare for your next interview.
Key skills
- Programming with Python and building complex data architectures to support organizations’ data strategy
- Managing data pipelines and data processes to ensure your data architecture is implemented correctly
- Using data wrangling to clean, reshape, and unify large, disparate datasets for analysis
- Automating tasks to optimize the entire data workflow
Path outline
Part 1: Introduction to Python [4 courses]
Introduction to Python for Data Engineering 4h
Objectives:
- Define the fundamentals of programming in Python
- Employ Jupyter Notebook
- Build a portfolio project
Dictionaries and Functions in Python 7h
Objectives:
- Create and update dictionaries
- Create your own functions
- Employ Jupyter Notebook
- Build a portfolio project
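To give you a taste of these skills, here's a minimal sketch of dictionaries and functions working together (the sentence and the `count_words` helper are invented for illustration):

```python
# Count how often each word appears in a sentence (hypothetical example).
def count_words(text):
    counts = {}  # dictionary mapping each word to its number of occurrences
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1  # update the key or create it
    return counts

print(count_words("the quick brown fox jumps over the lazy dog"))
# {'the': 2, 'quick': 1, 'brown': 1, ...}
```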
Intermediate Python for Data Engineering 6h
Objectives:
- Clean text data
- Define object-oriented programming in Python
- Process dates and times
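As a flavor of the intermediate material, here's a small sketch of text cleaning and date parsing (the raw strings are made up):

```python
from datetime import datetime

# Clean a messy text field, then parse a date string (hypothetical data).
raw = "  new york city \n"
city = raw.strip().title()  # drop stray whitespace, normalize the casing

opened = datetime.strptime("8/17/2023", "%m/%d/%Y")  # month/day/year format
print(city, opened.strftime("%Y-%m-%d"))  # New York City 2023-08-17
```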
Programming Concepts in Python 4h
Objectives:
- Define how Python represents data
- Define encodings
- Process text files
- Optimize data usage
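For instance, encodings determine how text becomes bytes on disk; a minimal sketch (the filename and contents are made up):

```python
# Write and read a text file with an explicit encoding (hypothetical file).
with open("reviews.txt", "w", encoding="utf-8") as f:
    f.write("great café\n")
with open("reviews.txt", encoding="utf-8") as f:
    print(f.read().strip())  # great café

# The same characters occupy different numbers of bytes in different encodings.
word = "café"
print(len(word.encode("utf-8")))    # 5 bytes: 'é' takes two bytes in UTF-8
print(len(word.encode("latin-1")))  # 4 bytes: one byte per character
```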
Part 2: Introduction to Algorithms [1 course]
Introduction to Algorithms 8h
Objectives:
- Analyze the time complexity of an algorithm
- Analyze the space complexity of an algorithm
- Trade memory for speed
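"Trading memory for speed" in one small sketch: spend extra memory on a set so membership checks drop from linear scans to constant time on average (the IDs are invented):

```python
ids = list(range(1_000_000))

# O(n): checking membership in a list scans element by element.
print(999_999 in ids)     # True, but only after a full scan

# O(1) on average: a set costs extra memory but answers lookups immediately.
id_set = set(ids)
print(999_999 in id_set)  # True, via a hash lookup
```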
Part 3: The Command Line and Git [4 courses]
Command Line for Data Science 4h
Objectives:
- Employ the command line for data science
- Define important command line concepts
- Modify the behavior of commands with options
- Navigate the filesystem
- Employ glob patterns and wildcards
- Manage users and permissions
Text Processing for Data Science 4h
Objectives:
- Read and explore documentation
- Inspect files
- Perform basic text processing
- Define different kinds of output
- Redirect and pipe output
- Employ streams and file descriptors
Intermediate Command Line for Data Science 3h
Objectives:
- Employ Jupyter console
- Process data from the command line
Introduction to Git and Version Control 4h
Objectives:
- Organize your code using version control
- Employ Git and GitHub to collaborate with others
- Resolve conflicts in version control
Part 4: Working with Data Sources Using SQL [5 courses]
Introduction to SQL and Databases 5h
Objectives:
- Define the structure of SQL
- Create basic queries to extract data from tables in a database
- Define databases
- Identify different versions of SQL
- Write good SQL code
Summarizing Data in SQL 3h
Objectives:
- Employ SQL to compute statistics
- Provide statistics by group
- Filter results over groups
Combining Tables in SQL 3h
Objectives:
- Combine tables using inner joins
- Employ different types of joins
- Employ other SQL clauses with joins
- Join on complex conditions
- Employ set operators like UNION and EXCEPT
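As a flavor of joins, here's a minimal sketch run through Python's built-in sqlite3 module (the tables and rows are invented):

```python
import sqlite3

# An inner join between two tables; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 25.0), (1, 10.0);
""")
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id  -- rows must match on id
    GROUP BY c.name;
""").fetchall()
print(rows)  # [('Ada', 35.0)]
```

Grace has no orders, so the inner join drops her; a LEFT JOIN would keep her with a NULL total.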
Querying SQLite from Python 1h
Objectives:
- Run SQL queries using sqlite3 in Python
- Employ cursors and tuples
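A minimal sqlite3 session showing queries, cursors, and tuple rows (the table and rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cur.execute("INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace')")

cur.execute("SELECT id, name FROM users")
for row in cur.fetchall():  # each row comes back as a tuple
    print(row)              # (1, 'Ada') then (2, 'Grace')
conn.close()
```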
SQL Subqueries 6h
Objectives:
- Nest a query inside another query
- Employ different types of subqueries
- Employ common table expressions
- Scale your project with complex queries
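Here's a small sketch of a common table expression feeding a subquery, again via sqlite3 (all names and numbers are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 5);
""")
query = """
    WITH totals AS (                        -- CTE: per-customer totals
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total FROM totals
    WHERE total > (SELECT AVG(total) FROM totals);  -- subquery in the filter
"""
print(conn.execute(query).fetchall())  # [('a', 40.0)]
```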
Part 5: Production Databases [2 courses]
PostgreSQL for Data Engineering 8h
Objectives:
- Identify how Postgres improves data sharing
- Create tables using Postgres from a CSV file
- Implement a database
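A sketch of loading a CSV into Postgres, assuming the psycopg2 driver; the connection details, table, and file are all hypothetical:

```python
import psycopg2

# Connection parameters are made up; adjust for a real server.
conn = psycopg2.connect(dbname="warehouse", user="etl")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS sales (day DATE, amount NUMERIC)")
with open("sales.csv") as f:
    # COPY streams the whole file into the table, far faster than row-by-row INSERTs.
    cur.copy_expert("COPY sales FROM STDIN WITH CSV HEADER", f)
conn.commit()
conn.close()
```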
Optimizing PostgreSQL Databases 5h
Objectives:
- Debug Postgres queries
- Apply the fundamentals of Postgres's internal tooling
- Speed up Postgres querying using indexes
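And a sketch of the optimization loop, again assuming psycopg2 and the hypothetical sales table from above: inspect the plan, then add an index:

```python
import psycopg2

conn = psycopg2.connect(dbname="warehouse", user="etl")  # made-up connection
cur = conn.cursor()

cur.execute("EXPLAIN ANALYZE SELECT * FROM sales WHERE day = '2024-01-01'")
for line in cur.fetchall():
    print(line[0])  # before the index: a sequential scan over the table

# An index lets Postgres jump to matching rows instead of scanning everything.
cur.execute("CREATE INDEX idx_sales_day ON sales (day)")
conn.commit()
conn.close()
```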
Part 6: Handling Large Data Sets in Python [5 courses]
NumPy for Data Engineering 4h
Objectives:
- Manipulate n-dimensional arrays
- Perform numeric calculations with n-dimensional arrays
- Identify the differences between NumPy and pure Python
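One difference in miniature: NumPy applies arithmetic to whole arrays at once, where pure Python loops element by element (the numbers are invented):

```python
import numpy as np

prices = np.array([[10.0, 20.0], [30.0, 40.0]])  # a 2-D array
discounted = prices * 0.9      # one operation applies to every element
print(discounted.sum(axis=0))  # column sums: [36. 54.]

# The pure-Python equivalent walks the data by hand and is far slower at scale.
rows = [[10.0, 20.0], [30.0, 40.0]]
print([sum(col) * 0.9 for col in zip(*rows)])  # [36.0, 54.0]
```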
Processing Large Datasets in Pandas 5h
Objectives:
- Reduce the memory footprint of a pandas DataFrame
- Process large DataFrames in chunks using SQLite
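Chunked processing in one small sketch (the CSV is generated on the spot so the example is self-contained):

```python
import pandas as pd

# Build a toy CSV, then stream it back in chunks instead of loading it at once.
pd.DataFrame({"amount": range(1_000)}).to_csv("transactions.csv", index=False)

total = 0.0
for chunk in pd.read_csv("transactions.csv", chunksize=250,
                         dtype={"amount": "float32"}):  # smaller dtype, less memory
    total += chunk["amount"].sum()
print(total)  # 499500.0
```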
Parallel Processing for Data Engineering 5h
Objectives:
- Process data in parallel
- Implement MapReduce
- Solve problems using MapReduce
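The MapReduce pattern in miniature, sketched here with Python's multiprocessing module rather than the course's own implementation (the chunks are invented):

```python
from functools import reduce
from multiprocessing import Pool

def count_lines(chunk):
    return len(chunk)  # "map" step: one partial result per chunk

if __name__ == "__main__":
    chunks = [["a", "b"], ["c"], ["d", "e", "f"]]
    with Pool(3) as pool:
        partials = pool.map(count_lines, chunks)  # run the map step in parallel
    print(reduce(lambda x, y: x + y, partials))   # "reduce" step: 6
```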
Introduction to Data Structures 4h
Objectives:
- Implement linked lists, queues, stacks, and dictionaries
- Employ inheritance
- Apply data structures to solve problems
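A flavor of the material: a minimal stack built on linked nodes (an illustrative sketch, not the course's exact classes):

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class Stack:
    def __init__(self):
        self.top = None

    def push(self, value):
        self.top = Node(value, self.top)  # the new node points at the old top

    def pop(self):
        node = self.top
        self.top = node.next
        return node.value

s = Stack()
s.push(1)
s.push(2)
print(s.pop(), s.pop())  # 2 1 (last in, first out)
```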
Recursion and Trees for Data Engineering 6h
Objectives:
- Traverse tree data structures using recursion
- Identify the different types of tree data structures
- Implement different types of tree data structures
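Recursion and trees in one small sketch (the tree itself is invented):

```python
# Recursively sum every value in a binary tree.
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def tree_sum(node):
    if node is None:  # base case: an empty subtree contributes 0
        return 0
    return node.value + tree_sum(node.left) + tree_sum(node.right)

root = TreeNode(1, TreeNode(2, TreeNode(4)), TreeNode(3))
print(tree_sum(root))  # 10
```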
Part 7: Data Pipelines [1 course]
Building a Data Pipeline 4h
Objectives:
- Define functional programming
- Define advanced Python concepts such as closures and decorators
- Write a robust data pipeline with a scheduler in Python
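A toy sketch of the decorator-and-closure idea behind such a pipeline; the task registry and steps are invented, and a real pipeline would add a scheduler:

```python
TASKS = []

def task(func):  # the decorator closes over the module-level TASKS list
    TASKS.append(func)
    return func

@task
def extract():
    return [" 5 ", "12", " 7"]

@task
def transform(rows):
    return [int(r.strip()) for r in rows]

@task
def load(values):
    print(f"loaded {sum(values)}")

def run_pipeline():
    data = None
    for step in TASKS:  # run tasks in registration order, passing results along
        data = step(data) if data is not None else step()

run_pipeline()  # loaded 24
```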
The Dataquest guarantee
Dataquest has helped thousands of people start new careers in data. If you put in the work and follow our path, you’ll master data skills and grow your career.
We believe so strongly in our paths that we offer a full satisfaction guarantee. If you complete a career path on Dataquest and aren’t satisfied with your outcome, we’ll give you a refund.
Master skills faster with Dataquest
Go from zero to job-ready
Learn exactly what you need to achieve your goal. Don’t waste time on unrelated lessons.
Build your project portfolio
Build confidence with our in-depth projects, and show off your data skills.
Challenge yourself with exercises
Work with real data from day one with interactive lessons and hands-on exercises.
Showcase your path certification
Share the evidence of your hard work with your network and potential employers.
Projects in this path
Learn and Install Jupyter Notebook
For this project, you’ll take on the role of a Jupyter Notebook beginner. You’ll learn the essentials of running code, adding explanatory text, and installing Jupyter locally to prepare for real-world data projects.
Profitable App Profiles for the App Store and Google Play Markets
For this project, we’ll assume the role of data analysts for a company that builds free Android and iOS apps. Our revenue depends on in-app ads, so our goal is to analyze data to determine which kinds of apps attract more users.
Exploring Hacker News Posts
For this project, we’ll step into the role of data analysts to explore Hacker News submissions, analyzing trends using skills in string manipulation, object-oriented programming, and date handling in Python.
Building Fast Queries on a CSV
For this project, we’ll step into the role of Python developers to build an inventory system for a laptop store. We’ll apply efficient data structures and algorithms to enable fast queries.
Git Installation and GitHub Integration
For this project, you’ll set up Git and GitHub to start using version control for your data science projects, enabling you to track changes, collaborate with others, and share your work.