Path overview
In this path, you’ll master the essential technical skills, including Python programming, data pipelines, and data processing. You’ll also learn how to implement algorithms, work with multi-table databases using SQL, and use key tools like pandas, NumPy, SQLite, MapReduce, and PostgreSQL.
Best of all, you’ll learn by doing — you’ll write code and get feedback directly in the browser. You’ll apply your skills to several guided projects involving realistic business scenarios to build your portfolio and prepare for your next interview.
Key skills
- Programming with Python and building complex data architectures to support organizations’ data strategy
- Managing data pipelines and data processes to ensure your data architecture is implemented correctly
- Using data wrangling to clean, reshape, and unify large, disparate datasets for analysis
- Automating tasks to optimize the entire data workflow
Path outline
Part 1: Introduction to Python [4 courses]
Introduction to Python for Data Engineering 4h
Objectives:
- Define the fundamentals of programming in Python
- Employ Jupyter Notebook
- Build a portfolio project
Dictionaries and Functions in Python 7h
Objectives:
- Create and update dictionaries
- Create your own functions
- Employ Jupyter Notebook
- Build a portfolio project
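To give you a taste of these skills, here's a minimal sketch of dictionaries and functions working together (the sentence and the `count_words` helper are invented for illustration):

```python
# Count how often each word appears in a sentence (hypothetical example).
def count_words(text):
    counts = {}  # dictionary mapping each word to its number of occurrences
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1  # update the key or create it
    return counts

print(count_words("the quick brown fox jumps over the lazy dog"))
# {'the': 2, 'quick': 1, 'brown': 1, ...}
```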
Intermediate Python for Data Engineering 6h
Objectives:
- Clean text data
- Define object-oriented programming in Python
- Process dates and times
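As a flavor of the intermediate material, here's a small sketch of text cleaning and date parsing (the raw strings are made up):

```python
from datetime import datetime

# Clean a messy text field, then parse a date string (hypothetical data).
raw = "  new york city \n"
city = raw.strip().title()  # drop stray whitespace, normalize the casing

opened = datetime.strptime("8/17/2023", "%m/%d/%Y")  # month/day/year format
print(city, opened.strftime("%Y-%m-%d"))  # New York City 2023-08-17
```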
Programming Concepts in Python 4h
Objectives:
- Define how Python represents data
- Define encodings
- Process text files
- Optimize data usage
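For instance, encodings determine how text becomes bytes on disk; a minimal sketch (the filename and contents are made up):

```python
# Write and read a text file with an explicit encoding (hypothetical file).
with open("reviews.txt", "w", encoding="utf-8") as f:
    f.write("great café\n")
with open("reviews.txt", encoding="utf-8") as f:
    print(f.read().strip())  # great café

# The same characters occupy different numbers of bytes in different encodings.
word = "café"
print(len(word.encode("utf-8")))    # 5 bytes: 'é' takes two bytes in UTF-8
print(len(word.encode("latin-1")))  # 4 bytes: one byte per character
```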
Part 2: Introduction to Algorithms [1 course]
Introduction to Algorithms 8h
Objectives:
- Analyze the time complexity of an algorithm
- Analyze the space complexity of an algorithm
- Trade memory for speed
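"Trading memory for speed" in one small sketch: spend extra memory on a set so membership checks drop from linear scans to constant time on average (the IDs are invented):

```python
ids = list(range(1_000_000))

# O(n): checking membership in a list scans element by element.
print(999_999 in ids)     # True, but only after a full scan

# O(1) on average: a set costs extra memory but answers lookups immediately.
id_set = set(ids)
print(999_999 in id_set)  # True, via a hash lookup
```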
Part 3: The Command Line and Git [4 courses]
Command Line for Data Science 4h
Objectives:
- Employ the command line for data science
- Define important command line concepts
- Modify the behavior of commands with options
- Navigate the filesystem
- Employ glob patterns and wildcards
- Manage users and permissions
Text Processing for Data Science 4h
Objectives:
- Read and explore documentation
- Inspect files
- Perform basic text processing
- Define different kinds of output
- Redirect and pipe output
- Employ streams and file descriptors
Intermediate Command Line for Data Science 3h
Objectives:
- Employ Jupyter console
- Process data from the command line
Introduction to Git and Version Control 4h
Objectives:
- Organize your code using version control
- Employ Git and GitHub to collaborate with others
- Resolve conflicts in version control
Part 4: Working with Data Sources Using SQL [5 courses]
Introduction to SQL and Databases 5h
Objectives:
- Define the structure of SQL
- Create basic queries to extract data from tables in a database
- Define databases
- Identify different versions of SQL
- Write good SQL code
Summarizing Data in SQL 3h
Objectives:
- Employ SQL to compute statistics
- Provide statistics by group
- Filter results over groups
Combining Tables in SQL 3h
Objectives:
- Combine tables using inner joins
- Employ different types of joins
- Employ other SQL clauses with joins
- Join on complex conditions
- Employ set operators like UNION and EXCEPT
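As a flavor of joins, here's a minimal sketch run through Python's built-in sqlite3 module (the tables and rows are invented):

```python
import sqlite3

# An inner join between two tables; all names and values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 25.0), (1, 10.0);
""")
rows = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers c
    INNER JOIN orders o ON o.customer_id = c.id  -- rows must match on id
    GROUP BY c.name;
""").fetchall()
print(rows)  # [('Ada', 35.0)]
```

Grace has no orders, so the inner join drops her; a LEFT JOIN would keep her with a NULL total.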
Querying SQLite from Python 1h
Objectives:
- Run SQL queries using sqlite3 in Python
- Employ cursors and tuples
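A minimal sqlite3 session showing queries, cursors, and tuple rows (the table and rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER, name TEXT)")
cur.execute("INSERT INTO users VALUES (1, 'Ada'), (2, 'Grace')")

cur.execute("SELECT id, name FROM users")
for row in cur.fetchall():  # each row comes back as a tuple
    print(row)              # (1, 'Ada') then (2, 'Grace')
conn.close()
```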
SQL Subqueries 6h
Objectives:
- Nest a query inside another query
- Employ different types of subqueries
- Employ common table expressions
- Scale your project with complex queries
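Here's a small sketch of a common table expression feeding a subquery, again via sqlite3 (all names and numbers are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 5);
""")
query = """
    WITH totals AS (                        -- CTE: per-customer totals
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total FROM totals
    WHERE total > (SELECT AVG(total) FROM totals);  -- subquery in the filter
"""
print(conn.execute(query).fetchall())  # [('a', 40.0)]
```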
Part 5: Production Databases [2 courses]
PostgreSQL for Data Engineering 8h
Objectives:
- Identify how Postgres improves data sharing
- Create tables using Postgres from a CSV file
- Implement a database
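A sketch of loading a CSV into Postgres, assuming the psycopg2 driver; the connection details, table, and file are all hypothetical:

```python
import psycopg2

# Connection parameters are made up; adjust for a real server.
conn = psycopg2.connect(dbname="warehouse", user="etl")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS sales (day DATE, amount NUMERIC)")
with open("sales.csv") as f:
    # COPY streams the whole file into the table, far faster than row-by-row INSERTs.
    cur.copy_expert("COPY sales FROM STDIN WITH CSV HEADER", f)
conn.commit()
conn.close()
```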
Optimizing PostgreSQL Databases 5h
Objectives:
- Debug Postgres queries
- Apply the fundamentals of Postgres's internal tooling
- Speed up Postgres querying using indexes
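And a sketch of the optimization loop, again assuming psycopg2 and the hypothetical sales table from above: inspect the plan, then add an index:

```python
import psycopg2

conn = psycopg2.connect(dbname="warehouse", user="etl")  # made-up connection
cur = conn.cursor()

cur.execute("EXPLAIN ANALYZE SELECT * FROM sales WHERE day = '2024-01-01'")
for line in cur.fetchall():
    print(line[0])  # before the index: a sequential scan over the table

# An index lets Postgres jump to matching rows instead of scanning everything.
cur.execute("CREATE INDEX idx_sales_day ON sales (day)")
conn.commit()
conn.close()
```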
Part 6: Handling Large Data Sets in Python [5 courses]
NumPy for Data Engineering 4h
Objectives:
- Manipulate n-dimensional arrays
- Perform numeric calculations with n-dimensional arrays
- Identify the differences between NumPy and pure Python
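One difference in miniature: NumPy applies arithmetic to whole arrays at once, where pure Python loops element by element (the numbers are invented):

```python
import numpy as np

prices = np.array([[10.0, 20.0], [30.0, 40.0]])  # a 2-D array
discounted = prices * 0.9      # one operation applies to every element
print(discounted.sum(axis=0))  # column sums: [36. 54.]

# The pure-Python equivalent walks the data by hand and is far slower at scale.
rows = [[10.0, 20.0], [30.0, 40.0]]
print([sum(col) * 0.9 for col in zip(*rows)])  # [36.0, 54.0]
```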
Processing Large Datasets in Pandas 5h
Objectives:
- Reduce the memory footprint of a pandas DataFrame
- Process large DataFrames in chunks using SQLite
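Chunked processing in one small sketch (the CSV is generated on the spot so the example is self-contained):

```python
import pandas as pd

# Build a toy CSV, then stream it back in chunks instead of loading it at once.
pd.DataFrame({"amount": range(1_000)}).to_csv("transactions.csv", index=False)

total = 0.0
for chunk in pd.read_csv("transactions.csv", chunksize=250,
                         dtype={"amount": "float32"}):  # smaller dtype, less memory
    total += chunk["amount"].sum()
print(total)  # 499500.0
```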
Parallel Processing for Data Engineering 5h
Objectives:
- Process data in parallel
- Implement MapReduce
- Solve problems using MapReduce
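The MapReduce pattern in miniature, sketched here with Python's multiprocessing module rather than the course's own implementation (the chunks are invented):

```python
from functools import reduce
from multiprocessing import Pool

def count_lines(chunk):
    return len(chunk)  # "map" step: one partial result per chunk

if __name__ == "__main__":
    chunks = [["a", "b"], ["c"], ["d", "e", "f"]]
    with Pool(3) as pool:
        partials = pool.map(count_lines, chunks)  # run the map step in parallel
    print(reduce(lambda x, y: x + y, partials))   # "reduce" step: 6
```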
Introduction to Data Structures 4h
Objectives:
- Implement linked lists, queues, stacks, and dictionaries
- Employ inheritance
- Apply data structures to solve problems
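A flavor of the material: a minimal stack built on linked nodes (an illustrative sketch, not the course's exact classes):

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class Stack:
    def __init__(self):
        self.top = None

    def push(self, value):
        self.top = Node(value, self.top)  # the new node points at the old top

    def pop(self):
        node = self.top
        self.top = node.next
        return node.value

s = Stack()
s.push(1)
s.push(2)
print(s.pop(), s.pop())  # 2 1 (last in, first out)
```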
Recursion and Trees for Data Engineering 6h
Objectives:
- Traverse tree data structures using recursion
- Identify the different types of tree data structures
- Implement different types of tree data structures
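Recursion and trees in one small sketch (the tree itself is invented):

```python
# Recursively sum every value in a binary tree.
class TreeNode:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def tree_sum(node):
    if node is None:  # base case: an empty subtree contributes 0
        return 0
    return node.value + tree_sum(node.left) + tree_sum(node.right)

root = TreeNode(1, TreeNode(2, TreeNode(4)), TreeNode(3))
print(tree_sum(root))  # 10
```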
Part 7: Data Pipelines [1 course]
Building a Data Pipeline 4h
Objectives:
- Define functional programming
- Define advanced Python concepts such as closures and decorators
- Write a robust data pipeline with a scheduler in Python
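A toy sketch of the decorator-and-closure idea behind such a pipeline; the task registry and steps are invented, and a real pipeline would add a scheduler:

```python
TASKS = []

def task(func):  # the decorator closes over the module-level TASKS list
    TASKS.append(func)
    return func

@task
def extract():
    return [" 5 ", "12", " 7"]

@task
def transform(rows):
    return [int(r.strip()) for r in rows]

@task
def load(values):
    print(f"loaded {sum(values)}")

def run_pipeline():
    data = None
    for step in TASKS:  # run tasks in registration order, passing results along
        data = step(data) if data is not None else step()

run_pipeline()  # loaded 24
```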
The Dataquest guarantee
Dataquest has helped thousands of people start new careers in data. If you put in the work and follow our path, you’ll master data skills and grow your career.
We believe so strongly in our paths that we offer a full satisfaction guarantee. If you complete a career path on Dataquest and aren’t satisfied with your outcome, we’ll give you a refund.
Master skills faster with Dataquest
Go from zero to job-ready
Learn exactly what you need to achieve your goal. Don’t waste time on unrelated lessons.
Build your project portfolio
Build confidence with our in-depth projects, and show off your data skills.
Challenge yourself with exercises
Work with real data from day one with interactive lessons and hands-on exercises.
Showcase your path certification
Share the evidence of your hard work with your network and potential employers.
Projects in this path
Learn and Install Jupyter Notebook
For this project, you’ll take on the role of a Jupyter Notebook beginner. You’ll learn the essentials of running code, adding explanatory text, and installing Jupyter locally to prepare for real-world data projects.
Profitable App Profiles for the App Store and Google Play Markets
For this project, we’ll assume the role of data analysts for a company that builds free Android and iOS apps. Our revenue depends on in-app ads, so our goal is to analyze data to determine which kinds of apps attract more users.
Exploring Hacker News Posts
For this project, we’ll step into the role of data analysts to explore Hacker News submissions, analyzing trends using skills in string manipulation, object-oriented programming, and date handling in Python.
Building Fast Queries on a CSV
For this project, we’ll step into the role of Python developers to build an inventory system for a laptop store. We’ll apply efficient data structures and algorithms to enable fast queries.
Git Installation and GitHub Integration
For this project, you’ll set up Git and GitHub to start using version control for your data science projects, enabling you to track changes, collaborate with others, and share your work.