Course overview

In your data science career, you’ll rarely get a dataset in precisely the state you want, which is why data cleaning is such an invaluable skill. This course builds on our previous Advanced Data Cleaning course and walks you through cleaning, combining, and analyzing multiple datasets from start to finish.

After learning how to prepare the data for analysis, the real fun begins: you’ll complete two guided projects in data analysis and visualization, one using NYC high school SAT data and one using FiveThirtyEight’s Star Wars survey.

Key skills

• Using the “two-phase” process to complete end-to-end data cleaning projects
• Combining, manipulating, exploring, and analyzing multiple datasets
• Completing compelling data cleaning guided projects

Course outline

Data Cleaning Walkthrough 2h

Lesson Objectives
• Research and prepare multiple datasets
• Clean data across multiple datasets

Data Cleaning Walkthrough: Combining the Data 2h

Lesson Objectives
• Combine multiple datasets
• Perform joins in pandas
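
As a rough illustration of what a join in pandas looks like, here is a minimal sketch. The DataFrames, the school_id key, and the values are hypothetical toy data made up for this example, not the datasets used in the lesson.

```python
import pandas as pd

# Hypothetical toy data: names and values are illustrative only.
sat = pd.DataFrame({
    "school_id": ["A1", "B2", "C3"],
    "sat_score": [1122, 1172, 1149],
})
demographics = pd.DataFrame({
    "school_id": ["A1", "B2", "D4"],
    "total_enrollment": [323, 398, 379],
})

# Inner join: keep only schools present in both datasets.
inner = sat.merge(demographics, on="school_id", how="inner")

# Left join: keep every school in `sat`; missing matches become NaN.
left = sat.merge(demographics, on="school_id", how="left")

print(inner)
print(left)
```

The choice of join matters when cleaning: an inner join silently drops unmatched rows, while a left join keeps them and leaves missing values to handle afterwards.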

Data Cleaning Walkthrough: Analyzing and Visualizing the Data 1h

Lesson Objectives
• Compute correlations in pandas
• Map schools using basemap
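
To give a sense of the correlation objective, the sketch below computes pairwise correlations with DataFrame.corr and pulls out the column of interest. The column names and numbers are assumptions invented for illustration.

```python
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only.
scores = pd.DataFrame({
    "sat_score": [1100, 1200, 1300, 1400],
    "ap_test_takers": [10, 20, 30, 40],
    "pct_absent": [20, 15, 10, 5],
})

# DataFrame.corr() returns a matrix of pairwise Pearson correlations;
# selecting one column shows how every field relates to sat_score.
correlations = scores.corr()["sat_score"]

print(correlations.sort_values(ascending=False))
```

Values close to 1 or -1 suggest a strong linear relationship worth plotting, though correlation alone does not establish cause.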

Guided Project: Analyzing NYC High School Data 1h

Lesson Objectives
• Generate scatter plots to compare columns

Challenge: Cleaning Data 1h

Lesson Objectives
• Clean data in pandas
• Apply functions over columns in pandas
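
Applying a function over a column can be sketched as follows. The class_size column, its messy values, and the to_number helper are hypothetical and exist only for this example.

```python
import pandas as pd

# Hypothetical toy data: a numeric column stored as messy strings.
df = pd.DataFrame({"class_size": ["25", "30", "s", None]})


def to_number(value):
    """Convert a value to float, returning NaN when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float("nan")


# Series.apply calls the function once per value in the column.
df["class_size_num"] = df["class_size"].apply(to_number)

print(df)
```

For simple numeric conversion like this, pd.to_numeric(df["class_size"], errors="coerce") is a built-in alternative; a custom function is more useful when the cleaning rule is specific to the dataset.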

Guided Project: Star Wars Survey 2h

Lesson Objectives
• Clean and map column values in pandas
• Compute summary statistics
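
The two objectives above can be sketched together: map string answers to booleans with Series.map, then compute a summary statistic on the filtered rows. The column names and responses are assumptions made up for illustration, not the actual survey schema.

```python
import pandas as pd

# Hypothetical toy data: column names and answers are illustrative only.
survey = pd.DataFrame({
    "seen_any_film": ["Yes", "No", "Yes", "Yes"],
    "ranking_film_1": [1, 3, 2, 2],
})

# Series.map replaces each value using the dictionary;
# values missing from the dictionary would become NaN.
survey["seen_any_film"] = survey["seen_any_film"].map({"Yes": True, "No": False})

# Use the boolean column as a row filter, then summarize.
mean_rank_among_viewers = survey.loc[
    survey["seen_any_film"], "ranking_film_1"
].mean()

print(survey)
print(mean_rank_among_viewers)
```

Converting "Yes"/"No" text to booleans first makes later filtering and aggregation much simpler than comparing strings each time.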

Projects in this course

Analyzing NYC High School Data

For this project, you’ll assume the role of a data scientist analyzing relationships between SAT scores and demographic factors in NYC public schools to assess whether the SAT is a fair test.

Star Wars Survey

For this project, you’ll become a data analyst exploring FiveThirtyEight’s Star Wars survey data. You’ll use Python and pandas to map values, compute statistics, and analyze the data to uncover fan film preferences.

