MISSION 370

Working With Missing Data

In our Working With Missing Data mission, you will learn to identify and deal with missing and incorrect data. Specifically, you will use Python and pandas to find missing values, decide how best to correct them, and perform the required cleaning.
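To give a sense of what identifying missing data with pandas can look like, here is a minimal sketch. The small dataframe and its column names are made up for illustration and are not taken from the mission's dataset; only `DataFrame.isnull()` and `sum()` are standard pandas calls.

```python
import numpy as np
import pandas as pd

# Tiny made-up dataframe standing in for a real dataset.
# The column names are hypothetical, chosen only for illustration.
df = pd.DataFrame({
    "borough": ["QUEENS", None, "BRONX", None],
    "pedestrians_injured": [0.0, 1.0, np.nan, 0.0],
    "total_injured": [1.0, 1.0, 2.0, np.nan],
})

# isnull() marks each missing cell as True; summing counts them per column.
null_counts = df.isnull().sum()

# Express the counts as a percentage of all rows.
null_pct = null_counts / len(df) * 100

summary = pd.DataFrame({"null_counts": null_counts, "null_pct": null_pct})
print(summary)
```

A per-column summary like this is often a reasonable first step before deciding whether to drop, fill, or investigate the gaps.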

In addition to learning how to identify missing data using Python and pandas code, you will also learn how to identify missing data using visualizations made with matplotlib and Seaborn, two powerful Python data visualization libraries.

Once you're comfortable identifying missing data, you'll learn different techniques for filling in the holes in your data set. You'll try out various statistical methods that can be used for imputation, and you'll also learn to fill in the gaps by supplementing your data set with external data. Accounting for duplicate or missing data is a common task data scientists face when analyzing data drawn from multiple systems, so it's a crucial data cleaning skill to understand.
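The two approaches described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the mission's actual solution: the dataframes, the `unique_key` column, and the `supplemental` lookup table are all hypothetical, and the column mean is only one of several statistics you might choose for imputation.

```python
import numpy as np
import pandas as pd

# Hypothetical collision records with gaps in two columns.
collisions = pd.DataFrame({
    "unique_key": [1, 2, 3, 4],
    "injured": [0.0, np.nan, 2.0, 1.0],
    "borough": ["QUEENS", None, "BRONX", None],
})

# Statistical imputation: fill missing numeric values with the
# rounded column mean (mean() skips missing values by default).
mean_injured = collisions["injured"].mean()
collisions["injured"] = collisions["injured"].fillna(round(mean_injured))

# External data: a hypothetical supplemental table that supplies
# the borough for some records, matched on unique_key.
supplemental = pd.DataFrame({
    "unique_key": [2, 4],
    "borough": ["BROOKLYN", "MANHATTAN"],
})
lookup = supplemental.set_index("unique_key")["borough"]

# Only cells that are still missing are filled; existing values are kept.
collisions["borough"] = collisions["borough"].fillna(
    collisions["unique_key"].map(lookup)
)

print(collisions)
```

Filling with `fillna` rather than overwriting the whole column keeps every value that was already present, which matters when the external source is less reliable than the original data.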

In this mission, you will be working to identify and fill in the gaps in real NYPD Motor Vehicle Collisions data. Thinking like a data scientist, you'll explore the data set and practice your new skills and decision-making to turn this messy data into something that's ready for real analysis.

Objectives

  • Identify missing data using both code and visualization.
  • Replace missing data using imputation.
  • Fill in missing values by using external data.

Mission Outline

1. Introduction
2. Verifying the Total Columns
3. Filling and Verifying the Killed and Injured Data
4. Assigning the Corrected Data Back to the Main Dataframe
5. Visualizing Missing Data with Plots
6. Analyzing Correlations in Missing Data
7. Finding the Most Common Values Across Multiple Columns
8. Filling Unknown Values with a Placeholder
9. Missing Data in the "Location" Columns
10. Imputing Location Data
11. Next Steps
12. Takeaways

Course Info:

Python Data Cleaning Advanced

Intermediate

The average completion time for this course is 6–8 hours.

This course requires a basic subscription and includes four missions. It is the sixth course in the Data Analyst in Python and Data Scientist in Python paths.

Take a Look Inside