Working With Missing Data
In our Working With Missing Data mission, you will learn to identify and deal with missing and incorrect data. More specifically, you will learn how to use Python and pandas to identify missing data, decide how best to correct it, and perform the required cleaning.
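As a rough illustration of what identifying missing data with pandas can look like, here is a minimal sketch. The toy dataframe and its column names (`borough`, `injured`, `killed`) are made up for this example and are not the real data set's schema.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "borough": ["BROOKLYN", None, "QUEENS", None],
    "injured": [1.0, 0.0, np.nan, 2.0],
    "killed": [0.0, 0.0, 0.0, 0.0],
})

# Count missing values per column, then express them as a percentage of rows.
null_counts = df.isnull().sum()
null_pct = null_counts / len(df) * 100

summary = pd.DataFrame({"null_counts": null_counts, "null_pct": null_pct})
print(summary)
```

A per-column summary like this is usually a reasonable first look before deciding which columns need cleaning.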
In addition to learning how to identify missing data using Python and pandas code, you will also learn how to identify missing data using visualizations made with matplotlib and Seaborn, two powerful Python data visualization libraries.
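One possible way to see missing values at a glance is to plot the null mask as an image. The sketch below uses matplotlib only, on a made-up dataframe with hypothetical column names; a Seaborn heatmap of the same mask would be an alternative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "borough": ["BROOKLYN", None, "QUEENS", None],
    "injured": [1.0, 0.0, np.nan, 2.0],
    "killed": [0.0, 0.0, 0.0, 0.0],
})

# True where a value is missing, False otherwise.
null_mask = df.isnull()

fig, ax = plt.subplots(figsize=(4, 3))
# Draw one cell per value: dark cells mark missing entries.
ax.imshow(null_mask.to_numpy().astype(int), aspect="auto",
          cmap="gray_r", interpolation="none")
ax.set_xticks(range(len(df.columns)))
ax.set_xticklabels(df.columns)
ax.set_ylabel("row")
ax.set_title("Missing values (dark = missing)")
```

To view the result, call `fig.savefig(...)` with a file name of your choice, or drop the `matplotlib.use("Agg")` line and call `plt.show()` in an interactive session.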
Once you've grasped identifying missing data, you'll learn different techniques for filling in the holes in your data set. You'll try out various statistical methods that can be used for imputation, and you'll also learn to fill in the gaps by supplementing your data set with external data. Accounting for duplicate or missing data is a common task data scientists face when analyzing data drawn from multiple systems, so it's a crucial data cleaning skill to understand.
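To make the idea of imputation concrete, here is a minimal sketch of two simple strategies: filling a numeric column with its median and filling a text column with a placeholder. The data, the column names, and the `"Unspecified"` placeholder are assumptions chosen for illustration, not the mission's prescribed method.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "injured": [1.0, 0.0, np.nan, 3.0],
    "borough": ["BROOKLYN", None, "QUEENS", None],
})

# Work on a copy so the original, uncleaned data stays available for comparison.
filled = df.copy()

# Statistical imputation: replace missing numbers with the column median.
filled["injured"] = filled["injured"].fillna(filled["injured"].median())

# Placeholder imputation: mark missing text values explicitly.
filled["borough"] = filled["borough"].fillna("Unspecified")

print(filled)
```

Which strategy is appropriate depends on the column and on how the data will be analyzed; the median is only one of several statistics that could be used here.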
In this mission, you will be working to identify and fill in the gaps in real NYPD Motor Vehicle Collisions data. Thinking like a data scientist, you'll explore the data set and practice your new skills and decision-making to turn this messy data into something that's ready for real analysis.
2. Verifying the Total Columns
3. Filling and Verifying the Killed and Injured Data
4. Assigning the Corrected Data Back to the Main Dataframe
5. Visualizing Missing Data with Plots
6. Analyzing Correlations in Missing Data
7. Finding the Most Common Values Across Multiple Columns
8. Filling Unknown Values with a Placeholder
9. Missing Data in the "Location" Columns
10. Imputing Location Data
11. Next Steps