In our Working with Missing Data mission, you will learn to identify and deal with missing and incorrect data. More specifically, you will learn how to use R to find missing data, how to decide on the best way to correct it, and how to perform the required cleaning.
In addition to identifying missing data with R code, you will also learn to spot it in visualizations built with ggplot2.
Once you've grasped identifying missing data, you'll learn different techniques for filling in the holes in your data set. You'll try out various statistical methods that can be used for imputation, and you'll also learn to fill in the gaps by supplementing your data set with external data. Accounting for duplicate or missing data is a common task data scientists face when analyzing data drawn from multiple systems, so it's a crucial data cleaning skill to understand.
In this mission, you will be working to identify and fill in the gaps in real NYPD Motor Vehicle Collisions data. Thinking like a data scientist, you'll explore the data set and practice your new skills and decision-making to turn this messy data into something that's ready for real analysis.
2. Summing Values over Rows
3. Verifying the Total Columns
4. Filling and Verifying the Killed and Injured Data
5. Preparing Data for Missing Data Visualization
6. Visualizing Missing Data with Heatmaps
7. Visualizing Correlation Matrix with Heatmaps
8. Analyzing Correlations in Missing Data
9. Finding the Most Common Values Across Multiple Columns
10. Filling Unknown Values with a Placeholder
11. Missing Data in the "Location" Columns
12. Next Steps