In our Working with Missing Data mission, you will learn to identify and deal with missing and incorrect data. More specifically, you will learn how to use Python and pandas to identify missing data, decide how best to correct it, and perform the required cleaning. You'll learn to spot missing data both in code and visually, using matplotlib and seaborn, two powerful visualization libraries.
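As a rough illustration of what identifying missing data with pandas can look like, here is a minimal sketch. The dataframe and its column names are made up for this example and are not the real collisions dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
collisions = pd.DataFrame({
    "borough": ["BROOKLYN", None, "QUEENS", None],
    "pedestrians_injured": [0, 1, np.nan, 0],
    "total_injured": [0, 1, 2, np.nan],
})

# Number of missing values in each column.
null_counts = collisions.isnull().sum()

# The same counts expressed as a percentage of all rows.
null_pct = null_counts / len(collisions) * 100

# Combine both views into one small summary table.
summary = pd.DataFrame({"null_counts": null_counts, "null_pct": null_pct})
print(summary)
```

Looking at counts and percentages side by side makes it easier to judge whether a column is worth repairing or is mostly empty.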
Additionally, you will learn to fill in missing data, either by imputation or by drawing on external data. Data analysts and data scientists often need to account for duplicate or missing data when analyzing data drawn from multiple systems.
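One simple form of imputation is to rebuild a missing value from other columns in the same row. The sketch below assumes a hypothetical table where a total column should equal the sum of its component columns, and it also fills a missing category with a placeholder. All names and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; not the real dataset schema.
injuries = pd.DataFrame({
    "pedestrians_injured": [0, 1, 2, 0],
    "cyclist_injured": [0, 0, 1, 1],
    "motorist_injured": [1, 0, 0, 2],
    "total_injured": [1, np.nan, 3, np.nan],
    "borough": ["BRONX", None, "QUEENS", None],
})

# Assumption: the total should equal the sum of the component columns.
parts = ["pedestrians_injured", "cyclist_injured", "motorist_injured"]
component_sum = injuries[parts].sum(axis=1)

# Fill only the missing totals; existing values are left untouched.
injuries["total_injured"] = injuries["total_injured"].fillna(component_sum)

# Fill missing categories with an explicit placeholder value.
injuries["borough"] = injuries["borough"].fillna("Unknown")

print(injuries)
```

Using `fillna` with a Series aligns on the index, so each missing total is replaced by the sum computed for its own row.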
In this mission, you will work with NYPD Motor Vehicle Collisions data, which provides a thorough overview of how to identify and fill in missing data. Because you'll be working with real-world data, you will get the opportunity to think like a data analyst or data scientist as you explore the dataset. By the end of this mission, you will have a better working knowledge of regular expressions and how to use them to do some powerful string manipulation.
2. Verifying the Total Columns
3. Filling and Verifying the Killed and Injured Data
4. Assigning the Corrected Data Back to the Main Dataframe
5. Visualizing Missing Data with Plots
6. Analyzing Correlations in Missing Data
7. Finding the Most Common Values Across Multiple Columns
8. Filling Unknown Values with a Placeholder
9. Missing Data in the "Location" Columns
10. Imputing Location Data
11. Next Steps
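
To give a flavor of the "Analyzing Correlations in Missing Data" step in the outline above, here is a minimal sketch that checks whether values tend to be missing together across columns. The dataframe is a made-up example, and this is only one possible approach, not necessarily the one the mission uses.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "latitude": [40.7, np.nan, 40.6, np.nan, 40.8],
    "longitude": [-73.9, np.nan, -74.0, np.nan, -73.8],
    "on_street": ["A ST", None, "B AVE", "C RD", None],
})

# Boolean mask: True wherever a value is missing.
missing = df.isnull()

# Correlate the missingness indicators (1 = missing, 0 = present).
# Values near 1 mean two columns tend to be missing in the same rows.
missing_corr = missing.astype(int).corr()

print(missing_corr.round(2))
```

In this toy data, latitude and longitude are always missing together, so their missingness correlation is 1, while the street column is missing in a largely different set of rows.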