Working With Missing And Duplicate Data

As we near the end of the Data Cleaning and Analysis course, we’ll cover a topic that’s essential to any data cleaning workflow — handling missing and duplicate data.

In the Pandas Fundamentals course, you learned that there are various ways to handle missing data:

  • Remove any rows that have missing values.
  • Remove any columns that have missing values.
  • Fill the missing values with some other value.
  • Leave the missing values as is.
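As a rough illustration of the first three options, here is a minimal pandas sketch. The DataFrame, its column names, and its values are made up for this example and are not taken from the World Happiness Reports; filling with the column mean is just one possible choice of fill value.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the names and values are invented for illustration.
df = pd.DataFrame({
    "country": ["A", "B", "C"],
    "score": [7.5, np.nan, 6.9],
    "region": ["X", "Y", "Z"],
})

# Option 1: remove any rows that have missing values.
rows_dropped = df.dropna()

# Option 2: remove any columns that have missing values.
cols_dropped = df.dropna(axis=1)

# Option 3: fill the missing values with some other value
# (here, the mean of the non-missing scores).
filled = df.fillna({"score": df["score"].mean()})
```

The fourth option, leaving the values as is, needs no code: most pandas aggregations such as `mean()` skip missing values by default.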

In this lesson, you’ll explore each of these options in detail and learn when to use them. You’ll work with the 2015, 2016, and 2017 World Happiness Reports again. More specifically, you’ll combine them and clean missing values as you start to define a more complete data cleaning workflow.

Missing or duplicate data can appear in a dataset for several reasons. Sometimes it is introduced while we perform cleaning and transformation tasks such as combining, reindexing, and reshaping data.

Other times, it exists in the original dataset because of user input errors or data storage or conversion issues.

Missing values may also appear in the original dataset on purpose, to indicate that data is unavailable.
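To see how combining data can introduce missing values, consider this small sketch. The two DataFrames and their slightly different column names are hypothetical stand-ins for yearly reports, not the actual course files. Because the score column is spelled differently in each, `pd.concat` keeps both columns and fills the gaps with `NaN`.

```python
import pandas as pd

# Hypothetical yearly tables whose score columns are named inconsistently.
h2015 = pd.DataFrame({"Country": ["A", "B"], "Happiness Score": [7.5, 7.4]})
h2016 = pd.DataFrame({"Country": ["A", "B"], "Happiness.Score": [7.6, 7.3]})

# Stacking the tables keeps every column from both inputs; rows that lack
# a given column receive NaN there.
combined = pd.concat([h2015, h2016], ignore_index=True)

# Count the missing values in each column.
missing_counts = combined.isnull().sum()
```

Here each score column ends up with two missing values, even though neither original table had any.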

As you work through each concept, you’ll get to apply what you’ve learned from within your browser; there’s no need to use your own machine to do the exercises. The Python environment inside of this course includes answer-checking to ensure you’ve fully mastered each concept before learning the next.

Objectives

  • Learn techniques for dropping rows and columns with missing data.
  • Learn how to impute values to replace missing data.
  • Learn how to identify and drop duplicate rows.
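As a preview of the last objective, the following sketch shows one way to flag and drop duplicate rows in pandas. The data and column names are invented for illustration, and treating a repeated country and year pair as a duplicate is an assumption made for this example.

```python
import pandas as pd

# Hypothetical data in which the third row repeats the first.
df = pd.DataFrame({
    "country": ["A", "B", "A", "C"],
    "year": [2015, 2015, 2015, 2016],
})

# Flag rows whose (country, year) pair has already appeared.
dup_mask = df.duplicated(["country", "year"])

# Keep only the first occurrence of each (country, year) pair.
deduped = df.drop_duplicates(["country", "year"])
```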

Lesson Outline

  1. Introduction
  2. Identifying Missing Values
  3. Correcting Data Cleaning Errors that Result in Missing Values
  4. Visualizing Missing Data
  5. Using Data From Additional Sources to Fill in Missing Values
  6. Identifying Duplicate Values
  7. Correcting Duplicate Values
  8. Handle Missing Values by Dropping Columns
  9. Handle Missing Values by Dropping Columns Continued
  10. Analyzing Missing Data
  11. Handling Missing Values with Imputation
  12. Dropping Rows
  13. Next Steps
  14. Takeaways
