Dealing With Missing Data

Analyzing messy data can be daunting, if not impossible. Most datasets you come across will need some cleaning before you can analyze them and make sense of the data. It is often said that data analysts and data scientists spend as much as 80% of their time cleaning data, so strong cleaning skills make your analysis easier and, in some cases, possible at all.

Throughout this lesson and the ones that follow, you will steadily build your R data cleaning skills so you are prepared to land your first job in data!

In this lesson, you will learn the tools and build the intuition you need to decide how to handle missing values in your dataset, how best to correct the data, and how to perform the required cleaning.

You will start by learning that missing data refers to observations where no value is present for a variable, and that missing values can be classified as either implicit or explicit. You could simply drop the rows with missing data, but you might lose important information. To avoid dropping rows, you'll learn about imputing missing data, that is, substituting an appropriate value for each missing one.
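To make these ideas concrete, here is a minimal sketch in base R. The data frame, school names, and scores are made up for illustration and are not the course dataset. It assumes that "explicit" means a value recorded as NA and "implicit" means a row that is absent altogether, and it uses mean imputation as just one possible substitute value.

```r
# Hypothetical toy data; schools and scores are invented for illustration.
scores <- data.frame(
  school  = c("A", "B", "C", "D"),
  avg_sat = c(1200, NA, 1350, 1050)
)

# Explicit missing values are recorded as NA and can be counted directly.
n_explicit <- sum(is.na(scores$avg_sat))

# Implicit missing values have no row at all. Comparing against a list of
# schools we expected to see (an assumption here) reveals them.
expected_schools <- c("A", "B", "C", "D", "E")
implicit <- setdiff(expected_schools, scores$school)

# Impute: replace each NA with the mean of the observed scores.
imputed <- scores
imputed$avg_sat[is.na(imputed$avg_sat)] <- mean(scores$avg_sat, na.rm = TRUE)
```

Mean imputation keeps every row, but it also pulls the imputed schools toward the average, which is one reason the choice of technique matters.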

While learning how to identify and fill in gaps where data is missing, you'll work with New York City high school data and start to analyze which factors influence SAT scores the most.

With each concept, you'll use our code running system with answer checking, so you can make sure you've mastered it before moving on to the next. Thinking like a data scientist, you'll explore the dataset and practice your new skills and decision-making to turn this messy data into something that's ready for real analysis.

Objectives

  • Learn techniques for omitting missing values from calculations.
  • Understand how different approaches to handling missing values affect analysis.
  • Build intuition to help you decide how to approach analyses when data are missing.
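As a quick illustration of the first objective, the sketch below shows how a single NA propagates through a calculation in R, and how the `na.rm` argument omits it. The vector is hypothetical and only stands in for a column of scores.

```r
# Hypothetical values standing in for a column of SAT scores.
sat <- c(1200, NA, 1350, 1050)

# By default, one missing value makes the whole result missing.
mean_default <- mean(sat)

# na.rm = TRUE omits missing values before computing the summary.
mean_observed <- mean(sat, na.rm = TRUE)
total_observed <- sum(sat, na.rm = TRUE)
```

Here `mean_default` is NA, while `mean_observed` is computed from the three observed values only.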

Mission Outline

1. Exploring Academic Success and Demographics by Borough
2. Defining "Missing Data"
3. Contagious Missing Values
4. Dropping Rows With Missing Values for One Variable
5. Complete Cases: Dropping All Rows With Missing Data
6. Using Complete Cases: When to Avoid
7. Understanding Effects of Different Techniques for Handling Missing Data
8. Imputing to Replace Missing Data
9. Next Steps
10. Takeaways
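
To preview the difference between steps 4 and 5 of the outline, here is a small sketch using base R's `complete.cases()`. The data frame and its column names are hypothetical, not the course dataset.

```r
# Hypothetical toy data: missing values fall in different columns.
schools <- data.frame(
  school     = c("A", "B", "C", "D"),
  avg_sat    = c(1200, NA, 1350, 1050),
  attendance = c(0.91, 0.88, NA, 0.95)
)

# Drop rows with a missing value in one variable only.
has_sat <- schools[!is.na(schools$avg_sat), ]

# Keep only complete cases: rows with no missing value in any column.
complete <- schools[complete.cases(schools), ]

nrow(has_sat)   # 3
nrow(complete)  # 2
```

Filtering on complete cases discards school C even though its SAT score is present, which hints at why that approach is sometimes best avoided.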

Course Info:

Advanced

The median completion time for this course is 8.05 hours.

This course is free, and includes four missions and one guided project. It is the fourth course in the Data Analyst in R path.
