mission 136

Data Cleaning Walkthrough

In this data cleaning walkthrough lesson, we'll walk through the first part of a complete data science project, including how to acquire the raw data. The project focuses on exploring and analyzing a data set. You'll develop your data cleaning and storytelling skills, which will enable you to build complete projects on your own.

At many points in your career, you'll need to be able to build complete, end-to-end data science projects on your own. Data science projects usually consist of one of two things:

  • An exploration and analysis of a set of data. One example might involve analyzing donors to political campaigns, creating a plot, and then sharing an analysis of the plot with others.
  • An operational system that generates predictions based on data that updates continually. An algorithm that pulls in daily stock ticker data and predicts which stock prices will rise and fall would be one example.

You'll find the ability to create data science projects useful in several different contexts:

  • Projects will help you build a portfolio, which is critical to finding a job as a data analyst or scientist.
  • Working on projects will help you learn new skills and reinforce existing concepts.
  • Most "real-world" data science and analysis work consists of developing internal projects.
  • Projects allow you to investigate interesting phenomena and satisfy your curiosity.

In this data cleaning walkthrough lesson, you'll use data about New York City public schools. More specifically, you'll investigate the correlations between SAT scores and demographics to see which factors, such as race, gender, and income, are most strongly associated with SAT scores.
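To make "investigating correlations" concrete, here is a minimal sketch of ranking a few demographic columns by how strongly they correlate with SAT scores. It uses only the standard library; the column names (`sat_score`, `frl_percent`, `female_percent`) and every value in the table are invented for illustration and are not taken from the real New York City data. The lesson itself may use different tools and column names.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def rank_factors(rows, target, factors):
    """Sort factors by absolute correlation with the target, strongest first."""
    target_values = [row[target] for row in rows]
    scores = {
        factor: pearson_r([row[factor] for row in rows], target_values)
        for factor in factors
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy, invented values (NOT real school data): one dictionary per school.
schools = [
    {"sat_score": 1100, "frl_percent": 80.0, "female_percent": 50.0},
    {"sat_score": 1200, "frl_percent": 70.0, "female_percent": 52.0},
    {"sat_score": 1300, "frl_percent": 55.0, "female_percent": 49.0},
    {"sat_score": 1400, "frl_percent": 40.0, "female_percent": 51.0},
]

ranking = rank_factors(schools, "sat_score", ["frl_percent", "female_percent"])
for factor, r in ranking:
    print(f"{factor}: r = {r:+.2f}")
```

A coefficient near +1 or -1 indicates a strong linear association, while a value near 0 indicates a weak one. Correlation alone does not show that a factor causes higher or lower scores.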

Objectives

  • Learn how to research and prepare multiple datasets.
  • Learn how to perform data cleaning across multiple datasets.

Mission Outline

1. Introduction
2. Finding All of the Relevant Data Sets
3. Finding Background Information
4. Reading in the Data
5. Exploring the SAT Data
6. Exploring the Remaining Data
7. Reading in the Survey Data
8. Reading in the Survey Data
9. Cleaning Up the Surveys
10. Inserting DBN Fields
11. Inserting DBN Fields
12. Combining the SAT Scores
13. Parsing Geographic Coordinates for Schools
14. Extracting the Longitude
15. Next Steps
16. Takeaways
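
As a rough preview of the coordinate-parsing steps in the outline, the sketch below pulls a latitude and longitude pair out of a free-text location string. The input format (an address followed by "(latitude, longitude)"), the example address, and the helper names `parse_coordinates` and `extract_longitude` are all assumptions made for illustration; the lesson's actual data and approach may differ.

```python
import re

# Assumed format: free-text address followed by "(latitude, longitude)".
COORD_PATTERN = re.compile(
    r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)"
)


def parse_coordinates(location):
    """Return (latitude, longitude) as floats, or None if no pair is found."""
    match = COORD_PATTERN.search(location)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def extract_longitude(location):
    """Return only the longitude, or None if the string has no coordinates."""
    coords = parse_coordinates(location)
    return None if coords is None else coords[1]


# Hypothetical location string, not a real school address.
example = "123 Example Street\nBrooklyn, NY 11200\n(40.6700, -73.9600)"
print(parse_coordinates(example))
print(extract_longitude(example))
```

Returning None for strings without coordinates keeps missing values explicit, so they can be handled deliberately later in the cleaning process.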

Course Info:

Intermediate

The median completion time for this course is 7.5 hours.

This course requires a basic subscription. This course includes four missions and two guided projects. It is the seventh course in the Data Analyst in Python path and the Data Scientist in Python path.
