Data Cleaning Walkthrough

In this data cleaning walkthrough lesson, we’ll walk through the first part of a complete data science project, including how to acquire the raw data. The project will focus on exploring and analyzing a data set. You’ll develop your data cleaning and storytelling skills, which will enable you to build complete projects on your own.

At many points in your career, you’ll need to be able to build complete, end-to-end data science projects on your own. Data science projects usually consist of one of two things:

  • An exploration and analysis of a set of data. One example might involve analyzing donors to political campaigns, creating a plot, and then sharing an analysis of the plot with others.
  • An operational system that generates predictions based on data that updates continually. An algorithm that pulls in daily stock ticker data and predicts which stock prices will rise and fall would be one example.

You’ll find the ability to create data science projects useful in several different contexts:

  • Projects will help you build a portfolio, which is critical to finding a job as a data analyst or scientist.
  • Working on projects will help you learn new skills and reinforce existing concepts.
  • Most “real-world” data science and analysis work consists of developing internal projects.
  • Projects allow you to investigate interesting phenomena and satisfy your curiosity.

In this data cleaning walkthrough lesson, you’ll use data about New York City public schools. More specifically, you will investigate the correlations between SAT scores and demographics to see which factors, such as race, gender, and income, are most strongly associated with SAT scores.
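
To make the idea of a correlation between SAT scores and a demographic factor concrete, here is a minimal sketch in plain Python. It is only an illustration, not the lesson's own code: the column names (`sat_score`, `pct_low_income`, `pct_female`), the helper `pearson_r`, and the four rows of values are all hypothetical placeholders rather than real New York City school data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cov / sqrt(var_x * var_y)


# Made-up placeholder rows for illustration only (NOT real school data).
schools = [
    {"sat_score": 1100, "pct_low_income": 80.0, "pct_female": 50.0},
    {"sat_score": 1250, "pct_low_income": 65.0, "pct_female": 48.0},
    {"sat_score": 1400, "pct_low_income": 40.0, "pct_female": 52.0},
    {"sat_score": 1550, "pct_low_income": 20.0, "pct_female": 50.0},
]

sat_scores = [row["sat_score"] for row in schools]

# Correlate each hypothetical demographic column with the SAT score.
correlations = {
    column: pearson_r([row[column] for row in schools], sat_scores)
    for column in ("pct_low_income", "pct_female")
}

# List the factors from strongest to weakest association (by absolute value).
for column, r in sorted(correlations.items(), key=lambda item: -abs(item[1])):
    print(f"{column}: r = {r:.2f}")
```

A coefficient near 1 or -1 indicates a strong linear association and a value near 0 a weak one. A correlation on its own does not show that a factor causes higher or lower scores.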


Lesson Objectives

  • Learn how to research and prepare multiple datasets.
  • Learn how to perform data cleaning across multiple datasets.

Lesson Outline

  1. Introduction
  2. Finding All of the Relevant Data Sets
  3. Finding Background Information
  4. Reading in the Data
  5. Exploring the SAT Data
  6. Exploring the Remaining Data
  7. Reading in the Survey Data
  8. Reading in the Survey Data
  9. Cleaning Up the Surveys
  10. Inserting DBN Fields
  11. Inserting DBN Fields
  12. Combining the SAT Scores
  13. Parsing Geographic Coordinates for Schools
  14. Extracting the Longitude
  15. Next Steps
  16. Takeaways
