MISSION 138

Data Cleaning Walkthrough: Analyzing and Visualizing the Data

In this mission, we'll discover correlations, create plots, and then make maps. The first thing we'll do is find any correlations between any of the columns and sat_score. This will help us determine which columns might be interesting to plot out or investigate further. Afterward, we'll perform more analysis and make maps using the columns we've identified.
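As a rough illustration of that first step, here is a minimal sketch of how correlations with sat_score might be computed in pandas. The small `combined` DataFrame below is a made-up stand-in for the real data set, and the columns `total_enrollment` and `ell_percent` are hypothetical names chosen for the example. Only `DataFrame.corr()` and the `sat_score` column come from the approach described above.

```python
import pandas as pd

# Hypothetical stand-in for the `combined` data set.
# The column names (other than sat_score) and all values are invented for illustration.
combined = pd.DataFrame({
    "sat_score": [1200.0, 1350.0, 1100.0, 1500.0, 1250.0],
    "total_enrollment": [400, 900, 300, 1500, 650],
    "ell_percent": [20.0, 8.0, 35.0, 2.0, 15.0],
})

# DataFrame.corr() computes pairwise Pearson r values between columns.
# Selecting the sat_score column keeps each column's correlation with SAT scores.
correlations = combined.corr()["sat_score"]

# Drop the trivial self-correlation and sort from most negative to most positive.
correlations = correlations.drop("sat_score").sort_values()
print(correlations)
```

Columns whose r value lies far from zero in either direction are the ones that may be worth plotting or investigating further.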

At many points in your career, you'll need to be able to build complete, end-to-end data science projects on your own. Data science projects usually consist of one of two things:

  • An exploration and analysis of a set of data. One example might involve analyzing donors to political campaigns, creating a plot, and then sharing an analysis of the plot with others.
  • An operational system that generates predictions based on data that updates continually. For example, an algorithm that pulls in daily stock ticker data and predicts which stock prices will rise and fall.

For this particular end-to-end data science project, we began investigating possible relationships between SAT scores and demographics. In order to do this, we acquired several data sets containing information about New York City public schools. We cleaned them, then combined them into a single data set named combined that we're now ready to analyze and visualize.

You'll find the ability to create data science projects useful in several different contexts:

  • Projects will help you build a portfolio, which is critical to finding a job as a data analyst or scientist.
  • Working on projects will help you learn new skills and reinforce existing concepts.
  • Most "real-world" data science and analysis work consists of developing internal projects.
  • Projects allow you to investigate interesting phenomena and satisfy your curiosity.

Objectives

  • Learn to compute correlations in pandas.
  • Learn to map schools using basemap.

Mission Outline

1. Introduction
2. Finding Correlations With the r Value
3. Finding Correlations With the r Value
4. Plotting Enrollment With the Plot() Accessor
5. Plotting Enrollment With the Plot() Accessor
6. Exploring Schools With Low SAT Scores and Enrollment
7. Plotting Language Learning Percentage
8. Mapping the Schools With Basemap
9. Mapping the Schools With Basemap
10. Plotting Out Statistics
11. Calculating District-Level Statistics
12. Plotting Percent Of English Learners by District
13. Next Steps
14. Takeaways


Course Info:

Intermediate

The median completion time for this course is 7.5 hours.

This course requires a basic subscription. This course includes four missions and two guided projects. It is the seventh course in the Data Analyst in Python path and the Data Scientist in Python path.

