In our interactive control flow, iteration, and functions course, we've learned several programming concepts: control flow, loops, and functions.

These concepts are great to know in isolation, but really they should be thought of as tools in a toolkit. These programming "tools" are what we will use to clean and manipulate data when we encounter a dataset. It is extremely rare to receive a dataset that needs no processing before analysis, so we must always be prepared to clean it to suit our needs.

As such, we must learn how to use these tools in different contexts, which translates to one thing: practicing data analysis! In this guided project, we'll apply all of the concepts that we've learned to get acquainted with a dataset, clean it, and analyze it to extract useful information.

We will be acting as data analysts for a company that sells books for learning programming. The company has produced multiple books, and each has received many reviews. The company wants us to check out the sales data and see if we can extract any useful information from it. We'll walk through this process as we progress through the project.

At the end of this project, you'll have an end-to-end data analysis workflow taking shape!


  • Synthesize what you've learned in our Control Flow course.
  • Apply your new skills to a real data analysis problem.
  • Build a complete data science project!

Lesson Outline

  1. Introduction
  2. Getting Familiar with the Data
  3. Handling Missing Data
  4. Dealing with Inconsistent Labels
  5. Transforming the Review Data
  6. Analyzing the Data
  7. Reporting the Results
  8. Further Steps
  9. Next Steps