In the Cleaning and Preparing Data lesson, you discovered that not all the data that data scientists encounter is clean and ready for analysis. Often, you'll need to rework or clean the data to get it into a format suitable for analysis. That lesson focused on cleaning text data and preparing it for analysis using string techniques such as replacing substrings, capitalizing strings, and parsing numbers from complex strings.

In this lesson, you'll build on the cleaning work you did with the Museum of Modern Art (MoMA) data set in the previous lesson, and get into the fun part: analyzing the data! You'll learn how to insert variables into strings, build on your knowledge of creating frequency tables, format numbers inside strings, and more!
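As a preview, here is a minimal sketch of those three techniques. The artist names and the number are hypothetical placeholders rather than values from the MoMA data set, and the variable names are chosen only for illustration.

```python
# Hypothetical sample values, not taken from the MoMA data set.
artist_names = ["Artist A", "Artist B", "Artist A", "Artist C", "Artist A"]

# Build a frequency table: a dictionary mapping each name to its count.
artist_freq = {}
for name in artist_names:
    if name not in artist_freq:
        artist_freq[name] = 1
    else:
        artist_freq[name] += 1

# Insert variables into a string with str.format().
top_artist = "Artist A"
summary = "There are {n} artworks by {name} in the sample.".format(
    n=artist_freq[top_artist], name=top_artist
)
print(summary)

# Format a number inside a string: thousands separator, two decimal places.
value = 1234567.891  # hypothetical number
formatted = "The value is {:,.2f}".format(value)
print(formatted)
```

The `{:,.2f}` format specification adds a comma as the thousands separator and rounds to two decimal places.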

You'll also learn how to calculate how old an artist was when they created their artwork, analyze and interpret the distribution of artist ages, create functions to summarize the data, and print summaries in an easy-to-read way.
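The sketch below shows roughly what that workflow could look like. It assumes each artwork can be reduced to a pair of years (artist birth year, year the artwork was created). The sample pairs and the helper names `age_to_decade` and `summarize` are hypothetical, and the lesson itself may take a different approach.

```python
# Hypothetical rows: (artist birth year, year the artwork was created).
sample = [(1880, 1912), (1901, 1935), (1925, 1948), (1890, 1931)]

def age_to_decade(age):
    """Convert an age such as 32 into a decade label such as '30s'."""
    return str(age // 10 * 10) + "s"

def summarize(labels):
    """Count how many times each label appears."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts

# Age of the artist when each artwork was created.
ages = [created - birth for birth, created in sample]

# Group ages into decades and count each decade.
decades = [age_to_decade(age) for age in ages]
decade_counts = summarize(decades)

# Print an aligned, easy-to-read summary.
for decade, count in sorted(decade_counts.items()):
    print("{:<5}{:>3}".format(decade, count))
```

Grouping ages into decades trades precision for readability: the distribution is easier to interpret with a few buckets than with dozens of individual ages.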

These techniques will be extremely valuable as you continue to learn how to be a data expert. You'll not only use them whenever you analyze data but also when you explore data before performing a more complex task, such as machine learning.


  • Learn basic data analysis techniques.
  • Learn how to summarize numeric data.
  • Learn how to format strings in Python.

Lesson Outline

1. Reading our MoMA Data Set
2. Calculating Artist Ages
3. Converting Ages to Decades
4. Summarizing the Decade Data
5. Inserting Variables Into Strings
6. Creating an Artist Frequency Table
7. Creating an Artist Summary Function
8. Formatting Numbers Inside Strings
9. Challenge: Summarizing Artwork Gender Data
10. Next Steps
11. Takeaways