In this first mission, you will learn to work with and prepare text data in Python. You will learn string manipulation techniques such as replacing substrings, capitalizing strings, and parsing numbers from complex strings. Techniques like these are critical for taking messy text data and converting it to a uniform format for easier analysis.
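As a quick preview, the three techniques named above can be sketched with Python's built-in string methods. The raw values below are invented for illustration and are not taken from the course data set; the variable names are likewise hypothetical.

```python
# Hypothetical messy values, made up for this sketch
raw_nationality = "(american)"
raw_date = "c. 1912"

# Replacing substrings: remove the surrounding parentheses
nationality = raw_nationality.replace("(", "").replace(")", "")

# Capitalizing strings: title-case the cleaned value
nationality = nationality.title()

# Parsing a number from a more complex string:
# drop the "c. " prefix, then convert what remains to an integer
year = int(raw_date.replace("c. ", ""))

print(nationality)  # American
print(year)         # 1912
```

Real data usually needs more cases handled than this, which is what the rest of the mission works through.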
In the Python Fundamentals course, we explored the basics of working with data using the Python programming language. In this course, you'll continue your Python journey with a focus on strings, object-oriented programming, dates and times, and many other concepts that are essential for a data scientist. In later courses, you'll build on this programming knowledge to learn data visualization, statistics, machine learning, and more.
In the first course, the data you worked with didn't have many quirks: all the values were in a consistent format. Data with a consistent format is often described as "clean." But in real life, not all the data we encounter is going to be clean; we often need to prepare it in a process called data cleaning (or sometimes, "data munging").
In this course, we'll introduce some basic data cleaning techniques. We cover more advanced techniques in our Data Cleaning in Python: Advanced course, but for now, we'll dig into basic data cleaning using a real-world data set describing the artwork in the Museum of Modern Art (MoMA).
1. Reading our MoMA Data Set
2. Calculating Artist Ages
3. Converting Ages to Decades
4. Summarizing the Decade Data
5. Inserting Variables Into Strings
6. Creating an Artist Frequency Table
7. Creating an Artist Summary Function
8. Formatting Numbers Inside Strings
9. Challenge: Summarizing Artwork Gender Data
10. Next Steps