MISSION 346

Working With Strings In Pandas

In the Transforming Data With Pandas lesson, we learned how to use the `apply()`, `map()`, and `applymap()` methods to apply a function to a series or dataframe. While we could certainly use these methods to clean strings in columns, pandas has many built-in vectorized string methods that can perform these tasks more quickly and with fewer keystrokes.
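To make the contrast concrete, here is a minimal sketch that performs the same cleaning step twice: once with `apply()` and a custom function, and once with vectorized string methods. The sample values and the names `currencies` and `last_word` are made up for illustration and are not taken from the lesson's data sets.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are illustrative only.
currencies = pd.Series(["Afghan afghani", "Albanian lek", np.nan])

# Option 1: apply() with a custom function.
# The function has to handle missing values itself.
def last_word(value):
    if pd.isna(value):
        return value
    return value.split()[-1]

with_apply = currencies.apply(last_word)

# Option 2: vectorized string methods chained through the .str accessor.
# Missing values are passed through without any extra code.
with_str = currencies.str.split().str.get(-1)

print(with_apply)
print(with_str)
```

Both versions keep the last word of each string, but the vectorized version needs no helper function and no explicit check for missing values.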

We introduced some of these methods already in the Pandas Fundamentals course when we learned the following data cleaning tasks:

  • Cleaning column names
  • Extracting values from the start of strings
  • Extracting values from the end of strings

In this lesson, we'll learn several other string cleaning tasks, such as:

  • Finding specific strings or substrings in columns
  • Extracting substrings from unstructured data
  • Removing strings or substrings from a series

As you learn these tasks, you'll also build intuition for how these string methods operate so that you can explore, on your own, methods we haven't explicitly covered.
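As a rough preview, the three tasks listed above might look like the sketch below. The series `notes` and its values are hypothetical, and the methods shown (`str.contains()`, `str.extract()`, and `str.replace()`) are one reasonable choice for each task rather than the only one.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
notes = pd.Series([
    "National accounts base year: 2002/03",
    "Source: national statistics office",
    "National accounts base year: 1996",
])

# Finding a substring: a boolean mask of the rows that mention "base year".
has_base_year = notes.str.contains("base year", regex=False)

# Extracting a substring: the first four-digit year in each string,
# using a regular expression with one capture group.
years = notes.str.extract(r"([1-2][0-9]{3})", expand=False)

# Removing a substring: drop a fixed label wherever it appears.
cleaned = notes.str.replace("National accounts base year: ", "", regex=False)

print(has_base_year)
print(years)
print(cleaned)
```

Rows that don't match the extraction pattern come back as missing values, which is worth keeping in mind when you aggregate the results.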

We'll again work with the 2015 World Happiness Report and additional economic data from the World Bank.

As you work through each concept, you’ll get to apply what you’ve learned from within your browser; there's no need to use your own machine to do the exercises. The Python environment inside of this course includes answer-checking to ensure you've fully mastered each concept before learning the next.

Objectives

  • Practice manipulating strings with pandas.
  • Learn how to use regular expressions.
  • Learn how to insert and update data in database tables.

Mission Outline

1. Introduction
2. Using Apply to Transform Strings
3. Vectorized String Methods Overview
4. Exploring Missing Values with Vectorized String Methods
5. Finding Specific Words in Strings
6. Finding Specific Words in Strings Continued
7. Extracting Substrings from a Series
8. Extracting Substrings from a Series Continued
9. Extracting All Matches of a Pattern from a Series
10. Extracting More Than One Group of Patterns from a Series
11. Challenge: Clean a String Column, Aggregate the Data, and Plot the Results
12. Next Steps
13. Takeaways


Course Info:

Beginner

The median completion time for this course is 7.2 hours.

This course requires a basic subscription and includes five missions and one guided project. It is the sixth course in both the Data Analyst in Python and Data Scientist in Python paths.
