Working With Strings In Pandas
In the Transforming Data With Pandas lesson, we learned how to use the `apply()`, `map()`, and `applymap()` methods to apply a function to the values of a series or dataframe. While we could certainly use these methods to clean strings in columns, pandas has many built-in vectorized string methods that can perform these tasks faster and with fewer keystrokes.
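To make the contrast concrete, here is a minimal sketch that cleans the same values both ways. The `countries` series is made-up sample data, not taken from the lesson's datasets.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
countries = pd.Series(["  norway", "DENMARK ", "iceland"])

# Element-wise approach: call a Python function on each value.
cleaned_apply = countries.apply(lambda s: s.strip().title())

# Vectorized approach: chain methods from the .str accessor.
cleaned_str = countries.str.strip().str.title()

print(cleaned_str.tolist())  # ['Norway', 'Denmark', 'Iceland']
```

Both approaches give the same result here. One practical difference: the `.str` methods pass missing values through as `NaN`, whereas the lambda above would raise an error on a value that is not a string.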
We introduced some of these methods already in the Pandas Fundamentals course when we learned the following data cleaning tasks:
- Cleaning column names
- Extracting values from the start of strings
- Extracting values from the end of strings
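As a quick refresher, the three tasks above might look like the following sketch. The dataframe, its column name, and the code format are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical dataframe with a messy column name, for illustration only.
df = pd.DataFrame({" Country Code ": ["NOR-2015", "DNK-2015"]})

# Cleaning column names: the column Index exposes the same .str accessor.
df.columns = (
    df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
)

# Extracting values from the start of strings: the first three characters.
df["iso"] = df["country_code"].str[:3]

# Extracting values from the end of strings: the last four characters.
df["year"] = df["country_code"].str[-4:].astype(int)

print(df)
```

Slicing through `.str[...]` works like ordinary Python string slicing, applied to every value in the column.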
In this lesson, we’ll learn several other string cleaning tasks, such as:
- Finding specific strings or substrings in columns
- Extracting substrings from unstructured data
- Removing strings or substrings from a series
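The three tasks above can be sketched with `Series.str.contains()`, `Series.str.extract()`, and `Series.str.replace()`. The `notes` series below is invented for illustration and is not part of the World Happiness Report or World Bank data. The regular expressions are one possible choice, not the only one.

```python
import pandas as pd

# Hypothetical free-text values, for illustration only.
notes = pd.Series([
    "GDP data (2015) for Norway",
    "Population data for Denmark",
    "GDP data (2014) for Iceland",
])

# Finding a substring: returns a boolean mask, one value per row.
has_gdp = notes.str.contains("GDP", regex=False)

# Extracting a substring: capture the first four-digit number.
# expand=False returns a Series; rows without a match become NaN.
year = notes.str.extract(r"(\d{4})", expand=False)

# Removing a substring: drop the parenthesized year and the space before it.
without_year = notes.str.replace(r"\s*\(\d{4}\)", "", regex=True)

print(has_gdp.tolist())       # [True, False, True]
print(without_year.tolist())
```

The boolean mask from `str.contains()` can be used directly to filter rows, for example `notes[has_gdp]`.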
As you learn these tasks, you’ll also build intuition for how these string methods operate, so that you can explore on your own the methods we haven’t explicitly covered.
We’ll again work with the 2015 World Happiness Report and additional economic data from the World Bank.
As you work through each concept, you’ll get to apply what you’ve learned from within your browser; there’s no need to use your own machine to do the exercises. The Python environment inside of this course includes answer-checking to ensure you’ve fully mastered each concept before learning the next.
- Practice manipulating strings with pandas.
- Learn how to use regular expressions.
- How to insert and update data in database tables.
- Using Apply to Transform Strings
- Vectorized String Methods Overview
- Exploring Missing Values with Vectorized String Methods
- Finding Specific Words in Strings
- Finding Specific Words in Strings Continued
- Extracting Substrings from a Series
- Extracting Substrings from a Series Continued
- Extracting All Matches of a Pattern from a Series
- Extracting More Than One Group of Patterns from a Series
- Challenge: Clean a String Column, Aggregate the Data, and Plot the Results
- Next Steps