Exploring Data With pandas: Fundamentals
In the last mission, you learned the basics of the pandas library. We explored the primary data structure in pandas, the DataFrame, and learned some of the ways pandas makes working with data easier than NumPy:
- Axis values in DataFrames can have string labels, not just numeric ones, which makes selecting data much easier.
- DataFrames can contain columns of different data types, including integer, float, and string.
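To make both points concrete, here is a minimal sketch with a tiny hand-built DataFrame. The company names, column names, and values are hypothetical toy data invented for illustration. They are not the real Fortune figures. The example uses string labels on the row axis and mixes integer, float, and string columns.

```python
import pandas as pd

# Hypothetical toy data: names and values are made up for illustration
companies = pd.DataFrame(
    {
        "rank": [1, 2, 3],                     # integer column
        "revenues": [500.0, 315.2, 267.5],     # float column
        "country": ["USA", "China", "Japan"],  # string column
    },
    index=["Company A", "Company B", "Company C"],  # string row labels
)

# Each column keeps its own dtype
print(companies.dtypes)

# Select a row by its string label rather than by a numeric position
print(companies.loc["Company B"])
```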
In this mission, you’ll learn another way pandas makes working with data easier: it has many built-in methods such as .describe(), indexers such as .iloc[] and .loc[], and functions for common exploration and analysis tasks. As you learn these, you’ll also explore how pandas uses many of the concepts we learned in the NumPy missions, including vectorized operations and boolean indexing.
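The following minimal sketch shows the tools named above working together on a small DataFrame. The company labels, the "revenues" and "profits" column names, and all values are hypothetical toy data, assumed for illustration only.

```python
import pandas as pd

# Hypothetical toy data, not the real Fortune Global 500 figures
f500 = pd.DataFrame(
    {
        "revenues": [500.0, 315.2, 267.5, 120.0],
        "profits": [13.6, 9.6, -2.1, 4.0],
    },
    index=["Company A", "Company B", "Company C", "Company D"],
)

# Summary statistics (count, mean, std, min, quartiles, max) for each numeric column
summary = f500.describe()
print(summary)

# .iloc[] selects by integer position; .loc[] selects by label
first_row = f500.iloc[0]
b_profits = f500.loc["Company B", "profits"]

# Vectorized operation: computes the margin for every row at once, with no explicit loop
margin = f500["profits"] / f500["revenues"] * 100
print(margin.round(2))

# Boolean indexing: keep only the rows where the condition is True
losses = f500[f500["profits"] < 0]
print(losses)
```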
As in the previous mission, you’ll be working with a data set from Fortune magazine’s 2017 Global 500 list, which ranks the top 500 corporations worldwide by revenue. At the end of the mission, you’ll calculate a specific statistic for each of the three most common countries in the data set.
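One way to find the three most common countries is `Series.value_counts()`, which returns counts sorted in descending order. The sketch below assumes a "country" column and a "revenues" column, uses hypothetical toy rows in place of the real data set, and uses mean revenue purely as a stand-in. The statistic the mission actually asks for is not specified here.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Global 500 data set
f500 = pd.DataFrame(
    {
        "country": ["USA", "China", "USA", "Japan", "China", "USA", "Germany", "Japan"],
        "revenues": [500.0, 315.2, 208.0, 267.5, 262.6, 205.0, 240.3, 190.0],
    }
)

# value_counts() sorts counts in descending order, so head(3) gives the top three
top3 = f500["country"].value_counts().head(3).index

# Illustrative statistic only: mean revenue for each of the top three countries
means = {}
for country in top3:
    # Boolean mask selects this country's rows; .loc picks the revenues column
    means[country] = f500.loc[f500["country"] == country, "revenues"].mean()
print(means)
```

When two countries tie on count (China and Japan in this toy data), their relative order in the result is not something to rely on, but both still land in the top three here.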
As with every mission at Dataquest, you’ll be given an opportunity to practice each concept using our code editor with built-in answer checking to ensure that you’ve mastered a concept before moving on to the next.
Objectives
- How to use common methods for exploring data.
- How to assign data in pandas using index labels.
- How to use pandas to analyze data.
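As a preview of the second objective, here is a minimal sketch of assigning values through index labels with `.loc[]`. The labels, the "ceo" and "sector" columns, and their values are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical toy data; labels and values are made up for illustration
f500 = pd.DataFrame(
    {
        "ceo": ["Alice", "Bob", "Chen"],
        "sector": ["Financials", "Energy", "Technology"],
    },
    index=["Company A", "Company B", "Company C"],
)

# Assign a single value using a row label and a column label
f500.loc["Company B", "ceo"] = "Dana"

# Assign the same value to several cells at once with a list of row labels
f500.loc[["Company A", "Company C"], "sector"] = "Retail"
print(f500)
```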
Lesson Outline
- Introduction to the Data
- Vectorized Operations
- Series Data Exploration Methods
- Series Describe Method
- Method Chaining
- DataFrame Exploration Methods
- DataFrame Describe Method
- Assignment with pandas
- Using Boolean Indexing with pandas Objects
- Using Boolean Arrays to Assign Values
- Creating New Columns
- Challenge: Top Performers by Country
- Next Steps
- Takeaways
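Several outline topics (method chaining, creating new columns, and using boolean arrays to assign values) can be previewed in one short sketch. The column names and values below are hypothetical toy data, and the "status" labels are invented for illustration rather than taken from the mission.

```python
import pandas as pd

# Hypothetical toy data, not real Global 500 figures
f500 = pd.DataFrame(
    {
        "revenues": [500.0, 315.2, 267.5, 120.0],
        "profits": [13.6, 9.6, -2.1, 4.0],
        "country": ["USA", "China", "Japan", "USA"],
    },
    index=["Company A", "Company B", "Company C", "Company D"],
)

# Method chaining: value_counts() returns a Series, so head() can be called on it directly
top_country = f500["country"].value_counts().head(1)

# Creating a new column from a vectorized calculation
f500["profit_margin"] = f500["profits"] / f500["revenues"]

# Assigning with a boolean array: only rows where the mask is True are updated
f500["status"] = "profitable"
loss_mask = f500["profits"] < 0
f500.loc[loss_mask, "status"] = "loss"
print(f500)
```

Passing the mask to `.loc[]` together with a column label updates the original DataFrame in place, which avoids the chained-indexing pattern `f500[loss_mask]["status"] = ...` that would modify a temporary copy instead.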