MISSION 343

Data Aggregation

So far, we've learned how to use the pandas library and how to create visualizations with data sets that didn't require much cleanup. However, most real-world data sets require extensive cleaning and manipulation before we can extract any meaningful insights. In fact, a survey reported by Forbes found that data scientists spend about 60% of their time cleaning and organizing data, so it's critical to be able to manipulate data quickly and efficiently.

In this lesson, we'll learn how to aggregate data with pandas. We'll start by aggregating data with loops and then move on to GroupBy objects. After that, we'll compute multiple and custom aggregations with the `agg()` method and perform aggregations using pivot tables.
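To give a feel for the difference between the loop approach and the GroupBy approach, here is a minimal sketch. The DataFrame is a hypothetical toy example: the country labels, region names, and scores are invented for illustration and are not taken from the World Happiness Report, and the column names are assumptions about how such a data set might be laid out.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
happiness = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Region": ["North", "North", "South", "South"],
    "Happiness Score": [7.0, 6.0, 5.0, 4.0],
})

# Approach 1: loop over each unique region, select its rows,
# and compute the mean score by hand.
mean_by_region_loop = {}
for region in happiness["Region"].unique():
    region_rows = happiness[happiness["Region"] == region]
    mean_by_region_loop[region] = region_rows["Happiness Score"].mean()

# Approach 2: a GroupBy object splits the rows by region, applies the
# mean to each group, and combines the results into one Series.
mean_by_region_groupby = happiness.groupby("Region")["Happiness Score"].mean()

print(mean_by_region_loop)
print(mean_by_region_groupby)
```

Both approaches produce the same per-region means. The GroupBy version is shorter and leaves the splitting and combining to pandas.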

Throughout this course, we'll work with the World Happiness Report, an annual report created by the UN Sustainable Development Solutions Network with the intent of guiding policy. The report assigns each country a happiness score based on the answers to a poll question that asks respondents to rate their lives on a scale of 0 to 10.

It also includes estimates of factors that may contribute to each country's happiness, including economic production, social support, life expectancy, freedom, absence of corruption, and generosity, to provide context for the score. Although these factors aren't actually used in the calculation of the happiness score, they can help illustrate why a country received a certain score.

Objectives

  • Learn different techniques for aggregating data.
  • Build intuition around the groupby operation.

Mission Outline

1. Introduction
2. Introduction to the Data
3. Using Loops to Aggregate Data
4. The GroupBy Operation
5. Creating GroupBy Objects
6. Exploring GroupBy Objects
7. Common Aggregation Methods with GroupBy
8. Aggregating Specific Columns with GroupBy
9. Introduction to the `agg()` Method
10. Computing Multiple and Custom Aggregations with the `agg()` Method
11. Aggregation with Pivot Tables
12. Aggregating Multiple Columns and Functions with Pivot Tables
13. Next Steps
14. Takeaways
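As a rough preview of the `agg()` and pivot table steps in the outline, the sketch below computes several aggregations at once. The data is again a hypothetical toy example with invented values and assumed column names, and `score_range` is a helper written only for this illustration.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
happiness = pd.DataFrame({
    "Region": ["North", "North", "South", "South"],
    "Happiness Score": [7.0, 6.0, 5.0, 4.0],
    "Generosity": [0.4, 0.2, 0.3, 0.1],
})

def score_range(group):
    """Custom aggregation (hypothetical helper): spread between max and min."""
    return group.max() - group.min()

# agg() accepts a list mixing built-in aggregation names and custom functions.
grouped = happiness.groupby("Region")["Happiness Score"]
summary = grouped.agg(["mean", "max", score_range])

# pivot_table() can aggregate several columns with several functions;
# the result has one row per region and a two-level column index.
pivot = happiness.pivot_table(
    values=["Happiness Score", "Generosity"],
    index="Region",
    aggfunc=["mean", "max"],
)

print(summary)
print(pivot)
```

Here `summary` has one column per aggregation, with the custom function's column named after the function itself.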

Course Info:

Beginner

The median completion time for this course is 7.2 hours.

This course requires a basic subscription and includes five missions and one guided project. It is the sixth course in both the Data Analyst in Python path and the Data Scientist in Python path.
