MISSION 164

Processing Dataframes in Chunks

In the last lesson, we explored how to reduce a pandas dataframe's memory footprint by selecting the correct column types. 

But we need a different strategy for datasets that don't fit into memory even after we've optimized types and filtered columns. Instead of trying to load the full dataset into memory, we can load and process it in chunks.
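As a minimal sketch of the idea, passing `chunksize` to `pandas.read_csv` makes it return an iterator of DataFrames rather than a single DataFrame, so only one chunk needs to be in memory at a time. The CSV below is a small made-up stand-in for the MoMA exhibitions file. Its column names and rows are assumptions for illustration only. In practice you would pass a file path instead of `io.StringIO`.

```python
import io
import pandas as pd

# Made-up sample rows standing in for the real exhibitions CSV (assumed columns).
csv_text = """ExhibitionID,ExhibitionTitle,ExhibitionBeginDate
1,Exhibition A,1929-11-07
2,Exhibition B,1929-12-12
3,Exhibition C,1930-01-18
4,Exhibition D,1930-03-12
5,Exhibition E,1930-05-04
"""

# With chunksize set, read_csv yields DataFrames of at most 2 rows each.
reader = pd.read_csv(io.StringIO(csv_text), chunksize=2)

row_counts = []
for chunk in reader:
    # Each chunk is an ordinary DataFrame, so any pandas operation works on it.
    row_counts.append(len(chunk))

# Combine the per-chunk results into a single answer.
total_rows = sum(row_counts)
print(row_counts)  # [2, 2, 1]
print(total_rows)  # 5
```

The pattern is always the same: compute a partial result per chunk, keep only that small result, and combine the partial results at the end.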

In this lesson, you'll learn how to break a problem down into dataframe chunks and when processing large datasets in chunks is beneficial.

In addition to processing dataframes in chunks, you'll learn about GroupBy objects: how to use them and how to inspect the groups they contain.
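One common way these two ideas fit together, sketched here under assumptions, is to count values within each chunk, concatenate the partial counts, and then use a GroupBy object to sum them into overall totals. The `ExhibitionRole` column and its rows below are hypothetical sample data, not taken from the lesson.

```python
import io
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
csv_text = """ExhibitionID,ExhibitionRole
1,Curator
1,Artist
2,Artist
2,Artist
3,Curator
3,Artist
"""

reader = pd.read_csv(io.StringIO(csv_text), chunksize=4)

# Count role values within each chunk; each result is a small Series.
partial_counts = [chunk["ExhibitionRole"].value_counts() for chunk in reader]

# Stack the partial counts; the same role label can now appear more than once.
combined = pd.concat(partial_counts)

# Group on the index labels (level 0) to collect each role's partial counts.
grouped = combined.groupby(level=0)

# Inspect the groups: each iteration yields a group name and its partial counts.
for role, counts in grouped:
    print(role, counts.tolist())

# Sum each group's partial counts to get totals across all chunks.
role_totals = grouped.sum()
print(role_totals)
```

Because each chunk contributes only a handful of counts, the concatenated Series stays small even when the underlying file is very large.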

To practice processing dataframes in chunks, you'll continue working with data on the Museum of Modern Art's exhibitions from 1929 to 1989. By the end of this lesson, you should understand how to process dataframes in chunks and when that approach is advantageous.

As you work through this lesson, you'll apply what you've learned directly in your browser, so there's no need to use your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next.

Objectives

  • Learn how to use dataframe chunks.
  • Learn how to increase processing speed in pandas.

Mission Outline

1. Processing Chunks
2. Processing Chunks
3. Counting Across Chunks
4. Batch Processing
5. Optimizing Performance
6. Counting Unique Values
7. Combining Chunks Using GroupBy
8. Working With Intermediate Dataframes
9. Next Steps
10. Takeaways


Course Info:

Intermediate

The median completion time for this course is 5.3 hours.

This course requires a premium subscription and includes three missions and two guided projects. It is the third course in the Data Engineer path.

