MISSION 165

Guided Project: Practice Optimizing Dataframes and Processing in Chunks

In this guided project, we'll practice processing large datasets and optimizing a dataframe's memory usage. We'll be working with financial lending data from Lending Club, a marketplace for personal loans that matches borrowers with investors.

In this project, you'll reduce the memory load of working with Lending Club's large dataset by optimizing how the data is processed. This includes shrinking the dataframe's memory footprint and processing the data in chunks, as covered in previous lessons.
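As a rough illustration of those two techniques, here is a minimal sketch rather than the project's actual solution. The sample CSV, its column names, the helper names `chunk_memory_mb` and `optimize_dtypes`, and the 50% uniqueness threshold are all assumptions made for the example; only `pd.read_csv(..., chunksize=...)`, `memory_usage(deep=True)`, `pd.to_numeric(..., downcast=...)`, and `astype("category")` are standard pandas calls.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Lending Club CSV; these columns are illustrative only.
SAMPLE_CSV = """loan_amnt,term,grade
5000,36 months,B
2400,60 months,C
10000,36 months,B
3000,36 months,A
7500,60 months,C
1200,36 months,A
"""


def chunk_memory_mb(source, chunksize):
    """Read a CSV in chunks and return each chunk's deep memory usage in MB."""
    usage = []
    for chunk in pd.read_csv(source, chunksize=chunksize):
        usage.append(chunk.memory_usage(deep=True).sum() / 2**20)
    return usage


def optimize_dtypes(df, max_unique_ratio=0.5):
    """Return a copy of df with smaller dtypes where that is safe.

    Integer and float columns are downcast to the smallest subtype that
    holds their values. Non-numeric columns become categoricals when fewer
    than max_unique_ratio of their values are unique (an assumed threshold).
    """
    out = df.copy()
    if len(out) == 0:
        return out
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif not pd.api.types.is_numeric_dtype(out[col]):
            if out[col].nunique() / len(out) < max_unique_ratio:
                out[col] = out[col].astype("category")
    return out


# Usage: inspect per-chunk memory, then compare dtypes before and after.
print(chunk_memory_mb(io.StringIO(SAMPLE_CSV), chunksize=2))
sample = pd.read_csv(io.StringIO(SAMPLE_CSV))
print(sample.dtypes)
print(optimize_dtypes(sample).dtypes)
```

On a file this small the savings are negligible, and a categorical column can even cost more than the original strings. The approach only pays off on large files where a column repeats a small set of values many times.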

These projects are meant to be challenging to better prepare you for the real world, so don't be discouraged if you have to refer back to previous missions. If you haven't worked with Jupyter Notebook before or need a refresher, we recommend completing our Jupyter Notebook Guided Project before continuing.

As with all guided projects, we encourage you to experiment and extend your project, taking it in unique directions after you finish the guided tasks to make it a more compelling addition to your portfolio!

Objectives

  • Learn to apply your dataframe chunking skills on a new dataset.

Mission Outline

1. Introduction
2. Exploring the Data in Chunks
3. Optimizing String Columns
4. Optimizing Numeric Columns
5. Next Steps

Course Info:

Intermediate

The median completion time for this course is 5.3 hours.

This course requires a premium subscription and includes three missions and two guided projects. It is the third course in the Data Engineer path.
