Processing Large Datasets in Pandas

Learn how to work with medium-sized datasets by optimizing your pandas workflow, processing data in batches, and augmenting pandas with SQLite.

This course will teach you how to reduce a DataFrame's memory usage, work through data in chunks, and combine pandas with SQLite when a dataset is too large to handle comfortably in memory.

By the end of this course, you'll be able to:

  • Reduce the memory footprint of a pandas DataFrame.
  • Process large DataFrames in chunks and with SQLite.

Course Info:


The average completion time for this course is 10 hours.

This course requires a premium subscription and includes 2 free missions, 2 paid missions, and 1 guided project. It is the 3rd course in the Data Engineer path.


Learn How to Process Large Datasets in Pandas

Optimizing a DataFrame Memory Footprint

Learn how to reduce a DataFrame's memory footprint by selecting the correct data types.
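
As a rough illustration of what choosing data types can look like, the sketch below builds a small made-up DataFrame and converts its columns to smaller types. The column names and values are assumptions for demonstration only, not course material, and note that downcasting floats to 32 bits trades away some precision.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "year": np.arange(100_000, dtype="int64") % 50 + 1970,
    "price": np.linspace(0.0, 1000.0, 100_000),
    "category": np.tile(["seed", "angel", "venture", "other"], 25_000),
})

# deep=True also counts the memory used by the string values themselves.
before = df.memory_usage(deep=True).sum()

optimized = df.copy()
# Let pandas pick the smallest integer subtype that can hold every value.
optimized["year"] = pd.to_numeric(optimized["year"], downcast="integer")
# Downcast 64-bit floats to 32-bit where possible (this loses some precision).
optimized["price"] = pd.to_numeric(optimized["price"], downcast="float")
# A column with few distinct strings is stored more compactly as a categorical.
optimized["category"] = optimized["category"].astype("category")

after = optimized.memory_usage(deep=True).sum()
print(f"before: {before / 1024**2:.2f} MB, after: {after / 1024**2:.2f} MB")
```

Comparing `df.dtypes` with `optimized.dtypes` shows which columns changed type.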

Processing DataFrames in Chunks

Learn how to break a problem down into DataFrame chunks.
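
One way to picture chunked processing is the sketch below: read a CSV a few rows at a time, compute a partial result per chunk, then combine the partial results. The in-memory CSV, the column names, and the chunk size are all assumptions chosen to keep the example self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV held in memory; in practice this would be a file path.
csv_text = "funding_round,amount\n" + "\n".join(
    f"{'seed' if i % 2 == 0 else 'venture'},{i}" for i in range(1, 1001)
)

partial_totals = []
# chunksize makes read_csv yield DataFrames of at most 250 rows each,
# so only one chunk needs to be in memory at a time.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    partial_totals.append(chunk.groupby("funding_round")["amount"].sum())

# Combine the per-chunk sums into one total per funding round.
combined = pd.concat(partial_totals).groupby(level=0).sum()
print(combined)
```

This pattern works for aggregations that can be merged from partial results, such as sums and counts; something like a median needs a different approach.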

Practice Optimizing DataFrames and Processing in Chunks

Practice optimizing DataFrame types and working in chunks.

Augmenting Pandas with SQLite

Learn how to use SQLite alongside pandas to store and query data that is too large to process comfortably in memory.
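
A minimal sketch of pairing pandas with SQLite might look like the following: write a DataFrame into a SQLite table, let the database do the aggregation, and read only the result back into pandas. The table name, column names, and values are hypothetical, and an in-memory database stands in for a file on disk.

```python
import sqlite3

import pandas as pd

# An in-memory database keeps the sketch self-contained; a file path
# would be used instead to persist data on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical data; with a large file, each chunk from read_csv could be
# appended to the same table in turn.
deals = pd.DataFrame({
    "company": ["a", "b", "c", "d"],
    "funding_round": ["seed", "seed", "venture", "venture"],
    "amount": [100, 200, 1000, 3000],
})
deals.to_sql("deals", conn, index=False, if_exists="append")

# SQLite performs the aggregation; pandas only receives the small result.
query = """
    SELECT funding_round, SUM(amount) AS total
    FROM deals
    GROUP BY funding_round
    ORDER BY funding_round
"""
result = pd.read_sql_query(query, conn)
conn.close()
print(result)
```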

Analyzing Startup Fundraising Deals from Crunchbase

Practice analyzing data using the pandas SQLite workflow.