In our Processing Large Datasets in Pandas course, you’ll learn how to work with medium-sized datasets in Python by optimizing your pandas workflow, processing data in batches, and augmenting pandas with SQLite.

In this course, you'll learn to reduce the memory footprint of a pandas DataFrame while working with data from the Museum of Modern Art. You'll also learn how to split a DataFrame into chunks and process them to speed up your pandas workflow. Then you'll practice optimizing column types and working with chunks while exploring data from Lending Club.

Once you can optimize DataFrames and work with chunks, you'll learn how to augment pandas with SQLite to combine the strengths of both tools. We'll cover when to store data on disk instead of in memory, and how to run SQL queries from pandas.

At the end of the course, you'll complete a project that applies the pandas and SQLite workflow to a real-world problem: analyzing startup fundraising deals with data from Crunchbase. The project brings together the skills you learned in this course, and it doubles as a portfolio piece you can show future employers to demonstrate your data engineering and SQLite skills.

By the end of this course, you'll be able to:

  • Reduce the memory footprint of a pandas DataFrame.
  • Process large DataFrames in chunks and with SQLite.

Learn How to Process Large Datasets in Pandas

Optimizing a DataFrame's Memory Footprint

Learn how to reduce a DataFrame's memory footprint by selecting the correct data types.
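As a rough illustration of what selecting the correct data types can look like, here is a minimal sketch on made-up data. The column names and values are hypothetical, not taken from the course datasets. It downcasts numeric columns with `pd.to_numeric` and converts a low-cardinality text column to `category`, then compares memory usage before and after.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
n = 10_000
df = pd.DataFrame({
    "year": np.arange(n) % 100 + 1900,  # small integers, stored as int64 by default
    "nationality": np.where(np.arange(n) % 3 == 0, "American", "French"),  # few distinct strings
    "width_cm": np.linspace(1.0, 500.0, n),  # float64 by default
})

def memory_mb(frame: pd.DataFrame) -> float:
    # deep=True includes the bytes used by the string values themselves
    return frame.memory_usage(deep=True).sum() / 1024 ** 2

before = memory_mb(df)

optimized = df.copy()
# Downcast numeric columns to the smallest subtype that can hold their values
optimized["year"] = pd.to_numeric(optimized["year"], downcast="integer")
optimized["width_cm"] = pd.to_numeric(optimized["width_cm"], downcast="float")
# Low-cardinality text columns are much cheaper as categoricals
optimized["nationality"] = optimized["nationality"].astype("category")

after = memory_mb(optimized)
print(f"{before:.2f} MB -> {after:.2f} MB")
```

The exact savings depend on the data; categoricals help most when a column has few distinct values relative to its length.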

Processing DataFrames in Chunks

Learn how to break a problem down into DataFrame chunks.
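One common pattern for working in chunks, sketched here under assumptions: pass `chunksize` to `pd.read_csv` so it yields DataFrames of at most that many rows, compute a partial result for each chunk, then combine the partial results. The CSV below is generated in memory, and its columns (`loan_status`, `loan_amnt`) are hypothetical stand-ins for a file too large to load at once.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a large file on disk.
csv_text = "loan_status,loan_amnt\n" + "\n".join(
    f"{'Fully Paid' if i % 4 else 'Charged Off'},{1000 + i}" for i in range(1_000)
)

status_counts = []
total_amount = 0
# With chunksize, read_csv returns an iterator of DataFrames (up to 250 rows each here)
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    status_counts.append(chunk["loan_status"].value_counts())
    total_amount += chunk["loan_amnt"].sum()

# Merge the per-chunk counts into one final result
combined = pd.concat(status_counts).groupby(level=0).sum()
print(combined)
print(total_amount)
```

This works for results that can be merged, such as counts and sums; a median, for example, cannot be computed by combining per-chunk medians.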

Practice Optimizing DataFrames and Processing in Chunks

Practice optimizing DataFrame types and working in chunks.

Augmenting Pandas with SQLite

Learn how to combine pandas with SQLite, storing data on disk and running SQL queries from pandas.
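A minimal sketch of augmenting pandas with SQLite, using Python's built-in `sqlite3` module together with `DataFrame.to_sql` and `pd.read_sql_query`. The table name, columns, and figures are hypothetical. The database is kept in memory so the snippet is self-contained; to keep the data on disk, you would pass a file path to `sqlite3.connect` instead.

```python
import sqlite3
import pandas as pd

# Hypothetical funding-round data; names and amounts are illustrative only.
deals = pd.DataFrame({
    "company": ["A", "B", "C", "D"],
    "market": ["Software", "Biotech", "Software", "Games"],
    "raised_amount_usd": [5_000_000, 12_000_000, 3_000_000, 1_500_000],
})

# ":memory:" keeps the example self-contained; a file path would store the table on disk
conn = sqlite3.connect(":memory:")
deals.to_sql("investments", conn, index=False)

# Let SQLite do the filtering and aggregation; only the result comes back into pandas
query = """
SELECT market, SUM(raised_amount_usd) AS total_raised
FROM investments
GROUP BY market
ORDER BY total_raised DESC
"""
totals = pd.read_sql_query(query, conn)
conn.close()
print(totals)
```

Because the aggregation runs inside SQLite, only the small summary table is loaded into memory, which is the main appeal of this workflow for data that does not fit comfortably in RAM.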

Analyzing Startup Fundraising Deals from Crunchbase

Practice analyzing data using the pandas SQLite workflow.