MISSION 166

Augmenting Pandas With SQLite

So far on the Data Engineering path, we've explored a few different ways to work with medium-sized data sets in pandas. First, we learned how to reduce a dataframe's memory footprint by selecting the optimal data type for each column. Then, we discussed how to read a data set in dataframe chunks and adapt our processing logic so that it works one chunk at a time.
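As a quick refresher on those two techniques, here is a minimal sketch. It assumes a tiny hypothetical CSV whose column names only imitate the MoMA file. It downcasts integer columns with `pd.to_numeric(..., downcast="integer")`, stores a low-cardinality text column as `category`, and then aggregates the same file chunk by chunk with `read_csv(chunksize=...)`.

```python
import io

import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the real schema.
csv_text = """ExhibitionID,ExhibitionRole,ExhibitionNumber
1,Artist,5
1,Curator,5
2,Artist,12
2,Artist,12
3,Curator,20
"""

df = pd.read_csv(io.StringIO(csv_text))

# Downcast integer columns to the smallest integer type that can hold their values.
for col in ["ExhibitionID", "ExhibitionNumber"]:
    df[col] = pd.to_numeric(df[col], downcast="integer")

# A text column with few distinct values is cheaper to store as a category.
df["ExhibitionRole"] = df["ExhibitionRole"].astype("category")
print(df.dtypes)

# Chunked processing: only `chunksize` rows are held in memory at a time.
counts = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    counts.append(chunk["ExhibitionRole"].value_counts())

# Combine the per-chunk counts into one total per role.
role_counts = pd.concat(counts).groupby(level=0).sum()
print(role_counts)
```

Collecting the partial results in a list and combining them once at the end avoids repeatedly growing a dataframe inside the loop.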

In this mission, we'll explore how to use pandas and SQLite together. If you're unfamiliar with SQLite, we cover its basics in our lesson on using PostgreSQL.
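The basic round trip between the two tools can be sketched as follows, assuming hypothetical sample data and a table name (`exhibitions`) chosen only for illustration. `DataFrame.to_sql` writes a dataframe into a SQLite table through a standard-library `sqlite3` connection, and `pd.read_sql_query` runs a SQL query and returns the result as a new dataframe.

```python
import sqlite3

import pandas as pd

# Hypothetical sample rows; the column names and values are assumptions.
exhibitions = pd.DataFrame(
    {
        "ExhibitionID": [1, 1, 2],
        "ExhibitionRole": ["Artist", "Curator", "Artist"],
        "DisplayName": ["Artist A", "Curator A", "Artist B"],
    }
)

# ":memory:" keeps the demo self-contained; pass a file path to keep the table on disk.
conn = sqlite3.connect(":memory:")

# Write the dataframe to a SQLite table, replacing the table if it already exists.
exhibitions.to_sql("exhibitions", conn, index=False, if_exists="replace")

# Run a parameterized SQL query and get the result back as a dataframe.
artists = pd.read_sql_query(
    "SELECT ExhibitionID, DisplayName FROM exhibitions "
    "WHERE ExhibitionRole = ? ORDER BY ExhibitionID",
    conn,
    params=("Artist",),
)
print(artists)
conn.close()
```

Passing the filter value through `params` (with SQLite's `?` placeholder) instead of formatting it into the query string lets the database driver handle quoting.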

We'll continue to work with the data set on MoMA exhibitions from the first two missions in this course on processing large data sets. You can read more about this data set, and download it, at data.world.

As you work through this lesson on using pandas and SQLite, you'll apply what you've learned directly in your browser, so there's no need to set up your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've mastered each concept before moving on to the next one.

Objectives

  • Learn how to use pandas with SQLite.
  • Learn when to store data on disk instead of in memory.
  • Learn how to run SQL queries using pandas.
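To make the disk-versus-memory trade-off concrete, here is a minimal sketch under stated assumptions: the CSV text, the column names, and the database file name `moma.db` are all hypothetical. The file is streamed into an on-disk SQLite database a few rows at a time, and the aggregation then runs inside SQLite, so only the small summary table is loaded into pandas.

```python
import io
import os
import sqlite3
import tempfile

import pandas as pd

# Hypothetical CSV contents; the column names are assumptions.
csv_text = (
    "ExhibitionID,ExhibitionRole\n"
    "1,Artist\n1,Curator\n2,Artist\n2,Artist\n2,Curator\n3,Artist\n"
)

with tempfile.TemporaryDirectory() as tmpdir:
    db_path = os.path.join(tmpdir, "moma.db")  # hypothetical file name
    conn = sqlite3.connect(db_path)

    # Stream the CSV into the on-disk table; only one chunk is in memory at a time.
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
        chunk.to_sql("exhibitions", conn, index=False, if_exists="append")

    # Let SQLite do the grouping; only the small result is loaded into pandas.
    per_exhibition = pd.read_sql_query(
        """
        SELECT ExhibitionID, COUNT(*) AS n_rows
        FROM exhibitions
        GROUP BY ExhibitionID
        ORDER BY ExhibitionID
        """,
        conn,
    )
    # Close before the temporary directory (and the database file) is deleted.
    conn.close()

print(per_exhibition)
```

With `if_exists="append"`, the first chunk creates the table and later chunks add rows to it.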

Mission Outline

1. Augmenting Pandas With SQLite
2. Pandas Types vs. SQLite Types
3. Setting Appropriate Types
4. Computing Primarily in SQL
5. Computing Primarily in Pandas
6. Reading in SQL Results Using Chunks
7. Next Steps
8. Takeaways
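Several of the outline topics (SQLite types versus pandas types, setting appropriate types, and reading SQL results in chunks) can be previewed in one short sketch. The data and column names are hypothetical. The sketch relies on documented behavior: with a plain `sqlite3` connection, `to_sql`'s `dtype` argument takes SQLite type names as strings; `PRAGMA table_info` reports the declared column types; and `pd.read_sql_query(..., chunksize=...)` returns an iterator of dataframes. Because SQLite records only types such as `INTEGER` and `TEXT`, compact pandas types like `int8` or `category` must be restored after reading.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data using compact pandas types.
exhibitions = pd.DataFrame(
    {
        "ExhibitionID": pd.Series([1, 2, 3, 4, 5], dtype="int8"),
        "ExhibitionRole": ["Artist", "Curator", "Artist", "Artist", "Curator"],
    }
)

conn = sqlite3.connect(":memory:")

# With a sqlite3 connection, `dtype` maps column names to SQLite type names.
exhibitions.to_sql(
    "exhibitions",
    conn,
    index=False,
    dtype={"ExhibitionID": "INTEGER", "ExhibitionRole": "TEXT"},
)

# Inspect the declared SQLite types: the int8 detail is not stored.
schema = pd.read_sql_query("PRAGMA table_info(exhibitions)", conn)
print(schema[["name", "type"]])

# Read the query result a few rows at a time and restore compact types per chunk.
chunks = []
for chunk in pd.read_sql_query(
    "SELECT * FROM exhibitions ORDER BY ExhibitionID", conn, chunksize=2
):
    chunk["ExhibitionID"] = pd.to_numeric(chunk["ExhibitionID"], downcast="integer")
    chunks.append(chunk)

restored = pd.concat(chunks, ignore_index=True)
restored["ExhibitionRole"] = restored["ExhibitionRole"].astype("category")
conn.close()

print(restored.dtypes)
```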


Course Info:

Intermediate

The median completion time for this course is 5.3 hours.

This course requires a premium subscription and includes three missions and two guided projects. It is the third course in the Data Engineer path.
