So far on the Data Engineering path, we've explored a few different ways to work with medium-sized data sets in pandas. First, we learned how to reduce a dataframe's memory footprint by selecting the optimal data type for each column. Then, we discussed how to process a dataframe in chunks and adapt our processing logic accordingly.

In this lesson, we'll explore how to use pandas and SQLite together. If you're unfamiliar with SQLite, we cover the basics of SQLite in our lesson on using PostgreSQL.

We'll continue to work with the data set on MoMA exhibitions from the first two lessons of this course on processing large data sets. You can read more about this data set, and download it, at data.world.

As you work through this lesson on using pandas and SQLite, you'll get to apply what you've learned from within your browser, so there's no need to use your own machine for the exercises. The Python environment in this course includes answer checking, so you can make sure you've fully mastered each concept before moving on to the next one.

Objectives

  • Learn how to use pandas with SQLite.
  • Learn when to store data on disk instead of in memory.
  • Learn how to run SQL queries using pandas.
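As a rough illustration of these objectives, here is a minimal sketch using Python's built-in `sqlite3` module with pandas' `DataFrame.to_sql` and `pd.read_sql`. The table name `exhibitions` and the three toy rows are hypothetical stand-ins, not the real MoMA data; the column names are assumed for illustration. An in-memory database keeps the example self-contained, but passing a file path to `sqlite3.connect` stores the table on disk instead.

```python
import sqlite3

import pandas as pd

# Toy rows standing in for the MoMA exhibitions data (hypothetical values).
df = pd.DataFrame({
    "ExhibitionID": [1, 2, 3],
    "ExhibitionTitle": ["A", "B", "C"],
    "ExhibitionRole": ["Artist", "Curator", "Artist"],
})

# ":memory:" keeps this demo self-contained; a path such as "moma.db"
# would keep the table on disk rather than in RAM.
conn = sqlite3.connect(":memory:")

# Write the dataframe into a SQLite table (hypothetical name "exhibitions").
df.to_sql("exhibitions", conn, index=False)

# Let SQLite do the grouping, then load only the small result into pandas.
query = """
SELECT ExhibitionRole, COUNT(*) AS n
FROM exhibitions
GROUP BY ExhibitionRole
ORDER BY ExhibitionRole
"""
counts = pd.read_sql(query, conn)
print(counts)

conn.close()
```

The design choice here is to push the aggregation into SQL so that only the summarized rows, not the whole table, need to fit in memory.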

Lesson Outline

1. Augmenting Pandas With SQLite
2. Pandas Types vs. SQLite Types
3. Setting Appropriate Types
4. Computing Primarily in SQL
5. Computing Primarily in Pandas
6. Reading in SQL Results Using Chunks
7. Next Steps
8. Takeaways
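To give a feel for the type-setting and chunked-reading topics in this outline, the sketch below declares SQLite column types explicitly through the `dtype` argument of `DataFrame.to_sql`, checks them with SQLite's `PRAGMA table_info`, and then reads a query result back with `pd.read_sql(..., chunksize=...)`, which yields one dataframe per chunk. The table, its columns, and the chunk size of 4 are all hypothetical choices for illustration, not the lesson's own code.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# Hypothetical 10-row table; dtype maps each column to an explicit SQLite type.
pd.DataFrame({"id": range(10), "role": ["Artist", "Curator"] * 5}).to_sql(
    "exhibitions", conn, index=False, dtype={"id": "INTEGER", "role": "TEXT"}
)

# Confirm which types SQLite recorded for each column.
schema = pd.read_sql("PRAGMA table_info(exhibitions)", conn)
print(schema[["name", "type"]])

# With chunksize set, read_sql returns an iterator of dataframes, so only
# one chunk (here at most 4 rows) is held in memory at a time.
partial_counts = []
for chunk in pd.read_sql("SELECT role FROM exhibitions", conn, chunksize=4):
    partial_counts.append(chunk["role"].value_counts())

# Combine the per-chunk counts into a single total per role.
role_counts = pd.concat(partial_counts).groupby(level=0).sum()
print(role_counts)

conn.close()
```

Collecting per-chunk results and combining them at the end mirrors the chunk-processing pattern from the previous lesson, with SQLite supplying the chunks instead of a CSV file.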