Augmenting Pandas With SQLite
So far on the Data Engineering path, we've explored a few different ways we can work with medium-sized data sets in pandas. First, we learned how to reduce a dataframe's memory footprint by selecting the optimal data types for each column. Then, we discussed how to work with dataframe chunks and modify our processing logic.
In this mission, we'll explore how to use pandas and SQLite together. If you're unfamiliar with SQLite, we cover the basics in our lesson on using PostgreSQL.
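As a rough preview of what combining the two can look like, here is a minimal sketch. It assumes an in-memory SQLite database and a tiny made-up dataframe; the table name `exhibitions` and the column names are hypothetical placeholders, not the actual schema of the data set used in this course.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in rows; these are NOT the real exhibitions data.
df = pd.DataFrame({
    "ExhibitionID": [1, 2, 3, 4],
    "ExhibitionTitle": ["A", "B", "C", "D"],
    "Year": [1929, 1929, 1930, 1931],
})

# An in-memory database keeps the sketch self-contained (nothing is written to disk).
conn = sqlite3.connect(":memory:")

# Write the dataframe to a SQLite table (hypothetical table name).
df.to_sql("exhibitions", conn, index=False, if_exists="replace")

# Let SQLite do the aggregation and hand back only the small result.
counts = pd.read_sql_query(
    "SELECT Year, COUNT(*) AS n FROM exhibitions GROUP BY Year ORDER BY Year;",
    conn,
)

# Alternatively, stream query results back into pandas in chunks.
chunk_sizes = [
    len(chunk)
    for chunk in pd.read_sql_query("SELECT * FROM exhibitions;", conn, chunksize=3)
]

conn.close()
```

The idea in this sketch is that the data can live on disk in SQLite while pandas only ever holds a query result or a single chunk in memory.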
We'll continue to work with the data set on MoMA Exhibitions from the first two missions of this course on processing large data sets. You can read more about this data set, and download it, at data.world.
As you work through this mission on using pandas and SQLite, you’ll apply what you’ve learned from within your browser, so there's no need to use your own machine to do the exercises. The Python environment inside of this course includes answer checking, so you can make sure you've fully mastered each concept before moving on to the next.
Objectives
Mission Outline
1. Augmenting Pandas With SQLite
2. Pandas Types vs. SQLite Types
3. Setting Appropriate Types
4. Computing Primarily in SQL
5. Computing Primarily in Pandas
6. Reading in SQL Results Using Chunks
7. Next Steps
8. Takeaways