MISSION 176

Hash Tables

In the past few lessons, we learned about algorithms and data structures specifically arrays and how to sort and search them. We found that we could search for a value in an unsorted array in O(n) time using a linear search. But, there are many times when we want faster search performance than the O(n) time of a linear search, or even the O(log n) time of a binary search.

In this lesson, we'll implement our own hash table, and use it to store and look up data. There is a Python implementation of a hash table when you create a dictionary, but creating your own will help you understand the performance trade-offs. Additionally, you'll need to implement your own hash table in cases where you want additional performance, or where you're working across multiple machines. In cases like this, several machines may be processing data, and storing it to a centralized hash table. This requires custom logic not available in a default Python dictionary. Custom hash tables are also useful when you're tuning for different performance characteristics than the Python implementation.

To learn how to implement our own hash table without using the Python implementation, we'll be using a dataset of movie quotes during this lesson. The quotes are taken from the scripts of 1068 movies. The lines are either IMDB memorable quotes, or lines surrounding memorable quotes.

In addition to learning about hash tables for fast lookup and insertion times, you’ll get to apply what you’ve learned from within your browser; there's no need to use your own machine to do the exercises. The Python environment inside of this course includes answer-checking to ensure you've fully mastered each concept before learning the next.

Objectives

• Learn how a hash table and has functions work
• Learn how to implement a hash table in Python
• How to profile hash tables

Lesson Outline

1. Hash Tables
2. Hash Tables
3. Hash Functions
4. Fitting Values Into An Array
5. Creating A Hash Table
6. Hash Collisions
7. Retrieving Values From Hash Tables
8. Overwriting Values
9. Profiling Hash Tables
10. Profiling Array Length
11. Hash Tables Versus Lists
12. Takeaways