Introduction to MapReduce

In the last lesson, on I/O-bound programs, we learned about I/O bounds, threads, and locks. We worked through a few exercises in which we converted code into its threaded equivalent to see whether it ran any faster. Unfortunately, even with multiple threads running, we didn't see the performance gains we intuitively expected. We also learned about thread safety: if two threads write to the same resource at the same time, they can conflict. We saw an example of this when multiple threads wrote to the system's standard output at once, which caused output to appear out of order or newlines to go missing after a string.
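As a reminder of what that conflict looks like in code, here is a minimal sketch of the usual fix: guarding a shared stream with a lock. It is not the exercise from the previous lesson. The function names `write_lines` and `run_demo` are hypothetical, and an in-memory `io.StringIO` stands in for standard output so the result can be inspected.

```python
import io
import threading


def write_lines(stream, lock, name, count):
    """Write `count` lines to a shared stream, one complete line at a time."""
    for i in range(count):
        # Holding the lock keeps the text and its newline together, so another
        # thread cannot write in between the two calls.
        with lock:
            stream.write(f"{name}: line {i}")
            stream.write("\n")


def run_demo(num_threads=4, count=50):
    """Start several writer threads and return the lines they produced."""
    stream = io.StringIO()  # stand-in for sys.stdout (an assumption of this sketch)
    lock = threading.Lock()
    threads = [
        threading.Thread(target=write_lines, args=(stream, lock, f"thread-{n}", count))
        for n in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stream.getvalue().splitlines()


if __name__ == "__main__":
    lines = run_demo()
    print(len(lines), "complete lines written")
```

The lock makes each line arrive whole, but it does not control which thread writes next, so lines from different threads can still interleave with one another.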

In this lesson, we will discuss how to overcome these limitations of threads to improve code performance. We will cover concepts such as the Global Interpreter Lock (GIL), multiprocessing, and more. Along the way, we'll analyze 7,000 emails that the US State Department released from Hillary Clinton's private email server.

As you work through overcoming the limitations of threads, you’ll apply what you’ve learned from within your browser, so there's no need to use your own machine for the exercises. The Python environment inside this course includes answer checking, so you can make sure you've fully mastered each concept before moving on to the next.


  • Learn the difference between a process and a thread.
  • Learn what the Python GIL is and how it works.
  • Learn how to use a multiprocessing library in Python.

Lesson Outline

1. The GIL
2. Python Interpreters
3. Clinton Emails
4. How The GIL Works
5. Processes
6. Multiprocessing
7. Multiple Cores
8. Inter-Process Communication
9. Worker Pools
10. Deadlocks
11. Next Steps
12. Takeaways
