Optimizing Code Performance On Large Datasets
In our Optimizing Code Performance On Large Datasets course, you'll learn how to speed up your code by addressing CPU and I/O bottlenecks and by parallelizing your work.
You'll learn concepts such as CPU and I/O bounds and how they limit your code's performance. You'll also practice analyzing data in parallel and see how multithreading can help you overcome the limits of CPU and I/O bounds.
Then you'll learn the difference between a process and a thread and what the Python GIL (global interpreter lock) is. You'll also be introduced to a multiprocessing library in Python and use parallel processing to analyze a dataset of movie quotes.
At the end of the course, you'll complete a project using threads and processes to optimize code and analyze Wikipedia pages more quickly. This project is a chance for you to combine the skills you learned in this course and use parallel processing to analyze pages on the web. It would also make a great portfolio project to show off your data engineering and parallel processing skills.
By the end of this course, you'll be able to:
Learn how to Optimize Code Performance
CPU Bound Programs
Learn how to process data more quickly by being aware of CPU bounds.
I/O Bound Programs
Learn how I/O bounds limit your code and how multithreading can help you overcome them.
Overcoming the Limitations of Threads
Learn the difference between processes and threads and when to use processes.
Quickly Analyzing Data with Parallel Processing
Learn how to combine processes and threads to quickly analyze a dataset of movie quotes.
Analyzing Wikipedia Pages
Use threads and processes to analyze Wikipedia pages more quickly.
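As a taste of the ideas these lessons cover, here is a minimal sketch of the usual rule of thumb in standard CPython: threads for I/O-bound work, processes for CPU-bound work. It is not taken from the course material. The helper names (count_primes, fake_download, download_all, count_all) are hypothetical, the "download" is simulated with a short sleep rather than a real network request, and the worker counts are arbitrary.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound stand-in: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def fake_download(page_id):
    """I/O-bound stand-in: sleep instead of waiting on a real network call."""
    time.sleep(0.05)
    return f"page-{page_id}"


def download_all(page_ids, max_workers=8):
    # Threads suit I/O-bound work: a thread that is waiting releases the GIL,
    # so the other threads can make progress in the meantime.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_download, page_ids))


def count_all(limits, max_workers=2):
    # Processes suit CPU-bound work: each process runs its own interpreter
    # with its own GIL, so the computations can run on separate cores.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    print(download_all(range(4)))
    print(count_all([1000, 2000]))
```

Both pools return results in input order because they use map. The process pool is only started under the __main__ guard, which is needed on platforms that spawn worker processes instead of forking them.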