MISSION 168

CPU Bound Programs

In the previous course on large data sets in pandas, we covered memory limitations and worked through some strategies for overcoming them. As a quick refresher, a memory limitation occurs when a data set won't fit into the memory available on your computer. When this happens, we need to rely on workarounds, such as processing the data in batches or using tools like SQLite that keep the data on disk instead of in memory during processing.

In this course, we'll cover the idea of program bounds and how they affect code performance. A program bound is similar to a limitation in that it affects how you're able to process your data. However, a program bound isn't a hard limitation -- if your program is bound, your computer will still be able to eventually process the data. A program bound mostly limits how quickly the program can be executed.
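To make the idea of a bound concrete, here is a minimal sketch (not taken from the course material) that compares wall-clock time with CPU time for two toy tasks. The helper names `measure`, `sum_of_squares`, and `simulated_wait` are made up for illustration, and `time.sleep` is only a stand-in for waiting on something like a disk or a network. When CPU time is close to wall-clock time, the task is probably CPU-bound; when CPU time is a small fraction of wall-clock time, the program is mostly waiting.

```python
import time


def measure(func, *args):
    """Return (wall_seconds, cpu_seconds, result) for a single call to func."""
    wall_start = time.perf_counter()   # elapsed real time
    cpu_start = time.process_time()    # CPU time used by this process
    result = func(*args)
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    return wall_elapsed, cpu_elapsed, result


def sum_of_squares(n):
    """Hypothetical CPU-heavy task: pure arithmetic in a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def simulated_wait(seconds):
    """Hypothetical waiting task: sleeping stands in for I/O here."""
    time.sleep(seconds)
    return seconds


if __name__ == "__main__":
    for label, func, arg in [
        ("cpu-heavy", sum_of_squares, 2_000_000),
        ("waiting", simulated_wait, 0.5),
    ]:
        wall, cpu, _ = measure(func, arg)
        print(f"{label}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

The exact numbers depend on the machine, so treat them as a rough signal rather than a precise measurement.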

In this lesson, we'll learn about CPU-bound programs and how to understand and improve CPU performance. Along the way, we'll work with a data set of search terms and matching products.

As you work through understanding CPU performance, you’ll get to apply what you’ve learned from within your browser so that there's no need to use your own machine to do the exercises. The Python environment inside of this course includes answer checking so you can ensure that you've fully mastered each concept before learning the next concept.

Objectives

  • Learn about the difference between I/O and CPU bounds
  • Learn about CPU bounds and their impact on your code
  • Learn about Big-O notation for algorithm run times
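As a small preview of what Big-O notation describes, here is a sketch of two ways to find duplicate values in a list. This is an illustration under our own assumptions, not the course's solution: the function names are hypothetical and the sample search terms are invented. The first version compares every pair of values, so its work grows roughly with n^2; the second makes a single pass using a set, so its work grows roughly with n on average.

```python
def find_duplicates_quadratic(values):
    """Compare every pair of positions: about n^2 / 2 comparisons, so O(n^2)."""
    duplicates = set()
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if values[i] == values[j]:
                duplicates.add(values[i])
    return sorted(duplicates)


def find_duplicates_linear(values):
    """Single pass with a set of values seen so far: O(n) on average."""
    seen = set()
    duplicates = set()
    for value in values:
        if value in seen:
            duplicates.add(value)
        else:
            seen.add(value)
    return sorted(duplicates)


if __name__ == "__main__":
    # Invented sample data, only for illustration.
    terms = ["usb cable", "hdmi cable", "usb cable", "laptop stand", "hdmi cable"]
    print(find_duplicates_quadratic(terms))
    print(find_duplicates_linear(terms))
```

Both functions return the same answer; the difference only shows up in how run time grows as the list gets longer.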

Mission Outline

1. Bounds vs Limitations
2. The Dataset
3. Finding duplicate values
4. Big O notation
5. O(n^2)
6. Timing code runs
7. Stable time estimates
8. Refactoring
9. Alternate profiling strategies
10. Alternate profiling strategies
11. Practicing writing efficient algorithms
12. Big O Notation practice
13. Next Steps
14. Takeaways
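Several of these steps deal with timing code. As a rough sketch of one common way to get steadier timings (an assumption on our part, not necessarily the method the mission uses), the standard library's `timeit.repeat` runs a snippet several times so that we can keep the fastest round, which is the one least disturbed by other activity on the machine. The helper names `count_shared_items` and `stable_time` are hypothetical.

```python
import timeit


def count_shared_items(left, right):
    """Toy workload: count the items of `left` that also appear in `right`."""
    right_set = set(right)
    return sum(1 for item in left if item in right_set)


def stable_time(func, *args, repeat=5, number=100):
    """Return the best per-call time in seconds across `repeat` timing rounds."""
    # Each round calls func `number` times and reports the total for that round.
    totals = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(totals) / number


if __name__ == "__main__":
    left = list(range(10_000))
    right = list(range(5_000, 15_000))
    print(f"best per-call time: {stable_time(count_shared_items, left, right):.6f}s")
```

Taking the minimum rather than the mean is the approach suggested in the `timeit` documentation, since slower rounds usually reflect interference from other processes rather than the code itself.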

Course Info:

Intermediate

The median completion time for this course is 4.7 hours.

This course requires a premium subscription and includes four missions and one guided project. It is the fourth course in the Data Engineer path.

Take a Look Inside

MISSION:

CPU Bound Programs


Learn how to process data more quickly by being aware of CPU bounds.

Mission Outline

1. Bounds vs. Limitations
2. The Dataset
3. The Dataset
4. Finding duplicate values
5. Finding duplicate values
6. Big O notation
7. O(n^2)
8. Timing code runs
9. Stable time estimates
10. Refactoring
11. Alternate profiling strategies
12. Alternate profiling strategies
13. Practicing writing efficient algorithms
14. Big O Notation practice
15. Next Steps
16. Takeaways

Office hours made a big impact on helping me find a job. Having someone senior to guide you is very helpful. Before Dataquest, I was wasting time on the wrong things, it got me on the right track.

Dong Zhou Senior Software Developer at Schrodinger