Python programming skills are critical for data engineering. But for many data analysis and processing tasks, stock Python isn't the most efficient approach. That's where NumPy, one of the most popular Python libraries for data work, comes in.

NumPy is popular for working with data primarily because it supports multi-dimensional arrays and lets you perform a wide variety of calculations on those arrays.
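As a rough illustration of what that looks like in practice, here is a minimal sketch of a 2-dimensional array and a couple of calculations on it. The trip data and variable names are made up for this example and are not taken from the course.

```python
import numpy as np

# Hypothetical data: each row is a trip, columns are (distance_km, duration_min).
trips = np.array([
    [12.0, 30.0],
    [5.0, 10.0],
    [20.0, 40.0],
])

# The shape is (rows, columns).
print(trips.shape)

# Element-wise arithmetic on whole columns, with no explicit loop:
# average speed in km/h for every trip at once.
speed_kmh = trips[:, 0] * 60 / trips[:, 1]
print(speed_kmh)

# Aggregate along an axis: the mean of each column.
column_means = trips.mean(axis=0)
print(column_means)
```

Slicing with `trips[:, 0]` selects one column for all rows, and `axis=0` tells `mean` to collapse the rows so that one value per column remains.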

In this course, you'll learn how to manipulate data using NumPy. You'll also learn why NumPy is so much more efficient than Python alone for this sort of work, particularly if you're working with large amounts of data (as data engineers often are).
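One way to see the difference for yourself is to time the same calculation both ways. The sketch below is only an assumption about how such a comparison might be set up (the array size and function names are arbitrary), and the actual timings will vary by machine.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
# int64 is set explicitly so the squared values cannot overflow.
np_array = np.arange(n, dtype=np.int64)


def python_sum_of_squares():
    # Pure Python: the interpreter handles one element at a time.
    return sum(x * x for x in py_list)


def numpy_sum_of_squares():
    # NumPy: the multiplication and the sum run in compiled loops.
    return int(np.sum(np_array * np_array))


loop_time = timeit.timeit(python_sum_of_squares, number=5)
numpy_time = timeit.timeit(numpy_sum_of_squares, number=5)

print(f"pure Python: {loop_time:.4f} s")
print(f"NumPy:       {numpy_time:.4f} s")
```

Both functions return the same number; only the time taken to compute it differs.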

As with all Dataquest courses, this is a fully interactive learning experience. As you learn about NumPy, you'll be writing and running real code in your browser to solve real data engineering problems.

By the end of this course, you will:

  • Know how to manipulate n-dimensional arrays
  • Know how to perform numeric calculations with n-dimensional arrays
  • Understand why NumPy is so much more efficient than pure Python

Learn NumPy for Data Engineering

Introduction to NumPy

Learn the basics of NumPy arrays and how they work with Python.

Arithmetic with NumPy Arrays

Learn to do efficient calculations using NumPy arrays.

Broadcasting NumPy Arrays

Perform arithmetic operations between 1-dimensional arrays and 2-dimensional arrays, and learn to change the shape of an ndarray.
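For a sense of what this lesson covers, here is a small sketch of broadcasting and reshaping. The arrays are invented for illustration only.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# The 1-dimensional array is broadcast across every row of the matrix.
added_row = matrix + row
print(added_row)

# To add one value per row instead, give the 1-D array shape (2, 1)
# so it lines up with the rows and is broadcast across the columns.
col = np.array([100, 200]).reshape(2, 1)
added_col = matrix + col
print(added_col)

# reshape changes the shape without changing the data:
# the same six values arranged as 3 rows of 2.
reshaped = matrix.reshape(3, 2)
print(reshaped)
```

Broadcasting works here because the trailing dimensions match or one of them is 1; shapes that do not satisfy that rule raise a `ValueError`.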

Datasets and Boolean Indexing

Learn how different sorts of datasets work in NumPy and how to use Boolean indexing to select the data you want.
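As a quick preview, Boolean indexing might look something like the sketch below. The fare values and the threshold of 50 are hypothetical.

```python
import numpy as np

# Hypothetical fare amounts.
fares = np.array([8.5, 52.0, 13.0, 70.25, 4.0])

# A comparison produces a Boolean array of the same shape.
mask = fares > 50
print(mask)

# Indexing with that Boolean array keeps only the True positions.
expensive = fares[mask]
print(expensive)

# True counts as 1, so summing the mask counts the matches.
n_expensive = int(mask.sum())
print(n_expensive)
```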

NumPy Datatypes

Learn about NumPy's limitations and how to evaluate the memory consumption of an ndarray.
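The sketch below gives a rough idea of how memory consumption can be inspected. It assumes that the fixed-width integer types are among the limitations the lesson has in mind; the array contents are arbitrary.

```python
import numpy as np

# 1,000 integers stored as 64-bit values (8 bytes each).
values = np.arange(1000, dtype=np.int64)
print(values.dtype, values.itemsize, values.nbytes)

# The same values fit in 16 bits, which uses a quarter of the memory.
small = values.astype(np.int16)
print(small.dtype, small.itemsize, small.nbytes)

# The trade-off: a fixed-width type can only hold a limited range.
int8_limits = np.iinfo(np.int8)
print(int8_limits.min, int8_limits.max)
```

`nbytes` is simply the number of elements multiplied by `itemsize`, so choosing a smaller dtype is one of the most direct ways to shrink an array, provided every value fits in the smaller range.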