Boolean Indexing with NumPy

In the previous NumPy mission, we learned how to use NumPy and vectorized operations to analyze taxi trip data from New York City. We learned that NumPy makes it quick and easy to select data, and that it includes a number of functions and methods for calculating statistics across the different axes (or dimensions) of an array.

However, what if we also wanted to find out how many trips were taken in each month? Or which airport is the busiest? For this, we will learn a new technique: Boolean indexing. Boolean indexing allows you to filter an ndarray based on a given condition using a Boolean vector, also called a Boolean mask, which is an array made up entirely of True and False values.
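As a minimal sketch of the idea, the example below builds a Boolean mask from a comparison and uses it to filter a 1D ndarray. The array name and the distance values are made up for illustration and are not taken from the taxi data set.

```python
import numpy as np

# Hypothetical trip distances in miles (illustrative values only)
trip_distances = np.array([1.2, 18.5, 3.4, 21.0, 0.8])

# Comparing an ndarray to a scalar returns a Boolean array (the mask)
long_trip_mask = trip_distances > 10

# Indexing with the mask keeps only the elements where the mask is True
long_trips = trip_distances[long_trip_mask]

# Summing a Boolean array counts its True values
num_long_trips = long_trip_mask.sum()

print(long_trip_mask)
print(long_trips)
print(num_long_trips)
```

Because True counts as 1 and False as 0, summing a mask is a quick way to answer "how many" questions, such as how many trips meet a condition.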

In addition to Boolean indexing and Boolean masks, you'll learn about Boolean arrays and other NumPy concepts, such as reading CSV files and assigning values in ndarrays.

As you learn these concepts, you will continue to analyze New York City taxi trip data. At the end of this mission, you will use what you've learned to find out which is the most popular airport in New York City. You will also calculate summary statistics for taxi trips using a clean data set.


  • Learn to create Boolean arrays based on data values.
  • Learn to use Boolean arrays to select specific rows and columns.
  • Learn to use Boolean indexing to perform data analysis.
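The objectives above can be sketched on a small 2D ndarray. The column layout (pickup month, trip distance, fare amount) and every value here are assumptions chosen for illustration; the real data set used in the mission may be organized differently.

```python
import numpy as np

# Hypothetical 2D array; assumed columns: [pickup_month, trip_distance, fare_amount]
trips = np.array([
    [1, 2.5, 12.0],
    [2, 10.0, 35.5],
    [1, 0.9, 6.5],
    [3, 17.2, 52.0],
])

# Create a Boolean array from the values in the first column
january_mask = trips[:, 0] == 1

# Use the mask along the row axis to select whole rows
january_trips = trips[january_mask]

# Combine the mask with a column index to select one column of those rows
january_fares = trips[january_mask, 2]

# Use the selection for a simple analysis
mean_january_fare = january_fares.mean()

print(january_trips)
print(mean_january_fare)
```

The mask must have the same length as the axis it indexes, which is why it is built from a full column of the same array.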

Mission Outline

1. Reading CSV files with NumPy
2. Reading CSV files with NumPy Continued
3. Boolean Arrays
4. Boolean Indexing with 1D ndarrays
5. Boolean Indexing with 2D ndarrays
6. Assigning Values in ndarrays
7. Assignment Using Boolean Arrays
8. Assignment Using Boolean Arrays Continued
9. Challenge: Which is the most popular airport?
10. Challenge: Calculating Statistics for Trips on Clean Data
11. Next Steps
12. Takeaways


Course Info:


The median completion time for this course is 6.77 hours.

This course includes five missions and one guided project. It is the third course in both the Data Analyst in Python path and the Data Scientist in Python path.

