In the previous NumPy lesson, we learned how to use NumPy and vectorized operations to analyze taxi trip data from New York City. We saw that NumPy makes it quick to select data, and that it includes a number of functions and methods for calculating statistics across the different axes (or dimensions).

However, what if we also wanted to find out how many trips were taken in each month, or which airport is the busiest? For this, we will learn a new technique: Boolean indexing. Boolean indexing lets you filter an ndarray based on a condition, using a Boolean array (also called a Boolean vector or Boolean mask) made up of True and False values.
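As a minimal sketch of the idea, the example below counts trips in a single month. The `pickup_month` array and its values are made up for illustration and are not taken from the lesson's taxi data set.

```python
import numpy as np

# Hypothetical data: the pickup month for six trips (illustrative values only)
pickup_month = np.array([1, 2, 1, 3, 2, 1])

# Comparing an array to a value produces a Boolean array (the mask)
january_mask = pickup_month == 1

# Indexing with the mask keeps only the elements where the mask is True
january_trips = pickup_month[january_mask]

# True counts as 1 when summed, so this is the number of January trips
january_count = january_mask.sum()

print(january_mask)
print(january_trips)
print(january_count)
```

The mask has the same shape as the array it was built from, which is what lets NumPy match each True or False value to one element.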

Along with Boolean arrays and Boolean indexing, you'll learn other NumPy concepts, such as reading CSV files and assigning values in ndarrays.

As you learn these concepts, you will continue to analyze New York City taxi trip data. At the end of this lesson, you will use what you've learned to find out which airport in New York City is the most popular. You will also calculate summary statistics for taxi trips using a clean data set.

Objectives

  • Learn to create Boolean arrays based on data values.
  • Learn to use Boolean arrays to select specific rows and columns.
  • Learn to use Boolean indexing to perform data analysis.

Lesson Outline

1. Reading CSV files with NumPy
2. Reading CSV files with NumPy Continued
3. Boolean Arrays
4. Boolean Indexing with 1D ndarrays
5. Boolean Indexing with 2D ndarrays
6. Assigning Values in ndarrays
7. Assignment Using Boolean Arrays
8. Assignment Using Boolean Arrays Continued
9. Challenge: Which is the most popular airport?
10. Challenge: Calculating Statistics for Trips on Clean Data
11. Next Steps
12. Takeaways