Tutorial: Filtering Pandas DataFrames

By Mehdi Lotfinejad Updated March 6, 2023

Originally published February 24, 2022

The Pandas library is a fast, powerful, and easy-to-use tool for working with data. It helps us cleanse, explore, analyze, and visualize data by providing game-changing capabilities. Having data as a Pandas DataFrame allows us to slice and dice data in various ways and filter the DataFrame's rows effortlessly. This tutorial will go over the primary methods for getting data out of a DataFrame using different filtering techniques.

What is a DataFrame?

Before discussing different filtering DataFrames techniques, let’s review a DataFrame. Technically a DataFrame builds upon the Pandas Series object that a Series is a one-dimensional array of indexed data. Intuitively a DataFrame is similar to a spreadsheet in many ways; it can have multiple columns with varying types of data and rows labeled with row indices.

Creating a DataFrame

First let’s create a dummy DataFrame containing the personal details of a company’s employees using the following snippet:

import pandas as pd
import numpy as np
from IPython.display import display

data = pd.DataFrame({'EmployeeName': ['Callen Dunkley', 'Sarah Rayner', 'Jeanette Sloan','Kaycee Acosta', 'Henri Conroy', 'Emma Peralta','Martin Butt', 'Alex Jensen', 'Kim Howarth', 'Jane Burnett'],
                    'Department': ['Accounting', 'Engineering', 'Engineering', 'HR', 'HR','HR','Data Science','Data Science', 'Accounting', 'Data Science'],
                    'HireDate': [2010, 2018, 2012, 2014, 2014, 2018, 2020, 2018, 2020, 2012],
                    'Gender': ['M','F','F','F','M','F','M','M','M','F'],
                    'DoB':['04/09/1982', '14/04/1981','06/05/1997','08/01/1986','10/10/1988','12/11/1992', '10/04/1991','16/07/1995','08/10/1992','11/10/1979'],
                    'Weight':[78,80,66,67,90,57,115,87,95,57],
                    'Height':[176,160,169,157,185,164,195,180,174,165]
                  })
display(data)

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}


  
      
      EmployeeName
      Department
      HireDate
      Gender
      DoB
      Weight
      Height
    

  
      0
      Callen Dunkley
      Accounting
      2010
      M
      04/09/1982
      78
      176
    

      1
      Sarah Rayner
      Engineering
      2018
      F
      14/04/1981
      80
      160
    

      2
      Jeanette Sloan
      Engineering
      2012
      F
      06/05/1997
      66
      169
    

      3
      Kaycee Acosta
      HR
      2014
      F
      08/01/1986
      67
      157
    

      4
      Henri Conroy
      HR
      2014
      M
      10/10/1988
      90
      185
    

      5
      Emma Peralta
      HR
      2018
      F
      12/11/1992
      57
      164
    

      6
      Martin Butt
      Data Science
      2020
      M
      10/04/1991
      115
      195
    

      7
      Alex Jensen
      Data Science
      2018
      M
      16/07/1995
      87
      180
    

      8
      Kim Howarth
      Accounting
      2020
      M
      08/10/1992
      95
      174
    

      9
      Jane Burnett
      Data Science
      2012
      F
      11/10/1979
      57
      165
    

	EmployeeName	Department	HireDate	Gender	DoB	Weight	Height
0	Callen Dunkley	Accounting	2010	M	04/09/1982	78	176
1	Sarah Rayner	Engineering	2018	F	14/04/1981	80	160
2	Jeanette Sloan	Engineering	2012	F	06/05/1997	66	169
3	Kaycee Acosta	HR	2014	F	08/01/1986	67	157
4	Henri Conroy	HR	2014	M	10/10/1988	90	185
5	Emma Peralta	HR	2018	F	12/11/1992	57	164
6	Martin Butt	Data Science	2020	M	10/04/1991	115	195
7	Alex Jensen	Data Science	2018	M	16/07/1995	87	180
8	Kim Howarth	Accounting	2020	M	08/10/1992	95	174
9	Jane Burnett	Data Science	2012	F	11/10/1979	57	165

Run the code above. It creates a DataFrame with the above data. Let’s take a look at the columns’ data types by running the following statement:

print(data.dtypes)

EmployeeName    object
Department      object
HireDate         int64
Gender          object
DoB             object
Weight           int64
Height           int64
dtype: object

The Pandas data type for string values is object that is why the data type of columns like EmployeeName or Department containing strings is object. The DataFrame's columns each have a specific data type; in other words, the DataFrame's columns don't share the same data type. The other useful DataFrame property is .values, which returns an array of lists, each list representing a DataFrame’s row.

print(data.values)

[['Callen Dunkley' 'Accounting' 2010 'M' '04/09/1982' 78 176]
 ['Sarah Rayner' 'Engineering' 2018 'F' '14/04/1981' 80 160]
 ['Jeanette Sloan' 'Engineering' 2012 'F' '06/05/1997' 66 169]
 ['Kaycee Acosta' 'HR' 2014 'F' '08/01/1986' 67 157]
 ['Henri Conroy' 'HR' 2014 'M' '10/10/1988' 90 185]
 ['Emma Peralta' 'HR' 2018 'F' '12/11/1992' 57 164]
 ['Martin Butt' 'Data Science' 2020 'M' '10/04/1991' 115 195]
 ['Alex Jensen' 'Data Science' 2018 'M' '16/07/1995' 87 180]
 ['Kim Howarth' 'Accounting' 2020 'M' '08/10/1992' 95 174]
 ['Jane Burnett' 'Data Science' 2012 'F' '11/10/1979' 57 165]]

If you’re interested to know the number of rows and columns of the DataFrame, you can use the .shape property as follows:

print(data.shape)

(10, 7)

The tuple above represents the number of rows and columns, respectively.

Slicing and Dicing a DataFrame from Every Possible Angle

So far, we’ve created and inspected a DataFrame. As mentioned earlier, there are different ways for filtering a DataFrame. In this section, we will discuss the most prevalent and efficient methods. However, it would be better to look at how we can select specific columns, first. To grab the entire of specific columns, we can use the following syntax:

DataFrame.ColumnName  # if the column name does not contain any spaces
OR 
DataFrame["ColumnName"]

For example, let’s grab the Department column of the DataFrame.

display(data.Department)

0      Accounting
1     Engineering
2     Engineering
3              HR
4              HR
5              HR
6    Data Science
7    Data Science
8      Accounting
9    Data Science
Name: Department, dtype: object

Running the statement above returns the Series containing the Department column’s data.

Also, we can select multiple columns by passing a list of column names, as follows:

display(data[['EmployeeName','Department','Gender']])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}


  
      
      EmployeeName
      Department
      Gender
    

  
      0
      Callen Dunkley
      Accounting
      M
    

      1
      Sarah Rayner
      Engineering
      F
    

      2
      Jeanette Sloan
      Engineering
      F
    

      3
      Kaycee Acosta
      HR
      F
    

      4
      Henri Conroy
      HR
      M
    

      5
      Emma Peralta
      HR
      F
    

      6
      Martin Butt
      Data Science
      M
    

      7
      Alex Jensen
      Data Science
      M
    

      8
      Kim Howarth
      Accounting
      M
    

      9
      Jane Burnett
      Data Science
      F
    

	EmployeeName	Department	Gender
0	Callen Dunkley	Accounting	M
1	Sarah Rayner	Engineering	F
2	Jeanette Sloan	Engineering	F
3	Kaycee Acosta	HR	F
4	Henri Conroy	HR	M
5	Emma Peralta	HR	F
6	Martin Butt	Data Science	M
7	Alex Jensen	Data Science	M
8	Kim Howarth	Accounting	M
9	Jane Burnett	Data Science	F

Running the statement above gives us a DataFrame that is a subset of the original DataFrame.

The `.loc` and `.iloc` Methods

We can use slicing techniques to extract specific rows from a DataFrame. The best way to slice a DataFrame and select particular columns is to use the .loc and .iloc methods. The twin methods create a subset of a DataFrame using label-based or integer-based indexing, respectively.

DataFrame.loc[row_indexer,column_indexer]
DataFrame.iloc[row_indexer,column_indexer]

The first position in both methods specifies the row indexer, and the second specifies the column indexer separated with a comma. Before we want to practice working with the .loc and .iloc methods, let’s set the EmployeeName column as the index of the DataFrame. Doing this is not required but helps us discuss all the capabilities of these methods. To do so, use the set_index() method to make the EmployeeName column the index of the DataFrame:

data = data.set_index('EmployeeName')
display(data)

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}


  
      
      Department
      HireDate
      Gender
      DoB
      Weight
      Height
    

      EmployeeName
      
      
      
      
      
      
    

  
      Callen Dunkley
      Accounting
      2010
      M
      04/09/1982
      78
      176
    

      Sarah Rayner
      Engineering
      2018
      F
      14/04/1981
      80
      160
    

      Jeanette Sloan
      Engineering
      2012
      F
      06/05/1997
      66
      169
    

      Kaycee Acosta
      HR
      2014
      F
      08/01/1986
      67
      157
    

      Henri Conroy
      HR
      2014
      M
      10/10/1988
      90
      185
    

      Emma Peralta
      HR
      2018
      F
      12/11/1992
      57
      164
    

      Martin Butt
      Data Science
      2020
      M
      10/04/1991
      115
      195
    

      Alex Jensen
      Data Science
      2018
      M
      16/07/1995
      87
      180
    

      Kim Howarth
      Accounting
      2020
      M
      08/10/1992
      95
      174
    

      Jane Burnett
      Data Science
      2012
      F
      11/10/1979
      57
      165
    

	Department	HireDate	Gender	DoB	Weight	Height
Callen Dunkley	Accounting	2010	M	04/09/1982	78	176
Sarah Rayner	Engineering	2018	F	14/04/1981	80	160
Jeanette Sloan	Engineering	2012	F	06/05/1997	66	169
Kaycee Acosta	HR	2014	F	08/01/1986	67	157
Henri Conroy	HR	2014	M	10/10/1988	90	185
Emma Peralta	HR	2018	F	12/11/1992	57	164
Martin Butt	Data Science	2020	M	10/04/1991	115	195
Alex Jensen	Data Science	2018	M	16/07/1995	87	180
Kim Howarth	Accounting	2020	M	08/10/1992	95	174
Jane Burnett	Data Science	2012	F	11/10/1979	57	165

Now, instead of having integer-based indexes, we have label-based indexes, which makes it more apt to try all the capabilities of .loc and .iloc methods for slicing and dicing the DataFrame.

NOTE

The .loc method is label-based, so you have to specify rows and columns based on their labels. On the other hand, the iloc method is integer index-based, so you have to select rows and columns by their integer index.

The following statements return the identical results practically:

display(data.loc[['Henri Conroy', 'Kim Howarth']])
display(data.iloc[[4,8]])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      HireDate
      Gender
      DoB
      Weight
      Height
    
      EmployeeName
      
      Henri Conroy
      HR
      2014
      M
      10/10/1988
      90
      185
    
      Kim Howarth
      Accounting
      2020
      M
      08/10/1992
      95
      174

	Department	HireDate	Gender	DoB	Weight	Height
Henri Conroy	HR	2014	M	10/10/1988	90	185
Kim Howarth	Accounting	2020	M	08/10/1992	95	174

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      HireDate
      Gender
      DoB
      Weight
      Height
    
      EmployeeName
      
      Henri Conroy
      HR
      2014
      M
      10/10/1988
      90
      185
    
      Kim Howarth
      Accounting
      2020
      M
      08/10/1992
      95
      174

	Department	HireDate	Gender	DoB	Weight	Height
Henri Conroy	HR	2014	M	10/10/1988	90	185
Kim Howarth	Accounting	2020	M	08/10/1992	95	174

In the above code, the .loc method gets a list of the employee names as row labels and returns a DataFrame containing the rows associated with the two employees. The .iloc method does the same thing, but instead of getting the labeled indexes, it gets the integer indexes of the rows. Now, let’s extract a sub-DataFrame that contains the name, department, and gender values of the first three employees using the .loc and .iloc methods:

display(data.iloc[:3,[0,2]])
display(data.loc['Callen Dunkley':'Jeanette Sloan',['Department','Gender']])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      Gender
    
      EmployeeName
      
      Callen Dunkley
      Accounting
      M
    
      Sarah Rayner
      Engineering
      F
    
      Jeanette Sloan
      Engineering
      F

	Department	Gender
Callen Dunkley	Accounting	M
Sarah Rayner	Engineering	F
Jeanette Sloan	Engineering	F

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      Gender
    
      EmployeeName
      
      Callen Dunkley
      Accounting
      M
    
      Sarah Rayner
      Engineering
      F
    
      Jeanette Sloan
      Engineering
      F

	Department	Gender
Callen Dunkley	Accounting	M
Sarah Rayner	Engineering	F
Jeanette Sloan	Engineering	F

Running the statements above returns the identical results as follows:

EmployeeName     Department Gender                      
Callen Dunkley   Accounting      M
Sarah Rayner    Engineering      F
Jeanette Sloan  Engineering      F

The .loc and .iloc methods are pretty similar; the only difference is how referring to columns and rows in a DataFrame. The following statements return all the rows of the DataFrame:

display(data.iloc[:,[0,2]])
display(data.loc[:,['Department','Gender']])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      Gender
    
      EmployeeName
      
      Callen Dunkley
      Accounting
      M
    
      Sarah Rayner
      Engineering
      F
    
      Jeanette Sloan
      Engineering
      F
    
      Kaycee Acosta
      HR
      F
    
      Henri Conroy
      HR
      M
    
      Emma Peralta
      HR
      F
    
      Martin Butt
      Data Science
      M
    
      Alex Jensen
      Data Science
      M
    
      Kim Howarth
      Accounting
      M
    
      Jane Burnett
      Data Science
      F

	Department	Gender
Callen Dunkley	Accounting	M
Sarah Rayner	Engineering	F
Jeanette Sloan	Engineering	F
Kaycee Acosta	HR	F
Henri Conroy	HR	M
Emma Peralta	HR	F
Martin Butt	Data Science	M
Alex Jensen	Data Science	M
Kim Howarth	Accounting	M
Jane Burnett	Data Science	F

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      Gender
    
      EmployeeName
      
      Callen Dunkley
      Accounting
      M
    
      Sarah Rayner
      Engineering
      F
    
      Jeanette Sloan
      Engineering
      F
    
      Kaycee Acosta
      HR
      F
    
      Henri Conroy
      HR
      M
    
      Emma Peralta
      HR
      F
    
      Martin Butt
      Data Science
      M
    
      Alex Jensen
      Data Science
      M
    
      Kim Howarth
      Accounting
      M
    
      Jane Burnett
      Data Science
      F

	Department	Gender
Callen Dunkley	Accounting	M
Sarah Rayner	Engineering	F
Jeanette Sloan	Engineering	F
Kaycee Acosta	HR	F
Henri Conroy	HR	M
Emma Peralta	HR	F
Martin Butt	Data Science	M
Alex Jensen	Data Science	M
Kim Howarth	Accounting	M
Jane Burnett	Data Science	F

We’re able to do slicing for columns too. The statement below returns the Weight and Height columns of every second row of the DataFrame:

display(data.loc[::2,'Weight':'Height'])
display(data.iloc[::2,4:6])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Weight
      Height
    
      EmployeeName
      
      Callen Dunkley
      78
      176
    
      Jeanette Sloan
      66
      169
    
      Henri Conroy
      90
      185
    
      Martin Butt
      115
      195
    
      Kim Howarth
      95
      174

	Weight	Height
Callen Dunkley	78	176
Jeanette Sloan	66	169
Henri Conroy	90	185
Martin Butt	115	195
Kim Howarth	95	174

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Weight
      Height
    
      EmployeeName
      
      Callen Dunkley
      78
      176
    
      Jeanette Sloan
      66
      169
    
      Henri Conroy
      90
      185
    
      Martin Butt
      115
      195
    
      Kim Howarth
      95
      174

	Weight	Height
Callen Dunkley	78	176
Jeanette Sloan	66	169
Henri Conroy	90	185
Martin Butt	115	195
Kim Howarth	95	174

We perform slicing on the DataFrame’s rows and columns in the code above. The syntax is similar to slicing a list in Python. We need to specify the start index, the stop index, and the step size.

EmployeeName    Weight  Height                  
Callen Dunkley      78     176
Jeanette Sloan      66     169
Henri Conroy        90     185
Martin Butt        115     195
Kim Howarth         95     174

Notice that in slicing with labels using the .loc method, both the start and the stop labels are included. However, in slicing using .iloc method the stop index is excluded.

Filtering a DataFrame in Pandas

So far, we’ve learned how to slice and dice a DataFrame, but how can we take data that meets specific criteria? Using Boolean masks is the common method for filtering a DataFrame in Pandas. First let’s see what is a Boolean mask:

print(data.HireDate > 2015)

EmployeeName
Callen Dunkley    False
Sarah Rayner       True
Jeanette Sloan    False
Kaycee Acosta     False
Henri Conroy      False
Emma Peralta       True
Martin Butt        True
Alex Jensen        True
Kim Howarth        True
Jane Burnett      False
Name: HireDate, dtype: bool

By running the code above, we can see which entries in the HireDate column are greater than 2015. The Boolean values indicate whether the condition is met or not. Now, we can use the boolean mask to get the subset of the DataFrame containing the employees who are hired after 2015:

display(data[data.HireDate > 2015])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}


  
      
      Department
      HireDate
      Gender
      DoB
      Weight
      Height
    

      EmployeeName
      
      
      
      
      
      
    

  
      Sarah Rayner
      Engineering
      2018
      F
      14/04/1981
      80
      160
    

      Emma Peralta
      HR
      2018
      F
      12/11/1992
      57
      164
    

      Martin Butt
      Data Science
      2020
      M
      10/04/1991
      115
      195
    

      Alex Jensen
      Data Science
      2018
      M
      16/07/1995
      87
      180
    

      Kim Howarth
      Accounting
      2020
      M
      08/10/1992
      95
      174
    

	Department	HireDate	Gender	DoB	Weight	Height
Sarah Rayner	Engineering	2018	F	14/04/1981	80	160
Emma Peralta	HR	2018	F	12/11/1992	57	164
Martin Butt	Data Science	2020	M	10/04/1991	115	195
Alex Jensen	Data Science	2018	M	16/07/1995	87	180
Kim Howarth	Accounting	2020	M	08/10/1992	95	174

We also can use the .loc method to filter the rows along with returning certain columns, as follows:

display(data.loc[data.HireDate > 2015, ['Department', 'HireDate']])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      HireDate
    
      EmployeeName
      
      Sarah Rayner
      Engineering
      2018
    
      Emma Peralta
      HR
      2018
    
      Martin Butt
      Data Science
      2020
    
      Alex Jensen
      Data Science
      2018
    
      Kim Howarth
      Accounting
      2020

	Department	HireDate
Sarah Rayner	Engineering	2018
Emma Peralta	HR	2018
Martin Butt	Data Science	2020
Alex Jensen	Data Science	2018
Kim Howarth	Accounting	2020

We're able to combine two or more criteria using logical operators. Let's list those male employees working in Data Science team:

display(data.loc[(data.Gender == "M") & (data.Department=='Data Science'), 
                ['Department', 'Gender', 'HireDate']]
        )

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      Gender
      HireDate
    
      EmployeeName
      
      Martin Butt
      Data Science
      M
      2020
    
      Alex Jensen
      Data Science
      M
      2018

	Department	Gender	HireDate
Martin Butt	Data Science	M	2020
Alex Jensen	Data Science	M	2018

What if we want to get all employees with heights between 170 cm and 190 cm?

display(data.loc[data.Height.between(170,190), 
                ['Department', 'Gender', 'Height']]
        )

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      Gender
      Height
    
      EmployeeName
      
      Callen Dunkley
      Accounting
      M
      176
    
      Henri Conroy
      HR
      M
      185
    
      Alex Jensen
      Data Science
      M
      180
    
      Kim Howarth
      Accounting
      M
      174

	Department	Gender	Height
Callen Dunkley	Accounting	M	176
Henri Conroy	HR	M	185
Alex Jensen	Data Science	M	180
Kim Howarth	Accounting	M	174

In the between() method both ends of the range are inclusive. The other helpful method is isin(), which creates a Boolean mask for values contained in a list of specified values. Let’s try the method to retrieve the employees who are working in either Human Resources or Accounting departments:

display(data.loc[data.Department.isin(['HR', 'Accounting']),['Department', 'Gender', 'HireDate']])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      Gender
      HireDate
    
      EmployeeName
      
      Callen Dunkley
      Accounting
      M
      2010
    
      Kaycee Acosta
      HR
      F
      2014
    
      Henri Conroy
      HR
      M
      2014
    
      Emma Peralta
      HR
      F
      2018
    
      Kim Howarth
      Accounting
      M
      2020

	Department	Gender	HireDate
Callen Dunkley	Accounting	M	2010
Kaycee Acosta	HR	F	2014
Henri Conroy	HR	M	2014
Emma Peralta	HR	F	2018
Kim Howarth	Accounting	M	2020

To filter a DataFrame, we can use regular expressions to search for a pattern rather than the exact content. The following example shows how we can use a regular expression to get those employees whose last names end with either ‘tt’ or ‘th’:

display(data.loc[data.index.str.contains(r'tt$|th$'),
                ['Department', 'Gender']])

    vertical-align: top;
}

.dataframe thead th {
    text-align: right;
}

      Department
      Gender
    
      EmployeeName
      
      Martin Butt
      Data Science
      M
    
      Kim Howarth
      Accounting
      M
    
      Jane Burnett
      Data Science
      F

	Department	Gender
Martin Butt	Data Science	M
Kim Howarth	Accounting	M
Jane Burnett	Data Science	F

Conclusion

In this tutorial, we discussed different ways of slicing and dicing a DataFrame and filtering rows with specific criteria. we addressed how to take subsets of data through selecting columns, slicing rows, and filtering based on conditions, methods or regular expressions.

Tutorial: Filtering Pandas DataFrames

What is a DataFrame?

Creating a DataFrame

Slicing and Dicing a DataFrame from Every Possible Angle

The `.loc` and `.iloc` Methods

Filtering a DataFrame in Pandas

Conclusion

Related Topics

Polars Tutorial: Faster DataFrame Analysis in Python

How to Use Jupyter Notebook: A Beginner’s Tutorial

Online Python Practice Exercises for 2026

Join 1M+ data learners on Dataquest.

Create a free account

Choose a learning path

Complete exercises and projects

Advance your career

Start learning today

Tutorial: Filtering Pandas DataFrames

What is a DataFrame?

Creating a DataFrame

Slicing and Dicing a DataFrame from Every Possible Angle

The .loc and .iloc Methods

Filtering a DataFrame in Pandas

Conclusion

Related Topics

Polars Tutorial: Faster DataFrame Analysis in Python

How to Use Jupyter Notebook: A Beginner’s Tutorial

Online Python Practice Exercises for 2026

Join 1M+ data learners on Dataquest.

Create a free account

Choose a learning path

Complete exercises and projects

Advance your career

Start learning today

The `.loc` and `.iloc` Methods