# Tutorial: How to Use the Apply Method in Pandas

The `apply()` method is one of the most common methods of data preprocessing. It simplifies applying a function on each element in a pandas **Series** and each row or column in a pandas **DataFrame**. In this tutorial, we'll learn how to use the `apply()` method in pandas. You'll need to know the fundamentals of Python and lambda functions. If you aren't familiar with these or need to brush up your Python skills, you might like to try our free Python Fundamentals course.

Let’s dive right in.

## Applying a Function on a Pandas Series

Series form the basis of pandas. They are essentially one-dimensional arrays with axis labels called indices.

There are different ways of creating a Series object (e.g., we can initialize a Series with lists or dictionaries). Let’s define a Series object with two lists containing student names as indices and their heights in centimeters as data:

```
import pandas as pd
import numpy as np
from IPython.display import display

students = pd.Series(data=[180, 175, 168, 190],
                     index=['Vik', 'Mehdi', 'Bella', 'Chriss'])
display(students)
print(type(students))
```

```
Vik       180
Mehdi     175
Bella     168
Chriss    190
dtype: int64
<class 'pandas.core.series.Series'>
```

The code above returns the content of the `students` object and its data type.
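As mentioned earlier, a Series can also be initialized from a dictionary. The following is a minimal sketch of that alternative; the variable names `heights` and `students_from_dict` are introduced here for illustration only.

```python
import pandas as pd

# Dictionary keys become the index labels; dictionary values become the data.
heights = {'Vik': 180, 'Mehdi': 175, 'Bella': 168, 'Chriss': 190}
students_from_dict = pd.Series(heights)

print(students_from_dict)
print(students_from_dict.index.tolist())
```

Both constructions should produce a Series with the same labels and values, so either one can be used with `apply()`.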

The data type of the `students` object is *Series*, so we can apply any function on its data using the `apply()` method. Let’s see how we can convert the heights of the students from centimeters to feet:

```
def cm_to_feet(h):
    # 1 foot = 30.48 cm
    return np.round(h/30.48, 2)

print(students.apply(cm_to_feet))
```

```
Vik       5.91
Mehdi     5.74
Bella     5.51
Chriss    6.23
dtype: float64
```

The students' heights are converted to feet with two decimal places. To do so, we first defined a function that does the conversion, then passed the function name without parentheses to the `apply()` method. The `apply()` method takes each element in the Series and applies the `cm_to_feet()` function on it.
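As a small variation, the sketch below shows two other ways of calling `apply()` on a Series: forwarding an extra keyword argument to the function, and using a lambda for a one-off conversion. The helper `cm_to_feet_rounded()` and its `decimals` parameter are hypothetical names used only for this example.

```python
import numpy as np
import pandas as pd

students = pd.Series(data=[180, 175, 168, 190],
                     index=['Vik', 'Mehdi', 'Bella', 'Chriss'])

def cm_to_feet_rounded(h, decimals=2):
    """Convert centimeters to feet, rounded to `decimals` places (hypothetical helper)."""
    return np.round(h / 30.48, decimals)

# Extra keyword arguments given to apply() are forwarded to the function.
one_decimal = students.apply(cm_to_feet_rounded, decimals=1)

# The same conversion as the tutorial's function, written as a lambda.
with_lambda = students.apply(lambda h: np.round(h / 30.48, 2))

print(one_decimal)
print(with_lambda)
```

A named function is usually easier to reuse and test, while a lambda keeps short, single-use logic next to the call.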

## Applying a Function on a Pandas DataFrame

In this section, we're going to learn how to use the `apply()` method to manipulate columns and rows in a DataFrame.

First, let’s create a dummy DataFrame containing the personal details of a company’s employees using the following snippet:

```
data = pd.DataFrame({
    'EmployeeName': ['Callen Dunkley', 'Sarah Rayner', 'Jeanette Sloan', 'Kaycee Acosta', 'Henri Conroy', 'Emma Peralta', 'Martin Butt', 'Alex Jensen', 'Kim Howarth', 'Jane Burnett'],
    'Department': ['Accounting', 'Engineering', 'Engineering', 'HR', 'HR', 'HR', 'Data Science', 'Data Science', 'Accounting', 'Data Science'],
    'HireDate': [2010, 2018, 2012, 2014, 2014, 2018, 2020, 2018, 2020, 2012],
    'Sex': ['M', 'F', 'F', 'F', 'M', 'F', 'M', 'M', 'M', 'F'],
    'Birthdate': ['04/09/1982', '14/04/1981', '06/05/1997', '08/01/1986', '10/10/1988', '12/11/1992', '10/04/1991', '16/07/1995', '08/10/1992', '11/10/1979'],
    'Weight': [78, 80, 66, 67, 90, 57, 115, 87, 95, 57],
    'Height': [176, 160, 169, 157, 185, 164, 195, 180, 174, 165],
    'Kids': [2, 1, 0, 1, 1, 0, 2, 0, 3, 1]
})
display(data)
```

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids |
|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 |
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 |
| 4 | Henri Conroy | HR | 2014 | M | 10/10/1988 | 90 | 185 | 1 |
| 5 | Emma Peralta | HR | 2018 | F | 12/11/1992 | 57 | 164 | 0 |
| 6 | Martin Butt | Data Science | 2020 | M | 10/04/1991 | 115 | 195 | 2 |
| 7 | Alex Jensen | Data Science | 2018 | M | 16/07/1995 | 87 | 180 | 0 |
| 8 | Kim Howarth | Accounting | 2020 | M | 08/10/1992 | 95 | 174 | 3 |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 |

**NOTE**

In this section, we'll work on dummy requests initiated by the company’s HR team. We'll learn how to use the `apply()` method by going through different scenarios. We'll explore a new use case in each scenario and solve it using the `apply()` method.

### Scenario 1

Let's assume that the HR team wants to send an invitation email that starts with a friendly greeting to all the employees (e.g., *Hey, Sarah!*). They have asked you to create two columns that store the employees' first and last names separately, so that the first names are easy to reference. To do so, we can use a lambda function that calls the `split()` method, which breaks a string into a list at the specified separator; by default, the separator is any whitespace. Let's look at the code:

```
data['FirstName'] = data['EmployeeName'].apply(lambda x : x.split()[0])
data['LastName'] = data['EmployeeName'].apply(lambda x : x.split()[1])
display(data)
```

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids | FirstName | LastName |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 | Callen | Dunkley |
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 | Sarah | Rayner |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 | Jeanette | Sloan |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 | Kaycee | Acosta |
| 4 | Henri Conroy | HR | 2014 | M | 10/10/1988 | 90 | 185 | 1 | Henri | Conroy |
| 5 | Emma Peralta | HR | 2018 | F | 12/11/1992 | 57 | 164 | 0 | Emma | Peralta |
| 6 | Martin Butt | Data Science | 2020 | M | 10/04/1991 | 115 | 195 | 2 | Martin | Butt |
| 7 | Alex Jensen | Data Science | 2018 | M | 16/07/1995 | 87 | 180 | 0 | Alex | Jensen |
| 8 | Kim Howarth | Accounting | 2020 | M | 08/10/1992 | 95 | 174 | 3 | Kim | Howarth |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 | Jane | Burnett |

In the code above, we applied the lambda function on the `EmployeeName` column, which is technically a Series object. The lambda function splits the employees' full names into first and last names. Thus, the code creates two more columns that contain the first and last names of employees.
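Taking the second element of `split()` works here because every name in the dummy data has exactly two parts. The sketch below shows one possible way to handle names with more parts by splitting only on the first run of whitespace; the helper `split_name()` and the three-part sample name are hypothetical and not part of the original data.

```python
import pandas as pd

# The third name is made up to illustrate a name with more than two parts.
names = pd.Series(['Callen Dunkley', 'Sarah Rayner', 'Ana Maria Lopez'])

def split_name(full_name):
    """Return (first, rest), splitting only on the first run of whitespace."""
    parts = full_name.split(maxsplit=1)
    first = parts[0] if parts else ''
    rest = parts[1] if len(parts) > 1 else ''
    return first, rest

# apply() returns a Series of tuples; a second apply() picks each element out.
name_parts = names.apply(split_name)
first_names = name_parts.apply(lambda p: p[0])
last_names = name_parts.apply(lambda p: p[1])

print(first_names.tolist())
print(last_names.tolist())
```

Whether everything after the first name should count as the last name is a design decision that depends on the data; this sketch simply keeps the remainder together.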

### Scenario 2

Now, let's assume that the HR team wants to know every employee's age and the average age of the employees because they want to determine if an employee's age influences job satisfaction and work engagement.

To get the job done, the first step is to define a function that gets an employee's date of birth and returns their age:

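```
from datetime import datetime, date

def calculate_age(birthdate):
    # Parse a 'DD/MM/YYYY' string into a date object
    birthdate = datetime.strptime(birthdate, '%d/%m/%Y').date()
    today = date.today()
    # Subtract one year if this year's birthday hasn't happened yet
    not_yet = (today.month, today.day) < (birthdate.month, birthdate.day)
    return today.year - birthdate.year - not_yet
```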

The `calculate_age()` function takes a person’s date of birth as a `DD/MM/YYYY` string, parses it, and returns their age in whole years.
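Because the result depends on today's date, it can help to check the age arithmetic against a fixed date. The following is a minimal sketch under that assumption: `age_on()` is a hypothetical variant that takes the reference date as a parameter and compares both month and day, and the reference dates are arbitrary.

```python
from datetime import datetime, date

def age_on(birthdate, reference):
    """Age in whole years on `reference` for a 'DD/MM/YYYY' birthdate string."""
    born = datetime.strptime(birthdate, '%d/%m/%Y').date()
    # The birthday has occurred if (month, day) of the reference is not earlier.
    had_birthday = (reference.month, reference.day) >= (born.month, born.day)
    return reference.year - born.year - (0 if had_birthday else 1)

reference = date(2022, 3, 1)  # arbitrary date chosen for illustration

print(age_on('04/09/1982', reference))         # September birthday not reached yet
print(age_on('08/01/1986', reference))         # January birthday already passed
print(age_on('08/01/1986', date(2022, 1, 7)))  # one day before the birthday
```

Passing the reference date explicitly keeps the function deterministic, which makes it easier to test than a function that reads the system clock.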

The next step is to apply the function on the `Birthdate` column of the DataFrame using the `apply()` method, as follows:

```
data['Age'] = data['Birthdate'].apply(calculate_age)
display(data)
```

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids | FirstName | LastName | Age |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 | Callen | Dunkley | 39 |
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 | Sarah | Rayner | 40 |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 | Jeanette | Sloan | 24 |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 | Kaycee | Acosta | 36 |
| 4 | Henri Conroy | HR | 2014 | M | 10/10/1988 | 90 | 185 | 1 | Henri | Conroy | 33 |
| 5 | Emma Peralta | HR | 2018 | F | 12/11/1992 | 57 | 164 | 0 | Emma | Peralta | 29 |
| 6 | Martin Butt | Data Science | 2020 | M | 10/04/1991 | 115 | 195 | 2 | Martin | Butt | 30 |
| 7 | Alex Jensen | Data Science | 2018 | M | 16/07/1995 | 87 | 180 | 0 | Alex | Jensen | 26 |
| 8 | Kim Howarth | Accounting | 2020 | M | 08/10/1992 | 95 | 174 | 3 | Kim | Howarth | 29 |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 | Jane | Burnett | 42 |

The single-line statement above applies the `calculate_age()` function on each element of the `Birthdate` column and stores the returned values in the `Age` column. Because the function relies on today's date, the ages you get depend on when you run the code.

The last step is to calculate the average age of the employees, as follows:

`print(data['Age'].mean())`

`32.8`

### Scenario 3

The HR manager of the company is exploring options for healthcare coverage for all employees. Potential providers require information about the employees. Since the DataFrame contains the weight and height of each employee, let’s assume the HR manager asked you to provide a Body Mass Index (BMI) for every employee so she can get quotes from potential healthcare providers.

To do the task, first, we need to define a function that calculates the Body Mass Index (BMI). The formula for the BMI is weight in kilograms divided by height in meters squared. Because the employees’ heights are measured in centimeters, we need to divide the heights by 100 to obtain the heights in meters. Let’s implement the function:

```
def calc_bmi(weight, height):
    # weight in kilograms, height in centimeters
    return np.round(weight/(height/100)**2, 2)
```

The next step is to apply the function on the DataFrame:

`data['BMI'] = data.apply(lambda x: calc_bmi(x['Weight'], x['Height']), axis=1)`

The lambda function takes each row's weight and height values, then applies the `calc_bmi()` function on them to calculate the BMI. The `axis=1` argument tells pandas to pass each row of the DataFrame to the function, one at a time.

`display(data)`

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids | FirstName | LastName | Age | BMI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 | Callen | Dunkley | 39 | 25.18 |
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 | Sarah | Rayner | 40 | 31.25 |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 | Jeanette | Sloan | 24 | 23.11 |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 | Kaycee | Acosta | 36 | 27.18 |
| 4 | Henri Conroy | HR | 2014 | M | 10/10/1988 | 90 | 185 | 1 | Henri | Conroy | 33 | 26.30 |
| 5 | Emma Peralta | HR | 2018 | F | 12/11/1992 | 57 | 164 | 0 | Emma | Peralta | 29 | 21.19 |
| 6 | Martin Butt | Data Science | 2020 | M | 10/04/1991 | 115 | 195 | 2 | Martin | Butt | 30 | 30.24 |
| 7 | Alex Jensen | Data Science | 2018 | M | 16/07/1995 | 87 | 180 | 0 | Alex | Jensen | 26 | 26.85 |
| 8 | Kim Howarth | Accounting | 2020 | M | 08/10/1992 | 95 | 174 | 3 | Kim | Howarth | 29 | 31.38 |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 | Jane | Burnett | 42 | 20.94 |
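To make the role of the `axis` argument more concrete, here is a minimal sketch on a three-row DataFrame. The frame `df` reuses a few weight and height values from the data above; the lambdas are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Weight': [78, 80, 66], 'Height': [176, 160, 169]})

# axis=0 (the default): the function receives each column as a Series.
column_max = df.apply(lambda col: col.max())

# axis=1: the function receives each row as a Series indexed by column name.
row_ratio = df.apply(lambda row: row['Weight'] / row['Height'], axis=1)

print(column_max)   # one value per column
print(row_ratio)    # one value per row
```

With `axis=0` the result has one entry per column, and with `axis=1` it has one entry per row, which is why the BMI calculation needs `axis=1`.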

The last step is to categorize the employees according to their BMI. A BMI of less than 18.5 is Group One, from 18.5 up to (but not including) 25 is Group Two, from 25 up to (but not including) 30 is Group Three, and 30 or above is Group Four. To implement the solution, we will define a function that returns the BMI group, then apply it on the `BMI` column of the DataFrame to see which category each employee falls into:

```
def indicator(bmi):
    if bmi < 18.5:
        return 'Group One'
    elif 18.5 <= bmi < 25:
        return 'Group Two'
    elif 25 <= bmi < 30:
        return 'Group Three'
    else:
        return 'Group Four'

data['BMI_Indicator'] = data['BMI'].apply(indicator)
display(data)
```

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids | FirstName | LastName | Age | BMI | BMI_Indicator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 | Callen | Dunkley | 39 | 25.18 | Group Three |
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 | Sarah | Rayner | 40 | 31.25 | Group Four |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 | Jeanette | Sloan | 24 | 23.11 | Group Two |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 | Kaycee | Acosta | 36 | 27.18 | Group Three |
| 4 | Henri Conroy | HR | 2014 | M | 10/10/1988 | 90 | 185 | 1 | Henri | Conroy | 33 | 26.30 | Group Three |
| 5 | Emma Peralta | HR | 2018 | F | 12/11/1992 | 57 | 164 | 0 | Emma | Peralta | 29 | 21.19 | Group Two |
| 6 | Martin Butt | Data Science | 2020 | M | 10/04/1991 | 115 | 195 | 2 | Martin | Butt | 30 | 30.24 | Group Four |
| 7 | Alex Jensen | Data Science | 2018 | M | 16/07/1995 | 87 | 180 | 0 | Alex | Jensen | 26 | 26.85 | Group Three |
| 8 | Kim Howarth | Accounting | 2020 | M | 08/10/1992 | 95 | 174 | 3 | Kim | Howarth | 29 | 31.38 | Group Four |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 | Jane | Burnett | 42 | 20.94 | Group Two |
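For range-based grouping like this, pandas also offers `pd.cut()`, which assigns values to bins without a hand-written `if`/`elif` chain. This is an alternative technique, not the approach used in this tutorial; the sketch below assumes the same four groups and uses a small sample Series that includes the boundary values.

```python
import numpy as np
import pandas as pd

# Sample BMI values, including the boundaries 18.5 and 30.0.
bmi = pd.Series([25.18, 31.25, 23.11, 18.5, 30.0, 17.9])

labels = ['Group One', 'Group Two', 'Group Three', 'Group Four']

# right=False closes each bin on the left: [0, 18.5), [18.5, 25), [25, 30), [30, inf)
groups = pd.cut(bmi, bins=[0, 18.5, 25, 30, np.inf], right=False, labels=labels)

print(groups)
```

The result is a categorical Series, so it keeps the order of the groups; the `apply()` version returns plain strings instead.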

### Scenario 4

Let’s assume the new year is around the corner and the company management has announced that those employees who have at least ten years of experience will get an extra bonus. The HR manager wants to know who is qualified to get the bonus.

To prepare the requested information, you need to apply the following lambda function on the `HireDate` column. It returns `True` if the difference between the current year and the hire year is greater than or equal to ten years; otherwise, it returns `False`.

```
mask = data['HireDate'].apply(lambda x: date.today().year - x >= 10)
print(mask)
```

```
0     True
1    False
2     True
3    False
4    False
5    False
6    False
7    False
8    False
9     True
Name: HireDate, dtype: bool
```

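Running the code above creates a pandas Series that contains `True` or `False` values, called a Boolean mask. Because the lambda function uses the current year, the exact values depend on when you run the code.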

To display the qualified employees, we use the Boolean mask to filter the DataFrame rows. Let’s run the following statement and see the result:

`display(data[mask])`

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids | FirstName | LastName | Age | BMI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Callen Dunkley | Accounting | 2010 | M | 04/09/1982 | 78 | 176 | 2 | Callen | Dunkley | 39 | 25.18 |
| 2 | Jeanette Sloan | Engineering | 2012 | F | 06/05/1997 | 66 | 169 | 0 | Jeanette | Sloan | 24 | 23.11 |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 | Jane | Burnett | 42 | 20.94 |
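A Boolean mask can be used for more than filtering whole rows. The sketch below shows a few common operations on a mask; the small `staff` DataFrame and the fixed `reference_year` are assumptions made so that the result is reproducible.

```python
import pandas as pd

staff = pd.DataFrame({'EmployeeName': ['Callen Dunkley', 'Sarah Rayner', 'Jeanette Sloan'],
                      'HireDate': [2010, 2018, 2012]})

reference_year = 2022  # fixed year (an assumption) instead of date.today().year
mask = staff['HireDate'].apply(lambda year: reference_year - year >= 10)

# True counts as 1, so the sum is the number of matching rows.
print(mask.sum())

# .loc accepts the mask for rows and a label for the column to keep.
print(staff.loc[mask, 'EmployeeName'])

# ~ inverts the mask, selecting everyone who does not qualify yet.
print(staff[~mask])
```

Storing the mask in a variable, as the tutorial does, makes it easy to reuse the same condition for counting, selecting, and inverting.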

### Scenario 5

Let’s assume that tomorrow is Mother’s Day, and the company has planned a Mother’s Day gift for all its female employees who have children. The HR team asked you to prepare a list of the employees who are eligible for the gift. To do the task, we need to write a simple lambda function that considers the `Sex` and `Kids` columns to provide the desired result, as follows:

`data[data.apply(lambda x: x['Sex'] == 'F' and x['Kids'] > 0, axis=1)]`

| | EmployeeName | Department | HireDate | Sex | Birthdate | Weight | Height | Kids | FirstName | LastName | Age | BMI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Sarah Rayner | Engineering | 2018 | F | 14/04/1981 | 80 | 160 | 1 | Sarah | Rayner | 40 | 31.25 |
| 3 | Kaycee Acosta | HR | 2014 | F | 08/01/1986 | 67 | 157 | 1 | Kaycee | Acosta | 36 | 27.18 |
| 9 | Jane Burnett | Data Science | 2012 | F | 11/10/1979 | 57 | 165 | 1 | Jane | Burnett | 42 | 20.94 |

Running the code above returns the list of employees who will receive the gifts.

The lambda function returns `True` if a female employee has at least one child; otherwise, it returns `False`. The result of applying the lambda function on the DataFrame is a Boolean mask that we directly used to filter the DataFrame’s rows.
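The same mask can also be built without `apply()` by comparing whole columns and combining the results with `&`. This is an alternative to the row-wise lambda rather than the method used above; the small `staff` DataFrame below is a hypothetical subset of the data.

```python
import pandas as pd

staff = pd.DataFrame({'EmployeeName': ['Sarah Rayner', 'Jeanette Sloan', 'Henri Conroy'],
                      'Sex': ['F', 'F', 'M'],
                      'Kids': [1, 0, 1]})

# Row-wise version: the lambda is called once per row.
row_mask = staff.apply(lambda row: row['Sex'] == 'F' and row['Kids'] > 0, axis=1)

# Column-wise version: & combines two Boolean Series element by element.
# The parentheses are required because & binds more tightly than == and >.
column_mask = (staff['Sex'] == 'F') & (staff['Kids'] > 0)

print(row_mask.tolist())
print(column_mask.tolist())
print(staff[column_mask])
```

Both masks should select the same rows. The column-wise form is usually shorter, while the row-wise form can be easier to extend when the condition needs ordinary Python logic.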

## Conclusion

In this tutorial, we learned what the `apply()` method does and how to use it by going through different examples. The `apply()` method is a powerful and convenient way to apply a function on every value of a Series or on every row or column of a DataFrame in pandas. It typically performs faster than an explicit Python loop over the rows of a DataFrame. However, that isn't a general rule: when the same operation can be expressed directly on whole columns, the column-wise (vectorized) form is usually faster than `apply()`.
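Performance statements like these are easiest to judge by measuring on your own machine. The sketch below times three ways of computing the same BMI values on randomly generated data; the DataFrame, the function names, and the sizes are assumptions for illustration, and the timings will vary with hardware and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sample data: 2,000 random weights (kg) and heights (cm).
rng = np.random.default_rng(0)
df = pd.DataFrame({'Weight': rng.integers(50, 120, size=2_000),
                   'Height': rng.integers(150, 200, size=2_000)})

def bmi_loop():
    # Explicit Python loop over the rows.
    return pd.Series([r['Weight'] / (r['Height'] / 100) ** 2
                      for _, r in df.iterrows()])

def bmi_apply():
    # Row-wise apply(), as used in Scenario 3.
    return df.apply(lambda r: r['Weight'] / (r['Height'] / 100) ** 2, axis=1)

def bmi_vectorized():
    # Arithmetic on whole columns at once.
    return df['Weight'] / (df['Height'] / 100) ** 2

for fn in (bmi_loop, bmi_apply, bmi_vectorized):
    seconds = timeit.timeit(fn, number=3)
    print(f'{fn.__name__}: {seconds:.4f} s for 3 runs')
```

All three functions should return the same values, so the choice between them comes down to readability and speed for the data at hand.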