The Basics of Python For Loops: A Tutorial
When you're working with data in Python, for loops can be a powerful tool. But they can also be a little bit confusing when you're just starting out. In this tutorial, we're going to dive headfirst into for loops and learn how they can be used to do all sorts of interesting things when you're doing data cleaning or data analysis in Python.
Please check out our Comprehensive Python Guide for a structured learning experience.
What Are For Loops?
In the context of most data science work, Python for loops are used to loop through an iterable object (like a list, tuple, set, etc.) and perform the same action for each entry. For example, a for loop would allow us to iterate through a list, performing the same action on each item in the list.
(An iterable object, by the way, is any Python object we can iterate through, or "loop" through, returning a single element at a time. Lists, for example, are iterable and return a single list entry at a time, in the order entries are listed. Strings are iterable and return one character at a time, in the order the characters appear, and so on.)
You create a for loop by first defining the iterable object you'd like to loop through, and then defining the actions you'd like to perform on each item in that iterable object. For example, when iterating through a list, you first specify the list you'd like to iterate through, and then specify what action you'd like to perform on each list item.
Let's look at a quick example: if we had a list of names stored in Python, we could use a for loop to iterate through that list, printing each name until it reached the end. Below, we'll create our list of names, and then write a for loop that iterates through it, printing each entry on the list in sequence.
our_list = ['Lily', 'Brad', 'Fatima', 'Zining']
for name in our_list:
    print(name)
Lily
Brad
Fatima
Zining
The code in this simple loop raises a question, though: where did the variable called name come from? We haven't defined it previously in our code! But because for loops iterate through lists, tuples, etc. in sequence, this variable can actually be called almost anything. Python will interpret any variable name we put in that spot as referring to each list entry in sequence as the loop executes.
So, in the code above:
- name points to 'Lily' on the first iteration of the loop...
- ...then 'Brad' on the second iteration of the loop...
- ...and so on.
This will be the case regardless of what we call that variable. So if, for example, we rewrite our code to replace name with x, we'll get the same exact result:
for x in our_list:
    print(x)
Lily
Brad
Fatima
Zining
Note that this technique works with any iterable object. For example, strings are iterable, and we can use the same sort of for loop to iterate through each character in a string:
for letter in 'Lily':
    print(letter)
L
i
l
y
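The same pattern carries over to other iterables. As a minimal sketch (the tuple and dictionary below are hypothetical examples, not data used elsewhere in this tutorial), a tuple yields its entries in order, and looping directly over a dictionary yields its keys:

```python
# Hypothetical example data, for illustration only
coordinates = (3, 7)                      # a tuple
name_lengths = {'Lily': 4, 'Brad': 4}     # a dictionary

tuple_items = []
for value in coordinates:                 # tuples yield their entries in order
    tuple_items.append(value)

dict_keys = []
for key in name_lengths:                  # looping over a dict yields its keys
    dict_keys.append(key)

print(tuple_items)   # [3, 7]
print(dict_keys)     # ['Lily', 'Brad']
```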
Using For Loops with Lists of Lists
In actual data analysis work, it's unlikely that we're going to be working with short, simple lists like the one above, though. Generally, we'll have to work with data sets in a table format, with multiple rows and columns. This kind of data can be stored in Python as a list of lists, where each row of a table is stored as a list within the list of lists, and we can use for loops to iterate through these as well.
To learn how to do this, let's take a look at a more realistic scenario and explore this small data table that contains some US prices and US EPA range estimates for several electric cars.
vehicle | range | price |
---|---|---|
Tesla Model 3 LR | 310 | 49900 |
Hyundai Ioniq EV | 124 | 30315 |
Chevy Bolt | 238 | 36620 |
We can express this same data set as a list of lists, like so:
ev_data = [['vehicle', 'range', 'price'],
['Tesla Model 3 LR', '310', '49900'],
['Hyundai Ioniq EV', '124', '30315'],
['Chevy Bolt', '238', '36620']]
You may have noticed that in the list above, our range and price numbers are actually stored as strings rather than integers. It's not uncommon to get data stored in this way, but for analysis, we'd want to convert those strings into integers so we can do some calculations with them. Let's use a for loop to iterate through our list of lists, selecting the range entry in each list and changing it from a string to an integer.
To do that, we need to do a few things. First, we need to skip the first row in our table, since those are the column names and we will get an error if we attempt to convert a non-numerical string like 'range' into an integer. We can do this using list slicing to select each row after the first row using ev_data[1:]. (If you need to brush up on this, or any other aspects of lists, check out our interactive course on Python programming fundamentals.)
Then, we'll loop through the list of lists, and for each iteration we'll select the element in the range column, which is the second column in our table. We'll assign the value found in this column to a variable called ev_range. To do this, we'll use the index number 1 (in Python, the first entry in an iterable is at index 0, the second entry is at index 1, etc.).
Finally, we'll convert the range numbers to integers using Python's built-in int() function, and replace the original strings with these integers in our data set.
for row in ev_data[1:]: # loop through each row in ev_data starting with row 2 (index 1)
    ev_range = row[1] # each car's range is found in column 2 (index 1)
    ev_range = int(ev_range) # convert each range number from a string to an integer
    row[1] = ev_range # assign ev_range, which is now an integer, back to index 1 in each row
print(ev_data)
[['vehicle', 'range', 'price'], ['Tesla Model 3 LR', 310, '49900'], ['Hyundai Ioniq EV', 124, '30315'], ['Chevy Bolt', 238, '36620']]
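To see why the header row has to be skipped, here is a small sketch of what int() does with a non-numerical string. The variable names header_row, data_row, and results are assumptions made for this illustration; the try/except is only there so the sketch can record the error instead of stopping:

```python
# One header row and one data row, copied from the table above
header_row = ['vehicle', 'range', 'price']
data_row = ['Tesla Model 3 LR', '310', '49900']

results = []
for value in [header_row[1], data_row[1]]:
    try:
        results.append(int(value))    # works for numeric strings like '310'
    except ValueError:
        results.append('error')       # int('range') raises a ValueError

print(results)   # ['error', 310]
```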
Now that we've got those values stored as integers, we can also use a for loop to do some calculations. Let's say, for example, that we want to figure out the average range of an EV on this list. We'd need to add the range numbers together, and then divide them by the total number of cars in our list.
Again, we can use a for loop to select the specific column we need within our data set. We'll start by creating a variable called total_range where we can store the sum of the ranges. Then we'll write another for loop, again skipping the header row, and again identifying the second column (index 1) as the range value.
After that, all we need to do is add this value to total_range within our for loop, and then calculate the average by dividing total_range by the number of cars after the loop has completed.
(Note that we'll calculate the number of cars by counting the length of our list, minus the header row, in the code below. With a list as short as ours, we could also simply divide by 3, since the number of cars is very easy to count, but that would break our calculation if additional car data were added to the list. For that reason, it's better to use len() to calculate the length of our car list in code so that if additional entries are added to our data set in the future, we can re-run this code and it will still produce the correct answer.)
total_range = 0 # create a variable to store the total range number
for row in ev_data[1:]: # loop through each row in ev_data starting with row 2 (index 1)
    ev_range = row[1] # each car's range is found in column 2 (index 1)
    total_range += ev_range # add this number to the number stored in total_range
number_of_cars = len(ev_data[1:]) # calculate the length of our list, minus the header row
print(total_range / number_of_cars) # print the average range
224.0
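As a cross-check on the loop above, the same average can be computed with Python's built-in sum() and len() functions. This is only an alternative sketch, not a replacement for the loop; the names rows and average_range are assumptions made for this example:

```python
# Same data as above, with the range column already converted to integers
ev_data = [['vehicle', 'range', 'price'],
           ['Tesla Model 3 LR', 310, '49900'],
           ['Hyundai Ioniq EV', 124, '30315'],
           ['Chevy Bolt', 238, '36620']]

rows = ev_data[1:]                                        # skip the header row
average_range = sum(row[1] for row in rows) / len(rows)   # total range / number of cars
print(average_range)   # 224.0
```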
Python for loops are powerful, and you can nest more complex instructions inside of them. To demonstrate this, let's repeat the above two steps for our 'price' column, this time within a single for loop.
total_price = 0 # create a variable to store the total price number
for row in ev_data[1:]: # loop through each row in ev_data starting with row 2 (index 1)
    price = row[2] # each car's price is found in column 3 (index 2)
    price = int(price) # convert each price number from a string to an integer
    row[2] = price # assign price, which is now an integer, back to index 2 in each row
    total_price += price # add each car's price to total_price
number_of_cars = len(ev_data[1:]) # calculate the length of our list, minus the header row
print(total_price / number_of_cars) # print the average price
38945.0
We can also nest other elements, like if-else statements and even other for loops, within for loops.
For example, imagine we wanted to find every car with a range of greater than 200 miles in our list. We can start by creating a new empty list to hold our long-range car data. Then, we'll use a for loop to iterate through ev_data, the list of lists containing car data we created earlier, appending a car's row to our long-range list only if its range value is above 200:
long_range_car_list = [] # creating a new list to store our long range car data
for row in ev_data[1:]: # iterate through ev_data, skipping the header row
    ev_range = row[1] # assign the range number, which is at index 1 in the row, to the ev_range variable
    if ev_range > 200: # append the whole row to long-range list if range is higher than 200
        long_range_car_list.append(row)
print(long_range_car_list)
[['Tesla Model 3 LR', 310, 49900], ['Chevy Bolt', 238, 36620]]
These operations would also be simple to perform by hand with such a tiny data set, of course. But these same techniques will work on data sets with thousands and thousands of rows, which can make quick work of cleaning, sorting, and analyzing huge data sets.
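Nesting one for loop inside another was mentioned above but not shown. As a minimal sketch, starting again from the original string version of the table, an outer loop can walk through the rows while an inner loop walks through the columns that need converting. The name numeric_columns is an assumption made for this example:

```python
ev_data = [['vehicle', 'range', 'price'],
           ['Tesla Model 3 LR', '310', '49900'],
           ['Hyundai Ioniq EV', '124', '30315'],
           ['Chevy Bolt', '238', '36620']]

numeric_columns = [1, 2]              # indexes of the range and price columns

for row in ev_data[1:]:               # outer loop: each data row, skipping the header
    for index in numeric_columns:     # inner loop: each numeric column in that row
        row[index] = int(row[index])  # replace the string with an integer

print(ev_data[1:])
# [['Tesla Model 3 LR', 310, 49900], ['Hyundai Ioniq EV', 124, 30315], ['Chevy Bolt', 238, 36620]]
```

The inner loop runs in full for every pass of the outer loop, so with three rows and two columns int() is called six times.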
Other Useful Techniques: Range, Break, and Continue
You can get a surprising amount of mileage out of for loops just by mastering the techniques described above, but let's dive even deeper and learn a few other things that may be helpful, even if you use them a bit less frequently in the context of data science work.
Range
For loops can be used in tandem with Python's range() function to iterate through each number in a specified range. For example:
for x in range(5, 9):
    print(x)
5
6
7
8
Note that Python doesn't include the maximum value of a range in the range count, which is why the number 9 doesn't appear above. If we wanted this code to count from 5 to 9 including 9, we'd need to change range(5, 9) to range(5, 10):
for x in range(5, 10):
    print(x)
5
6
7
8
9
If you only specify a single number in your range() function, Python will treat that as the maximum value, and assign a default minimum value of zero:
for x in range(3):
    print(x)
0
1
2
You can even add a third argument to the range() function to specify that you'd like to count in increments of a specific number. As you can see above, the default value is 1, but if you add a third argument of 3, for example, you can use range() with a for loop to count up in threes:
for x in range(0, 9, 3):
    print(x)
0
3
6
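One place range() can come in handy is when we want each item's position as well as the item itself. The sketch below combines range() with len() to loop over the index numbers of our list of names; labelled_names is a hypothetical name chosen for this example. (Python's built-in enumerate() function is a common alternative for this job.)

```python
our_list = ['Lily', 'Brad', 'Fatima', 'Zining']

labelled_names = []
for i in range(len(our_list)):     # i takes the values 0, 1, 2, 3
    labelled_names.append(str(i) + ': ' + our_list[i])

print(labelled_names)   # ['0: Lily', '1: Brad', '2: Fatima', '3: Zining']
```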
Break
By default, a Python for loop will loop through each possible iteration of the iterable object you've assigned it. Normally when we're using a for loop, that's fine, because we want to perform the same action on each item in our list (for example).
Sometimes, though, we may want to stop our loop if a certain condition is met. In that circumstance, the break statement is useful. When used with an if statement inside of a for loop, break allows us to break out of that loop before its conclusion.
Let's take a look at a quick example first, using the list of names we created earlier (called our_list):
for name in our_list:
    break
    print(name)
When we run this code, nothing is printed. That's because the break statement comes before print(name) in our for loop. When Python sees break, it stops executing the for loop, and code that appears after break in the loop doesn't get run.
Let's add an if statement to this loop, so that we break out of the loop when Python gets to the name Zining:
for name in our_list:
    if name == 'Zining':
        break
    print(name)
Lily
Brad
Fatima
Here, we can see that the name Zining wasn't printed. Here's what's happening with each loop iteration:
- Python checks to see if the first name is 'Zining'. It isn't, so it continues executing the code below our if statement, and prints the first name.
- Python checks to see if the second name is 'Zining'. It isn't, so it continues executing the code below our if statement, and prints the second name.
- Python checks to see if the third name is 'Zining'. It isn't, so it continues executing the code below our if statement, and prints the third name.
- Python checks to see if the fourth name is 'Zining'. It is, so break is executed and the for loop ends.
Let's return to the code we wrote for collecting long-range EV car data and work through one more example. We'll insert a break statement that stops the loop as soon as it encounters the string 'Tesla':
long_range_car_list = [] # creating our empty long-range car list again
for row in ev_data[1:]: # iterate through ev_data as before looking for cars with a range > 200
    ev_range = row[1]
    if ev_range > 200:
        long_range_car_list.append(row)
    if 'Tesla' in row[0]: # but if 'Tesla' appears in the vehicle column, end the loop
        break
print(long_range_car_list)
[['Tesla Model 3 LR', 310, 49900]]
In the code above, we can see that the Tesla was still added to long_range_car_list, because we appended it to that list before the if statement where we used break. The Chevy Bolt was not added to our list, because although it does have a range of more than 200 miles, break ended the loop before Python reached the Chevy Bolt row.
(Remember, for loops execute in sequential order. If the Bolt were listed before the Tesla in our original data set, it would have been included in long_range_car_list.)
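A common use for break is to stop searching as soon as we've found what we're looking for. As a small sketch of that idea, the loop below records the first car with a range under 200 miles and then stops; first_short_range_car is a hypothetical name chosen for this example:

```python
# The table from earlier, with range and price already converted to integers
ev_data = [['vehicle', 'range', 'price'],
           ['Tesla Model 3 LR', 310, 49900],
           ['Hyundai Ioniq EV', 124, 30315],
           ['Chevy Bolt', 238, 36620]]

first_short_range_car = None            # stays None if no car matches
for row in ev_data[1:]:                 # skip the header row
    if row[1] < 200:                    # found a car with under 200 miles of range
        first_short_range_car = row[0]  # remember its name...
        break                           # ...and stop looking

print(first_short_range_car)   # Hyundai Ioniq EV
```

Because of the break, Python never looks at the Chevy Bolt row here, which can save a lot of time on a long list.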
Continue
When we're looping through an iterable object like a list, we might also encounter situations where we'd like to skip a particular row or rows. For simple situations like skipping a header row, we can use list slicing, but if we want to skip rows based on more complex conditions, this quickly becomes impractical. Instead, we can use the continue statement to skip a single iteration ("loop") of a for loop and move to the next.
When Python sees continue while executing a for loop on a list, for example, it will stop at that point and move on to the next item on the list. Any code that comes below the continue will not be executed for that item.
Let's go back to our list of names (our_list) and use continue with an if statement to end a loop iteration before printing if the name is 'Brad':
for name in our_list:
    if name == 'Brad':
        continue
    print(name)
Lily
Fatima
Zining
Above, we can see that Brad's name was skipped, and the rest of the names in our list were printed in sequence. That illustrates the difference between break and continue in a nutshell:
- break ends the loop entirely. When Python executes break, the for loop is over.
- continue ends a specific iteration of the loop and moves to the next item in the list. When Python executes continue, it moves immediately to the next loop iteration, but it does not end the loop entirely.
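To see the two statements side by side, here is a short sketch that runs the same loop twice on our list of names, once with break and once with continue. The list names with_break and with_continue are assumptions made for this comparison:

```python
our_list = ['Lily', 'Brad', 'Fatima', 'Zining']

with_break = []
for name in our_list:
    if name == 'Fatima':
        break                  # leave the loop entirely
    with_break.append(name)

with_continue = []
for name in our_list:
    if name == 'Fatima':
        continue               # skip only this iteration
    with_continue.append(name)

print(with_break)      # ['Lily', 'Brad']
print(with_continue)   # ['Lily', 'Brad', 'Zining']
```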
To get some more practice with continue, let's make a list of short-range EVs, using continue to take a slightly different approach. Instead of identifying the EVs with less than 200 miles of range, we'll write a for loop that appends each EV to our short-range list, but place a continue statement before the append that runs whenever the range is greater than 200:
short_range_car_list = [] # creating our empty short-range car list
for row in ev_data[1:]: # iterate through ev_data as before
    ev_range = row[1]
    if ev_range > 200: # if the car has a range of > 200
        continue # end this iteration here; do not execute the code below, continue to the next row
    short_range_car_list.append(row) # append the row to our short-range car list
print(short_range_car_list)
[['Hyundai Ioniq EV', 124, 30315]]
That's probably not the most efficient and readable way to create our short-range car list, but it does demonstrate how continue works, so let's walk through precisely what's happening here.
On its first loop, Python is looking at the Tesla row. That car does have an EV range of more than 200 miles, so Python sees the if statement is true, and executes the continue nested inside that if statement, which makes it immediately jump to the next row of ev_data to begin its next loop.
On the second loop, Python is looking at the next row, which is the Hyundai row. That car has a range of under 200 miles, so Python sees that the conditional if statement is not met, and executes the rest of the code in the for loop, appending the Hyundai row to short_range_car_list.
On the third and final loop, Python is looking at the Chevy row. That car has a range of more than 200 miles, which means the conditional if statement is true. Thus, Python once again executes the nested continue, which ends that iteration. Since there are no more rows of data in our data set, the for loop itself then ends as well.
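For comparison, here is one sketch of the more direct approach: instead of skipping long-range cars with continue, we append a row only when its range is 200 miles or less. It keeps exactly the rows that the continue version keeps:

```python
# The table from earlier, with range and price already converted to integers
ev_data = [['vehicle', 'range', 'price'],
           ['Tesla Model 3 LR', 310, 49900],
           ['Hyundai Ioniq EV', 124, 30315],
           ['Chevy Bolt', 238, 36620]]

short_range_car_list = []
for row in ev_data[1:]:                    # skip the header row
    if row[1] <= 200:                      # keep cars the continue version did not skip
        short_range_car_list.append(row)

print(short_range_car_list)   # [['Hyundai Ioniq EV', 124, 30315]]
```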
Additional Resources
Hopefully at this point, you're feeling comfortable with for loops in Python, and you have an idea of how they can be useful for common data science tasks like data cleaning, data preparation, and data analysis.
Ready to take the next step? Here are some additional resources to check out:
- Advanced Python For Loops Tutorial - Learn to use for loops with NumPy, Pandas, and other more advanced techniques in this "sequel" to this tutorial.
- Python tutorials — Our ever-expanding list of Python tutorials for data science.
- Data Science Courses — Take your studies to the next level with fully interactive programming, data science, and stats courses, right in your browser.
- Python's official documentation on For Loops - The official documentation doesn't go into as much depth as this tutorial, but it does review the basics of For Loops and explain some related concepts like While Loops.
- Dataquest's Python Fundamentals for Data Science course - Our Python fundamentals course offers a from-scratch introduction to coding in Python for data science. It covers lists, loops, and a whole lot more, and you can code interactively right from within your browser.
- Dataquest's Intermediate Python for Data Science course - When you feel like you've mastered For Loops and other core Python concepts, this is another interactive course that'll help you take your Python skills to the next level.
- Free Data Sets for Practice - Practice For Loops on your own by grabbing a free data set from one of these sources and applying your new skills to large, real-world data sets. The data sets in the first section (for data visualization) should work particularly well for practice projects since they should already be relatively clean.
Best of luck, and happy looping!