When you’re working with data in Python, for loops can be a powerful tool. But they can also be a little bit confusing when you’re just starting out. In this tutorial, we’re going to dive headfirst into for loops and learn how they can be used to do all sorts of interesting things when you’re doing data cleaning or data analysis in Python.
This tutorial is for Python beginners, but if you’ve never written a line of code before, you may want to start out by working through the beginning of our free-to-start Python Fundamentals course, as we won’t be covering basic syntax here.
What Are For Loops?
In the context of most data science work, Python for loops are used to loop through an iterable object (like a list, tuple, set, etc.) and perform the same action for each entry. For example, a for loop would allow us to iterate through a list, performing the same action on each item in the list.
(An interable object, by the way, is any Python object we can iterate through, or “loop” through, and return a single element at a time. Lists, for example, are iterable and return a single list entry at a time, in the order entries are listed. Strings are iterable and return one character at a time, in the order the characters appear. Etc.)
You create a for loop by first defining the iterable object you’d like to loop through, and then defining the actions you’d like to perform on each item in that iterable object. For example, when iterating through a list, you first specify the list you’d like to iterate through, and then specify what action you’d like to perform on each list item.
Let’s look at a quick example: if we had a list of names stored in Python, we could use a for loop to iterate through that list, printing each name until it reached the end. Below, we’ll create our list of names, and then write a for loop that iterates through it, printing each entry on the list in sequence.
our_list = ['Lily', 'Brad', 'Fatima', 'Zining'] for name in our_list: print(name)
Lily Brad Fatima Zining
This code in this simple loop raises a question, though: where did the variable
name come from? We haven’t defined it previously in our code! But because for loops iterate through lists, tuples, etc. in sequence, this variable can actually be called almost anything. Python will interpret any variable name we put in that spot as referring to each list entry in sequence as the loop executes.
So, in the code above:
'Lily'on the first iteration of the loop…
'Brad'on the second iteration of the loop…
- …and so on.
This will be the case regardless what we call that variable. So if, for example, we rewrite our code to replace
x, we’ll get the same exact result:
for x in our_list: print(x)
Lily Brad Fatima Zining
Note that this technique works with any iterable object. For example, strings are iterable, and we can use the same sort of for loop to iterate through each character in a string:
for letter in 'Lily': print(letter)
L i l y
Using For Loops with Lists of Lists
In actual data analysis work, it’s unlikely that we’re going to be working with short, simple lists like the one above, though. Generally, we’ll have to work with data sets in a table format, with multiple rows and columns. This kind of data can be stored in Python as a list of lists, where each row of a table is stored as a list within the list of lists, and we can use for loops to iterate through these as well.
To learn how to do this, let’s take a look at a more realistic scenario and explore this small data table that contains some US prices and US EPA range estimates for several electric cars.
|Tesla Model 3 LR||310||49900|
|Hyundai Ioniq EV||124||30315|
We can express this same data set as a list of lists, like so:
ev_data = [['vehicle', 'range', 'price'], ['Tesla Model 3 LR', '310', '49900'], ['Hyundai Ioniq EV', '124', '30315'], ['Chevy Bolt', '238', '36620']]
You may have noticed that in the list above, our range and price numbers are actually stored as strings rather than integers. It’s not uncommon that you’ll get data stored in this way, but for analysis, we’d want to convert those strings into integers so we can do some calculations with them. Let’s use a for loop to interate through our list of lists, selecting the
price entry in each list and changing it from a string to an integer.
To do that, we need to do a few things. First, we need to skip the first row in our table, since those are the column names and we will get an error if we attempt to convert a non-numerical string like
'range' into an integer. We can do this using list slicing to select each row after the first row using
ev_data[1:]. (If you need to brush up on this, or any other aspects of lists, check out our interactive course on Python programming fundamentals).
Then, we’ll loop through the list of lists, and for each iteration we’ll select the element in the
range column, which is the second column in our table. We’ll assign the value found in this column to a variable called
'range'. To do this, we’ll use the index number
1 (in Python, the first entry in an iterable is at index
0, the second entry is at index
Finally, we’ll convert the range numbers to integers using Python’s built-in
'int() function, and replace the original strings with these integers in our data set.
for row in ev_data[1:]: # loop through each row in ev_data starting with row 2 (index 1) ev_range = row # each car's range is found in column 2 (index 1) ev_range = int(ev_range) # convert each range number from a string to an integer row = ev_range # assign range, which is now an integer, back to index 1 in each row print(ev_data)
[['vehicle', 'range', 'price'], ['Tesla Model 3 LR', 310, '49900'], ['Hyundai Ioniq EV', 124, '30315'], ['Chevy Bolt', 238, '36620']]
Now that we’ve got those values stored as integers, we can also use a for loop to do some calculations. Let’s say, for example, that we want to figure out the average range of an EV on this list. We’d need to add the range numbers together, and then divide them by the total number of cars in our list.
Again, we can use a for loop to select the specific column we need within our data set. We’ll start by creating a variable called
total_range where we can store the sum of the ranges. Then we’ll write another for loop, again skipping the header row, and again identifying the second column (index 1) as the range value.
After that, all we need to do is add this value to
total_range within our for loop, and then calculate the value using
total_range divided by the number of cars after the loop has completed.
(Note that we’ll calculate the number of cars by counting the length of our list, minus the header row, in the code below. With a list as short as ours, we could also simply divide by 3, since the number of cars is very easy to count, but that would break our calculation if additional car data was added to the list. For that reason, it’s better to use
len() to calculate the length of our car list in code so that if additional entries are added to our data set in the future, we can re-run this code and it will still produce the correct answer.)
total_range = 0 # create a variable to store the total range number for row in ev_data[1:]: # loop through each row in ev_data starting with row 2 (index 1) ev_range = row # each car's range is found in column 2 (index 1) total_range += ev_range # add this number to the number stored in total_range number_of_cars = len(ev_data[1:]) # calculate the length of our list, minus the header row print(total_range / number_of_cars) # print the average range
Python for loops are powerful, and you can nest more complex instructions inside of them. To demonstrate this, let’s repeat the above two steps for our
'price' column, this time within a single For Loop.
total_price = 0 # create a variable to store the total range number for row in ev_data[1:]: # loop through each row in ev_data starting with row 2 (index 1) price = row # each car's price is found in column 3 (index 2) price = int(price) # convert each price number from a string to an integer row = price # assign price, which is now an integer, back to index 2 in each row total_price += price # add each car's price to total_price number_of_cars = len(ev_data[1:]) # calculate the length of our list, minus the header row print(total_price / number_of_cars) # print the average price
We can also nest other elements, like If Else statements and even other for loops, within for loops.
For example, imagine we wanted to find every car with a range of greater than 200 miles in our list. We can start by creating a new empty list to hold our long-range car data. Then, we’ll use a for loop to iterate through
ev_data, the list of lists containing car data we created earlier, appending a car’s row to our long-range list only if the its range value is above 200:
long_range_car_list =  # creating a new list to store our long range car data for row in ev_data[1:]: # iterate through ev_data, skipping the header row ev_range = row # assign the range number, which is at index 1 in the row, to the range variable if ev_range > 200: # append the whole row to long-range list if range is higher than 200 long_range_car_list.append(row) print(long_range_car_list)
[['Tesla Model 3 LR', 310, 49900], ['Chevy Bolt', 238, 36620]]
These operations would also be simple to perform by hand with such a tiny data set, of course. But these same techniques will work on data sets with thousands and thousands of rows, which can make cleaning, sorting, and analyzing huge datasets into very quick work.
Other Useful Techniques: Range, Break, and Continue
You can get a surprising amount of mileage out of for loops just by mastering the techniques described above, but let’s dive even deeper and learn a few other things that may be helpful, even if you use them a bit less frequently in the context of data science work.
For loops can be used in tandem with Python’s
range() function to iterate through each number in a specified range. For example:
for x in range(5, 9): print(x)
5 6 7 8
Note that Python doesn’t include the maximum value of a range in the range count, which is why the number 9 doesn’t appear above. If we wanted this code to count from 5 to 9 including 9, we’d need to change
range(5, 9) to
for x in range(5, 10): print(x)
5 6 7 8 9
If you only specify a single number in your
range() function, Python will treat that as the maximum value, and assign a default minimum value of zero:
for x in range(3): print(x)
0 1 2
You can even add a third argument to the
range() function to specify that you’d like to count in increments of a specific number. As you can see above, the default value is 1, but if you add a third argument of 3, for example, you can use
range() with a for loop to count up in threes:
for x in range(0, 9, 3): print(x)
0 3 6
By default, a Python for loop will loop through each possible iteration of the interable object you’ve assigned it. Normally when we’re using a for loop, that’s fine, because we want to perform the same action on each item in our list (for example).
Sometimes, though, we may want to stop your loop if a certain condition is met. In that circumstance, the
break statement is useful. When used with an if statement inside of a for loop,
break allows us to break out of that loop before its conclusion.
Let’s take a look at a quick example first, using the list of names we created earlier called
for name in our_list: break print(name)
When we run this code, nothing is printed. That’s because the
break statement comes before
print(name) in our for loop. When Python sees
break, it stops executing the for loop and code that appears after
break in the loop doesn’t get run.
Let’s add an if statement to this loop, so that we break out of the loop when Python gets to the name Zining:
for name in our_list: if name == 'Zining': break print(name)
Lily Brad Fatima
Here, we can see that the name Zining wasn’t printed. Here’s what’s happening with each loop iteration:
- Python checks to see if the first name is ‘Zining’. It isn’t, so it continues executing the code below our if statement, and prints the first name.
- Python checks to see if the second name is ‘Zining’. It isn’t, so it continues executing the code below our if statement, and prints the second name.
- Python checks to see if the third name is ‘Zining’. It isn’t, so it continues executing the code below our if statement, and prints the third name.
- Python checks to see if the fourth name is ‘Zining’. It is, so
breakis executed and the for loop ends.
Let’s return to the code we wrote for collecting long-range EV car data and work through one more example. We’ll insert a break statement that stops the look as soon as it encounters the string
long_range_car_list =  # creating our empty long-range car list again for row in ev_data[1:]: # iterate through ev_data as before looking for cars with a range > 200 ev_range = row if ev_range > 200: long_range_car_list.append(row) if 'Tesla' in row: # but if 'Tesla' appears in the vehicle column, end the loop break print(long_range_car_list)
[['Tesla Model 3 LR', 310, 49900]]
In the code above, we can see that the Tesla was still added to
long_range_car_list, because we appended it to that list before the if statement where we used
break. The Chevy Bolt was not added to our list, because although it does have a range of more than 200 miles,
break ended the loop before Python reached the Chevy Bolt row.
(Remember, for loops execute in sequential order. If the Bolt was listed before the Tesla in our original data set, it would have been included in
When we’re looping through an iterable object like a list, we might also encounter situations where we’d like to skip a particular row or rows. For simple situations like skipping a header row, we can use list slicing, but if we want to skip rows based on more complex conditions, this quickly becomes impractical. Instead, we can use the
continue statement to skip a single iteration (“loop”) of a for loop and move to the next.
When Python sees
continue while executing a for loop on a list, for example, it will stop at that point and move on to the next item on the list. Any code that comes below the
continue will not be executed.
Let’s go back our list of names (
our_names) and use
continue with an if statement to end a loop iteration before printing if the name is ‘Brad’:
for name in our_list: if name == 'Brad': continue print(name)
Lily Fatima Zining
Above, we can see that Brad’s name was skipped, and the rest of the names in our list were printed in sequence. That illustrates the difference between
continue in a nutshell:
breakends the loop entirely. When Python executes
break, the for loop is over.
continueends a specific iteration of the loop and moves to the next item in the list. When Python executes
continueit moves immediately to the next loop iteration, but it does not end the loop entirely.
To get some more practice with
continue, let’s make a list of short-range EVs, using
continue to take a slightly different approach. Instead of identifying the EVs with less than 200 miles of range, we’ll write a for loop that adds every EV to our short-range list, but with a
continue statement before we append to the new list that runs if the range is greater than 200:
short_range_car_list =  # creating our empty short-range car list for row in ev_data[1:]: # iterate through ev_data as before ev_range = row if ev_range > 200: # if the car has a range of > 200 continue # end the loop here; do not execute the code below, continue to the next row short_range_car_list.append(row) # append the row to our short-range car list print(short_range_car_list)
[['Hyundai Ioniq EV', 124, 30315]]
That’s probably not the most efficient and readable way to create our short-range car list, but it does demonstrate how
continue works, so let’s walk through precisely what’s happening here.
On its first loop, Python is looking at the Tesla row. That car does have an EV range of more than 200 miles, so Python sees the if statement is true, and executes the
continue nested inside that if statement, which makes it immediately jump to the next row of
ev_data to begin its next loop.
On the second loop, Python is looking at the next row, which is the Hyundai row. That car has a range of under 200 miles, so Python sees that the conditional if statement is not met, and executes the rest of the code in the for loop, appending the Hyundai row to
On the third and final loop, Python is looking at the Chevy row. That car has a range of more than 200 miles, which means the conditional if statement is true. Thus, Python once again executes the nested
continue, which concludes the loop and, since there are no more rows of data in our data set, ends the for loop entirely.
Hopefully at this point, you’re feeling comfortable with for loops in Python, and you have an idea of how they can be useful for common data science tasks like data cleaning, data preparation, and data analysis.
Ready to take the next step? Here are some additional resources to check out:
- Advanced Python For Loops Tutorial – Learn to use for loops with NumPy, Pandas, and other more advanced techniques in this “sequel” to this tutorial.
- Python tutorials — Our ever-expanding list of Python tutorials for data science.
- Data Science Courses — Take your studies to the next level with fully interactive programming, data science, and stats courses, right in your browser.
- Python’s official documentation on For Loops – The official documentation doesn’t go into as much depth as this tutorial, but it does review the basics of For Loops explain some related concepts like While Loops.
- Dataquest’s Python Fundamentals for Data Science course – Our Python fundamentals course offers a from-scratch introduction to coding in Python for data science. It covers lists, loops, and a whole lot more, and you can code iteractively right from within your browser.
- Dataquest’s Intermediate Python for Data Science course – When you feel like you’ve mastered For Loops and other core Python concepts, this is another interactive course that’ll help you take your Python skills to the next level.
- Free Data Sets for Practice – Practice For Loops on your own by grabbing a free data set from one of these sources and applying your new skills to large, real-world data sets. The data sets in the first section (for data visualization) should work particularly well for practice projects since they should already be relatively clean.
Best of luck, and happy looping!