Tutorial: Why Functions Modify Lists and Dictionaries in Python

Python’s functions (both the built-in ones and custom functions we write ourselves) are crucial tools for working with data. But what they do with our data can be a little confusing, and if we’re not aware of what’s going on, it could cause serious errors in our analysis.

In this tutorial, we’re going to take a close look at how Python treats different data types when they’re being manipulated inside of functions, and learn how to ensure that our data is being changed only when we want it to be changed.

Memory Isolation in Functions

To understand how Python handles global variables inside functions, let’s do a little experiment. We’ll create two global variables, number_1 and number_2, and assign them the integers 5 and 10. Then, we’ll pass those global variables as arguments to a function that performs some simple math. We’ll also reuse the variable names as the function’s parameter names. Finally, we’ll check whether the work done inside our function has affected the global values of these variables.

number_1 = 5
number_2 = 10

def multiply_and_add(number_1, number_2):
    number_1 = number_1 * 10
    number_2 = number_2 * 10
    return number_1 + number_2

a_sum = multiply_and_add(number_1, number_2)
print(a_sum)
print(number_1)
print(number_2)
150
5
10

As we can see above, the function worked correctly, and the values of the global variables number_1 and number_2 did not change, even though we passed them as arguments and reused their names as parameter names in our function. This is because the parameter names inside a function are local variables: they exist only while the function runs, and Python keeps them separate from global variables of the same name. They are isolated. Thus, the name number_1 can refer to one value (5) globally and to a different value (50) inside the function.
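
The isolation described above can be checked directly. The short sketch below is only an illustration: the function name times_ten, the parameter name value, and the flag local_name_visible are made up for this example. It reassigns a parameter inside a function, then confirms that the global is untouched and that the parameter name does not exist outside the function.

```python
def times_ten(value):
    # This assignment rebinds the local name `value` only
    value = value * 10
    return value

number = 5
result = times_ten(number)
print(result)  # 50
print(number)  # 5

# The parameter name lives only inside the function
try:
    value
    local_name_visible = True
except NameError:
    local_name_visible = False

print(local_name_visible)  # False
```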

(Incidentally, if you’re confused about the difference between parameters and arguments, Python’s documentation on the subject is quite helpful.)

What About Lists and Dictionaries?

Lists

We’ve seen that what we do to a variable like number_1 above inside a function doesn’t affect its global value. But number_1 is an integer, which is a pretty basic data type. What happens if we try the same experiment with a different data type, like a list? Below, we’ll create a function called duplicate_last() that will duplicate the final entry in any list we pass it as an argument.

initial_list = [1, 2, 3]

def duplicate_last(a_list):
    last_element = a_list[-1]
    a_list.append(last_element)
    return a_list

new_list = duplicate_last(a_list = initial_list)
print(new_list)
print(initial_list)
[1, 2, 3, 3]
[1, 2, 3, 3]

As we can see, here the global value of initial_list was updated, even though its value was only changed inside the function!
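
A quick way to confirm what happened is to ask Python whether new_list and initial_list are the same object, using the is operator. The sketch below repeats the function from above in a slightly shortened form; the variable same_object is introduced only for this illustration.

```python
initial_list = [1, 2, 3]

def duplicate_last(a_list):
    # Appends to whichever list object was passed in
    a_list.append(a_list[-1])
    return a_list

new_list = duplicate_last(initial_list)

# `is` checks identity: do both names refer to one object?
same_object = new_list is initial_list
print(same_object)   # True
print(initial_list)  # [1, 2, 3, 3]
```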

Dictionaries

Now, let’s write a function that takes a dictionary as an argument to see if a global dictionary variable will be modified when it’s manipulated inside a function as well.

To make this a bit more realistic, we’ll be using data from the AppleStore.csv data set that’s used in our Python Fundamentals course (the data is available for download here).

In the snippet below, we’re starting with a dictionary that contains counts for the number of apps with each age rating in the dataset (so there are 4,433 apps rated “4+”, 987 apps rated “9+”, etc.). Let’s imagine we want to calculate a percentage for each age rating, so we can get a picture of which age ratings are the most common among apps in the App Store.

To do this, we’ll write a function called make_percentages() that will take a dictionary as an argument and convert the counts to percentages. We’ll need to start a count at zero and then iterate over each value in the dictionary, adding them to the count so we get the total number of ratings. Then we’ll need to iterate over the dictionary again and do some math on each value to calculate the percentage.

content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

def make_percentages(a_dictionary):
    total = 0
    for key in a_dictionary:
        count = a_dictionary[key]
        total += count

    for key in a_dictionary:
        a_dictionary[key] = (a_dictionary[key] / total) * 100

    return a_dictionary

Before we look at the output, let’s quickly review what’s happening above. After assigning our dictionary of app age ratings to the variable content_ratings, we create a new function called make_percentages() that takes a single argument: a_dictionary.

To figure out what percentage of apps fall into each age rating, we’ll need to know the total number of apps, so we first set a new variable called total to 0 and then loop through each key in a_dictionary, adding that key’s count to total.

Once that’s finished, all we need to do is loop through a_dictionary again, dividing each entry by the total and then multiplying the result by 100. This will give us a dictionary with percentages.

But what happens when we use our global content_ratings as the argument for this new function?

c_ratings_percentages = make_percentages(content_ratings)
print(c_ratings_percentages)
print(content_ratings)
{'4+': 61.595109073224954, '9+': 13.714047519799916, '12+': 16.04835348061692, '17+': 8.642489926358204}
{'4+': 61.595109073224954, '9+': 13.714047519799916, '12+': 16.04835348061692, '17+': 8.642489926358204}

Just as we saw with lists, our global content_ratings variable has been changed, even though it was only modified inside of the make_percentages() function we created.

So what’s actually happening here? We’ve bumped up against the difference between mutable and immutable data types.

Mutable and Immutable Data Types

In Python, data types can be either mutable (changeable) or immutable (unchangeable). And while most of the data types we’ve worked with in introductory Python are immutable (including integers, floats, strings, Booleans, and tuples), lists and dictionaries are mutable. That means a global list or dictionary can be changed when it’s modified inside of a function, just like we saw in the examples above.
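
As a small illustration of the distinction, the sketch below tries to change the first item of a tuple (immutable) and of a list (mutable). The variable names here are chosen just for this example. Python refuses the change to the tuple by raising a TypeError, while the list is updated in place.

```python
a_tuple = (1, 2, 3)
try:
    a_tuple[0] = 99       # tuples don't support item assignment
    tuple_changed = True
except TypeError:
    tuple_changed = False

a_list = [1, 2, 3]
a_list[0] = 99            # lists can be updated in place

print(tuple_changed)  # False
print(a_tuple)        # (1, 2, 3)
print(a_list)         # [99, 2, 3]
```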

To understand the difference between mutable (changeable) and immutable (unchangeable), it’s helpful to look at how Python actually treats these variables.

Let’s start by considering a simple variable assignment:

a = 5

The variable name a acts like a pointer toward 5, and it helps us retrieve 5 whenever we want.

5 is an integer, and integers are immutable data types. If a data type is immutable, it means it can’t be updated once it’s been created. If we do a += 1, we’re not actually updating 5 to 6. Instead:

  • a initially points toward 5.
  • a += 1 is run, and this moves the pointer from 5 to 6; it doesn’t actually change the number 5.
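
The two steps above can be sketched in code. The second name, b, is added here only so that we can still see the original 5 after a has moved on.

```python
a = 5
b = a     # b points toward the same 5
a += 1    # a now points toward 6; the 5 itself is untouched

print(a)  # 6
print(b)  # 5
```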

Mutable data types like lists and dictionaries behave differently. They can be updated. So, for example, let’s make a very simple list:

list_1 = [1, 2]

If we append a 3 to the end of this list, we’re not pointing list_1 toward a different list; we’re directly updating the existing list.

Even if we create multiple list variables, as long as they point to the same list, they’ll all be updated when that list is changed, as we can see in the code below:

list_1 = [1, 2]
list_2 = list_1
list_1.append(3)
print(list_1)
print(list_2)
[1, 2, 3]
[1, 2, 3]

Both names refer to one and the same list, so a change made through list_1 is also visible through list_2.

This explains why our global variables were changed when we were experimenting with lists and dictionaries earlier. Because lists and dictionaries are mutable, changing them (even inside a function) changes the list or dictionary itself, which isn’t the case for immutable data types.
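
It may help to compare two functions side by side: one that points its parameter at a new list, and one that changes the list it was given. This is a minimal sketch, and the function names rebind and mutate are invented for this example. Only the second function affects the list that was passed in.

```python
def rebind(a_list):
    # Builds a brand-new list and points the local name at it
    a_list = a_list + [99]
    return a_list

def mutate(a_list):
    # Changes the list object that was passed in
    a_list.append(99)
    return a_list

first = [1, 2]
second = [1, 2]

rebound = rebind(first)
mutated = mutate(second)

print(rebound)  # [1, 2, 99]
print(first)    # [1, 2]
print(mutated)  # [1, 2, 99]
print(second)   # [1, 2, 99]
```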

Keeping Mutable Data Types Unchanged

Generally speaking, we don’t want our functions to be changing global variables, even when they contain mutable data types like lists or dictionaries. That’s because in more complex analyses and programs, we might be using many different functions frequently. If all of them are changing the lists and dictionaries they’re working on, it can become quite difficult to keep track of what’s changing what.

Thankfully there’s an easy way to get around this: we can make a copy of the list or dictionary using a built-in Python method called .copy().

If you haven’t learned about methods yet, don’t worry. They’re covered in our intermediate Python course, but for this tutorial, all you need to know is that .copy() is called the same way as .append(), by adding it to the end of a list’s name:

list.append() # adds something to a list
list.copy() # makes a copy of a list

Let’s take another look at that function we wrote for lists, and update it so that what happens inside our function doesn’t change initial_list. All we need to do is change the argument we pass to our function from initial_list to initial_list.copy().

initial_list = [1, 2, 3]

def duplicate_last(a_list):
    last_element = a_list[-1]
    a_list.append(last_element)
    return a_list

new_list = duplicate_last(a_list = initial_list.copy()) # making a copy of the list
print(new_list)
print(initial_list)
[1, 2, 3, 3]
[1, 2, 3]

As we can see, this has fixed our problem. Here’s why: using .copy() creates a separate copy of the list, so that instead of pointing to initial_list itself, a_list points to a new list that starts as a copy of initial_list. Any changes that are made to a_list after that point are made to that separate list, not to initial_list itself, so the global value of initial_list is unchanged.
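
We can verify that the copy really is a separate list. In the sketch below (the name copied_list is used only for this illustration), the copy starts out equal to the original but is not the same object, so appending to it leaves the original alone.

```python
initial_list = [1, 2, 3]
copied_list = initial_list.copy()

print(copied_list == initial_list)  # True: same contents
print(copied_list is initial_list)  # False: different objects

copied_list.append(4)
print(copied_list)   # [1, 2, 3, 4]
print(initial_list)  # [1, 2, 3]
```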

This solution still isn’t perfect, though, because we’ll have to remember to add .copy() every time we pass an argument to our function or risk accidentally changing the global value of initial_list. If we don’t want to have to worry about that, we can actually create that list copy inside the function itself:

initial_list = [1, 2, 3]

def duplicate_last(a_list):
    copy_list = a_list.copy() # making a copy of the list
    last_element = copy_list[-1]
    copy_list.append(last_element)
    return copy_list

new_list = duplicate_last(a_list = initial_list)
print(new_list)
print(initial_list)
[1, 2, 3, 3]
[1, 2, 3]

With this approach, we can safely pass a mutable global variable like initial_list to our function, and the global value won’t be changed because the function itself makes a copy and then performs its operations on that copy.

The .copy() method works for dictionaries, too. As with lists, we can simply add .copy() to the argument that we pass our function to create a copy that’ll be used for the function without changing the original variable:

content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

def make_percentages(a_dictionary):
    total = 0
    for key in a_dictionary:
        count = a_dictionary[key]
        total += count

    for key in a_dictionary:
        a_dictionary[key] = (a_dictionary[key] / total) * 100

    return a_dictionary


c_ratings_percentages = make_percentages(content_ratings.copy()) # making a copy of the dictionary
print(c_ratings_percentages)
print(content_ratings)
{'4+': 61.595109073224954, '9+': 13.714047519799916, '12+': 16.04835348061692, '17+': 8.642489926358204}
{'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

But again, using that method means we need to remember to add .copy() every time we pass a dictionary into our make_percentages() function. If we’re going to be using this function frequently, it might be better to implement the copying into the function itself so that we don’t have to remember to do this.

Below, we’ll use .copy() inside the function itself. This will ensure that we can use it without changing the global variables we pass to it as arguments, and we don’t need to remember to add .copy() to each argument we pass.

content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

def make_percentages(a_dictionary):
    copy_dict = a_dictionary.copy() # create a copy of the dictionary
    total = 0
    for key in a_dictionary:
        count = a_dictionary[key]
        total += count

    for key in copy_dict: # use the copied dictionary so the original isn't changed
        copy_dict[key] = (copy_dict[key] / total) * 100

    return copy_dict

c_ratings_percentages = make_percentages(content_ratings)
print(c_ratings_percentages)
print(content_ratings)
{'4+': 61.595109073224954, '9+': 13.714047519799916, '12+': 16.04835348061692, '17+': 8.642489926358204}
{'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

As we can see, modifying our function to create a copy of our dictionary and then change the counts to percentages only in that copy has allowed us to perform the operation we wanted without actually changing content_ratings.
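
Copying and then overwriting is not the only way to leave the original alone. As an alternative sketch (not the approach used above), the function below builds a brand-new dictionary and never writes to the one it receives. The name make_percentages_new is made up for this example, and it relies on the built-in sum() and a dictionary comprehension.

```python
content_ratings = {'4+': 4433, '9+': 987, '12+': 1155, '17+': 622}

def make_percentages_new(a_dictionary):
    # Add up all of the counts in one step
    total = sum(a_dictionary.values())
    # Build a new dictionary; the argument is only read, never changed
    return {key: (count / total) * 100 for key, count in a_dictionary.items()}

c_ratings_percentages = make_percentages_new(content_ratings)
print(c_ratings_percentages)
print(content_ratings)
```

Because the function only reads from a_dictionary, there is no copy to remember at all.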

Conclusions

In this tutorial, we looked at the difference between mutable data types, which can change, and immutable data types, which cannot. We learned how we can use the method .copy() to make copies of mutable data types like lists and dictionaries so that we can work with them in functions without changing their global values.
