August 3, 2022

Python Dictionary Comprehension Tutorial (with 39 Code Examples)

You may have already mastered dictionaries in Python and learned a lot about list comprehension. Now it is time to combine these skills and learn something new: dictionary comprehension in Python.

What Is Dictionary Comprehension in Python?

Before that, though, let's remind ourselves what comprehension is in Python. It simply means applying a specific operation to each element of an iterable (such as a list, dictionary, or tuple) and collecting the results. We can, of course, make this logic more complex by introducing, for instance, conditional statements.
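As a rough sketch, a dictionary comprehension has the general shape {key_expression: value_expression for item in iterable if condition}, where the if part is optional. The toy example below uses made-up numbers and builds comparable results with a list comprehension and a dictionary comprehension so the two can be compared side by side.

```python
numbers = [1, 2, 3, 4, 5]

# List comprehension: apply an operation (squaring) to each element
squares_list = [n ** 2 for n in numbers]

# Dictionary comprehension: the same operation, but each result is stored
# under a key (here, the original number)
squares_dict = {n: n ** 2 for n in numbers}

# Adding a condition keeps only the elements for which it is True (even numbers)
even_squares = {n: n ** 2 for n in numbers if n % 2 == 0}

print(squares_list)  # [1, 4, 9, 16, 25]
print(squares_dict)  # {1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
print(even_squares)  # {2: 4, 4: 16}
```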

Let's start with a simple example right away! We will create a dictionary containing the ages of some people.

d = {"Alex": 29, "Katherine": 24, "Jonathan": 22}
print(d)
    {'Alex': 29, 'Katherine': 24, 'Jonathan': 22}

What if we want to add one year to each age and save the result in a new dictionary? We can do it with a for loop.

# Dictionary to store new ages
new_d = {}

# Add one year to each age in the dictionary
for name, age in d.items():  # .items() returns a view of (key, value) pairs (name and age in this case)
    new_d[name] = age + 1
print(new_d)
    {'Alex': 30, 'Katherine': 25, 'Jonathan': 23}

However, the for loop can be rewritten in just one line using dictionary comprehension.

# Add one year to each age in the dictionary using comprehension
new_d = {name: age + 1 for name, age in d.items()}

print(new_d)
    {'Alex': 30, 'Katherine': 25, 'Jonathan': 23}

Does it look better? If you are used to list comprehension, you may also observe that the code is pretty similar. Let's break it into pieces.

We have names as keys and ages as values. Our new dictionary should have the same structure as the old one, {name: age}. Here, the name does not change but the age does. We create a new dictionary, new_d, by opening curly braces and writing name, followed by a colon (:), and age. Now stop and think about where name and age come from and how we should modify age. They come from the original dictionary, so we can loop through it using the items() method and the for keyword, where the first variable is the key and the second is the value. Thus, we write this for loop inside the braces. Now the only thing left is to modify the age value of the new dictionary by adding 1.

This is my usual thought process when I use dictionary comprehension. This example is very simple, and with enough experience you can do it almost automatically, but once the code becomes slightly more complex, it is a good idea to think carefully about what your end result should look like and how you can achieve it.

Furthermore, it is an excellent idea to write a for loop first, make sure that it behaves the way you designed it, and only then rewrite it using dictionary comprehension.
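A minimal sketch of this workflow, reusing the ages dictionary from the first example: write the loop, rewrite it as a comprehension, and let an assert statement confirm that both versions agree.

```python
ages = {"Alex": 29, "Katherine": 24, "Jonathan": 22}

# Step 1: write the for loop and check that it does what we expect
loop_result = {}
for name, age in ages.items():
    loop_result[name] = age + 1

# Step 2: rewrite the same logic as a dictionary comprehension
comprehension_result = {name: age + 1 for name, age in ages.items()}

# Step 3: confirm that both versions produce the same dictionary
assert loop_result == comprehension_result
print(comprehension_result)  # {'Alex': 30, 'Katherine': 25, 'Jonathan': 23}
```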

More Realistic Examples

OK, enough with simple, unrealistic examples. In real life, we may need to create random dictionaries for testing purposes, or because a statistical technique requires some randomness. In this tutorial, however, we will analyze a real-world dataset from Kaggle and apply dictionary comprehension to it. Note that the original dataset was modified by removing the tbd string from the user_review column.
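For the first case, a dictionary comprehension is a compact way to generate test data. The sketch below is illustrative only: the player_ keys and the 0 to 100 score range are arbitrary assumptions and have nothing to do with the dataset used later.

```python
import random

# Fix the seed so the "random" dictionary is reproducible between runs
random.seed(42)

# Hypothetical test data: ten made-up player IDs mapped to random integer scores from 0 to 100
random_scores = {f"player_{i}": random.randint(0, 100) for i in range(10)}

print(random_scores)
```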

We will be using a dataset that contains the top video games on Metacritic from 1995 to 2021.

from csv import reader

# Open the dataset and read all rows into a list of lists;
# the with statement closes the file automatically
with open("all_games.csv", newline="") as file:
    all_games = list(reader(file))

This dataset contains the following information:

  1. Name
  2. Platform
  3. Release Date
  4. Summary of the story
  5. Meta Score
  6. User Score

Let's say we are interested in creating a dictionary with the name as the key and the platform as the value.

# Dictionary to store video game platforms
platform_dict = {}

# Create dictionary of platforms
for game in all_games[1:]:  # Do not include the header
    name = game[0]
    platform = game[1]
    platform_dict[name] = platform

# Print five items of the dictionary (the items view must be converted into a list before it can be sliced)
print(list(platform_dict.items())[:5])
    [('The Legend of Zelda: Ocarina of Time', ' Nintendo 64'), ("Tony Hawk's Pro Skater 2", ' Nintendo 64'), ('Grand Theft Auto IV', ' PC'), ('SoulCalibur', ' Xbox 360'), ('Super Mario Galaxy', ' Wii')]

Notice that each platform has a space before its name, so let's remove it. It is doable with either a for loop or dictionary comprehension.

# Using a for loop
for name, platform in platform_dict.items():
    platform_dict[name] = platform.strip()

Above is one way of removing spaces from the values. We used the str.strip method, which removes the leading and trailing characters passed as an argument, or whitespace if no argument is given. Now try to rewrite the code using dictionary comprehension before reading the answer.

platform_dict = {name: platform.strip() for name, platform in platform_dict.items()}

That is a much fancier way to write the same code!
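To see what strip() does and does not touch, here is a small sketch using two platform values from the output above. Unlike replace(" ", ""), which is used later in this tutorial and removes every space (turning "Nintendo 64" into "Nintendo64"), strip() only removes characters at the two ends of the string.

```python
# Leading and trailing whitespace is removed; inner spaces are kept
print(repr(" Nintendo 64".strip()))  # 'Nintendo 64'

# With an argument, strip() removes any combination of the given characters from both ends
print("--Wii--".strip("-"))  # Wii

# The same cleanup applied to every value with a dictionary comprehension
raw = {"Super Mario Galaxy": " Wii", "SoulCalibur": " Xbox 360"}
clean = {title: plat.strip() for title, plat in raw.items()}
print(clean)  # {'Super Mario Galaxy': 'Wii', 'SoulCalibur': 'Xbox 360'}
```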

Sometimes we may need to extract each column of a dataset, convert the columns into lists, and then use each element of one list as a dictionary key and the corresponding element of another list as a dictionary value. This provides a straightforward way to access, filter, and manipulate the dictionary elements. Is it possible? Yes, it is, but let's first transform each column into a separate list.

# Initialize empty lists to store column values
name = []
platform = []
date = []
summary = []
meta_score = []
user_score = []

# Iterate over columns and append values to lists
for game in all_games[1:]:
    name.append(game[0])
    platform.append(game[1].replace(" ", ""))
    date.append(game[2])
    summary.append(game[3])
    meta_score.append(float(game[4]))
    user_score.append(float(game[5]))

Now that we have all the different lists, we can create a dictionary out of two of them. We already have a dictionary of platforms, but now let's extract the release year (not the date!). We will be using the zip() function that allows iterating over several iterables at the same time.

# Dictionary to store dates
year_dict = {}

# Populate dictionary with game's names and release years
for key, value in zip(name, date):
    year_dict[key] = value[-4:]

# Print release year of Grand Theft Auto IV
print(f"Grand Theft Auto IV was released in {year_dict['Grand Theft Auto IV']}.")
    Grand Theft Auto IV was released in 2008.

Now try doing the same but with dictionary comprehension before looking at the answer.

# Dictionary with game's names and release years using dictionary comprehension
year_dict = {key: value[-4:] for key, value in zip(name, date)}

# Print release year of Red Dead Redemption 2
print(f"Red Dead Redemption 2 was released in {year_dict['Red Dead Redemption 2']}.")
    Red Dead Redemption 2 was released in 2019.

As we can see, dictionary comprehension in Python enables shorter and less complex code. Note that we are using string slicing (i.e., [-4:]) to extract just the year from the date.
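Both building blocks can be checked on a tiny sample, here two titles and dates taken from the outputs in this tutorial. The sketch also shows that when no transformation is needed, dict() can consume the zipped pairs directly; a comprehension earns its place once the value has to be changed, for example sliced.

```python
# Small sample in the same shape as the name and date lists
titles = ["Super Mario Galaxy", "SoulCalibur"]
release_dates = ["November 12, 2007", "July 2, 2008"]

# zip() pairs the i-th title with the i-th date
print(list(zip(titles, release_dates)))
# [('Super Mario Galaxy', 'November 12, 2007'), ('SoulCalibur', 'July 2, 2008')]

# Slicing with [-4:] keeps the last four characters of a string (the year here)
print(release_dates[0][-4:])  # 2007

# No transformation needed: dict() builds the dictionary from the pairs directly
title_to_date = dict(zip(titles, release_dates))

# Transformation needed: a comprehension slices each value on the way in
title_to_year = {title: release_date[-4:] for title, release_date in zip(titles, release_dates)}
print(title_to_year)  # {'Super Mario Galaxy': '2007', 'SoulCalibur': '2008'}
```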

Conditional Statements in Dictionary Comprehension

Now you should be able to comprehend the basic logic of dictionary comprehension in Python, so let's make things a little more complicated.

We can use conditional statements to filter some information before creating a dictionary! Let's recreate the dictionary, keeping only the video games released after 2014.

# Video games released after 2014
after_2014 = {key: value[-4:] for key, value in zip(name, date) if int(value[-4:]) > 2014}

# Print the first five items of the dictionary
print(list(after_2014.items())[:5])
    [('Red Dead Redemption 2', '2019'), ('Disco Elysium: The Final Cut', '2021'), ('The Legend of Zelda: Breath of the Wild', '2017'), ('Super Mario Odyssey', '2017'), ('The House in Fata Morgana - Dreams of the Revenants Edition -', '2021')]

In the above code snippet, we reused the same code but added an if statement to apply a filter. We wanted to filter by the value, so we extracted the year, converted the string into an integer, and then compared it to the integer 2014. If the year was 2014 or earlier, that item was not included in the dictionary.
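The if at the end of a comprehension acts as a filter: items that fail the test are dropped. A related but different pattern is a conditional expression (a if condition else b) placed in the value, which keeps every item and only changes its value. A small sketch with hypothetical titles and years:

```python
release_years = {"Game A": 2012, "Game B": 2016, "Game C": 2019}

# Filter: only games released after 2014 are kept
after_2014 = {title: year for title, year in release_years.items() if year > 2014}
print(after_2014)  # {'Game B': 2016, 'Game C': 2019}

# Conditional expression: every game is kept, but its value depends on the test
era = {
    title: "recent" if year > 2014 else "older"
    for title, year in release_years.items()
}
print(era)  # {'Game A': 'older', 'Game B': 'recent', 'Game C': 'recent'}
```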

We can, of course, make the logic more complicated by introducing a logical operator (and, or, not). For example, if we want all the games released between 2014 and 2018 (both inclusive), the code is the following.

# Video games released between 2014 and 2018 (both inclusive)
y_2014_2018 = {
    key: value[-4:]
    for key, value in zip(name, date)
    if int(value[-4:]) >= 2014 and int(value[-4:]) <= 2018
}

# Print the first five items of the dictionary
print(list(y_2014_2018.items())[:5])
    [('Red Dead Redemption 2', '2018'), ('Grand Theft Auto V', '2015'), ('The Legend of Zelda: Breath of the Wild', '2017'), ('Super Mario Odyssey', '2017'), ('The Last of Us Remastered', '2014')]

At this point, the code becomes rather hard to read, so if we need additional logical operators, it is better to resort to a standard for loop.
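As a sketch of that fallback, here is the 2014 to 2018 filter written as a for loop on a small hypothetical sample. The loop lets us convert the year once and keep it in a variable, and Python's chained comparison 2014 <= year <= 2018 is an equivalent, shorter form of the two comparisons joined by and.

```python
# Hypothetical sample in the same format as the name and date lists
titles = ["Game A", "Game B", "Game C", "Game D"]
release_dates = ["May 1, 2012", "March 3, 2014", "June 9, 2018", "October 26, 2019"]

# The same filter written as a plain for loop
y_2014_2018_loop = {}
for title, release_date in zip(titles, release_dates):
    year = int(release_date[-4:])  # convert the year only once
    if 2014 <= year <= 2018:  # same as: year >= 2014 and year <= 2018
        y_2014_2018_loop[title] = release_date[-4:]

print(y_2014_2018_loop)  # {'Game B': '2014', 'Game C': '2018'}
```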

Let's look at another example and filter for meta scores that are either lower than 25 or higher than 97.

# Games with meta score below 25 or above 97 (meta_score already holds floats)
meta_25_or_97 = {
    key: value
    for key, value in zip(name, meta_score)
    if value < 25 or value > 97
}
print(meta_25_or_97)
    {'The Legend of Zelda: Ocarina of Time': 99.0, "Tony Hawk's Pro Skater 2": 98.0, 'Grand Theft Auto IV': 98.0, 'SoulCalibur': 98.0, 'NBA Unrivaled': 24.0, 'Terrawars: New York Invasion': 24.0, 'Gravity Games Bike: Street Vert Dirt': 24.0, 'Postal III': 24.0, 'Game Party Champions': 24.0, 'Legends of Wrestling II': 24.0, 'Pulse Racer': 24.0, 'Fighter Within': 23.0, 'FlatOut 3: Chaos & Destruction': 23.0, 'Homie Rollerz': 23.0, "Charlie's Angels": 23.0, 'Rambo: The Video Game': 23.0, 'Fast & Furious: Showdown': 22.0, 'Drake of the 99 Dragons': 22.0, 'Afro Samurai 2: Revenge of Kuma Volume One': 21.0, 'Infestation: Survivor Stories (The War Z)': 20.0, 'Leisure Suit Larry: Box Office Bust': 20.0}

To practice, create a dictionary of games whose user scores are between 6 and 8 (inclusive). Also, make sure that the values in the dictionary are floats.
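If you want to check your answer, here is one possible solution, sketched on a small hypothetical sample. The scores are strings here, as they would be when read straight from the CSV file, so float() both makes the comparison numeric and makes the stored values floats.

```python
# Hypothetical sample in the same shape as the name and user_score lists
titles = ["Game A", "Game B", "Game C", "Game D"]
user_scores = ["5.9", "6.0", "7.5", "8.1"]

# Keep games whose user score is between 6 and 8 (inclusive), storing floats
user_6_to_8 = {
    key: float(value)
    for key, value in zip(titles, user_scores)
    if 6 <= float(value) <= 8
}
print(user_6_to_8)  # {'Game B': 6.0, 'Game C': 7.5}
```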

Use Cases

The next four sections demonstrate different use cases of dictionary comprehension:

  1. How to use comprehension in a nested dictionary
  2. How to sort a dictionary
  3. How to flatten a list of dictionaries
  4. How to compute the word frequency in a string

Nested Dictionary Comprehension

Sometimes we need to work with a nested dictionary, i.e., a dictionary inside another dictionary. First of all, let's create one.

# Create a nested dictionary
nested_d = {}
for n, p, dt in zip(name, platform, date):
    nested_d[n] = {"platform": p, "date": dt}

# Print the first five items of the nested dictionary
print(list(nested_d.items())[:5])
    [('The Legend of Zelda: Ocarina of Time', {'platform': 'Nintendo64', 'date': 'November 23, 1998'}), ("Tony Hawk's Pro Skater 2", {'platform': 'Nintendo64', 'date': 'August 21, 2001'}), ('Grand Theft Auto IV', {'platform': 'PC', 'date': 'December 2, 2008'}), ('SoulCalibur', {'platform': 'Xbox360', 'date': 'July 2, 2008'}), ('Super Mario Galaxy', {'platform': 'Wii', 'date': 'November 12, 2007'})]

The above for loop can also be rewritten using dictionary comprehension. Try it for practice.
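One possible answer, sketched on a two-game sample taken from the output above: zip() walks the three lists in parallel, and the value of each outer key is itself a small dictionary literal.

```python
# Small sample in the same shape as the name, platform and date lists
titles = ["Super Mario Galaxy", "SoulCalibur"]
platforms = ["Wii", "Xbox360"]
release_dates = ["November 12, 2007", "July 2, 2008"]

# Comprehension version of the nested-dictionary for loop
nested = {
    title: {"platform": plat, "date": release_date}
    for title, plat, release_date in zip(titles, platforms, release_dates)
}
print(nested)
# {'Super Mario Galaxy': {'platform': 'Wii', 'date': 'November 12, 2007'}, 'SoulCalibur': {'platform': 'Xbox360', 'date': 'July 2, 2008'}}
```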

To keep things simple, we will use only the date in the inner value. This is often the format in which data is delivered by an API.

# Create another nested dictionary
nested_d = {}
for n, dt in zip(name, date):
    nested_d[n] = {"date": dt}

# Print the first five items of the dictionary
print(list(nested_d.items())[:5])
    [('The Legend of Zelda: Ocarina of Time', {'date': 'November 23, 1998'}), ("Tony Hawk's Pro Skater 2", {'date': 'August 21, 2001'}), ('Grand Theft Auto IV', {'date': 'December 2, 2008'}), ('SoulCalibur', {'date': 'July 2, 2008'}), ('Super Mario Galaxy', {'date': 'November 12, 2007'})]

Let's say we want to extract the years and convert them into integers while keeping the same nested dictionary format. One way of doing that is by using a for loop.

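# Extract release years and transform them into integers inside the nested dictionary
# (date_str is used instead of date so the date list is not overwritten)
for title, outer_value in nested_d.items():
    for inner_key, date_str in outer_value.items():
        try:
            # Reassigning an existing key while iterating is safe: the dict size does not change
            outer_value[inner_key] = int(date_str[-4:])
        except ValueError:
            # Print any date whose last four characters are not a year
            print(date_str)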

# Print the first five items of the dictionary
print(list(nested_d.items())[:5])
    [('The Legend of Zelda: Ocarina of Time', {'date': 1998}), ("Tony Hawk's Pro Skater 2", {'date': 2001}), ('Grand Theft Auto IV', {'date': 2008}), ('SoulCalibur', {'date': 2008}), ('Super Mario Galaxy', {'date': 2007})]

It is, of course, possible to rewrite the above for loop using dictionary comprehension. Because the for loop modified nested_d in place, the code cell below recreates the lists and the nested dictionary before running the dictionary comprehension.

# Initialize empty lists to store column values
name = []
platform = []
date = []
summary = []
meta_score = []
user_score = []

# Iterate over columns and append values to lists
for game in all_games[1:]:
    name.append(game[0])
    platform.append(game[1].replace(" ", ""))
    date.append(game[2])
    summary.append(game[3])
    meta_score.append(float(game[4]))
    user_score.append(float(game[5]))

# Create a nested dictionary
nested_d = {}
for n, dt in zip(name, date):
    nested_d[n] = {"date": dt}

# The previous for loop using nested dictionary comprehension
dict_comprehension = {
    title: {inner_key: int(date_str[-4:]) for inner_key, date_str in outer_value.items()}
    for title, outer_value in nested_d.items()
}

# Print the first five items of the dictionary
print(list(dict_comprehension.items())[:5])
    [('The Legend of Zelda: Ocarina of Time', {'date': 1998}), ("Tony Hawk's Pro Skater 2", {'date': 2001}), ('Grand Theft Auto IV', {'date': 2008}), ('SoulCalibur', {'date': 2008}), ('Super Mario Galaxy', {'date': 2007})]

Note the order: the outer for loop (over nested_d.items()) is written second, at the end, while the inner for loop (over outer_value.items()) is written first, inside the value expression. The code gets pretty complicated at this point, so in many cases it is simply not worth forcing a dictionary comprehension. Remember that one of the main principles of Python is readability.
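To make the ordering concrete, the sketch below builds the same result twice on a two-game sample: once with the nested comprehension and once with explicit loops written in the order Python actually runs them. Unlike the earlier for loop, both versions build a new dictionary instead of modifying the original in place.

```python
# Small nested dictionary in the same shape as nested_d
nested = {
    "Super Mario Galaxy": {"date": "November 12, 2007"},
    "SoulCalibur": {"date": "July 2, 2008"},
}

# Nested comprehension: the inner comprehension builds each inner dictionary,
# the trailing "for" iterates over the outer dictionary
years = {
    title: {inner_key: int(date_str[-4:]) for inner_key, date_str in outer_value.items()}
    for title, outer_value in nested.items()
}

# Equivalent explicit loops
years_loop = {}
for title, outer_value in nested.items():  # outer loop: the trailing "for" above
    inner = {}
    for inner_key, date_str in outer_value.items():  # inner loop: inside the value
        inner[inner_key] = int(date_str[-4:])
    years_loop[title] = inner

assert years == years_loop
print(years)  # {'Super Mario Galaxy': {'date': 2007}, 'SoulCalibur': {'date': 2008}}
```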

Sorting a Dictionary with Comprehension

Another use of dictionary comprehension in Python is sorting a dictionary. For example, let's say we want to sort the dictionary of game titles and years by year, from the earliest to the latest. Let's first convert the years from the str data type into the int data type (using comprehension!).

# Convert years from strings into integers
year_dict_int = {key: int(value) for key, value in year_dict.items()}

# Sort by year
sorted_year = {key: value for key, value in sorted(year_dict_int.items(), key=lambda x: x[1])}

# Print the first five items of the dictionary sorted by year
print(list(sorted_year.items())[:5])
    [('Full Throttle', 1995), ("Sid Meier's Civilization II", 1996), ('Diablo', 1996), ('Super Mario 64', 1996), ('Wipeout XL', 1996)]

We used the sorted() function, whose key argument tells it which element to sort on. Since sorted() receives (key, value) tuples from items(), we have two options: sort on the dictionary keys or on the values. Values come second, so the index is 1, while for keys it would be 0. The key argument is usually given a lambda function, i.e., a small anonymous function defined inline. Because dictionaries preserve insertion order (guaranteed since Python 3.7), the new dictionary keeps the sorted order. Do you think it is possible to use a for loop to achieve the same result?
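It is. The sketch below answers that question on three titles from the output above, and also shows two variations: reverse=True for the latest year first, and index 0 to sort on the titles instead. Python's sort is stable, so games with the same year keep their original relative order, even with reverse=True.

```python
# Small sample in the same shape as year_dict_int (title -> year)
years_sample = {"Diablo": 1996, "Full Throttle": 1995, "Wipeout XL": 1996}

# for loop version: sorted() returns a list of (title, year) tuples ordered by year
sorted_loop = {}
for title, year in sorted(years_sample.items(), key=lambda item: item[1]):
    sorted_loop[title] = year
print(sorted_loop)  # {'Full Throttle': 1995, 'Diablo': 1996, 'Wipeout XL': 1996}

# Latest year first: reverse=True flips the order (ties keep their original order)
newest_first = {
    title: year
    for title, year in sorted(years_sample.items(), key=lambda item: item[1], reverse=True)
}
print(newest_first)  # {'Diablo': 1996, 'Wipeout XL': 1996, 'Full Throttle': 1995}

# Index 0 sorts on the dictionary keys (the titles) instead
by_title = {title: year for title, year in sorted(years_sample.items(), key=lambda item: item[0])}
print(by_title)  # {'Diablo': 1996, 'Full Throttle': 1995, 'Wipeout XL': 1996}
```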

Flattening a List of Dictionaries

Sometimes we have a list of dictionaries and want to merge them into one single dictionary. This can also be done with dictionary comprehension.

# Generate list of dictionaries
list_d = [
    {"Full Throttle": 1995},
    {"Sid Meier's Civilization II": 1996},
    {"Diablo": 1996},
]

# Flatten the list of dictionaries
list_d_flat = {
    title: year for dictionary in list_d for title, year in dictionary.items()
}

# Print flattened dictionary
print(list_d_flat)
    {'Full Throttle': 1995, "Sid Meier's Civilization II": 1996, 'Diablo': 1996}

We can, of course, also add conditional statements to the above code, as we did in the previous dictionary comprehensions. For example, below we filter out all years that are not 1996.

# Flatten the list of dictionaries, keeping only games released in 1996
list_d_flat_1996 = {
    title: year
    for dictionary in list_d
    for title, year in dictionary.items()
    if year == 1996
}

# Print flattened dictionary with only the games released in 1996
print(list_d_flat_1996)
    {"Sid Meier's Civilization II": 1996, 'Diablo': 1996}

Are you able to recreate the above dictionary comprehension with a for loop? Trust me, it is one of the best ways to understand the logic behind dictionary comprehension.
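Here is one way to do it. The loops appear in the same order as the for clauses in the comprehension: first over the list, then over the items of each dictionary. The sketch also shows, with hypothetical values, that when the same key appears in several dictionaries, the last value wins (this may be relevant for the game data, since a title released on several platforms can appear more than once).

```python
list_of_dicts = [
    {"Full Throttle": 1995},
    {"Sid Meier's Civilization II": 1996},
    {"Diablo": 1996},
]

# for loop version of the filtered flattening
flat_1996 = {}
for dictionary in list_of_dicts:  # first for clause
    for title, year in dictionary.items():  # second for clause
        if year == 1996:  # the if clause
            flat_1996[title] = year
print(flat_1996)  # {"Sid Meier's Civilization II": 1996, 'Diablo': 1996}

# Hypothetical duplicate key: the later dictionary overwrites the earlier value
merged = {k: v for d in [{"Game A": 2001}, {"Game A": 2005}] for k, v in d.items()}
print(merged)  # {'Game A': 2005}
```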

Word Frequency

One of the steps in Natural Language Processing is counting word occurrences in a text. A natural way to represent this data is a dictionary, where the key is the word and the value is the number of times the word appears in the text. This is also a job for dictionary comprehension!

# Quote by Henry Van Dyke
quote = """time is too slow for those who wait too swift for those who fear too long for
those who grieve too short for those who rejoice but for those who love, time is eternity"""

# Split on any whitespace (including the line break) to get the list of words
words = quote.split()

# Count frequency of words in the quote
frequency_dict = {word: words.count(word) for word in words}

print(frequency_dict)
    {'time': 2, 'is': 2, 'too': 4, 'slow': 1, 'for': 5, 'those': 5, 'who': 5, 'wait': 1, 'swift': 1, 'fear': 1, 'long': 1, 'grieve': 1, 'short': 1, 'rejoice': 1, 'but': 1, 'love,': 1, 'eternity': 1}

It would have taken us three lines of code if we used a for loop (but try recreating it).
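One possible for loop version is sketched below, repeating the quote so the block runs on its own. Note that calling count() inside the comprehension rescans the whole word list for every word, so it gets slow on long texts; the loop with dict.get() makes a single pass, and the standard library's collections.Counter does the same counting for you.

```python
from collections import Counter

quote = """time is too slow for those who wait too swift for those who fear too long for
those who grieve too short for those who rejoice but for those who love, time is eternity"""

# split() with no argument splits on any whitespace, including the line break
words = quote.split()

# for loop version: dict.get() returns 0 the first time a word is seen
frequency_loop = {}
for word in words:
    frequency_loop[word] = frequency_loop.get(word, 0) + 1

# collections.Counter counts the words in a single pass as well
frequency_counter = Counter(words)

assert frequency_loop == dict(frequency_counter)
print(frequency_counter.most_common(3))
```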

Wrapping Up

There are so many ways to use dictionary comprehension in Python to improve your code that it would take too much paper to list them all. I suggest you start using it straight away whenever you see the opportunity (but keep readability in mind) and read other people's code to get ideas for more use cases.

At some point, it will become natural to write dictionary comprehensions straight away, but to reach this point you have to write code and build projects. Reading tutorials and doing code exercises is helpful to grasp the concept, but projects are a real game-changer in your data science or coding career.

In this tutorial, we have learned:

  • What dictionary comprehension in Python is
  • How to create dictionaries using this technique
  • How to use conditional statements in a dictionary comprehension
  • What nested dictionary comprehension is
  • How to sort a dictionary
  • How to flatten a list of dictionaries
  • How to count word occurrences in a string

I hope that you have learned something new today. Feel free to connect with me on LinkedIn or GitHub. Happy coding!

Artur Sannikov

About the author


I am a Molecular Biology student at the University of Padua, Italy, interested in bioinformatics and data analysis.