October 3, 2018

Python Dictionary Tutorial: Analyze Craft Beer with Dictionaries

Python offers a variety of data structures to hold our information — Python dictionaries being one of the most useful because they are quick, easy to use, and flexible.

As a beginning programmer, you can use this Python tutorial to become familiar with dictionaries and their common uses so that you can start incorporating them immediately into your own code.

When performing data analysis, we often have data that is an unusable or hard-to-use form. Python dictionaries can help here by making it easier to read and change our data.

For this tutorial, we will use the Craft Beers data sets from Kaggle. There is one data set describing beer characteristics, and another that stores geographical information on breweries. For the purposes of this article, our data will be stored in the beers and breweries variables, each of which is a list of lists.

The tables below give a quick look at what the data look like. This table contains the first row from the beers data set.

abv ibu id name style brewery_id ounces
0 0.05 1436 Pub Beer American Pale Lager 408 12.0

This table contains the first row from the breweries data set.

name city state
0 Northgate Brewing Minneapolis MN

Prerequisite Knowledge

This article assumes basic knowledge of Python. To fully understand the article, you should be comfortable working with lists and for loops. If you're not there yet, check out our interactive Python fundamentals course, which covers these topics with no experience required.

In this tutorial, we'll cover:

  • Key terms and concepts related to Python dictionaries
    • Dictionary rules
  • Basic dictionary operations
    • creation and deletion
    • access and insertion
    • membership checking
  • Looping techniques
  • Dictionary comprehensions
  • Dictionary advantages and disadvantages

Getting into our role

Let's imagine we're a reviewer for a beer enthusiast magazine. We want to know ahead of time what each brewery will have before we arrive to review, so that we can gather useful background information. Our data sets hold information on beers and breweries, but the data themselves are not immediately accessible.

The data are currently in the form of a list of lists. To access individual data rows, you must use a numbered index. To get the first data row of breweries, you look at the 2nd item (the column names are first).

breweries[1]

breweries[1] is a list, so you can also index from it as well. Getting the third item in this list would look like:

breweries[1][2]

If you didn't know that breweries was data on breweries, you'd have a hard time understanding what the indexing is trying to do. Imagine writing this code and looking at it again 6 months in the future. You're more than likely to forget, so it would be good for us to reformat the data in a more readable way.

Key Terms and Concepts

Python dictionaries are made up of key-value pairs. Looking at the key in a Python dictionary is akin to the looking for a particular word in a physical dictionary. The value is the corresponding data that is associated with the key, comparable a word's definition in a physical dictionary. The key is what we look up, and it's the value that we're actually interested in. Python Dictionary Example When we're talking about Python dictionaries, we say that values are mapped to keys. In the example above, if we look up the word "programmer" in the English dictionary, we'll see: "a person who writes computer programs." The word "programmer" is the key mapped to the definition of the word.

Python Dictionary Rules for Keys and Values

Python dictionaries are immensely flexible because they allow anything to be stored as a value, from primitive types like strings and floats to more complicated types like objects and even other dictionaries (more on this later).

By contrast, there are limitations to what can be used as a key. A key is required to be an immutable object in Python, meaning that it cannot be alterable. This rule allows strings, integers, and tuples as keys, but excludes lists and dictionaries since they are mutable, or able to be altered. The rationale is simple: if any changes happen to a key without you knowing, you won't be able to access the value anymore, rendering the dictionary useless. Thus, only immutable objects are allowed to be keys.

A key must also be unique within a dictionary. The key-value structuring of a dictionary is what makes it so powerful, and throughout this post we'll delve into its basic operations, use cases, and their advantages and disadvantages.

Basic Dictionary Operations

Creation and Deletion

Let's start with how to create a dictionary. First, we will learn how to make an empty dictionary since you'll often find that you want to start with an empty one and populate with data as needed.

To create an empty dictionary, we can either use the dict() function with no inputs, or assign a pair of curly brackets with nothing in between to a variable. We can confirm that both methods will produce the same result.

empty = {}
also_empty = dict()
empty == also_empty
>>> True

Now, an empty dictionary isn't of much use to anybody, so we must add our own key-value pairs. We will cover this later in the article, but know that we can start with empty dictionaries and populate them after they're created. This will allow us to add in more information when as we need to.

empty["First key"] = "First value"
empty["First key"]
>>> "First value"

Alternatively, you can also create a dictionary and pre-populate it with key-value pairs. There are two ways to do this. The first is to use brackets containing the key-value pairs. Each key and value are separated by a :, while individual pairs are separated by a comma.

While you can fit everything on one line, it's better to split up your key-value pairs among different lines to improve readability.

data = {
"beer_data": beers,
"brewery_data": breweries
}

The above code creates a single dictionary, data, where our keys are descriptive strings and the values are our data sets. This single dictionary allows us to access both data sets by name.

The second way to create a Python dictionary is through the dict() method. You can supply the keys and values either as keyword arguments or as a list of tuples. We will recreate the data dictionary from above using the dict() methods and providing the key-value pairs appropriately.

# Using keyword arguments
data2 = dict(beer_data=beers, brewery_data=breweries)
# Using a list of tuples
tuple_list = [("brewery_data", breweries), ("beer_data", beers)]
data3 = dict(tuple_list)

We can confirm that each of the data dictionaries are equivalent in Python's eyes.

data == data2 == data3
>>> True

With each option, the key and value pairs must be formatted in a particular way, so it's easy to get mixed up. The diagram below helps to sort out where keys and values are in each.

Python Dictionary Creation Methods

We now have three dictionaries storing the exact same information, so it's best if we just keep one. Dictionaries themselves don't have a method for deletion, but Python provides the del statement for this purpose.

del data2
del data3

After creating your dictionaries, you'll almost certainly need to add and remove items from them. Python provides a simple, readable syntax for these operations.

Data access and insertion

The current state of our beers and breweries dictionary is still dire — each of the data sets originally was a list of lists, which makes it difficult to access specific data rows.

We can achieve better structure by reorganizing each of the data sets into its own dictionary and creating some helpful key-value pairs to describe the data within the dictionary. The raw data itself is mixed. The first row in each list of lists is a list of strings containing the column names, but the rest contains the actual data. It'll be better to separate the columns from the data so we can be more explicit.

Thus, for each data set, we'll create a dictionary with three keys mapped to the following values:

  1. The raw data itself
  2. The list containing the column names
  3. The list of lists containing the rest of the data
beer_details = {
"raw_data": beers,
"columns": beers[0],
"data": beers[1:]
}
brewery_details = {
"raw_data": breweries,
"columns": breweries[0],
"data": breweries[1:]
}

Now, we can reference the columns key explicitly to list the column names of the data instead of indexing the first item in raw_data.

Similarly, we are now able to explicitly ask for just the data from the data key. So far, we've learned how to create empty and prepopulated dictionaries, but we do not know how to read information from them once they've been made.

To access items within a Python dictionary, we need bracket notation. We reference the dictionary itself followed by a pair of brackets with the key that we want to look up. For example below, we read the column names from brewery_details.

brewery_details["columns"]
>>> ['', 'name', 'city', 'state']

This action should feel similar to us looking up a word in a English dictionary. We "looked up" a key and got the information we wanted back in the mapped value.

In addition to looking up key-value pairs, sometimes we'll actually want to change the value associated with a key in our dictionaries. This operation also uses bracket notation. To change a dictionary value, we first access the key and then reassign it using an = expression. We saw in the code above that one of the brewery columns is an empty string, but this first column actually contains a unique ID for each brewery! We will reassign the first column name to a more informative name.

# reassigning the first column of the breweries data set
brewery_details["columns"][0] = 'brewery_id'
# confirming that our reassignment worked
brewery_details["columns"][0]
>>> "brewery_id"

If the series of brackets looks confusing, don't fret. We have taken advantage of nesting. We know that the columns key in brewery_details is mapped to a list, so we can treat brewery_details["columns"] as a list (i.e. we can use list indexing).

Nesting can get confusing if we lose track of what each level represents, but we visualize this nesting below to clarify. Python Dictionary Nesting It's also common practice to nest dictionaries within dictionaries because it creates self-documenting code. That is to say, it is evident what the code is doing just by reading it, without any comments to help.

Self-documenting code is immensely useful because it is easier and faster to understand at a moment's read through. We want to preserve this self-documenting quality, so we will nest the beer_details and brewery_details dictionaries into a centralized dictionary. The end result will be nested Python dictionaries that are easier to read from than the original raw data itself.

# datasets is now a dictionary whose values are other dictionaries
datasets = {
"beer": beer_details,
"breweries": brewery_details
}
# This structure allows us to make self-documenting inquiries to both data sets
datasets["beer"]["columns"]
>>> ['', 'abv', 'ibu', 'id', 'name', 'style', 'brewery_id', 'ounces']
# Compare the above to how our older data dictionary would have been written
data["beer_data"][0]
>>> ['', 'abv', 'ibu', 'id', 'name', 'style', 'brewery_id', 'ounces']

The information embedded in the code is clear if we nest dictionaries within dictionaries. We've created a structure that easily describes the intent of the programmer. The following illustration breaks down the dictionary nesting.

Nesting 2

From here on out, we'll use datasets to manage our data sets and perform more data reorganization. The beer and brewery dictionaries we made are a good start, but we can do more. We'd like to create a new key-value pair to contain a string description of what each data set contains in case we forget.

We can create dictionaries and read and change values of present key-value pairs, but we don't know how to insert a new key-value pair. Thankfully, inserting pairs is similar to reassigning dictionary values. If we assign a value to a key that doesn't exist in a dictionary, Python will take the new key and value and create the pair within the dictionary.

# The description key currently does not exist in either the inner dictionary
datasets["beer"]["description"] = "Contains data on beers and their qualities"
datasets["breweries"]["description"] = "Contains data on breweries and their locations"

While Python makes it easy to insert new pairs into the dictionary, it stops users if they try to access keys that don't exist. If you try to access a key that doesn't exist, Python will throw an error and stop your code from running.

# The key best_beer doesn't currently exist, so we cannot access it
datasets["beer"]["best_beer"]
>>> KeyError: 'best_beer'

As your Python dictionaries get more complex, it's easier to lose track of which keys are present. If you leave your code for a week and forget what's in your dictionary, you'll constantly run into KeyErrors.

Thankfully, Python provides us with an easy way to check the present keys in a dictionary. This process is called membership checking.

Membership checking

If we want to check if a key exists within a dictionary, we can use the in operator. You can also check on whether a key doesn't exist by using not in. The resulting code reads almost like natural English, which also means it is easier to understand at first glance.

"beer" in datasets
>>> True
"wine" in datasets
>>> False
"wine" not in datasets
>>> True

Using in for membership checking has great utility in conjunction with if-else statements. This combination allows you to set up conditional logic that will prevent you from getting KeyErrors and enable you to make more sophisticated code. We won't delve too deeply into this concept, but there's resources at the end for the curious.

Section Summary

At this point, we know how to create, read, update, and delete data from our Python dictionaries. We transformed our two raw data sets into dictionaries with greater readability and ease of use. With these basic dictionary operations, we can start performing more complex operations. For example, it is extremely common to want to loop over the key-value pairs and perform some operation on each pair.

Looping Techniques

When we created the description key for each of the data sets, we made two individual statements to create each key-value pair. Since we performed the same operation, it would be more efficient to use loops. Python provides three main methods to use dictionaries in loops: keys(), values(), and items(). Using keys() and values() allows us to loop over those parts of the dictionary.

for key in datasets.keys():
    print(key)
>>> beer
>>> breweries
for val in datasets.values():
    print(type(val))
>>> <class 'dict'
>>>> <class 'dict'>

The items() method combines both into one. When used in a loop, items() returns the key-value pairs as tuples. The first element of this tuple is the key, while the second is the value. We can use destructuring to get these elements into properly informative variable names. The first variable key will take the key in the tuple, while val will get the mapped value.

for key, val in datasets.items():
    print(f'The {key} data set has {len(val["data"])} rows.')
>>> The beer data set has 2410 rows.
>>> The breweries data set has 558 rows.

The above loop tells us that the beer data set is much bigger than the brewery data set. We would expect breweries to sell multiple types of beers, so there should be more beers than breweries overall. Our loop confirms this thought.

Currently, each of the data rows are a list, so referencing these elements by number is undesirable. Instead, we'll turn each of the data rows into its own dictionary, with the column name mapped to its actual value. This would make analyzing the data easier in the long run. We should do this operation on both data sets, so we'll leverage our looping techniques.

# Perform this operation for both beers and breweries data setsfor k, v in datasets.items():
# Initialize a key-value pair to hold our reformatted data
v["data_as_dicts"] = []
# For every data row, create a new dictionary based on column names
for row in v["data"]:
    data_row_dict = dict(zip(v["columns"], row))
    v["data_as_dicts"].append(data_row_dict)

There's a lot going on above, so we'll slowly break it down.

  1. We loop through datasets to ensure we transform both of the beer and breweries data.
  2. Then, we create a new key called data_as_dicts mapped to an empty array which will hold our new dictionaries.
  3. Then we start iterating over all the data, contained in the data key. zip() is a function that takes two or more lists and makes tuples based off these lists.
  4. We take advantage of the zip() output and use dict() to create new data in our preferred form: column names mapped to their actual value.
  5. Finally, we append it to data_as_dicts list. The end result is better formatted data that is easier to read and come back to repeatedly.

We can look at the end result below.

# The first data row in the beers data set
datasets["beer"]["data_as_dicts"][0]
>>> {'': '0',
'abv': '0.05',
'brewery_id': '408',
'ibu': '',
'id': '1436',
'name': 'Pub Beer',
'ounces': '12.0',
'style': 'American Pale Lager'}
# The first data row in its original form
datasets["beer"]["raw_data"][0]
>>> ['0', '0.05', '408', '', '1436', 'Pub Beer', '12.0', 'American Pale Lager']

Section summary

In this section, we learned how to use dictionaries with the for loop. Using loops, we reformatted each data row into dictionaries for enhanced readability. Our future selves will thank us later when we look back at the code we've written.

We're now set up to perform our final operation: matching all the beers to their respective breweries. Each of the beers has a brewery that it originates from, given by the brewery_id key in both data sets. We will create a whole new data set that matches all the beers to their brewery.

We could use loops to accomplish this, but we have access to an advanced Python dictionary operation that could turn this data transformation from a multi-line loop to a single line of code.

Dictionary comprehensions

Each beer in the beers data set was associated with a brewery_id, which is linked to a single brewery in breweries. Using this ID, we can pair up all of the beers with their brewery.

It's generally a better idea to transform the raw data and place it in a new variable rather than alter the raw data itself. Thus, we'll create another dictionary within datasets to hold our pairing. In this new dictionary, the brewery name itself is the key, the mapped value will be a list containing the names of all of the beers the brewery offers, and we will match them based on the brewery_id data element.

We can perform this matching just fine with the looping techniques we learned previously, but there still remains one last dictionary aspect to teach. Instead of a loop, we can perform the matching succinctly using dictionary comprehension.

A "comprehension" in computer science terms means to perform some task or function on all items of a collection (like a list). A dictionary comprehension is similar to a list comprehension in terms of syntax, but instead creates dictionaries from a base list.

If you need a refresher on list comprehensions, you can check out this tutorial here. To give a quick example, we'll use a dictionary comprehension to create a dictionary from a list of numbers.

nums = [1, 2, 3, 4, 5]
dict_comprehension = {
str(n) : "The corresponding key is" + str(n) for n in nums}
for val in dict_comprehension.values():
    print(val)
>>> The corresponding key is 1
The corresponding key is 2
The corresponding key is 3
The corresponding key is 4
The corresponding key is 5

We will dissect the dictionary comprehension code below:

Dictionary Anatomy

To create a dictionary comprehension, we wrap 3 elements around an opening and closing bracket:

  1. A base list
  2. What the key should be for each item from the base list
  3. What the value should be for each item from the base list

nums forms the base list that the key-value pairs of dict_comprehension are based off of. The keys are stringified versions of each number (to differentiate it from list indexing), while the values are a string describing what the key is.

This pet example is useless by itself, but serves to illustrate the somewhat complicated syntax of a dictionary comprehension. Now that we know how a dictionary comprehension is composed, we will see its real utility when we apply it to our beer and breweries data set. We only need a two aspects of the breweries data set to perform the matching:

  1. The brewery name
  2. The brewery ID

To start off, we'll create a list of tuples containing the name and ID for each brewery. Thanks to the reformatted data in the data_as_dicts key, this code is easy to write in a list comprehension.

# This list comprehension captures all of the brewery IDs and names from store
brewery_id_name_pairs = [
(row["brewery_id"], row["name"]) for row in datasets["breweries"]["data_as_dicts"]]

brewery_id_name_pairs is now a list of tuples and will form the base list of the dictionary comprehension. With this base list, we will use to the name of the brewery name as our key and a list comprehension as the value.

brewery_to_beers = {
pair[1] : [b["name"] for b in datasets["beer"]["data_as_dicts"] if b["brewery_id"] == pair[0]] for pair in brewery_id_name_pairs
}

Before we discuss how this monster works, it's worth taking some time to see what the actual result is.

# Confirming that a dictionary comprehension creates a dictionary
type(brewery_to_beers)
>>> <class 'dict'>
# Let's see what the Angry Orchard Cider Company (a personal favorite) makes
brewery_to_beers["Angry Orchard Cider Company"]
>>> ["Angry Orchard Apple Ginger", "Angry Orchard Crisp Apple", "Angry Orchard Crisp Apple"]

As we did with the simple example, we will highlight the crucial parts of this unwieldy (albeit interesting) dictionary comprehension.

Comprehension Anatomy 2

If we break apart the code and highlight the specific parts, the structure behind the code becomes more clear. The key is taken from the appropriate part of the brewery_id_name_pair. It is the mapped value that takes up most of the logic here. The value is a list comprehension with conditional logic.

In plain English, the list comprehension will store any beers from the beer data when the beer's associated brewery_id matches the current brewery in the iteration. Another illustration below lays out the code for the list comprehension by its purpose. List Comprehension Value

Since we based the dictionary comprehension off of a list of all the breweries, the end result is what we wanted: a new dictionary that maps brewery names to all the beers that it sells! Now, we can just consult brewery_to_beers when we arrive at a brewery and find out instantly what they have!

This section had some complicated code, but it's wholly within your grasp. If you're still having trouble, keep reviewing the syntax and try to make your own Python dictionary comprehensions. Before long, you'll have them in your coding arsenal.

We've covered a lot of ground on how to use dictionaries in this tutorial, but it's important to take a step back and look at why we might want to use (or not use) them.

Python Dictionary Advantages and Disadvantages

We've mentioned many times throughout that dictionaries increase the readability of our code. Being able to write out our own keys gives us flexibility and adds a layer of self-documentation. The less time it takes to understand what your code is doing, the easier it is to understand and debug and the faster you can implement your analyses.

Aside from the human-centered advantages, there are also speed advantages. Looking up a key in a Python dictionary is fast. Computer scientists can measure how long a computer task (i.e. looking up a key or running an algorithm) will take by seeing how many operations it will take to finish. They describe these times with Big-O notation. Some tasks are fast and are done in constant time while more hefty tasks may require an exponential amount of operations and are done in polynomial time.

In this case, looking up a key is done in constant time. Compare this to searching for the same item in a large list. The computer must look through each item in the list, so the time taken will scale with the length of the list. We call this linear time. If your list is exceptionally large, then looking for one item will take much longer than just assigning it to a key-value pair in a dictionary and looking for the key.

On a deeper level, a dictionary is an implementation of a hash table, an explanation of which is outside the scope of this article. What's important to know is that the benefits we dictionary are essentially the benefits of the hash table itself: speedy key look ups and membership checks.

We mentioned earlier that dictionaries are unordered, making them unsuitable data structures for where order matters, relative to other python structures, take up a lot more space, especially when you have a large amount of keys. Given how cheap memory is, this disadvantage doesn't usually make itself apparent, but it's good to know about the overhead produced by dictionaries.

We've only discussed vanilla Python dictionaries, but there are other implementations in Python that add additional functionality. I've included a link for further reading at the end. I hope that after reading this article, you will be more comfortable using dictionaries and finding a use for them in your own programming. Perhaps you have even found a beer you might want to try in the future!

Further Reading

We've covered the basics of dictionaries, but we didn't cover all the methods available to us.

Christian Pascual

About the author

Christian Pascual

Christian is a PhD student studying biostatistics in California. He enjoys making statistics and programming more accessible to a wider audience. Outside of school, he enjoys going to the gym, language learning, and woodworking.

Learn data skills for free

Headshot Headshot

Join 1M+ learners

Try free courses