January 21, 2022

Tutorial: Everything You Need to Know About Python Sets


In this tutorial, we will explore Python sets in detail: what a Python set is, when and why to use it, how to create it, how to modify it, and what operations we can perform on Python sets.

What Is a Set in Python?

A set is a built-in Python data structure used to store a collection of unique items, potentially of mixed data types, in a single variable. Python sets have the following characteristics:

  • Unordered – the items of a set don’t have any defined order
  • Unindexed – we cannot access the items with [i] as with lists
  • Mutable – a set can be modified after creation (items can be added or removed)
  • Iterable – we can loop over the items of a set

Note that while a Python set itself is mutable (we can remove items from it or add new ones), its items must be of immutable (more precisely, hashable) data types, like integers, floats, tuples, and strings.
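The following is a minimal sketch of these characteristics. Indexing a set raises TypeError, the set itself can be changed after creation, and every item must be hashable. This is why a tuple of immutable values is accepted, while a tuple that contains a list is not. The variable names are illustrative only.

```python
# A set with items of mixed (immutable) types
myset = {1, 'a', (2, 3)}

# Unindexed: square-bracket access is not supported
try:
    myset[0]
except TypeError as err:
    index_error = str(err)  # "'set' object is not subscriptable"

# Mutable: items can be added and removed after creation
myset.add(4.5)
myset.remove('a')

# Items must be hashable: a tuple of immutable values works...
ok = {(1, 2)}

# ...but a tuple that contains a list is unhashable
try:
    bad = {(1, [2, 3])}
except TypeError as err:
    hash_error = str(err)  # "unhashable type: 'list'"

print(myset)
print(index_error)
print(hash_error)
```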

The main applications of Python sets include the following:

  • Removing duplicates
  • Checking set membership
  • Performing mathematical set operations like union, intersection, difference, and symmetric difference
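Each of these three applications takes only a line or two. The sketch below uses a made-up list called `scores` purely for illustration.

```python
# Removing duplicates: convert a list to a set (and back, if a list is needed)
scores = [3, 1, 3, 2, 1]
unique_scores = set(scores)          # the set {1, 2, 3}
unique_list = sorted(unique_scores)  # [1, 2, 3]; sorted() gives a predictable order

# Checking set membership
print(2 in unique_scores)   # True
print(5 in unique_scores)   # False

# A mathematical set operation: the items common to two collections
common = unique_scores & {2, 3, 4}
print(common)               # the set {2, 3}
```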

Creating a Python Set

We can create a Python set in two ways:

  1. By using the built-in set() function with an iterable object passed in (such as a list, tuple, or string)
  2. By placing all the items, separated by commas, inside a pair of curly braces {}

In both cases, it’s important to remember that the items of the future set (i.e., the individual elements of the iterable object or the items placed inside the curly braces) can be iterable themselves (e.g., tuples), but they cannot be of a mutable type, such as a list, dictionary, or another set.

Let’s see how it all works:

# First way: using the set() function on an iterable object
set1 = set([1, 1, 1, 2, 2, 3])          # from a list
set2 = set(('a', 'a', 'b', 'b', 'c'))   # from a tuple
set3 = set('anaconda')                  # from a string

# Second way: using curly braces
set4 = {1, 1, 'anaconda', 'anaconda', 8.6, (1, 2, 3), None}

print('Set1:', set1)
print('Set2:', set2)
print('Set3:', set3)
print('Set4:', set4)

# Incorrect way: trying to create a set with mutable items (a list and a set)
set5 = {1, 1, 'anaconda', 'anaconda', 8.6, [1, 2, 3], {1, 2, 3}}
print('Set5:', set5)

Set1: {1, 2, 3}
Set2: {'a', 'c', 'b'}
Set3: {'a', 'c', 'n', 'o', 'd'}
Set4: {1, 8.6, (1, 2, 3), 'anaconda', None}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-c58d89e9ff44> in <module>()
     13 
     14 # Incorrect way: trying to create a set with mutable items (a list and a set)
---> 15 set5 = {1, 1, 'anaconda', 'anaconda', 8.6, [1, 2, 3], {1, 2, 3}}
     16 print('Set5:', set5)

TypeError: unhashable type: 'list'

We can make the following observations here:

  • The duplicates have been removed from the sequence in each case.
  • The initial order of the items has changed.
  • set4 contains elements of different data types.
  • An attempt to create a Python set with mutable items (a list and a set) resulted in a TypeError.

A particular case occurs when we need to create an empty Python set. Since empty curly braces {} create an empty Python dictionary, we can’t use this approach to create an empty set in Python. Using the set() function is still valid in this case:

empty1 = {}
empty2 = set()

print(type(empty1))
print(type(empty2))

<class 'dict'>
<class 'set'>

Checking Set Membership

To check whether a certain item is present in a Python set, we use the in operator; to check whether it is absent, we use not in:

myset = {1, 2, 3}
print(1 in myset)
print(1 not in myset)

True
False

Accessing Values in a Python Set

Since a Python set is unordered and unindexed, we cannot access its items by indexing or slicing. One way to do so is by looping through the set:

myset = {'a', 'b', 'c', 'd'}

for item in myset:
    print(item)

a
c
d
b

The order of the printed values can differ from the order in which the items were written when the set was created, and for strings it may even change between program runs.
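If positional access is really needed, one common workaround (sketched here, not the only option) is to convert the set into a list first, using sorted() when a predictable order matters. This builds a new list; the set itself stays unindexed. The names `items` and `any_item` are illustrative.

```python
myset = {'a', 'b', 'c', 'd'}

# Convert to a sorted list to get a predictable, indexable sequence
items = sorted(myset)
print(items[0])    # a
print(items[-1])   # d

# To look at some item without removing it (which item you get is arbitrary)
any_item = next(iter(myset))
print(any_item in myset)  # True
```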

Modifying a Python Set

Adding Items to a Python Set

We can add a single immutable item to a Python set using the add() method or several immutable items using the update() method. The latter takes one or more iterables (such as tuples, lists, strings, or other sets) of immutable items as its arguments and adds each unique item from them (or each unique character, in the case of strings) to the set:

# Initial set
myset = set()

# Adding a single immutable item
myset.add('a')
print(myset)

# Adding several items
myset.update({'b', 'c'})        # a set of immutable items
print(myset)
myset.update(['d', 'd', 'd'])   # a list of immutable items
print(myset)
myset.update(['e'], ['f'])      # several lists of immutable items
print(myset)
myset.update('fgh')             # a string
print(myset)
myset.update([[1, 2], [3, 4]])  # an attempt to add a list of mutable items (lists)
print(myset)

{'a'}
{'a', 'c', 'b'}
{'a', 'c', 'd', 'b'}
{'a', 'c', 'e', 'f', 'b', 'd'}
{'a', 'c', 'e', 'f', 'b', 'd', 'g', 'h'}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-5-49f8fc8d30f3> in <module>()
     15 myset.update('fgh')             # a string
     16 print(myset)
---> 17 myset.update([[1, 2], [3, 4]])  # an attempt to add a list of mutable items (lists)
     18 print(myset)

TypeError: unhashable type: 'list'

Removing Items from a Python Set

To remove one or more items from a Python set, we can use one of four methods:

  1. discard() – removes a particular item or does nothing if that item is absent in the set
  2. remove() – removes a particular item or raises KeyError if that item is absent in the set
  3. pop() – removes and returns an arbitrary item or raises KeyError if the set is empty
  4. clear() – clears the set (removes all the items)

Let’s look at some examples:

# Initial set
myset = {1, 2, 3, 4}
print(myset)

# Removing a particular item using the discard() method
myset.discard(1)  # the item was present in the set
print(myset)
myset.discard(5)  # the item was absent in the set
print(myset)

# Removing a particular item using the remove() method
myset.remove(4)   # the item was present in the set
print(myset)
myset.remove(5)   # the item was absent in the set
print(myset)

{1, 2, 3, 4}
{2, 3, 4}
{2, 3, 4}
{2, 3}
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-6-fbd3ef668cda> in <module>()
     12 myset.remove(4)   # the item was present in the set
     13 print(myset)
---> 14 myset.remove(5)   # the item was absent in the set
     15 print(myset)

KeyError: 5

# Taking the set from the code above
myset = {2, 3}

# Removing and returning an arbitrary item
print(myset.pop())  # the removed and returned item
print(myset)        # the updated set

# Removing all the items
myset.clear()
print(myset)

# An attempt to remove and return an item from an empty set
myset.pop()
print(myset)

2
{3}
set()
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-7-796b4efb6424> in <module>()
     11 
     12 # An attempt to remove and return a random item from an empty set
---> 13 myset.pop()
     14 print(myset)

KeyError: 'pop from an empty set'

Built-in Python Functions for Sets

Some of the built-in Python functions applicable to lists and other collection-like data structures can also be useful with Python sets. Let’s consider the most helpful of them:

  • len() – returns the set size (the number of items in the set)
  • min() and max() – return the smallest/largest item in the set and are mostly used with sets of numeric values
    Note: This becomes a bit more complicated when the set items are strings or tuples. Strings are compared lexicographically: the Unicode code points of their characters (which coincide with the ASCII values for ASCII characters) are compared from left to right. Tuples are compared item by item, also from left to right, starting with the items at index 0. Using min() or max() on a set whose items are of mutually incomparable types (for example, numbers and strings) raises TypeError.
  • sum() – returns the sum of all items of the set containing only numeric values

# A set with numeric items
myset = {5, 10, 15}
print('Set:', myset)
print('Size:', len(myset))
print('Min:', min(myset))
print('Max:', max(myset))
print('Sum:', sum(myset))
print('\n')

# A set with string items
myset = {'a', 'A', 'b', 'Bb'}
print('Set:', myset)
print('Min:', min(myset))
print('Max:', max(myset))
print('\n')

# A set with tuple items
myset = {(1, 2), (1, 0), (2, 3)}
print('Set:', myset)
print('Min:', min(myset))
print('Max:', max(myset))

Set: {10, 5, 15}
Size: 3
Min: 5
Max: 15
Sum: 30

Set: {'Bb', 'a', 'b', 'A'}
Min: A
Max: b

Set: {(1, 2), (1, 0), (2, 3)}
Min: (1, 0)
Max: (2, 3)

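Notice that in the set {'a', 'A', 'b', 'Bb'}, the minimum value is A, not a, and the maximum value is b, not Bb. This happens because strings are compared character by character by Unicode code point, and all uppercase Latin letters (A–Z, code points 65–90) come before all lowercase ones (a–z, code points 97–122).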
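The short sketch below illustrates the note about mixed data types: int and float items compare without problems, but numbers and strings cannot be compared, so min() raises TypeError. The last lines use the standard key parameter of max() to compare strings case-insensitively. This is an extra option not covered above, and the sample values are made up.

```python
# Numeric items of different types (int and float) are comparable
numbers = {3, 1.5, 2}
print(min(numbers), max(numbers))  # 1.5 3

# Strings and numbers cannot be compared with each other
message = None
try:
    min({1, 'a'})
except TypeError as err:
    message = str(err)
print(message)  # the exact wording depends on which items get compared

# Case-sensitive vs case-insensitive comparison of strings
letters = {'a', 'A', 'b', 'Bb'}
print(max(letters))                 # b
print(max(letters, key=str.lower))  # Bb ('bb' is the largest lowercased value)
```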

  • all() – returns True if all items of the set evaluate to True, or if the set is empty
  • any() – returns True if at least one item of the set evaluates to True (for an empty set, returns False)
    Note: The values that evaluate to True are those values that don’t evaluate to False. In the context of Python set items, the values that evaluate to False are 0, 0.0, '', False, None, and ().

print(all({1, 2}))
print(all({1, False}))
print(any({1, False}))
print(any({False, False}))

True
False
True
False
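The empty-set cases mentioned above are easy to check. One related detail: 0, 0.0, and False are equal and have the same hash, so a set keeps only one of them. This is a minimal sketch with an illustrative `falsy` set.

```python
# Edge cases: empty sets
print(all(set()))   # True  (there is no item that evaluates to False)
print(any(set()))   # False (there is no item that evaluates to True)

# 0, 0.0 and False are equal, so the set keeps just one of them
falsy = {0, 0.0, False, '', None}
print(len(falsy))   # 3
print(any(falsy))   # False
```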

  • sorted() – returns a sorted list of items of the set

myset = {4, 2, 5, 1, 3}
print(sorted(myset))

myset = {'c', 'b', 'e', 'a', 'd'}
print(sorted(myset))

[1, 2, 3, 4, 5]
['a', 'b', 'c', 'd', 'e']

Performing Mathematical Set Operations on Python Sets

This group of operations includes union, intersection, difference, and symmetric difference. Each of them can be carried out by operator or by method.

Let’s practice mathematical set operations on the following two Python sets:

a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}

Set Union

The union of two (or more) Python sets returns a new set of all the unique items from both (all) sets. It can be performed using the | operator or the union() method:

print(a | b)
print(b | a)
print(a.union(b))
print(b.union(a))

{1, 2, 3, 4, 5, 6, 7}
{1, 2, 3, 4, 5, 6, 7}
{1, 2, 3, 4, 5, 6, 7}
{1, 2, 3, 4, 5, 6, 7}

As we can see, for the union operation, the order of sets doesn’t matter: we can write a | b or b | a with the identical result, and the same can be said about using the method.

The syntax for the union operation on more than two Python sets is the following: a | b | c or a.union(b, c).
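Here is a sketch of the three-set syntax, using an illustrative third set c. It also shows a difference between the two forms: the union() method accepts any iterable, such as a list, while the | operator requires both operands to be sets.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}
c = {7, 8, 9}  # illustrative third set

print(a | b | c)        # {1, 2, 3, 4, 5, 6, 7, 8, 9}
print(a.union(b, c))    # the same set

# The method also accepts other iterables, such as lists
print(a.union([10, 11]))

# ...while the | operator requires both operands to be sets
operator_error = None
try:
    a | [10, 11]
except TypeError as err:
    operator_error = err
    print('TypeError:', err)
```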

Notice that the white space before and after the operator in the examples above (and also in all the upcoming examples) is added simply for readability.

Set Intersection

The intersection of two (or more) Python sets returns a new set of the items common to both (all) sets. It can be performed using the & operator or the intersection() method:

print(a & b)
print(b & a)
print(a.intersection(b))
print(b.intersection(a))

{4, 5}
{4, 5}
{4, 5}
{4, 5}

Again, in this case, the order of sets doesn’t matter: a & b or b & a will yield the same result, and the same is true when using the method.

The syntax for the intersection operation on more than two Python sets is the following: a & b & c or a.intersection(b, c).
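As a minimal sketch of the three-set case, the example below uses an illustrative third set c chosen so that the three sets share exactly one item. Sets with no common items produce an empty set.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}
c = {5, 7, 9}  # illustrative third set

print(a & b & c)             # {5}
print(a.intersection(b, c))  # {5}

# No common items at all gives an empty set
print(a & {10, 11})          # set()
```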

Set Difference

The difference of two (or more) Python sets returns a new set containing all the items from the first (left) set that are absent in the second (right) set. In the case of more than two sets, the operation is performed from left to right. For this set operation, we can use the - operator or the difference() method:

print(a - b)
print(b - a)
print(a.difference(b))
print(b.difference(a))

{1, 2, 3}
{6, 7}
{1, 2, 3}
{6, 7}

Here the order of sets matters: a - b (or a.difference(b)) returns all the items that are in a but not in b, while b - a (or b.difference(a)) returns all the items that are in b but not in a.

The syntax for the difference operation on more than two Python sets is the following: a - b - c or a.difference(b, c). In such cases, we first compute a - b, then find the difference between the resulting set and the next one to the right, which is c, and so on.
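The left-to-right evaluation can be checked step by step. In this sketch, c is an illustrative third set. The last line shows that adding parentheses changes the grouping and therefore the result.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}
c = {1, 8}  # illustrative third set

step1 = a - b        # {1, 2, 3}
step2 = step1 - c    # {2, 3}
print(a - b - c)           # {2, 3}
print(a.difference(b, c))  # {2, 3}
print(step2 == a - b - c)  # True

# Order matters: starting from b gives a different result
print(b - a - c)           # {6, 7}

# Grouping matters too: b - c is computed first here
print(a - (b - c))         # {1, 2, 3}
```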

Set Symmetric Difference

The symmetric difference of two Python sets returns a new set of items present in either the first or the second set, but not in both. In other words, the symmetric difference of two sets is the difference between their union and their intersection. For more than two sets, the operation is applied from left to right, and the result contains the items that appear in an odd number of the sets, which generally differs from the union minus the intersection of all of them. We can perform this operation using the ^ operator or the symmetric_difference() method:

print(a ^ b)
print(b ^ a)
print(a.symmetric_difference(b))
print(b.symmetric_difference(a))

{1, 2, 3, 6, 7}
{1, 2, 3, 6, 7}
{1, 2, 3, 6, 7}
{1, 2, 3, 6, 7}

For the symmetric difference operation, the order of sets doesn’t matter: a ^ b or b ^ a will yield the same result, and we can say the same about using the method.
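For two sets, the result can be cross-checked against two equivalent formulations: the union minus the intersection, and the union of the two one-sided differences. The sketch reuses the sets a and b from above.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}

sym = a ^ b
print(sym)                         # {1, 2, 3, 6, 7}
print(sym == (a | b) - (a & b))    # True: union minus intersection
print(sym == (a - b) | (b - a))    # True: items unique to each side
```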

The syntax for the symmetric difference operation on more than two Python sets is the following: a ^ b ^ c. However, this time, we cannot pass several sets to the symmetric_difference() method at once, since it takes exactly one argument; passing more raises TypeError:

a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}
c = {7, 8, 9}

a.symmetric_difference(b, c)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_6856/4105073859.py in <module>
      3 c = {7, 8, 9}
      4 
----> 5 a.symmetric_difference(b, c)

TypeError: set.symmetric_difference() takes exactly one argument (2 given)
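Since the method accepts only one argument, a sketch of the alternatives is to chain method calls or simply use the operator. The example also checks that, for three sets, the result contains the items found in an odd number of the sets, which is not the same as the union minus the intersection. The helper names `sets` and `odd` are illustrative.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}
c = {7, 8, 9}

chained_op = a ^ b ^ c
chained_method = a.symmetric_difference(b).symmetric_difference(c)
print(chained_op)                    # {1, 2, 3, 6, 8, 9}
print(chained_op == chained_method)  # True

# Items that appear in an odd number of the three sets
sets = (a, b, c)
odd = {x for x in a | b | c if sum(x in s for s in sets) % 2 == 1}
print(odd == chained_op)             # True

# This differs from the union minus the intersection of all three sets
print((a | b | c) - (a & b & c))     # {1, 2, 3, 4, 5, 6, 7, 8, 9}
```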

Other Set Operations on Python Sets

There are several other useful methods and operators for working with two or more Python sets:

  • intersection_update() (or the &= operator) – updates the current set in place, keeping only the items it shares with another set (the method also accepts several sets)
  • difference_update() (or the -= operator) – updates the current set in place, removing the items found in another set (the method also accepts several sets)
  • symmetric_difference_update() (or the ^= operator) – updates the current set in place with its symmetric difference with another set (the method takes exactly one argument)
  • isdisjoint() (no corresponding operator) – returns True if two sets don’t have any items in common, meaning that their intersection is an empty set
  • issubset() (or the <= operator) – returns True if another set contains each item of the current set, including the case when both sets are equal; to exclude that case (i.e., to test for a proper subset), there is no method, so we need to use the < operator
  • issuperset() (or the >= operator) – returns True if the current set contains each item of another set, including the case when both sets are equal; to test for a proper superset instead, we need to use the > operator
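A minimal sketch of these operations on the sets a and b from earlier follows. Copies (x, y, z) are made first because the update methods change the set in place and return None. The copy names are illustrative.

```python
a = {1, 2, 3, 4, 5}
b = {4, 5, 6, 7}

# In-place updates modify the set on the left and return None
x = set(a)
result = x.intersection_update(b)  # same as: x &= b
print(x, result)                   # {4, 5} None

y = set(a)
y -= b                             # same as: y.difference_update(b)
print(y)                           # {1, 2, 3}

z = set(a)
z ^= b                             # same as: z.symmetric_difference_update(b)
print(z)                           # {1, 2, 3, 6, 7}

# Comparisons
print(a.isdisjoint({8, 9}))        # True: no common items
print({1, 2}.issubset(a))          # True
print({1, 2} <= a, {1, 2} < a)     # True True
print(a <= a, a < a)               # True False: a subset, but not a proper subset, of itself
print(a.issuperset({4, 5}))        # True
print(a >= a, a > a)               # True False
```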

Conclusion

To conclude, let’s review what we’ve learned about Python sets in this tutorial:

  • The main characteristics of a Python set
  • The main applications of Python sets
  • Two ways of creating a Python set
  • How to create an empty Python set
  • How to check whether a certain item is present or not in a Python set
  • How to access values in a Python set
  • Two methods of adding new items to a Python set
  • Four methods of removing items from a Python set
  • Which built-in Python functions are applicable to Python sets
  • How to perform main mathematical set operations on two or more Python sets by method or by operator
  • What other operations we can perform on two or more Python sets

Now you should be familiar with all the nuances of creating, modifying, and using sets in Python.

Elena Kosourova

About the author


Elena is a Petroleum Geologist and community manager at Dataquest. You can find her chatting online with data enthusiasts and writing Python tutorials.
