November 1, 2022

How to Use Python Data Classes in 2022 (A Beginner’s Guide)

In Python, a data class is a class designed only to hold data values. Data classes aren’t different from regular classes, but they usually have no methods beyond those needed to store and display that data. They are typically used to store information that will be passed between different parts of a program or a system.

However, when creating classes that work only as data containers, writing the __init__ method over and over is tedious and error-prone.

The dataclasses module, introduced in Python 3.7, provides a simpler way to create data classes without writing these methods by hand.

In this article, we’ll see how to use this module to quickly create classes that come not only with __init__ but with several other methods already implemented, so we don’t need to write them manually. We can do all of that with just a few lines of code.

We expect you to have some intermediate Python experience, including an understanding of how to create classes and of object-oriented programming in general.

Using the dataclasses Module

As a starting example, let’s say we’re implementing a class to store data about a certain group of people. For each person, we’ll have attributes such as name, age, height, and email address. This is what a regular class looks like:

class Person():
    def __init__(self, name, age, height, email):
        self.name = name
        self.age = age
        self.height = height
        self.email = email

If we use the dataclasses module instead, we import dataclass and apply it as a decorator to the class we’re creating. When we do that, we no longer need to write the __init__ method; we only specify the attributes of the class and their types. Here’s the same Person class, implemented in this way:

from dataclasses import dataclass

@dataclass
class Person():
    name: str
    age: int
    height: float
    email: str
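To see what the decorator generated, we can inspect the signature of the class. The snippet below is a small sketch that uses the standard inspect module, which this article doesn’t otherwise rely on; the email value is a made-up placeholder.

```python
import inspect
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int
    height: float
    email: str


# The generated __init__ takes the fields in the order they were declared.
print(inspect.signature(Person))

# Fields can be passed by position or by keyword.
person = Person('Joe', age=25, height=1.85, email='joe@example.com')
print(person.age)
```

The parameter order of the generated __init__ follows the order of the annotations in the class body, which is why the order of the attributes matters.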

We can also set default values to the class attributes:

@dataclass
class Person():
    name: str = 'Joe'
    age: int = 30
    height: float = 1.85
    email: str = '[email protected]'

print(Person())
    Person(name='Joe', age=30, height=1.85, email='[email protected]')

As a reminder, just as a function parameter without a default value can’t follow one that has a default, a data class can’t declare a non-default attribute after a default one, so this would throw an error:

@dataclass
class Person():
    name: str = 'Joe'
    age: int = 30
    height: float = 1.85
    email: str 

    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    ~\AppData\Local\Temp/ipykernel_5540/741473360.py in <module>
          1 @dataclass
    ----> 2 class Person():
          3     name: str = 'Joe'
          4     age: int = 30
          5     height: float = 1.85

    ~\anaconda3\lib\dataclasses.py in dataclass(cls, init, repr, eq, order, unsafe_hash, frozen)
       1019 
       1020     # We're called as @dataclass without parens.
    -> 1021     return wrap(cls)
       1022 
       1023 

    ~\anaconda3\lib\dataclasses.py in wrap(cls)
       1011 
       1012     def wrap(cls):
    -> 1013         return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
       1014 
       1015     # See if we're being called as @dataclass or @dataclass().

    ~\anaconda3\lib\dataclasses.py in _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
        925                 if f._field_type in (_FIELD, _FIELD_INITVAR)]
        926         _set_new_attribute(cls, '__init__',
    --> 927                            _init_fn(flds,
        928                                     frozen,
        929                                     has_post_init,

    ~\anaconda3\lib\dataclasses.py in _init_fn(fields, frozen, has_post_init, self_name, globals)
        502                 seen_default = True
        503             elif seen_default:
    --> 504                 raise TypeError(f'non-default argument {f.name!r} '
        505                                 'follows default argument')
        506 

    TypeError: non-default argument 'email' follows default argument
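One way to avoid this error is to declare the attributes without defaults first. The following is a minimal sketch of that reordering, not the only possible fix; the email value is a made-up placeholder.

```python
from dataclasses import dataclass


@dataclass
class Person:
    # Attributes without defaults must come first...
    email: str
    # ...followed by the attributes that have defaults.
    name: str = 'Joe'
    age: int = 30
    height: float = 1.85


# Only email is required; the rest fall back to their defaults.
person = Person('joe@example.com')
print(person)
```

Keep in mind that reordering the attributes also changes the order of the positional arguments, so email is now the first argument.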

Once the class is defined, it’s easy to instantiate a new object and access its attributes, just like with a standard class:

person = Person('Joe', 25, 1.85, '[email protected]')
print(person.name)
    Joe

So far we’ve used regular data types like string, integer, and float; we can also combine dataclass with the typing module to create attributes of any kind in the class. For instance, let’s add a house_coordinates attribute to the Person:

from typing import Tuple

@dataclass
class Person():
    name: str
    age: int
    height: float
    email: str
    house_coordinates: Tuple

print(Person('Joe', 25, 1.85, '[email protected]', (40.748441, -73.985664)))
    Person(name='Joe', age=25, height=1.85, email='[email protected]', house_coordinates=(40.748441, -73.985664))

Following the same logic, we can create a data class to hold multiple instances of the Person class:

from typing import List

@dataclass
class People():
    people: List[Person]

Notice that the people attribute in the People class is defined as a list of instances of the Person class. For example, we could instantiate an object of People like this:

joe = Person('Joe', 25, 1.85, '[email protected]', (40.748441, -73.985664))
mary = Person('Mary', 43, 1.67, '[email protected]', (-73.985664, 40.748441))

print(People([joe, mary]))
    People(people=[Person(name='Joe', age=25, height=1.85, email='[email protected]', house_coordinates=(40.748441, -73.985664)), Person(name='Mary', age=43, height=1.67, email='[email protected]', house_coordinates=(-73.985664, 40.748441))])

This lets us annotate an attribute with any single type we want, or with a combination of types.
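It may help to know that these annotations are hints only: dataclass doesn’t validate the values passed in. The sketch below uses a trimmed-down, hypothetical Person with a more specific Tuple[float, float] hint to illustrate this.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Person:
    name: str
    # A more specific hint: a pair of floats (latitude, longitude).
    house_coordinates: Tuple[float, float]


joe = Person('Joe', (40.748441, -73.985664))
print(joe.house_coordinates[0])

# Type hints are not checked at runtime, so this also runs without an error.
odd = Person('Joe', 'not a tuple')
print(odd.house_coordinates)
```

If we need validation, we have to add it ourselves or rely on a static type checker.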

Representation and Comparisons

As we mentioned earlier, dataclass implements not only the __init__ method, but several others, including the __repr__ method. In a regular class, we write this method ourselves to control how an object of the class is displayed.

For instance, we’d define the method as in the example below so that printing the object shows its attributes:

class Person():
    def __init__(self, name, age, height, email):
        self.name = name
        self.age = age
        self.height = height
        self.email = email

    def __repr__(self):
        return (f'{self.__class__.__name__}(name={self.name}, age={self.age}, height={self.height}, email={self.email})')

person = Person('Joe', 25, 1.85, '[email protected]')
print(person)
    Person(name=Joe, age=25, height=1.85, email=[email protected])

When using dataclass, however, there’s no need to write any of that:

@dataclass
class Person():
    name: str
    age: int
    height: float
    email: str    

person = Person('Joe', 25, 1.85, '[email protected]')
print(person)
    Person(name='Joe', age=25, height=1.85, email='[email protected]')

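Notice that without all that code, the output is nearly identical to the one from the standard Python class. The only difference is that the generated __repr__ shows each value through repr(), so strings appear in quotes.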

We can always override it if we want to customize the representation of our class:

@dataclass
class Person():
    name: str
    age: int
    height: float
    email: str

    def __repr__(self):
        return (f'''This is a {self.__class__.__name__} called {self.name}.''')

person = Person('Joe', 25, 1.85, '[email protected]')
print(person)
    This is a Person called Joe.

Notice that the output of the representation is customized.

When it comes to comparisons, the dataclasses module makes our lives easier. For example, we can directly compare two instances of a class just like this:

@dataclass
class Person():
    name: str = 'Joe'
    age: int = 30
    height: float = 1.85
    email: str = '[email protected]'

print(Person() == Person())
    True

Notice that we used default attributes to make the example shorter.

In this case, the comparison works because dataclass creates an __eq__ method behind the scenes, and that method compares the attributes of the two instances. Without the decorator, we’d have to create this method ourselves.

The same comparison would result in a different outcome if using a standard Python class, even though the two instances hold exactly the same values:

class Person():
    def __init__(self, name='Joe', age=30, height=1.85, email='[email protected]'):
        self.name = name
        self.age = age
        self.height = height
        self.email = email

print(Person() == Person())
    False

Without the dataclass decorator, that class doesn’t define how to test whether two instances are equal. So, by default, Python compares the identities of the objects (their ids), and, as we see below, they are different:

print(id(Person()))
print(id(Person()))
    1734438049008
    1734438050976

All this means that we’d have to write an __eq__ method that makes this comparison:

class Person():
    def __init__(self, name='Joe', age=30, height=1.85, email='[email protected]'):
        self.name = name
        self.age = age
        self.height = height
        self.email = email

    def __eq__(self, other):
        if isinstance(other, Person):
            return (self.name, self.age,
                    self.height, self.email) == (other.name, other.age,
                                                 other.height, other.email)
        return NotImplemented

print(Person() == Person())
    True

Now we see the two objects are equal to each other, but we had to write more code to get this result.
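For comparison, here is a short sketch of how the __eq__ generated by dataclass behaves in a few cases. The classes are trimmed down, and Robot is a hypothetical second class added only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str
    age: int


@dataclass
class Robot:  # hypothetical class with the same fields as Person
    name: str
    age: int


# Same class, same values: equal.
same = Person('Joe', 30) == Person('Joe', 30)
# Same class, one different value: not equal.
different_values = Person('Joe', 30) == Person('Joe', 31)
# Different classes are never equal, even with identical values.
different_classes = Person('Joe', 30) == Robot('Joe', 30)

print(same, different_values, different_classes)
```

The generated method only compares instances of exactly the same class, much like the isinstance check in the handwritten version above.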

The @dataclass Parameters 

As we saw above, when using the dataclass decorator, the __init__, __repr__, and __eq__ methods are implemented for us. Whether each of these methods is generated is controlled by the init, repr, and eq parameters of dataclass. These three parameters are True by default. If the class already defines one of these methods itself, the corresponding parameter is ignored.

However, we have other parameters of dataclass that we should look at before moving on:

  • order: enables sorting of the class as we’ll see in the next section. The default is False.
  • frozen: When True, the values inside the instance of the class can’t be modified after it’s created. The default is False.

There are a few other parameters that you can check in the documentation.
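As a quick illustration of how these parameters switch generated methods on and off, the sketch below sets eq=False on a trimmed-down Person. It’s meant only to show the effect of the parameter, not as a recommended setting.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Person:
    name: str = 'Joe'
    age: int = 30


first = Person()

# No __eq__ is generated, so == falls back to comparing identities.
print(first == Person())  # two distinct objects
print(first == first)     # the very same object

# __init__ and __repr__ are still generated as usual.
print(first)
```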

Sorting

When working with data, we often need to sort values. In our scenario, we may want to sort our different people based on some attribute. For that, we’ll use the order parameter of the dataclass decorator mentioned above, which enables sorting in the class:

@dataclass(order=True)
class Person():
    name: str
    age: int
    height: float
    email: str

When the order parameter is set to True, it automatically generates the __lt__ (less than), __le__ (less or equal), __gt__ (greater than), and __ge__ (greater or equal) methods used for sorting.

Let’s instantiate our joe and mary objects to see if one is greater than the other:

joe = Person('Joe', 25, 1.85, '[email protected]')
mary = Person('Mary', 43, 1.67, '[email protected]')

print(joe > mary)
    False

Python tells us that joe is not greater than mary, but based on what criteria? The class compares the objects as tuples containing their attributes, like this:

print(('Joe', 25, 1.85, '[email protected]') > ('Mary', 43, 1.67, '[email protected]'))
    False

As the letter "J" comes before "M", Python concludes that joe < mary. If the names were the same, it would move on to the next element in each tuple. As it stands, the objects are compared alphabetically by name. Although that can make sense depending on the problem we’re dealing with, we want to be able to control how the objects are sorted.

To achieve that, we’ll take advantage of two other features of the dataclasses module.

The first is the field function. This function is used to customize one attribute of a data class individually, which allows us to define new attributes that will depend on another attribute and will only be created after the object is instantiated.

In our sorting problem, we’ll use field to create a sort_index attribute in our class. This attribute is set only after the object is instantiated. Because it’s declared first, it’s the first value compared when dataclasses orders two objects:

from dataclasses import dataclass, field

@dataclass(order=True)
class Person():
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

The two arguments that we passed as False state that this attribute isn’t in the __init__ and that it shouldn’t be displayed when we call __repr__. There are other parameters in the field function that you can check in the documentation.
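As a brief aside, one of those other parameters is default_factory, which is the usual way to give an attribute a mutable default such as a list. The sketch below uses a hypothetical nicknames attribute that isn’t part of the Person class used elsewhere in this article.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    # default_factory builds a fresh list for every instance.
    nicknames: List[str] = field(default_factory=list)


joe = Person('Joe')
mary = Person('Mary')

# Changing one person's list does not affect the other's.
joe.nicknames.append('Joey')
print(joe.nicknames)
print(mary.nicknames)
```

Writing nicknames: List[str] = [] directly would raise a ValueError, because dataclass refuses mutable defaults like lists to prevent instances from sharing the same object.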

After we’ve declared this new attribute, we’ll use the second new tool: the __post_init__ method. As the name suggests, this method is executed right after the __init__ method. We’ll use __post_init__ to define the sort_index right after the creation of the object. As an example, let’s say we want to compare people based on their age. Here’s how:

@dataclass(order=True)
class Person():
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

    def __post_init__(self):
        self.sort_index = self.age

If we make the same comparison, the result now reflects that Joe is younger than Mary:

joe = Person('Joe', 25, 1.85, '[email protected]')
mary = Person('Mary', 43, 1.67, '[email protected]')

print(joe > mary)
    False

If we wanted to sort people by height, we’d use this code:

@dataclass(order=True)
class Person():
    sort_index: float = field(init=False, repr=False)
    name: str
    age: int
    height: float
    email: str

    def __post_init__(self):
        self.sort_index = self.height

joe = Person('Joe', 25, 1.85, '[email protected]')
mary = Person('Mary', 43, 1.67, '[email protected]')

print(joe > mary)
    True

Joe is taller than Mary. Notice that we set sort_index as a float.

We were able to implement sorting in our data class without the need to write multiple methods.
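Since the comparison methods are in place, built-in tools such as sorted() work on these objects too. Here’s a minimal sketch with a trimmed-down Person sorted by age; the third person, Ann, is made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(order=True)
class Person:
    sort_index: int = field(init=False, repr=False)
    name: str
    age: int

    def __post_init__(self):
        # Compare people by age first.
        self.sort_index = self.age


people = [Person('Mary', 43), Person('Joe', 25), Person('Ann', 31)]

# sorted() relies on the __lt__ method generated by order=True.
names_by_age = [person.name for person in sorted(people)]
print(names_by_age)
```

If two people have the same sort_index, the comparison falls through to the remaining attributes in the order they were declared.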

Working with Immutable Data Classes

Another parameter of @dataclass that we mentioned above is frozen. When set to True, frozen doesn’t allow us to modify the attributes of an object after it’s created.

With frozen=False, we can easily perform such modification:

@dataclass()
class Person():
    name: str
    age: int
    height: float
    email: str

joe = Person('Joe', 25, 1.85, '[email protected]')

joe.age = 35
print(joe)
    Person(name='Joe', age=35, height=1.85, email='[email protected]')

We created a Person object and then modified the age attribute without any problems.

However, when frozen is set to True, any attempt to modify the object throws an error:

@dataclass(frozen=True)
class Person():
    name: str
    age: int
    height: float
    email: str

joe = Person('Joe', 25, 1.85, '[email protected]')

joe.age = 35
print(joe)
    ---------------------------------------------------------------------------

    FrozenInstanceError                       Traceback (most recent call last)

    ~\AppData\Local\Temp/ipykernel_5540/2036839054.py in <module>
          8 joe = Person('Joe', 25, 1.85, '[email protected]')
          9 
    ---> 10 joe.age = 35
         11 print(joe)

    <string> in __setattr__(self, name, value)

    FrozenInstanceError: cannot assign to field 'age'

Notice that the error message states FrozenInstanceError.
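If we want to handle this situation instead of letting the program stop, we can import FrozenInstanceError from dataclasses and catch it. This is a minimal sketch with a trimmed-down Person.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Person:
    name: str
    age: int


joe = Person('Joe', 25)

try:
    joe.age = 35
except FrozenInstanceError as error:
    # Keep the message so we can report what went wrong.
    message = str(error)
    print(f'Could not modify joe: {message}')

# The original value is untouched.
print(joe.age)
```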

There’s a catch that makes it possible to modify the contents of an immutable data class. If our class contains a mutable attribute, this attribute can change even though the class is frozen. This may seem like it doesn’t make sense, but let’s look at an example.

Let’s recall the People class that we created earlier in this article, but now let’s make it immutable:

from typing import List

# Person is defined first so that People can refer to it.
@dataclass(frozen=True)
class Person():
    name: str
    age: int
    height: float
    email: str

@dataclass(frozen=True)
class People():
    people: List[Person]

We then create two instances of the Person class and use them to create an instance of People that we’ll name two_people:

joe = Person('Joe', 25, 1.85, '[email protected]')
mary = Person('Mary', 43, 1.67, '[email protected]')

two_people = People([joe, mary])
print(two_people)
    People(people=[Person(name='Joe', age=25, height=1.85, email='[email protected]'), Person(name='Mary', age=43, height=1.67, email='[email protected]')])

The people attribute in the People class is a list. We can easily access the values in this list in the two_people object:

print(two_people.people[0])
    Person(name='Joe', age=25, height=1.85, email='[email protected]')

So, even though both Person and People classes are immutable, the list is not, which means we can change the values in it:

two_people.people[0] = Person('Joe', 35, 1.85, '[email protected]')
print(two_people.people[0])
    Person(name='Joe', age=35, height=1.85, email='[email protected]')

Notice that the age is now 35.

We didn’t change the attributes of any object of the immutable classes, but we replaced the first element of the list with a different one, and the list is mutable.

Keep in mind that all the attributes of the class should also be immutable in order to safely work with immutable data classes.
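One possible way to follow this advice is to store the people in a tuple instead of a list, and to use the replace function from dataclasses when we need a modified copy. The sketch below uses trimmed-down versions of the classes and is only one of several options.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Person:
    name: str
    age: int


@dataclass(frozen=True)
class People:
    # A tuple cannot be modified in place, unlike a list.
    people: Tuple[Person, ...]


joe = Person('Joe', 25)
two_people = People((joe, Person('Mary', 43)))

try:
    two_people.people[0] = Person('Joe', 35)
except TypeError as error:
    print(f'Tuples are immutable: {error}')

# replace() returns a new object and leaves the original untouched.
older_joe = replace(joe, age=35)
print(joe.age, older_joe.age)
```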

Inheritance with dataclasses

The dataclasses module also supports inheritance, which means we can create a data class that uses the attributes of another data class. Still using our Person class, we’ll create a new Employee class that inherits all the attributes from Person.

So we have Person:

@dataclass(order=True)
class Person():
    name: str
    age: int
    height: float
    email: str

And the new Employee class:

@dataclass(order=True)
class Employee(Person):
    salary: int
    departament: str

Now we can create an object of the Employee class using all the attributes of the Person class:

print(Employee('Joe', 25, 1.85, '[email protected]', 100000, 'Marketing'))
    Employee(name='Joe', age=25, height=1.85, email='[email protected]', salary=100000, departament='Marketing')

From now on we can use everything we saw in this article in the Employee class as well.

Be careful with default attributes, though. Let’s say we have default attributes in Person, but not in Employee. This scenario, as in the code below, raises an error:

@dataclass
class Person():
    name: str = 'Joe'
    age: int = 30
    height: float = 1.85
    email: str = '[email protected]'

@dataclass(order=True)
class Employee(Person):
    salary: int
    departament: str

print(Employee('Joe', 25, 1.85, '[email protected]', 100000, 'Marketing'))

    ---------------------------------------------------------------------------

    TypeError                                 Traceback (most recent call last)

    ~\AppData\Local\Temp/ipykernel_5540/1937366284.py in <module>
          9 
         10 @dataclass(order=True)
    ---> 11 class Employee(Person):
         12     salary: int
         13     departament: str

    ~\anaconda3\lib\dataclasses.py in wrap(cls)
       1011 
       1012     def wrap(cls):
    -> 1013         return _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
       1014 
       1015     # See if we're being called as @dataclass or @dataclass().

    ~\anaconda3\lib\dataclasses.py in _process_class(cls, init, repr, eq, order, unsafe_hash, frozen)
        925                 if f._field_type in (_FIELD, _FIELD_INITVAR)]
        926         _set_new_attribute(cls, '__init__',
    --> 927                            _init_fn(flds,
        928                                     frozen,
        929                                     has_post_init,

    ~\anaconda3\lib\dataclasses.py in _init_fn(fields, frozen, has_post_init, self_name, globals)
        502                 seen_default = True
        503             elif seen_default:
    --> 504                 raise TypeError(f'non-default argument {f.name!r} '
        505                                 'follows default argument')
        506 

    TypeError: non-default argument 'salary' follows default argument

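The attributes of the derived class are placed after those of the base class in the generated __init__. So, if the base class has default attributes, all the attributes in the class derived from it must have default values too.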
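Following that rule, a version of the example that works could look like the sketch below. The classes are trimmed down, and the default values for salary and departament are placeholders chosen only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    name: str = 'Joe'
    age: int = 30


@dataclass
class Employee(Person):
    # Placeholder defaults chosen only for this example.
    salary: int = 0
    departament: str = 'Unassigned'


# All values given explicitly, base class attributes first.
employee = Employee('Joe', 25, 100000, 'Marketing')
print(employee)

# No values given, so every attribute falls back to its default.
print(Employee())
```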

Conclusion

In this article, we saw how the dataclasses module is a very powerful tool to create data classes in a quick, intuitive way. Although we’ve seen a lot in this article, the module contains many more tools, and there’s always more to learn about it.

So far, we’ve learned how to:

  • Define a class using dataclasses

  • Use default attributes and their rules

  • Create a representation method

  • Compare data classes

  • Sort data classes

  • Use inheritance with data classes

  • Work with immutable data classes

Otávio Simões Silveira

About the author

Otávio is an economist and data scientist from Brazil. In his free time, he writes about Python and Data Science on the internet. You can find him on LinkedIn.
