In the previous tutorial, we learned about OOP: classes, inheritance, and magic methods. Now let’s learn about dataclasses and Pydantic — modern ways to model data without writing boilerplate code.

Regular classes need a lot of repetitive code: __init__, __repr__, __eq__. Dataclasses generate all of this for you. Pydantic adds validation on top. By the end of this tutorial, you will know when to use each one and how they compare.

The Problem: Boilerplate

With a regular class, you write a lot of repeated code just to store three fields:

class User:
    def __init__(self, name, email, active=True):
        self.name = name
        self.email = email
        self.active = active

    def __repr__(self):
        return f"User(name={self.name!r}, email={self.email!r}, active={self.active!r})"

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.name == other.name
                and self.email == other.email
                and self.active == other.active)

That is 13 lines of code for a class that just stores three fields. Every new field means updating __init__, __repr__, and __eq__, which is tedious and error-prone. Dataclasses solve this problem.

Basic Dataclass

Add the @dataclass decorator and define your fields with type hints:

from dataclasses import dataclass

@dataclass
class Point:
    """A 2D point with x and y coordinates."""
    x: float
    y: float

Python automatically generates __init__, __repr__, and __eq__ for you:

p1 = Point(3.0, 4.0)
p2 = Point(3.0, 4.0)
p3 = Point(1.0, 2.0)

print(p1)         # Point(x=3.0, y=4.0)
print(p1 == p2)   # True — auto-generated __eq__
print(p1 == p3)   # False
print(p1.x)       # 3.0

Two field declarations replace all of the hand-written methods. The @dataclass decorator reads the type-annotated fields and generates the boilerplate for you.
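To see what the decorator actually produced, you can inspect the class with the standard library. The sketch below is only an illustration: it redefines Point so it runs on its own, and it also shows that the type hints are not checked at runtime.

```python
from dataclasses import dataclass, fields
import inspect


@dataclass
class Point:
    """A 2D point with x and y coordinates."""
    x: float
    y: float


# The generated __init__ takes one parameter per annotated field.
init_signature = str(inspect.signature(Point.__init__))
print(init_signature)

# fields() lists the fields the decorator discovered, in declaration order.
field_names = [f.name for f in fields(Point)]
print(field_names)  # ['x', 'y']

# Type hints are not enforced by dataclasses: this call succeeds.
unchecked = Point("three", "four")
print(unchecked)  # Point(x='three', y='four')
```

The last call is the gap that Pydantic fills later in this tutorial.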

Default Values and field()

Add default values just like function parameters. Parameters with defaults must come after parameters without defaults:

from dataclasses import dataclass, field

@dataclass
class User:
    """A user with name, email, and optional settings."""
    name: str
    email: str
    active: bool = True
    tags: list[str] = field(default_factory=list)

    def display_name(self) -> str:
        """Return formatted display name."""
        status = "active" if self.active else "inactive"
        return f"{self.name} ({status})"

Use field(default_factory=list) for mutable defaults. This creates a new list for each instance. Never write tags: list[str] = [] in a dataclass: the decorator rejects list, dict, and set defaults with a ValueError, precisely to protect you from the shared-list gotcha covered in Tutorial #5.

u1 = User("Alex", "alex@example.com")
u2 = User("Sam", "sam@example.com")

u1.tags.append("python")
print(u1.tags)  # ['python']
print(u2.tags)  # [] — independent list, not shared

The field() function gives you more control over each field. Some useful options:

from dataclasses import dataclass, field

@dataclass
class Config:
    name: str
    debug: bool = False
    tags: list[str] = field(default_factory=list)
    internal_id: int = field(default=0, init=False, repr=False)  # Not in constructor or repr

| Option | Description |
|---|---|
| default | Default value |
| default_factory | Function that creates the default (for mutable types) |
| init | Include in __init__ (default: True) |
| repr | Include in __repr__ (default: True) |
| compare | Include in __eq__ (default: True) |
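Here is a small sketch of the repr and compare options in action. The Account class and its field names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical example class; not part of the tutorial's code."""
    username: str
    # Hidden from the repr so it does not leak into logs.
    password_hash: str = field(repr=False)
    # Ignored by __eq__, so accounts compare equal regardless of it.
    login_count: int = field(default=0, compare=False)


a = Account("alex", "abc123", login_count=5)
b = Account("alex", "abc123", login_count=9)

print(a)       # Account(username='alex', login_count=5)
print(a == b)  # True
```

Use compare=False sparingly: it also changes what "equal" means everywhere the object is compared.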

Frozen Dataclasses (Immutable)

Use frozen=True to make instances immutable. Once created, you cannot change any field:

@dataclass(frozen=True)
class Color:
    """An immutable color with RGB values."""
    red: int
    green: int
    blue: int

    def hex_code(self) -> str:
        """Return the hex color code."""
        return f"#{self.red:02x}{self.green:02x}{self.blue:02x}"

Frozen dataclasses cannot be changed after creation:

red = Color(255, 0, 0)
print(red.hex_code())  # #ff0000

red.red = 128  # FrozenInstanceError (a subclass of AttributeError): cannot assign to field 'red'

Frozen dataclasses are also hashable, as long as every field is hashable, so you can use them as dictionary keys or in sets. With frozen=True and the default eq=True, @dataclass generates a __hash__ based on the fields:

colors = {Color(255, 0, 0): "red", Color(0, 255, 0): "green"}
print(colors[Color(255, 0, 0)])  # red

Use frozen dataclasses when your data should never change after creation — configuration values, coordinates, colors, and similar objects.
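The following sketch pulls both properties together: assignment fails, and equal values collapse in a set. It redefines Color so it can run on its own; the variable names are just for illustration.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Color:
    red: int
    green: int
    blue: int


crimson = Color(220, 20, 60)

try:
    crimson.red = 0
except FrozenInstanceError as exc:
    # FrozenInstanceError subclasses AttributeError, so either can be caught.
    caught = exc
    print(type(exc).__name__, "-", exc)

# Equal field values give equal hashes, so duplicates collapse in a set.
unique = {Color(220, 20, 60), Color(220, 20, 60), Color(0, 0, 0)}
print(len(unique))  # 2
```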

__post_init__: Run Code After Initialization

Use __post_init__ to calculate derived fields or run validation after __init__ completes:

@dataclass
class Rectangle:
    """A rectangle that calculates area after initialization."""
    width: float
    height: float
    area: float = field(init=False)  # Not a constructor parameter

    def __post_init__(self) -> None:
        """Calculate area after __init__ runs."""
        self.area = self.width * self.height

Setting init=False means area is not part of the constructor. It is calculated once, right after __init__ runs, and is not recalculated if width or height change later:

rect = Rectangle(5.0, 3.0)
print(rect)       # Rectangle(width=5.0, height=3.0, area=15.0)
print(rect.area)  # 15.0

You can also use __post_init__ for validation:

@dataclass
class Product:
    name: str
    price: float

    def __post_init__(self):
        if self.price < 0:
            raise ValueError(f"Price cannot be negative: {self.price}")
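A short sketch of how calling code might handle that validation, and of its main limitation. It repeats the Product class so it is self-contained.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float

    def __post_init__(self) -> None:
        if self.price < 0:
            raise ValueError(f"Price cannot be negative: {self.price}")


try:
    Product("Broken", -5.0)
except ValueError as exc:
    message = str(exc)
    print(message)  # Price cannot be negative: -5.0

# __post_init__ only runs at construction time,
# so later assignments are not checked.
item = Product("Laptop", 999.99)
item.price = -1.0
print(item.price)  # -1.0
```

If the invariant has to hold for the whole lifetime of the object, combining the check with frozen=True is one option.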

Slots: Better Memory Usage

Use slots=True (Python 3.10+) for better memory usage and faster attribute access:

@dataclass(slots=True)
class Product:
    """A product with slots for better memory usage."""
    name: str
    price: float
    quantity: int = 0

    def total_value(self) -> float:
        """Calculate total value of stock."""
        return self.price * self.quantity

By default, Python stores instance attributes in a dictionary (__dict__). With slots, Python uses a more efficient fixed-size structure. This saves memory when you have thousands of instances.

The trade-off: you cannot add new attributes after creation:

p = Product("Laptop", 999.99, 5)
print(p.total_value())  # 4999.95

p.color = "silver"  # AttributeError — slots prevent this

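The savings depend on the Python version and on how many fields the class has, so measure with your own data. Figures of roughly 20-30% less memory are often quoted, along with slightly faster attribute access. Use slots when you create many instances of the same class.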

Ordering: Make Dataclasses Sortable

Use order=True to auto-generate comparison methods (__lt__, __le__, __gt__, __ge__):

@dataclass(order=True)
class Student:
    """A student that can be sorted by grade."""
    sort_index: float = field(init=False, repr=False)
    name: str
    grade: float

    def __post_init__(self) -> None:
        """Use grade for sorting."""
        self.sort_index = self.grade

The sort_index field controls the sorting order. It is not part of the constructor or the repr. Dataclasses compare instances as tuples of their fields in declaration order, so sort_index (the first field) decides the order, and ties fall through to name and then grade:

students = [
    Student("Alex", 85.0),
    Student("Sam", 92.0),
    Student("Jordan", 78.0),
]
for s in sorted(students):
    print(f"  {s.name}: {s.grade}")
# Jordan: 78.0
# Alex: 85.0
# Sam: 92.0

Without the sort_index trick, the dataclass would sort by name first (alphabetically), which is not what we want.
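If you only need sorting in one place, a plain key function is a simpler alternative to order=True. This is a different technique from the sort_index trick above, shown here as a sketch with a trimmed-down Student class.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Student:
    name: str
    grade: float


students = [Student("Alex", 85.0), Student("Sam", 92.0), Student("Jordan", 78.0)]

# Sort by one attribute without generating any comparison methods.
by_grade = sorted(students, key=attrgetter("grade"))
print([s.name for s in by_grade])  # ['Jordan', 'Alex', 'Sam']

top_first = sorted(students, key=attrgetter("grade"), reverse=True)
print([s.name for s in top_first])  # ['Sam', 'Alex', 'Jordan']
```

order=True is the better fit when the class has one natural ordering that every caller should share.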

Pydantic: Validated Data Models

Pydantic is a library that adds data validation to your models. It checks types and constraints when you create an instance. The examples below use the Pydantic v2 API (field_validator, model_dump, model_validate). Install it with:

pip install pydantic

Basic Pydantic Model

from pydantic import BaseModel, Field, field_validator
from typing import Optional

class UserProfile(BaseModel):
    """A validated user profile using Pydantic."""
    name: str
    email: str
    age: int = Field(ge=0, le=150, description="Age in years")
    bio: Optional[str] = None

    @field_validator("name")
    @classmethod
    def name_must_not_be_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("Name cannot be empty.")
        return v.strip()

    @field_validator("email")
    @classmethod
    def email_must_contain_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("Email must contain @.")
        return v.lower()

Pydantic validates data when you create an instance:

# Valid data — works fine
profile = UserProfile(name="Alex", email="Alex@Example.com", age=25)
print(profile.email)  # alex@example.com (lowered by validator)
print(profile.bio)    # None (optional field)

# Invalid data — raises ValidationError
UserProfile(name="Alex", email="invalid", age=25)   # No @ in email
UserProfile(name="Alex", email="a@b.com", age=-1)   # Age below 0
UserProfile(name="   ", email="a@b.com", age=25)    # Empty name

Each invalid call raises a ValidationError with a detailed error message. In a web application, you would catch this and return an error response to the user.
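A minimal sketch of catching the error and reading its details, assuming Pydantic v2 is installed. SignupForm is a hypothetical model used only here. Note that Pydantic reports every failing field in one exception rather than stopping at the first.

```python
from pydantic import BaseModel, Field, ValidationError


class SignupForm(BaseModel):
    """Hypothetical model used only for this illustration."""
    name: str
    age: int = Field(ge=0, le=150)


try:
    SignupForm(age=-1)  # name is missing and age violates ge=0
except ValidationError as exc:
    # errors() returns one dict per failed field.
    problems = [(err["loc"], err["msg"]) for err in exc.errors()]
    for loc, msg in problems:
        print(loc, msg)
```

Each entry's loc is a tuple naming the field, which makes it straightforward to map errors back to form inputs.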

Key differences from dataclasses:

  • Validation happens automatically — Pydantic checks types and constraints at creation time
  • Field(ge=0, le=150) — built-in constraints (ge = greater or equal, le = less or equal)
  • @field_validator — custom validation logic for specific fields
  • Type coercion — Pydantic converts types when possible (the string "25" becomes the integer 25)
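  • Optional[str] lets a field accept None. The bio field defaults to None only because the model assigns = None; in Pydantic v2, an Optional field without a default is still required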
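The coercion and Optional behavior from the list above can be sketched like this, assuming Pydantic v2 in its default (non-strict) mode. Reading is a hypothetical model used only for this illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Reading(BaseModel):
    """Hypothetical model used only for this illustration."""
    sensor: str
    value: int
    note: Optional[str]  # Optional type, but no default: still required


# The numeric string is converted to an int.
reading = Reading(sensor="s1", value="25", note=None)
print(type(reading.value).__name__, reading.value)  # int 25

# A string that is not a number cannot be coerced.
try:
    Reading(sensor="s1", value="abc", note=None)
    coercion_failed = False
except ValidationError:
    coercion_failed = True
print(coercion_failed)  # True

# Leaving out the Optional field is an error because it has no default.
try:
    Reading(sensor="s1", value=1)
except ValidationError as exc:
    missing = [err["loc"] for err in exc.errors()]
    print(missing)  # [('note',)]
```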

Nested Models

Pydantic supports nested models naturally:

class Address(BaseModel):
    """A street address."""
    street: str
    city: str
    country: str = "Germany"

class Employee(BaseModel):
    """An employee with a nested address."""
    name: str
    role: str
    address: Address
    skills: list[str] = []

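You can create nested models from dictionaries, and Pydantic handles the conversion automatically. The skills: list[str] = [] default is safe here: unlike dataclasses, Pydantic copies a mutable default for each instance.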

data = {
    "name": "Sam",
    "role": "Developer",
    "address": {"street": "Main St 1", "city": "Berlin"},
}
emp = Employee.model_validate(data)
print(emp.address.city)     # Berlin
print(emp.address.country)  # Germany (default)
print(emp.skills)           # [] (default)

model_dump and model_validate

Convert between Pydantic models and dictionaries:

# Model to dict
data = profile.model_dump()
print(data)
# {'name': 'Alex', 'email': 'alex@example.com', 'age': 25, 'bio': None}

# Dict to model
restored = UserProfile.model_validate(data)
print(restored.name)  # Alex

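You can also exclude specific fields or drop fields that are None (the include parameter works the other way around and keeps only the fields you name):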

data = profile.model_dump(exclude={"bio"})
# {'name': 'Alex', 'email': 'alex@example.com', 'age': 25}

data = profile.model_dump(exclude_none=True)
# {'name': 'Alex', 'email': 'alex@example.com', 'age': 25}

These methods are essential for working with APIs, databases, and JSON files.

JSON Serialization

Pydantic can also convert to and from JSON strings:

json_string = profile.model_dump_json()
# '{"name":"Alex","email":"alex@example.com","age":25,"bio":null}'

profile = UserProfile.model_validate_json(json_string)

Dataclass vs Pydantic vs Regular Class

| Feature | Regular Class | Dataclass | Pydantic |
|---|---|---|---|
| Auto __init__ | No | Yes | Yes |
| Auto __repr__ | No | Yes | Yes |
| Auto __eq__ | No | Yes | Yes |
| Type validation | No | No | Yes |
| Type coercion | No | No | Yes |
| Default values | Manual | Easy | Easy |
| Immutability | Manual | frozen=True | model_config |
| JSON serialization | Manual | Manual | Built-in |
| External dependency | No | No | Yes (pydantic) |
| Performance | Fast | Fast | Slower (validation overhead) |

Use this guide to choose:

  • Regular class — when you need complex behavior, custom initialization logic, or inheritance hierarchies
  • Dataclass — when you mainly store data with some methods, and you do not need validation. Great for internal data structures.
  • Pydantic — when you need validation, type coercion, or work with external data (API requests, config files, database records, user input)

In practice, many Python projects use both: dataclasses for internal data structures and Pydantic for API boundaries and configuration.
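One way that split can look is sketched below, assuming Pydantic v2. OrderRequest and Order are hypothetical names: the Pydantic model validates untrusted input at the boundary, and a frozen dataclass carries the clean data through the rest of the program.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field


class OrderRequest(BaseModel):
    """Boundary model: validates untrusted input (hypothetical example)."""
    sku: str
    quantity: int = Field(ge=1)


@dataclass(frozen=True)
class Order:
    """Internal model: plain data, no validation overhead."""
    sku: str
    quantity: int


payload = {"sku": "ABC-1", "quantity": "3"}  # e.g. parsed from a request body

request = OrderRequest.model_validate(payload)  # validates and coerces "3" -> 3
order = Order(**request.model_dump())

print(order)  # Order(sku='ABC-1', quantity=3)
```

The conversion works with plain unpacking here because both classes use the same field names; nested models would need an explicit mapping step.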

Dataclass Inheritance

Dataclasses support inheritance. The child class gets all fields from the parent:

@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee(Person):
    company: str
    role: str = "Developer"

emp = Employee("Alex", 30, "Acme Corp")
print(emp)  # Employee(name='Alex', age=30, company='Acme Corp', role='Developer')

The child class fields come after the parent class fields in the constructor. Be careful with defaults: if the parent has a field with a default, every child field must also have a default, otherwise the class definition fails with a TypeError (the same rule as function parameters).

Converting Between Dataclass and Dict

The dataclasses module provides functions to convert instances to dictionaries and tuples:

from dataclasses import dataclass, asdict, astuple

@dataclass
class Point:
    x: float
    y: float

p = Point(3.0, 4.0)
print(asdict(p))    # {'x': 3.0, 'y': 4.0}
print(astuple(p))   # (3.0, 4.0)

To convert back from a dict, unpack it:

data = {'x': 3.0, 'y': 4.0}
point = Point(**data)

Note: asdict creates a deep copy. Nested dataclasses are also converted to dicts. For Pydantic, use model_dump() and model_validate() instead.
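A short sketch of what the deep copy and the nested conversion mean in practice. Polygon is a hypothetical container class added for this illustration.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Polygon:
    """Hypothetical container class for this illustration."""
    name: str
    corners: list[Point] = field(default_factory=list)


triangle = Polygon("triangle", [Point(0.0, 0.0), Point(1.0, 0.0), Point(0.0, 1.0)])
as_dict = asdict(triangle)
print(as_dict["corners"][1])  # {'x': 1.0, 'y': 0.0}

# Mutating the dict does not touch the original object.
as_dict["corners"].append({"x": 9.0, "y": 9.0})
print(len(triangle.corners))  # 3

# Going back is manual: ** unpacking does not rebuild nested dataclasses.
shallow = Polygon(**asdict(triangle))
print(type(shallow.corners[0]).__name__)  # dict
```

That last line is a common surprise: the round trip through a dict is only lossless for flat dataclasses.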

Common Mistakes

Mutable Defaults Without field()

# BAD: raises ValueError as soon as the class is defined
@dataclass
class Team:
    members: list[str] = []

# GOOD — each instance gets its own list
@dataclass
class Team:
    members: list[str] = field(default_factory=list)

This is the same mutable default gotcha from functions, but dataclasses refuse to create the problem: @dataclass raises a ValueError at class-definition time if you use a list, dict, or set as a default without field().
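You can confirm this by wrapping the class definition in a try block. In this sketch, BadTeam is a hypothetical name for the rejected version.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class BadTeam:
        members: list[str] = []
except ValueError as exc:
    # The error is raised while the class is being created, not on first use.
    mutable_default_error = str(exc)
    print(mutable_default_error)


@dataclass
class Team:
    members: list[str] = field(default_factory=list)


red, blue = Team(), Team()
red.members.append("Alex")
print(red.members, blue.members)  # ['Alex'] []
```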

Confusing Frozen and Regular Dataclasses

@dataclass(frozen=True)
class Config:
    name: str
    debug: bool

config = Config("app", True)
config.debug = False  # AttributeError! Frozen means immutable.

If you need to “change” a frozen dataclass, create a new one:

from dataclasses import replace

new_config = replace(config, debug=False)
print(new_config)  # Config(name='app', debug=False)

The replace() function creates a copy with the specified fields changed. The original is untouched.

Source Code

You can find the code for this tutorial on GitHub:

kemalcodes/python-tutorial — tutorial-10-dataclasses

Run the examples:

python src/py10_dataclasses.py

Run the tests:

python -m pytest tests/test_py10.py -v

What’s Next?

In the next tutorial, we will learn about error handling — try/except, custom exceptions, and the patterns that make your code more robust.