In the previous tutorial, we learned about OOP: classes, inheritance, and magic methods. Now let’s learn about dataclasses and Pydantic — modern ways to model data without writing boilerplate code.
Regular classes need a lot of repetitive code: __init__, __repr__, __eq__. Dataclasses generate all of this for you. Pydantic adds validation on top. By the end of this tutorial, you will know when to use each one and how they compare.
The Problem: Boilerplate
With a regular class, you write a lot of repeated code just to store three fields:
class User:
    def __init__(self, name, email, active=True):
        self.name = name
        self.email = email
        self.active = active

    def __repr__(self):
        return f"User(name={self.name!r}, email={self.email!r}, active={self.active!r})"

    def __eq__(self, other):
        if not isinstance(other, User):
            return NotImplemented
        return (self.name == other.name
                and self.email == other.email
                and self.active == other.active)
That is 14 lines for a class that just stores three fields. Every new field means updating __init__, __repr__, and __eq__. This is tedious and error-prone. Dataclasses solve this problem.
Basic Dataclass
Add the @dataclass decorator and define your fields with type hints:
from dataclasses import dataclass

@dataclass
class Point:
    """A 2D point with x and y coordinates."""
    x: float
    y: float
Python automatically generates __init__, __repr__, and __eq__ for you:
p1 = Point(3.0, 4.0)
p2 = Point(3.0, 4.0)
p3 = Point(1.0, 2.0)
print(p1) # Point(x=3.0, y=4.0)
print(p1 == p2) # True — auto-generated __eq__
print(p1 == p3) # False
print(p1.x) # 3.0
Two field declarations replace fourteen lines of hand-written methods. The @dataclass decorator reads the type-annotated fields and generates the boilerplate methods for you.
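To see which names the decorator actually picks up, the standard library's fields() helper can be used. The sketch below adds an unannotated label attribute (a hypothetical addition, not part of the original Point) to show that only annotated names become fields:

```python
from dataclasses import dataclass, fields


@dataclass
class Point:
    x: float
    y: float
    label = "origin"  # no annotation, so this stays a plain class attribute


# fields() lists only the annotated names that @dataclass turned into fields.
field_names = [f.name for f in fields(Point)]
print(field_names)  # ['x', 'y']

# The generated __init__ and __repr__ use x and y, but not label.
p = Point(1.0, 2.0)
print(p)        # Point(x=1.0, y=2.0)
print(p.label)  # origin
```

Forgetting the type hint is an easy mistake to make, and this check is a quick way to spot it.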
Default Values and field()
Add default values just like function parameters. Parameters with defaults must come after parameters without defaults:
from dataclasses import dataclass, field

@dataclass
class User:
    """A user with name, email, and optional settings."""
    name: str
    email: str
    active: bool = True
    tags: list[str] = field(default_factory=list)

    def display_name(self) -> str:
        """Return formatted display name."""
        status = "active" if self.active else "inactive"
        return f"{self.name} ({status})"
Use field(default_factory=list) for mutable defaults. This creates a new list for each instance. Never write tags: list[str] = [] as a default: in a regular class that would share one list between all instances, and @dataclass rejects it outright with a ValueError. This is the same mutable default gotcha from Tutorial #5.
u1 = User("Alex", "alex@example.com")
u2 = User("Sam", "sam@example.com")
u1.tags.append("python")
print(u1.tags) # ['python']
print(u2.tags) # [] — independent list, not shared
The field() function gives you more control over each field. Some useful options:
from dataclasses import dataclass, field

@dataclass
class Config:
    name: str
    debug: bool = False
    tags: list[str] = field(default_factory=list)
    # Not in constructor or repr; the default keeps the attribute defined
    internal_id: int = field(default=0, init=False, repr=False)
| Option | Description |
|---|---|
| default | Default value |
| default_factory | Function that creates the default (for mutable types) |
| init | Include in __init__ (default: True) |
| repr | Include in __repr__ (default: True) |
| compare | Include in __eq__ (default: True) |
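The repr and compare options are easiest to understand with a small example. The Account class below is a hypothetical illustration, not part of the tutorial's source code:

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    username: str
    # Hidden from __repr__ so it does not leak into logs.
    password_hash: str = field(repr=False)
    # Ignored by __eq__, so the login count does not affect equality.
    login_count: int = field(default=0, compare=False)


a = Account("alex", "hash-1", login_count=3)
b = Account("alex", "hash-1", login_count=7)

print(a)       # Account(username='alex', login_count=3)
print(a == b)  # True
```

Both accounts compare equal because only username and password_hash take part in the generated __eq__.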
Frozen Dataclasses (Immutable)
Use frozen=True to make instances immutable. Once created, you cannot change any field:
@dataclass(frozen=True)
class Color:
    """An immutable color with RGB values."""
    red: int
    green: int
    blue: int

    def hex_code(self) -> str:
        """Return the hex color code."""
        return f"#{self.red:02x}{self.green:02x}{self.blue:02x}"
Frozen dataclasses cannot be changed after creation:
red = Color(255, 0, 0)
print(red.hex_code()) # #ff0000
red.red = 128 # AttributeError: cannot assign to field 'red'
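Frozen dataclasses are also hashable (as long as every field value is hashable), so you can use them as dictionary keys or in sets. With frozen=True and the default eq=True, @dataclass generates a __hash__ based on the fields, which is safe because the fields cannot change: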
colors = {Color(255, 0, 0): "red", Color(0, 255, 0): "green"}
print(colors[Color(255, 0, 0)]) # red
Use frozen dataclasses when your data should never change after creation — configuration values, coordinates, colors, and similar objects.
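A short sketch of both properties, using a trimmed copy of the Color class. It shows that equal frozen instances collapse to one entry in a set, and that the error raised on assignment is dataclasses.FrozenInstanceError, which is a subclass of AttributeError:

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Color:
    red: int
    green: int
    blue: int


# Equal frozen instances hash the same, so the set keeps only one of them.
palette = {Color(255, 0, 0), Color(255, 0, 0), Color(0, 0, 255)}
print(len(palette))  # 2

# Assigning to a field raises FrozenInstanceError.
try:
    Color(255, 0, 0).red = 128
    mutated = True
except FrozenInstanceError:
    mutated = False

print(mutated)  # False
print(issubclass(FrozenInstanceError, AttributeError))  # True
```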
__post_init__: Run Code After Initialization
Use __post_init__ to calculate derived fields or run validation after __init__ completes:
@dataclass
class Rectangle:
    """A rectangle that calculates area after initialization."""
    width: float
    height: float
    area: float = field(init=False) # Not a constructor parameter

    def __post_init__(self) -> None:
        """Calculate area after __init__ runs."""
        self.area = self.width * self.height
The init=False means area is not part of the constructor. It is calculated automatically:
rect = Rectangle(5.0, 3.0)
print(rect) # Rectangle(width=5.0, height=3.0, area=15.0)
print(rect.area) # 15.0
You can also use __post_init__ for validation:
@dataclass
class Product:
    name: str
    price: float

    def __post_init__(self):
        if self.price < 0:
            raise ValueError(f"Price cannot be negative: {self.price}")
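One limitation is worth seeing in action: __post_init__ runs only when the instance is constructed, so a later assignment is not checked. A minimal sketch using the same Product class:

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float

    def __post_init__(self) -> None:
        if self.price < 0:
            raise ValueError(f"Price cannot be negative: {self.price}")


# Construction with a bad value is rejected.
try:
    Product("Laptop", -5.0)
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # Price cannot be negative: -5.0

# A later assignment bypasses __post_init__ and is not validated.
p = Product("Laptop", 10.0)
p.price = -1.0
print(p.price)  # -1.0
```

If values must stay valid after creation, combining this check with frozen=True is one option.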
Slots: Better Memory Usage
Use slots=True (Python 3.10+) for better memory usage and faster attribute access:
@dataclass(slots=True)
class Product:
    """A product with slots for better memory usage."""
    name: str
    price: float
    quantity: int = 0

    def total_value(self) -> float:
        """Calculate total value of stock."""
        return self.price * self.quantity
By default, Python stores instance attributes in a dictionary (__dict__). With slots, Python uses a more efficient fixed-size structure. This saves memory when you have thousands of instances.
The trade-off: you cannot add new attributes after creation:
p = Product("Laptop", 999.99, 5)
print(p.total_value()) # 4999.95
p.color = "silver" # AttributeError — slots prevent this
In benchmarks, slots can save 20-30% memory and provide slightly faster attribute access. Use them when you create many instances of the same class.
Ordering: Make Dataclasses Sortable
Use order=True to auto-generate comparison methods (__lt__, __le__, __gt__, __ge__):
@dataclass(order=True)
class Student:
    """A student that can be sorted by grade."""
    sort_index: float = field(init=False, repr=False)
    name: str
    grade: float

    def __post_init__(self) -> None:
        """Use grade for sorting."""
        self.sort_index = self.grade
The sort_index field controls the sorting order. It is not part of the constructor or the repr. Dataclasses compare fields in order, so sort_index (the first field) determines the sort:
students = [
    Student("Alex", 85.0),
    Student("Sam", 92.0),
    Student("Jordan", 78.0),
]
for s in sorted(students):
    print(f" {s.name}: {s.grade}")
# Jordan: 78.0
# Alex: 85.0
# Sam: 92.0
Without the sort_index trick, the dataclass would sort by name first (alphabetically), which is not what we want.
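If you only need to sort in one or two places, an alternative is to skip order=True and pass a key to sorted(). This is a different technique from the sort_index approach above, sketched here with a plain Student dataclass:

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class Student:
    name: str
    grade: float


students = [Student("Alex", 85.0), Student("Sam", 92.0), Student("Jordan", 78.0)]

# Sort by grade without defining any comparison methods on the class.
by_grade = sorted(students, key=attrgetter("grade"))
print([s.name for s in by_grade])  # ['Jordan', 'Alex', 'Sam']

# Highest grade first.
top_first = sorted(students, key=attrgetter("grade"), reverse=True)
print([s.name for s in top_first])  # ['Sam', 'Alex', 'Jordan']
```

The key-based approach keeps the class simpler, while order=True is more convenient when instances are compared in many places.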
Pydantic: Validated Data Models
Pydantic is a library that adds data validation to your models. It checks types and constraints when you create an instance. Install it with:
pip install pydantic
Basic Pydantic Model
from pydantic import BaseModel, Field, field_validator
from typing import Optional

class UserProfile(BaseModel):
    """A validated user profile using Pydantic."""
    name: str
    email: str
    age: int = Field(ge=0, le=150, description="Age in years")
    bio: Optional[str] = None

    @field_validator("name")
    @classmethod
    def name_must_not_be_empty(cls, v: str) -> str:
        if not v.strip():
            raise ValueError("Name cannot be empty.")
        return v.strip()

    @field_validator("email")
    @classmethod
    def email_must_contain_at(cls, v: str) -> str:
        if "@" not in v:
            raise ValueError("Email must contain @.")
        return v.lower()
Pydantic validates data when you create an instance:
# Valid data — works fine
profile = UserProfile(name="Alex", email="Alex@Example.com", age=25)
print(profile.email) # alex@example.com (lowered by validator)
print(profile.bio) # None (optional field)
# Invalid data — raises ValidationError
UserProfile(name="Alex", email="invalid", age=25) # No @ in email
UserProfile(name="Alex", email="a@b.com", age=-1) # Age below 0
UserProfile(name=" ", email="a@b.com", age=25) # Empty name
Each invalid call raises a ValidationError with a detailed error message. In a web application, you would catch this and return an error response to the user.
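Catching the error might look like the sketch below. It assumes Pydantic v2 and uses a reduced UserProfile with only two fields; the errors() method returns one dictionary per failed field, which is what you would typically turn into an error response:

```python
from pydantic import BaseModel, Field, ValidationError


class UserProfile(BaseModel):
    name: str
    age: int = Field(ge=0, le=150)


try:
    UserProfile(name="Alex", age=-1)
    errors = []
except ValidationError as exc:
    # One entry per failed field, with its location and a readable message.
    errors = exc.errors()

for err in errors:
    print(err["loc"], err["msg"])
```

Here the only entry has the location ('age',), so the caller knows exactly which field to report.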
Key differences from dataclasses:
- Validation happens automatically: Pydantic checks types and constraints at creation time
- Field(ge=0, le=150): built-in constraints (ge = greater or equal, le = less or equal)
- @field_validator: custom validation logic for specific fields
- Type coercion: Pydantic converts types when possible (the string "25" becomes the integer 25)
- Optional fields accept None, but they default to None only when you write = None, as bio does above; without a default they are still required
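Type coercion can be seen with a very small model. The Order class here is hypothetical and the sketch assumes Pydantic's default (non-strict) mode:

```python
from pydantic import BaseModel, ValidationError


class Order(BaseModel):  # hypothetical model for illustration
    quantity: int


# A numeric string is converted to int.
order = Order(quantity="25")
print(order.quantity, type(order.quantity).__name__)  # 25 int

# A string that cannot be parsed as an integer is rejected.
try:
    Order(quantity="twenty-five")
    rejected = False
except ValidationError:
    rejected = True
print(rejected)  # True
```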
Nested Models
Pydantic supports nested models naturally:
class Address(BaseModel):
    """A street address."""
    street: str
    city: str
    country: str = "Germany"

class Employee(BaseModel):
    """An employee with a nested address."""
    name: str
    role: str
    address: Address
    skills: list[str] = []
You can create nested models from dictionaries. Pydantic handles the conversion automatically:
data = {
    "name": "Sam",
    "role": "Developer",
    "address": {"street": "Main St 1", "city": "Berlin"},
}
emp = Employee.model_validate(data)
print(emp.address.city) # Berlin
print(emp.address.country) # Germany (default)
print(emp.skills) # [] (default)
model_dump and model_validate
Convert between Pydantic models and dictionaries:
# Model to dict
data = profile.model_dump()
print(data)
# {'name': 'Alex', 'email': 'alex@example.com', 'age': 25, 'bio': None}
# Dict to model
restored = UserProfile.model_validate(data)
print(restored.name) # Alex
You can also exclude or include specific fields:
data = profile.model_dump(exclude={"bio"})
# {'name': 'Alex', 'email': 'alex@example.com', 'age': 25}
data = profile.model_dump(exclude_none=True)
# {'name': 'Alex', 'email': 'alex@example.com', 'age': 25}
These methods are essential for working with APIs, databases, and JSON files.
JSON Serialization
Pydantic can also convert to and from JSON strings:
json_string = profile.model_dump_json()
# '{"name":"Alex","email":"alex@example.com","age":25,"bio":null}'
profile = UserProfile.model_validate_json(json_string)
Dataclass vs Pydantic vs Regular Class
| Feature | Regular Class | Dataclass | Pydantic |
|---|---|---|---|
| Auto __init__ | No | Yes | Yes |
| Auto __repr__ | No | Yes | Yes |
| Auto __eq__ | No | Yes | Yes |
| Type validation | No | No | Yes |
| Type coercion | No | No | Yes |
| Default values | Manual | Easy | Easy |
| Immutability | Manual | frozen=True | model_config |
| JSON serialization | Manual | Manual | Built-in |
| External dependency | No | No | Yes (pydantic) |
| Performance | Fast | Fast | Slower (validation overhead) |
Use this guide to choose:
- Regular class — when you need complex behavior, custom initialization logic, or inheritance hierarchies
- Dataclass — when you mainly store data with some methods, and you do not need validation. Great for internal data structures.
- Pydantic — when you need validation, type coercion, or work with external data (API requests, config files, database records, user input)
In practice, many Python projects use both: dataclasses for internal data structures and Pydantic for API boundaries and configuration.
Dataclass Inheritance
Dataclasses support inheritance. The child class gets all fields from the parent:
@dataclass
class Person:
    name: str
    age: int

@dataclass
class Employee(Person):
    company: str
    role: str = "Developer"
emp = Employee("Alex", 30, "Acme Corp")
print(emp) # Employee(name='Alex', age=30, company='Acme Corp', role='Developer')
The child class fields come after the parent class fields in the constructor. Be careful with defaults — if the parent has a field with a default, all child fields must also have defaults (same rule as function parameters).
Converting Between Dataclass and Dict
Dataclasses have a built-in function to convert to dictionaries:
from dataclasses import asdict, astuple
@dataclass
class Point:
    x: float
    y: float
p = Point(3.0, 4.0)
print(asdict(p)) # {'x': 3.0, 'y': 4.0}
print(astuple(p)) # (3.0, 4.0)
To convert back from a dict, unpack it:
data = {'x': 3.0, 'y': 4.0}
point = Point(**data)
Note: asdict creates a deep copy. Nested dataclasses are also converted to dicts. For Pydantic, use model_dump() and model_validate() instead.
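A sketch of the nested case, using a hypothetical Path class that holds a list of Point instances. Note that asdict() converts nested dataclasses, but going back requires rebuilding the nested objects yourself:

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Path:  # hypothetical container for illustration
    name: str
    points: list[Point] = field(default_factory=list)


path = Path("diagonal", [Point(0.0, 0.0), Point(1.0, 1.0)])
data = asdict(path)
print(data)
# {'name': 'diagonal', 'points': [{'x': 0.0, 'y': 0.0}, {'x': 1.0, 'y': 1.0}]}

# Path(**data) would leave plain dicts in points, so rebuild them explicitly.
restored = Path(data["name"], [Point(**p) for p in data["points"]])
print(restored == path)  # True

# The dict is an independent copy: changing it does not touch the original.
data["points"][0]["x"] = 99.0
print(path.points[0].x)  # 0.0
```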
Common Mistakes
Mutable Defaults Without field()
# BAD: @dataclass rejects this with a ValueError when the class is defined
@dataclass
class Team:
    members: list[str] = []

# GOOD: each instance gets its own list
@dataclass
class Team:
    members: list[str] = field(default_factory=list)
This is the same mutable default gotcha from functions, but dataclasses turn it into an immediate error. @dataclass raises a ValueError at class definition time when a field default is a list, dict, or set, and the message tells you to use default_factory instead.
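The error can be observed by wrapping the class definition in a try block. BrokenTeam is a hypothetical name used only for this sketch:

```python
from dataclasses import dataclass, field

# The error is raised while the class is being created, before any instance exists.
try:
    @dataclass
    class BrokenTeam:
        members: list = []
    raised = False
except ValueError:
    raised = True
print(raised)  # True


@dataclass
class Team:
    members: list = field(default_factory=list)


t1 = Team()
t2 = Team()
t1.members.append("Alex")
print(t1.members)  # ['Alex']
print(t2.members)  # []
```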
Confusing Frozen and Regular Dataclasses
@dataclass(frozen=True)
class Config:
    name: str
    debug: bool
config = Config("app", True)
config.debug = False # AttributeError! Frozen means immutable.
If you need to “change” a frozen dataclass, create a new one:
from dataclasses import replace
new_config = replace(config, debug=False)
print(new_config) # Config(name='app', debug=False)
The replace() function creates a copy with the specified fields changed. The original is untouched.
Source Code
You can find the code for this tutorial on GitHub:
kemalcodes/python-tutorial — tutorial-10-dataclasses
Run the examples:
python src/py10_dataclasses.py
Run the tests:
python -m pytest tests/test_py10.py -v
What’s Next?
In the next tutorial, we will learn about error handling — try/except, custom exceptions, and the patterns that make your code more robust.
Related Articles
- Python Tutorial #9: OOP — classes, inheritance, magic methods
- Python Tutorial #5: Functions — def, args, kwargs, lambdas
- Python Cheat Sheet — quick reference for Python syntax