In the previous tutorial, we learned about functions. Now let’s learn about Python’s built-in data structures: lists, dictionaries, sets, and tuples.

These are the tools you use every day in Python. By the end of this tutorial, you will know how to store, access, and transform collections of data.

Lists

A list is an ordered, mutable collection. You can add, remove, and change items.

fruits = ["apple", "banana", "cherry"]
print(fruits[0])     # apple — first item
print(fruits[-1])    # cherry — last item
print(len(fruits))   # 3

Adding Items

fruits.append("date")        # Add to end: ["apple", "banana", "cherry", "date"]
fruits.insert(1, "avocado")  # Insert at index 1: ["apple", "avocado", "banana", ...]
fruits.extend(["fig", "grape"])  # Add multiple items to end
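A common source of confusion is the difference between append() and extend() when the argument is itself a list. The short sketch below uses two throwaway lists (a and b, chosen only for illustration) to show it:

```python
a = [1, 2]
a.append([3, 4])   # adds ONE item, which is itself a list
print(a)           # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])   # adds each item of the argument individually
print(b)           # [1, 2, 3, 4]
```

Use append() for a single item and extend() when you want to merge another iterable into the list.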

Removing Items

fruits.remove("banana")  # Remove by value (first occurrence)
last = fruits.pop()      # Remove and return last item
item = fruits.pop(0)     # Remove and return item at index 0
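One detail worth knowing: remove() raises a ValueError if the value is not in the list. A minimal sketch of two ways to handle that (the "mango" value is just an example):

```python
fruits = ["apple", "banana", "cherry"]

# remove() raises ValueError when the value is not in the list
try:
    fruits.remove("mango")
except ValueError:
    print("mango is not in the list")

# Checking with `in` first avoids the exception
if "banana" in fruits:
    fruits.remove("banana")

print(fruits)  # ['apple', 'cherry']
```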

List Slicing

Slicing creates a new list from part of an existing list. The syntax is list[start:end:step]:

numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

print(numbers[:3])     # [0, 1, 2] — first three
print(numbers[-3:])    # [7, 8, 9] — last three
print(numbers[3:7])    # [3, 4, 5, 6] — from index 3 to 6
print(numbers[::2])    # [0, 2, 4, 6, 8] — every other item
print(numbers[::-1])   # [9, 8, 7, ..., 0] — reversed

Remember: the end index is not included. numbers[3:7] gives items at index 3, 4, 5, 6.
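Two more slicing behaviors are easy to check for yourself: a full slice makes a shallow copy, and out-of-range bounds are clamped rather than raising an error. The variable names below are illustrative only:

```python
numbers = [0, 1, 2, 3, 4]

numbers_copy = numbers[:]   # full slice makes a shallow copy
numbers_copy.append(99)
print(numbers)              # [0, 1, 2, 3, 4]  (original untouched)
print(numbers_copy)         # [0, 1, 2, 3, 4, 99]

# Out-of-range slice bounds are clamped instead of raising IndexError
print(numbers[2:100])       # [2, 3, 4]
print(numbers[10:])         # []
```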

Sorting

numbers = [3, 1, 4, 1, 5, 9, 2, 6]

# sorted() returns a NEW list (original unchanged)
print(sorted(numbers))              # [1, 1, 2, 3, 4, 5, 6, 9]
print(sorted(numbers, reverse=True))  # [9, 6, 5, 4, 3, 2, 1, 1]

# .sort() modifies the list IN PLACE
numbers.sort()
print(numbers)  # [1, 1, 2, 3, 4, 5, 6, 9]

Sort by a key function:

names = ["Sam", "Alex", "Jordan", "Kim"]
print(sorted(names, key=len))  # ["Sam", "Kim", "Alex", "Jordan"]
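The key function can return a tuple to sort by several criteria, or be a built-in method such as str.lower. A small sketch, with example data chosen for illustration:

```python
names = ["Sam", "Alex", "Jordan", "Kim"]

# Sort by length first, then alphabetically among names of equal length
by_len_then_alpha = sorted(names, key=lambda name: (len(name), name))
print(by_len_then_alpha)  # ['Kim', 'Sam', 'Alex', 'Jordan']

# Default string sorting puts uppercase letters before lowercase ones
mixed = ["banana", "apple", "Cherry"]
print(sorted(mixed))                 # ['Cherry', 'apple', 'banana']
print(sorted(mixed, key=str.lower))  # ['apple', 'banana', 'Cherry']
```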

List Comprehensions

List comprehensions create new lists in a single line:

# Squares of 0 to 5
squares = [x ** 2 for x in range(6)]
# [0, 1, 4, 9, 16, 25]

# Only even numbers
evens = [x for x in range(10) if x % 2 == 0]
# [0, 2, 4, 6, 8]

# Transform strings
names = ["alex", "sam", "jordan"]
upper = [name.upper() for name in names]
# ["ALEX", "SAM", "JORDAN"]

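List comprehensions replace simple for loops that build a list. For short transformations and filters they are usually more concise and easier to read, which is why they are considered Pythonic. For complex logic, a regular loop is often clearer.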
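To make the equivalence concrete, here is the squares example written both ways. The names squares_loop and squares_comp are only for this comparison:

```python
# Loop version: create an empty list, then append inside the loop
squares_loop = []
for x in range(6):
    squares_loop.append(x ** 2)

# Comprehension version: same result in one expression
squares_comp = [x ** 2 for x in range(6)]

print(squares_loop == squares_comp)  # True
```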

Flattening Nested Lists

nested = [[1, 2], [3, 4], [5, 6]]
flat = [item for sublist in nested for item in sublist]
# [1, 2, 3, 4, 5, 6]

Read this as: “for each sublist in nested, for each item in sublist, take the item.”
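The order of the for clauses matches the order of the equivalent nested loops. The sketch below writes the loops out and also shows itertools.chain.from_iterable from the standard library, which is one alternative for the same job:

```python
from itertools import chain

nested = [[1, 2], [3, 4], [5, 6]]

# Equivalent nested loops: outer loop first, inner loop second
flat_loop = []
for sublist in nested:
    for item in sublist:
        flat_loop.append(item)

# Standard-library alternative
flat_chain = list(chain.from_iterable(nested))

print(flat_loop)   # [1, 2, 3, 4, 5, 6]
print(flat_chain)  # [1, 2, 3, 4, 5, 6]
```

Both approaches flatten only one level of nesting.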

Tuples

A tuple is like a list but immutable. You cannot change it after creation:

point = (3, 4)
print(point[0])  # 3
print(point[1])  # 4

# point[0] = 5   # TypeError: tuples do not support item assignment

When to Use Tuples

Use tuples when:

  • The data should not change (coordinates, RGB colors, database rows)
  • You need a hashable type (dict keys, set elements)
  • You want to return multiple values from a function
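Two of these uses can be sketched briefly. The min_max function and the grid dictionary are hypothetical examples, not part of the tutorial's code:

```python
def min_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)

# The returned tuple can be unpacked directly into two names
low, high = min_max([4, 9, 1, 7])
print(low, high)  # 1 9

# Tuples are hashable, so they can serve as dict keys
grid = {(0, 0): "start", (2, 3): "goal"}
print(grid[(2, 3)])  # goal
```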

Tuple Unpacking

This is one of Python’s most useful features:

# Basic unpacking
point = (3, 4)
x, y = point
print(f"x={x}, y={y}")  # x=3, y=4

# Swap values (no temp variable needed!)
a, b = 1, 2
a, b = b, a
print(f"a={a}, b={b}")  # a=2, b=1

# Extended unpacking with *
first, *middle, last = [1, 2, 3, 4, 5]
print(f"first={first}, middle={middle}, last={last}")
# first=1, middle=[2, 3, 4], last=5

The * operator captures the remaining items into a list.
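Unpacking also works directly in a for loop, which is how zip() and enumerate() are normally used. A short sketch with made-up names and ages:

```python
names = ["Alex", "Sam", "Jordan"]
ages = [25, 31, 28]

# zip() yields (name, age) tuples; the loop unpacks each one
for name, age in zip(names, ages):
    print(f"{name} is {age}")

# enumerate() yields (index, value) tuples
for index, name in enumerate(names, start=1):
    print(f"{index}. {name}")
```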

Dictionaries

A dictionary stores key-value pairs. Keys must be unique and hashable, which in practice means immutable types such as strings, numbers, and tuples that contain only hashable items.

user = {"name": "Alex", "age": 25, "city": "Berlin"}

Accessing Values

print(user["name"])          # "Alex"
print(user.get("email"))     # None (key does not exist)
print(user.get("email", "N/A"))  # "N/A" (custom default)

Use .get() when you are not sure if a key exists. Using [] on a missing key raises a KeyError.
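For comparison, this is roughly what handling the missing key by hand looks like. It is a sketch with a sample user dictionary; .get() gives the same result in one call:

```python
user = {"name": "Alex", "age": 25}

# Square brackets raise KeyError for a missing key
try:
    email = user["email"]
except KeyError:
    email = "N/A"

print(email)                     # N/A
print(user.get("email", "N/A"))  # N/A
```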

Adding and Updating

user["email"] = "alex@example.com"  # Add new key
user["age"] = 26                     # Update existing key

Removing

removed = user.pop("city")       # Remove and return value
del user["email"]                # Remove without returning
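Both pop() and del raise a KeyError if the key is missing. Passing a default as the second argument to pop() avoids that, as this small sketch shows:

```python
user = {"name": "Alex", "city": "Berlin"}

city = user.pop("city")            # removes the key and returns 'Berlin'
missing = user.pop("email", None)  # key is absent: returns None, no KeyError

print(city, missing)  # Berlin None
print(user)           # {'name': 'Alex'}
```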

Iterating

# Over keys
for key in user:
    print(key)

# Over values
for value in user.values():
    print(value)

# Over key-value pairs (most useful)
for key, value in user.items():
    print(f"{key}: {value}")

Dictionary Comprehensions

# Map words to their lengths
words = ["hi", "python", "go"]
lengths = {word: len(word) for word in words}
# {"hi": 2, "python": 6, "go": 2}

# Invert a dictionary (swap keys and values)
original = {"a": 1, "b": 2, "c": 3}
inverted = {v: k for k, v in original.items()}
# {1: "a", 2: "b", 3: "c"}
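Inverting only works cleanly when the values are unique. If two keys share a value, the later one overwrites the earlier one. The sketch below shows the problem and one possible workaround that groups keys into lists (the grouped name is illustrative):

```python
original = {"a": 1, "b": 2, "c": 1}

# "a" and "c" share the value 1, so only the last one survives
inverted = {v: k for k, v in original.items()}
print(inverted)  # {1: 'c', 2: 'b'}

# Workaround: collect all keys that map to the same value
grouped = {}
for k, v in original.items():
    grouped.setdefault(v, []).append(k)
print(grouped)  # {1: ['a', 'c'], 2: ['b']}
```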

Sets

A set is an unordered collection of unique elements:

colors = {"red", "green", "blue"}
colors.add("yellow")     # Add an element
colors.discard("red")    # Remove an element (no error if missing)

Set Operations

Sets support mathematical operations:

a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

print(a | b)   # Union: {1, 2, 3, 4, 5, 6}
print(a & b)   # Intersection: {3, 4}
print(a - b)   # Difference: {1, 2}
print(a ^ b)   # Symmetric difference: {1, 2, 5, 6}
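Each operator also has a method form, and sets support subset and disjointness checks. A brief sketch using the same a and b:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5, 6}

# Method forms accept any iterable, not only sets
print(a.union([5, 6]) == {1, 2, 3, 4, 5, 6})  # True
print(a.intersection(b) == {3, 4})            # True

# Subset and disjointness checks
print({3, 4} <= a)           # True  ({3, 4} is a subset of a)
print(a.isdisjoint({7, 8}))  # True  (no shared elements)
```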

Removing Duplicates

The easiest way to remove duplicates from a list:

numbers = [1, 2, 2, 3, 1, 4, 3]
unique = list(set(numbers))
# Contains 1, 2, 3, 4, but the order is NOT guaranteed!

If you need to preserve order:

def unique_items(items):
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

print(unique_items([1, 2, 2, 3, 1, 4, 3]))
# [1, 2, 3, 4] — order preserved
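Because dictionaries keep insertion order (Python 3.7+), dict.fromkeys() offers a shorter way to get the same result for hashable items. This is an alternative sketch, not a replacement for understanding the loop above:

```python
numbers = [3, 1, 3, 2, 1]

# Dict keys are unique and keep the order in which they were first inserted
unique = list(dict.fromkeys(numbers))
print(unique)  # [3, 1, 2]
```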

Counter and defaultdict

The collections module in the standard library provides several specialized containers. Two of the most useful are Counter and defaultdict.

Counter

Counter counts how many times each element appears:

from collections import Counter

text = "the cat sat on the mat the cat"
words = text.split()
counter = Counter(words)

print(counter)              # Counter({'the': 3, 'cat': 2, 'sat': 1, 'on': 1, 'mat': 1})
print(counter.most_common(2))  # [('the', 3), ('cat', 2)]

Counter is perfect for frequency analysis: counting words, characters, votes, or any repeated items.
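A few more Counter behaviors, sketched with sample data: it accepts any iterable (including a string), returns 0 for missing keys, and can be updated with more items later.

```python
from collections import Counter

# A string is an iterable of characters
letters = Counter("mississippi")
print(letters["s"])  # 4
print(letters["z"])  # 0  (missing keys count as zero, no KeyError)

# update() adds counts from another iterable
votes = Counter(["red", "blue", "red"])
votes.update(["blue", "blue"])
print(votes)  # Counter({'blue': 3, 'red': 2})
```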

defaultdict

defaultdict creates default values for missing keys:

from collections import defaultdict

# Group words by their length
words = ["hi", "go", "python", "rust", "ai", "code"]
groups = defaultdict(list)

for word in words:
    groups[len(word)].append(word)

print(dict(groups))
# {2: ['hi', 'go', 'ai'], 6: ['python'], 4: ['rust', 'code']}

Without defaultdict, you would need to check if the key exists and create the list first. defaultdict handles that automatically.
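For comparison, here is a sketch of the same grouping with a plain dict, first with an explicit check and then with setdefault(). The names groups_plain and groups_setdefault are only for this example:

```python
words = ["hi", "go", "python", "rust", "ai", "code"]

# Plain dict: create the list yourself the first time a key appears
groups_plain = {}
for word in words:
    key = len(word)
    if key not in groups_plain:
        groups_plain[key] = []
    groups_plain[key].append(word)

# setdefault() folds the check and the creation into one call
groups_setdefault = {}
for word in words:
    groups_setdefault.setdefault(len(word), []).append(word)

print(groups_plain)
# {2: ['hi', 'go', 'ai'], 6: ['python'], 4: ['rust', 'code']}
```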

Nested Data Structures

Real-world data is often nested. For example, a list of users where each user has a list of scores:

users = [
    {"name": "Alex", "scores": [90, 85, 92]},
    {"name": "Sam", "scores": [78, 88, 95]},
    {"name": "Jordan", "scores": [82, 91, 87]},
]

# Calculate average score for each user
for user in users:
    avg = sum(user["scores"]) / len(user["scores"])
    print(f"{user['name']}: {avg:.1f}")

# Output:
# Alex: 89.0
# Sam: 87.0
# Jordan: 86.7
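The same nested data can be queried with built-ins such as max() and a comprehension. The average helper below is a hypothetical function added for this sketch:

```python
users = [
    {"name": "Alex", "scores": [90, 85, 92]},
    {"name": "Sam", "scores": [78, 88, 95]},
    {"name": "Jordan", "scores": [82, 91, 87]},
]

def average(scores):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(scores) / len(scores)

# Find the user with the highest average
best = max(users, key=lambda user: average(user["scores"]))
print(best["name"])  # Alex

# Build a name -> average mapping with a dict comprehension
averages = {user["name"]: round(average(user["scores"]), 1) for user in users}
print(averages)  # {'Alex': 89.0, 'Sam': 87.0, 'Jordan': 86.7}
```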

When to Use Which?

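Structure | Ordered    | Mutable | Duplicates | Use Case
----------|------------|---------|------------|----------------------------------------
List      | Yes        | Yes     | Yes        | General-purpose collection
Tuple     | Yes        | No      | Yes        | Fixed data, dict keys, function returns
Dict      | Yes (3.7+) | Yes     | Keys: No   | Key-value mappings
Set       | No         | Yes     | No         | Unique items, membership testing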

Practical Example: Student Gradebook

Let’s combine multiple data structures to build a simple gradebook:

from collections import defaultdict

# Students and their test scores
gradebook = {
    "Alex": [90, 85, 92, 88],
    "Sam": [78, 82, 95, 70],
    "Jordan": [65, 72, 68, 75],
    "Kim": [95, 98, 92, 97],
}

# Calculate stats for each student
for name, scores in gradebook.items():
    avg = sum(scores) / len(scores)
    highest = max(scores)
    lowest = min(scores)
    print(f"{name}: avg={avg:.1f}, high={highest}, low={lowest}")

# Grade distribution
def get_grade(avg: float) -> str:
    if avg >= 90:
        return "A"
    if avg >= 80:
        return "B"
    if avg >= 70:
        return "C"
    return "F"

# Group students by grade
grades = defaultdict(list)
for name, scores in gradebook.items():
    avg = sum(scores) / len(scores)
    grade = get_grade(avg)
    grades[grade].append(name)

print(f"\nGrade A: {grades['A']}")
print(f"Grade B: {grades['B']}")
print(f"Grade C: {grades['C']}")

Output:

Alex: avg=88.8, high=92, low=85
Sam: avg=81.2, high=95, low=70
Jordan: avg=70.0, high=75, low=65
Kim: avg=95.5, high=98, low=92

Grade A: ['Kim']
Grade B: ['Alex', 'Sam']
Grade C: ['Jordan']

This example uses dictionaries, lists, defaultdict, f-strings, and functions all together. Real-world Python code often looks like this: combining multiple data structures to solve a problem.

Checking Membership

Use in to check if an element exists in a collection:

# Lists — O(n) time
fruits = ["apple", "banana", "cherry"]
print("banana" in fruits)  # True

# Sets — O(1) time (much faster for large collections)
fruit_set = {"apple", "banana", "cherry"}
print("banana" in fruit_set)  # True

# Dicts — checks keys
user = {"name": "Alex", "age": 25}
print("name" in user)     # True
print("email" in user)    # False
print(25 in user.values())  # True — check values

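For large collections, use sets for membership testing. Sets are backed by hash tables, so a lookup takes roughly constant time on average no matter how many items the set holds, while a list has to be scanned item by item. If you check membership repeatedly against the same collection, convert the list to a set once and reuse it. The conversion itself is O(n) and requires hashable items, so it pays off only across multiple lookups.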
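You can get a rough feel for the difference with timeit from the standard library. This is a sketch only: the collection size and repeat count are arbitrary, and the timings will vary by machine.

```python
import timeit

ids_list = list(range(10_000))
ids_set = set(ids_list)  # one-time O(n) conversion

# Both containers give the same answer
print(9_999 in ids_list, 9_999 in ids_set)  # True True

# Time 1,000 lookups of the last item (the worst case for the list)
list_time = timeit.timeit(lambda: 9_999 in ids_list, number=1_000)
set_time = timeit.timeit(lambda: 9_999 in ids_set, number=1_000)
print(f"list: {list_time:.4f}s, set: {set_time:.4f}s")
```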

Common Mistakes

Modifying a List While Iterating

# BAD: modifying the list you're looping over
numbers = [1, 2, 3, 4, 5]
for num in numbers:
    if num % 2 == 0:
        numbers.remove(num)  # Shifts later items left, so the loop skips the item after each removal

# GOOD: use a list comprehension to create a new list
numbers = [1, 2, 3, 4, 5]
numbers = [num for num in numbers if num % 2 != 0]
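With [1, 2, 3, 4, 5] the skipped items happen to be odd, so the bug can go unnoticed. A list of only even numbers makes it visible. The sketch below uses the illustrative names buggy and fixed, and also shows iterating over a copy as another workaround:

```python
# Removing while iterating: every second item is skipped
buggy = [2, 4, 6, 8]
for num in buggy:
    if num % 2 == 0:
        buggy.remove(num)
print(buggy)  # [4, 8]  (4 and 8 were never checked)

# Iterating over a copy (buggy[:] style) leaves the loop unaffected by removals
fixed = [2, 4, 6, 8]
for num in fixed[:]:
    if num % 2 == 0:
        fixed.remove(num)
print(fixed)  # []
```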

Using Mutable Objects as Dict Keys

# BAD: lists are mutable and therefore unhashable, so they cannot be dict keys
# d = {[1, 2]: "value"}  # TypeError!

# GOOD: use a tuple instead
d = {(1, 2): "value"}  # Works!
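The same rule applies to sets: a regular set cannot be a key, but frozenset, its immutable counterpart, can. A minimal sketch with made-up data (pair_scores is a hypothetical name):

```python
# Using a list as a key raises TypeError
try:
    bad = {[1, 2]: "value"}
    error_raised = False
except TypeError:
    error_raised = True
print(error_raised)  # True

# frozenset is hashable, and element order does not matter for lookup
pair_scores = {frozenset({"alex", "sam"}): 3}
print(pair_scores[frozenset({"sam", "alex"})])  # 3
```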

Summary

Here is a quick reference for Python’s data structures:

Lists — ordered, mutable, allow duplicates. Use for general collections.

items = [1, 2, 3]
items.append(4)
items[0] = 10
filtered = [x for x in items if x > 2]

Tuples — ordered, immutable, allow duplicates. Use for fixed data and function returns.

point = (3, 4)
x, y = point

Dictionaries — key-value pairs, ordered (3.7+), mutable. Use for mappings and lookups.

user = {"name": "Alex", "age": 25}
user.get("email", "N/A")
words = ["hi", "python", "go"]
lengths = {w: len(w) for w in words}

Sets — unordered, unique elements. Use for membership testing and removing duplicates.

colors = {"red", "green", "blue"}
set_a, set_b = {1, 2, 3}, {2, 3, 4}
common = set_a & set_b  # {2, 3}

Counter — count occurrences. defaultdict — dict with automatic default values.

Source Code

You can find the code for this tutorial on GitHub:

kemalcodes/python-tutorial — tutorial-06-data-structures

Run the examples:

python src/py06_data_structures.py

Run the tests:

python -m pytest tests/test_py06.py -v

What’s Next?

In the next tutorial, we will take a deep dive into strings: advanced methods, formatting tricks, and regular expressions.