Python Tutorial #13: Generators and Iterators — Lazy Data Processing

In the previous tutorial, we learned about file I/O. Now let’s learn about generators and iterators — tools for processing data lazily without loading everything into memory.

A generator produces values one at a time. It only calculates the next value when you ask for it. This is called lazy evaluation. Instead of creating a list of one million items in memory, a generator produces them one by one. By the end of this tutorial, you will know how to create generators, build data pipelines, and use itertools for efficient data processing.

Generator Functions

A generator function uses yield instead of return. If yield appears anywhere in a function body, Python treats that function as a generator function:

def count_up(start: int, end: int):
    """Generate numbers from start to end (inclusive)."""
    current = start
    while current <= end:
        yield current
        current += 1

When you call a generator function, it does not run the code. It returns a generator object — a lazy iterator:

gen = count_up(1, 5)  # No code runs yet!
print(type(gen))      # <class 'generator'>

print(next(gen))  # 1 — runs until first yield
print(next(gen))  # 2 — continues from where it stopped
print(next(gen))  # 3

Each call to next() runs the generator until the next yield, produces the value, and then pauses. The generator remembers its position and local variables between calls.

You can also use a generator in a for loop, which calls next() automatically:

for num in count_up(1, 5):
    print(num, end=" ")
# Output: 1 2 3 4 5

The key difference from a regular function: a generator pauses at each yield and resumes when you ask for the next value. A regular function runs all the way through and returns one result.

How yield Works

Let me walk through what happens step by step:

def simple_gen():
    print("Before first yield")
    yield 1
    print("Before second yield")
    yield 2
    print("After last yield")

gen = simple_gen()        # Nothing prints yet
print(next(gen))          # "Before first yield" then 1
print(next(gen))          # "Before second yield" then 2
# next(gen) would print "After last yield" then raise StopIteration

When the generator function ends (no more yields), it raises StopIteration. The for loop catches this automatically and stops.
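As a small sketch of the exhaustion behaviour described above, the snippet below drains a generator by hand. The names two_values, drained, exhausted, and fallback are made up for this illustration. The two-argument form of next() is the built-in's standard way to supply a default.

```python
def two_values():
    """Hypothetical generator used only for this illustration."""
    yield "a"
    yield "b"

gen = two_values()
drained = [next(gen), next(gen)]

# A third next() finds nothing left to yield, so StopIteration is raised.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# Passing a default to next() returns it instead of raising.
fallback = next(gen, None)

print(drained, exhausted, fallback)  # ['a', 'b'] True None
```

The default form is handy when you want "the first item, if any" without writing a try/except block.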

Fibonacci Generator

Generators are perfect for sequences where calculating all values upfront would be wasteful:

def fibonacci(limit: int):
    """Generate Fibonacci numbers up to a limit."""
    a, b = 0, 1
    while a <= limit:
        yield a
        a, b = b, a + b

for num in fibonacci(100):
    print(num, end=" ")
# Output: 0 1 1 2 3 5 8 13 21 34 55 89

The generator calculates each number on demand. It never stores the entire sequence in memory. If you asked for Fibonacci numbers up to one billion, the generator would still use the same tiny amount of memory.

Infinite Generators

Generators can produce values forever:

def infinite_counter(start: int = 0):
    """Generate numbers forever, starting from start."""
    current = start
    while True:
        yield current
        current += 1

Use next() to get values one at a time:

counter = infinite_counter(10)
print(next(counter))  # 10
print(next(counter))  # 11
print(next(counter))  # 12

You cannot use an infinite generator in a regular for loop without a break — it would run forever. Instead, combine it with itertools.islice or use a manual break:

for num in infinite_counter():
    if num > 4:
        break
    print(num, end=" ")
# Output: 0 1 2 3 4

Generator Expressions

Like list comprehensions, but with parentheses instead of brackets:

# List comprehension — creates entire list in memory
squares_list = [x * x for x in range(1_000_000)]  # ~8 MB for the list alone, more with the int objects

# Generator expression — calculates one value at a time
squares_gen = (x * x for x in range(1_000_000))  # ~200 bytes, whatever the range size

Generator expressions are useful with functions like sum(), max(), and min() that consume all values:

def sum_of_squares(n: int) -> int:
    """Sum of squares from 1 to n using a generator expression."""
    return sum(x * x for x in range(1, n + 1))

print(sum_of_squares(10))  # 385

The sum() function processes each value from the generator and discards it immediately. No intermediate list is ever created.
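The same pattern works with max() and min(), and with any(), which can stop early. The sketch below uses made-up names (words, is_long, checked) to show that any() pulls values from the generator only until it finds a match.

```python
words = ["generator", "yield", "iterator", "next"]

longest = max(len(w) for w in words)
shortest = min(len(w) for w in words)
total_chars = sum(len(w) for w in words)
print(longest, shortest, total_chars)  # 9 4 26

checked = []  # records which words were actually examined

def is_long(word):
    """Hypothetical predicate that logs every word it is asked about."""
    checked.append(word)
    return len(word) > 8

found = any(is_long(w) for w in words)
print(found, checked)  # True ['generator']
```

Because the first word already matches, the remaining three are never examined.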

When you pass a generator expression as the only argument to a function, you can omit the extra parentheses:

# Both are the same
sum((x * x for x in range(10)))
sum(x * x for x in range(10))  # Cleaner — no extra parentheses

Generator Pipelines

One of the most powerful uses of generators is chaining them into data processing pipelines. Each step processes one item at a time:

def read_values(data: list[str]):
    """Step 1: Yield each value, stripped of whitespace."""
    for item in data:
        yield item.strip()

def parse_ints(values):
    """Step 2: Parse to ints, skipping invalid values."""
    for value in values:
        try:
            yield int(value)
        except ValueError:
            continue

def filter_positive(numbers):
    """Step 3: Yield only positive numbers."""
    for num in numbers:
        if num > 0:
            yield num

def pipeline(data: list[str]) -> list[int]:
    """Process data through a generator pipeline."""
    values = read_values(data)
    numbers = parse_ints(values)
    positives = filter_positive(numbers)
    return list(positives)

data = ["  10  ", "abc", " -5 ", "  20  ", "xyz", "  30  "]
result = pipeline(data)
print(result)  # [10, 20, 30]

Each step processes one item at a time, and no intermediate lists are created. The first item flows through all three steps before the second item even starts. This keeps memory use low for large inputs, such as the lines of a multi-gigabyte log file.
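To see the item-by-item flow, here is a minimal sketch with two hypothetical stages, emit and times_two, that record each step in a shared log list. These names are not part of the tutorial's pipeline. They exist only to make the ordering visible.

```python
log = []  # records the order in which the stages run

def emit(items):
    """Hypothetical first stage: pass items through and log them."""
    for item in items:
        log.append(f"emit {item}")
        yield item

def times_two(numbers):
    """Hypothetical second stage: double each number and log it."""
    for n in numbers:
        log.append(f"times_two {n}")
        yield n * 2

result = list(times_two(emit([1, 2])))
print(result)  # [2, 4]
print(log)     # ['emit 1', 'times_two 1', 'emit 2', 'times_two 2']
```

The log alternates between the stages: item 1 passes through both before item 2 is even read.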

Custom Iterators

An iterator is any object that implements __iter__ and __next__:

class Countdown:
    """An iterator that counts down from a number to 1."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for num in Countdown(5):
    print(num, end=" ")
# Output: 5 4 3 2 1

StopIteration tells the for loop to stop. Python raises this automatically when a generator function ends.
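Roughly speaking, a for loop is shorthand for calling iter() once and next() repeatedly. The sketch below spells that out by hand. The variable names numbers, iterator, and seen are chosen only for this illustration.

```python
numbers = [10, 20, 30]

iterator = iter(numbers)  # calls numbers.__iter__()
seen = []
while True:
    try:
        value = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break  # a for loop handles this silently
    seen.append(value)

print(seen)  # [10, 20, 30]
```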

The problem with an iterator class like this one, whose __iter__ returns self: you can only iterate once. After the first loop, the iterator is exhausted:

c = Countdown(3)
print(list(c))  # [3, 2, 1]
print(list(c))  # [] — exhausted!

Reusable Iterables

To make a reusable iterable, return a new iterator from __iter__ each time:

class Range:
    """A reusable iterable that generates numbers in a range."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __iter__(self):
        current = self.start
        while current < self.end:
            yield current
            current += 1

r = Range(1, 4)
print(list(r))  # [1, 2, 3]
print(list(r))  # [1, 2, 3] — works again!

The trick: __iter__ is a generator function. Each call creates a new generator object, so you can iterate as many times as you want.
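A quick way to check this is to take two iterators from the same object and advance them separately. The sketch repeats the Range class so it runs on its own. The names first, second, and pairs are made up for the example.

```python
class Range:
    """Same reusable iterable as above, repeated so the snippet runs alone."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __iter__(self):
        current = self.start
        while current < self.end:
            yield current
            current += 1

r = Range(1, 4)
first = iter(r)
second = iter(r)

a = next(first)   # 1
b = next(first)   # 2
c = next(second)  # 1: the second iterator starts from the beginning

# Nested loops over the same object also work, because each loop gets its own generator.
pairs = [(x, y) for x in r for y in r]
print(a, b, c, len(pairs))  # 1 2 1 9
```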

itertools: Powerful Iterator Utilities

The itertools module has tools for efficient iteration. These are all lazy — they produce values one at a time.

chain: Combine Multiple Iterables

import itertools

result = list(itertools.chain([1, 2], [3, 4], [5, 6]))
print(result)  # [1, 2, 3, 4, 5, 6]

Unlike [1, 2] + [3, 4] + [5, 6], chain does not create a new list. It yields items from each iterable in order.
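When the inputs are themselves generators, or you have an iterable of iterables, itertools.chain.from_iterable does the same job lazily. In this sketch, numbered_lines is a hypothetical stand-in for reading lines from a file, and the file names are invented.

```python
import itertools

def numbered_lines(name, count):
    """Hypothetical stand-in for reading `count` lines from a file called `name`."""
    for i in range(1, count + 1):
        yield f"{name}:{i}"

sources = [numbered_lines("a.log", 2), numbered_lines("b.log", 1)]
merged = itertools.chain.from_iterable(sources)

first = next(merged)  # only the first source has been touched so far
rest = list(merged)
print(first, rest)  # a.log:1 ['a.log:2', 'b.log:1']
```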

islice: Take a Slice from Any Iterable

# Take the first 5 Fibonacci numbers
first_5 = list(itertools.islice(fibonacci(1000), 5))
print(first_5)  # [0, 1, 1, 2, 3]

This works with infinite generators too. islice stops after the requested number of items.
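As a sketch of the infinite case, the snippet below repeats infinite_counter from earlier so it runs on its own. It also shows the start, stop, and step form of islice.

```python
import itertools

def infinite_counter(start: int = 0):
    """Same infinite generator as earlier in the tutorial."""
    current = start
    while True:
        yield current
        current += 1

first_three = list(itertools.islice(infinite_counter(10), 3))
print(first_three)  # [10, 11, 12]

# islice(iterable, start, stop, step): begin at index 2, stop before index 10, take every 4th item
stepped = list(itertools.islice(infinite_counter(), 2, 10, 4))
print(stepped)  # [2, 6]
```

Unlike list slicing, islice does not accept negative indices, since it cannot know where an arbitrary iterator ends.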

groupby: Group Consecutive Items

words = ["hi", "go", "hello", "world", "hey"]
sorted_words = sorted(words, key=len)

for length, group in itertools.groupby(sorted_words, key=len):
    print(f"Length {length}: {list(group)}")
# Length 2: ['hi', 'go']
# Length 3: ['hey']
# Length 5: ['hello', 'world']

Important: groupby groups consecutive items with the same key. The data must be sorted by the grouping key first. If you forget to sort, you get multiple groups for the same key.
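The effect of skipping the sort is easy to demonstrate. The sketch below uses the same words in a different, unsorted order. The names unsorted_groups and sorted_groups are made up for the comparison.

```python
import itertools

words = ["hi", "hello", "go", "world", "hey"]  # not sorted by length

unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(words, key=len)]
print(unsorted_groups)
# [(2, ['hi']), (5, ['hello']), (2, ['go']), (5, ['world']), (3, ['hey'])]

sorted_groups = [
    (k, list(g))
    for k, g in itertools.groupby(sorted(words, key=len), key=len)
]
print(sorted_groups)
# [(2, ['hi', 'go']), (3, ['hey']), (5, ['hello', 'world'])]
```

Without sorting, keys 2 and 5 each appear twice because their items are not adjacent.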

batched: Split into Chunks (Python 3.12+)

chunks = list(itertools.batched(range(10), 3))
print(chunks)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]

This is useful for processing data in batches, such as inserting rows into a database 100 at a time.
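On Python versions before 3.12, a similar helper can be written with islice. The function name batched_fallback is hypothetical. This is a sketch of the idea, not the standard library's implementation.

```python
import itertools

def batched_fallback(iterable, n):
    """Hypothetical helper: yield tuples of up to n items from any iterable."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice takes the next n items; an empty tuple means the input is exhausted.
    while batch := tuple(itertools.islice(iterator, n)):
        yield batch

chunks = list(batched_fallback(range(10), 3))
print(chunks)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```

The helper is itself a generator, so only one batch is held in memory at a time.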

functools.partial

functools.partial creates a new function with some arguments pre-filled. It is not a generator, but it is commonly used with iterators and higher-order functions:

from functools import partial

def multiply(a: int, b: int) -> int:
    return a * b

double = partial(multiply, b=2)
triple = partial(multiply, b=3)

print(double(5))   # 10
print(triple(5))   # 15

This is useful with map(), filter(), and other functions that take a function as an argument.
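For example, a partial can be handed straight to map(), which is lazy like the other tools in this tutorial. The sketch repeats multiply so it runs on its own. The name parse_binary is made up for the second example.

```python
from functools import partial

def multiply(a: int, b: int) -> int:
    return a * b

double = partial(multiply, b=2)

doubled = map(double, [1, 2, 3])  # nothing is computed yet
print(list(doubled))  # [2, 4, 6]

# partial also works with built-ins: pre-fill int()'s base argument.
parse_binary = partial(int, base=2)
values = list(map(parse_binary, ["101", "11"]))
print(values)  # [5, 3]
```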

Memory: List vs Generator

Generators use almost no memory compared to lists:

import sys

# List: stores all 1 million values in memory
numbers_list = [x * x for x in range(1_000_000)]
print(sys.getsizeof(numbers_list))  # roughly 8 MB; counts only the list's pointer array, not the int objects

# Generator: stores only the formula and its current state
numbers_gen = (x * x for x in range(1_000_000))
print(sys.getsizeof(numbers_gen))  # ~200 bytes

The list stores references to all 1 million values, and the values themselves take additional memory. The generator stores only the expression and its current position. For large datasets, this can decide whether your program finishes or runs out of memory.
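One way to check that a generator's size does not depend on how many values it will produce is to compare two generators built from the same expression. The helper name squares is hypothetical, and the exact byte counts vary between Python versions, so the sketch compares sizes instead of printing them.

```python
import sys

def squares(n):
    """Hypothetical helper: return a generator of the first n squares."""
    return (x * x for x in range(n))

small = sys.getsizeof(squares(10))
large = sys.getsizeof(squares(1_000_000))
as_list = sys.getsizeof(list(squares(10_000)))

print(small == large)   # True: the size does not grow with n
print(as_list > large)  # True: even a 10,000-item list is bigger
```

Keep in mind that sys.getsizeof reports only the object itself, not the objects it refers to.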

When to Use Generators

Situation                           Use
Process large files line by line    Generator
Build data pipelines                Generator
Infinite sequences                  Generator
Lazy computation                    Generator
Need random access (`items[5]`)     List
Need to iterate multiple times      List (or reusable iterable)
Need the length (`len(items)`)      List
Small dataset                       Either works

The rule of thumb: if you only need to iterate through the data once and do not need to know the length or access items by index, use a generator.

Common Mistakes

Using a Generator Twice

Generators are consumed after one iteration. If you need the values again, convert to a list first:

gen = (x * x for x in range(5))

print(list(gen))  # [0, 1, 4, 9, 16]
print(list(gen))  # [] — empty! Generator is exhausted.

# Fix: convert to list first if you need to iterate multiple times
squares = list(x * x for x in range(5))
print(squares)  # [0, 1, 4, 9, 16] — always available
print(squares)  # [0, 1, 4, 9, 16] — still there

Calling len() on a Generator

Generators do not have a length. You cannot call len() on them:

gen = (x for x in range(10))
len(gen)  # TypeError: object of type 'generator' has no len()

If you need the length, use a list or count manually.
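Counting manually can be as simple as summing 1 for every item. Note that this consumes the generator, as the sketch below shows. The names count and leftover are chosen for illustration.

```python
gen = (x for x in range(10) if x % 3 == 0)  # yields 0, 3, 6, 9

count = sum(1 for _ in gen)
print(count)  # 4

# Counting used up the generator, so nothing is left afterwards.
leftover = list(gen)
print(leftover)  # []
```

If you need both the count and the values, build a list once and call len() on it.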

Forgetting that range() is Already Lazy

Python’s built-in range() is already lazy. You do not need to wrap it in a generator:

# Unnecessary — range is already lazy
gen = (x for x in range(1_000_000))

# Just use range directly
for x in range(1_000_000):
    process(x)
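In fact, range() gives you more than a generator does: it is a lazy sequence, so it supports len(), indexing, slicing, and membership tests without building a list. A brief sketch:

```python
r = range(1_000_000)

print(len(r))          # 1000000
print(r[5])            # 5
print(999_999 in r)    # True
print(list(r[10:13]))  # [10, 11, 12]

# Unlike a generator, a range can be iterated again and again.
first_pass = sum(1 for _ in range(5))
second_pass = sum(1 for _ in range(5))
```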

Real-World Example: Processing a CSV File

Here is a practical generator pipeline for processing a large CSV file:

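import csv

def read_csv_lines(path):
    """Yield each data row of a CSV file as a list of strings."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.reader(f)  # handles quoted fields that contain commas
        next(reader, None)      # skip header; the default avoids an error on an empty file
        for row in reader:
            yield row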

def filter_by_country(rows, country):
    """Filter rows by country column."""
    for row in rows:
        if row[2] == country:
            yield row

def extract_emails(rows):
    """Extract email column from rows."""
    for row in rows:
        yield row[1]

# Pipeline: read -> filter -> extract
rows = read_csv_lines("users.csv")
german_rows = filter_by_country(rows, "Germany")
emails = extract_emails(german_rows)

for email in emails:
    send_newsletter(email)

This processes millions of rows using almost no memory. Each row flows through the entire pipeline before the next row is read.
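To try the idea without a real file, here is a self-contained variant that reads from an in-memory string and looks up columns by name with csv.DictReader. The sample data, the column names (name, email, country), and the function names read_rows and emails_for are all assumptions made for this sketch.

```python
import csv
import io

# Made-up sample data standing in for users.csv; the column names are assumptions.
SAMPLE = io.StringIO(
    "name,email,country\n"
    '"Schmidt, Anna",anna@example.com,Germany\n'
    "Bob,bob@example.com,France\n"
    "Carl,carl@example.com,Germany\n"
)

def read_rows(file_obj):
    """Yield each row as a dict keyed by the header names."""
    yield from csv.DictReader(file_obj)

def emails_for(rows, country):
    """Yield the email of every row whose country matches."""
    for row in rows:
        if row["country"] == country:
            yield row["email"]

emails = list(emails_for(read_rows(SAMPLE), "Germany"))
print(emails)  # ['anna@example.com', 'carl@example.com']
```

Looking up columns by name avoids the fragile positional indexes, and the csv module copes with the quoted comma in the first row.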

Source Code

You can find the code for this tutorial on GitHub:

kemalcodes/python-tutorial — tutorial-13-generators

Run the examples:

python src/py13_generators.py

Run the tests:

python -m pytest tests/test_py13.py -v

What’s Next?

In the next tutorial, we will learn about decorators — functions that modify other functions. They build on the generator and closure concepts we have covered so far.

Related Articles

Python Tutorial #12: File I/O — reading and writing files
Python Tutorial #5: Functions — closures and first-class functions
Python Cheat Sheet — quick reference for Python syntax
