In the previous tutorial, we learned about file I/O. Now let’s learn about generators and iterators — tools for processing data lazily without loading everything into memory.
A generator produces values one at a time. It only calculates the next value when you ask for it. This is called lazy evaluation. Instead of creating a list of one million items in memory, a generator produces them one by one. By the end of this tutorial, you will know how to create generators, build data pipelines, and use itertools for efficient data processing.
Generator Functions
A generator function uses yield instead of return. Any function whose body contains yield becomes a generator function:
def count_up(start: int, end: int):
    """Generate numbers from start to end (inclusive)."""
    current = start
    while current <= end:
        yield current
        current += 1
When you call a generator function, it does not run the function body. It returns a generator object, which is a lazy iterator:
gen = count_up(1, 5) # No code runs yet!
print(type(gen)) # <class 'generator'>
print(next(gen)) # 1 — runs until first yield
print(next(gen)) # 2 — continues from where it stopped
print(next(gen)) # 3
Each call to next() runs the generator until the next yield, produces the value, and then pauses. The generator remembers its position and local variables between calls.
You can also use a generator in a for loop, which calls next() automatically:
for num in count_up(1, 5):
    print(num, end=" ")
# Output: 1 2 3 4 5
The key difference from a regular function: a generator pauses at each yield and resumes when you ask for the next value. A regular function runs all the way through and returns one result.
How yield Works
Let me walk through what happens step by step:
def simple_gen():
    print("Before first yield")
    yield 1
    print("Before second yield")
    yield 2
    print("After last yield")

gen = simple_gen() # Nothing prints yet
print(next(gen)) # "Before first yield" then 1
print(next(gen)) # "Before second yield" then 2
# next(gen) would print "After last yield" then raise StopIteration
When the generator function's body finishes (there are no more yields), the pending next() call raises StopIteration. A for loop catches this automatically and stops.
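Outside a for loop you have to deal with StopIteration yourself. The sketch below shows two common ways to do that: catching the exception, or passing a default value to next(). The variable names are only illustrative.

```python
def simple_gen():
    yield 1
    yield 2

gen = simple_gen()
values = [next(gen), next(gen)]  # consumes both yields

# A third next() finds no more yields, so it raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True

# next() also accepts a default, returned instead of raising.
fallback = next(gen, "done")

print(values, exhausted, fallback)  # [1, 2] True done
```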
Fibonacci Generator
Generators are perfect for sequences where calculating all values upfront would be wasteful:
def fibonacci(limit: int):
    """Generate Fibonacci numbers up to a limit."""
    a, b = 0, 1
    while a <= limit:
        yield a
        a, b = b, a + b

for num in fibonacci(100):
    print(num, end=" ")
# Output: 0 1 1 2 3 5 8 13 21 34 55 89
The generator calculates each number on demand. It never stores the entire sequence in memory. If you asked for Fibonacci numbers up to one billion, the generator would still use the same tiny amount of memory.
Infinite Generators
Generators can produce values forever:
def infinite_counter(start: int = 0):
    """Generate numbers forever, starting from start."""
    current = start
    while True:
        yield current
        current += 1
Use next() to get values one at a time:
counter = infinite_counter(10)
print(next(counter)) # 10
print(next(counter)) # 11
print(next(counter)) # 12
You cannot use an infinite generator in a regular for loop without a break — it would run forever. Instead, combine it with itertools.islice or use a manual break:
for num in infinite_counter():
    if num > 4:
        break
    print(num, end=" ")
# Output: 0 1 2 3 4
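The itertools.islice alternative mentioned above could look like the following minimal sketch. It repeats infinite_counter so the snippet runs on its own; first_five is just an illustrative name.

```python
import itertools

def infinite_counter(start: int = 0):
    """Generate numbers forever, starting from start."""
    current = start
    while True:
        yield current
        current += 1

# islice stops pulling values after 5 items, so the infinite
# generator is never asked for a sixth one.
first_five = list(itertools.islice(infinite_counter(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```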
Generator Expressions
Generator expressions look like list comprehensions, but with parentheses instead of brackets:
# List comprehension — creates entire list in memory
squares_list = [x * x for x in range(1_000_000)] # ~8 MB of memory
# Generator expression — calculates one value at a time
squares_gen = (x * x for x in range(1_000_000)) # ~200 bytes of memory
Generator expressions are useful with functions like sum(), max(), and min() that consume all values:
def sum_of_squares(n: int) -> int:
    """Sum of squares from 1 to n using a generator expression."""
    return sum(x * x for x in range(1, n + 1))

print(sum_of_squares(10)) # 385
The sum() function processes each value from the generator and discards it immediately. No intermediate list is ever created.
When you pass a generator expression as the only argument to a function, you can omit the extra parentheses:
# Both are the same
sum((x * x for x in range(10)))
sum(x * x for x in range(10)) # Cleaner — no extra parentheses
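The shortcut only applies when the generator expression is the sole argument. As a small illustration, here is sum() with its optional start value as a second argument, where the generator expression needs its own parentheses:

```python
# Sole argument: the call's own parentheses are enough.
total = sum(x * x for x in range(10))

# With a second argument (sum's start value), the generator expression
# must be parenthesized; leaving the parentheses out is a SyntaxError.
total_from_100 = sum((x * x for x in range(10)), 100)

print(total, total_from_100)  # 285 385
```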
Generator Pipelines
One of the most powerful uses of generators is chaining them into data processing pipelines. Each step processes one item at a time:
def read_values(data: list[str]):
    """Step 1: Yield each value, stripped of whitespace."""
    for item in data:
        yield item.strip()

def parse_ints(values):
    """Step 2: Parse to ints, skipping invalid values."""
    for value in values:
        try:
            yield int(value)
        except ValueError:
            continue

def filter_positive(numbers):
    """Step 3: Yield only positive numbers."""
    for num in numbers:
        if num > 0:
            yield num

def pipeline(data: list[str]) -> list[int]:
    """Process data through a generator pipeline."""
    values = read_values(data)
    numbers = parse_ints(values)
    positives = filter_positive(numbers)
    return list(positives)

data = [" 10 ", "abc", " -5 ", " 20 ", "xyz", " 30 "]
result = pipeline(data)
print(result) # [10, 20, 30]
Each step processes one item at a time, and no intermediate lists are created: the first item flows through all three steps before the second item even starts. Only the final list() call collects the results. This is very memory-efficient for large inputs, such as a multi-gigabyte log file.
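One way to see this interleaving is to record when each stage touches an item. The two-stage sketch below uses hypothetical stages (source and times_two) and a shared log list; it is a simplified stand-in for the pipeline above, not a replacement for it.

```python
log = []

def source(items):
    """Stage 1: hand out items one by one, recording each."""
    for item in items:
        log.append(f"source {item}")
        yield item

def times_two(numbers):
    """Stage 2: double each number, recording each."""
    for n in numbers:
        log.append(f"times_two {n}")
        yield n * 2

result = list(times_two(source([1, 2])))
print(result)  # [2, 4]
print(log)     # ['source 1', 'times_two 1', 'source 2', 'times_two 2']
```

The log alternates between the stages, which shows that item 1 finished both stages before item 2 was read.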
Custom Iterators
An iterator is any object that implements __iter__ and __next__:
class Countdown:
    """An iterator that counts down from a number to 1."""

    def __init__(self, start: int) -> None:
        self.current = start

    def __iter__(self):
        return self

    def __next__(self) -> int:
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for num in Countdown(5):
    print(num, end=" ")
# Output: 5 4 3 2 1
StopIteration tells the for loop to stop. Python raises this automatically when a generator function ends.
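To make the protocol concrete, here is roughly what a for loop does behind the scenes, written out by hand. This is a simplified sketch of the idea, not the interpreter's actual implementation.

```python
items = ["a", "b", "c"]
seen = []

iterator = iter(items)          # calls items.__iter__()
while True:
    try:
        value = next(iterator)  # calls iterator.__next__()
    except StopIteration:
        break                   # a for loop ends here, silently
    seen.append(value)          # the loop body

print(seen)  # ['a', 'b', 'c']
```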
The limitation of an iterator class like this, whose __iter__ returns self: you can only iterate over it once. After the first loop, the iterator is exhausted:
c = Countdown(3)
print(list(c)) # [3, 2, 1]
print(list(c)) # [] — exhausted!
Reusable Iterables
To make a reusable iterable, return a new iterator from __iter__ each time:
class Range:
    """A reusable iterable that generates numbers in a range."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __iter__(self):
        current = self.start
        while current < self.end:
            yield current
            current += 1

r = Range(1, 4)
print(list(r)) # [1, 2, 3]
print(list(r)) # [1, 2, 3] — works again!
The trick: __iter__ is a generator function. Each call creates a new generator object, so you can iterate as many times as you want.
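A quick way to check this is to ask for two iterators from the same object and advance them separately. The sketch repeats the Range class so it runs on its own; first and second are illustrative names.

```python
class Range:
    """A reusable iterable that generates numbers in a range."""

    def __init__(self, start: int, end: int) -> None:
        self.start = start
        self.end = end

    def __iter__(self):
        current = self.start
        while current < self.end:
            yield current
            current += 1

r = Range(1, 4)
first = iter(r)
second = iter(r)

print(first is second)           # False: two separate generator objects
print(next(first), next(first))  # 1 2
print(next(second))              # 1, unaffected by the first iterator
```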
itertools: Powerful Iterator Utilities
The itertools module has tools for efficient iteration. These are all lazy — they produce values one at a time.
chain: Combine Multiple Iterables
import itertools
result = list(itertools.chain([1, 2], [3, 4], [5, 6]))
print(result) # [1, 2, 3, 4, 5, 6]
Unlike [1, 2] + [3, 4] + [5, 6], chain does not create a new list. It yields items from each iterable in order.
islice: Take a Slice from Any Iterable
# Take the first 5 Fibonacci numbers
first_5 = list(itertools.islice(fibonacci(1000), 5))
print(first_5) # [0, 1, 1, 2, 3]
This works with infinite generators too. islice stops after the requested number of items.
groupby: Group Consecutive Items
words = ["hi", "go", "hello", "world", "hey"]
sorted_words = sorted(words, key=len)
for length, group in itertools.groupby(sorted_words, key=len):
    print(f"Length {length}: {list(group)}")
# Length 2: ['hi', 'go']
# Length 3: ['hey']
# Length 5: ['hello', 'world']
Important: groupby groups consecutive items with the same key. The data must be sorted by the grouping key first. If you forget to sort, you get multiple groups for the same key.
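To illustrate the warning, here is what happens with the same words in a deliberately unsorted order. The word order is chosen for this example only.

```python
import itertools

# Same words as before, but NOT sorted by length.
words = ["hi", "hello", "go", "world", "hey"]

unsorted_groups = [
    (length, list(group))
    for length, group in itertools.groupby(words, key=len)
]
print(unsorted_groups)
# [(2, ['hi']), (5, ['hello']), (2, ['go']), (5, ['world']), (3, ['hey'])]
```

Lengths 2 and 5 each appear twice, because groupby starts a new group whenever the key changes.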
batched: Split into Chunks (Python 3.12+)
chunks = list(itertools.batched(range(10), 3))
print(chunks) # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
This is useful for processing data in batches, such as inserting rows into a database 100 at a time.
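On Python versions before 3.12, a similar helper can be sketched with islice. The name batched_compat is hypothetical (it is not part of itertools), and this version assumes Python 3.8+ for the := operator.

```python
import itertools

def batched_compat(iterable, n):
    """Yield tuples of up to n items (hypothetical stand-in for itertools.batched)."""
    if n < 1:
        raise ValueError("n must be at least one")
    iterator = iter(iterable)
    # islice pulls at most n items per pass; an empty tuple ends the loop.
    while batch := tuple(itertools.islice(iterator, n)):
        yield batch

chunks = list(batched_compat(range(10), 3))
print(chunks)  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
```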
functools.partial
functools.partial creates a new function with some arguments pre-filled. It is not a generator, but it is commonly used with iterators and higher-order functions:
from functools import partial
def multiply(a: int, b: int) -> int:
    return a * b

double = partial(multiply, b=2)
triple = partial(multiply, b=3)
print(double(5)) # 10
print(triple(5)) # 15
This is useful with map(), filter(), and other functions that take a function as an argument.
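For instance, a partial can be handed straight to map(), which is itself lazy. A minimal sketch, reusing multiply from above:

```python
from functools import partial

def multiply(a: int, b: int) -> int:
    return a * b

double = partial(multiply, b=2)

# map() returns a lazy iterator; nothing is computed until it is consumed.
doubled = map(double, [1, 2, 3])
doubled_list = list(doubled)
print(doubled_list)  # [2, 4, 6]
```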
Memory: List vs Generator
Generators use almost no memory compared to lists:
import sys
# List: stores all 1 million values in memory
numbers_list = [x * x for x in range(1_000_000)]
print(sys.getsizeof(numbers_list)) # ~8,000,000 bytes (8 MB)
# Generator: stores only the formula and its current state
numbers_gen = (x * x for x in range(1_000_000))
print(sys.getsizeof(numbers_gen)) # ~200 bytes
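The list holds references to all 1 million values, and sys.getsizeof counts only that array of references, not the integer objects themselves, so the real footprint is even larger. The generator stores only the expression and its current position. For large datasets, this can be the difference between your program working and running out of memory.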
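A related point is that a generator's size does not grow with the amount of data it will produce. The sketch below compares sizes rather than printing exact byte counts, since those vary between Python versions.

```python
import sys

# Two generator expressions over very different ranges.
small_gen = (x * x for x in range(10))
large_gen = (x * x for x in range(10_000_000))
same_gen_size = sys.getsizeof(small_gen) == sys.getsizeof(large_gen)

# Two lists of different lengths, for contrast.
small_list = [x * x for x in range(1_000)]
large_list = [x * x for x in range(100_000)]
list_grows = sys.getsizeof(large_list) > sys.getsizeof(small_list)

print(same_gen_size)  # True: the generator holds state, not data
print(list_grows)     # True: the list grows with its contents
```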
When to Use Generators
| Situation | Use |
|---|---|
| Process large files line by line | Generator |
| Build data pipelines | Generator |
| Infinite sequences | Generator |
| Lazy computation | Generator |
| Need random access (items[5]) | List |
| Need to iterate multiple times | List (or reusable iterable) |
| Need the length (len(items)) | List |
| Small dataset | Either works |
The rule of thumb: if you only need to iterate through the data once and do not need to know the length or access items by index, use a generator.
Common Mistakes
Using a Generator Twice
Generators are consumed after one iteration. If you need the values again, convert to a list first:
gen = (x * x for x in range(5))
print(list(gen)) # [0, 1, 4, 9, 16]
print(list(gen)) # [] — empty! Generator is exhausted.
# Fix: convert to list first if you need to iterate multiple times
squares = list(x * x for x in range(5))
print(squares) # [0, 1, 4, 9, 16] — always available
print(squares) # [0, 1, 4, 9, 16] — still there
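If the values are too large to keep in a list, another option is to wrap the generator expression in a small function and call it each time you need a fresh pass. The function name make_squares is illustrative.

```python
def make_squares(n: int):
    """Return a fresh generator on every call."""
    return (x * x for x in range(n))

first_pass = list(make_squares(5))
second_pass = list(make_squares(5))
print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # [0, 1, 4, 9, 16]
```

The trade-off is that the values are recomputed on every pass instead of being stored.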
Calling len() on a Generator
Generators do not have a length. You cannot call len() on them:
gen = (x for x in range(10))
len(gen) # TypeError: object of type 'generator' has no len()
If you need the length, use a list or count manually.
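Counting manually could look like the sketch below. Keep in mind that counting consumes the generator, so the values are gone afterwards.

```python
gen = (x for x in range(10) if x % 2 == 0)

# Add 1 for every item the generator produces.
count = sum(1 for _ in gen)
leftover = list(gen)

print(count)     # 5
print(leftover)  # [] because counting exhausted the generator
```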
Forgetting that range() is Already Lazy
Python’s built-in range() is already lazy. You do not need to wrap it in a generator:
# Unnecessary — range is already lazy
gen = (x for x in range(1_000_000))
# Just use range directly
for x in range(1_000_000):
    process(x)
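Unlike a generator, a range object also supports len(), indexing, and repeated iteration, all without storing its values. A short sketch:

```python
r = range(1_000_000)

length = len(r)          # ranges know their length
sixth = r[5]             # and support indexing
last = r[-1]
contains = 500_000 in r  # and membership tests

small = range(3)
first_pass = list(small)
second_pass = list(small)  # a range can be iterated again

print(length, sixth, last, contains)  # 1000000 5 999999 True
print(first_pass, second_pass)        # [0, 1, 2] [0, 1, 2]
```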
Real-World Example: Processing a CSV File
Here is a practical generator pipeline for processing a large CSV file:
def read_csv_lines(path):
    """Read a CSV file line by line."""
    with open(path, "r", encoding="utf-8") as f:
        next(f)  # Skip header
        for line in f:
            yield line.strip().split(",")

def filter_by_country(rows, country):
    """Filter rows by country column."""
    for row in rows:
        if row[2] == country:
            yield row

def extract_emails(rows):
    """Extract email column from rows."""
    for row in rows:
        yield row[1]

# Pipeline: read -> filter -> extract
rows = read_csv_lines("users.csv")
german_rows = filter_by_country(rows, "Germany")
emails = extract_emails(german_rows)
for email in emails:
    send_newsletter(email)
This processes millions of rows using almost no memory. Each row flows through the entire pipeline before the next row is read.
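Splitting on commas breaks when a field itself contains a comma inside quotes. A possible variant of the reader step uses the standard csv module, which handles quoting and is still lazy. The function name read_csv_rows, the sample file, and the name column at index 0 are assumptions made for this sketch; only the email (index 1) and country (index 2) columns come from the example above.

```python
import csv
import os
import tempfile

def read_csv_rows(path):
    """Yield parsed rows one at a time, skipping the header."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip header; tolerate an empty file
        yield from reader

# Demo with a temporary file (assumed layout: name, email, country).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "users.csv")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write("name,email,country\n")
        f.write('"Doe, Jane",jane@example.com,Germany\n')
    rows = list(read_csv_rows(path))

print(rows)  # [['Doe, Jane', 'jane@example.com', 'Germany']]
```

The quoted name stays in one field, so the email and country columns keep their positions.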
Source Code
You can find the code for this tutorial on GitHub:
kemalcodes/python-tutorial — tutorial-13-generators
Run the examples:
python src/py13_generators.py
Run the tests:
python -m pytest tests/test_py13.py -v
What’s Next?
In the next tutorial, we will learn about decorators — functions that modify other functions. They build on the generator and closure concepts we have covered so far.
Related Articles
- Python Tutorial #12: File I/O — reading and writing files
- Python Tutorial #5: Functions — closures and first-class functions
- Python Cheat Sheet — quick reference for Python syntax