In the previous tutorial, we learned about error handling. Now let’s learn about file I/O — how to read, write, and work with files in Python.

Almost every program needs to work with files. Configuration files, log files, data exports, user uploads — files are everywhere. By the end of this tutorial, you will know how to read and write text files, work with JSON and CSV, use pathlib for modern file system operations, and handle temporary files.

Writing Text Files

Use open() with a context manager (with statement) to write files safely:

from pathlib import Path

def write_text_file(path: Path, content: str) -> None:
    """Write a string to a text file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)

The with statement automatically closes the file when the block ends, even if an error occurs. You never need to call f.close() manually.

The second argument to open() is the mode:

Mode    Description
"r"     Read (default). File must exist.
"w"     Write. Creates file or overwrites existing content.
"a"     Append. Adds to the end of the file.
"x"     Create. Fails if file already exists.
"b"     Binary mode. Add to other modes: "rb", "wb".

Always include encoding="utf-8". Without it, Python uses the system default encoding, which varies between operating systems. On Windows, the default is often cp1252, which cannot represent all Unicode characters, so the same script can behave differently from one machine to the next.
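
To see how the modes in the table differ when the file already exists, here is a minimal sketch. The file name notes.txt and the helper demo_modes are made up for illustration, and everything runs inside a temporary directory so nothing is left behind:

```python
import tempfile
from pathlib import Path


def demo_modes(directory: Path) -> tuple[str, bool]:
    """Show how "w", "a", and "x" treat a file that already exists."""
    path = directory / "notes.txt"  # hypothetical file name

    # "w" creates the file, or truncates it if it already exists.
    with open(path, "w", encoding="utf-8") as f:
        f.write("first\n")
    with open(path, "w", encoding="utf-8") as f:
        f.write("second\n")  # "first" is gone now

    # "a" keeps the existing content and writes at the end.
    with open(path, "a", encoding="utf-8") as f:
        f.write("third\n")

    # "x" refuses to touch a file that already exists.
    try:
        with open(path, "x", encoding="utf-8") as f:
            f.write("never written\n")
        x_failed = False
    except FileExistsError:
        x_failed = True

    return path.read_text(encoding="utf-8"), x_failed


with tempfile.TemporaryDirectory() as tmp_dir:
    content, x_failed = demo_modes(Path(tmp_dir))

print(repr(content))  # 'second\nthird\n'
print(x_failed)       # True
```

Mode "x" is a reasonable choice when overwriting a file by accident would be worse than getting an exception.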

Writing Multiple Lines

def write_lines(path: Path, lines: list[str]) -> None:
    """Write a list of strings as lines to a file."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")

Each call to f.write() adds text to the file. It does not add a newline automatically — you need to add "\n" yourself.
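
The same applies to writelines(), which despite its name does not add newlines either. As a small sketch (the function name write_lines_with_writelines and the file name are assumptions for this example), you can pass it a generator that appends "\n" to each line:

```python
import tempfile
from pathlib import Path


def write_lines_with_writelines(path: Path, lines: list[str]) -> None:
    """Write a list of strings as lines using writelines()."""
    with open(path, "w", encoding="utf-8") as f:
        # writelines() writes each string as-is, so add the newline here.
        f.writelines(line + "\n" for line in lines)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "lines.txt"  # hypothetical file name
    write_lines_with_writelines(target, ["alpha", "beta"])
    written = target.read_text(encoding="utf-8")

print(repr(written))  # 'alpha\nbeta\n'
```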

Reading Text Files

Reading a file is just as simple:

def read_text_file(path: Path) -> str:
    """Read an entire text file and return its content."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()

f.read() reads the entire file into a string. This is fine for small files, but for large files, you should read line by line to save memory.

Reading Lines

For line-by-line processing, iterate over the file object:

def read_lines(path: Path) -> list[str]:
    """Read a file and return a list of lines (stripped)."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f]

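Iterating over the file object reads one line at a time, but this function still collects every line into a list, so the whole file ends up in memory. The strip() call removes the trailing newline character (\n) along with any other leading or trailing whitespace on each line.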
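
Because strip() also removes leading whitespace, it throws away indentation. If that matters, one option is rstrip("\n"), which removes only the trailing newline. A minimal sketch, with read_lines_keep_indent as a hypothetical variant of the function above:

```python
import tempfile
from pathlib import Path


def read_lines_keep_indent(path: Path) -> list[str]:
    """Read lines, removing only the trailing newline from each."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "indented.txt"  # hypothetical file name
    sample.write_text("top\n    nested\n", encoding="utf-8")
    kept = read_lines_keep_indent(sample)

print(kept)  # ['top', '    nested']
```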

You can also process lines without storing them all in memory:

with open("large_file.txt", "r", encoding="utf-8") as f:
    for line in f:
        process(line.strip())  # One line at a time

This pattern can handle files of any size because only one line is in memory at a time.

Appending to Files

Use mode "a" to add content without overwriting the existing content:

def append_to_file(path: Path, line: str) -> None:
    """Append a line to a text file."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")

This is useful for log files or any file where you want to add data over time:

append_to_file(Path("app.log"), "2026-06-22 10:00:00 INFO App started")
append_to_file(Path("app.log"), "2026-06-22 10:00:01 INFO User logged in")

pathlib: Modern File Paths

The pathlib module is the modern way to work with file paths in Python. Use it instead of os.path:

from pathlib import Path

# Create a path using / operator
path = Path("data") / "users" / "config.json"
print(path)  # data/users/config.json (backslashes on Windows)

# Path from string
path = Path("/home/alex/documents/report.txt")

File Information

Path objects have useful properties:

path = Path("data/users/config.json")

print(path.name)      # config.json — file name with extension
print(path.stem)      # config — file name without extension
print(path.suffix)    # .json — the extension
print(path.parent)    # data/users — parent directory
print(path.exists())  # True or False
print(path.is_file()) # True if it's a file
print(path.is_dir())  # True if it's a directory
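
The same properties have counterparts for building a new path from an existing one. The sketch below uses with_suffix(), with_name(), and parts, which are part of pathlib but not covered above. It uses PurePosixPath as an assumption for the demo, so the output is identical on every operating system and nothing touches the disk:

```python
from pathlib import PurePosixPath

# PurePosixPath never accesses the file system and always uses forward
# slashes, so the results below do not depend on the operating system.
path = PurePosixPath("data/users/config.json")

backup = path.with_suffix(".bak")          # swap the extension
renamed = path.with_name("settings.json")  # swap the whole file name

print(backup)      # data/users/config.bak
print(renamed)     # data/users/settings.json
print(path.parts)  # ('data', 'users', 'config.json')
```

Paths are immutable, so both methods return a new object and leave path unchanged.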

Creating Directories

# Create a single directory
Path("output").mkdir(exist_ok=True)

# Create nested directories
Path("output/reports/2026").mkdir(parents=True, exist_ok=True)

The exist_ok=True prevents an error if the directory already exists. The parents=True creates parent directories as needed.
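
To make the two flags concrete, this sketch shows which exception each one prevents. It runs in a temporary directory, and the directory names are only examples:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    nested = base / "reports" / "2026"

    # Without parents=True, a missing parent directory is an error.
    try:
        nested.mkdir()
        missing_parent_error = False
    except FileNotFoundError:
        missing_parent_error = True

    # parents=True creates "reports" and "2026" in one call.
    nested.mkdir(parents=True)

    # Without exist_ok=True, creating it a second time is an error.
    try:
        nested.mkdir()
        already_exists_error = False
    except FileExistsError:
        already_exists_error = True

    # With exist_ok=True, the repeated call does nothing.
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

print(missing_parent_error)  # True
print(already_exists_error)  # True
print(created)               # True
```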

Listing Files

def list_files(directory: Path, pattern: str = "*") -> list[str]:
    """List files in a directory matching a pattern."""
    return sorted([f.name for f in directory.glob(pattern) if f.is_file()])

# List all Python files
py_files = list_files(Path("src"), "*.py")
# ['py01_why_python.py', 'py02_first_program.py', ...]

# List all files
all_files = list_files(Path("src"))

With a simple pattern such as "*.py", the glob() method searches only the given directory. The rglob() method searches recursively through all subdirectories:

def list_files_recursive(directory: Path, pattern: str = "*") -> list[str]:
    """List files recursively using rglob."""
    return sorted([
        str(f.relative_to(directory))
        for f in directory.rglob(pattern)
        if f.is_file()
    ])

# Find all .py files in src/ and all subdirectories
all_py = list_files_recursive(Path("src"), "*.py")
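
The difference between glob() and rglob() is easy to check on a small directory tree. This sketch builds one in a temporary directory; the file and folder names are invented for the example:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    src = Path(tmp_dir) / "src"
    (src / "utils").mkdir(parents=True)
    (src / "main.py").write_text("", encoding="utf-8")
    (src / "utils" / "helpers.py").write_text("", encoding="utf-8")
    (src / "notes.txt").write_text("", encoding="utf-8")

    # glob() looks only at the top level of src/.
    top_level = sorted(p.name for p in src.glob("*.py"))

    # rglob() also descends into src/utils/.
    # as_posix() gives forward slashes on every operating system.
    everywhere = sorted(
        p.relative_to(src).as_posix() for p in src.rglob("*.py")
    )

print(top_level)   # ['main.py']
print(everywhere)  # ['main.py', 'utils/helpers.py']
```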

Quick Read and Write with pathlib

pathlib also has shortcut methods for simple operations:

# Write text
Path("hello.txt").write_text("Hello, World!", encoding="utf-8")

# Read text
content = Path("hello.txt").read_text(encoding="utf-8")

# Write bytes
Path("data.bin").write_bytes(b"\x00\x01\x02")

# Read bytes
data = Path("data.bin").read_bytes()

These are convenient for one-off reads and writes, but they open and close the file each time. For repeated operations, use open() with a context manager.

JSON: Read and Write

JSON (JavaScript Object Notation) is one of the most common formats for data exchange. Python’s json module handles it:

import json

def write_json(path: Path, data: dict | list) -> None:
    """Write data to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)

def read_json(path: Path) -> dict | list:
    """Read data from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)

Example:

users = [
    {"name": "Alex", "age": 25, "city": "Berlin"},
    {"name": "Sam", "age": 30, "city": "Munich"},
]

# Write
write_json(Path("users.json"), users)

# Read
loaded = read_json(Path("users.json"))
print(loaded[0]["name"])  # Alex

The arguments to json.dump():

  • indent=2: makes the output human-readable with 2-space indentation
  • ensure_ascii=False: writes non-ASCII characters (like German umlauts: ä, ö, ü) as-is instead of as \uXXXX escapes
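
A short sketch of what ensure_ascii changes in practice. It uses json.dumps() so the result can be printed directly; the city value is just sample data:

```python
import json

city = {"city": "München"}

escaped = json.dumps(city)                       # default: ensure_ascii=True
readable = json.dumps(city, ensure_ascii=False)  # keep the umlaut as-is

print(escaped)   # {"city": "M\u00fcnchen"}
print(readable)  # {"city": "München"}

# Both strings decode to the same Python object.
same = json.loads(escaped) == json.loads(readable)
print(same)  # True
```

Both forms are valid JSON. The second is easier for people to read, which is why it pairs well with encoding="utf-8" on the file.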

JSON and Strings

You can also convert between JSON strings and Python objects:

# Python dict to JSON string
json_string = json.dumps({"name": "Alex", "age": 25}, indent=2)
print(json_string)
# {
#   "name": "Alex",
#   "age": 25
# }

# JSON string to Python dict
data = json.loads('{"name": "Alex", "age": 25}')
print(data["name"])  # Alex

Python types map to JSON types: dict becomes object, list becomes array, str becomes string, int/float become number, True/False become true/false, None becomes null.
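
The mapping is not perfectly symmetric, which can surprise you on a round trip. In this sketch (sample data only), a tuple comes back as a list and an integer dictionary key comes back as a string:

```python
import json

original = {"point": (1, 2), 1: "one", "missing": None, "ok": True}

text = json.dumps(original)
restored = json.loads(text)

print(text)      # {"point": [1, 2], "1": "one", "missing": null, "ok": true}
print(restored)  # {'point': [1, 2], '1': 'one', 'missing': None, 'ok': True}

# The round trip changed two things: tuple -> list, int key -> str key.
print(restored == original)  # False
```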

CSV: Read and Write

CSV (Comma-Separated Values) is common for spreadsheet data and data exports:

import csv

def write_csv(path: Path, headers: list[str], rows: list[list]) -> None:
    """Write data to a CSV file."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(rows)

def read_csv(path: Path) -> list[dict]:
    """Read a CSV file and return a list of dicts."""
    with open(path, "r", encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        return list(reader)

Example:

# Write CSV
write_csv(
    Path("scores.csv"),
    headers=["name", "score"],
    rows=[["Alex", "85"], ["Sam", "92"]],
)

# Read CSV — returns list of dicts
records = read_csv(Path("scores.csv"))
print(records[0]["name"])   # Alex
print(records[0]["score"])  # 85 (the string "85", not an int)

Important: the csv module returns every value as a string by default. You need to convert them manually:

score = int(records[0]["score"])  # Convert string "85" to int 85
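
When every row needs the same conversion, it is often cleaner to convert while reading. A minimal sketch, using io.StringIO as a stand-in for an open file so that it runs without touching the disk:

```python
import csv
import io

# io.StringIO stands in for a file opened with newline="".
raw = io.StringIO("name,score\nAlex,85\nSam,92\n")

# Convert "score" to int as each row is read.
records = [
    {"name": row["name"], "score": int(row["score"])}
    for row in csv.DictReader(raw)
]

average = sum(r["score"] for r in records) / len(records)

print(records)  # [{'name': 'Alex', 'score': 85}, {'name': 'Sam', 'score': 92}]
print(average)  # 88.5
```

Note that int() raises ValueError on an empty or non-numeric field, so real data may need a try/except around the conversion.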

Writing Dicts to CSV

If your data is already a list of dictionaries, use DictWriter:

def write_csv_dicts(path: Path, headers: list[str], rows: list[dict]) -> None:
    """Write a list of dicts to a CSV file."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=headers)
        writer.writeheader()
        writer.writerows(rows)
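
DictWriter is strict about keys by default, which is worth knowing before you pass it real data. The sketch below writes to an io.StringIO buffer instead of a file (an assumption made to keep the demo self-contained) and shows what happens with a missing key and with an unexpected one:

```python
import csv
import io

buffer = io.StringIO()  # stands in for a file opened with newline=""
writer = csv.DictWriter(buffer, fieldnames=["name", "score"])
writer.writeheader()
writer.writerow({"name": "Alex", "score": 85})

# A missing key is written as an empty field.
writer.writerow({"name": "Sam"})

# A key that is not in fieldnames raises ValueError by default.
try:
    writer.writerow({"name": "Kim", "score": 70, "city": "Berlin"})
    extra_key_rejected = False
except ValueError:
    extra_key_rejected = True

output = buffer.getvalue()
print(repr(output))        # 'name,score\r\nAlex,85\r\nSam,\r\n'
print(extra_key_rejected)  # True
```

The \r\n line endings come from the csv module itself, which is the reason files must be opened with newline="".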

Temporary Files

Use tempfile for files you only need temporarily. They are useful for tests, data processing, and any code that creates files it does not need to keep:

import tempfile

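# Create a temp file that outlives the function call
def create_temp_file(content: str) -> Path:
    """Write content to a new temp file and return its path."""
    tmp = tempfile.NamedTemporaryFile(
        mode="w", suffix=".txt", delete=False, encoding="utf-8",
    )
    tmp.write(content)
    tmp.close()
    # delete=False keeps the file after close(); the caller must remove it,
    # for example with Path.unlink().
    return Path(tmp.name)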

# Create a temp directory that cleans itself up
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "output.txt"
    path.write_text("temporary data", encoding="utf-8")
    # Directory and all files are deleted when the block ends

TemporaryDirectory is a context manager. It creates a directory, lets you use it, and then deletes everything when the with block ends. This is very useful for tests.
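
The cleanup behavior of both approaches can be verified directly. This sketch checks that the temporary directory disappears after the with block, and that a file created with delete=False stays until you remove it yourself:

```python
import tempfile
from pathlib import Path

# TemporaryDirectory: everything is removed when the block ends.
with tempfile.TemporaryDirectory() as tmp_dir:
    workdir = Path(tmp_dir)
    (workdir / "output.txt").write_text("temporary data", encoding="utf-8")
    existed_inside = workdir.exists()

existed_after = workdir.exists()

# NamedTemporaryFile with delete=False: the caller cleans up.
tmp = tempfile.NamedTemporaryFile(
    mode="w", suffix=".txt", delete=False, encoding="utf-8"
)
tmp.write("keep until unlinked")
tmp.close()
leftover = Path(tmp.name)
survived_close = leftover.exists()
leftover.unlink()  # manual cleanup
removed = not leftover.exists()

print(existed_inside, existed_after)  # True False
print(survived_close, removed)        # True True
```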

Counting Words

Here is a practical example that combines several concepts:

def count_words(path: Path) -> int:
    """Count the total number of words in a file."""
    text = read_text_file(path)
    return len(text.split())
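
For large files, the same idea works line by line. As a sketch of one possible extension (count_word_frequencies is a hypothetical helper, not part of the tutorial's source code), collections.Counter can track how often each word appears without loading the whole file:

```python
import tempfile
from collections import Counter
from pathlib import Path


def count_word_frequencies(path: Path) -> Counter[str]:
    """Count how often each word appears, reading one line at a time."""
    counts: Counter[str] = Counter()
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            # Lowercase so that "The" and "the" count as the same word.
            counts.update(word.lower() for word in line.split())
    return counts


with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.txt"  # hypothetical file name
    sample.write_text("the cat\nThe dog and the bird\n", encoding="utf-8")
    frequencies = count_word_frequencies(sample)

print(frequencies["the"])         # 3
print(sum(frequencies.values()))  # 7
```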

Common Mistakes

Not Using encoding

# BAD — encoding depends on the operating system
with open("file.txt", "r") as f:
    content = f.read()

# GOOD — always specify encoding
with open("file.txt", "r", encoding="utf-8") as f:
    content = f.read()

Forgetting newline="" for CSV

# BAD — may create extra blank lines on Windows
with open("data.csv", "w") as f:
    writer = csv.writer(f)

# GOOD — newline="" prevents double newlines on Windows
with open("data.csv", "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)

Not Using the with Statement

# BAD — file may stay open if an error occurs
f = open("file.txt")
content = f.read()
f.close()  # What if f.read() raises an error? File stays open!

# GOOD — file is always closed, even on error
with open("file.txt", encoding="utf-8") as f:
    content = f.read()

Reading Large Files Into Memory

# BAD — loads entire file into memory
content = Path("huge_file.txt").read_text()
lines = content.split("\n")  # Now you have the content AND the lines in memory

# GOOD — process line by line
with open("huge_file.txt", "r", encoding="utf-8") as f:
    for line in f:
        process(line)

Working with Binary Files

Some files are not text — images, PDFs, and executables are binary data. Use "rb" and "wb" modes:

# Read binary file
with open("image.png", "rb") as f:
    data = f.read()
    print(f"Size: {len(data)} bytes")

# Write binary file (copy)
with open("copy.png", "wb") as f:
    f.write(data)

Never open a binary file in text mode. Decoding arbitrary bytes as text usually raises UnicodeDecodeError, and when it does not, the data can be silently corrupted.
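
Here is a small sketch of the failure case. The bytes are made up for the demo (they only begin like a PNG header), and the file lives in a temporary directory:

```python
import tempfile
from pathlib import Path

payload = b"\x89PNG\r\n\x1a\n\xff\xfe"  # sample bytes, not a real image

with tempfile.TemporaryDirectory() as tmp_dir:
    blob = Path(tmp_dir) / "blob.bin"  # hypothetical file name
    blob.write_bytes(payload)

    # Text mode tries to decode the bytes and fails on the first one:
    # 0x89 is not a valid start byte in UTF-8.
    try:
        blob.read_text(encoding="utf-8")
        decode_failed = False
    except UnicodeDecodeError:
        decode_failed = True

    # Binary mode returns the bytes unchanged.
    with open(blob, "rb") as f:
        raw = f.read()

print(decode_failed)   # True
print(raw == payload)  # True
```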

Checking if a File Exists

Use pathlib to check before reading:

path = Path("config.json")

if path.exists():
    config = read_json(path)
else:
    config = {"default": True}

Or use the EAFP pattern from Tutorial #11:

try:
    config = read_json(Path("config.json"))
except FileNotFoundError:
    config = {"default": True}

The EAFP pattern is more Pythonic because it avoids a race condition: the file could be deleted between the exists() check and the read_json() call.
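
The same pattern extends to other failures. A config file might exist but contain invalid JSON, in which case json.load() raises json.JSONDecodeError. The sketch below falls back to the default in both cases. The load_config helper and the decision to fall back on corrupt files are assumptions for this example; in some programs, failing loudly is the better choice:

```python
import json
import tempfile
from pathlib import Path

DEFAULT_CONFIG = {"default": True}


def load_config(path: Path) -> dict:
    """Return the parsed config, or a default if it is missing or invalid."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return dict(DEFAULT_CONFIG)
    except json.JSONDecodeError:
        # Assumption: a corrupt file should fall back rather than crash.
        return dict(DEFAULT_CONFIG)


with tempfile.TemporaryDirectory() as tmp_dir:
    base = Path(tmp_dir)
    (base / "broken.json").write_text("{not json", encoding="utf-8")
    (base / "good.json").write_text('{"debug": false}', encoding="utf-8")

    from_missing = load_config(base / "missing.json")
    from_broken = load_config(base / "broken.json")
    from_good = load_config(base / "good.json")

print(from_missing)  # {'default': True}
print(from_broken)   # {'default': True}
print(from_good)     # {'debug': False}
```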

Practical Example: Log File Parser

Here is a complete example that combines reading, parsing, and writing:

def parse_log(input_path: Path, output_path: Path, level: str = "ERROR") -> int:
    """Extract log lines matching a level and write them to a new file."""
    count = 0
    with open(input_path, "r", encoding="utf-8") as infile:
        with open(output_path, "w", encoding="utf-8") as outfile:
            for line in infile:
                if f"[{level}]" in line:
                    outfile.write(line)
                    count += 1
    return count

# Extract all ERROR lines from a log file
errors_found = parse_log(Path("app.log"), Path("errors.log"))
print(f"Found {errors_found} errors")

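This reads the input file line by line (memory-efficient) and writes matching lines to the output file, so it works with log files of any size. Note that it expects the level in square brackets, such as [ERROR]. Lines like the ones written in the appending example earlier, which use a bare INFO, would not match, so adapt the check to your log format.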
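
The parser above looks for the level in square brackets. For logs in the format used in the appending example (date, time, level, message), one possible variant compares the third whitespace-separated field instead. This is a sketch under that format assumption, and filter_by_level_field is a hypothetical name:

```python
import tempfile
from pathlib import Path


def filter_by_level_field(
    input_path: Path, output_path: Path, level: str = "ERROR"
) -> int:
    """Copy lines whose third whitespace-separated field equals level."""
    count = 0
    with open(input_path, "r", encoding="utf-8") as infile:
        with open(output_path, "w", encoding="utf-8") as outfile:
            for line in infile:
                # Assumed layout: date, time, level, then the message.
                fields = line.split(maxsplit=3)
                if len(fields) >= 3 and fields[2] == level:
                    outfile.write(line)
                    count += 1
    return count


with tempfile.TemporaryDirectory() as tmp_dir:
    log = Path(tmp_dir) / "app.log"
    errors = Path(tmp_dir) / "errors.log"
    log.write_text(
        "2026-06-22 10:00:00 INFO App started\n"
        "2026-06-22 10:00:02 ERROR Disk full\n"
        "2026-06-22 10:00:03 INFO No ERROR here\n",
        encoding="utf-8",
    )
    found = filter_by_level_field(log, errors)
    extracted = errors.read_text(encoding="utf-8")

print(found)            # 1
print(repr(extracted))  # '2026-06-22 10:00:02 ERROR Disk full\n'
```

Comparing a specific field avoids false matches when the word ERROR only appears inside the message, as in the third sample line.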

Source Code

You can find the code for this tutorial on GitHub:

kemalcodes/python-tutorial — tutorial-12-file-io

Run the examples:

python src/py12_file_io.py

Run the tests:

python -m pytest tests/test_py12.py -v

What’s Next?

In the next tutorial, we will learn about generators and iterators — how to process data lazily and save memory when working with large datasets.