Python Tutorial #25: Build an Automation Script — Real-World Python

In the previous tutorial, we built a web scraper. Now let’s build an automation script — a file organizer that sorts files by type, processes CSV data, and logs everything properly.

This is the final project in the Python Tutorial series. It ties together everything: file I/O, dataclasses, error handling, logging, testing, and packaging. By the end, you will have a practical tool you can use every day.

What We Are Building

A file organizer that:

Scans a directory (like your Downloads folder)
Sorts files into folders by type (images, documents, code, etc.)
Handles name conflicts (adds a number suffix)
Supports dry run mode (preview without moving)
Generates a report (text and JSON)
Processes CSV files (filter, transform)
Uses environment variables for configuration

File Type Categories

First, we map each category name to the file extensions that belong to it:

from pathlib import Path

FILE_CATEGORIES: dict[str, list[str]] = {
    "images": [".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp"],
    "documents": [".pdf", ".doc", ".docx", ".txt", ".md"],
    "spreadsheets": [".xls", ".xlsx", ".csv"],
    "code": [".py", ".js", ".ts", ".html", ".css", ".java", ".kt", ".rs"],
    "archives": [".zip", ".tar", ".gz", ".rar", ".7z"],
    "videos": [".mp4", ".avi", ".mov", ".mkv"],
    "audio": [".mp3", ".wav", ".flac", ".ogg"],
    "data": [".json", ".xml", ".yaml", ".yml", ".toml", ".sql"],
}

def get_category(filepath: Path) -> str:
    """Get the category for a file based on its extension."""
    ext = filepath.suffix.lower()
    for category, extensions in FILE_CATEGORIES.items():
        if ext in extensions:
            return category
    return "other"

Files that do not match any category go into “other”.
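
The loop in get_category checks every category list in turn. As an optional variation, the table can be inverted once into an extension-to-category dictionary so that each lookup is a single dictionary access. This is only a sketch: EXTENSION_TO_CATEGORY and get_category_fast are hypothetical names that are not part of the tutorial's code, and the table is trimmed so the example runs on its own.

```python
from pathlib import Path

# Trimmed copy of the tutorial's table so this sketch is self-contained.
FILE_CATEGORIES: dict[str, list[str]] = {
    "images": [".jpg", ".png"],
    "documents": [".pdf", ".txt"],
    "archives": [".zip", ".gz"],
}

# Invert the table once: extension -> category.
EXTENSION_TO_CATEGORY: dict[str, str] = {
    ext: category
    for category, extensions in FILE_CATEGORIES.items()
    for ext in extensions
}


def get_category_fast(filepath: Path) -> str:
    """Hypothetical variant of get_category using a single dict lookup."""
    return EXTENSION_TO_CATEGORY.get(filepath.suffix.lower(), "other")


print(get_category_fast(Path("Holiday.JPG")))    # images
print(get_category_fast(Path("backup.tar.gz")))  # archives (only the last suffix counts)
print(get_category_fast(Path("README")))         # other
```

For a table this small the speed difference is negligible, so the original loop is perfectly fine; the inverted dictionary mainly pays off when the table grows.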

The File Organizer

The core class handles all the logic:

from dataclasses import dataclass, field
from pathlib import Path
import shutil
import logging

@dataclass
class MoveResult:
    source: str
    destination: str
    category: str
    success: bool
    error: str = ""

@dataclass
class OrganizeReport:
    total_files: int = 0
    moved: int = 0
    skipped: int = 0
    errors: int = 0
    categories: dict[str, int] = field(default_factory=dict)

class FileOrganizer:
    def __init__(
        self,
        source_dir: Path,
        target_dir: Path | None = None,
        dry_run: bool = False,
        logger: logging.Logger | None = None,
    ) -> None:
        self.source_dir = Path(source_dir)
        self.target_dir = Path(target_dir) if target_dir else self.source_dir
        self.dry_run = dry_run
        self.logger = logger or logging.getLogger("file_organizer")

The organize() Method

def organize(self) -> OrganizeReport:
    """Organize all files in the source directory."""
    report = OrganizeReport()

    # is_dir() also rejects a path that exists but points at a regular file
    if not self.source_dir.is_dir():
        self.logger.error("Source directory does not exist or is not a directory: %s", self.source_dir)
        return report

    files = [f for f in self.source_dir.iterdir() if f.is_file()]
    report.total_files = len(files)

    for filepath in files:
        category = get_category(filepath)
        target_folder = self.target_dir / category
        target_path = target_folder / filepath.name

        # Handle name conflicts
        target_path = self._resolve_conflict(target_path)

        if self.dry_run:
            self.logger.info("[DRY RUN] %s -> %s/", filepath.name, category)
            report.moved += 1
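        else:
            try:
                target_folder.mkdir(parents=True, exist_ok=True)
                shutil.move(str(filepath), str(target_path))
                self.logger.info("Moved %s -> %s", filepath.name, target_path)
                report.moved += 1
            except OSError as e:
                # Log the failure and carry on with the remaining files
                self.logger.error("Failed to move %s: %s", filepath.name, e)
                report.errors += 1
                continue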

        report.categories[category] = report.categories.get(category, 0) + 1

    return report

Handling Name Conflicts

If photo.jpg already exists in the target folder, we create photo_1.jpg:

def _resolve_conflict(self, target_path: Path) -> Path:
    """If target exists, add a number suffix."""
    if not target_path.exists():
        return target_path

    stem = target_path.stem
    suffix = target_path.suffix
    parent = target_path.parent
    counter = 1

    while target_path.exists():
        target_path = parent / f"{stem}_{counter}{suffix}"
        counter += 1

    return target_path
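
To see the suffix logic in action without touching real files, the same loop can be tried in a temporary directory. This is a small sketch: resolve_conflict is a standalone copy of the method above, written as a plain function purely for experimentation.

```python
import tempfile
from pathlib import Path


def resolve_conflict(target_path: Path) -> Path:
    """Standalone copy of the suffix logic, for trying it outside the class."""
    stem = target_path.stem
    suffix = target_path.suffix
    parent = target_path.parent
    counter = 1
    while target_path.exists():
        target_path = parent / f"{stem}_{counter}{suffix}"
        counter += 1
    return target_path


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "photo.jpg").write_text("first")
    (folder / "photo_1.jpg").write_text("second")

    # Both photo.jpg and photo_1.jpg are taken, so the next free name is photo_2.jpg.
    free_name = resolve_conflict(folder / "photo.jpg").name
    # A name that is not taken comes back unchanged.
    untouched = resolve_conflict(folder / "notes.txt").name

print(free_name)  # photo_2.jpg
print(untouched)  # notes.txt
```

Note that there is a small window between the exists() check and the actual move, so two processes organizing the same folder at once could still pick the same name.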

Running the Organizer

from pathlib import Path

# Organize Downloads folder
organizer = FileOrganizer(
    source_dir=Path.home() / "Downloads",
    dry_run=True,  # Preview first!
)
report = organizer.organize()

print(f"Would move {report.moved} files:")
for category, count in sorted(report.categories.items()):
    print(f"  {category}: {count}")

Always use dry_run=True first to preview what will happen.
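
One thing to watch for: with Python's default logging setup, only messages at WARNING and above are shown, so the [DRY RUN] lines logged at INFO stay invisible until logging is configured. A minimal sketch of one way to do that follows; configure_logging is a hypothetical helper, and the format string is just one reasonable choice.

```python
import logging
import sys


def configure_logging(level_name: str = "INFO") -> logging.Logger:
    """Hypothetical helper: send log records to stdout at the given level."""
    level = getattr(logging, level_name.upper(), None)
    if not isinstance(level, int):
        level = logging.INFO  # fall back when the level name is unknown
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        stream=sys.stdout,
        force=True,  # replace any handlers that were configured earlier
    )
    return logging.getLogger("file_organizer")


logger = configure_logging("INFO")
logger.info("[DRY RUN] %s -> %s/", "photo.jpg", "images")
```

The returned logger uses the same name as the default in FileOrganizer, so it can be passed in through the logger parameter or simply left to be picked up by name.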

CSV Processing

Automation scripts often process CSV data. Here are three useful functions:

Analyze a CSV File

import csv
from pathlib import Path
from dataclasses import dataclass, field  # field is needed for default_factory below

@dataclass
class CSVStats:
    rows: int = 0
    columns: int = 0
    column_names: list[str] = field(default_factory=list)

def analyze_csv(filepath: Path) -> CSVStats:
    """Get statistics about a CSV file."""
    stats = CSVStats()
    with open(filepath, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        stats.column_names = list(reader.fieldnames or [])
        stats.columns = len(stats.column_names)
        stats.rows = sum(1 for _ in reader)
    return stats

Filter CSV Rows

def filter_csv(input_path: Path, output_path: Path, column: str, value: str) -> int:
    """Filter rows where column matches value. Returns match count."""
    matching = []
    with open(input_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        for row in reader:
            # DictReader fills missing cells with None, so fall back to "" before stripping
            if (row.get(column) or "").strip() == value:
                matching.append(row)

    if matching:
        with open(output_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(matching)

    return len(matching)

# Usage: filter employees in Berlin
count = filter_csv(
    Path("employees.csv"),
    Path("berlin_employees.csv"),
    column="city",
    value="Berlin",
)
print(f"Found {count} employees in Berlin")

Transform CSV Columns

def transform_csv(input_path: Path, output_path: Path, column: str, fn) -> int:
    """Apply a function to a column. Returns row count."""
    rows = []
    with open(input_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        for row in reader:
            # Short rows have None in missing cells; leave those untouched
            if row.get(column) is not None:
                row[column] = fn(row[column])
            rows.append(row)

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return len(rows)

# Usage: uppercase all names
transform_csv(Path("data.csv"), Path("data_upper.csv"), "name", str.upper)
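
The transform function above collects every row in a list before writing. For large files, a streaming variant that writes each row as soon as it is read keeps memory use flat. The sketch below is one way to do it; transform_csv_streaming is a hypothetical name, and the sample data is made up for the demonstration.

```python
import csv
import tempfile
from collections.abc import Callable
from pathlib import Path


def transform_csv_streaming(
    input_path: Path, output_path: Path, column: str, fn: Callable[[str], str]
) -> int:
    """Apply fn to one column, writing each row immediately. Returns row count."""
    count = 0
    with open(input_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" drops surplus cells from rows longer than the header
        writer = csv.DictWriter(
            dst, fieldnames=reader.fieldnames or [], extrasaction="ignore"
        )
        writer.writeheader()
        for row in reader:
            if row.get(column) is not None:
                row[column] = fn(row[column])
            writer.writerow(row)
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "data.csv"
    result = Path(tmp) / "data_upper.csv"
    source.write_text("name,city\nalice,Berlin\nbob,Paris\n", encoding="utf-8")

    row_count = transform_csv_streaming(source, result, "name", str.upper)
    output_lines = result.read_text(encoding="utf-8").splitlines()

print(row_count)     # 2
print(output_lines)  # ['name,city', 'ALICE,Berlin', 'BOB,Paris']
```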

Environment Variables

Never hardcode paths or configuration. Use environment variables:

import os

def get_env(name: str, default: str = "") -> str:
    """Get an environment variable with a default."""
    return os.environ.get(name, default)

def get_env_required(name: str) -> str:
    """Get a required variable. Raises if missing."""
    value = os.environ.get(name)
    if value is None:
        raise EnvironmentError(f"Required: {name}")
    return value

def get_env_bool(name: str, default: bool = False) -> bool:
    """Get a boolean environment variable."""
    value = os.environ.get(name, "").lower()
    if value in ("1", "true", "yes", "on"):
        return True
    if value in ("0", "false", "no", "off"):
        return False
    return default

Usage:

source_dir = get_env("ORGANIZE_SOURCE", str(Path.home() / "Downloads"))
dry_run = get_env_bool("DRY_RUN", default=True)

For more complex configuration, use python-dotenv to load .env files:

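# .env file
ORGANIZE_SOURCE=/Users/alex/Downloads
DRY_RUN=true
LOG_LEVEL=INFO

Then load it at the start of your script (install the package first with pip install python-dotenv):

from dotenv import load_dotenv
load_dotenv()  # Load .env into os.environ; variables that are already set are kept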

File Utilities

Human-Readable File Sizes

def format_size(size_bytes: int) -> str:
    """Format bytes to human readable string."""
    if size_bytes < 1024:
        return f"{size_bytes} B"
    if size_bytes < 1024 * 1024:
        return f"{size_bytes / 1024:.1f} KB"
    if size_bytes < 1024 * 1024 * 1024:
        return f"{size_bytes / (1024 * 1024):.1f} MB"
    return f"{size_bytes / (1024 * 1024 * 1024):.1f} GB"

print(format_size(1_500_000))  # 1.4 MB
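
The chain of if statements works well for four units. If more units are needed, a loop over a unit table avoids repeating the arithmetic. The following is a sketch of that idea; format_size_loop and UNITS are hypothetical names, and the TB unit is an addition that the original function does not have.

```python
UNITS = ("B", "KB", "MB", "GB", "TB")


def format_size_loop(size_bytes: int) -> str:
    """Hypothetical loop-based variant of format_size, extended with TB."""
    size = float(size_bytes)
    for unit in UNITS[:-1]:
        if size < 1024:
            break
        size /= 1024
    else:
        # The loop never broke, so the value belongs to the largest unit.
        unit = UNITS[-1]

    if unit == "B":
        return f"{size_bytes} B"  # whole bytes, no decimal places
    return f"{size:.1f} {unit}"


print(format_size_loop(512))          # 512 B
print(format_size_loop(1_500_000))    # 1.4 MB
print(format_size_loop(5 * 1024**4))  # 5.0 TB
```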

Finding Duplicate Files

def find_duplicates(directory: Path) -> dict[int, list[Path]]:
    """Find files with the same size (potential duplicates)."""
    size_map: dict[int, list[Path]] = {}
    for filepath in directory.iterdir():
        if filepath.is_file():
            size = filepath.stat().st_size
            size_map.setdefault(size, []).append(filepath)
    return {size: files for size, files in size_map.items() if len(files) > 1}
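
Equal size only makes two files candidates: different content can easily have the same length. A common follow-up step is to hash each candidate and group by digest. The sketch below assumes SHA-256 via hashlib; file_digest and confirm_duplicates are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(filepath: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are never loaded whole."""
    digest = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def confirm_duplicates(candidates: list[Path]) -> list[list[Path]]:
    """Group same-size candidates by content hash; keep groups of two or more."""
    by_hash: dict[str, list[Path]] = {}
    for filepath in candidates:
        by_hash.setdefault(file_digest(filepath), []).append(filepath)
    return [group for group in by_hash.values() if len(group) > 1]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.txt").write_text("same")
    (folder / "b.txt").write_text("same")
    (folder / "c.txt").write_text("diff")  # same size as the others, different content

    groups = confirm_duplicates(sorted(folder.iterdir()))
    duplicate_names = [[p.name for p in group] for group in groups]

print(duplicate_names)  # [['a.txt', 'b.txt']]
```

Hashing only the files that find_duplicates already flagged keeps the expensive step limited to a short list.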

Generating Reports

Save the results as a text report and a JSON file:

def generate_text_report(report: OrganizeReport) -> str:
    lines = [
        "File Organization Report",
        "=" * 40,
        f"Total files: {report.total_files}",
        f"Moved: {report.moved}",
        f"Skipped: {report.skipped}",
        f"Errors: {report.errors}",
        "",
        "Files per category:",
    ]
    for category, count in sorted(report.categories.items()):
        lines.append(f"  {category}: {count}")
    return "\n".join(lines)
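
The JSON side of the report can be produced in a similar way. A minimal sketch, assuming the OrganizeReport dataclass defined earlier (repeated here so the example runs on its own); generate_json_report is a hypothetical name.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OrganizeReport:
    # Same fields as the dataclass defined earlier in the tutorial.
    total_files: int = 0
    moved: int = 0
    skipped: int = 0
    errors: int = 0
    categories: dict[str, int] = field(default_factory=dict)


def generate_json_report(report: OrganizeReport) -> str:
    """Serialize the report; asdict converts the dataclass to a plain dict."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)


report = OrganizeReport(
    total_files=3,
    moved=3,
    categories={"images": 1, "documents": 1, "code": 1},
)
json_text = generate_json_report(report)
print(json_text)
```

The resulting string can be written to disk with Path("report.json").write_text(json_text, encoding="utf-8"); the file name here is only an example.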

Scheduling Tasks

For running your script on a schedule:

Using the schedule Library

import schedule
import time

def run_organizer():
    organizer = FileOrganizer(Path.home() / "Downloads")
    report = organizer.organize()
    print(f"Organized {report.moved} files")

schedule.every(30).minutes.do(run_organizer)
schedule.every().day.at("09:00").do(run_organizer)

while True:
    schedule.run_pending()
    time.sleep(60)
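
In a loop like this, an exception raised inside the job propagates out of run_pending() and ends the whole script. One way to guard against that is to wrap the job so failures are logged instead. The sketch below uses only the standard library; run_safely is a hypothetical helper, and flaky_job exists only to simulate a failure.

```python
import functools
import logging
from collections.abc import Callable

logger = logging.getLogger("file_organizer")


def run_safely(job: Callable[[], object]) -> Callable[[], bool]:
    """Wrap a job so an exception is logged instead of ending the scheduler loop."""
    @functools.wraps(job)
    def wrapper() -> bool:
        try:
            job()
        except Exception:
            # logger.exception records the message plus the full traceback
            logger.exception("Scheduled job failed")
            return False
        return True
    return wrapper


def flaky_job() -> None:
    raise OSError("disk not mounted")  # simulated failure


results = [run_safely(flaky_job)(), run_safely(lambda: None)()]
print(results)  # [False, True]

# With the schedule library, the wrapped job would be registered like this:
# schedule.every(30).minutes.do(run_safely(run_organizer))
```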

Using OS-Level Scheduling (Production)

For production, use your operating system’s scheduler:

Linux/macOS (cron):

# Run every hour
0 * * * * /usr/bin/python3 /path/to/organizer.py

macOS (launchd):

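<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<!-- ~/Library/LaunchAgents/com.organizer.plist -->
<plist version="1.0">
<dict>
    <!-- launchd requires a unique Label for every job -->
    <key>Label</key>
    <string>com.organizer</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/path/to/organizer.py</string>
    </array>
    <!-- Run every 3600 seconds (once per hour) -->
    <key>StartInterval</key>
    <integer>3600</integer>
</dict>
</plist>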

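OS-level schedulers are more reliable than the schedule library because the operating system starts a fresh process for every run. If one run crashes, the next one still starts on time, whereas a schedule loop stops for good as soon as its Python process dies.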

Testing the Organizer

Use tmp_path (a pytest fixture) to test file operations safely:

import pytest
from src.py25_automation import FileOrganizer, get_category

class TestFileOrganizer:
    @pytest.fixture
    def source_dir(self, tmp_path):
        source = tmp_path / "downloads"
        source.mkdir()
        (source / "photo.jpg").write_text("img")
        (source / "doc.pdf").write_text("pdf")
        (source / "script.py").write_text("code")
        return source

    def test_organize_moves_files(self, source_dir):
        organizer = FileOrganizer(source_dir)
        report = organizer.organize()
        assert (source_dir / "images" / "photo.jpg").exists()
        assert (source_dir / "documents" / "doc.pdf").exists()
        assert report.moved == 3

    def test_dry_run_does_not_move(self, source_dir):
        organizer = FileOrganizer(source_dir, dry_run=True)
        report = organizer.organize()
        assert report.moved == 3
        assert (source_dir / "photo.jpg").exists()  # Still there

    def test_name_conflict(self, tmp_path):
        source = tmp_path / "source"
        target = tmp_path / "target"
        source.mkdir()
        (target / "images").mkdir(parents=True)
        (source / "photo.jpg").write_text("new")
        (target / "images" / "photo.jpg").write_text("old")

        organizer = FileOrganizer(source, target_dir=target)
        organizer.organize()
        assert (target / "images" / "photo_1.jpg").exists()

Common Mistakes

1. Hardcoded Paths

# BAD — only works on your machine
source = Path("/Users/alex/Downloads")

# GOOD — use environment variables or Path.home()
source = Path(os.environ.get("SOURCE_DIR", str(Path.home() / "Downloads")))

2. No Logging

# BAD — print and forget
print(f"Moved {filename}")

# GOOD — proper logging
logger.info("Moved %s to %s/", filename, category)

3. No Error Handling for Missing Files

# BAD — crashes if file is deleted mid-operation
shutil.move(str(source), str(target))

# GOOD — handle race conditions
try:
    shutil.move(str(source), str(target))
except FileNotFoundError:
    logger.warning("File disappeared: %s", source.name)
except PermissionError:
    logger.error("Permission denied: %s", source.name)

Packaging as a Standalone Script

Create a pyproject.toml to make your script installable:

[build-system]
requires = ["setuptools>=68.0"]
build-backend = "setuptools.build_meta"

[project]
name = "file-organizer"
version = "1.0.0"
description = "Organize files by type"
requires-python = ">=3.12"

[project.scripts]
organize = "src.py25_automation:main"

After pip install -e ., the organize command is available from any directory, as long as the environment you installed it into is active.

Source Code

You can find all the code from this tutorial on GitHub:

GitHub: python-tutorial/tutorial-25-automation

Series Complete!

Congratulations! You have completed the entire Python from Zero to Building Projects series. Here is what you learned:

Foundations (Tutorials 1-8): Variables, control flow, functions, data structures, modules, virtual environments

Intermediate (Tutorials 9-17): OOP, dataclasses, error handling, file I/O, generators, decorators, context managers, type hints, testing

Advanced (Tutorials 18-21): Async/await, HTTP APIs, databases, logging and debugging

Projects (Tutorials 22-25): CLI tool, REST API, web scraper, automation script

You now have the skills to build real Python applications. Keep building, keep learning, and check out the Python Cheat Sheet for a quick reference.

Related Articles

Python Tutorial #12: File I/O — pathlib and file operations
Python Tutorial #21: Logging and Debugging — the logging patterns used here
Python Tutorial #17: Testing with pytest — testing with tmp_path
Python Cheat Sheet — quick reference for everything
