In the previous tutorial, we built a web scraper. Now let’s build an automation script — a file organizer that sorts files by type, processes CSV data, and logs everything properly.
This is the final project in the Python Tutorial series. It ties together everything: file I/O, dataclasses, error handling, logging, testing, and packaging. By the end, you will have a practical tool you can use every day.
What We Are Building
A file organizer that:
- Scans a directory (like your Downloads folder)
- Sorts files into folders by type (images, documents, code, etc.)
- Handles name conflicts (adds a number suffix)
- Supports dry run mode (preview without moving)
- Generates a report (text and JSON)
- Processes CSV files (filter, transform)
- Uses environment variables for configuration
File Type Categories
First, we define what types of files exist:
from pathlib import Path

FILE_CATEGORIES: dict[str, list[str]] = {
    "images": [".jpg", ".jpeg", ".png", ".gif", ".svg", ".webp"],
    "documents": [".pdf", ".doc", ".docx", ".txt", ".md"],
    "spreadsheets": [".xls", ".xlsx", ".csv"],
    "code": [".py", ".js", ".ts", ".html", ".css", ".java", ".kt", ".rs"],
    "archives": [".zip", ".tar", ".gz", ".rar", ".7z"],
    "videos": [".mp4", ".avi", ".mov", ".mkv"],
    "audio": [".mp3", ".wav", ".flac", ".ogg"],
    "data": [".json", ".xml", ".yaml", ".yml", ".toml", ".sql"],
}

def get_category(filepath: Path) -> str:
    """Get the category for a file based on its extension."""
    ext = filepath.suffix.lower()  # ".JPG" and ".jpg" are treated the same
    for category, extensions in FILE_CATEGORIES.items():
        if ext in extensions:
            return category
    return "other"
Files that do not match any category go into “other”.
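Because get_category scans every list on each call, one alternative is to invert the mapping once into an extension-to-category dictionary. The sketch below uses a shortened copy of the mapping so it runs on its own; EXTENSION_TO_CATEGORY and get_category_fast are illustrative names, not part of the tutorial's code.

```python
from pathlib import Path

# Shortened copy of the tutorial's mapping, so this sketch runs on its own.
FILE_CATEGORIES: dict[str, list[str]] = {
    "images": [".jpg", ".jpeg", ".png"],
    "documents": [".pdf", ".txt", ".md"],
    "archives": [".zip", ".tar", ".gz"],
}

# Invert once: each extension maps straight to its category.
EXTENSION_TO_CATEGORY: dict[str, str] = {
    ext: category
    for category, extensions in FILE_CATEGORIES.items()
    for ext in extensions
}


def get_category_fast(filepath: Path) -> str:
    """Look up the category with a single dictionary access."""
    return EXTENSION_TO_CATEGORY.get(filepath.suffix.lower(), "other")


print(get_category_fast(Path("Holiday.JPG")))    # images
print(get_category_fast(Path("backup.tar.gz")))  # archives (the suffix is ".gz")
print(get_category_fast(Path("README")))         # other (no extension)
```

Note that Path.suffix only returns the last extension, so backup.tar.gz is classified by ".gz". For a few dozen extensions either version is fast enough; the dictionary mainly makes the lookup easier to read.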
The File Organizer
The core class handles all the logic:
from dataclasses import dataclass, field
from pathlib import Path
import shutil
import logging

@dataclass
class MoveResult:
    source: str
    destination: str
    category: str
    success: bool
    error: str = ""

@dataclass
class OrganizeReport:
    total_files: int = 0
    moved: int = 0
    skipped: int = 0
    errors: int = 0
    categories: dict[str, int] = field(default_factory=dict)

class FileOrganizer:
    def __init__(
        self,
        source_dir: Path,
        target_dir: Path | None = None,
        dry_run: bool = False,
        logger: logging.Logger | None = None,
    ) -> None:
        self.source_dir = Path(source_dir)
        # Without an explicit target, category folders are created inside the source
        self.target_dir = Path(target_dir) if target_dir else self.source_dir
        self.dry_run = dry_run
        self.logger = logger or logging.getLogger("file_organizer")
The organize() Method
    # Method of FileOrganizer (indented inside the class body)
    def organize(self) -> OrganizeReport:
        """Organize all files in the source directory."""
        report = OrganizeReport()
        if not self.source_dir.exists():
            self.logger.error("Source directory does not exist: %s", self.source_dir)
            return report

        # Only regular files at the top level; subfolders are left alone
        files = [f for f in self.source_dir.iterdir() if f.is_file()]
        report.total_files = len(files)

        for filepath in files:
            category = get_category(filepath)
            target_folder = self.target_dir / category
            target_path = target_folder / filepath.name

            # Handle name conflicts
            target_path = self._resolve_conflict(target_path)

            if self.dry_run:
                self.logger.info("[DRY RUN] %s -> %s/", filepath.name, category)
                report.moved += 1
            else:
                try:
                    target_folder.mkdir(parents=True, exist_ok=True)
                    shutil.move(str(filepath), str(target_path))
                    report.moved += 1
                except OSError as e:
                    self.logger.error("Failed: %s: %s", filepath.name, e)
                    report.errors += 1
                    continue  # Failed moves are not counted per category

            report.categories[category] = report.categories.get(category, 0) + 1

        return report
Handling Name Conflicts
If photo.jpg already exists in the target folder, we create photo_1.jpg:
    # Method of FileOrganizer (indented inside the class body)
    def _resolve_conflict(self, target_path: Path) -> Path:
        """If target exists, add a number suffix."""
        if not target_path.exists():
            return target_path

        stem = target_path.stem
        suffix = target_path.suffix
        parent = target_path.parent
        counter = 1

        # Try photo_1.jpg, photo_2.jpg, ... until a free name is found
        while target_path.exists():
            target_path = parent / f"{stem}_{counter}{suffix}"
            counter += 1

        return target_path
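To see the suffix logic in action without touching real files, here is a small sketch that applies the same loop as a free function inside a temporary directory. The standalone resolve_conflict name is only for illustration; in the tutorial it is a method of FileOrganizer.

```python
import tempfile
from pathlib import Path


def resolve_conflict(target_path: Path) -> Path:
    """Return target_path, or the first free name with a numeric suffix."""
    stem, suffix, parent = target_path.stem, target_path.suffix, target_path.parent
    counter = 1
    while target_path.exists():
        target_path = parent / f"{stem}_{counter}{suffix}"
        counter += 1
    return target_path


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "photo.jpg").write_text("old")

    # photo.jpg is taken, so the first free name is photo_1.jpg
    first = resolve_conflict(folder / "photo.jpg")
    first.write_text("newer")

    # Now photo.jpg and photo_1.jpg are both taken
    second = resolve_conflict(folder / "photo.jpg")

print(first.name, second.name)  # photo_1.jpg photo_2.jpg
```

The check and the later move are two separate steps, so another process could still create the same name in between. For a personal Downloads folder that is usually an acceptable risk.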
Running the Organizer
from pathlib import Path

# Organize Downloads folder
organizer = FileOrganizer(
    source_dir=Path.home() / "Downloads",
    dry_run=True,  # Preview first!
)
report = organizer.organize()

print(f"Would move {report.moved} files:")
for category, count in sorted(report.categories.items()):
    print(f"  {category}: {count}")
Always use dry_run=True first to preview what will happen.
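One thing to watch: the organizer reports each dry-run move through logger.info, and Python's logging only shows WARNING and above until it is configured, so the preview lines may not appear. A minimal setup, assuming you want messages on the console (the format string is just one reasonable choice):

```python
import logging

# Configure the root logger once, near the start of the script.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)

# Same logger name that FileOrganizer uses by default.
logger = logging.getLogger("file_organizer")
# Set the level explicitly in case the root logger was configured elsewhere.
logger.setLevel(logging.INFO)

# Same message shape that organize() logs in dry-run mode.
logger.info("[DRY RUN] %s -> %s/", "photo.jpg", "images")
```

With this in place, each planned move is logged as its own line before you switch dry_run off.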
CSV Processing
Automation scripts often process CSV data. Here are three useful functions:
Analyze a CSV File
import csv
from pathlib import Path
from dataclasses import dataclass, field

@dataclass
class CSVStats:
    rows: int = 0
    columns: int = 0
    column_names: list[str] = field(default_factory=list)

def analyze_csv(filepath: Path) -> CSVStats:
    """Get statistics about a CSV file."""
    stats = CSVStats()
    with open(filepath, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file, so fall back to an empty list
        stats.column_names = list(reader.fieldnames or [])
        stats.columns = len(stats.column_names)
        stats.rows = sum(1 for _ in reader)  # Data rows only; the header is not counted
    return stats
Filter CSV Rows
def filter_csv(input_path: Path, output_path: Path, column: str, value: str) -> int:
    """Filter rows where column matches value. Returns match count."""
    matching = []
    with open(input_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        for row in reader:
            # Short rows give None for missing columns, so fall back to ""
            if (row.get(column) or "").strip() == value:
                matching.append(row)

    # The output file is only written when at least one row matched
    if matching:
        with open(output_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(matching)

    return len(matching)

# Usage: filter employees in Berlin
count = filter_csv(
    Path("employees.csv"),
    Path("berlin_employees.csv"),
    column="city",
    value="Berlin",
)
print(f"Found {count} employees in Berlin")
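filter_csv collects every matching row in a list before writing. For large files, a streaming variant that writes each match as it is read keeps memory use flat. This is a sketch under that assumption, with filter_csv_streaming as an illustrative name; unlike the version above, it always creates the output file, even when nothing matches.

```python
import csv
import tempfile
from pathlib import Path


def filter_csv_streaming(input_path: Path, output_path: Path, column: str, value: str) -> int:
    """Write matching rows as they are read. Returns match count."""
    count = 0
    with open(input_path, newline="", encoding="utf-8") as src, \
            open(output_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [])
        writer.writeheader()
        for row in reader:
            # Short rows give None for missing columns, so fall back to "".
            if (row.get(column) or "").strip() == value:
                writer.writerow(row)
                count += 1
    return count


# Demo with a throwaway file instead of a real employees.csv
with tempfile.TemporaryDirectory() as tmp:
    src_file = Path(tmp) / "employees.csv"
    out_file = Path(tmp) / "berlin.csv"
    src_file.write_text("name,city\nAnna,Berlin\nBen,Paris\nCara, Berlin \n", encoding="utf-8")
    matches = filter_csv_streaming(src_file, out_file, "city", "Berlin")
    result = out_file.read_text(encoding="utf-8")

print(matches)  # 2
```

The third row still matches because the value is stripped before comparing, but it is written out unchanged.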
Transform CSV Columns
def transform_csv(input_path: Path, output_path: Path, column: str, fn) -> int:
    """Apply a function to a column. Returns row count."""
    rows = []
    with open(input_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        for row in reader:
            # Skip rows where the column is absent or missing (None in short rows)
            if row.get(column) is not None:
                row[column] = fn(row[column])
            rows.append(row)

    with open(output_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return len(rows)

# Usage: uppercase all names
transform_csv(Path("data.csv"), Path("data_upper.csv"), "name", str.upper)
Environment Variables
Never hardcode paths or configuration. Use environment variables:
import os

def get_env(name: str, default: str = "") -> str:
    """Get an environment variable with a default."""
    return os.environ.get(name, default)

def get_env_required(name: str) -> str:
    """Get a required variable. Raises if missing."""
    value = os.environ.get(name)
    if value is None:
        raise EnvironmentError(f"Required: {name}")
    return value

def get_env_bool(name: str, default: bool = False) -> bool:
    """Get a boolean environment variable."""
    value = os.environ.get(name, "").lower()
    if value in ("1", "true", "yes", "on"):
        return True
    if value in ("0", "false", "no", "off"):
        return False
    return default  # Unset or unrecognized values fall back to the default
Usage:
source_dir = get_env("ORGANIZE_SOURCE", str(Path.home() / "Downloads"))
dry_run = get_env_bool("DRY_RUN", default=True)
For more complex configuration, use python-dotenv to load .env files:
# .env file
ORGANIZE_SOURCE=/Users/alex/Downloads
DRY_RUN=true
LOG_LEVEL=INFO
Then load the file once at startup, before reading any variables:
from dotenv import load_dotenv
load_dotenv()  # Load .env file into os.environ
File Utilities
Human-Readable File Sizes
def format_size(size_bytes: int) -> str:
    """Format bytes to human readable string."""
    if size_bytes < 1024:
        return f"{size_bytes} B"
    if size_bytes < 1024 * 1024:
        return f"{size_bytes / 1024:.1f} KB"
    if size_bytes < 1024 * 1024 * 1024:
        return f"{size_bytes / (1024 * 1024):.1f} MB"
    return f"{size_bytes / (1024 * 1024 * 1024):.1f} GB"

print(format_size(1_500_000))  # 1.4 MB
Finding Duplicate Files
def find_duplicates(directory: Path) -> dict[int, list[Path]]:
    """Find files with the same size (potential duplicates)."""
    size_map: dict[int, list[Path]] = {}
    for filepath in directory.iterdir():
        if filepath.is_file():
            size = filepath.stat().st_size
            size_map.setdefault(size, []).append(filepath)
    # Keep only sizes shared by at least two files
    return {size: files for size, files in size_map.items() if len(files) > 1}
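Equal size only makes files potential duplicates. A common follow-up is to hash the contents of each same-size group and keep the files whose digests match. This is a sketch of that step, not part of the tutorial's code; sha256_of and confirm_duplicates are illustrative names.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(filepath: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files are never fully loaded into memory."""
    hasher = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def confirm_duplicates(candidates: list[Path]) -> list[list[Path]]:
    """Group same-size candidates by content hash; return groups of two or more."""
    by_digest: dict[str, list[Path]] = {}
    for filepath in candidates:
        by_digest.setdefault(sha256_of(filepath), []).append(filepath)
    return [group for group in by_digest.values() if len(group) > 1]


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "a.txt").write_text("hello")
    (folder / "b.txt").write_text("hello")
    (folder / "c.txt").write_text("world")  # same size, different content
    groups = confirm_duplicates(sorted(folder.iterdir()))
    names = [[p.name for p in group] for group in groups]

print(names)  # [['a.txt', 'b.txt']]
```

Checking sizes first keeps this cheap: only files that already share a size need to be read and hashed.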
Generating Reports
Save the results as a text report and a JSON file:
def generate_text_report(report: OrganizeReport) -> str:
    lines = [
        "File Organization Report",
        "=" * 40,
        f"Total files: {report.total_files}",
        f"Moved: {report.moved}",
        f"Skipped: {report.skipped}",
        f"Errors: {report.errors}",
        "",
        "Files per category:",
    ]
    for category, count in sorted(report.categories.items()):
        lines.append(f"  {category}: {count}")
    return "\n".join(lines)
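The JSON side of the report can be quite short, because dataclasses.asdict turns the report into a plain dictionary that json.dumps accepts. The sketch below redefines OrganizeReport with the same fields so it runs on its own; generate_json_report is an assumed name, chosen to mirror generate_text_report.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class OrganizeReport:
    # Same fields as the tutorial's dataclass, repeated so the sketch is runnable.
    total_files: int = 0
    moved: int = 0
    skipped: int = 0
    errors: int = 0
    categories: dict[str, int] = field(default_factory=dict)


def generate_json_report(report: OrganizeReport) -> str:
    """Serialize the report as indented JSON with stable key order."""
    return json.dumps(asdict(report), indent=2, sort_keys=True)


report = OrganizeReport(total_files=3, moved=3, categories={"images": 1, "documents": 2})
json_text = generate_json_report(report)
print(json_text)
```

To save it, write the string to a file, for example Path("report.json").write_text(json_text, encoding="utf-8").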
Scheduling Tasks
For running your script on a schedule:
Using the schedule Library
import schedule
import time

def run_organizer():
    organizer = FileOrganizer(Path.home() / "Downloads")
    report = organizer.organize()
    print(f"Organized {report.moved} files")

schedule.every(30).minutes.do(run_organizer)
schedule.every().day.at("09:00").do(run_organizer)

# Check once a minute whether a job is due
while True:
    schedule.run_pending()
    time.sleep(60)
Using OS-Level Scheduling (Production)
For production, use your operating system’s scheduler:
Linux/macOS (cron):
# Run every hour
0 * * * * /usr/bin/python3 /path/to/organizer.py
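macOS (launchd):
<!-- ~/Library/LaunchAgents/com.organizer.plist -->
<plist version="1.0">
<dict>
    <!-- launchd requires a unique Label for every job -->
    <key>Label</key>
    <string>com.organizer</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/bin/python3</string>
        <string>/path/to/organizer.py</string>
    </array>
    <!-- Run every 3600 seconds (one hour) -->
    <key>StartInterval</key>
    <integer>3600</integer>
</dict>
</plist>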
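OS-level schedulers are more reliable than the schedule library because the operating system starts a fresh process for every run. If one run crashes, the next one still starts on time, whereas a crashed schedule loop runs no more jobs until you restart it.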
Testing the Organizer
Use tmp_path (a pytest fixture) to test file operations safely:
import pytest
from src.py25_automation import FileOrganizer, get_category

class TestFileOrganizer:
    @pytest.fixture
    def source_dir(self, tmp_path):
        source = tmp_path / "downloads"
        source.mkdir()
        (source / "photo.jpg").write_text("img")
        (source / "doc.pdf").write_text("pdf")
        (source / "script.py").write_text("code")
        return source

    def test_organize_moves_files(self, source_dir):
        organizer = FileOrganizer(source_dir)
        report = organizer.organize()
        assert (source_dir / "images" / "photo.jpg").exists()
        assert (source_dir / "documents" / "doc.pdf").exists()
        assert report.moved == 3

    def test_dry_run_does_not_move(self, source_dir):
        organizer = FileOrganizer(source_dir, dry_run=True)
        report = organizer.organize()
        assert report.moved == 3
        assert (source_dir / "photo.jpg").exists()  # Still there

    def test_name_conflict(self, tmp_path):
        source = tmp_path / "source"
        target = tmp_path / "target"
        source.mkdir()
        (target / "images").mkdir(parents=True)
        (source / "photo.jpg").write_text("new")
        (target / "images" / "photo.jpg").write_text("old")
        organizer = FileOrganizer(source, target_dir=target)
        organizer.organize()
        assert (target / "images" / "photo_1.jpg").exists()
Common Mistakes
1. Hardcoded Paths
# BAD — only works on your machine
source = Path("/Users/alex/Downloads")
# GOOD — use environment variables or Path.home()
source = Path(os.environ.get("SOURCE_DIR", str(Path.home() / "Downloads")))
2. No Logging
# BAD — print and forget
print(f"Moved {filename}")
# GOOD — proper logging
logger.info("Moved %s to %s/", filename, category)
3. No Error Handling for Missing Files
# BAD: crashes if the file is deleted mid-operation
shutil.move(str(source), str(target))
# GOOD: handle race conditions
try:
    shutil.move(str(source), str(target))
except FileNotFoundError:
    logger.warning("File disappeared: %s", source.name)
except PermissionError:
    logger.error("Permission denied: %s", source.name)
Packaging as a Standalone Script
Create a pyproject.toml to make your script installable:
[build-system]
requires = ["setuptools>=68.0"]
build-backend = "setuptools.build_meta"
[project]
name = "file-organizer"
version = "1.0.0"
description = "Organize files by type"
requires-python = ">=3.12"
[project.scripts]
organize = "src.py25_automation:main"
After pip install -e ., you can run organize from anywhere.
Source Code
You can find all the code from this tutorial on GitHub:
GitHub: python-tutorial/tutorial-25-automation
Series Complete!
Congratulations! You have completed the entire Python from Zero to Building Projects series. Here is what you learned:
- Foundations (Tutorials 1-8): Variables, control flow, functions, data structures, modules, virtual environments
- Intermediate (Tutorials 9-17): OOP, dataclasses, error handling, file I/O, generators, decorators, context managers, type hints, testing
- Advanced (Tutorials 18-21): Async/await, HTTP APIs, databases, logging and debugging
- Projects (Tutorials 22-25): CLI tool, REST API, web scraper, automation script
You now have the skills to build real Python applications. Keep building, keep learning, and check out the Python Cheat Sheet for a quick reference.
Related Articles
- Python Tutorial #12: File I/O — pathlib and file operations
- Python Tutorial #21: Logging and Debugging — the logging patterns used here
- Python Tutorial #17: Testing with pytest — testing with tmp_path
- Python Cheat Sheet — quick reference for everything