In the previous tutorial, we learned about lists, dictionaries, sets, and tuples. Now let’s take a deep dive into strings — one of the most used data types in Python.

We covered string basics in Tutorial #3. This tutorial goes further: advanced methods, formatting tricks, raw strings, and regular expressions.

String Methods

Strings have many built-in methods. Here are the most useful ones for daily work.

Cleaning Text

text = "  Hello, World!  "
print(text.strip())       # "Hello, World!" — remove whitespace from both sides
print(text.lstrip())      # "Hello, World!  " — left side only
print(text.rstrip())      # "  Hello, World!" — right side only
print(text.lower())       # "  hello, world!  "
print(text.upper())       # "  HELLO, WORLD!  "

A common pattern: strip whitespace and normalize case:

def clean_text(text: str) -> str:
    return text.strip().lower()

print(clean_text("  Hello, World!  "))  # "hello, world!"

Splitting and Joining

# Split a string into a list of words
sentence = "Python is fun and powerful"
words = sentence.split()  # Split on whitespace (default)
print(words)  # ['Python', 'is', 'fun', 'and', 'powerful']

# Split on a specific character
csv_line = "Alex,25,Berlin"
parts = csv_line.split(",")
print(parts)  # ['Alex', '25', 'Berlin']

# Join a list back into a string
print(" ".join(words))    # "Python is fun and powerful"
print(", ".join(parts))   # "Alex, 25, Berlin"
print("-".join(["2026", "06", "17"]))  # "2026-06-17"

split() and join() are among the string methods you will reach for most often. Note that join() is called on the separator string, not on the list.

Searching

text = "Hello, World! Hello, Python!"

print(text.find("World"))       # 7 — index where "World" starts
print(text.find("Java"))        # -1 — not found
print(text.count("Hello"))      # 2 — how many times
print(text.startswith("Hello")) # True
print(text.endswith("!"))       # True
print("World" in text)          # True — the simplest check

Replacing

text = "I like cats and cats are great"
print(text.replace("cats", "dogs"))
# "I like dogs and dogs are great"

print(text.replace("cats", "dogs", 1))
# "I like dogs and cats are great" — replace only first occurrence

For multiple replacements, loop through a dictionary:

def replace_all(text: str, replacements: dict[str, str]) -> str:
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

result = replace_all("I like cats and dogs", {"cats": "birds", "dogs": "fish"})
print(result)  # "I like birds and fish"
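One caveat with the loop above: the replacements run one after another, so a later replacement can rewrite the output of an earlier one. Swapping "cats" and "dogs" with replace_all() yields "I like cats and cats". Here is a minimal single-pass sketch using re.sub() with a function as the replacement. The name replace_all_once is hypothetical and used only for illustration.

```python
import re

def replace_all_once(text: str, replacements: dict[str, str]) -> str:
    """Apply every replacement in a single pass over the text."""
    if not replacements:
        return text
    # Try longer keys first so a long key is not shadowed by a shorter prefix.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    # Each match is looked up in the dictionary exactly once.
    return pattern.sub(lambda m: replacements[m.group()], text)

swap = {"cats": "dogs", "dogs": "cats"}
print(replace_all_once("I like cats and dogs", swap))
# "I like dogs and cats"
```

Because each part of the text is matched only once, the replaced text is never scanned again.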

Checking Content

print("hello123".isalnum())    # True — letters and numbers only
print("hello".isalpha())       # True — letters only
print("12345".isdigit())       # True — digits only
print("hello world".isspace()) # False — not all whitespace
print("Hello World".istitle()) # True — title case

Truncating

A useful helper for displaying text:

def truncate(text: str, max_length: int = 50) -> str:
    if len(text) <= max_length:
        return text
    if max_length <= 3:
        return text[:max(max_length, 0)]  # No room for an ellipsis
    return text[:max_length - 3] + "..."

print(truncate("This is a very long string that needs to be shortened"))
# "This is a very long string that needs to be sho..."
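The helper above cuts in the middle of a word. If you prefer to cut at a word boundary, the standard library's textwrap.shorten() is one option. This is a small sketch, not a drop-in replacement: shorten() also collapses runs of whitespace into single spaces, which may or may not be what you want.

```python
import textwrap

long_text = "This is a very long string that needs to be shortened"

# Drop whole words from the end until the text plus placeholder fits in 50 chars.
short = textwrap.shorten(long_text, width=50, placeholder="...")
print(short)
print(len(short) <= 50)  # True
```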

Advanced f-String Formatting

We learned basic f-strings in Tutorial #3. Here are more advanced techniques.

Number Formatting

n = 1_000_000

print(f"{n:,}")        # "1,000,000" — comma separator
print(f"{n:_}")        # "1_000_000" — underscore separator
print(f"{255:b}")      # "11111111" — binary
print(f"{255:x}")      # "ff" — hexadecimal
print(f"{255:o}")      # "377" — octal
print(f"{42:08d}")     # "00000042" — zero-padded to 8 digits

Percentage and Decimal Places

ratio = 0.8567
print(f"{ratio:.2f}")     # "0.86" — 2 decimal places
print(f"{ratio:.0%}")     # "86%" — percentage (no decimals)
print(f"{ratio:.2%}")     # "85.67%" — percentage (2 decimals)

Alignment and Padding

name = "Alex"
print(f"{name:<20}")   # "Alex                " — left-aligned, 20 chars
print(f"{name:>20}")   # "                Alex" — right-aligned
print(f"{name:^20}")   # "        Alex        " — centered
print(f"{name:*^20}")  # "********Alex********" — centered with fill char

Building Tables

Alignment is perfect for creating text tables:

data = [("Alex", 95, 89.5), ("Sam", 87, 82.3), ("Jordan", 92, 91.0)]

print(f"{'Name':<15}{'Score':>8}{'Average':>10}")
print("-" * 33)
for name, score, avg in data:
    print(f"{name:<15}{score:>8}{avg:>10.1f}")

Output:

Name              Score   Average
---------------------------------
Alex                 95      89.5
Sam                  87      82.3
Jordan               92      91.0

Raw Strings

A raw string starts with an r prefix before the opening quote. Inside it, Python treats backslashes as literal characters, not as the start of an escape sequence:

# Normal string: \n is a newline
print("Hello\nWorld")
# Output:
# Hello
# World

# Raw string: \n is literal backslash + n
print(r"Hello\nWorld")
# Output: Hello\nWorld

Raw strings are useful for:

  • File paths on Windows: r"C:\Users\Alex\new_folder"
  • Regular expressions: r"\d+\.\d+" (avoid double escaping)
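A quick way to see the difference is to compare lengths. The sketch below also shows one limitation worth knowing: a raw string cannot end with a single backslash, so a trailing backslash has to be added separately.

```python
print(len("\n"))    # 1: a single newline character
print(len(r"\n"))   # 2: a backslash followed by the letter n

# r"C:\Users\Alex\" is a syntax error, so append the last backslash separately.
folder = r"C:\Users\Alex" + "\\"
print(folder)       # C:\Users\Alex\
```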

Multi-Line Strings

Triple quotes create multi-line strings:

message = """Dear Alex,

Thank you for signing up.
Your account is ready.

Best regards,
The Team"""

print(message)

You can also use triple quotes with f-strings:

name = "Alex"
items = 3
total = 49.99

receipt = f"""
Receipt for {name}
-------------------
Items: {items}
Total: ${total:.2f}
"""
print(receipt)
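Inside an indented function, a triple-quoted string keeps its leading spaces. One common way to handle this is textwrap.dedent(), sketched below. The build_receipt function is a hypothetical wrapper around the receipt example, added only to create the indentation.

```python
import textwrap

def build_receipt(name: str, items: int, total: float) -> str:
    # The backslash after the opening quotes suppresses the first newline.
    body = f"""\
        Receipt for {name}
        -------------------
        Items: {items}
        Total: ${total:.2f}
        """
    # dedent() removes the whitespace common to the start of every line.
    return textwrap.dedent(body)

print(build_receipt("Alex", 3, 49.99))
```

Note that dedent() runs after the values are substituted, so a value that itself contains a newline would change the common indentation.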

Regular Expressions (Regex Basics)

Regular expressions let you search for patterns in text. Python’s re module provides regex support.

re.search() — Find the First Match

import re

text = "Contact us at support@example.com for help"
match = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

if match:
    print(match.group())  # "support@example.com"

re.search() returns a match object if found, or None if not. Always check before calling .group().

re.findall() — Find All Matches

text = "I have 3 cats and 2 dogs, total 5 pets"
numbers = re.findall(r"\d+", text)
print(numbers)  # ['3', '2', '5']

# Convert to integers
numbers = [int(n) for n in re.findall(r"\d+", text)]
print(numbers)  # [3, 2, 5]

# Extract hashtags
text = "I love #Python and #coding!"
hashtags = re.findall(r"#\w+", text)
print(hashtags)  # ['#Python', '#coding']

re.sub() — Search and Replace

text = "Email me at alex@example.com or bob@test.org"
censored = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED]", text)
print(censored)
# "Email me at [REDACTED] or [REDACTED]"

# Normalize whitespace
messy = "too   many    spaces   here"
clean = re.sub(r"\s+", " ", messy)
print(clean)  # "too many spaces here"

re.split() — Split on a Pattern

text = "hello, world; foo. bar!"
parts = re.split(r"[,;.!?]+", text)
print(parts)  # ['hello', ' world', ' foo', ' bar', '']

This is more flexible than str.split(), which accepts only one fixed separator. re.split() can split on any of several delimiters, or on any pattern.
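The result above still contains leading spaces and an empty string at the end (left over after the final "!"). A small cleanup sketch, assuming you want only the non-empty words:

```python
import re

text = "hello, world; foo. bar!"

# Strip each piece and skip pieces that are empty after stripping.
parts = [p.strip() for p in re.split(r"[,;.!?]+", text) if p.strip()]
print(parts)  # ['hello', 'world', 'foo', 'bar']
```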

Regex Groups

Use parentheses () to capture parts of a match:

date_str = "2026-06-17"
match = re.match(r"(\d{4})-(\d{2})-(\d{2})", date_str)

if match:
    year = match.group(1)   # "2026"
    month = match.group(2)  # "06"
    day = match.group(3)    # "17"
    print(f"{year}/{month}/{day}")
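Numbered groups get hard to follow as a pattern grows. The re module also supports named groups with the (?P<name>...) syntax. Here is the same date example as a sketch with names instead of numbers:

```python
import re

date_str = "2026-06-17"
match = re.match(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})", date_str)

if match:
    print(match.group("year"))  # "2026"
    # groupdict() returns all named groups as a dictionary.
    print(match.groupdict())    # {'year': '2026', 'month': '06', 'day': '17'}
```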

Common Regex Patterns

Here is a quick reference for common patterns:

Pattern   Matches
-------   ----------------------------------------------
\d        Any digit (0-9)
\w        Any word character (letter, digit, underscore)
\s        Any whitespace (space, tab, newline)
.         Any character except newline
+         One or more of the previous
*         Zero or more of the previous
?         Zero or one of the previous
^         Start of string
$         End of string
[abc]     Any of a, b, or c
[^abc]    Anything except a, b, or c

Validating Input

Use regex to validate patterns:

def validate_phone(phone: str) -> bool:
    return bool(re.match(r"^\+\d{1,3}-\d{3}-\d{3}-\d{4}$", phone))

print(validate_phone("+1-555-123-4567"))   # True
print(validate_phone("555-123-4567"))       # False
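One subtlety: $ also matches just before a newline at the very end of the string, so the function above accepts "+1-555-123-4567\n". If that matters for your input, re.fullmatch() is a stricter option. The sketch below uses hypothetical names (PHONE_PATTERN, validate_phone_strict) to keep it separate from the original function.

```python
import re

PHONE_PATTERN = re.compile(r"\+\d{1,3}-\d{3}-\d{3}-\d{4}")

def validate_phone_strict(phone: str) -> bool:
    # fullmatch() requires the whole string to match, so no ^ or $ is needed.
    return PHONE_PATTERN.fullmatch(phone) is not None

print(validate_phone_strict("+1-555-123-4567"))    # True
print(validate_phone_strict("+1-555-123-4567\n"))  # False
```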

When to Use Regex

Use regex when:

  • You need to match complex patterns
  • You need to extract data from unstructured text
  • Simple string methods are not enough

Do not use regex when:

  • A simple in, startswith(), or split() works
  • You are parsing structured data like JSON or CSV (use the proper libraries)

As the saying goes: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”

Keep your regex simple. If a pattern is hard to read, add comments or break it into parts.
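One way to add comments to a pattern is the re.VERBOSE flag, which ignores whitespace in the pattern and treats # as the start of a comment. Here is a sketch using the phone number format from the validation example. The name PHONE is just for illustration.

```python
import re

PHONE = re.compile(
    r"""
    \+\d{1,3}   # country code, e.g. +1
    -\d{3}      # area code
    -\d{3}      # first three digits of the number
    -\d{4}      # last four digits
    """,
    re.VERBOSE,
)

print(bool(PHONE.fullmatch("+1-555-123-4567")))  # True
print(bool(PHONE.fullmatch("555-123-4567")))     # False
```

In verbose mode a literal space or # in the pattern must be escaped or placed inside a character class.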

Practical Example: Log Parser

Let’s combine string methods and regex to parse a log file:

import re

log_lines = [
    "2026-06-17 10:30:15 [INFO] User Alex logged in",
    "2026-06-17 10:31:02 [ERROR] Database connection failed",
    "2026-06-17 10:32:45 [INFO] User Sam logged out",
    "2026-06-17 10:33:10 [WARNING] Disk usage at 85%",
    "2026-06-17 10:34:00 [ERROR] API timeout after 30s",
]

# Collect all lines logged at the ERROR level
errors = [line for line in log_lines if "[ERROR]" in line]
print(f"Errors: {len(errors)}")

# Parse each line with regex
pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)"
for line in log_lines:
    match = re.match(pattern, line)
    if match:
        date, time, level, message = match.groups()
        print(f"  {level:8} | {time} | {message}")

Output:

Errors: 2
  INFO     | 10:30:15 | User Alex logged in
  ERROR    | 10:31:02 | Database connection failed
  INFO     | 10:32:45 | User Sam logged out
  WARNING  | 10:33:10 | Disk usage at 85%
  ERROR    | 10:34:00 | API timeout after 30s

The same approach works on a real log file: read it line by line and apply the pattern to each line. Log parsing is a common use for string methods and regex in Python.
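A natural next step is to count how many lines each level produced. The sketch below reuses the same log lines and combines a compiled pattern with collections.Counter. The LEVEL name is an illustrative choice.

```python
import re
from collections import Counter

log_lines = [
    "2026-06-17 10:30:15 [INFO] User Alex logged in",
    "2026-06-17 10:31:02 [ERROR] Database connection failed",
    "2026-06-17 10:32:45 [INFO] User Sam logged out",
    "2026-06-17 10:33:10 [WARNING] Disk usage at 85%",
    "2026-06-17 10:34:00 [ERROR] API timeout after 30s",
]

# Compile once when the same pattern is applied to many lines.
LEVEL = re.compile(r"\[(\w+)\]")

# Lines without a level are skipped because search() returns None for them.
levels = Counter(m.group(1) for line in log_lines if (m := LEVEL.search(line)))

for level, count in levels.items():
    print(f"{level:8} {count}")
```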

String Encoding

Strings in Python 3 are Unicode by default. This means they can hold characters from any language:

# All of these work in Python 3
english = "Hello"
german = "Hallo, wie geht's?"
japanese = "こんにちは"
emoji = "Python is fun! 🐍"

print(f"{english} | {german} | {japanese} | {emoji}")

When reading or writing files, you may need to specify the encoding:

# Read a file with UTF-8 encoding
with open("data.txt", "r", encoding="utf-8") as f:
    content = f.read()

# Encode a string to bytes
text = "Hello"
encoded = text.encode("utf-8")  # b'Hello'
decoded = encoded.decode("utf-8")  # "Hello"

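On most Linux and macOS systems the default text encoding is UTF-8, but open() falls back to the platform's locale encoding when you omit the encoding argument, and on Windows that is often a legacy code page such as cp1252. Passing encoding="utf-8" explicitly avoids surprises. If you see garbled text, check the encoding first. We will cover file I/O in more detail in a later tutorial.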
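To make the difference between characters and bytes concrete, here is a short sketch using the Japanese greeting from above. It also shows what garbled text looks like in practice: decoding with the wrong codec often does not raise an error, it just produces the wrong characters.

```python
word = "こんにちは"
data = word.encode("utf-8")

print(len(word))  # 5 characters
print(len(data))  # 15 bytes: each of these characters takes 3 bytes in UTF-8

# Decoding with the wrong codec succeeds but returns different text.
print(data.decode("latin-1") == word)  # False

# ASCII cannot represent these characters; errors="replace" substitutes "?".
print(word.encode("ascii", errors="replace"))  # b'?????'
```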

Common Mistakes

Forgetting to Use Raw Strings for Regex

# BAD: \b is a backspace in normal strings
re.search("\bword\b", text)

# GOOD: raw string preserves the backslash
re.search(r"\bword\b", text)

Always use raw strings (r"...") for regex patterns.

Not Checking for None

# BAD: crashes if no match
match = re.search(r"\d+", "no numbers")
print(match.group())  # AttributeError!

# GOOD: check first
match = re.search(r"\d+", "no numbers")
if match:
    print(match.group())

Practical Example: Slug Generator

A slug is a URL-friendly version of a string. For example, “Hello World! 123” becomes “hello-world-123”. Here is how to build one:

import re

def slugify(text: str) -> str:
    """Convert text to a URL-friendly slug."""
    text = text.lower().strip()
    text = re.sub(r"[^\w\s-]", "", text)  # Remove special chars
    text = re.sub(r"[\s_-]+", "-", text)   # Collapse spaces, underscores, and hyphens
    return text.strip("-")

print(slugify("Hello World! 123"))       # "hello-world-123"
print(slugify("  Python Tutorial  "))    # "python-tutorial"
print(slugify("What's New in 2026?"))    # "whats-new-in-2026"

This function uses lower(), strip(), and re.sub() together. Slug generation is a common real-world task in web development and content management.
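Because \w matches Unicode letters, slugify() keeps accented characters such as "é" in the slug. If you want ASCII-only slugs, one possible approach is to normalize the text with unicodedata first. This is a sketch under that assumption, and slugify_ascii is a hypothetical name. Letters with no ASCII decomposition (for example "ß") are simply dropped.

```python
import re
import unicodedata

def slugify_ascii(text: str) -> str:
    """Like slugify(), but folds accented letters to plain ASCII first."""
    # NFKD splits "é" into "e" plus a combining accent mark.
    text = unicodedata.normalize("NFKD", text)
    # Encoding to ASCII with "ignore" drops the accent marks.
    text = text.encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^\w\s-]", "", text.lower())
    return re.sub(r"[\s_-]+", "-", text).strip("-")

print(slugify_ascii("Crème Brûlée Recipe"))  # "creme-brulee-recipe"
```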

Summary

Here is a quick reference for the string operations we covered:

Operation       Method             Example
-------------   ----------------   ---------------------------
Clean           strip(), lower()   text.strip().lower()
Split           split()            "a,b,c".split(",")
Join            join()             ",".join(["a","b","c"])
Find            find(), in         text.find("word")
Replace         replace()          text.replace("old", "new")
Format          f-string           f"{name}: {score:.1f}"
Regex find      re.search()        re.search(r"\d+", text)
Regex all       re.findall()       re.findall(r"\d+", text)
Regex replace   re.sub()           re.sub(r"\s+", " ", text)

Source Code

You can find the code for this tutorial on GitHub:

kemalcodes/python-tutorial — tutorial-07-strings

Run the examples:

python src/py07_strings.py

Run the tests:

python -m pytest tests/test_py07.py -v

What’s Next?

In the next tutorial, we will learn about modules, packages, and virtual environments. You will learn how to organize your code into files, install third-party packages, and manage dependencies.