In the previous tutorial, we learned about lists, dictionaries, sets, and tuples. Now let’s take a deep dive into strings, one of the most commonly used data types in Python.
We covered string basics in Tutorial #3. This tutorial goes further: advanced methods, formatting tricks, raw strings, and regular expressions.
String Methods
Strings have many built-in methods. Here are the most useful ones for daily work.
Cleaning Text
text = " Hello, World! "
print(text.strip()) # "Hello, World!" — remove whitespace from both sides
print(text.lstrip()) # "Hello, World! " — left side only
print(text.rstrip()) # " Hello, World!" — right side only
print(text.lower()) # " hello, world! "
print(text.upper()) # " HELLO, WORLD! "
A common pattern: strip whitespace and normalize case:
def clean_text(text: str) -> str:
    return text.strip().lower()
print(clean_text(" Hello, World! ")) # "hello, world!"
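strip() and its variants also take an optional argument listing the characters to remove. That argument is treated as a set of characters, not as a prefix or suffix, which often surprises people. To remove one exact prefix or suffix, Python 3.9 added removeprefix() and removesuffix(). A quick sketch with illustrative strings:

```python
# The argument to strip() is a *set* of characters, removed from both ends
print("--hello--".strip("-"))               # hello
print("www.example.com".strip("cmowz."))    # example

# removeprefix()/removesuffix() (Python 3.9+) remove one exact substring
print("www.example.com".removeprefix("www."))  # example.com
print("report.txt".removesuffix(".txt"))       # report
print("report.txt".removesuffix(".csv"))       # report.txt (unchanged)
```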
Splitting and Joining
# Split a string into a list of words
sentence = "Python is fun and powerful"
words = sentence.split() # Split on whitespace (default)
print(words) # ['Python', 'is', 'fun', 'and', 'powerful']
# Split on a specific character
csv_line = "Alex,25,Berlin"
parts = csv_line.split(",")
print(parts) # ['Alex', '25', 'Berlin']
# Join a list back into a string
print(" ".join(words)) # "Python is fun and powerful"
print(", ".join(parts)) # "Alex, 25, Berlin"
print("-".join(["2026", "06", "17"])) # "2026-06-17"
split() and join() are among the most frequently used string methods. You will use them constantly.
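A few details of split() and join() are worth knowing early. This sketch shows how split() with no argument differs from split(" "), how the maxsplit argument limits the number of splits, and that join() needs every item to be a string:

```python
# split() with no argument splits on runs of whitespace and drops empty strings
print("a  b   c".split())        # ['a', 'b', 'c']
# split(" ") splits on every single space, so runs of spaces produce empty strings
print("a  b".split(" "))         # ['a', '', 'b']

# maxsplit limits the number of splits
line = "key=value=with=equals"
print(line.split("=", 1))        # ['key', 'value=with=equals']

# join() only accepts strings, so convert other types first
numbers = [1, 2, 3]
print(", ".join(str(n) for n in numbers))  # 1, 2, 3
```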
Searching
text = "Hello, World! Hello, Python!"
print(text.find("World")) # 7 — index where "World" starts
print(text.find("Java")) # -1 — not found
print(text.count("Hello")) # 2 — how many times
print(text.startswith("Hello")) # True
print(text.endswith("!")) # True
print("World" in text) # True — the simplest check
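find() only reports the first match. One way to locate every occurrence is to call find() in a loop and pass its optional start argument. In the sketch below, find_all is a hypothetical helper name, not a built-in. The sketch also shows index(), which behaves like find() but raises ValueError when the substring is missing:

```python
def find_all(text: str, sub: str) -> list[int]:
    """Hypothetical helper: start index of every non-overlapping occurrence."""
    if not sub:
        return []  # an empty substring would loop forever
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching just after the current match
        start = text.find(sub, start + len(sub))
    return positions

text = "Hello, World! Hello, Python!"
print(find_all(text, "Hello"))  # [0, 14]

# index() works like find() but raises ValueError instead of returning -1
try:
    text.index("Java")
except ValueError:
    print("'Java' not found")
```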
Replacing
text = "I like cats and cats are great"
print(text.replace("cats", "dogs"))
# "I like dogs and dogs are great"
print(text.replace("cats", "dogs", 1))
# "I like dogs and cats are great" — replace only first occurrence
For multiple replacements, loop through a dictionary:
def replace_all(text: str, replacements: dict[str, str]) -> str:
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text
result = replace_all("I like cats and dogs", {"cats": "birds", "dogs": "fish"})
print(result) # "I like birds and fish"
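replace_all() applies each replacement to the output of the previous one, so the order can matter. For example, with {"cats": "dogs", "dogs": "cats"}, the text "cats chase dogs" first becomes "dogs chase dogs" and then "cats chase cats". One way to avoid this is a single pass with the re module, which is covered later in this tutorial, combined with a replacement function. replace_all_once is a hypothetical name:

```python
import re

def replace_all_once(text: str, replacements: dict[str, str]) -> str:
    """Hypothetical helper: apply all replacements in a single pass."""
    if not replacements:
        return text
    # Longest keys first, so "cats" is tried before a shorter key like "cat"
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # For each match, look up its replacement in the dictionary
    return pattern.sub(lambda m: replacements[m.group(0)], text)

swap = {"cats": "dogs", "dogs": "cats"}
print(replace_all_once("cats chase dogs", swap))  # dogs chase cats
```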
Checking Content
print("hello123".isalnum()) # True — letters and numbers only
print("hello".isalpha()) # True — letters only
print("12345".isdigit()) # True — digits only
print("hello world".isspace()) # False — not all whitespace
print("Hello World".istitle()) # True — title case
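These checks are stricter than they look. All of them return False for an empty string, and isdigit() rejects signs and decimal points, so it cannot tell you whether a string is a valid number. A common alternative is to try the conversion and catch ValueError. The sketch below does this with a hypothetical is_number() helper:

```python
# Empty strings return False for these checks
print("".isalpha(), "".isdigit())   # False False

# isdigit() only accepts digit characters: no sign, no decimal point
print("-5".isdigit())    # False
print("12.5".isdigit())  # False

def is_number(value: str) -> bool:
    """Hypothetical helper: True if value can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False

print(is_number("-5"), is_number("12.5"), is_number("abc"))  # True True False
```

float() also accepts surrounding whitespace and strings such as "nan" and "inf", so tighten the check if those should be rejected.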
Truncating
A useful helper for displaying text:
def truncate(text: str, max_length: int = 50) -> str:
    if len(text) <= max_length:
        return text
    return text[:max_length - 3] + "..."
print(truncate("This is a very long string that needs to be shortened"))
# "This is a very long string that needs to be sho..."
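truncate() can cut a word in half. If you would rather break at a space, one possible variant drops the partial last word before adding the ellipsis. The name truncate_words and the suffix parameter are illustrative:

```python
def truncate_words(text: str, max_length: int = 50, suffix: str = "...") -> str:
    """Hypothetical helper: truncate without cutting a word in half."""
    if len(text) <= max_length:
        return text
    cut = text[: max_length - len(suffix)]
    # Drop the partial last word, if there is a space to break on
    if " " in cut:
        cut = cut.rsplit(" ", 1)[0]
    return cut.rstrip() + suffix

print(truncate_words("This is a very long string that needs to be shortened"))
# This is a very long string that needs to be...
```

The standard library's textwrap.shorten() takes a similar word-based approach and also collapses runs of whitespace. Which one fits better depends on whether you need to preserve the original spacing.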
Advanced f-String Formatting
We learned basic f-strings in Tutorial #3. Here are more advanced techniques.
Number Formatting
n = 1_000_000
print(f"{n:,}") # "1,000,000" — comma separator
print(f"{n:_}") # "1_000_000" — underscore separator
print(f"{255:b}") # "11111111" — binary
print(f"{255:x}") # "ff" — hexadecimal
print(f"{255:o}") # "377" — octal
print(f"{42:08d}") # "00000042" — zero-padded to 8 digits
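Format specifiers can be combined, and a few more options are handy in practice. This sketch combines a thousands separator with a precision, adds base prefixes with #, and uses the = specifier (Python 3.8+) to print an expression together with its value:

```python
n = 1_234_567.891

# Combine a thousands separator with a precision
print(f"{n:,.2f}")        # 1,234,567.89

# "#" adds the base prefix to binary, octal, and hex output
print(f"{255:#b}")        # 0b11111111
print(f"{255:#o}")        # 0o377
print(f"{255:#x}")        # 0xff

# "=" (Python 3.8+) prints the expression and its value, handy for debugging
count = 42
print(f"{count=}")        # count=42
```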
Percentage and Decimal Places
ratio = 0.8567
print(f"{ratio:.2f}") # "0.86" — 2 decimal places
print(f"{ratio:.0%}") # "86%" — percentage (no decimals)
print(f"{ratio:.2%}") # "85.67%" — percentage (2 decimals)
Alignment and Padding
name = "Alex"
print(f"{name:<20}") # "Alex                " — left-aligned, 20 chars
print(f"{name:>20}") # "                Alex" — right-aligned
print(f"{name:^20}") # "        Alex        " — centered
print(f"{name:*^20}") # "********Alex********" — centered with fill char
Building Tables
Alignment specifiers make it easy to build plain-text tables:
data = [("Alex", 95, 89.5), ("Sam", 87, 82.3), ("Jordan", 92, 91.0)]
print(f"{'Name':<15}{'Score':>8}{'Average':>10}")
print("-" * 33)
for name, score, avg in data:
    print(f"{name:<15}{score:>8}{avg:>10.1f}")
Output:
Name              Score   Average
---------------------------------
Alex                 95      89.5
Sam                  87      82.3
Jordan               92      91.0
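The table above hard-codes its column widths. A format spec can also take its width from a variable through nested braces. The sketch below uses that to size the name column from the data itself. The two-space padding is an arbitrary choice:

```python
data = [("Alex", 95, 89.5), ("Sam", 87, 82.3), ("Jordan", 92, 91.0)]

# Size the name column from the data instead of hard-coding 15
name_width = max(len("Name"), *(len(name) for name, _, _ in data)) + 2

lines = [f"{'Name':<{name_width}}{'Score':>8}{'Average':>10}"]
lines.append("-" * (name_width + 18))  # 18 = widths of the Score and Average columns
for name, score, avg in data:
    # The width inside the braces is filled in from the variable
    lines.append(f"{name:<{name_width}}{score:>8}{avg:>10.1f}")

print("\n".join(lines))
```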
Raw Strings
A raw string starts with r before the quotes. Python treats backslashes as literal characters, not escape sequences:
# Normal string: \n is a newline
print("Hello\nWorld")
# Output:
# Hello
# World
# Raw string: \n is literal backslash + n
print(r"Hello\nWorld")
# Output: Hello\nWorld
Raw strings are useful for:
- File paths on Windows: r"C:\Users\Alex\new_folder" (in a normal string, \U and \n would be read as escape sequences)
- Regular expressions: r"\d+\.\d+" (no need to double every backslash)
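A raw string only changes how the literal is written, not the resulting value, which you can check by comparing it with the equivalent normal string. One limitation is that a raw string cannot end with an odd number of backslashes. A path that ends in a backslash therefore needs a normal string or a library such as pathlib:

```python
# A raw string and a normal string with doubled backslashes are the same value
path = r"C:\Users\Alex\new_folder"
print(path == "C:\\Users\\Alex\\new_folder")  # True

# r"\n" is two characters: a backslash and the letter n
print(len(r"\n"), len("\n"))  # 2 1

# A raw string cannot end with a single backslash: r"C:\folder\" is a SyntaxError.
# Use a normal string with an escaped backslash instead:
folder = "C:\\folder\\"
print(folder)  # C:\folder\ (note the trailing backslash)
```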
Multi-Line Strings
Triple quotes create multi-line strings:
message = """Dear Alex,
Thank you for signing up.
Your account is ready.
Best regards,
The Team"""
print(message)
You can also use triple quotes with f-strings:
name = "Alex"
items = 3
total = 49.99
receipt = f"""
Receipt for {name}
-------------------
Items: {items}
Total: ${total:.2f}
"""
print(receipt)
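Triple-quoted strings keep every character between the quotes, including indentation. Inside a function, that can add unwanted leading spaces. One option is textwrap.dedent() from the standard library, which removes the whitespace common to all lines. It is sketched here with a hypothetical welcome_message() function:

```python
import textwrap

def welcome_message(name: str) -> str:
    """Hypothetical example: an indented template that prints without indentation."""
    # The backslash after the opening quotes avoids a leading newline;
    # dedent() removes the indentation shared by every line.
    message = f"""\
        Dear {name},
        Your account is ready."""
    return textwrap.dedent(message)

print(welcome_message("Alex"))
```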
Regular Expressions (Regex Basics)
Regular expressions let you search for patterns in text. Python’s re module provides regex support.
re.search() — Find the First Match
import re
text = "Contact us at support@example.com for help"
match = re.search(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
if match:
    print(match.group()) # "support@example.com"
re.search() returns a match object if found, or None if not. Always check before calling .group().
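A match object carries more than the matched text. This sketch shows .span() for the start and end positions. It also shows the difference between re.search(), which scans the whole string, and re.match(), which only tries at the beginning. The sample text is illustrative:

```python
import re

text = "Order 66 shipped on day 123"

# re.match() only checks the start of the string, where there is no digit
print(re.match(r"\d+", text))  # None

# re.search() scans until it finds the first match
match = re.search(r"\d+", text)
if match:
    print(match.group())  # 66
    print(match.span())   # (6, 8): start index and end index (exclusive)
```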
re.findall() — Find All Matches
text = "I have 3 cats and 2 dogs, total 5 pets"
numbers = re.findall(r"\d+", text)
print(numbers) # ['3', '2', '5']
# Convert to integers
numbers = [int(n) for n in re.findall(r"\d+", text)]
print(numbers) # [3, 2, 5]
text = "I love #Python and #coding!"
hashtags = re.findall(r"#\w+", text)
print(hashtags) # ['#Python', '#coding']
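re.findall() changes its return type when the pattern contains groups. With one group you get only the captured text, and with several groups you get tuples. When you also need match positions, re.finditer() yields match objects instead. A sketch using an illustrative config string:

```python
import re

config = "host=localhost port=8080 debug=true"

# With one group, findall() returns the captured text, not the full match
print(re.findall(r"port=(\d+)", config))  # ['8080']

# With several groups, it returns a list of tuples
pairs = re.findall(r"(\w+)=(\w+)", config)
print(pairs)  # [('host', 'localhost'), ('port', '8080'), ('debug', 'true')]
print(dict(pairs))

# finditer() yields match objects, useful when you also need positions
for m in re.finditer(r"\w+=", config):
    print(m.group(), m.start())
```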
re.sub() — Search and Replace
text = "Email me at alex@example.com or bob@test.org"
censored = re.sub(r"[\w.+-]+@[\w-]+\.[\w.-]+", "[REDACTED]", text)
print(censored)
# "Email me at [REDACTED] or [REDACTED]"
# Normalize whitespace
messy = "too   many    spaces  here"
clean = re.sub(r"\s+", " ", messy)
print(clean) # "too many spaces here"
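The replacement argument of re.sub() can also be a function. The function receives each match object and returns the replacement text, which is useful when the new text depends on what was matched. The optional count argument limits how many replacements are made. double() is an illustrative helper:

```python
import re

def double(match: re.Match) -> str:
    """Illustrative helper: called once per match; its return value replaces the match."""
    return str(int(match.group()) * 2)

print(re.sub(r"\d+", double, "3 cats and 12 dogs"))  # 6 cats and 24 dogs

# count= limits the number of replacements
print(re.sub(r"\d+", "N", "1 2 3", count=2))  # N N 3
```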
re.split() — Split on a Pattern
text = "hello, world; foo. bar!"
parts = re.split(r"[,;.!?]+", text)
print(parts) # ['hello', ' world', ' foo', ' bar', '']
This is more powerful than str.split() because you can split on any pattern.
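The result above keeps the leading spaces and ends with an empty string. One way to clean that up is to consume the whitespace after each delimiter and filter out empty parts. If you need the delimiters themselves, wrap the pattern in a capturing group, and re.split() includes them in the result:

```python
import re

text = "hello, world; foo. bar!"

# Also consume the spaces after each delimiter, then drop empty strings
parts = [p for p in re.split(r"[,;.!?]+\s*", text) if p]
print(parts)  # ['hello', 'world', 'foo', 'bar']

# A capturing group keeps the delimiters in the result
print(re.split(r"([,;])", "a,b;c"))  # ['a', ',', 'b', ';', 'c']
```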
Regex Groups
Use parentheses () to capture parts of a match:
date_str = "2026-06-17"
match = re.match(r"(\d{4})-(\d{2})-(\d{2})", date_str)
if match:
    year = match.group(1) # "2026"
    month = match.group(2) # "06"
    day = match.group(3) # "17"
    print(f"{year}/{month}/{day}")
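Numbered groups get hard to track as patterns grow. Named groups, written (?P<name>...), let you access parts by name, and groupdict() returns them all as a dictionary. Groups can also be referenced in a re.sub() replacement with \1, \2, and so on. A sketch using the same date:

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
match = re.match(pattern, "2026-06-17")
if match:
    print(match.group("year"))   # 2026
    print(match.groupdict())     # {'year': '2026', 'month': '06', 'day': '17'}

# Groups can be reused in a replacement: \1, \2, ... refer to them by number
print(re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3.\2.\1", "2026-06-17"))  # 17.06.2026
```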
Common Regex Patterns
Here is a quick reference for common patterns:
| Pattern | Matches |
|---|---|
| \d | Any digit (0-9) |
| \w | Any word character (letter, digit, underscore) |
| \s | Any whitespace (space, tab, newline) |
| . | Any character except newline |
| + | One or more of the previous |
| * | Zero or more of the previous |
| ? | Zero or one of the previous |
| {n} | Exactly n of the previous |
| {m,n} | Between m and n of the previous |
| ^ | Start of string |
| $ | End of string |
| \b | Word boundary |
| [abc] | Any of a, b, or c |
| [^abc] | Anything except a, b, or c |
Validating Input
Use regex to validate patterns:
def validate_phone(phone: str) -> bool:
    return bool(re.match(r"^\+\d{1,3}-\d{3}-\d{3}-\d{4}$", phone))
print(validate_phone("+1-555-123-4567")) # True
print(validate_phone("555-123-4567")) # False
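One subtlety in validate_phone() is that $ also matches just before a trailing newline, so a string ending in "\n" still passes. re.fullmatch() requires the entire string to match, which also makes ^ and $ unnecessary. A stricter sketch follows. validate_phone_strict and the PHONE constant are hypothetical names:

```python
import re

PHONE = r"\+\d{1,3}-\d{3}-\d{3}-\d{4}"  # hypothetical constant

# "$" also matches just before a trailing newline, so this is accepted
print(bool(re.match(rf"^{PHONE}$", "+1-555-123-4567\n")))  # True

def validate_phone_strict(phone: str) -> bool:
    """Hypothetical helper: the whole string must match the pattern."""
    return re.fullmatch(PHONE, phone) is not None

print(validate_phone_strict("+1-555-123-4567"))    # True
print(validate_phone_strict("+1-555-123-4567\n"))  # False
```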
When to Use Regex
Use regex when:
- You need to match complex patterns
- You need to extract data from unstructured text
- Simple string methods are not enough
Do not use regex when:
- A simple in, startswith(), or split() works
- You are parsing structured data like JSON or CSV (use the proper libraries)
As the saying goes: “Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”
Keep your regex simple. If a pattern is hard to read, add comments or break it into parts.
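One way to add comments is the re.VERBOSE flag. With it, whitespace in the pattern is ignored and # starts a comment, so a pattern can be spread across lines. Literal spaces then have to be written as \s (or escaped). Here is a sketch of a log-line pattern with named groups. The LOG_LINE name is illustrative:

```python
import re

# re.VERBOSE ignores whitespace in the pattern and allows # comments
LOG_LINE = re.compile(
    r"""
    (?P<date>\d{4}-\d{2}-\d{2})   # date, e.g. 2026-06-17
    \s
    (?P<time>\d{2}:\d{2}:\d{2})   # time, e.g. 10:30:15
    \s
    \[(?P<level>\w+)\]            # level in square brackets
    \s
    (?P<message>.+)               # rest of the line
    """,
    re.VERBOSE,
)

m = LOG_LINE.match("2026-06-17 10:30:15 [INFO] User Alex logged in")
if m:
    print(m.group("level"), m.group("message"))  # INFO User Alex logged in
```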
Practical Example: Log Parser
Let’s combine string methods and regex to parse a log file:
import re
log_lines = [
"2026-06-17 10:30:15 [INFO] User Alex logged in",
"2026-06-17 10:31:02 [ERROR] Database connection failed",
"2026-06-17 10:32:45 [INFO] User Sam logged out",
"2026-06-17 10:33:10 [WARNING] Disk usage at 85%",
"2026-06-17 10:34:00 [ERROR] API timeout after 30s",
]
# Extract all error messages
errors = [line for line in log_lines if "[ERROR]" in line]
print(f"Errors: {len(errors)}")
# Parse each line with regex
pattern = r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) \[(\w+)\] (.+)"
for line in log_lines:
    match = re.match(pattern, line)
    if match:
        date, time, level, message = match.groups()
        print(f" {level:8} | {time} | {message}")
Output:
Errors: 2
 INFO     | 10:30:15 | User Alex logged in
 ERROR    | 10:31:02 | Database connection failed
 INFO     | 10:32:45 | User Sam logged out
 WARNING  | 10:33:10 | Disk usage at 85%
 ERROR    | 10:34:00 | API timeout after 30s
This is a realistic example. Log parsing is one of the most common uses for string methods and regex in Python.
String Encoding
Strings in Python 3 are sequences of Unicode characters, so they can hold text in any language:
# All of these work in Python 3
english = "Hello"
german = "Hallo, wie geht's?"
japanese = "こんにちは"
emoji = "Python is fun! 🐍"
print(f"{english} | {german} | {japanese} | {emoji}")
When reading or writing files, you may need to specify the encoding:
# Read a file with UTF-8 encoding
with open("data.txt", "r", encoding="utf-8") as f:
content = f.read()
# Encode a string to bytes
text = "Hello"
encoded = text.encode("utf-8") # b'Hello'
decoded = encoded.decode("utf-8") # "Hello"
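str.encode() and bytes.decode() use UTF-8 when you do not pass an encoding, but open() falls back to a platform-dependent default (often not UTF-8 on Windows), so it is safest to pass encoding="utf-8" explicitly. If you see garbled text, a mismatched encoding is the usual cause. We will cover file I/O in more detail in a later tutorial.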
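Garbled text usually comes from decoding bytes with a different encoding than the one used to encode them. This sketch shows that a non-ASCII character can take more than one byte in UTF-8. It also shows what a mismatched decode looks like, and how errors="replace" substitutes undecodable bytes instead of raising UnicodeDecodeError:

```python
word = "café"
data = word.encode("utf-8")
print(len(word), len(data))   # 4 5  (é takes two bytes in UTF-8)
print(data)                   # b'caf\xc3\xa9'

# Decoding with the wrong codec does not fail; it produces garbled text
print(data.decode("latin-1"))  # cafÃ©

# errors="replace" swaps undecodable bytes for U+FFFD instead of raising
print(b"caf\xe9".decode("utf-8", errors="replace"))
```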
Common Mistakes
Forgetting to Use Raw Strings for Regex
# BAD: \b is a backspace in normal strings
re.search("\bword\b", text)
# GOOD: raw string preserves the backslash
re.search(r"\bword\b", text)
Always use raw strings (r"...") for regex patterns.
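Here is the difference in concrete terms. In a normal string, "\b" is a single backspace character, so the pattern no longer contains a word boundary and silently finds nothing. With a raw string, \b reaches the regex engine and matches only at word edges. The sample text is illustrative:

```python
import re

print("\b" == "\x08")                        # True: "\b" is the backspace character
print(len("\bword\b"), len(r"\bword\b"))     # 6 8

text = "a word and a swordfish"
print(re.search("\bword\b", text))           # None: the pattern contains backspaces
print(re.findall(r"\bword\b", text))         # ['word']: "swordfish" is not matched
```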
Not Checking for None
# BAD: crashes if no match
match = re.search(r"\d+", "no numbers")
print(match.group()) # AttributeError!
# GOOD: check first
match = re.search(r"\d+", "no numbers")
if match:
    print(match.group())
Practical Example: Slug Generator
A slug is a URL-friendly version of a string. For example, “Hello World! 123” becomes “hello-world-123”. Here is how to build one:
import re
def slugify(text: str) -> str:
    """Convert text to a URL-friendly slug."""
    text = text.lower().strip()
    text = re.sub(r"[^\w\s-]", "", text)  # Remove special characters
    text = re.sub(r"[\s_]+", "-", text)   # Replace runs of spaces/underscores with one hyphen
    return text.strip("-")
print(slugify("Hello World! 123")) # "hello-world-123"
print(slugify(" Python Tutorial ")) # "python-tutorial"
print(slugify("What's New in 2026?")) # "whats-new-in-2026"
This function uses lower(), strip(), and re.sub() together. Slug generation is a common real-world task in web development and content management.
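Because \w matches Unicode letters in Python 3, slugify() keeps accented characters, so "Café" becomes "café". If your URLs must be plain ASCII, one possible variant first decomposes accented letters with unicodedata.normalize() and then drops the non-ASCII accent marks. It is sketched below under the hypothetical name slugify_ascii():

```python
import re
import unicodedata

def slugify_ascii(text: str) -> str:
    """Hypothetical variant of slugify() that outputs only ASCII."""
    # NFKD splits "é" into "e" plus a combining accent; encoding to ASCII
    # with errors="ignore" then drops the accent (and any other non-ASCII)
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    text = text.lower().strip()
    text = re.sub(r"[^\w\s-]", "", text)
    text = re.sub(r"[\s_]+", "-", text)
    return text.strip("-")

print(slugify_ascii("Café Déjà Vu"))  # cafe-deja-vu
```

This approach discards characters that have no ASCII decomposition, so text in scripts such as Japanese would come out empty. That case would need a transliteration library.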
Summary
Here is a quick reference for the string operations we covered:
| Operation | Method | Example |
|---|---|---|
| Clean | strip(), lower() | text.strip().lower() |
| Split | split() | "a,b,c".split(",") |
| Join | join() | ",".join(["a","b","c"]) |
| Find | find(), in | text.find("word") |
| Replace | replace() | text.replace("old", "new") |
| Format | f-string | f"{name}: {score:.1f}" |
| Regex find | re.search() | re.search(r"\d+", text) |
| Regex all | re.findall() | re.findall(r"\d+", text) |
| Regex replace | re.sub() | re.sub(r"\s+", " ", text) |
Source Code
You can find the code for this tutorial on GitHub:
kemalcodes/python-tutorial — tutorial-07-strings
Run the examples:
python src/py07_strings.py
Run the tests:
python -m pytest tests/test_py07.py -v
What’s Next?
In the next tutorial, we will learn about modules, packages, and virtual environments. You will learn how to organize your code into files, install third-party packages, and manage dependencies.
Related Articles
- Python Tutorial #6: Data Structures — lists, dicts, sets, tuples
- Regex Cheat Sheet — quick reference for regular expressions
- Python Cheat Sheet — quick reference for Python syntax