In the previous article, you learned how caching speeds up systems. But behind every cache, there is a database. The database is where your data lives permanently.
Choosing the right database is one of the most important decisions in system design. It affects performance, scalability, and how easy your system is to maintain.
SQL Databases
SQL (Structured Query Language) databases store data in tables with rows and columns. They follow a fixed schema — you define the structure before inserting data.
Users table:
+----+---------+-------------------+------------+
| id | name | email | created_at |
+----+---------+-------------------+------------+
| 1 | Alex | alex@example.com | 2026-01-15 |
| 2 | Sam | sam@example.com | 2026-02-20 |
| 3 | Jordan | jordan@example.com| 2026-03-10 |
+----+---------+-------------------+------------+
Popular SQL Databases
- PostgreSQL — the most feature-rich open-source SQL database. Supports JSON, full-text search, geospatial queries. Used by Instagram, Uber, Stripe.
- MySQL — the most popular open-source SQL database. Simple, reliable, well-documented. Used by Facebook, Twitter, YouTube.
- SQLite — a lightweight database stored in a single file. Great for mobile apps and embedded systems.
- Microsoft SQL Server — enterprise SQL database. Used in corporate environments.
Strengths of SQL Databases
- ACID compliance — guarantees data integrity (more on this below)
- Complex queries — JOINs, subqueries, aggregations across multiple tables
- Fixed schema — enforces data structure, prevents bad data
- Mature ecosystem — decades of tooling, documentation, and expertise
- Transactions — group multiple operations into one atomic unit
NoSQL Databases
NoSQL databases store data in flexible formats without a fixed schema. There are several types:
1. Document Databases
Store data as JSON-like documents. Each document can have a different structure.
MongoDB document:
{
"_id": "user_001",
"name": "Alex",
"email": "alex@example.com",
"address": {
"city": "Berlin",
"country": "Germany"
},
"hobbies": ["coding", "hiking", "photography"]
}
Popular: MongoDB, CouchDB, Amazon DocumentDB
Best for: content management, user profiles, product catalogs — data with varied structure.
2. Key-Value Databases
The simplest type. Store data as key-value pairs. Very fast for lookups by key.
Key: "session:abc123" Value: {"user_id": 1, "expires": "2026-05-16T12:00:00"}
Key: "user:1:name" Value: "Alex"
Key: "cache:popular" Value: "[1, 5, 12, 8, 3]"
Popular: Redis, Amazon DynamoDB, etcd
Best for: caching, sessions, shopping carts, real-time leaderboards.
3. Column-Family Databases
Store data in columns instead of rows. Optimized for reading and writing large amounts of data.
Row key: "user_001"
Column family "profile": name="Alex", email="alex@example.com"
Column family "activity": last_login="2026-05-16", login_count=150
Column family "metrics": posts=42, followers=1200
Popular: Apache Cassandra, HBase, Google Bigtable
Best for: time-series data, IoT data, analytics, write-heavy workloads with billions of rows.
4. Graph Databases
Store data as nodes and relationships. Optimized for queries about connections between data.
Graph structure:
(Alex) --[FOLLOWS]--> (Sam)
(Alex) --[FOLLOWS]--> (Jordan)
(Sam) --[FOLLOWS]--> (Jordan)
(Jordan) --[LIKES]--> (Post: "System Design is Fun")
Popular: Neo4j, Amazon Neptune, ArangoDB
Best for: social networks, recommendation engines, fraud detection, knowledge graphs.
ACID Properties
ACID is a set of properties that guarantee database transactions are processed reliably. SQL databases are typically ACID-compliant. Most NoSQL databases sacrifice some ACID properties for performance and scalability.
Atomicity
A transaction is all-or-nothing. Either every operation in the transaction succeeds, or none of them do.
Transfer $100 from Alex to Sam:
1. Deduct $100 from Alex's account
2. Add $100 to Sam's account
Atomicity guarantees:
EITHER both steps happen (success)
OR neither step happens (failure — rollback)
You never end up with $100 deducted from Alex but not added to Sam.
Consistency
Every transaction brings the database from one valid state to another. Database rules (constraints, foreign keys) are always enforced.
Rule: account balance must be >= 0
Alex has $50. Transfer $100 to Sam.
Step 1: Deduct $100 from Alex --> balance = -$50 (violates rule!)
Transaction is rolled back. Database stays consistent.
Isolation
Concurrent transactions do not interfere with each other. Each transaction appears to execute in isolation, as if it were the only one running.
Two transactions running at the same time:
Transaction A: Read Alex's balance ($500)
Transaction B: Read Alex's balance ($500)
Transaction A: Deduct $100 --> write $400
Transaction B: Deduct $200 --> write $300
Without isolation: Alex's balance is $300 (lost Transaction A's update!)
With isolation: Transactions run one at a time. Final balance = $200 (correct)
Durability
Once a transaction is committed, it stays committed. Even if the server crashes, the data is not lost.
1. Transaction committed: Alex's balance = $400
2. Server crashes
3. Server restarts
4. Alex's balance is still $400 (data was written to disk)
SQL vs NoSQL — Decision Framework
The question is not “which is better?” but “which fits your use case?”
Choose SQL When:
- Your data has relationships — users have orders, orders have items, items belong to categories. SQL JOINs handle this naturally.
- You need ACID transactions — financial systems, inventory management, booking systems.
- Your schema is well-defined — you know the structure of your data upfront.
- You need complex queries — aggregations, reporting, analytics across multiple tables.
Choose NoSQL When:
- Your data has no fixed schema — user-generated content, logs, IoT data.
- You need extreme write throughput — billions of events per day (time-series, analytics).
- You need horizontal scaling — NoSQL databases are designed to run across many machines.
- Your access patterns are simple — mostly key-value lookups or document reads.
- Low latency is critical — key-value stores like Redis return data in under 1 millisecond.
Common Combinations
Most real systems use multiple databases:
Typical e-commerce system:
[PostgreSQL] — users, orders, payments (needs ACID)
[Redis] — sessions, caching, rate limiting
[MongoDB] — product catalog (flexible schema)
[Elasticsearch] — search functionality
This is called polyglot persistence — using the right database for each job.
Database Replication
Replication means keeping copies of data on multiple machines. It improves availability (if one machine dies, others have the data) and read performance (distribute reads across copies).
Leader-Follower Replication
One database (the leader) handles all writes. Copies (followers or replicas) handle reads.
Leader-Follower:
[App Server] --writes--> [Leader Database]
|
|--> replication --> [Follower 1] <--reads-- [App Server]
|--> replication --> [Follower 2] <--reads-- [App Server]
|--> replication --> [Follower 3] <--reads-- [App Server]
How it works:
- All writes go to the leader
- The leader sends changes to followers (replication log)
- Followers apply the changes to their own copy
- Read requests go to any follower
Pros: Simple. Scales reads easily by adding more followers. Cons: The leader is a single point of failure for writes. There is a delay (replication lag) between the leader writing data and followers receiving it.
Leader-Leader Replication
Multiple databases accept writes. Each leader replicates to the other leaders.
Leader-Leader:
[App Server US] --writes--> [Leader US] <--replication--> [Leader EU] <--writes-- [App Server EU]
Pros: No single point of failure for writes. Users can write to the nearest datacenter. Cons: Write conflicts. What if two users update the same record on different leaders at the same time? You need conflict resolution logic.
Replication Lag
Replication is not instant. There is a delay between when data is written to the leader and when it appears on followers. This is called replication lag.
Timeline:
T=0ms: User updates profile on leader
T=0ms: User reads profile from follower --> sees OLD data
T=100ms: Replication completes
T=100ms: User reads profile from follower --> sees NEW data
This is called eventual consistency — followers will eventually have the same data as the leader, but there is a delay. For many applications (social media, content platforms), a few hundred milliseconds of lag is acceptable. For others (banking, inventory), it is not.
Solutions for Replication Lag
1. Read-your-own-writes: After a user writes data, route their reads to the leader (not a follower) for a short period. Other users can still read from followers.
2. Causal consistency: If operation B depends on operation A, ensure B is not visible until A is replicated. This prevents confusing scenarios.
Database Sharding
Sharding is splitting data across multiple database machines. Each machine holds a portion (a shard) of the total data.
Without sharding:
[Single Database: 1 TB, all data]
With sharding:
[Shard 1: 250 GB, users A-G]
[Shard 2: 250 GB, users H-N]
[Shard 3: 250 GB, users O-T]
[Shard 4: 250 GB, users U-Z]
Sharding is necessary when a single database cannot handle the data volume or traffic. But it adds significant complexity.
Sharding Strategies
1. Range-Based Sharding
Split data based on ranges of a key value.
Range-based sharding by user ID:
Shard 1: user_id 1 to 1,000,000
Shard 2: user_id 1,000,001 to 2,000,000
Shard 3: user_id 2,000,001 to 3,000,000
Pros: Simple. Range queries are easy (find all users with ID 500-1000). Cons: Uneven distribution. If new users are more active, the last shard gets more traffic (a “hot shard”).
2. Hash-Based Sharding
Apply a hash function to the key and use the result to determine the shard.
Hash-based sharding:
shard_number = hash(user_id) % number_of_shards
hash("user_001") % 4 = 2 --> Shard 2
hash("user_002") % 4 = 0 --> Shard 0
hash("user_003") % 4 = 3 --> Shard 3
Pros: Even distribution of data across shards. No hot shards. Cons: Range queries are impossible (you cannot find all users in a range because they are scattered). Adding or removing shards requires rehashing all data (this is solved by consistent hashing, covered in a future article).
3. Directory-Based Sharding
A lookup table tells you which shard holds each piece of data.
Directory table:
user_001 --> Shard 2
user_002 --> Shard 1
user_003 --> Shard 3
Lookup:
1. Check directory: where is user_002?
2. Directory says: Shard 1
3. Query Shard 1
Pros: Flexible. You can move data between shards by updating the directory. Cons: The directory is a single point of failure. Every query requires an extra lookup.
Sharding Challenges
Sharding is powerful but complex. Here are the main challenges:
1. Cross-shard queries. JOINs across shards are very expensive because data lives on different machines.
Problem: Find all orders for user_123
User data: Shard 1
Order data: Shard 3
You need to query two different machines and combine results.
2. Rebalancing. When you add or remove shards, you must redistribute data. This is complex and resource-intensive.
3. Referential integrity. Foreign keys do not work across shards. You must enforce data integrity in your application code.
4. Operational complexity. Instead of one database to monitor and back up, you have many.
When to Shard
Sharding should be a last resort. Try these first:
- Optimize queries — add indexes, rewrite slow queries
- Vertical scaling — upgrade to a bigger database machine
- Read replicas — distribute read traffic across replicas
- Caching — reduce database load with Redis
If none of these help, then shard.
Database Indexing
An index is a data structure that speeds up data retrieval. Without an index, the database must scan every row to find what you are looking for (a “full table scan”).
Without index:
Find user with email = "alex@example.com"
Database scans all 10,000,000 rows
Time: 5 seconds
With index on email column:
Database uses index to jump directly to the row
Time: 1 millisecond
How indexes work: An index is like the index at the back of a book. Instead of reading every page to find “caching,” you look up “caching” in the index and it tells you the page number.
Most SQL databases use B-tree indexes by default. They work well for equality lookups (WHERE email = ‘x’) and range queries (WHERE age BETWEEN 20 AND 30).
Trade-off: Indexes speed up reads but slow down writes. Every time you insert or update a row, the index must also be updated. Do not add indexes on every column — only on columns you query frequently.
For a deep dive into indexing, see the SQL Tutorial #6: Indexes and Performance.
Choosing the Right Database
Here is a decision framework:
Start with these questions:
1. What does your data look like? (structured, semi-structured, graph)
2. What are your access patterns? (reads vs writes, simple lookups vs complex queries)
3. How much data do you have? (GBs, TBs, PBs)
4. Do you need ACID transactions?
5. How important is horizontal scaling?
Decision tree:
Need ACID + complex queries + relationships?
--> PostgreSQL or MySQL
Need flexible schema + document storage?
--> MongoDB
Need extreme write throughput + horizontal scaling?
--> Cassandra
Need fast key-value lookups + caching?
--> Redis or DynamoDB
Need full-text search?
--> Elasticsearch
Need graph queries (social connections, recommendations)?
--> Neo4j
Need time-series data (metrics, IoT)?
--> TimescaleDB or InfluxDB
Real-World Database Choices
| Company | Primary DB | Other DBs | Why |
|---|---|---|---|
| PostgreSQL | Cassandra, Redis | ACID for users/orders, Cassandra for feed | |
| Uber | MySQL | Cassandra, Redis | MySQL sharded for trips, Cassandra for analytics |
| Netflix | Cassandra | MySQL, Redis | Cassandra for massive scale, MySQL for billing |
| MySQL | Manhattan (custom), Redis | MySQL sharded, custom DB for tweets | |
| Stripe | PostgreSQL | Redis | ACID is critical for payments |
Notice that most large companies use multiple databases. They pick the best tool for each job.
Interview Tips
When discussing databases in a system design interview:
- Always justify your choice. Do not just say “I will use PostgreSQL.” Say “I will use PostgreSQL because this data has relationships and we need ACID transactions for payments.”
- Know the trade-offs of SQL vs NoSQL. Be ready to explain when each is better.
- Mention replication early. “I will use a leader-follower setup with read replicas to handle read traffic.”
- Do not shard unless necessary. “I would start with a single database, add read replicas, and only shard when we exceed the capacity of a single machine.”
- Discuss indexing. “I would add an index on the email column since we query by email frequently.”
- Combine databases. “For this system, I would use PostgreSQL for user data and Redis for caching and sessions.”
Related Articles
- System Design #4: Caching — Redis, Memcached, CDN
- SQL Tutorial #1: SQL Basics — Learn SQL from scratch
- SQL Tutorial #6: Indexes and Performance — Database indexing and optimization
- Go Tutorial #19: Database with sqlx — Working with databases in Go
What’s Next?
In the next article, System Design #6: CAP Theorem and Consistency Patterns, you will learn:
- CAP Theorem: Consistency, Availability, Partition Tolerance
- Why you must choose between consistency and availability
- Strong consistency vs eventual consistency
- Real-world examples of CP and AP systems
- Quorum reads and writes
This is part 5 of the System Design Tutorial series. Follow along to learn system design from scratch.