In the previous article, you designed a search engine. Now let us wrap up this series with everything you need to ace a system design interview.
This article is a cheat sheet and guide. Bookmark it and review it before your interview.
The 4-Step Framework
Most system design interviews follow the same broad structure. Use this framework to stay organized and cover what the interviewer expects.
4-Step Framework (40 minutes total):
Step 1: Requirements (5 minutes)
- Clarify functional requirements (what the system does)
- Clarify non-functional requirements (scale, latency, availability)
- Define what is IN scope and OUT of scope
Step 2: Estimation (5 minutes)
- Users, traffic, storage
- QPS (queries per second)
- Peak vs average load
Step 3: High-Level Design (15 minutes)
- Architecture diagram
- Core components and how they interact
- API design
- Data model
Step 4: Deep Dive (15 minutes)
- Interviewer picks 2-3 components to go deeper
- Discuss trade-offs, failure modes, scaling
- Show your knowledge of specific technologies
Step 1: Requirements (5 Minutes)
Do not skip this step. Jumping straight to the design is the number one mistake candidates make.
Questions to Ask:
Functional:
"What are the main features?"
"Who are the users?"
"What are the main use cases?"
Non-Functional:
"How many users? DAU?"
"What latency is acceptable?"
"Do we need strong consistency or is eventual OK?"
"What is the read-to-write ratio?"
Scope:
"Should I design authentication?"
"Should I handle international users?"
"What about mobile vs web?"
Example (URL Shortener):
Functional: shorten URL, redirect, custom codes, analytics
Non-functional: 100M URLs/day, < 100ms redirect, 99.99% uptime
Out of scope: user accounts, rate limiting by plan
Step 2: Estimation (5 Minutes)
Back-of-the-envelope estimation shows you can think about scale. You do not need exact numbers — order of magnitude is enough.
Estimation Template:
1. Users and Traffic:
DAU = X million
Requests per day = DAU * actions per user
QPS = requests / 86,400
Peak QPS = QPS * 2 to 5
2. Storage:
Data per record = X bytes
Records per day = Y
Storage per day = X * Y
Storage per year = daily * 365
3. Bandwidth:
Incoming: write QPS * request size
Outgoing: read QPS * response size
4. Memory (for caching):
Cache the hottest ~20% of daily data (80/20 rule: about 20% of the data serves about 80% of the reads)
Cache size = 0.2 * daily data
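The bandwidth and cache lines of the template can be turned into a few lines of arithmetic. The sketch below is only an illustration: the function names and every input value (QPS, request and response sizes) are assumptions made up for the example, not numbers from a real system.

```python
SECONDS_PER_DAY = 86_400


def bandwidth_bytes_per_sec(qps, payload_bytes):
    """Bandwidth = QPS * payload size, in bytes per second."""
    return qps * payload_bytes


def cache_size_bytes(daily_data_bytes, hot_fraction=0.2):
    """Cache the hot fraction of one day's data (80/20 rule)."""
    return daily_data_bytes * hot_fraction


# Hypothetical inputs, chosen only to illustrate the arithmetic.
write_qps = 1_000        # assumed
read_qps = 10_000        # assumed
request_size = 1_000     # bytes per write request, assumed
response_size = 5_000    # bytes per read response, assumed

incoming = bandwidth_bytes_per_sec(write_qps, request_size)
outgoing = bandwidth_bytes_per_sec(read_qps, response_size)
daily_data = incoming * SECONDS_PER_DAY
cache = cache_size_bytes(daily_data)

print(f"incoming: {incoming / 1e6:.0f} MB/s")
print(f"outgoing: {outgoing / 1e6:.0f} MB/s")
print(f"daily data: {daily_data / 1e9:.1f} GB")
print(f"cache size: {cache / 1e9:.1f} GB")
```

In an interview you would do this on the whiteboard, rounding 86,400 to 100K. The point is the order of magnitude, not the exact figure.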
Step 3: High-Level Design (15 Minutes)
Draw the architecture. Start with the client and work your way down.
Design Template:
[Client] --> [Load Balancer] --> [API Servers]
                                       |
                    +------------------+------------------+
                    |                  |                  |
               [Service A]        [Service B]        [Service C]
                    |                  |                  |
                 [Cache]        [Message Queue]       [Storage]
                    |                  |                  |
               [Database]          [Workers]        [Blob Store]
For each component, briefly explain:
- What it does
- Why it is needed
- Technology choice (Redis, Kafka, PostgreSQL, etc.)
Step 4: Deep Dive (15 Minutes)
The interviewer will pick areas to go deeper. Be ready to discuss:
- How a specific component handles failure
- How to scale a bottleneck
- Trade-offs between different approaches
- Specific algorithm details (consistent hashing, fan-out, etc.)
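Consistent hashing is one of the algorithm details an interviewer may ask you to sketch. Below is a minimal teaching version, assuming MD5 as the hash function and 100 virtual nodes per server. The class name `HashRing`, the server names, and the key format are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (a teaching sketch)."""

    def __init__(self, vnodes=100):
        self.vnodes = vnodes
        self._positions = []   # sorted hash positions on the ring
        self._owner = {}       # position -> server name

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, server):
        # Each server is placed on the ring at `vnodes` positions.
        for i in range(self.vnodes):
            pos = self._hash(f"{server}#{i}")
            self._owner[pos] = server
            bisect.insort(self._positions, pos)

    def remove(self, server):
        for i in range(self.vnodes):
            pos = self._hash(f"{server}#{i}")
            del self._owner[pos]
            self._positions.remove(pos)

    def lookup(self, key):
        # First position clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._positions, self._hash(key))
        return self._owner[self._positions[idx % len(self._positions)]]


ring = HashRing()
for server in ("cache-a", "cache-b", "cache-c"):
    ring.add(server)

keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("cache-d")  # scale out by one server
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Every key that moves lands on the new server, and the rest stay where they were. With plain `hash(key) % N`, most keys would change servers when N changes.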
Back-of-the-Envelope Estimation Cheat Sheet
Powers of 2
Powers of 2:
2^10 = ~1 Thousand = 1 KB
2^20 = ~1 Million = 1 MB
2^30 = ~1 Billion = 1 GB
2^40 = ~1 Trillion = 1 TB
2^50 = ~1 Quadrillion = 1 PB
Handy approximations:
1 Million seconds = ~12 days
1 Billion seconds = ~32 years
1 day = 86,400 seconds (~100K for quick math)
1 month = ~2.6 million seconds (30 days)
1 year = ~31.5 million seconds
Latency Numbers
Latency Numbers Every Developer Should Know:
L1 cache reference: 0.5 ns
L2 cache reference: 7 ns
Main memory (RAM) reference: 100 ns
SSD random read: 150 us (150,000 ns)
HDD random read: 10 ms (10,000,000 ns)
Network round trip (same DC): 500 us
Network round trip (US to EU): 150 ms
Summary:
RAM is roughly 1,000x faster than SSD (100 ns vs 150 us).
SSD is roughly 50-100x faster than HDD (150 us vs 10 ms).
Same-DC network is about 300x faster than cross-continent (500 us vs 150 ms).
Always cache in memory when possible.
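The summary ratios are rounded rules of thumb. If you want to see where they come from, the short sketch below divides the table values directly. The dictionary keys are names invented for this example.

```python
# Latencies from the table above, in nanoseconds.
LATENCY_NS = {
    "ram": 100,
    "ssd_random_read": 150_000,
    "hdd_random_read": 10_000_000,
    "same_dc_round_trip": 500_000,
    "us_eu_round_trip": 150_000_000,
}


def ratio(slow, fast):
    """How many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


ssd_vs_ram = ratio("ssd_random_read", "ram")
hdd_vs_ssd = ratio("hdd_random_read", "ssd_random_read")
wan_vs_lan = ratio("us_eu_round_trip", "same_dc_round_trip")

print(f"SSD vs RAM: {ssd_vs_ram:.0f}x")
print(f"HDD vs SSD: {hdd_vs_ssd:.0f}x")
print(f"US-EU vs same DC: {wan_vs_lan:.0f}x")
```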
QPS Estimation
QPS Estimation:
Given: DAU (Daily Active Users) and actions per user
QPS = DAU * actions_per_user / 86,400
Peak QPS = QPS * 3 (typical peak-to-average ratio)
Example:
DAU = 10 million
Actions per user = 20
QPS = 10M * 20 / 86,400 = ~2,300 QPS
Peak QPS = 2,300 * 3 = ~7,000 QPS
Server capacity (rule of thumb):
Single web server: 1,000-10,000 QPS (depends on request complexity)
Single database: 5,000-50,000 QPS (depends on query complexity)
Redis: 100,000+ QPS
Kafka: 100,000+ messages/sec per partition
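The QPS formula and the server capacity rule of thumb combine into a quick server count. This is a sketch under stated assumptions: the function names are made up, and the `headroom` parameter (run each server at 50% of its capacity) is an assumption added for the example, not part of the formulas above.

```python
import math

SECONDS_PER_DAY = 86_400


def average_qps(dau, actions_per_user):
    """QPS = DAU * actions per user / seconds per day."""
    return dau * actions_per_user / SECONDS_PER_DAY


def peak_qps(avg_qps, peak_factor=3.0):
    """Peak QPS = average QPS * peak-to-average ratio."""
    return avg_qps * peak_factor


def servers_needed(peak, per_server_qps, headroom=0.5):
    """Servers required if each one runs at `headroom` of its capacity."""
    return math.ceil(peak / (per_server_qps * headroom))


avg = average_qps(dau=10_000_000, actions_per_user=20)
peak = peak_qps(avg)
# Assume the low end of the web server range: 1,000 QPS per server.
web_servers = servers_needed(peak, per_server_qps=1_000)

print(f"average: {avg:,.0f} QPS")
print(f"peak: {peak:,.0f} QPS")
print(f"web servers: {web_servers}")
```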
Storage Estimation
Storage Estimation:
Per record size (typical):
Tweet-like post: 250 bytes
User profile: 1 KB
Image metadata: 500 bytes
Image file: 200 KB - 2 MB
Video file: 50 MB - 2 GB
Chat message: 200 bytes
Formula:
Daily storage = records_per_day * record_size
Yearly storage = daily * 365
Total storage = yearly * retention_years
Example (chat system):
50 billion messages/day * 200 bytes = 10 TB/day
10 TB * 365 = ~3.65 PB/year
5-year retention: ~18 PB
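The storage formula is easy to check in code. The sketch below reruns the chat system example. The function name and the decimal units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) are choices made for this illustration.

```python
def storage_estimate(records_per_day, record_size_bytes, retention_years):
    """Return (daily, yearly, total) storage in bytes."""
    daily = records_per_day * record_size_bytes
    yearly = daily * 365
    total = yearly * retention_years
    return daily, yearly, total


TB = 10**12
PB = 10**15

# Chat system example: 50 billion messages/day at 200 bytes each.
daily, yearly, total = storage_estimate(
    records_per_day=50_000_000_000,
    record_size_bytes=200,
    retention_years=5,
)

print(f"daily: {daily / TB:.1f} TB")
print(f"yearly: {yearly / PB:.2f} PB")
print(f"5 years: {total / PB:.2f} PB")
```

These figures are raw data only. Replication and indexes multiply them, so mention that as a follow-up if the interviewer asks.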
Technology Cheat Sheet
Database Selection Guide
When to use what:
PostgreSQL / MySQL:
- Structured data with relationships
- Need ACID transactions
- Complex queries and joins
- < 10 TB data, < 50K QPS
- Example: user accounts, orders, financial data
MongoDB:
- Semi-structured data (varying schemas)
- Document-oriented access pattern
- Need flexible schema
- Example: product catalogs, content management
Cassandra / ScyllaDB:
- Write-heavy workloads (100K+ writes/sec)
- Time-series data
- Need linear horizontal scaling
- Can tolerate eventual consistency
- Example: chat messages, IoT sensor data, activity logs
Redis:
- Cache layer (sub-millisecond latency)
- Session storage
- Rate limiting counters
- Leaderboards (sorted sets)
- Pub/sub messaging
Elasticsearch:
- Full-text search
- Log analytics
- Autocomplete
- Example: product search, log monitoring
ClickHouse / TimescaleDB:
- Analytics and aggregation queries
- Time-series data analysis
- Example: metrics, event analytics, dashboards
Message Queue Selection
When to use what:
Apache Kafka:
- High throughput (millions of messages/sec)
- Event streaming (retain events for replay)
- Fan-out to multiple consumers
- Example: activity feeds, event sourcing, log aggregation
RabbitMQ:
- Task queues (distribute work to workers)
- Complex routing rules
- Typically lower per-message latency than Kafka at moderate throughput
- Example: email sending, image processing tasks
Amazon SQS:
- Simple, managed queue (no infrastructure to manage)
- Standard (at-least-once delivery, best-effort ordering) or FIFO (ordered, exactly-once processing)
- Example: decoupling microservices, async processing
Caching Strategy
When to cache what:
Cache-Aside (Lazy Loading):
Read from cache. On miss, read from DB, write to cache.
Best for: read-heavy workloads
Risk: stale data until TTL expires
Write-Through:
Write to cache AND database in the same operation. The write completes only after both succeed.
Best for: data that must be fresh in cache
Risk: higher write latency
Write-Behind:
Write to cache immediately, write to DB asynchronously.
Best for: write-heavy workloads
Risk: data loss if cache crashes before DB write
CDN:
Cache static content (images, videos, JS, CSS) at the edge.
Best for: global user base, media-heavy applications
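Cache-aside and write-through are easiest to compare side by side. The sketch below uses plain dictionaries to stand in for the cache and the database. The class name `CachedStore`, the method names, and the 60-second TTL are assumptions for illustration, not a real client API.

```python
import time


class CachedStore:
    """Cache-aside reads and write-through writes over a dict 'database'."""

    def __init__(self, db, ttl_seconds=60.0):
        self.db = db
        self.ttl = ttl_seconds
        self.cache = {}  # key -> (value, expires_at)

    def get(self, key):
        """Cache-aside: read the cache first, fall back to the DB on a miss."""
        entry = self.cache.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]                       # cache hit
        value = self.db.get(key)                  # miss: read from DB
        if value is not None:
            self.cache[key] = (value, time.monotonic() + self.ttl)
        return value

    def put_write_through(self, key, value):
        """Write-through: update the DB and the cache in the same call."""
        self.db[key] = value
        self.cache[key] = (value, time.monotonic() + self.ttl)


db = {"user:1": "Ada"}
store = CachedStore(db)

first = store.get("user:1")        # miss, loads from DB and fills the cache
db["user:1"] = "Grace"             # a write that bypasses the cache
stale = store.get("user:1")        # still the old value until the TTL expires
store.put_write_through("user:1", "Grace")
fresh = store.get("user:1")        # the cache was updated with the write

print(first, stale, fresh)
```

The middle read shows the stale-data risk of cache-aside. Write-through avoids it, but each write now pays for two updates.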
Design Patterns Cheat Sheet
Pattern: When to Use
Load Balancer:
Multiple servers handling the same traffic.
Algorithms: round robin, least connections, consistent hashing.
Database Replication:
Read-heavy workload. Leader handles writes, followers handle reads.
Database Sharding:
Single database cannot handle the data volume or traffic.
Shard by user_id, entity_id, or geography.
Caching:
Read-heavy workload with frequently accessed data.
Use Redis or Memcached.
Message Queue:
Async processing needed. Decouple producer and consumer.
Use Kafka, RabbitMQ, or SQS.
CDN:
Serving static content to a global audience.
Use Cloudflare, AWS CloudFront, or Akamai.
Fan-Out on Write:
Pre-compute results for fast reads (news feed, notifications).
Trade-off: higher write cost, faster reads.
Fan-Out on Read:
Compute results at read time.
Trade-off: lower write cost, slower reads.
Consistent Hashing:
Distributing data across servers with minimal redistribution
when servers are added or removed.
Rate Limiting:
Protecting APIs from abuse. Token bucket or sliding window.
Circuit Breaker:
Preventing cascade failures in microservices.
Stop calling a failing service, try again later.
Saga Pattern:
Distributed transactions across multiple services.
Each step has a compensating action for rollback.
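Of the patterns above, the token bucket used for rate limiting is small enough to write out in an interview. Here is a minimal single-process sketch. The class name, the parameters, and the injectable `clock` argument are assumptions for illustration. A production limiter would usually keep its counters in a shared store such as Redis.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = capacity      # start with a full bucket
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend tokens if the request fits, else False."""
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# 5 requests per second on average, bursts of up to 10.
bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(12)]
print(f"allowed {sum(results)} of {len(results)} back-to-back requests")
```

The bucket permits short bursts while holding the long-run average to `rate`. That is the main reason to choose it over a fixed window counter.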
Non-Functional Requirements Checklist
Use this list to make sure you address key concerns in your design.
Non-Functional Requirements:
1. Scalability
"How does the system handle 10x the current traffic?"
--> Horizontal scaling, sharding, caching, CDN
2. Availability
"What happens when a server goes down?"
--> Redundancy, failover, multiple data centers
Target: 99.99% = ~52 min downtime/year
3. Consistency
"Do all users see the same data at the same time?"
--> Strong consistency (banking) vs eventual (social media)
--> CAP theorem trade-offs
4. Latency
"How fast does the system respond?"
--> Caching, CDN, database indexing, async processing
Target: p99 < 200ms for user-facing APIs
5. Durability
"What if the database crashes? Is data lost?"
--> Replication (3 copies), backups, WAL (write-ahead log)
Target: 99.999999999% (11 nines) for critical data
6. Security
"How do we protect against attacks?"
--> Authentication, authorization, encryption, rate limiting
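The availability target converts to downtime with one multiplication, and the effect of redundancy can be estimated the same way. The sketch below assumes a 365-day year. The redundancy formula also assumes replica failures are independent, which real systems only approximate. Function names are made up for the example.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year for a given availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def redundant_availability(availability_percent, replicas):
    """Availability of N replicas where any single one can serve traffic.

    Assumes failures are independent.
    """
    failure = 1 - availability_percent / 100
    return 100 * (1 - failure ** replicas)


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:,.1f} min/year")

two_replicas = redundant_availability(99.0, replicas=2)
print(f"two 99% replicas -> {two_replicas:.2f}%")
```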
Common Mistakes That Fail Candidates
Mistake 1: Jumping to the solution
Bad: "I will use Kafka and Cassandra and Redis and..."
Good: "Let me first understand the requirements. What scale are we targeting?"
Mistake 2: Not drawing a diagram
Bad: Talking without visualizing
Good: Draw boxes and arrows. Label each component.
Mistake 3: One-size-fits-all
Bad: "I always use MongoDB" or "I always use microservices"
Good: "For this use case, PostgreSQL fits because..."
Mistake 4: Ignoring trade-offs
Bad: "We should use strong consistency"
Good: "Strong consistency adds latency. For this use case, eventual
consistency is acceptable because users can tolerate a 1-second delay."
Mistake 5: Over-engineering
Bad: Designing for Google scale when the system has 10K users
Good: "At this scale, a single PostgreSQL with read replicas is enough.
I would shard only when the data or write load outgrows a single node."
Mistake 6: Not mentioning failure modes
Bad: Assuming everything works perfectly
Good: "If Redis goes down, we fall back to the database. Latency
increases but the system stays available."
Mistake 7: Forgetting about data
Bad: Designing services without thinking about the data model
Good: "The main entity is a message with fields: id, sender, content,
timestamp. I will partition by conversation_id."
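The failure-mode answer in Mistake 6 (fall back to the database when the cache is down) can be sketched in a few lines. Everything here is hypothetical: `FlakyCache` simulates an outage with a flag, and the dictionary stands in for the database.

```python
class CacheUnavailable(Exception):
    """Raised when the cache cannot be reached."""


class FlakyCache:
    """In-memory cache with a switch to simulate an outage."""

    def __init__(self):
        self.data = {}
        self.up = True

    def get(self, key):
        if not self.up:
            raise CacheUnavailable("cache is down")
        return self.data.get(key)

    def set(self, key, value):
        if not self.up:
            raise CacheUnavailable("cache is down")
        self.data[key] = value


def read_with_fallback(key, cache, db):
    """Return (value, source). Serves from the DB if the cache is down."""
    try:
        cached = cache.get(key)
    except CacheUnavailable:
        return db.get(key), "db-fallback"   # slower, but still available
    if cached is not None:
        return cached, "cache"
    value = db.get(key)
    if value is not None:
        try:
            cache.set(key, value)
        except CacheUnavailable:
            pass  # cache failed between get and set; still serve the value
    return value, "db"


db = {"msg:1": "hello"}
cache = FlakyCache()

print(read_with_fallback("msg:1", cache, db))  # first read fills the cache
print(read_with_fallback("msg:1", cache, db))  # second read hits the cache
cache.up = False                               # simulate the outage
print(read_with_fallback("msg:1", cache, db))  # degrades to the database
```

When you describe this in an interview, also mention the cost: during the outage every read hits the database, so it must have headroom for that load.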
What Interviewers Actually Look For
What gets you hired:
1. Structured approach
You follow a clear framework. You do not ramble.
2. Trade-off analysis
You explain WHY you chose a technology, not just WHAT.
"I chose Cassandra over PostgreSQL because our write load
is 500K/sec and we need linear horizontal scaling."
3. Scale awareness
You think about what happens at 10x or 100x current load.
You know when things break and how to fix them.
4. Communication
You explain clearly. You check in with the interviewer.
"Does this make sense? Should I go deeper into any part?"
5. Depth on demand
When the interviewer asks "how would you handle X?",
you can go 2-3 levels deeper into the details.
What does NOT matter:
- Memorizing exact numbers (order of magnitude is enough)
- Knowing every technology (understanding patterns matters more)
- Having the "perfect" design (trade-off analysis > correctness)
How to Practice
Practice Plan (2 weeks):
Week 1: Foundations
Day 1: Review all concepts (this cheat sheet)
Day 2: Design a URL Shortener
Day 3: Design a Chat System
Day 4: Design a News Feed
Day 5: Design a Video Streaming Service
Day 6: Design a Notification System
Day 7: Rest / review weak areas
Week 2: Practice
Day 8: Design a Ride-Sharing App (like Uber)
Day 9: Design a Payment System (like Stripe)
Day 10: Design a Rate Limiter
Day 11: Design a Search Autocomplete
Day 12: Design a Metrics/Monitoring System
Day 13: Mock interview with a friend
Day 14: Review and polish
For each design:
- Set a 40-minute timer
- Follow the 4-step framework
- Draw the architecture on paper
- Write down the trade-offs you considered
- Compare with published solutions afterward
Top 15 System Design Problems by Interview Frequency
Ranked by how often they appear in interviews:
1. Design a URL Shortener (bit.ly) -- Very Common
2. Design a Chat System (WhatsApp) -- Very Common
3. Design a News Feed (Twitter) -- Very Common
4. Design a Web Crawler -- Common
5. Design a Notification System -- Common
6. Design a Rate Limiter -- Common
7. Design a Key-Value Store -- Common
8. Design a Search Autocomplete -- Common
9. Design a Video Platform (YouTube) -- Common
10. Design a File Storage (Google Drive) -- Common
11. Design a Ride-Sharing App (Uber) -- Moderate
12. Design a Payment System (Stripe) -- Moderate
13. Design a Metrics/Monitoring System -- Moderate
14. Design a Ticket Booking System -- Moderate
15. Design a Social Graph (LinkedIn) -- Moderate
If you can design the top 10, you are well prepared for most system design interviews.
The patterns repeat across different systems.
Quick Reference: Design Any System
When you get a system design question you have never seen:
1. Requirements (5 min)
"What does the system do? How many users? What latency?"
2. Estimation (5 min)
"X million users * Y actions = Z QPS. Z * data_size = storage."
3. API Design (2 min)
"Main endpoints: POST /create, GET /read, PUT /update"
4. Data Model (3 min)
"Main entities: User, Item, Action. Relationships between them."
5. High-Level Design (10 min)
"Client -> LB -> API -> Service -> Cache -> DB"
Draw it. Label each box.
6. Deep Dive (15 min)
Pick the hardest parts. Discuss:
- How to scale the bottleneck
- How to handle failures
- Trade-offs you made and why
Related Articles
This article is the final part of the System Design Tutorial series. Here are all the articles:
Foundations:
Building Blocks:
- #7: Message Queues
- #8: API Design
- #9: Microservices vs Monolith
- #10: Rate Limiting
- #11: Consistent Hashing
- #12: Data Partitioning
Real System Designs:
- #13: Design a URL Shortener
- #14: Design a Chat System
- #15: Design a News Feed
- #16: Design a Video Streaming Service
- #17: Design a File Storage System
- #18: Design a Notification System
Advanced:
You now have everything you need to design scalable systems and ace system design interviews. Good luck!