System Design #14: Design a Chat System

In the previous article, you designed a URL shortener. Now let us tackle a more complex system: a real-time chat application like WhatsApp or Slack. Chat systems are a favorite in system design interviews because they combine real-time communication, message storage, presence detection, and push notifications.

Step 1: Requirements

Functional Requirements
- One-on-one messaging
- Group messaging (up to 500 members)
- Online/offline status (presence)
- Read receipts (message seen)
- Media sharing (images, files)
- Push notifications for offline users
- Message history (persistent storage)

Non-Functional Requirements
- Real-time delivery (< 200ms for online users)
- Messages must never be lost (durability)
- Message ordering must be preserved within a conversation
- The system should support 2 billion users with 50 billion messages per day

Step 2: Back-of-the-Envelope Estimation

Users: 2 billion total, 500 million daily active users (DAU)

Messages: 50 billion messages/day
- 50B / 86,400 = ~580,000 messages/sec
- Peak: ~1.5 million messages/sec

Message size:
- Average message: 200 bytes (text)
- Media messages: ~200 KB (image thumbnail + metadata)

Storage per day:
- Text: 50B * 200 bytes = 10 TB/day
- Media: assume 5% of messages have media, so 2.5B * 200 KB = 500 TB/day (media stored in blob storage)
- Text storage per year: 10 TB * 365 = 3.6 PB

Connections:
- 500M concurrent WebSocket connections
- Each connection uses ~10 KB of memory
- Total memory for connections: 5 TB
- Need thousands of chat servers

Step 3: Communication Protocol

Why WebSocket?

For real-time chat, the server must push messages to clients immediately. HTTP is request-response: the client must ask for new messages. WebSocket provides a persistent, bidirectional connection. ...
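The Step 2 estimates in this excerpt can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and it assumes decimal units (1 TB = 10^12 bytes), which the excerpt does not state explicitly.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimation in the excerpt.
messages_per_day = 50_000_000_000   # 50 billion messages/day
text_bytes = 200                    # average text message size
media_percent = 5                   # share of messages that carry media
media_bytes = 200_000               # ~200 KB per media message
connections = 500_000_000           # concurrent WebSocket connections
bytes_per_connection = 10_000       # ~10 KB of memory per connection

# Assumption: decimal units, so 1 TB = 1e12 bytes and 1 PB = 1000 TB.
avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY
text_tb_per_day = messages_per_day * text_bytes / 1e12
media_messages_per_day = messages_per_day * media_percent // 100
media_tb_per_day = media_messages_per_day * media_bytes / 1e12
text_pb_per_year = text_tb_per_day * 365 / 1000
connection_memory_tb = connections * bytes_per_connection / 1e12

print(f"average load:       {avg_messages_per_sec:,.0f} messages/sec")
print(f"text storage:       {text_tb_per_day:.0f} TB/day")
print(f"media storage:      {media_tb_per_day:.0f} TB/day")
print(f"text per year:      {text_pb_per_year:.2f} PB")
print(f"connection memory:  {connection_memory_tb:.0f} TB")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when an interviewer changes an assumption, for example the media share or the per-connection memory.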

May 27, 2026 · 11 min

System Design #13: Design a URL Shortener

In the previous article, you learned about data partitioning and sharding. Now let us design a real system: a URL shortener like bit.ly. This is one of the most popular system design interview questions. It looks simple but touches many core concepts: hashing, databases, caching, and scaling.

Step 1: Requirements

Always start by clarifying what the system needs to do.

Functional Requirements
- Given a long URL, generate a short URL
- When a user visits the short URL, redirect to the original long URL
- Users can optionally set a custom short code
- Short URLs expire after a configurable time (default: 5 years)

Non-Functional Requirements
- The system should be highly available (redirects must always work)
- Redirection should happen in real time (< 100ms)
- Short URLs should not be guessable (no sequential IDs)

Not in Scope (for this design)
- User accounts and authentication
- URL analytics dashboard (we will discuss basic analytics)
- Paid plans and rate limiting by plan

Step 2: Back-of-the-Envelope Estimation

Traffic Estimation:
- Write (new URLs created): 100 million per day
- Read (redirections): 10 billion per day (100:1 read-to-write ratio)
- Writes per second: 100M / 86,400 = ~1,160 writes/sec
- Reads per second: 10B / 86,400 = ~115,740 reads/sec
- Peak: 2-3x average
- Peak writes: ~3,000/sec
- Peak reads: ~350,000/sec

Storage Estimation:
- Each URL mapping: ~500 bytes (short code + long URL + metadata)
- Per day: 100M * 500 bytes = 50 GB/day
- Per year: 50 GB * 365 = ~18 TB/year
- 5 years (retention): ~90 TB total

Short Code Length:
- Using Base62 (a-z, A-Z, 0-9) = 62 characters
- 6 characters: 62^6 = 56.8 billion combinations
- 7 characters: 62^7 = 3.5 trillion combinations
- At 100M URLs/day for 5 years = 182.5 billion URLs
- 7 characters is enough (3.5 trillion >> 182.5 billion)

Step 3: API Design

REST API:

POST /api/shorten

Request:
    {
      "long_url": "https://example.com/very/long/path?query=value",
      "custom_code": "my-link",      // optional
      "expiration": "2031-01-01"     // optional
    }

Response:
    {
      "short_url": "https://short.ly/Ab3xK9",
      "long_url": "https://example.com/very/long/path?query=value",
      "expires_at": "2031-01-01T00:00:00Z"
    }

GET /{shortCode}

Response:
    HTTP 301 Redirect to the long URL
    Location: https://example.com/very/long/path?query=value

301 vs 302 Redirect

301 (Permanent Redirect): The browser caches the redirect. Subsequent visits go directly to the long URL without hitting your server. Less server load but you lose analytics data.

302 (Temporary Redirect): The browser does NOT cache. Every visit hits your server first. More server load but you can track every click.

Choose 302 if analytics are important. Choose 301 for maximum performance with less tracking. ...

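The short code length reasoning in this excerpt can be checked in code. The sketch below finds the smallest Base62 length that covers 5 years of URLs and draws a random code of that length. The function names are hypothetical, and random generation is only one way to satisfy the "not guessable" requirement; the excerpt does not show which scheme the article uses, and a real system would also need a collision check against stored codes.

```python
import secrets
import string

# Base62 alphabet as described in the excerpt: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def min_code_length(total_urls: int, alphabet_size: int = len(ALPHABET)) -> int:
    """Smallest code length whose keyspace holds at least total_urls codes."""
    length = 1
    while alphabet_size ** length < total_urls:
        length += 1
    return length


def random_short_code(length: int) -> str:
    """Draw a non-sequential code; collision handling is intentionally omitted."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


# 100M new URLs per day, kept for 5 years (figures from the excerpt).
total_urls = 100_000_000 * 365 * 5
code_length = min_code_length(total_urls)

print(f"URLs over 5 years: {total_urls:,}")
print(f"62^6 = {62 ** 6:,}, 62^7 = {62 ** 7:,}")
print(f"minimum code length: {code_length}")
print(f"example code: {random_short_code(code_length)}")
```

Six characters fall short because 62^6 is smaller than the 5-year total, which is why the estimate settles on seven.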
May 27, 2026 · 10 min

System Design #12: Data Partitioning and Sharding

In the previous article, you learned about consistent hashing. Now let us dive deep into data partitioning: how to split your database across multiple machines when one is not enough.

Why Partition Data?

A single database server has limits. It can only store so much data and handle so many queries per second. When you hit those limits, you have two choices:
- Vertical scaling: buy a bigger machine (expensive, has a ceiling)
- Horizontal scaling: split data across multiple machines (partitioning)

Single Database (No Partitioning):

    [All 500M users] --> [One Database Server]
                              |
                              |--> 10TB of data
                              |--> 50,000 queries/sec
                              |--> Single point of failure
                              |--> $$$$ for a huge machine

Partitioned Database:

    [Users A-M] --> [Database Shard 1] (5TB, 25K qps)
    [Users N-Z] --> [Database Shard 2] (5TB, 25K qps)

Each shard handles half the data and half the traffic. If one shard goes down, only half the users are affected.

Horizontal vs Vertical Partitioning

There are two ways to split data. ...
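The A-M / N-Z split from the partitioned database diagram in this excerpt can be expressed as a small routing function. This is a minimal sketch: the shard names and the function name are hypothetical, and it assumes usernames start with an ASCII letter.

```python
def shard_for_user(username: str) -> str:
    """Route a user to a shard by the first letter of the username."""
    first_letter = username[:1].upper()
    if "A" <= first_letter <= "M":
        return "shard-1"  # Users A-M
    if "N" <= first_letter <= "Z":
        return "shard-2"  # Users N-Z
    # Digits, symbols, empty names, and non-ASCII letters are not covered
    # by the two ranges in the diagram, so this sketch rejects them.
    raise ValueError(f"no shard covers username {username!r}")


for name in ["alice", "Mike", "nina", "Zed"]:
    print(name, "->", shard_for_user(name))
```

The routing logic stays trivial with range-based splits, but the two halves only carry equal load if usernames are spread evenly across the alphabet.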

May 27, 2026 · 10 min

System Design #11: Consistent Hashing

In the previous article, you learned about rate limiting algorithms. Now let us solve a fundamental problem in distributed systems: how to distribute data across multiple servers. Consistent hashing is the answer. It is used by Amazon DynamoDB, Apache Cassandra, Akamai CDN, and Discord. Once you understand it, you will see it everywhere.

The Problem: Distributing Data Across Servers

Imagine you have a cache with 4 servers. You need to decide which server stores which data. The simplest approach is modular hashing. ...
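Modular hashing for the 4-server cache in this excerpt can be sketched as follows. The key format and function name are assumptions made for illustration, and MD5 is used only as a stable, evenly spread hash, not for security. The last lines count how many keys land on a different server when a fifth server is added, which is the usual weakness of this approach.

```python
import hashlib


def server_for_key(key: str, num_servers: int) -> int:
    """Pick a server index with hash(key) mod N."""
    # hashlib gives the same value in every process; Python's built-in
    # hash() is randomized per process for strings.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_servers


keys = [f"user:{i}" for i in range(1000)]  # hypothetical cache keys

before = {key: server_for_key(key, 4) for key in keys}  # 4 servers
after = {key: server_for_key(key, 5) for key in keys}   # one server added

moved = sum(1 for key in keys if before[key] != after[key])
print(f"{moved} of {len(keys)} keys map to a different server after scaling")
```

With a uniform hash, most keys change servers when N changes, so a cache built this way loses most of its contents on every resize.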

May 26, 2026 · 11 min

System Design #10: Rate Limiting and Throttling

In the previous article, you learned about microservices and monolith architectures. Now let us talk about protecting your APIs from abuse: rate limiting. Rate limiting controls how many requests a client can make in a given time period. Without it, a single client can overwhelm your servers, intentionally or by accident.

Why Every API Needs Rate Limiting

1. Prevent Abuse

A malicious user can send thousands of requests per second to overload your servers. Rate limiting stops them before they cause damage. ...
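To make "how many requests a client can make in a given time period" concrete, here is a minimal fixed-window counter. The excerpt does not show which algorithms the article covers, so this is one common approach chosen for illustration; the class name and parameters are hypothetical, and old windows are never evicted in this sketch.

```python
import time
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each time window."""

    def __init__(self, limit: int, window_seconds: float, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        self.counts = defaultdict(int)  # (client_id, window index) -> requests

    def allow(self, client_id: str) -> bool:
        window_index = int(self.clock() // self.window_seconds)
        key = (client_id, window_index)
        if self.counts[key] >= self.limit:
            return False  # over the limit: the caller would reject the request
        self.counts[key] += 1
        return True


limiter = FixedWindowLimiter(limit=3, window_seconds=60)
results = [limiter.allow("client-a") for _ in range(5)]
print(results)
```

Each client gets its own counter, so one noisy client hitting its limit does not affect the others.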

May 26, 2026 · 13 min

System Design #9: Microservices vs Monolith

In the previous article, you learned about API design with REST, GraphQL, and gRPC. Now let us talk about how to structure your entire application: as one big service (monolith) or many small services (microservices). This is one of the most debated topics in software engineering. The answer is not always microservices. Many successful companies run monoliths. The right choice depends on your team size, system complexity, and stage of growth. ...

May 26, 2026 · 14 min

System Design #8: API Design — REST, GraphQL, gRPC

In the previous article, you learned about message queues for asynchronous communication. But most communication in a system is synchronous — a client sends a request and waits for a response. That is where APIs come in. An API (Application Programming Interface) is a contract between two systems. It defines how they communicate: what requests you can send, what responses you get back, and what format the data is in. Good API design is critical. A bad API slows down development, confuses users, and is hard to change later. ...

May 25, 2026 · 11 min

System Design #7: Message Queues — Kafka, RabbitMQ, SQS

In the previous article, you learned about the CAP Theorem and consistency patterns. Now let us look at one of the most important building blocks in distributed systems: message queues. Almost every large-scale system uses message queues. They are the backbone of asynchronous communication between services.

What is a Message Queue?

A message queue is a system that stores messages sent by one service (the producer) and delivers them to another service (the consumer). The producer and consumer do not need to be online at the same time. ...
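The producer/consumer decoupling described in this excerpt can be illustrated with Python's in-process `queue.Queue`. This is only a stand-in for a real broker such as Kafka, RabbitMQ, or SQS, and the message shape and function names are made up for the example. The producer finishes before the consumer starts, and no message is lost.

```python
import queue


def produce(message_queue: queue.Queue, order_ids: list) -> None:
    """Producer: store messages and return without waiting for a consumer."""
    for order_id in order_ids:
        message_queue.put({"order_id": order_id})


def consume_all(message_queue: queue.Queue) -> list:
    """Consumer: drain whatever is waiting, in the order it was sent."""
    processed = []
    while True:
        try:
            message = message_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to process
        processed.append(message["order_id"])
        message_queue.task_done()
    return processed


orders = queue.Queue()
produce(orders, [101, 102, 103])   # the consumer is not running yet
print("waiting messages:", orders.qsize())
print("processed:", consume_all(orders))
```

A real message queue adds what this sketch lacks: persistence across restarts, delivery across machines, and acknowledgements that survive consumer crashes.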

May 25, 2026 · 12 min

System Design #6: CAP Theorem and Consistency Patterns

In the previous article, you learned about databases, replication, and sharding. You saw that replicated databases can have “replication lag” where followers temporarily have stale data. This brings us to one of the most important concepts in distributed systems: the CAP Theorem. It explains why you cannot have everything in a distributed system. You must make trade-offs.

What is the CAP Theorem?

The CAP Theorem was introduced by computer scientist Eric Brewer in 2000. It states that a distributed system can only guarantee two out of three properties at the same time: ...

May 25, 2026 · 12 min

System Design #5: Databases — SQL vs NoSQL, Sharding, Replication

In the previous article, you learned how caching speeds up systems. But behind every cache, there is a database. The database is where your data lives permanently. Choosing the right database is one of the most important decisions in system design. It affects performance, scalability, and how easy your system is to maintain.

SQL Databases

SQL (Structured Query Language) databases store data in tables with rows and columns. They follow a fixed schema: you define the structure before inserting data. ...
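The "define the structure before inserting data" point can be seen with Python's built-in `sqlite3` module. The table and column names below are invented for the example, and SQLite is used only because it needs no server. The second insert fails because `nickname` is not part of the declared schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The schema comes first: table name, columns, and constraints.
conn.execute(
    "CREATE TABLE users ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " email TEXT NOT NULL UNIQUE)"
)

# A row that matches the schema is accepted.
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    ("Alice", "alice@example.com"),
)
rows = conn.execute("SELECT id, name, email FROM users").fetchall()
print(rows)

# A row with a column the schema does not define is rejected.
try:
    conn.execute(
        "INSERT INTO users (name, nickname) VALUES (?, ?)",
        ("Bob", "bobby"),
    )
    rejected = False
except sqlite3.OperationalError as error:
    rejected = True
    print("rejected:", error)
```

Adding the `nickname` column would require an explicit schema change such as `ALTER TABLE`, which is the trade-off a fixed schema makes for predictable structure.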

May 24, 2026 · 12 min