In the previous article, you designed a URL shortener. Now let us tackle a more complex system: a real-time chat application like WhatsApp or Slack.
Chat systems are a favorite in system design interviews because they combine real-time communication, message storage, presence detection, and push notifications.
Step 1: Requirements
Functional Requirements
- One-on-one messaging
- Group messaging (up to 500 members)
- Online/offline status (presence)
- Read receipts (message seen)
- Media sharing (images, files)
- Push notifications for offline users
- Message history (persistent storage)
Non-Functional Requirements
- Real-time delivery (< 200ms for online users)
- Messages must never be lost (durability)
- Message ordering must be preserved within a conversation
- The system should support 2 billion users with 50 billion messages per day
Step 2: Back-of-the-Envelope Estimation
Users: 2 billion total, 500 million daily active users (DAU)
Messages:
50 billion messages/day
50B / 86,400 = ~580,000 messages/sec
Peak: ~1.5 million messages/sec (assuming peak traffic is ~2.5x the average)
Message size:
Average message: 200 bytes (text)
Media messages: ~200 KB (image thumbnail + metadata)
Storage per day:
Text: 50B * 200 bytes = 10 TB/day
Media: assume 5% of messages have media
2.5B * 200 KB = 500 TB/day (media stored in blob storage)
Text storage per year: 10 TB * 365 = ~3.65 PB
Connections:
500M concurrent WebSocket connections (worst case: every daily active user online at once)
Each connection uses ~10 KB of memory
Total memory for connections: 5 TB
Need thousands of chat servers
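The arithmetic above is easy to recheck in a few lines. The sketch below recomputes each figure from the stated inputs. The 100,000 connections per server figure is an assumption added here for illustration, not a number from this article; change it and the server count changes with it.

```python
# Back-of-the-envelope numbers from Step 2, recomputed with integer inputs.
SECONDS_PER_DAY = 86_400
TB = 10**12
PB = 10**15

messages_per_day = 50 * 10**9
text_bytes = 200                                # average text message
media_bytes = 200 * 10**3                       # ~200 KB per media message
media_messages = messages_per_day * 5 // 100    # 5% of messages carry media

concurrent_connections = 500 * 10**6
bytes_per_connection = 10 * 10**3               # ~10 KB of memory each

# ASSUMPTION (not from the article): one chat server holds ~100K connections.
connections_per_server = 100_000

avg_messages_per_sec = messages_per_day / SECONDS_PER_DAY
text_tb_per_day = messages_per_day * text_bytes / TB
media_tb_per_day = media_messages * media_bytes / TB
text_pb_per_year = messages_per_day * text_bytes * 365 / PB
connection_memory_tb = concurrent_connections * bytes_per_connection / TB
chat_servers = concurrent_connections // connections_per_server

print(f"average messages/sec : {avg_messages_per_sec:,.0f}")
print(f"text storage per day : {text_tb_per_day:.0f} TB")
print(f"media storage per day: {media_tb_per_day:.0f} TB")
print(f"text storage per year: {text_pb_per_year:.2f} PB")
print(f"connection memory    : {connection_memory_tb:.0f} TB")
print(f"chat servers needed  : {chat_servers:,}")
```

In an interview, round aggressively. The point is the order of magnitude, not the last digit.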
Step 3: Communication Protocol
Why WebSocket?
For real-time chat, the server must push messages to clients immediately. Plain HTTP is request-response: the server can only answer when the client asks, so it cannot deliver a new message on its own. WebSocket provides a persistent, bidirectional connection.
HTTP Polling vs WebSocket:
HTTP Polling:
Client: "Any new messages?" --> Server: "No" (every 1 sec)
Client: "Any new messages?" --> Server: "No"
Client: "Any new messages?" --> Server: "Yes! Here is a message"
Problem: Wastes bandwidth. 99% of polls return nothing.
500M users polling every second = 500M requests/sec, roughly 860x the ~580K messages/sec actually being sent.
Long Polling:
Client: "Any new messages?" --> Server: ... waits ... "Yes!"
Client: "Any new messages?" --> Server: ... waits ...
Better, but still opens/closes connections frequently.
WebSocket:
Client <---> Server (persistent connection)
Server pushes messages instantly when they arrive.
No wasted requests. No per-message connection setup.
Used by: Slack, Discord, and the web clients of WhatsApp and Telegram
Connection Flow
WebSocket Connection Flow:
1. Client opens HTTPS connection to a chat server
2. HTTP Upgrade request: "I want to switch to WebSocket"
3. Server accepts: connection is now WebSocket
4. Both sides can send messages at any time
5. Connection stays open until either side closes it or the network drops
[Client] <===WebSocket===> [Chat Server]
    |                            |
    |-- send message ----------->|
    |<-------- receive message --|
    |<-------- receive message --|
    |-- send message ----------->|
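Steps 2 and 3 of the connection flow are a concrete exchange of HTTP headers. The client sends a random Sec-WebSocket-Key, and the server proves it understood the upgrade by returning a Sec-WebSocket-Accept value derived from it. Here is a minimal sketch of the server side of that handshake using only the standard library. The function names are made up for this example. A real chat server would use a WebSocket library rather than build responses by hand.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept_key(client_key: str) -> str:
    """Derive Sec-WebSocket-Accept from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_upgrade_response(client_key: str) -> str:
    """Build the '101 Switching Protocols' response for an upgrade request."""
    return (
        "HTTP/1.1 101 Switching Protocols\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Accept: {websocket_accept_key(client_key)}\r\n"
        "\r\n"
    )


# Sample key taken from the example handshake in RFC 6455.
sample_key = "dGhlIHNhbXBsZSBub25jZQ=="
print(build_upgrade_response(sample_key))
```

After this response, the same TCP connection carries WebSocket frames in both directions.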
Step 4: High-Level Architecture
Architecture Overview:
[Mobile/Web Client]
        |
        | WebSocket
        v
[Load Balancer] (Layer 4 - TCP level)
        |
[Chat Servers] (maintain WebSocket connections)
        |
   +----+----+----+
   |    |    |    |
   v    v    v    v
[Message Queue (Kafka)]
        |
[Message Storage Service]
        |
[Database (Cassandra)]
Separate services:
[Presence Service] -- tracks online/offline
[Push Notification Service] -- notifies offline users
[Media Storage (S3)] -- stores images, files
[Group Service] -- manages group membership
Step 5: One-on-One Messaging
Message Flow
Alex sends a message to Sam:
1. Alex's client sends the message via WebSocket to Chat Server A
2. Chat Server A:
a. Generates a unique message ID (Snowflake-like)
b. Stores the message in the database
c. Looks up which chat server Sam is connected to
3. If Sam is ONLINE (connected to Chat Server B):
Chat Server A --> Message Queue --> Chat Server B --> Sam's client
Sam receives the message in real time.
4. If Sam is OFFLINE:
Chat Server A --> Push Notification Service --> APNs/FCM --> Sam's phone
The message is stored. Sam gets it when they come online.
[Alex] --ws--> [Chat Server A] --queue--> [Chat Server B] --ws--> [Sam]
                      |                          |
               [Store message]           [Deliver message]
                      |
                 [Cassandra]
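The branching in the message flow can be sketched with plain in-memory structures. This is an illustration only: the ChatRouter class and its lists are hypothetical stand-ins for the session lookup, the database, the message queue, and the push service.

```python
class ChatRouter:
    """In-memory stand-in for the routing decision made by Chat Server A."""

    def __init__(self):
        self.sessions = {}        # user_id -> chat server holding the WebSocket
        self.stored = []          # stands in for the message database
        self.queue = []           # stands in for the message queue
        self.push_requests = []   # stands in for the push notification service

    def connect(self, user_id, server):
        self.sessions[user_id] = server

    def disconnect(self, user_id):
        self.sessions.pop(user_id, None)

    def send(self, message_id, sender, recipient, content):
        message = {
            "message_id": message_id,
            "sender": sender,
            "recipient": recipient,
            "content": content,
        }
        # Step 2b: persist first, so the message survives a failed delivery.
        self.stored.append(message)
        # Step 2c: find the chat server the recipient is connected to.
        server = self.sessions.get(recipient)
        if server is not None:
            # Step 3: recipient is online, hand off to their chat server.
            self.queue.append((server, message))
            return "queued"
        # Step 4: recipient is offline, request a push notification.
        self.push_requests.append((recipient, message_id))
        return "push"


router = ChatRouter()
router.connect("sam", "chat-server-b")
online_result = router.send(1, "alex", "sam", "Hello!")
router.disconnect("sam")
offline_result = router.send(2, "alex", "sam", "Still there?")
print(online_result, offline_result)
```

Storing before delivering is what backs the "messages must never be lost" requirement: both paths leave a copy in storage.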
Message ID Generation
Messages must be ordered within a conversation. You cannot use a global auto-increment ID because it would be a bottleneck at 580K messages/sec.
Message ID (Snowflake-like):
| 41 bits: timestamp | 10 bits: server_id | 13 bits: sequence |
timestamp: milliseconds since epoch (~69 years)
server_id: which chat server generated the ID (1024 servers)
sequence: counter within the same millisecond (8192 per ms)
Total: 64-bit integer
Properties:
- Unique across all servers (no coordination needed)
- Roughly time-ordered (timestamp is the most significant bits)
- Can generate 8192 * 1024 = 8,388,608 (~8.4 million) IDs per millisecond
Messages are sorted by message_id within a conversation.
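Since the ID starts with a timestamp, sorting by ID approximates sorting by time. IDs from different servers in the same millisecond fall back to server_id order, and clock skew between servers can reorder near-simultaneous messages.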
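Here is one way the 41/10/13-bit layout could be packed and unpacked. Treat it as a sketch: the custom epoch (2020-01-01 UTC) and the choice to raise an error when the clock moves backwards are assumptions made for this example, and the class and function names are invented here.

```python
import threading
import time

SERVER_BITS = 10
SEQUENCE_BITS = 13
MAX_SERVER_ID = (1 << SERVER_BITS) - 1      # 1023
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1     # 8191

# ASSUMPTION: a custom epoch of 2020-01-01 00:00:00 UTC, in milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000


def _now_ms():
    return int(time.time() * 1000)


class MessageIdGenerator:
    def __init__(self, server_id, clock_ms=_now_ms):
        if not 0 <= server_id <= MAX_SERVER_ID:
            raise ValueError("server_id must fit in 10 bits")
        self._server_id = server_id
        self._clock_ms = clock_ms
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 8192 IDs already issued this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            # Timestamp in the high bits, then server_id, then sequence.
            return (
                (timestamp << (SERVER_BITS + SEQUENCE_BITS))
                | (self._server_id << SEQUENCE_BITS)
                | self._sequence
            )


def decode_message_id(message_id):
    """Split an ID back into (ms_since_custom_epoch, server_id, sequence)."""
    sequence = message_id & MAX_SEQUENCE
    server_id = (message_id >> SEQUENCE_BITS) & MAX_SERVER_ID
    timestamp = message_id >> (SERVER_BITS + SEQUENCE_BITS)
    return timestamp, server_id, sequence


generator = MessageIdGenerator(server_id=7)
first_id = generator.next_id()
second_id = generator.next_id()
print(decode_message_id(first_id), decode_message_id(second_id))
```

Note that 41 + 10 + 13 uses all 64 bits, so the ID must be stored as an unsigned 64-bit value. Many Snowflake variants keep the top bit free and use a 12-bit sequence instead.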
Step 6: Group Messaging
Group chat adds complexity because a single message must be delivered to many recipients.
Fan-Out on Write (Small Groups)
For small groups (up to 500 members), copy the message to each member’s inbox when it is sent.
Fan-Out on Write:
Alex sends "Hello!" to a group with 100 members.
1. Chat Server receives the message
2. Look up group membership: [Alex, Sam, Jordan, ... 97 more]
3. For each member:
a. Write a copy to their message queue/inbox
b. If online: deliver via WebSocket
c. If offline: send push notification
[Alex sends] --> [100 copies written to 100 inboxes]
Pros:
- Reading messages is fast (just read from your inbox)
- Simple delivery logic
Cons:
- Write amplification (1 message becomes 100 writes)
- Expensive for large groups
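The per-member loop is short enough to sketch directly. The function below is illustrative: the inboxes dictionary stands in for per-user storage, and skipping the sender for delivery is an assumption of this example.

```python
from collections import defaultdict


def fan_out_on_write(message, sender, members, online, inboxes):
    """Write one inbox copy per member, then split recipients by delivery path.

    Returns (deliver_now, push_later): members to reach over WebSocket and
    members who need a push notification.
    """
    deliver_now, push_later = [], []
    for member in members:
        inboxes[member].append(message)   # one write per member
        if member == sender:
            continue                      # the sender already has the message
        if member in online:
            deliver_now.append(member)
        else:
            push_later.append(member)
    return deliver_now, push_later


inboxes = defaultdict(list)
members = ["alex", "sam", "jordan"]
deliver_now, push_later = fan_out_on_write(
    "Hello!", "alex", members, online={"alex", "sam"}, inboxes=inboxes
)
total_writes = sum(len(copies) for copies in inboxes.values())
print(deliver_now, push_later, total_writes)
```

The write count grows linearly with group size, which is the write amplification listed under Cons.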
Fan-Out on Read (Large Groups)
For large groups or broadcast channels, do NOT copy the message. Store it once and let each member read it.
Fan-Out on Read:
A news channel posts to 1 million subscribers.
1. Store the message ONCE in the group's message log
2. When a subscriber opens the app:
a. Fetch new messages from the group's log
b. Filter by "messages after my last read timestamp"
[Channel posts] --> [1 message stored]
[User opens app] --> [Reads from group log]
Pros:
- 1 write instead of 1 million writes
- Efficient for large groups
Cons:
- Reading is slower (must fetch and merge from multiple groups)
- Real-time delivery is harder
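Fan-out on read boils down to one shared log plus a per-reader cursor. The sketch below uses the last-read message ID as the cursor instead of a timestamp, which works because IDs are time-ordered. That substitution and the GroupLog class are assumptions of this example.

```python
import bisect


class GroupLog:
    """A single append-only log shared by every subscriber of a group."""

    def __init__(self):
        self._ids = []       # message IDs, kept in increasing order
        self._bodies = []    # message bodies, parallel to _ids

    def post(self, message_id, body):
        # The message is stored once, no matter how many subscribers exist.
        if self._ids and message_id <= self._ids[-1]:
            raise ValueError("message IDs must be strictly increasing")
        self._ids.append(message_id)
        self._bodies.append(body)

    def read_after(self, last_read_id):
        """Return (message_id, body) pairs newer than the reader's cursor."""
        start = bisect.bisect_right(self._ids, last_read_id)
        return list(zip(self._ids[start:], self._bodies[start:]))


log = GroupLog()
log.post(101, "Breaking news")
log.post(102, "Update")
log.post(103, "Correction")

unread = log.read_after(101)   # this subscriber has already seen message 101
print(unread)
```

Each subscriber only stores a cursor, so a post to a million subscribers is still a single write.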
Hybrid Approach (WhatsApp/Slack Style)
Hybrid Fan-Out:
Groups up to 500 members: fan-out on write
--> Real-time delivery to all members
--> Write amplification is manageable (500 writes max)
Groups/Channels > 500 members: fan-out on read
--> Store message once
--> Members pull when they open the app
--> Send push notifications to active members only
Large chat products use variations of this split, and the exact threshold is a tuning choice.
WhatsApp caps groups at 1024 members, which keeps write fan-out bounded.
Slack channels can have thousands of members, which favors fan-out on read for large channels.
Step 7: Message Storage
Database Choice
Requirements for message storage:
- Write-heavy: 580K writes/sec
- Key-value access pattern: get messages by conversation_id + time range
- Must scale to petabytes
- Ordered by time within a conversation
Best fit: Apache Cassandra or ScyllaDB
- Designed for high write throughput
- Partition by conversation_id
- Clustering key: message_id (time-ordered)
- Linear horizontal scaling
- Used by: Discord (Cassandra, later migrated to ScyllaDB). Facebook Messenger used HBase, a comparable wide-column store.
Schema
Messages Table (Cassandra):
Partition key: conversation_id
Clustering key: message_id (ascending)
| conversation_id | message_id | sender_id | content | type | created_at |
|-----------------|------------|-----------|-----------|-------|---------------------|
| conv_123 | 100001 | user_alex | "Hello!" | text | 2026-05-29 10:00:01 |
| conv_123 | 100002 | user_sam | "Hi!" | text | 2026-05-29 10:00:03 |
| conv_123 | 100003 | user_alex | photo.jpg | image | 2026-05-29 10:00:15 |
Query: "Get last 50 messages in conversation conv_123"
--> SELECT * FROM messages
WHERE conversation_id = 'conv_123'
AND time_bucket = '2026-W22'
ORDER BY message_id DESC
LIMIT 50
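With the time_bucket filter, this query hits exactly ONE partition. Very fast.
If the latest bucket holds fewer than 50 messages, the client repeats the query against the previous bucket.
(For the basic schema without time buckets, omit the time_bucket filter.)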
Partition Sizing
Partition Size Problem:
A very active conversation could have millions of messages.
Cassandra partitions should stay under 100 MB for best performance.
Solution: Bucket by time (similar to Discord's approach).
Partition key: (conversation_id, time_bucket)
time_bucket = floor(created_at / 7 days), or an equivalent label such as the ISO week ('2026-W22')
Each partition holds ~1 week of messages.
Most queries ("show recent messages") hit only the latest partition.
Old messages: hit older partitions (acceptable latency for scrolling back)
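To make the bucketing concrete, here is a small sketch that derives the partition key for a message. It uses the ISO week label so that it matches the '2026-W22' value in the earlier query. The function names are invented for this example.

```python
from datetime import datetime, timezone


def time_bucket(created_at: datetime) -> str:
    """Label a timestamp with its ISO year and week, e.g. '2026-W22'."""
    iso = created_at.isocalendar()      # (ISO year, ISO week, ISO weekday)
    return f"{iso[0]}-W{iso[1]:02d}"


def partition_key(conversation_id: str, created_at: datetime):
    """Composite partition key: one partition per conversation per week."""
    return (conversation_id, time_bucket(created_at))


sent_at = datetime(2026, 5, 29, 10, 0, 1, tzinfo=timezone.utc)
week_earlier = datetime(2026, 5, 22, 10, 0, 1, tzinfo=timezone.utc)

print(partition_key("conv_123", sent_at))
print(partition_key("conv_123", week_earlier))
```

Two messages in the same conversation land in different partitions once they are a week apart, which is what keeps any single partition from growing without bound.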
Step 8: Online Presence
How does the system know if a user is online or offline?
Presence Service:
Each client sends a heartbeat every 30 seconds.
[Client] --heartbeat--> [Presence Service] --update--> [Redis]
Redis key: "presence:user_alex"
Value: { status: "online", last_seen: 1748520000, server: "chat-07" }
TTL: 60 seconds
If no heartbeat for 60 seconds:
The key expires automatically.
User is considered offline.
When Alex opens a conversation with Sam:
Check Redis for "presence:user_sam"
If key exists: show "online"
If key missing: show "offline" (showing "last seen X minutes ago" requires storing last_seen somewhere that does not expire with the key)
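The heartbeat-plus-TTL idea can be tried without a Redis server. The sketch below imitates key expiry with an in-memory dictionary and an injectable clock. PresenceStore is a hypothetical stand-in for this example, not a Redis client.

```python
import time

HEARTBEAT_TTL_SECONDS = 60


class PresenceStore:
    """In-memory imitation of a key-value store with per-key expiry."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}   # user_id -> (expires_at, chat server name)

    def heartbeat(self, user_id, server):
        # Every heartbeat rewrites the entry and resets its expiry.
        expires_at = self._clock() + HEARTBEAT_TTL_SECONDS
        self._entries[user_id] = (expires_at, server)

    def is_online(self, user_id):
        entry = self._entries.get(user_id)
        if entry is None:
            return False
        expires_at, _server = entry
        if self._clock() >= expires_at:
            # No heartbeat arrived within the TTL: the entry has expired.
            del self._entries[user_id]
            return False
        return True


# A fake clock makes the expiry visible without waiting a minute.
fake_now = [0.0]
store = PresenceStore(clock=lambda: fake_now[0])
store.heartbeat("user_alex", "chat-07")

fake_now[0] = 59.0
online_before_ttl = store.is_online("user_alex")
fake_now[0] = 60.0
online_after_ttl = store.is_online("user_alex")
print(online_before_ttl, online_after_ttl)
```

With a 30-second heartbeat and a 60-second TTL, a client can miss one heartbeat without flipping to offline.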
Presence for Groups
Group Presence:
Showing real-time presence for all group members is expensive.
WhatsApp approach:
- Show "online" status only in 1-on-1 chats
- For groups: show "typing..." only when someone is actively typing
- Do NOT show online status for all 500 group members
Slack approach:
- Show a green dot for online users
- But only fetch presence for visible members (sidebar)
- Use a WebSocket subscription for presence updates
Step 9: Push Notifications
When a user is offline, send a push notification.
Push Notification Flow:
1. Chat Server determines Sam is offline
(no WebSocket connection, no heartbeat)
2. Chat Server sends notification request to Push Service
3. Push Service:
a. Looks up Sam's device tokens (iOS, Android)
b. Creates notification payload
c. Sends to APNs (Apple) and FCM (Google Firebase)
4. Sam's phone displays the notification
[Chat Server] --> [Push Service] --> [APNs / FCM] --> [Sam's phone]
Important details:
- Store multiple device tokens per user (phone + tablet)
- Respect user preferences (muted conversations, quiet hours)
- Batch notifications for group messages (do not send 100 separate notifications)
- Rate limit: max 1 notification per conversation per 30 seconds
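The rate limit in the last bullet can be sketched as a small in-memory check. Keying the limit on the (user, conversation) pair is an assumption of this example, and PushRateLimiter is a made-up name. A production service would keep this state in a shared store so that every push server sees it.

```python
import time


class PushRateLimiter:
    """Allow at most one notification per user and conversation per interval."""

    def __init__(self, min_interval_seconds=30.0, clock=time.monotonic):
        self._min_interval = min_interval_seconds
        self._clock = clock
        self._last_sent = {}   # (user_id, conversation_id) -> time of last push

    def allow(self, user_id, conversation_id):
        key = (user_id, conversation_id)
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self._min_interval:
            return False       # too soon: fold this message into the last push
        self._last_sent[key] = now
        return True


# A fake clock shows the behaviour without real waiting.
fake_now = [100.0]
limiter = PushRateLimiter(clock=lambda: fake_now[0])

first = limiter.allow("sam", "conv_123")            # first push goes out
fake_now[0] = 110.0
second = limiter.allow("sam", "conv_123")           # 10 s later: suppressed
other_conversation = limiter.allow("sam", "conv_456")
fake_now[0] = 130.0
third = limiter.allow("sam", "conv_123")            # 30 s after the first
print(first, second, other_conversation, third)
```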
Step 10: Media Sharing
Images, videos, and files are stored separately from messages.
Media Upload Flow:
1. Client uploads the file to a media service
2. Media service stores it in blob storage (S3)
3. Media service generates a thumbnail (for images/videos)
4. Media service returns a media URL
5. Client sends a message with the media URL
[Client] --upload--> [Media Service] --store--> [S3]
                            |
                   [Generate thumbnail]
                            |
                    [Return media_url]
                            |
[Client] --message { type: "image", url: media_url }--> [Chat Server]
Why separate?
- Messages are small (200 bytes). Media is large (200 KB - 10 MB).
- Different storage requirements (S3 for media, Cassandra for messages).
- Media can be served via CDN for faster delivery.
- Processing (thumbnails, compression) is done asynchronously.
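A tiny sketch shows why the chat message stays small: it carries a URL, never the bytes. Everything here is hypothetical, including the media.example.com host and the MediaService class, whose in-memory dictionary stands in for blob storage.

```python
import uuid

# ASSUMPTION: a placeholder CDN host, used only for this illustration.
MEDIA_BASE_URL = "https://media.example.com"


class MediaService:
    def __init__(self):
        self._blobs = {}   # stands in for blob storage such as S3

    def upload(self, data: bytes) -> str:
        """Store the bytes and return a URL the client can put in a message."""
        media_id = uuid.uuid4().hex
        self._blobs[media_id] = data
        return f"{MEDIA_BASE_URL}/{media_id}"


def build_media_message(sender_id: str, media_url: str) -> dict:
    """The chat message references the media instead of embedding it."""
    return {"sender_id": sender_id, "type": "image", "url": media_url}


media_service = MediaService()
photo_bytes = b"\x00" * 200_000                 # a fake ~200 KB image
media_url = media_service.upload(photo_bytes)
message = build_media_message("user_alex", media_url)
print(message)
```

The chat servers and the message database only ever see the short dictionary, while the 200 KB payload stays in blob storage.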
Step 11: End-to-End Encryption Overview
WhatsApp uses end-to-end encryption (E2EE). The server cannot read the messages.
End-to-End Encryption (simplified):
Alex wants to send a message to Sam.
1. Alex encrypts the message with Sam's PUBLIC key
2. Encrypted message is sent to the server
3. Server stores and forwards the encrypted message (cannot read it)
4. Sam decrypts the message with their PRIVATE key
The server only sees encrypted data. Even if the database is breached,
messages are unreadable.
Key exchange: Signal Protocol (used by WhatsApp, Signal)
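Group encryption: each member has their own key pair.
In the simplest scheme, the sender encrypts the message once per member. WhatsApp instead uses Signal's Sender Keys: the message is encrypted once, and the sender's key is shared with each member over the pairwise encrypted channels.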
Complete Architecture
                    [GeoDNS]
                   /        \
          [US Region]      [EU Region]
               |                |
       [Load Balancer]   [Load Balancer]
          /   |   \         /   |   \
    [Chat-1] [Chat-2] [Chat-3] ... (more servers)
        |        |        |
        +--------+--------+
                 |
      [Message Queue (Kafka)]
        /        |         \
 [Storage     [Push      [Analytics
  Service]    Service]    Service]
     |           |           |
[Cassandra]  [APNs/FCM]  [ClickHouse]
Supporting services:
[Presence Service] --> [Redis]
[Group Service] --> [PostgreSQL]
[Media Service] --> [S3 + CDN]
[User Service] --> [PostgreSQL]
Common Mistakes
Using HTTP polling for real-time chat. This wastes bandwidth and adds latency. Use WebSocket for bidirectional real-time communication.
Storing messages in a relational database. At 580K writes/sec and petabytes of data, a single relational database cannot handle the load. Use a distributed database like Cassandra.
Not separating media from messages. Storing images in the same database as messages creates storage and performance problems. Use blob storage (S3) for media.
Showing presence for all group members. Fetching and broadcasting presence for 500 users in real time is expensive. Only show it for 1-on-1 chats or the visible member list.
Interview Tips
Clarify the scale. “How many users? How many messages per day? Group size limits?” This drives your architecture decisions.
Start with the message flow. Draw the path of a single message from sender to receiver. Then add complexity (groups, offline, media).
Mention WebSocket explicitly. “For real-time delivery, I will use WebSocket connections between clients and chat servers.”
Discuss fan-out trade-offs. “For small groups, fan-out on write for real-time delivery. For large channels, fan-out on read to avoid write amplification.”
Do not forget push notifications. Most candidates forget about offline users. Mention APNs and FCM.
Mention message ordering. “I will use a Snowflake-like ID that embeds the timestamp, so sorting by ID gives time order within a conversation.”
Related Articles
- System Design #13: Design a URL Shortener — Your first system design walkthrough
- System Design #7: Message Queues — Kafka, RabbitMQ, SQS
- System Design #4: Caching — Redis for presence and caching
- System Design #12: Data Partitioning — Sharding for message storage
What’s Next?
In the next article, System Design #15: Design a News Feed, you will learn:
- Fan-out on write vs fan-out on read for timelines
- How to rank and sort posts
- Trending topics with count-min sketch
- How Twitter and Instagram build their feeds
This is part 14 of the System Design Tutorial series. Follow along to learn system design from scratch.