System Design #14: Design a Chat System

In the previous article, you designed a URL shortener. Now let us tackle a more complex system: a real-time chat application like WhatsApp or Slack.

Chat systems are a favorite in system design interviews because they combine real-time communication, message storage, presence detection, and push notifications.

Step 1: Requirements

Functional Requirements

One-on-one messaging
Group messaging (up to 500 members)
Online/offline status (presence)
Read receipts (message seen)
Media sharing (images, files)
Push notifications for offline users
Message history (persistent storage)

Non-Functional Requirements

Real-time delivery (< 200ms for online users)
Messages must never be lost (durability)
Message ordering must be preserved within a conversation
The system should support 2 billion users with 50 billion messages per day

Step 2: Back-of-the-Envelope Estimation

Users: 2 billion total, 500 million daily active users (DAU)

Messages:
  50 billion messages/day
  50B / 86,400 = ~580,000 messages/sec
  Peak: ~1.5 million messages/sec

Message size:
  Average message: 200 bytes (text)
  Media messages: ~200 KB (image thumbnail + metadata)

Storage per day:
  Text: 50B * 200 bytes = 10 TB/day
  Media: assume 5% of messages have media
         2.5B * 200 KB = 500 TB/day (media stored in blob storage)

  Text storage per year: 10 TB * 365 = ~3.65 PB

Connections:
  500M concurrent WebSocket connections (worst case: every DAU online at once)
  Each connection uses ~10 KB of memory
  Total memory for connections: 5 TB
  Need thousands of chat servers
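
The arithmetic above can be checked with a short script. This is a minimal sketch that assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes); the constant and function names are illustrative, not part of any real tool.

```python
# Inputs copied from the estimates above.
MESSAGES_PER_DAY = 50_000_000_000
SECONDS_PER_DAY = 86_400
TEXT_BYTES = 200
MEDIA_BYTES = 200_000            # 200 KB, decimal units (assumption)
MEDIA_PERCENT = 5                # 5% of messages carry media
CONNECTIONS = 500_000_000
BYTES_PER_CONNECTION = 10_000    # 10 KB per WebSocket connection

TB = 10**12
PB = 10**15


def estimate() -> dict:
    """Return the back-of-the-envelope numbers as a dictionary."""
    media_messages = MESSAGES_PER_DAY * MEDIA_PERCENT // 100
    text_bytes_per_day = MESSAGES_PER_DAY * TEXT_BYTES
    return {
        "messages_per_sec": MESSAGES_PER_DAY // SECONDS_PER_DAY,
        "text_tb_per_day": text_bytes_per_day / TB,
        "media_tb_per_day": media_messages * MEDIA_BYTES / TB,
        "text_pb_per_year": text_bytes_per_day * 365 / PB,
        "connection_memory_tb": CONNECTIONS * BYTES_PER_CONNECTION / TB,
    }


if __name__ == "__main__":
    for name, value in estimate().items():
        print(f"{name}: {value}")
```

Keeping the inputs as named constants makes it easy to rerun the estimate when an interviewer changes the scale.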

Step 3: Communication Protocol

Why WebSocket?

For real-time chat, the server must push messages to clients immediately. HTTP is request-response — the client must ask for new messages. WebSocket provides a persistent, bidirectional connection.

HTTP Polling vs WebSocket:

  HTTP Polling:
    Client: "Any new messages?" --> Server: "No"    (every 1 sec)
    Client: "Any new messages?" --> Server: "No"
    Client: "Any new messages?" --> Server: "Yes! Here is a message"

    Problem: wastes bandwidth, because the vast majority of polls return nothing.
    500M users polling every second means 500M requests/sec, almost all of them empty.

  Long Polling:
    Client: "Any new messages?" --> Server: ... waits ... "Yes!"
    Client: "Any new messages?" --> Server: ... waits ...

    Better, but still opens/closes connections frequently.

  WebSocket:
    Client <---> Server (persistent connection)

    Server pushes messages instantly when they arrive.
    No wasted requests. No per-message connection setup.

    Used by: Slack and Discord. WhatsApp and Telegram keep a similar persistent connection, largely over their own protocols.

Connection Flow

WebSocket Connection Flow:

  1. Client opens HTTPS connection to a chat server
  2. HTTP Upgrade request: "I want to switch to WebSocket"
  3. Server accepts: connection is now WebSocket
  4. Both sides can send messages at any time
  5. Connection stays open until client disconnects

  [Client] <===WebSocket===> [Chat Server]
     |                            |
     |-- send message ----------->|
     |<-------- receive message --|
     |<-------- receive message --|
     |-- send message ----------->|

Step 4: High-Level Architecture

Architecture Overview:

  [Mobile/Web Client]
        |
        | WebSocket
        v
  [Load Balancer] (Layer 4 - TCP level)
        |
  [Chat Servers] (maintain WebSocket connections)
        |
   +----+----+----+
   |    |    |    |
   v    v    v    v
  [Message Queue (Kafka)]
        |
  [Message Storage Service]
        |
  [Database (Cassandra)]

  Separate services:
  [Presence Service] -- tracks online/offline
  [Push Notification Service] -- notifies offline users
  [Media Storage (S3)] -- stores images, files
  [Group Service] -- manages group membership

Step 5: One-on-One Messaging

Message Flow

Alex sends a message to Sam:

  1. Alex's client sends the message via WebSocket to Chat Server A
  2. Chat Server A:
     a. Generates a unique message ID (Snowflake-like)
     b. Stores the message in the database
     c. Looks up which chat server Sam is connected to

  3. If Sam is ONLINE (connected to Chat Server B):
     Chat Server A --> Message Queue --> Chat Server B --> Sam's client
     Sam receives the message in real time.

  4. If Sam is OFFLINE:
     Chat Server A --> Push Notification Service --> APNs/FCM --> Sam's phone
     The message is stored. Sam gets it when they come online.

  [Alex] --ws--> [Chat Server A] --queue--> [Chat Server B] --ws--> [Sam]
                       |                          |
                  [Store message]           [Deliver message]
                       |
                  [Cassandra]
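
The message flow above can be sketched as a small in-memory router. Everything here is a stand-in chosen for illustration: the `Router` class, the `sessions` registry, and the plain lists that take the place of Cassandra, the message queue, and the push service are all assumptions, not a real API.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    # user_id -> chat server name; a missing entry means the user is offline.
    sessions: dict = field(default_factory=dict)
    stored: list = field(default_factory=list)   # stands in for Cassandra
    queued: list = field(default_factory=list)   # stands in for the message queue
    pushed: list = field(default_factory=list)   # stands in for the push service

    def send(self, message_id: int, sender: str, recipient: str, text: str) -> str:
        message = {"id": message_id, "from": sender, "to": recipient, "text": text}
        # Step 2b: persist first, so the message survives a failed delivery.
        self.stored.append(message)
        # Step 2c: find the chat server holding the recipient's connection.
        server = self.sessions.get(recipient)
        if server is not None:
            # Step 3: recipient is online, route through the queue to their server.
            self.queued.append((server, message))
            return "delivered"
        # Step 4: recipient is offline, hand off to push notifications.
        self.pushed.append(message)
        return "pushed"


if __name__ == "__main__":
    router = Router(sessions={"sam": "chat-server-b"})
    print(router.send(1, "alex", "sam", "Hello!"))
    print(router.send(2, "alex", "jordan", "Are you there?"))
```

Storing before routing is the design choice that backs the durability requirement: an offline recipient can always fetch the message later.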

Message ID Generation

Messages must be ordered within a conversation. You cannot use a global auto-increment ID because it would be a bottleneck at 580K messages/sec.

Message ID (Snowflake-like):

  | 41 bits: timestamp | 10 bits: server_id | 13 bits: sequence |

  timestamp: milliseconds since epoch (~69 years)
  server_id: which chat server generated the ID (1024 servers)
  sequence:  counter within the same millisecond (8192 per ms)

  Total: 64-bit integer

  Properties:
    - Unique across all servers (no coordination needed)
    - Roughly time-ordered (timestamp is the most significant bits)
    - Can generate 8192 * 1024 = ~8.4 million IDs per millisecond

  Messages are sorted by message_id within a conversation.
  Since the ID starts with a timestamp, sorting by ID approximates sorting by time
  (exact only when the servers' clocks agree).
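
A generator for the 41/10/13-bit layout above could look like the following. This is a minimal sketch under stated assumptions: the custom epoch (2024-01-01 UTC) is arbitrary, the `IdGenerator` and `decode` names are illustrative, and the injectable `clock` exists only to make the behavior easy to test.

```python
import threading
import time

SERVER_BITS = 10
SEQUENCE_BITS = 13
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
CUSTOM_EPOCH_MS = 1_704_067_200_000  # 2024-01-01 UTC, an arbitrary assumption


class IdGenerator:
    def __init__(self, server_id: int, clock=lambda: int(time.time() * 1000)):
        if not 0 <= server_id < (1 << SERVER_BITS):
            raise ValueError("server_id must fit in 10 bits")
        self.server_id = server_id
        self.clock = clock          # returns the current time in milliseconds
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # All 8192 sequence values used: wait for the next millisecond.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            return (
                (timestamp << (SERVER_BITS + SEQUENCE_BITS))
                | (self.server_id << SEQUENCE_BITS)
                | self.sequence
            )


def decode(message_id: int) -> tuple:
    """Split an ID back into (timestamp_ms, server_id, sequence)."""
    timestamp = (message_id >> (SERVER_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
    server_id = (message_id >> SEQUENCE_BITS) & ((1 << SERVER_BITS) - 1)
    return timestamp, server_id, message_id & MAX_SEQUENCE
```

Because the timestamp occupies the most significant bits, IDs from one server always increase, while IDs from different servers are ordered only as well as their clocks are synchronized.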

Step 6: Group Messaging

Group chat adds complexity because a single message must be delivered to many recipients.

Fan-Out on Write (Small Groups)

For small groups (< 500 members), copy the message to each member’s inbox when it is sent.

Fan-Out on Write:

  Alex sends "Hello!" to a group with 100 members.

  1. Chat Server receives the message
  2. Look up group membership: [Alex, Sam, Jordan, ... 97 more]
  3. For each member:
     a. Write a copy to their message queue/inbox
     b. If online: deliver via WebSocket
     c. If offline: send push notification

  [Alex sends] --> [100 copies written to 100 inboxes]

  Pros:
    - Reading messages is fast (just read from your inbox)
    - Simple delivery logic

  Cons:
    - Write amplification (1 message becomes 100 writes)
    - Expensive for large groups
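
The fan-out on write steps above can be sketched in a few lines. The function name, the `inboxes` dictionary, and the `online` set are illustrative assumptions. The sketch also assumes the sender keeps a copy in their own inbox but is not notified about their own message.

```python
from collections import defaultdict


def fan_out_on_write(message: dict, members: list, online: set, inboxes: dict) -> tuple:
    """Copy one message into every member's inbox and choose a delivery path."""
    delivered, notified = [], []
    for member in members:
        # Step 3a: one write per member (this is the write amplification).
        inboxes[member].append(message)
        if member == message["sender"]:
            continue  # assumption: the sender needs no delivery or notification
        if member in online:
            delivered.append(member)   # step 3b: would be sent over WebSocket
        else:
            notified.append(member)    # step 3c: would trigger a push notification
    return delivered, notified


if __name__ == "__main__":
    inboxes = defaultdict(list)
    message = {"sender": "alex", "text": "Hello!"}
    print(fan_out_on_write(message, ["alex", "sam", "jordan"], {"alex", "sam"}, inboxes))
```

The loop makes the cost visible: the number of writes grows linearly with group size.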

Fan-Out on Read (Large Groups)

For large groups or broadcast channels, do NOT copy the message. Store it once and let each member read it.

Fan-Out on Read:

  A news channel posts to 1 million subscribers.

  1. Store the message ONCE in the group's message log
  2. When a subscriber opens the app:
     a. Fetch new messages from the group's log
     b. Filter by "messages after my last read timestamp"

  [Channel posts] --> [1 message stored]
  [User opens app] --> [Reads from group log]

  Pros:
    - 1 write instead of 1 million writes
    - Efficient for large groups

  Cons:
    - Reading is slower (must fetch and merge from multiple groups)
    - Real-time delivery is harder
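
For comparison, fan-out on read stores the message once and tracks a per-user read position. The sketch below is illustrative only: the `GroupLog` class is hypothetical, and it uses the last-read message ID as the cursor instead of a timestamp, which assumes IDs are time-ordered.

```python
class GroupLog:
    def __init__(self):
        self.messages = []   # (message_id, text) pairs, appended in ID order
        self.cursors = {}    # user_id -> ID of the last message that user read

    def post(self, message_id: int, text: str) -> None:
        # A single write, no matter how many subscribers the group has.
        self.messages.append((message_id, text))

    def fetch_new(self, user_id: str) -> list:
        """Return messages newer than the user's cursor, then advance it."""
        last_read = self.cursors.get(user_id, 0)
        new_messages = [m for m in self.messages if m[0] > last_read]
        if new_messages:
            self.cursors[user_id] = new_messages[-1][0]
        return new_messages


if __name__ == "__main__":
    log = GroupLog()
    log.post(1, "Breaking news")
    log.post(2, "Follow-up")
    print(log.fetch_new("sam"))
    print(log.fetch_new("sam"))
```

The write path is constant-cost, and the work moves to `fetch_new`, which every reader runs when they open the app.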

Hybrid Approach (WhatsApp/Slack Style)

Hybrid Fan-Out:

  Groups < 500 members:  fan-out on write
    --> Real-time delivery to all members
    --> Write amplification is manageable (500 writes max)

  Groups/Channels > 500 members:  fan-out on read
    --> Store message once
    --> Members pull when they open the app
    --> Send push notifications to active members only

  Large chat products use variations of this split; the 500-member cutoff is a tunable example.
  WhatsApp limits groups to 1024 members, which keeps fan-out on write bounded.
  Slack channels can have thousands of members (fan-out on read for large channels).

Step 7: Message Storage

Database Choice

Requirements for message storage:
  - Write-heavy: 580K writes/sec
  - Key-value access pattern: get messages by conversation_id + time range
  - Must scale to petabytes
  - Ordered by time within a conversation

Best fit: Apache Cassandra or ScyllaDB
  - Designed for high write throughput
  - Partition by conversation_id
  - Clustering key: message_id (time-ordered)
  - Linear horizontal scaling
  - Used by: Discord (Cassandra, later migrated to ScyllaDB). Facebook Messenger used HBase, a comparable wide-column store.

Schema

Messages Table (Cassandra):

  Partition key: conversation_id
  Clustering key: message_id (ascending)

  | conversation_id | message_id | sender_id | content   | type  | created_at          |
  |-----------------|------------|-----------|-----------|-------|---------------------|
  | conv_123        | 100001     | user_alex | "Hello!"  | text  | 2026-05-29 10:00:01 |
  | conv_123        | 100002     | user_sam  | "Hi!"     | text  | 2026-05-29 10:00:03 |
  | conv_123        | 100003     | user_alex | photo.jpg | image | 2026-05-29 10:00:15 |

  Query: "Get last 50 messages in conversation conv_123"
    --> SELECT * FROM messages
        WHERE conversation_id = 'conv_123'
        ORDER BY message_id DESC
        LIMIT 50

  This query hits exactly ONE partition. Very fast.
  (With the time-bucketed partition key described under Partition Sizing,
  also filter on the bucket: AND time_bucket = '2026-W22'.)

Partition Sizing

Partition Size Problem:

  A very active conversation could have millions of messages.
  Cassandra partitions should stay under 100 MB for best performance.

  Solution: Bucket by time (same as Discord's approach).

  Partition key: (conversation_id, time_bucket)
  time_bucket = floor(created_at / 7 days)   (one bucket per week, e.g. '2026-W22')

  Each partition holds ~1 week of messages.
  Most queries ("show recent messages") hit only the latest partition.

  Old messages: hit older partitions (acceptable latency for scrolling back)
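
One way to derive the weekly bucket is shown below. This is a sketch that assumes ISO week labels such as '2026-W22'; the function names are illustrative, and any stable weekly scheme would work as long as writers and readers compute it the same way.

```python
from datetime import datetime, timezone


def time_bucket(created_at: datetime) -> str:
    """Label a timestamp with its ISO year and week, e.g. '2026-W22'."""
    iso_year, iso_week, _ = created_at.isocalendar()
    return f"{iso_year}-W{iso_week:02d}"


def partition_key(conversation_id: str, created_at: datetime) -> tuple:
    """Composite partition key: one partition per conversation per week."""
    return (conversation_id, time_bucket(created_at))


if __name__ == "__main__":
    sent_at = datetime(2026, 5, 29, 10, 0, 1, tzinfo=timezone.utc)
    print(partition_key("conv_123", sent_at))
```

To load older history, a reader steps back one bucket at a time until it has enough messages.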

Step 8: Online Presence

How does the system know if a user is online or offline?

Presence Service:

  Each client sends a heartbeat every 30 seconds.

  [Client] --heartbeat--> [Presence Service] --update--> [Redis]

  Redis key: "presence:user_alex"
  Value: { status: "online", last_seen: 1748520000, server: "chat-07" }
  TTL: 60 seconds

  If no heartbeat for 60 seconds:
    The key expires automatically.
    User is considered offline.

  When Alex opens a conversation with Sam:
    Check Redis for "presence:user_sam"
    If key exists: show "online"
    If key missing: show "offline" or "last seen X minutes ago"
      (last_seen must also be saved in a durable store, because the Redis key expires)
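
The heartbeat-plus-TTL idea can be illustrated without a running Redis. The sketch below is an in-memory stand-in, not Redis client code: `PresenceStore` is a hypothetical class, and the current time is passed in explicitly so the expiry behavior is easy to follow.

```python
class PresenceStore:
    TTL_SECONDS = 60  # matches the TTL on the Redis key described above

    def __init__(self):
        self._entries = {}  # user_id -> (expires_at, server)

    def heartbeat(self, user_id: str, server: str, now: float) -> None:
        # Each heartbeat rewrites the entry and pushes the expiry forward.
        self._entries[user_id] = (now + self.TTL_SECONDS, server)

    def status(self, user_id: str, now: float) -> str:
        entry = self._entries.get(user_id)
        if entry is None or now >= entry[0]:
            # Expired or never seen: behaves like a missing Redis key.
            self._entries.pop(user_id, None)
            return "offline"
        return "online"


if __name__ == "__main__":
    store = PresenceStore()
    store.heartbeat("user_alex", "chat-07", now=1000)
    print(store.status("user_alex", now=1030))
    print(store.status("user_alex", now=1061))
```

With a 30-second heartbeat and a 60-second TTL, a client can miss one heartbeat without being shown as offline.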

Presence for Groups

Group Presence:

  Showing real-time presence for all group members is expensive.

  WhatsApp approach:
    - Show "online" status only in 1-on-1 chats
    - For groups: show "typing..." only when someone is actively typing
    - Do NOT show online status for all 500 group members

  Slack approach:
    - Show a green dot for online users
    - But only fetch presence for visible members (sidebar)
    - Use a WebSocket subscription for presence updates

Step 9: Push Notifications

When a user is offline, send a push notification.

Push Notification Flow:

  1. Chat Server determines Sam is offline
     (no WebSocket connection, no heartbeat)
  2. Chat Server sends notification request to Push Service
  3. Push Service:
     a. Looks up Sam's device tokens (iOS, Android)
     b. Creates notification payload
     c. Sends to APNs (Apple) and FCM (Google Firebase)
  4. Sam's phone displays the notification

  [Chat Server] --> [Push Service] --> [APNs / FCM] --> [Sam's phone]

  Important details:
    - Store multiple device tokens per user (phone + tablet)
    - Respect user preferences (muted conversations, quiet hours)
    - Batch notifications for group messages (do not send 100 separate notifications)
    - Rate limit: max 1 notification per conversation per 30 seconds
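
The rate limit in the last bullet can be sketched as a small in-memory check. The `NotificationLimiter` class is hypothetical, and keying the limit by (user, conversation) is an assumption; a production version would keep this state in a shared store rather than in process memory.

```python
class NotificationLimiter:
    def __init__(self, min_interval_seconds: int = 30):
        self.min_interval = min_interval_seconds
        self._last_sent = {}  # (user_id, conversation_id) -> time of last push

    def allow(self, user_id: str, conversation_id: str, now: float) -> bool:
        """Return True if a push may be sent now, and record it if so."""
        key = (user_id, conversation_id)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: fold this message into a later notification
        self._last_sent[key] = now
        return True


if __name__ == "__main__":
    limiter = NotificationLimiter()
    print(limiter.allow("user_sam", "conv_123", now=0))
    print(limiter.allow("user_sam", "conv_123", now=10))
    print(limiter.allow("user_sam", "conv_123", now=30))
```

Suppressed notifications are not lost messages: the messages are already stored, so the client shows them when it next opens the conversation.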

Step 10: Media Sharing

Images, videos, and files are stored separately from messages.

Media Upload Flow:

  1. Client uploads the file to a media service
  2. Media service stores it in blob storage (S3)
  3. Media service generates a thumbnail (for images/videos)
  4. Media service returns a media URL
  5. Client sends a message with the media URL

  [Client] --upload--> [Media Service] --store--> [S3]
                              |
                        [Generate thumbnail]
                              |
                        [Return media_url]
                              |
  [Client] --message { type: "image", url: media_url }--> [Chat Server]

  Why separate?
    - Messages are small (200 bytes). Media is large (200 KB - 10 MB).
    - Different storage requirements (S3 for media, Cassandra for messages).
    - Media can be served via CDN for faster delivery.
    - Processing (thumbnails, compression) is done asynchronously.

Step 11: End-to-End Encryption Overview

WhatsApp uses end-to-end encryption (E2EE). The server cannot read the messages.

End-to-End Encryption (simplified):

  Alex wants to send a message to Sam.

  1. Alex encrypts the message with Sam's PUBLIC key
  2. Encrypted message is sent to the server
  3. Server stores and forwards the encrypted message (cannot read it)
  4. Sam decrypts the message with their PRIVATE key

  The server only sees encrypted data. Even if the database is breached,
  messages are unreadable.

  Key exchange: Signal Protocol (used by WhatsApp, Signal)
  Group encryption (simplest scheme): each member has a key pair, and the
  sender encrypts the message once per member. Production systems reduce
  this cost with shared sender keys.

Complete Architecture

                     [GeoDNS]
                    /         \
          [US Region]          [EU Region]
               |                    |
         [Load Balancer]      [Load Balancer]
          /    |    \          /    |    \
   [Chat-1] [Chat-2] [Chat-3]  ... (more servers)
      |         |        |
      +----+----+--------+
           |
     [Message Queue (Kafka)]
       /        |         \
  [Storage   [Push        [Analytics
   Service]   Service]     Service]
      |          |            |
  [Cassandra] [APNs/FCM]  [ClickHouse]

  Supporting services:
  [Presence Service] --> [Redis]
  [Group Service] --> [PostgreSQL]
  [Media Service] --> [S3 + CDN]
  [User Service] --> [PostgreSQL]

Common Mistakes

Using HTTP polling for real-time chat. This wastes bandwidth and adds latency. Use WebSocket for bidirectional real-time communication.
Storing messages in a relational database. At 580K writes/sec and petabytes of data, a single relational database cannot handle the load. Use a distributed database like Cassandra.
Not separating media from messages. Storing images in the same database as messages creates storage and performance problems. Use blob storage (S3) for media.
Showing presence for all group members. Fetching and broadcasting presence for 500 users in real time is expensive. Only show it for 1-on-1 chats or the visible member list.

Interview Tips

Clarify the scale. “How many users? How many messages per day? Group size limits?” This drives your architecture decisions.
Start with the message flow. Draw the path of a single message from sender to receiver. Then add complexity (groups, offline, media).
Mention WebSocket explicitly. “For real-time delivery, I will use WebSocket connections between clients and chat servers.”
Discuss fan-out trade-offs. “For small groups, fan-out on write for real-time delivery. For large channels, fan-out on read to avoid write amplification.”
Do not forget push notifications. Most candidates forget about offline users. Mention APNs and FCM.
Mention message ordering. “I will use a Snowflake-like ID that embeds the timestamp, so messages sort in roughly chronological order within a conversation.”

Related Articles

System Design #13: Design a URL Shortener (your first system design walkthrough)
System Design #7: Message Queues (Kafka, RabbitMQ, SQS)
System Design #4: Caching (Redis for presence and caching)
System Design #12: Data Partitioning (sharding for message storage)

What’s Next?

In the next article, System Design #15: Design a News Feed, you will learn:

Fan-out on write vs fan-out on read for timelines
How to rank and sort posts
Trending topics with count-min sketch
How Twitter and Instagram build their feeds

This is part 14 of the System Design Tutorial series. Follow along to learn system design from scratch.
