In the previous article, you learned about consistent hashing. Now let us dive deep into data partitioning — how to split your database across multiple machines when one is not enough.

Why Partition Data?

A single database server has limits. It can only store so much data and handle so many queries per second. When you hit those limits, you have two choices:

  1. Vertical scaling — buy a bigger machine (expensive, has a ceiling)
  2. Horizontal scaling — split data across multiple machines (partitioning)
Single Database (No Partitioning):

  [All 500M users] --> [One Database Server]
                         |
                         |--> 10TB of data
                         |--> 50,000 queries/sec
                         |--> Single point of failure
                         |--> $$$$ for a huge machine

Partitioned Database:

  [Users A-M]  --> [Database Shard 1]  (5TB, 25K qps)
  [Users N-Z]  --> [Database Shard 2]  (5TB, 25K qps)

  Each shard handles half the data and half the traffic.
  If one shard goes down, only half the users are affected.

Horizontal vs Vertical Partitioning

There are two ways to split data.

Horizontal Partitioning (Sharding)

Split rows across multiple databases. Each shard has the same schema but different rows.

Horizontal Partitioning (Sharding):

  Original table: users (500M rows)

  Shard 1: users where user_id 1 to 100M
  Shard 2: users where user_id 100M to 200M
  Shard 3: users where user_id 200M to 300M
  Shard 4: users where user_id 300M to 400M
  Shard 5: users where user_id 400M to 500M

  Each shard has the same columns (id, name, email, ...)
  but different rows.

This is the most common form of partitioning. When people say “sharding,” they usually mean horizontal partitioning.

Vertical Partitioning

Split columns across multiple databases. Each partition has different columns for the same rows.

Vertical Partitioning:

  Original table: users
    (id, name, email, bio, avatar_url, preferences, login_history)

  Partition 1 (core data):
    users_core (id, name, email)
    --> Accessed on every request, kept in fast storage

  Partition 2 (profile data):
    users_profile (id, bio, avatar_url, preferences)
    --> Accessed less often, can be on slower storage

  Partition 3 (activity data):
    users_activity (id, login_history)
    --> Write-heavy, separate to avoid locking core data

Vertical partitioning is useful when some columns are accessed much more often than others. But it is limited — you eventually need horizontal partitioning for scale.

Sharding Strategies

1. Range-Based Sharding

Split data by ranges of the shard key.

Range-Based Sharding:

  Shard key: user_id

  Shard 1: user_id 1        to 10,000,000
  Shard 2: user_id 10,000,001 to 20,000,000
  Shard 3: user_id 20,000,001 to 30,000,000
  Shard 4: user_id 30,000,001 to 40,000,000

  To find user 15,000,003:
    --> Falls in range 10M-20M
    --> Go to Shard 2

Pros:

  • Simple to understand and implement
  • Range queries are efficient (e.g., “get all users with ID 100-200” hits one shard)
  • Easy to add new shards (just extend the range)

Cons:

  • Can create hot spots. If new users get sequential IDs, Shard 4 (the newest shard) gets all the write traffic while older shards sit idle
  • Uneven data distribution if the key is not uniformly distributed
Hot Spot Problem with Range Sharding:

  Shard 1: user_id 1-10M      (old users, low activity)    --> 100 qps
  Shard 2: user_id 10M-20M    (medium users, some activity) --> 500 qps
  Shard 3: user_id 20M-30M    (newer users, active)         --> 2,000 qps
  Shard 4: user_id 30M-40M    (newest users, all new writes) --> 10,000 qps

  Shard 4 is overwhelmed while Shard 1 is nearly idle.

2. Hash-Based Sharding

Hash the shard key and use the hash to determine the shard.

Hash-Based Sharding:

  shard_number = hash(user_id) % number_of_shards

  hash(user_id: 1)       = 89234 % 4 = 2  --> Shard 2
  hash(user_id: 1000001) = 45612 % 4 = 0  --> Shard 0
  hash(user_id: 2000001) = 73281 % 4 = 1  --> Shard 1
  hash(user_id: 3999999) = 12890 % 4 = 2  --> Shard 2

  Data is evenly distributed regardless of the key pattern.

Pros:

  • Even data distribution (no hot spots from sequential keys)
  • Works with any key type

Cons:

  • Range queries are expensive (must query all shards)
  • Adding or removing shards requires rehashing (use consistent hashing to minimize this)

For hash-based sharding, you should use consistent hashing (covered in the previous article) to minimize data movement when the number of shards changes.

3. Directory-Based Sharding

Use a lookup table that maps each key to its shard.

Directory-Based Sharding:

  Lookup Table:
    user_id 1-500        --> Shard A
    user_id 501-1200     --> Shard B
    user_id 1201-1500    --> Shard A  (note: non-contiguous!)
    user_id 1501-3000    --> Shard C
    user_id "premium"    --> Shard D  (special rule)

  The directory service stores these mappings.
  Every query first checks the directory to find the right shard.

Pros:

  • Maximum flexibility — any mapping rule you want
  • Easy to move specific ranges between shards
  • Can apply custom logic (e.g., VIP users on dedicated shard)

Cons:

  • The directory is a single point of failure
  • Extra network hop for every query (to check the directory)
  • Directory becomes a bottleneck at high traffic

4. Geography-Based Sharding

Split data based on the user’s geographic location.

Geography-Based Sharding:

  Shard EU:   users in Europe       --> Frankfurt data center
  Shard US:   users in Americas     --> Virginia data center
  Shard APAC: users in Asia-Pacific --> Singapore data center

  User in Berlin:    --> Shard EU (low latency: ~5ms)
  User in New York:  --> Shard US (low latency: ~5ms)
  User in Tokyo:     --> Shard APAC (low latency: ~5ms)

  Without geo-sharding:
  User in Tokyo --> US data center (high latency: ~200ms)

Pros:

  • Low latency — data is physically close to the user
  • Compliance — EU data stays in EU (GDPR)
  • Natural traffic separation (different time zones peak at different times)

Cons:

  • Uneven distribution (more users in some regions)
  • Cross-region queries are slow
  • Complex data migration when users move regions

Choosing a Shard Key

The shard key is the most important decision in any sharding strategy. A bad shard key leads to hot spots, uneven data distribution, and complex queries.

Good Shard Key Properties

Good Shard Key Checklist:

  1. High cardinality
     --> Many unique values (user_id: millions of values)
     --> NOT status ("active"/"inactive" — only 2 values)

  2. Even distribution
     --> Values are spread uniformly
     --> NOT created_date (recent dates get all traffic)

  3. Query alignment
     --> Most queries include the shard key
     --> If 90% of queries filter by user_id, use user_id

  4. Low cross-shard operations
     --> Queries rarely need data from multiple shards
     --> A query like "get user's orders" should hit ONE shard

Shard Key Examples

E-commerce Application:

  Good shard key: user_id
    --> Each user's data (orders, cart, preferences) on one shard
    --> "Get user's orders" hits one shard
    --> Even distribution (millions of users)

  Bad shard key: order_status
    --> Only a few values: "pending", "shipped", "delivered"
    --> "pending" shard gets all new orders (hot spot)

  Bad shard key: created_date
    --> Today's shard gets all new data
    --> Old shards sit idle

Chat Application:

  Good shard key: conversation_id
    --> All messages in a conversation on one shard
    --> "Get messages for conversation" hits one shard

  Bad shard key: sender_id
    --> A group message requires reading from many shards
       (one per group member)

Problems with Sharding

1. Cross-Shard Queries

When a query needs data from multiple shards, it must be sent to all shards and the results merged.

Cross-Shard Query Problem:

  Sharded by user_id.

  Query: "Find all orders over $100 in the last month"
    --> This query does not filter by user_id
    --> Must send to ALL shards
    --> Each shard runs the query locally
    --> Results are merged by the application

  This is called "scatter-gather" and it is slow.

2. Cross-Shard Joins

Joining data across shards is very expensive because the data lives on different machines.

Cross-Shard Join Problem:

  Shard 1 has user 1001 (in Berlin)
  Shard 3 has user 5003 (in Tokyo)

  Query: "Find all users who ordered the same product as user 1001"
    --> Need to check orders on Shard 1
    --> Then check all other shards for matching products
    --> Very expensive network operation

  Solution: Denormalize data. Store product information
  alongside orders so you do not need cross-shard joins.

3. Rebalancing Shards

When one shard gets too large or too hot, you need to split it or move data to other shards.

Rebalancing Approaches:

  1. Fixed Number of Partitions:
     Create more partitions than servers from the start.

     10 servers, 1000 partitions
     Each server handles 100 partitions.

     Add 1 server (11 total):
     Move some partitions from existing servers to the new one.
     Each server now handles ~91 partitions.

     Used by: Elasticsearch, Riak, Couchbase

  2. Dynamic Partitioning:
     Split a partition when it gets too large.
     Merge partitions when they get too small.

     Partition A grows beyond 10GB --> split into A1 and A2
     Partition A1 shrinks below 1GB --> merge back

     Used by: HBase, MongoDB

  3. Proportional to Nodes:
     Fixed number of partitions per node.
     When a new node joins, it splits existing partitions.

     Used by: Cassandra (256 virtual nodes per node)

4. Hotspots

Even with a good shard key, some data is accessed more than others.

Celebrity Problem:

  Shard key: user_id
  Hash-based sharding (even distribution)

  User "celebrity_123" has 50M followers.
  Every post by celebrity_123 generates 50M fan-out operations.
  The shard holding celebrity_123 is overwhelmed.

  Solutions:
    1. Add a random suffix: celebrity_123_01, celebrity_123_02
       Spread the celebrity's data across multiple shards.
       Application must query all suffixed keys and merge results.

    2. Dedicated shard for hot keys
       Monitor traffic and move hot keys to dedicated resources.

    3. Cache layer in front of the shard
       Cache frequently accessed data to reduce shard load.

Real-World Examples

How Instagram Shards PostgreSQL

Instagram uses PostgreSQL with hash-based sharding. Their shard key is the user ID embedded in every row’s primary key. They use a custom ID generation scheme:

Instagram's ID Format:

  ID = timestamp (41 bits) + shard_id (13 bits) + sequence (10 bits)

  This encodes the shard directly in the ID.
  No directory lookup needed — the shard is in the ID itself.

  41 bits for timestamp: ~69 years of unique timestamps
  13 bits for shard_id:  8,192 possible shards
  10 bits for sequence:  1,024 IDs per millisecond per shard

How Discord Shards Cassandra

Discord stores billions of messages in Cassandra, sharded by channel_id and a time bucket.

Discord's Sharding Strategy:

  Partition key: (channel_id, bucket)

  bucket = message_timestamp / 10_days

  Each partition holds ~10 days of messages for one channel.
  This keeps partition sizes manageable.

  Reading recent messages: hit 1 partition
  Reading old messages: hit a different partition (cold storage)

  This avoids the "unbounded partition" problem where
  a busy channel's partition grows forever.

When NOT to Shard

Most applications do not need sharding. Before you shard, try these simpler approaches first:

Scaling Checklist (before sharding):

  1. Optimize queries (add indexes, rewrite slow queries)
  2. Add read replicas (for read-heavy workloads)
  3. Add a cache layer (Redis for hot data)
  4. Vertical partitioning (separate hot/cold columns)
  5. Archive old data (move old rows to cold storage)
  6. Vertical scaling (bigger machine)

  If NONE of these work --> consider sharding.

  Rules of thumb:
    < 100GB of data:   probably do not need sharding
    < 10K queries/sec:  probably do not need sharding
    Single PostgreSQL can handle 500K+ rows/sec writes
    and serve 100K+ queries/sec with proper indexes.

Sharding adds significant complexity. You lose cross-shard joins, transactions become harder, and your application needs to know about the sharding scheme. Only shard when the benefits outweigh the costs.

Interview Tips

When discussing data partitioning in interviews:

  1. Start with why. “The data is too large for a single server, so I will shard the database by user_id.”

  2. Choose the right strategy. “I will use hash-based sharding with consistent hashing to ensure even distribution and minimal data movement when adding shards.”

  3. Justify your shard key. “I chose user_id because most queries filter by user. This means most queries hit a single shard.”

  4. Address cross-shard queries. “For queries that span multiple shards, like ’trending products,’ I will use a separate denormalized table or a search index.”

  5. Mention rebalancing. “I will use fixed partitioning with more partitions than servers, so adding a server just moves some partitions.”

  6. Know when NOT to shard. “For our estimated data size of 50GB, a single PostgreSQL instance with read replicas and caching should be sufficient. Sharding would add unnecessary complexity.”

What’s Next?

In the next article, System Design #13: Design a URL Shortener, you will learn:

  • How to design a URL shortener like bit.ly from scratch
  • Base62 encoding for short codes
  • Back-of-the-envelope estimation for 100M URLs per day
  • Caching, sharding, and analytics

This is part 12 of the System Design Tutorial series. Follow along to learn system design from scratch.