In the previous article, you learned about horizontal scaling — adding more servers to handle more traffic. But when you have multiple servers, how do you distribute traffic across them?

That is what a load balancer does. It is one of the most important components in any scalable system.

What is a Load Balancer?

A load balancer is a device or software that distributes incoming network traffic across multiple servers. Think of it as a traffic director at a busy intersection.

Without a load balancer:
  [All users] --> [Single Server]   (server overloaded, crashes)

With a load balancer:
  [All users] --> [Load Balancer] --> [Server 1]  (handles 33% of traffic)
                                  --> [Server 2]  (handles 33% of traffic)
                                  --> [Server 3]  (handles 33% of traffic)

The load balancer sits between users and your servers. Users send requests to the load balancer, and it forwards each request to one of the available servers.

Why You Need Load Balancers

1. Distribute Traffic Evenly

Without a load balancer, all traffic hits one server. That server gets overwhelmed while others sit idle. A load balancer spreads the work evenly.

2. High Availability

If one server crashes, the load balancer stops sending traffic to it. New requests are automatically routed to the remaining healthy servers, so most users never notice that anything went wrong.

Normal operation:
  [LB] --> [Server 1] OK
       --> [Server 2] OK
       --> [Server 3] OK

Server 2 crashes:
  [LB] --> [Server 1] OK     (handles 50% now)
       --> [Server 2] DEAD   (LB stops sending traffic here)
       --> [Server 3] OK     (handles 50% now)
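
The failover above could be sketched in Go roughly as follows. This is a minimal illustration, not how any particular load balancer is implemented: the FailoverBalancer type and its healthy map are hypothetical, and the map is assumed to be kept up to date by something else, such as a health checker.

```go
package main

import (
	"errors"
	"fmt"
)

// FailoverBalancer is a hypothetical round-robin balancer that skips
// servers currently marked unhealthy.
type FailoverBalancer struct {
	servers []string
	healthy map[string]bool // assumed to be maintained by a health checker
	next    int
}

// Pick returns the next healthy server in rotation, or an error if
// every server is down.
func (fb *FailoverBalancer) Pick() (string, error) {
	for i := 0; i < len(fb.servers); i++ {
		s := fb.servers[fb.next]
		fb.next = (fb.next + 1) % len(fb.servers)
		if fb.healthy[s] {
			return s, nil
		}
	}
	return "", errors.New("no healthy servers")
}

func main() {
	fb := &FailoverBalancer{
		servers: []string{"server-1", "server-2", "server-3"},
		healthy: map[string]bool{"server-1": true, "server-2": false, "server-3": true},
	}
	// With server-2 down, requests alternate between server-1 and server-3.
	for i := 1; i <= 4; i++ {
		s, _ := fb.Pick()
		fmt.Printf("Request %d --> %s\n", i, s)
	}
}
```

With Server 2 marked unhealthy, the two remaining servers each receive every other request, which matches the 50% split in the diagram.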

3. Zero-Downtime Deployments

When you deploy new code, you can update servers one at a time. The load balancer routes traffic away from the server being updated. This is called a rolling deployment.
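
A rolling deployment might be sketched like this. The Pool type and the deploy callback are assumptions made for illustration. A real rollout would typically also drain in-flight requests and wait for a passing health check before putting a server back in rotation.

```go
package main

import "fmt"

// Pool is a hypothetical view of which servers the load balancer routes to.
type Pool struct {
	inRotation map[string]bool
}

// Serving returns how many servers are currently receiving traffic.
func (p *Pool) Serving() int {
	n := 0
	for _, ok := range p.inRotation {
		if ok {
			n++
		}
	}
	return n
}

// RollingDeploy updates servers one at a time. deploy is a caller-supplied
// function (an assumption of this sketch) that installs the new version.
func (p *Pool) RollingDeploy(servers []string, deploy func(server string) error) error {
	for _, s := range servers {
		p.inRotation[s] = false // stop sending new requests to this server
		if err := deploy(s); err != nil {
			// Leave the failed server out of rotation and stop the rollout.
			return fmt.Errorf("deploy to %s: %w", s, err)
		}
		p.inRotation[s] = true // back in rotation, now on the new version
	}
	return nil
}

func main() {
	servers := []string{"server-1", "server-2", "server-3"}
	p := &Pool{inRotation: map[string]bool{}}
	for _, s := range servers {
		p.inRotation[s] = true
	}
	err := p.RollingDeploy(servers, func(s string) error {
		fmt.Printf("updating %s while %d servers keep serving\n", s, p.Serving())
		return nil
	})
	fmt.Println("done, error:", err)
}
```

At every step only one server is out of rotation, so the other two keep handling traffic.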

4. SSL Termination

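The load balancer can handle HTTPS (SSL/TLS) encryption and decryption, so your backend servers only deal with plain HTTP. That keeps certificates in one place and takes the cryptographic work off the backends. Because traffic between the load balancer and the backends is then unencrypted, it should stay on a trusted private network.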

[Client] --HTTPS--> [Load Balancer] --HTTP--> [Server 1]
                                    --HTTP--> [Server 2]

The LB handles encryption. Backend servers stay simple.
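
As a rough illustration of termination, the sketch below puts a TLS front end in front of a plain-HTTP backend using Go's standard reverse proxy. It relies on the httptest package to get a throwaway self-signed certificate, which is an assumption made to keep the example self-contained. A production setup would load a real certificate instead.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
)

// demo starts a plain-HTTP backend and a TLS front end that proxies to it,
// then sends one HTTPS request through the front end and returns the body.
func demo() (string, error) {
	// Backend: speaks plain HTTP only and reports whether it saw TLS.
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "backend saw TLS: %v", r.TLS != nil)
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return "", err
	}

	// Front end: terminates TLS, then forwards the request over HTTP.
	front := httptest.NewTLSServer(httputil.NewSingleHostReverseProxy(target))
	defer front.Close()

	// front.Client() returns a client that trusts the test certificate.
	resp, err := front.Client().Get(front.URL)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	body, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(body)
}
```

The client connects over HTTPS, yet the backend reports that it saw no TLS, because the front end decrypted the request before forwarding it.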

Load Balancing Algorithms

The load balancer needs to decide which server gets each request. There are several algorithms for this.

1. Round Robin

The simplest algorithm. Requests are distributed to servers in order, one by one. After the last server, it starts again from the first.

Request 1 --> Server 1
Request 2 --> Server 2
Request 3 --> Server 3
Request 4 --> Server 1  (back to the start)
Request 5 --> Server 2
Request 6 --> Server 3
...

Pros: Very simple. Works well when all servers have equal capacity.
Cons: Does not account for server load. A slow server gets the same traffic as a fast one.

Here is how round robin works in code:

package main

import "fmt"

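// LoadBalancer hands out servers in round-robin order.
// It is not safe for concurrent use; a real balancer would guard
// current with a mutex or an atomic counter.
type LoadBalancer struct {
    servers []string
    current int // index of the server that gets the next request
}

func NewLoadBalancer(servers []string) *LoadBalancer {
    return &LoadBalancer{servers: servers, current: 0}
}

// NextServer returns the next server and advances the index,
// wrapping back to the first server after the last one.
// It assumes servers is non-empty.
func (lb *LoadBalancer) NextServer() string {
    server := lb.servers[lb.current]
    lb.current = (lb.current + 1) % len(lb.servers)
    return server
}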

func main() {
    lb := NewLoadBalancer([]string{
        "server-1:8080",
        "server-2:8080",
        "server-3:8080",
    })

    // Simulate 6 requests
    for i := 1; i <= 6; i++ {
        fmt.Printf("Request %d --> %s\n", i, lb.NextServer())
    }
}

Output:

Request 1 --> server-1:8080
Request 2 --> server-2:8080
Request 3 --> server-3:8080
Request 4 --> server-1:8080
Request 5 --> server-2:8080
Request 6 --> server-3:8080

2. Weighted Round Robin

Like round robin, but servers with more capacity get more requests. You assign a weight to each server.

Server 1 (weight 5): handles 5 out of every 8 requests
Server 2 (weight 2): handles 2 out of every 8 requests
Server 3 (weight 1): handles 1 out of every 8 requests

Use case: When your servers have different hardware. A 32-core server should get more traffic than an 8-core server.
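
One common way to implement this is often called smooth weighted round robin, which interleaves servers instead of sending five requests in a row to the heaviest one. The sketch below is one possible version. The type and function names are illustrative, and it assumes at least one server with a positive weight.

```go
package main

import "fmt"

// WeightedServer pairs a server with its configured weight.
type WeightedServer struct {
	Name    string
	Weight  int
	current int // running score used to decide whose turn it is
}

// WeightedBalancer is a hypothetical smooth weighted round-robin balancer.
type WeightedBalancer struct {
	servers []*WeightedServer
	total   int // sum of all weights
}

func NewWeightedBalancer(servers ...*WeightedServer) *WeightedBalancer {
	wb := &WeightedBalancer{servers: servers}
	for _, s := range servers {
		wb.total += s.Weight
	}
	return wb
}

// Next raises every server's score by its weight, picks the highest
// score, then lowers the winner's score by the total weight.
func (wb *WeightedBalancer) Next() string {
	var best *WeightedServer
	for _, s := range wb.servers {
		s.current += s.Weight
		if best == nil || s.current > best.current {
			best = s
		}
	}
	best.current -= wb.total
	return best.Name
}

func main() {
	wb := NewWeightedBalancer(
		&WeightedServer{Name: "server-1", Weight: 5},
		&WeightedServer{Name: "server-2", Weight: 2},
		&WeightedServer{Name: "server-3", Weight: 1},
	)
	counts := map[string]int{}
	for i := 0; i < 8; i++ {
		counts[wb.Next()]++
	}
	fmt.Println(counts) // over 8 requests the split is 5, 2 and 1
}
```

Over each cycle of 8 requests, every server is chosen exactly as many times as its weight.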

3. Least Connections

Sends each new request to the server with the fewest active connections. This adapts to real-time server load.

Current state:
  Server 1: 15 active connections
  Server 2: 8 active connections  <-- next request goes here
  Server 3: 12 active connections

Pros: Adapts to actual server load. Handles slow requests well.
Cons: Slightly more overhead, because the load balancer must track connection counts.

Use case: When requests have varying processing times. Some requests take 10ms, others take 5 seconds. Round robin would overload some servers, but least connections balances naturally.
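
A minimal sketch of least connections, using the connection counts from the example above. The Backend and LeastConnBalancer names are hypothetical, and the code is not safe for concurrent use without extra locking.

```go
package main

import "fmt"

// Backend tracks how many requests a server is handling right now.
type Backend struct {
	Name   string
	Active int
}

// LeastConnBalancer is a hypothetical balancer for illustration only.
type LeastConnBalancer struct {
	backends []*Backend
}

// Acquire picks the backend with the fewest active connections and
// counts the new connection against it. It assumes at least one backend.
func (lb *LeastConnBalancer) Acquire() *Backend {
	best := lb.backends[0]
	for _, candidate := range lb.backends[1:] {
		if candidate.Active < best.Active {
			best = candidate
		}
	}
	best.Active++
	return best
}

// Release must be called when the request finishes.
func (lb *LeastConnBalancer) Release(done *Backend) {
	if done.Active > 0 {
		done.Active--
	}
}

func main() {
	lb := &LeastConnBalancer{backends: []*Backend{
		{Name: "server-1", Active: 15},
		{Name: "server-2", Active: 8},
		{Name: "server-3", Active: 12},
	}}
	chosen := lb.Acquire()
	fmt.Printf("next request --> %s (now %d active)\n", chosen.Name, chosen.Active)
	lb.Release(chosen) // the request has finished
}
```

The important detail is the Release call: if finished requests are never subtracted, the counts drift and the balancer stops reflecting real load.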

4. Least Response Time

Sends requests to the server that responds fastest. The load balancer measures response times and picks the quickest server.

Response time measurements:
  Server 1: avg 45 ms
  Server 2: avg 12 ms  <-- next request goes here
  Server 3: avg 30 ms

Pros: Optimizes for user experience, because requests go to the fastest server.
Cons: More overhead, because the load balancer must measure response times continuously.
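
One way to track "fastest" is a moving average per server, sketched below. The TimedBackend type, the sampleBackends helper, and the 0.2 smoothing factor are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// TimedBackend keeps a moving average of observed response times.
type TimedBackend struct {
	Name string
	Avg  time.Duration
}

// Observe folds a new measurement into the average. The 0.2 smoothing
// factor is an arbitrary choice for this sketch.
func (tb *TimedBackend) Observe(d time.Duration) {
	const alpha = 0.2
	tb.Avg = time.Duration((1-alpha)*float64(tb.Avg) + alpha*float64(d))
}

// Fastest returns the backend with the lowest average response time.
// It assumes at least one backend.
func Fastest(backends []*TimedBackend) *TimedBackend {
	best := backends[0]
	for _, candidate := range backends[1:] {
		if candidate.Avg < best.Avg {
			best = candidate
		}
	}
	return best
}

// sampleBackends returns the three servers from the example above.
func sampleBackends() []*TimedBackend {
	return []*TimedBackend{
		{Name: "server-1", Avg: 45 * time.Millisecond},
		{Name: "server-2", Avg: 12 * time.Millisecond},
		{Name: "server-3", Avg: 30 * time.Millisecond},
	}
}

func main() {
	pool := sampleBackends()
	fmt.Println("fastest:", Fastest(pool).Name)

	// A slow response from server-2 raises its average above server-3's.
	pool[1].Observe(200 * time.Millisecond)
	fmt.Println("fastest after a slow response:", Fastest(pool).Name)
}
```

A moving average lets the choice shift as conditions change: after one 200 ms response, server-2's average rises to about 50 ms and server-3 becomes the preferred target.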

5. IP Hash

Uses the client’s IP address to determine which server handles the request. The same IP always goes to the same server.

Hash function:
  IP 192.168.1.1 --> hash --> Server 2
  IP 10.0.0.5    --> hash --> Server 1
  IP 172.16.0.8  --> hash --> Server 3

Same IP always goes to the same server.

Pros: Session affinity without cookies. Good for stateful applications.
Cons: Uneven distribution if some IPs send much more traffic than others. Adding or removing servers changes the mapping for many clients.
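
A minimal sketch of IP hashing, assuming a simple hash-then-modulo scheme. FNV-1a is used here only because it ships with Go's standard library; the serverFor function is a hypothetical name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// serverFor maps a client IP to a server by hashing the IP string and
// taking the result modulo the number of servers.
// It assumes servers is non-empty.
func serverFor(clientIP string, servers []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP)) // Write on an FNV hash never returns an error
	return servers[h.Sum32()%uint32(len(servers))]
}

func main() {
	servers := []string{"server-1", "server-2", "server-3"}
	ips := []string{"192.168.1.1", "10.0.0.5", "172.16.0.8"}

	for _, ip := range ips {
		fmt.Printf("%-12s --> %s\n", ip, serverFor(ip, servers))
	}

	// Adding a server changes the modulus, so some clients may land
	// on a different server than before.
	grown := append(servers, "server-4")
	for _, ip := range ips {
		fmt.Printf("%-12s --> %s (4 servers)\n", ip, serverFor(ip, grown))
	}
}
```

Repeated calls with the same IP and the same server list always return the same server. Once the list changes, the mapping can change too, which is the drawback noted above.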

6. Random

Picks a server at random for each request. Surprisingly effective at high request volumes.

Pros: Very simple, no state to track.
Cons: Can cause uneven distribution over short windows or at low request volumes. Over many requests, the law of large numbers makes the split nearly even.
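
The sketch below counts how random selection spreads requests over three servers, first for a handful of requests and then for many. The distribution function and the fixed seed are assumptions made so that runs are repeatable.

```go
package main

import (
	"fmt"
	"math/rand"
)

// distribution picks a random server for each of n requests and returns
// how many requests each server received. It assumes servers is non-empty.
func distribution(seed int64, servers []string, n int) map[string]int {
	r := rand.New(rand.NewSource(seed)) // fixed seed keeps runs repeatable
	counts := make(map[string]int)
	for i := 0; i < n; i++ {
		counts[servers[r.Intn(len(servers))]]++
	}
	return counts
}

func main() {
	servers := []string{"server-1", "server-2", "server-3"}
	for _, n := range []int{30, 30000} {
		fmt.Printf("%d requests: %v\n", n, distribution(1, servers, n))
	}
}
```

With only 30 requests the counts can differ noticeably. With 30,000 they land close to 10,000 each.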

Algorithm Comparison

Algorithm             Complexity  Even Distribution    Considers Load  Session Affinity
Round Robin           Very low    Yes (equal servers)  No              No
Weighted Round Robin  Low         Yes (weighted)       No              No
Least Connections     Medium      Yes (adaptive)       Yes             No
Least Response Time   Medium      Yes (adaptive)       Yes             No
IP Hash               Low         Depends on IPs       No              Yes
Random                Very low    Approximately        No              No

Layer 4 vs Layer 7 Load Balancing

Load balancers operate at different layers of the network stack. The two most common are Layer 4 and Layer 7.

Layer 4 (Transport Layer)

Layer 4 load balancers work at the TCP/UDP level. They see IP addresses and port numbers but do not inspect the actual content of requests.

Layer 4 sees:
  Source IP: 192.168.1.1
  Destination IP: 10.0.0.100
  Source Port: 54321
  Destination Port: 443
  Protocol: TCP

Layer 4 does NOT see:
  URL path, HTTP headers, cookies, request body

Pros: Very fast because it does not parse the request content. Lower latency.
Cons: Cannot route based on URL, headers, or content.

Use case: Generic TCP load balancing, database connections, game servers.
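
To make the "does not inspect content" point concrete, here is a rough sketch of a Layer 4 style proxy in Go. It copies bytes between the client and a backend without parsing them. The proxyConn and demo functions are hypothetical, there is only one backend, and the listeners use loopback addresses picked by the operating system.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// proxyConn copies bytes in both directions between the client and the
// backend. It never looks at what the bytes mean: no HTTP parsing.
func proxyConn(client net.Conn, backendAddr string) {
	defer client.Close()
	backend, err := net.Dial("tcp", backendAddr)
	if err != nil {
		return
	}
	defer backend.Close()
	go io.Copy(backend, client) // client --> backend
	io.Copy(client, backend)    // backend --> client, until the backend closes
}

// demo starts a tiny backend and a proxy in front of it, connects to the
// proxy as a client, and returns whatever the client received.
func demo() (string, error) {
	backend, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer backend.Close()
	go func() {
		for {
			c, err := backend.Accept()
			if err != nil {
				return
			}
			fmt.Fprint(c, "hello from backend")
			c.Close()
		}
	}()

	lb, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer lb.Close()
	go func() {
		for {
			c, err := lb.Accept()
			if err != nil {
				return
			}
			go proxyConn(c, backend.Addr().String())
		}
	}()

	conn, err := net.Dial("tcp", lb.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	data, err := io.ReadAll(conn)
	return string(data), err
}

func main() {
	msg, err := demo()
	fmt.Println(msg, err)
}
```

Because the proxy only moves bytes, the same code would work for any TCP protocol, which is also why it cannot make routing decisions based on a URL or a header.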

Layer 7 (Application Layer)

Layer 7 load balancers work at the HTTP level. They can inspect URLs, headers, cookies, and request bodies.

Layer 7 sees everything:
  URL: /api/users/123
  Method: GET
  Headers: Authorization: Bearer token123
  Cookie: session=abc456
  Content-Type: application/json

This allows smart routing decisions:

Layer 7 routing rules:
  /api/*       --> API servers (Server 1, 2, 3)
  /images/*    --> Image servers (Server 4, 5)
  /admin/*     --> Admin server (Server 6)
  /health      --> Return 200 directly (no backend needed)
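
The routing rules above could be expressed as a simple prefix match. This is a sketch under stated assumptions: the pool names are made up for illustration, and real load balancers offer richer matching such as host names, headers, and regular expressions.

```go
package main

import (
	"fmt"
	"strings"
)

// rule maps a URL path prefix to a named backend pool.
type rule struct {
	prefix string
	pool   string
}

// rules mirrors the routing table above; the pool names are illustrative.
var rules = []rule{
	{"/api/", "api-servers"},
	{"/images/", "image-servers"},
	{"/admin/", "admin-server"},
}

// route returns the pool for a request path. The second result is false
// when no rule matches. /health is answered by the balancer itself.
func route(path string) (string, bool) {
	if path == "/health" {
		return "direct-200", true
	}
	for _, r := range rules {
		if strings.HasPrefix(path, r.prefix) {
			return r.pool, true
		}
	}
	return "", false
}

func main() {
	for _, path := range []string{"/api/users/123", "/images/logo.png", "/admin/settings", "/health", "/unknown"} {
		pool, ok := route(path)
		fmt.Printf("%-18s --> %q (matched: %v)\n", path, pool, ok)
	}
}
```

What to do with unmatched paths is a design decision: this sketch reports no match, while many setups send them to a default pool instead.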

Pros: Intelligent routing based on content. Can modify headers, rewrite URLs, handle SSL.
Cons: Slower than Layer 4 because it must parse the full request.

Use case: Web applications, APIs, microservices.

L4 vs L7 — Side by Side

Feature          Layer 4         Layer 7
Speed            Very fast       Slower (parses content)
Routing          IP + port only  URL, headers, cookies
SSL termination  Limited         Full support
Content routing  No              Yes
Health checks    TCP ping        HTTP health endpoint
Cost             Lower           Higher

Most web applications use Layer 7 load balancers because they rely on content-based routing. Layer 4 is used for non-HTTP protocols or when maximum speed is needed.

Health Checks

A load balancer must know if a server is healthy. If a server crashes or becomes unresponsive, the load balancer should stop sending traffic to it.

How Health Checks Work

The load balancer periodically sends a request to each server and checks the response:

Health check configuration:
  Endpoint: /health
  Interval: 10 seconds
  Timeout: 5 seconds
  Healthy threshold: 3 consecutive successes
  Unhealthy threshold: 2 consecutive failures

Timeline:
  10:00:00 - Check Server 1 --> 200 OK (healthy)
  10:00:00 - Check Server 2 --> 200 OK (healthy)
  10:00:00 - Check Server 3 --> timeout (failure 1)
  10:00:10 - Check Server 3 --> timeout (failure 2 = UNHEALTHY)
  --> Load balancer removes Server 3 from rotation

  10:00:20 - Check Server 3 --> 200 OK (success 1)
  10:00:30 - Check Server 3 --> 200 OK (success 2)
  10:00:40 - Check Server 3 --> 200 OK (success 3 = HEALTHY again)
  --> Load balancer adds Server 3 back to rotation
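
The threshold logic in this timeline can be sketched as a small state machine. HealthTracker is a hypothetical type, and the sketch covers only the counting; sending the actual checks on an interval is left out.

```go
package main

import "fmt"

// HealthTracker applies consecutive-success and consecutive-failure
// thresholds to a stream of health check results for one server.
type HealthTracker struct {
	HealthyThreshold   int
	UnhealthyThreshold int
	healthy            bool
	successes          int
	failures           int
}

// Record stores one check result and returns whether the server
// should be in rotation afterwards.
func (ht *HealthTracker) Record(ok bool) bool {
	if ok {
		ht.successes++
		ht.failures = 0
		if !ht.healthy && ht.successes >= ht.HealthyThreshold {
			ht.healthy = true
		}
	} else {
		ht.failures++
		ht.successes = 0
		if ht.healthy && ht.failures >= ht.UnhealthyThreshold {
			ht.healthy = false
		}
	}
	return ht.healthy
}

func main() {
	// Server 3 from the timeline: starts healthy, times out twice, then recovers.
	ht := &HealthTracker{HealthyThreshold: 3, UnhealthyThreshold: 2, healthy: true}
	for _, ok := range []bool{false, false, true, true, true} {
		fmt.Printf("check ok=%-5v --> in rotation: %v\n", ok, ht.Record(ok))
	}
}
```

Requiring several consecutive results before changing state keeps a single slow response or a single lucky success from flapping the server in and out of rotation.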

Types of Health Checks

  • TCP check. Can the load balancer open a TCP connection to the server? (Layer 4)
  • HTTP check. Does the server return a 200 status code at /health? (Layer 7)
  • Deep health check. Does the server check its database connection, cache connection, and disk space before reporting healthy?

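A deep health check endpoint might look like this:

import (
    "fmt"
    "net/http"
)

// healthHandler reports 200 only when the server's dependencies are reachable.
// db and cache are assumed to be package-level clients that expose a
// Ping() error method (for example, *sql.DB for the database).
func healthHandler(w http.ResponseWriter, r *http.Request) {
    // Check database connection
    if err := db.Ping(); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        fmt.Fprint(w, "database: unhealthy")
        return
    }

    // Check cache connection
    if err := cache.Ping(); err != nil {
        w.WriteHeader(http.StatusServiceUnavailable)
        fmt.Fprint(w, "cache: unhealthy")
        return
    }

    w.WriteHeader(http.StatusOK)
    fmt.Fprint(w, "ok")
}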

Popular Load Balancers

Nginx

Nginx is one of the most widely used open-source web servers, and it also works as a reverse proxy and load balancer. It is used by millions of websites.

# Nginx load balancer configuration example
upstream backend {
    server server1.example.com:8080;
    server server2.example.com:8080;
    server server3.example.com:8080;
}

server {
    listen 80;

    location / {
        proxy_pass http://backend;
    }
}

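In this configuration, Nginx works as a Layer 7 load balancer (it can also balance raw TCP and UDP traffic through its stream module). It uses round robin by default and also supports weighted round robin, least connections, and IP hash.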

HAProxy

HAProxy is a dedicated load balancer and proxy known for high performance. It is a common choice for high-traffic sites.

HAProxy supports both Layer 4 and Layer 7 load balancing. It is used by companies like GitHub, Reddit, and Stack Overflow.

Cloud Load Balancers

Cloud providers offer managed load balancers:

  • AWS ALB (Application Load Balancer): Layer 7, built for HTTP/HTTPS
  • AWS NLB (Network Load Balancer): Layer 4, very low latency
  • Google Cloud Load Balancing: global, with Layer 7 and Layer 4 options
  • Azure Load Balancer: Layer 4 (Azure Application Gateway is the Layer 7 counterpart)

Managed load balancers are easier to set up and maintain. You do not need to manage the load balancer servers yourself. They auto-scale and handle failover automatically.

Load Balancing for Databases

Load balancers are not just for web servers. You can also use them for databases.

Database load balancing with read replicas:

Write requests:
  [App Server] --> [Primary Database]  (all writes go here)

Read requests:
  [App Server] --> [LB] --> [Read Replica 1]
                         --> [Read Replica 2]
                         --> [Read Replica 3]

The primary database handles all writes. Read replicas are copies that handle read requests. A load balancer distributes reads across replicas.

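This works well because most applications are read-heavy. Figures like 90-95% reads and 5-10% writes are often quoted for web apps, though the exact ratio varies by application. Each read replica you add increases your read capacity. The trade-off is replication lag: a replica can briefly return stale data right after a write.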
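
A read/write split might be sketched as follows. DBRouter is a hypothetical type, the host names are made up, and treating every statement that starts with SELECT as a read is a simplification that real database proxies and drivers handle more carefully.

```go
package main

import (
	"fmt"
	"strings"
)

// DBRouter sends writes to the primary and spreads reads across replicas.
type DBRouter struct {
	primary  string
	replicas []string
	next     int // index of the replica that gets the next read
}

// Route returns the database that should run the query. Treating every
// statement that starts with SELECT as a read is a simplification.
func (r *DBRouter) Route(query string) string {
	isRead := strings.HasPrefix(strings.ToUpper(strings.TrimSpace(query)), "SELECT")
	if !isRead || len(r.replicas) == 0 {
		return r.primary
	}
	replica := r.replicas[r.next]
	r.next = (r.next + 1) % len(r.replicas)
	return replica
}

func main() {
	router := &DBRouter{
		primary:  "primary-db",
		replicas: []string{"replica-1", "replica-2", "replica-3"},
	}
	queries := []string{
		"SELECT * FROM users WHERE id = 1",
		"UPDATE users SET name = 'Ann' WHERE id = 1",
		"SELECT * FROM orders",
		"INSERT INTO orders (user_id) VALUES (1)",
	}
	for _, q := range queries {
		fmt.Printf("%-45s --> %s\n", q, router.Route(q))
	}
}
```

Reads rotate across the replicas in round-robin order, while every write goes to the primary.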

DNS-Based Load Balancing

DNS load balancing distributes traffic at the DNS level. When a user types your domain name, DNS returns different IP addresses for different users.

DNS query: api.example.com

Response to User 1: 10.0.0.1  (Datacenter US-East)
Response to User 2: 10.0.1.1  (Datacenter EU-West)
Response to User 3: 10.0.0.1  (Datacenter US-East)
Response to User 4: 10.0.2.1  (Datacenter AP-Tokyo)

Pros: Geographic routing (users connect to the nearest datacenter), no single point of failure.
Cons: DNS caching means changes take time to propagate. Less granular control than a dedicated load balancer.

DNS load balancing is often used as the first layer of load balancing, routing users to the right region. Then a dedicated load balancer within each region handles the fine-grained distribution.

Global architecture:

[User in Europe] --> [DNS] --> [EU Load Balancer] --> [EU Server 1]
                                                  --> [EU Server 2]

[User in USA]    --> [DNS] --> [US Load Balancer] --> [US Server 1]
                                                  --> [US Server 2]

Common Load Balancer Patterns

The Typical Web Architecture

Most production web applications use this pattern:

[Internet] --> [DNS] --> [CDN (static content)]
                      --> [Load Balancer (L7)]
                          --> [App Server 1]
                          --> [App Server 2]
                          --> [App Server 3]
                              --> [Cache (Redis)]
                              --> [Database (Primary)]
                                  --> [Read Replica 1]
                                  --> [Read Replica 2]

Static content (images, CSS, JavaScript) goes through the CDN. Dynamic requests go through the load balancer to the application servers.

Multiple Load Balancer Tiers

Large systems use multiple layers of load balancing:

[Internet] --> [Global LB (DNS)] --> [Regional LB (L4)]
                                     --> [Service LB (L7)]
                                         --> [API Servers]
                                     --> [Service LB (L7)]
                                         --> [Auth Servers]

Each tier adds more specific routing. The global tier routes by geography. The regional tier handles raw TCP distribution. The service tier routes by URL path to different microservices.

Interview Tips

When discussing load balancers in a system design interview:

  1. Always include a load balancer when you have multiple servers. It is expected.
  2. Know the algorithms. Be able to explain round robin, least connections, and IP hash. Know when to use each.
  3. Mention health checks. Show that you think about failure scenarios.
  4. Know L4 vs L7. Explain why you would choose one over the other.
  5. Consider the load balancer as a single point of failure. Mention that in production, you use redundant load balancers (active-passive or active-active pairs).

What’s Next?

In the next article, System Design #4: Caching — Redis, Memcached, CDN, you will learn:

  • What caching is and why it is critical for performance
  • Caching strategies: cache-aside, write-through, write-behind
  • Cache eviction policies: LRU, LFU, TTL
  • Redis vs Memcached
  • CDN and browser caching
  • The thundering herd problem

This is part 3 of the System Design Tutorial series. Follow along to learn system design from scratch.