You know how to write code. You can build features, fix bugs, and ship apps. But when someone asks you to design a system that handles millions of users, you freeze.

This is where system design comes in. It is the skill of building software systems that keep working at scale, and it is not just for senior engineers or job interviews. Every developer needs it.

This is the first article in the System Design from Zero to Senior series. We will start from the basics and build up to designing real systems like URL shorteners, chat apps, and video streaming platforms.

What is System Design?

System design is the process of defining the architecture, components, and data flow of a software system. It answers one big question: how do you build a system that works reliably for many users?

When you write a simple app, you have one server, one database, and a few users. Everything works fine. But when your app grows to 10,000 users, then 1 million, then 100 million, things break.

System design is about planning for that growth.

Here is a simple example. Imagine you build a photo sharing app:

Small scale (100 users):
  [User] --> [Single Server] --> [Single Database]

  This works fine. One machine handles everything.

Large scale (10 million users):
  [Users] --> [Load Balancer] --> [Server 1]  --> [Database Cluster]
                               --> [Server 2]  --> [Cache (Redis)]
                               --> [Server 3]  --> [Object Storage (S3)]
                               --> [Server N]  --> [CDN for images]

  Now you need many machines working together.

The small version is just coding. The large version is system design.

Why System Design Matters

1. Your Code Will Break at Scale

A database query that takes 50 milliseconds on a 100-row table can take seconds on 10 million rows if it has to scan the whole table. An API that works for 10 users can crash under 10,000 concurrent requests. A file upload feature that stores files on the local disk of one server breaks when you add a second server, because the second server cannot see those files.

System design teaches you to think about these problems before they happen.

2. It Is Not Just for Interviews

Yes, companies like Google, Amazon, Meta, and Netflix ask system design questions in interviews. But the real value is in your daily work:

  • Choosing the right database for your project (SQL vs NoSQL)
  • Deciding where to put a cache to speed up slow queries
  • Planning how your services communicate (REST, message queues, gRPC)
  • Handling failures without losing data

Every architecture decision you make is system design.

3. It Makes You a Better Engineer

Junior developers think about functions and classes. Senior developers think about systems. When you understand system design, you can:

  • Review architecture proposals and spot problems
  • Suggest improvements to existing systems
  • Make better technology choices
  • Communicate with infrastructure and DevOps teams

High-Level Design vs Low-Level Design

System design has two levels:

High-Level Design (HLD) is the big picture. It answers: what components do we need and how do they connect?

High-Level Design for a chat app:

[Mobile App] --> [API Gateway] --> [Chat Service] --> [Message Queue]
                                                   --> [Database]
                               --> [Auth Service]  --> [User Database]
                               --> [Notification]  --> [Push Service]

HLD covers servers, databases, caches, load balancers, and how data flows between them.

Low-Level Design (LLD) covers the details. It answers: how does each component work internally?

For the Chat Service above, LLD would cover:

  • Database schema (tables, columns, indexes)
  • API endpoints (POST /messages, GET /conversations)
  • Data structures (how messages are stored in memory)
  • Class diagrams and design patterns
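To make the LLD items above concrete, here is a minimal sketch of a schema with an index, plus two functions that stand in for the POST /messages and GET /conversations endpoints. The table name, column names, and function names are assumptions made for illustration, and an in-memory SQLite database stands in for whatever database a real Chat Service would use.

```python
import sqlite3

# Hypothetical schema for the Chat Service. Table and column names are
# illustrative assumptions, not a prescribed design.
SCHEMA = """
CREATE TABLE messages (
    id              INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL,
    sender_id       INTEGER NOT NULL,
    body            TEXT    NOT NULL,
    sent_at         INTEGER NOT NULL  -- Unix timestamp in seconds
);
-- Index so one conversation's messages can be read in time order.
CREATE INDEX idx_messages_conversation ON messages (conversation_id, sent_at);
"""


def create_store() -> sqlite3.Connection:
    """Create an in-memory database with the schema above."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def post_message(conn, conversation_id, sender_id, body, sent_at) -> int:
    """Roughly what a POST /messages handler would do: insert one row."""
    cur = conn.execute(
        "INSERT INTO messages (conversation_id, sender_id, body, sent_at) "
        "VALUES (?, ?, ?, ?)",
        (conversation_id, sender_id, body, sent_at),
    )
    conn.commit()
    return cur.lastrowid


def get_conversation(conn, conversation_id) -> list:
    """Roughly what a GET /conversations handler would do: read in time order."""
    rows = conn.execute(
        "SELECT body FROM messages WHERE conversation_id = ? ORDER BY sent_at",
        (conversation_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

Decisions like which columns to index belong to low-level design. The boxes and arrows in the high-level diagram stay the same whichever schema you pick.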

This series focuses on high-level design because it is the foundation. You cannot design good internals without understanding the big picture first.

Functional vs Non-Functional Requirements

Every system has two types of requirements:

Functional requirements describe what the system does:

  • Users can send messages
  • Users can upload photos
  • Users can search for other users

Non-functional requirements describe how well the system does it:

  • Scalability — handle 10 million users
  • Availability — 99.99% uptime (less than 53 minutes of downtime per year)
  • Latency — respond in under 200 milliseconds
  • Durability — never lose data
  • Consistency — all users see the same data

Non-functional requirements are what make system design hard. Building a chat app for 10 users is easy. Building one for 10 million users with 99.99% uptime and sub-200ms latency is a real engineering challenge.
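An availability target translates directly into a downtime budget. The short sketch below shows the arithmetic behind the 99.99% figure above. The function name is an assumption for illustration, and a 365-day year is used for simplicity.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Maximum yearly downtime allowed by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% uptime allows {minutes:.1f} minutes of downtime per year")
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why high availability targets get expensive quickly.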

Back-of-the-Envelope Estimation

Before designing a system, you need to estimate the scale. This is called back-of-the-envelope estimation. It uses simple math to figure out how much storage, bandwidth, and computing power you need.

Here are some numbers every developer should know:

Storage:
  1 ASCII character = 1 byte (UTF-8 uses 1 to 4 bytes per character)
  1 KB = 1,000 bytes (a short email)
  1 MB = 1,000 KB (a high-resolution photo)
  1 GB = 1,000 MB (a short movie)
  1 TB = 1,000 GB (a large database)

Time (approximate orders of magnitude):
  1 ns   = L1 cache access
  25 ns  = Mutex lock/unlock (uncontended)
  100 ns = RAM access
  1 ms   = Network round trip (same datacenter, often less)
  1 ms   = Read 1 MB sequentially from SSD
  10 ms  = Disk seek (HDD)
  100 ms = Network round trip (cross-continent)

Scale:
  1 million seconds = ~11.5 days
  1 billion seconds = ~31.7 years

Example estimation: Imagine you are building a photo sharing app with 10 million daily active users. Each user uploads 2 photos per day. Each photo is 2 MB on average.

Daily uploads:
  10,000,000 users x 2 photos = 20,000,000 photos per day

Daily storage:
  20,000,000 photos x 2 MB = 40 TB per day

Yearly storage:
  40 TB x 365 days = 14.6 PB per year

Upload rate:
  20,000,000 photos / 86,400 seconds = ~231 uploads per second

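Now you know the system must absorb 40 TB of new data per day and sustain at least 231 writes per second on average, which at 2 MB per photo is roughly 460 MB per second of incoming data. The write rate alone is manageable, but no single server can keep up with storage that grows by 14.6 PB per year. You need a distributed storage system like Amazon S3 or a similar object store.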

This kind of math helps you make design decisions before writing any code.
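The photo-sharing estimate above can be written as a small reusable function, so you can change one assumption and see how the numbers move. This is a sketch: the function and key names are made up for illustration, and decimal units (1 TB = 1,000,000 MB) are used to match the table above.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def estimate_uploads(daily_users: int, photos_per_user: int, photo_mb: float) -> dict:
    """Back-of-the-envelope numbers for a photo upload workload."""
    photos_per_day = daily_users * photos_per_user
    storage_tb_per_day = photos_per_day * photo_mb / 1_000_000  # MB -> TB
    return {
        "photos_per_day": photos_per_day,
        "storage_tb_per_day": storage_tb_per_day,
        "storage_pb_per_year": storage_tb_per_day * 365 / 1_000,  # TB -> PB
        "uploads_per_second": photos_per_day / SECONDS_PER_DAY,
        "ingest_mb_per_second": photos_per_day * photo_mb / SECONDS_PER_DAY,
    }


# The example from the text: 10 million users, 2 photos each, 2 MB per photo.
for name, value in estimate_uploads(10_000_000, 2, 2).items():
    print(f"{name}: {value:,.1f}")
```

These are daily averages. Real traffic has peaks, so the write rate a system must handle at busy hours will be higher than the average figure.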

The System Design Framework

Whether you are in an interview or designing a real system, follow these five steps:

Step 1: Clarify Requirements (5 minutes)

Never jump into designing. First, ask questions:

  • Who are the users? How many?
  • What are the most important features? (Focus on 3-5 core features)
  • What are the non-functional requirements? (Latency, availability, scale)
  • Are there any constraints? (Budget, existing tech stack, timeline)

Example: “Design a URL shortener”

  • How many URLs per day? (100 million)
  • How long should shortened URLs last? (5 years default)
  • Should users see click analytics? (Yes, basic counts)
  • What is the expected read-to-write ratio? (100:1 — reads are much more common)

Step 2: Estimate Scale (3 minutes)

Use back-of-the-envelope math:

URL Shortener estimates:
  Writes: 100 million URLs/day = ~1,160 URLs/second
  Reads: 100:1 ratio = 116,000 reads/second
  Storage: 100M URLs x 500 bytes = 50 GB/day
  5 years: 50 GB x 365 x 5 = ~91 TB total
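The same estimates can be checked in a few lines of code. The variable names are illustrative, and the inputs are the answers from Step 1. The last line adds one number the estimates above do not show: the total count of URLs over the retention period, which the short-code length has to accommodate.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the clarifying questions in Step 1.
urls_per_day = 100_000_000
read_write_ratio = 100
bytes_per_url = 500
retention_years = 5

writes_per_second = urls_per_day / SECONDS_PER_DAY
reads_per_second = writes_per_second * read_write_ratio
storage_gb_per_day = urls_per_day * bytes_per_url / 1_000_000_000  # bytes -> GB
total_storage_tb = storage_gb_per_day * 365 * retention_years / 1_000  # GB -> TB
total_urls = urls_per_day * 365 * retention_years

print(f"writes/second: {writes_per_second:,.0f}")
print(f"reads/second:  {reads_per_second:,.0f}")
print(f"storage/day:   {storage_gb_per_day:,.0f} GB")
print(f"total storage: {total_storage_tb:,.2f} TB")
print(f"total URLs:    {total_urls:,}")
```

The unrounded figures differ slightly from the ones above because the text rounds the write rate to 1,160 before multiplying. At this level of precision the difference does not matter.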

Step 3: Design the High-Level Architecture (10 minutes)

Draw the main components and how they connect:

URL Shortener — High-Level Design:

[Client] --> [Load Balancer] --> [App Server 1] --> [Database]
                              --> [App Server 2] --> [Cache (Redis)]
                              --> [App Server N]

Write flow:
  1. Client sends long URL
  2. App server generates short code
  3. Store mapping in database
  4. Return short URL

Read flow:
  1. Client visits short URL
  2. App server checks cache first
  3. If not in cache, check database
  4. Redirect to long URL
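The write and read flows above can be sketched in a few lines. In this simplified example, plain dictionaries stand in for the database and for Redis, and the class and method names are assumptions made for illustration. Short code generation is only a placeholder counter here.

```python
import itertools
from typing import Optional


class UrlShortener:
    """In-memory stand-in: dicts play the roles of the database and the cache."""

    def __init__(self) -> None:
        self.database = {}  # short code -> long URL (source of truth)
        self.cache = {}     # short code -> long URL (recently read entries)
        self._ids = itertools.count(1)

    def shorten(self, long_url: str) -> str:
        # Write flow: generate a code, store the mapping, return the code.
        code = f"u{next(self._ids)}"  # placeholder, not a real code scheme
        self.database[code] = long_url
        return code

    def resolve(self, code: str) -> Optional[str]:
        # Read flow: check the cache first.
        if code in self.cache:
            return self.cache[code]
        # Cache miss: fall back to the database and remember the result.
        long_url = self.database.get(code)
        if long_url is not None:
            self.cache[code] = long_url
        return long_url


shortener = UrlShortener()
code = shortener.shorten("https://example.com/a/very/long/path")
print(shortener.resolve(code))  # first read comes from the database
print(shortener.resolve(code))  # second read is served from the cache
```

A real cache would also need an eviction policy and expiry times, which this sketch leaves out.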

Step 4: Deep Dive (15 minutes)

Pick the most important or complex parts and go deeper:

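  • Short code generation: Use Base62 encoding (a-z, A-Z, 0-9). A 7-character code gives 62^7 = about 3.5 trillion unique codes, far more than the roughly 180 billion URLs created over 5 years at 100 million per day.
  • Database choice: Key-value store like DynamoDB or a simple SQL table. The access pattern is a lookup by short code, and reads are much more frequent than writes.
  • Caching: Use Redis to cache popular URLs. Traffic usually follows a power law: roughly 20% of URLs get 80% of the traffic, so a cache of the hot entries absorbs most reads.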
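One common way to generate short codes is to give each URL a unique numeric ID and encode that ID in Base62. The sketch below assumes such an ID already exists (for example, from a database sequence). The function names and the order of characters in the alphabet are arbitrary choices made for illustration.

```python
import string

# 0-9, a-z, A-Z: 62 characters. The ordering is an arbitrary choice.
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62


def encode_base62(number: int) -> str:
    """Turn a non-negative integer ID into a Base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    digits = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Turn a Base62 string back into the integer ID."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number


print(encode_base62(125))             # a small ID gives a short code
print(len(encode_base62(62**7 - 1)))  # the largest 7-character code
print(f"{62**7:,}")                   # number of distinct 7-character codes
```

Sequential IDs produce predictable codes. If that is a concern, other approaches such as hashing or random codes with collision checks are possible, each with its own trade-offs.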

Step 5: Discuss Trade-offs (5 minutes)

Every design has trade-offs. Good engineers explain them:

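  • SQL vs NoSQL? SQL databases give you transactions and strong consistency out of the box. Many NoSQL stores relax those guarantees to make horizontal scaling easier.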
  • Cache everything vs cache popular only? More cache costs more money but reduces latency.
  • Single datacenter vs multiple? Multiple datacenters give better availability but add complexity.

Common System Design Topics

This series will cover all of these topics:

Foundations:
  [1] What is System Design (this article)
  [2] Scalability — Horizontal vs Vertical Scaling
  [3] Load Balancers — How They Work
  [4] Caching — Redis, Memcached, CDN
  [5] Databases — SQL vs NoSQL, Sharding, Replication
  [6] CAP Theorem and Consistency Patterns

Building Blocks:
  [7]  Message Queues — Kafka, RabbitMQ, SQS
  [8]  API Design — REST, GraphQL, gRPC
  [9]  Microservices vs Monolith
  [10] Proxies — Forward, Reverse, and API Gateway
  [11] Rate Limiting and Throttling
  [12] Consistent Hashing

Real System Designs:
  [13] Design a URL Shortener
  [14] Design a Chat System (WhatsApp)
  [15] Design a News Feed (Twitter/Instagram)
  [16] Design a Video Streaming Platform (YouTube/Netflix)
  [17] Design a Notification System
  [18] Design a Search Engine

Advanced:
  [19] Distributed Transactions and Saga Pattern
  [20] Observability — Monitoring, Logging, Tracing

Who Needs System Design?

You might think system design is only for senior engineers at big tech companies. That is not true.

Junior developers benefit from system design because it teaches you to think about the big picture. When you understand how systems work, you write better code. You make better decisions about data structures, APIs, and error handling.

Mid-level developers need system design to advance their careers. The gap between mid-level and senior is not about writing more code. It is about making better architecture decisions. System design is that skill.

Frontend developers benefit because modern frontends interact with complex backends. Understanding caching, CDNs, and API design helps you build faster, more reliable user interfaces.

Mobile developers benefit because mobile apps are clients to backend systems. Understanding how the backend scales helps you design better APIs, handle offline mode, and optimize network usage.

DevOps engineers benefit because they deploy and maintain the systems that developers design. Understanding system design helps you make better infrastructure decisions.

In short, if you build software that other people use, you need system design.

Interview Tips

If you are preparing for system design interviews, remember these tips:

  1. Always clarify requirements first. Interviewers want to see that you ask questions before jumping into solutions.
  2. Think out loud. Explain your reasoning. Say “I am choosing a NoSQL database here because the data has no fixed schema and read performance is critical.”
  3. Start simple, then optimize. Begin with a basic design that works. Then add caching, load balancing, and sharding as needed.
  4. Discuss trade-offs. There is no perfect design. Show that you understand the pros and cons of your choices.
  5. Use real numbers. Back-of-the-envelope estimation shows you understand scale.

What’s Next?

In the next article, System Design #2: Scalability — Horizontal vs Vertical Scaling, you will learn:

  • What scalability means and why it matters
  • Vertical scaling (bigger machines) vs horizontal scaling (more machines)
  • Stateless vs stateful services
  • How Netflix, Uber, and Instagram handle scale
  • When to scale up vs scale out

This is part 1 of the System Design Tutorial series. Follow along to learn system design from scratch.