In the previous article, you designed a news feed. Now let us design a video streaming service like YouTube or Netflix.

Video streaming is a complex system with two major pipelines: uploading and processing videos, and streaming them to viewers. Let us break it down step by step.

Step 1: Requirements

Functional Requirements

  1. Upload videos
  2. Stream/watch videos
  3. Search for videos
  4. Like, comment, and subscribe
  5. Video recommendations
  6. Multiple video quality options (360p, 720p, 1080p, 4K)

Non-Functional Requirements

  1. High availability — videos should always be watchable
  2. Low latency — video should start playing within 2 seconds
  3. Smooth playback — no buffering on stable connections
  4. Support 1 billion daily active users
  5. Support 5 billion video views per day

Step 2: Estimation

Daily Active Users: 1 billion
Video views per day: 5 billion
Videos uploaded per day: 500,000

Average video size (original): 500 MB
Average video duration: 5 minutes

Upload storage per day:
  500,000 videos * 500 MB = 250 TB/day (original files)

After transcoding (multiple resolutions + formats):
  Each video -> 4 resolutions * 3 formats = 12 versions
  Average transcoded version: 100 MB
  500,000 * 12 * 100 MB = 600 TB/day (transcoded files)

Total storage per day: 250 TB + 600 TB = 850 TB/day (~0.85 PB/day)
Total storage per year: ~310 PB/year

Streaming bandwidth:
  5 billion views/day
  Average bitrate: 5 Mbps (1080p)
  Average watch time: 3 minutes
  Total data streamed: 5B * 5 Mbps * 180 sec = 4.5 exabits/day
  Average bandwidth: 4.5 exabits / 86,400 sec = ~52 Tbps
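
The streaming arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and it assumes views are spread evenly over the day and that units are decimal (1 TB = 1,000,000 MB).

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def total_bits_per_day(views_per_day: int, bitrate_bps: int, watch_seconds: int) -> int:
    """Bits streamed per day: views * bitrate * seconds watched per view."""
    return views_per_day * bitrate_bps * watch_seconds


def average_bandwidth_bps(bits_per_day: int) -> float:
    """Average egress rate, assuming traffic is spread evenly over the day."""
    return bits_per_day / SECONDS_PER_DAY


# Numbers from the estimate: 5 billion views, 5 Mbps, 3 minutes watched.
bits = total_bits_per_day(5_000_000_000, 5_000_000, 180)
avg_tbps = average_bandwidth_bps(bits) / 1e12

# Original uploads: 500,000 videos * 500 MB, converted from MB to TB.
original_tb_per_day = 500_000 * 500 / 1_000_000

print(f"{bits / 1e18:.1f} exabits/day")
print(f"{avg_tbps:.0f} Tbps average")
print(f"{original_tb_per_day:.0f} TB/day of original uploads")
```

Keep in mind that this is an average. Peak-hour traffic is higher, so real capacity planning needs headroom on top of this figure.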

Step 3: Two Main Pipelines

A video streaming service has two distinct pipelines that work independently.

Pipeline 1: Video Upload + Processing
  [Creator uploads video] --> [Process] --> [Store in multiple formats]

Pipeline 2: Video Streaming
  [Viewer requests video] --> [Serve from CDN] --> [Adaptive playback]

These are separate systems with different requirements:
  Upload: write-heavy, can be slow (minutes), needs processing
  Streaming: read-heavy, latency-sensitive (responses in milliseconds)

Step 4: Video Upload Pipeline

Upload Flow

Video Upload Flow:

  1. Creator selects a video file on their device
  2. Client splits the file into chunks (5 MB each)
  3. Chunks are uploaded in parallel to the upload service
  4. Upload service reassembles chunks and stores the original
  5. Upload service sends a "video uploaded" event to Kafka
  6. Transcoding pipeline picks up the event

  Why chunked upload?
    - Resume interrupted uploads (re-upload only failed chunks)
    - Parallel upload for faster speeds
    - Progress tracking per chunk

  [Client] --chunks--> [Upload Service] --store--> [Original Storage (S3)]
                                |
                          [Kafka: "video_uploaded"]
                                |
                     [Transcoding Pipeline]
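
As a rough illustration of why chunking makes uploads resumable, here is a minimal sketch of the client-side bookkeeping. The names (Chunk, plan_chunks, chunks_to_retry, progress) are hypothetical, and no real network calls are made: the set of acknowledged chunk indexes stands in for whatever the upload service reports back.

```python
from dataclasses import dataclass

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB per chunk, as in the upload flow


@dataclass(frozen=True)
class Chunk:
    index: int   # position of the chunk within the file
    offset: int  # byte offset where the chunk starts
    length: int  # number of bytes in the chunk


def plan_chunks(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[Chunk]:
    """Split a file of file_size bytes into fixed-size chunks (last one may be shorter)."""
    chunks = []
    offset = 0
    while offset < file_size:
        length = min(chunk_size, file_size - offset)
        chunks.append(Chunk(len(chunks), offset, length))
        offset += length
    return chunks


def chunks_to_retry(plan: list[Chunk], acknowledged: set[int]) -> list[Chunk]:
    """Chunks the server has not confirmed; only these are re-uploaded on resume."""
    return [c for c in plan if c.index not in acknowledged]


def progress(plan: list[Chunk], acknowledged: set[int]) -> float:
    """Fraction of bytes confirmed so far (0.0 to 1.0)."""
    total = sum(c.length for c in plan)
    done = sum(c.length for c in plan if c.index in acknowledged)
    return done / total if total else 1.0


# A 12 MB file becomes three chunks: 5 MB, 5 MB, 2 MB.
plan = plan_chunks(12 * 1024 * 1024)
acked = {0, 2}  # chunk 1 failed mid-upload
print([c.index for c in chunks_to_retry(plan, acked)])
print(f"{progress(plan, acked):.0%} uploaded")
```

Because each chunk carries its own offset and length, chunks can be sent in parallel and in any order, and the service can reassemble them by index.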

Video Transcoding

Transcoding converts the original video into multiple resolutions and formats so it plays on any device and network speed.

Transcoding Pipeline (DAG):

  Original video (4K, 2 GB, MOV format)
        |
   [Split into segments] (10-second chunks for parallel processing)
        |
   +----+----+----+----+
   |    |    |    |    |
   v    v    v    v    v
  [Segment 1] [Segment 2] [Segment 3] ... [Segment N]
   Each segment is transcoded independently:
        |
   +----+----+----+
   |    |    |    |
   v    v    v    v
  [360p] [720p] [1080p] [4K]
   H.264   H.264   H.264   H.264
   H.265   H.265   H.265   H.265
   VP9     VP9     VP9     VP9
        |
   [Generate thumbnails] (every 10 seconds for preview scrubbing)
   [Extract audio tracks] (separate audio for different languages)
   [Generate subtitles] (auto-generated captions with speech-to-text)
        |
   [Reassemble segments into complete video files]
        |
   [Store all versions in blob storage (S3)]
        |
   [Update video metadata: "transcoding complete"]
        |
   [Replicate to CDN edge servers]

  Output for one 5-minute video:
    12 video files (4 resolutions * 3 formats)
    + thumbnails
    + audio tracks
    + subtitles
    Total: ~3 GB stored per video
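
To make the fan-out of the DAG concrete, the sketch below enumerates the independent transcode tasks for one video. It assumes the 10-second segments, four resolutions, and three codecs shown in the pipeline; the function name transcode_tasks is made up for illustration, and a real scheduler would attach far more metadata to each task.

```python
import math
from itertools import product

RESOLUTIONS = ["360p", "720p", "1080p", "4K"]
CODECS = ["H.264", "H.265", "VP9"]
SEGMENT_SECONDS = 10  # segment length used for parallel processing


def transcode_tasks(duration_seconds: int) -> list[tuple[int, str, str]]:
    """One task per (segment, resolution, codec); each can run on a different worker."""
    segment_count = math.ceil(duration_seconds / SEGMENT_SECONDS)
    return list(product(range(segment_count), RESOLUTIONS, CODECS))


tasks = transcode_tasks(5 * 60)  # a 5-minute video
outputs = {(resolution, codec) for _, resolution, codec in tasks}

print(len(tasks), "independent tasks")   # 30 segments * 4 * 3
print(len(outputs), "output renditions")  # 4 * 3
```

The point of splitting first is that a single long video no longer ties up one machine for the whole job: the tasks are small and can be spread across however many workers are free.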

Why Multiple Formats?

Video Formats:

  H.264 (AVC):
    - Universal compatibility (plays on everything)
    - Older codec, larger file sizes
    - Default choice for maximum device support

  H.265 (HEVC):
    - 30-50% smaller files than H.264 at same quality
    - Requires newer devices
    - Used by Apple devices, newer Android phones

  VP9:
    - Google's open-source codec
    - Similar compression to H.265
    - Used by YouTube, Chrome, Android

  AV1 (newest):
    - Roughly 30% smaller files than H.265 at the same quality
    - Open-source, royalty-free
    - Slow to encode but great quality
    - YouTube is migrating to AV1 for popular videos

Step 5: Video Streaming Pipeline

Adaptive Bitrate Streaming

Adaptive bitrate streaming is the key technology behind smooth video playback: the video player automatically switches between quality levels based on the viewer’s bandwidth.

Adaptive Bitrate Streaming:

  Video is split into small segments (2-10 seconds each).
  Each segment is available in multiple quality levels.

  Manifest file (playlist):
    #EXTM3U
    #EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
    video_360p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=1280x720
    video_720p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1920x1080
    video_1080p/playlist.m3u8
    #EXT-X-STREAM-INF:BANDWIDTH=15000000,RESOLUTION=3840x2160
    video_4k/playlist.m3u8

  The player:
    1. Downloads the manifest file
    2. Starts with a low-quality segment
    3. Measures download speed
    4. If bandwidth is high: switch to higher quality
    5. If bandwidth drops: switch to lower quality
    6. Switches happen per segment (every 2-10 seconds)

  User experience:
    Fast network: 1080p or 4K playback
    Slow network: 360p or 720p (but no buffering!)
    Network fluctuates: quality adjusts automatically
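
The player logic can be sketched as follows. This is a simplified illustration, not how any particular player is implemented: it parses only the BANDWIDTH attribute from the manifest shown above, and the 0.8 safety factor and the function names are assumptions. Real players also consider buffer level and throughput history.

```python
import re

MASTER_PLAYLIST = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
video_360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=1280x720
video_720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1920x1080
video_1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=15000000,RESOLUTION=3840x2160
video_4k/playlist.m3u8
"""


def parse_variants(playlist: str) -> list[tuple[int, str]]:
    """Return (bandwidth_bps, playlist_uri) pairs, sorted from lowest to highest."""
    lines = [line.strip() for line in playlist.splitlines() if line.strip()]
    variants = []
    for i, line in enumerate(lines):
        if line.startswith("#EXT-X-STREAM-INF:") and i + 1 < len(lines):
            # Require ':' or ',' before the name so AVERAGE-BANDWIDTH is not matched.
            match = re.search(r"[:,]BANDWIDTH=(\d+)", line)
            if match:
                variants.append((int(match.group(1)), lines[i + 1]))
    return sorted(variants)


def choose_variant(variants: list[tuple[int, str]], measured_bps: float,
                   safety: float = 0.8) -> tuple[int, str]:
    """Pick the highest variant that fits within a fraction of measured throughput.

    Falls back to the lowest variant when nothing fits, so playback can continue.
    """
    budget = measured_bps * safety
    affordable = [v for v in variants if v[0] <= budget]
    return affordable[-1] if affordable else variants[0]


variants = parse_variants(MASTER_PLAYLIST)
for measured in (25_000_000, 6_000_000, 1_000_000, 100_000):
    bandwidth, uri = choose_variant(variants, measured)
    print(f"measured {measured:>10} bps -> {uri}")
```

The player would rerun choose_variant before each segment request, which is why quality changes happen at segment boundaries.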

Streaming Protocols

HLS (HTTP Live Streaming):
  - Created by Apple
  - Uses .m3u8 manifest files
  - Segment size: 6-10 seconds (default)
  - Supported by: Safari, iOS, most devices
  - Most widely used streaming protocol

DASH (Dynamic Adaptive Streaming over HTTP):
  - Open standard (ISO)
  - Uses .mpd manifest files
  - Segment size: 2-10 seconds
  - Supported by: Chrome, Firefox, Android
  - Used by YouTube, Netflix

Both protocols:
  - Use HTTP (works with standard CDNs and firewalls)
  - Support adaptive bitrate
  - Split video into segments

YouTube uses DASH. Netflix uses both HLS and DASH.

Step 6: CDN (Content Delivery Network)

A CDN is the backbone of video streaming. It stores copies of video segments on edge servers close to viewers.

CDN Architecture:

  Without CDN:
    Viewer in Tokyo --> Origin server in US --> 200ms latency per segment
    Buffering, slow start

  With CDN:
    Viewer in Tokyo --> CDN edge in Tokyo --> 5ms latency per segment
    Smooth playback, instant start

  CDN Strategy:
    Popular videos: cached on edge servers worldwide (push)
    Long-tail videos: cached on-demand when requested (pull)

  [Viewer] --> [CDN Edge Server (Tokyo)]
                    |
                    | Cache miss? Fetch from regional hub
                    v
              [CDN Regional Hub (Singapore)]
                    |
                    | Cache miss? Fetch from origin
                    v
              [Origin Storage (US)]

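  Cache layers (share of all requests served at each layer):
    L1: CDN edge (closest to user) -- serves ~70% of requests
    L2: CDN regional hub            -- serves ~20% (about two-thirds of edge misses)
    L3: Origin storage              -- the remaining ~10% of requests reach here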
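
The edge, hub, and origin lookup can be modeled with a small sketch. The CacheTier class and the segment key are hypothetical, and the sketch deliberately ignores eviction and expiry, which a real CDN cache would need (for example LRU with TTLs).

```python
class CacheTier:
    """One layer in the hierarchy; misses are forwarded to the parent tier."""

    def __init__(self, name: str, parent: "CacheTier | None" = None):
        self.name = name
        self.parent = parent
        self.store: dict[str, bytes] = {}

    def get(self, key: str) -> tuple[bytes, str]:
        """Return (data, name of the tier that actually had the data)."""
        if key in self.store:
            return self.store[key], self.name
        if self.parent is None:
            raise KeyError(key)  # not even the origin has it
        data, served_by = self.parent.get(key)
        self.store[key] = data  # fill this tier on the way back to the viewer
        return data, served_by


origin = CacheTier("origin (US)")
hub = CacheTier("regional hub (Singapore)", parent=origin)
edge_tokyo = CacheTier("edge (Tokyo)", parent=hub)
edge_osaka = CacheTier("edge (Osaka)", parent=hub)  # second edge, for illustration

segment = "vid_001/1080p/segment_0001"
origin.store[segment] = b"video bytes"

print(edge_tokyo.get(segment)[1])  # first request falls through to the origin
print(edge_tokyo.get(segment)[1])  # repeat request is served by the Tokyo edge
print(edge_osaka.get(segment)[1])  # a nearby edge is served by the hub
```

This is the pull model used for long-tail videos. Pushing a popular video amounts to writing it into every edge store ahead of the first request.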

Cost Optimization

Video Popularity Distribution:

  Views follow a heavily skewed (power-law) distribution, steeper than the
  classic 80/20 Pareto rule:
    Top 1% of videos: 80% of all views
    Top 10% of videos: 95% of all views
    Remaining 90% of videos: 5% of views (long tail)

  Cost strategy:
    Popular videos (top 10%):
      - Cached on all CDN edge servers worldwide
      - Transcoded in all formats and resolutions
      - Highest priority for CDN replication

    Long-tail videos (bottom 90%):
      - Cached only in the region where viewers are
      - Transcoded in fewer formats (H.264 only)
      - Lower resolution options only (up to 1080p)
      - Stored on cheaper storage (S3 Infrequent Access)

  Under these assumptions, this can save roughly 50-70% on CDN and storage costs.
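
One way to express the cost strategy is as a small policy function. This is a sketch only: the EncodingPlan type, the plan_for function, and the idea of feeding it a popularity percentile are illustrative assumptions, while the tiers themselves mirror the strategy listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodingPlan:
    codecs: tuple[str, ...]
    max_resolution: str
    storage_class: str
    cdn_strategy: str


POPULAR = EncodingPlan(
    codecs=("H.264", "H.265", "VP9"),
    max_resolution="4K",
    storage_class="S3 Standard",
    cdn_strategy="push to all edge servers",
)

LONG_TAIL = EncodingPlan(
    codecs=("H.264",),
    max_resolution="1080p",
    storage_class="S3 Infrequent Access",
    cdn_strategy="pull on demand, regional only",
)


def plan_for(popularity_percentile: float) -> EncodingPlan:
    """popularity_percentile: 0 means most viewed, 100 means least viewed."""
    if not 0.0 <= popularity_percentile <= 100.0:
        raise ValueError("percentile must be between 0 and 100")
    return POPULAR if popularity_percentile <= 10.0 else LONG_TAIL


print(plan_for(0.5).cdn_strategy)  # a top-1% video
print(plan_for(60.0).codecs)       # a long-tail video
```

Since popularity changes over time, a plan like this would need to be re-evaluated periodically, so a long-tail video that goes viral can be promoted.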

Step 7: Video Metadata

Video metadata (title, description, view count, likes) is stored separately from video files.

Video Metadata:

  Database: PostgreSQL (or MySQL)
    Strong consistency for video records and likes (view counts are batched, see below)
    Built-in full-text search capability

  Schema:
    videos:
      | id        | title              | description | uploader_id | status      |
      |-----------|--------------------|-------------|-------------|-------------|
      | vid_001   | "Go Tutorial #1"   | "Learn..."  | user_123    | published   |
      | vid_002   | "System Design"    | "How to..." | user_456    | transcoding |

    video_stats:
      | video_id  | views      | likes    | dislikes | comments |
      |-----------|------------|----------|----------|----------|
      | vid_001   | 1,234,567  | 45,000   | 200      | 3,400    |

  View count problem:
    Updating view count on every view would overwhelm the database.
    Solution: batch updates.
      1. Count views in Redis: INCR views:vid_001
      2. Every 60 seconds, flush Redis counts to PostgreSQL
      3. View count is eventually consistent (acceptable)
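
The batching idea can be sketched without any external services. In the sketch below, an in-memory Counter stands in for Redis and a plain dict stands in for the PostgreSQL table; the ViewCounter class and flush function are hypothetical names, and in production the flush would run on a timer (every 60 seconds in the design above).

```python
import threading
from collections import Counter


class ViewCounter:
    """In-memory view counts (standing in for Redis INCR)."""

    def __init__(self) -> None:
        self._counts: Counter[str] = Counter()
        self._lock = threading.Lock()

    def record_view(self, video_id: str) -> None:
        with self._lock:
            self._counts[video_id] += 1

    def drain(self) -> dict[str, int]:
        """Atomically take the pending counts and reset them to zero."""
        with self._lock:
            batch, self._counts = dict(self._counts), Counter()
        return batch


def flush(counter: ViewCounter, totals: dict[str, int]) -> int:
    """Apply one batch to the durable totals; returns the number of rows touched."""
    batch = counter.drain()
    for video_id, delta in batch.items():
        totals[video_id] = totals.get(video_id, 0) + delta
    return len(batch)


counter = ViewCounter()
database = {"vid_001": 1_234_567}  # stand-in for the video_stats table

for _ in range(1000):
    counter.record_view("vid_001")
counter.record_view("vid_002")

rows = flush(counter, database)
print(rows, "rows updated for 1001 views")
print(database)
```

A thousand views of one video collapse into a single row update, which is where the database savings come from. The trade-off is that counts still in memory are lost if the process crashes before a flush.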

Step 8: Video Search

Video Search:

  Technology: Elasticsearch

  Index fields:
    - title (highest weight)
    - description
    - tags
    - channel name
    - auto-generated captions

  When a video is uploaded:
    1. Extract metadata
    2. Generate captions (speech-to-text)
    3. Index in Elasticsearch

  Query: "go tutorial for beginners"
    --> Elasticsearch matches title, description, tags
    --> Results ranked by: relevance + view count + recency + channel authority
    --> Return top 20 results

  Autocomplete:
    - Trie data structure or Elasticsearch "completion suggester"
    - Shows suggestions as the user types
    - Based on popular search queries
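
For the trie option, a minimal sketch might look like this. The class names and the sample queries with their counts are made up for illustration; a production system would more likely precompute the top suggestions per prefix, or rely on the Elasticsearch completion suggester instead.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.count = 0  # how often a query ending at this node was searched


class Autocomplete:
    def __init__(self) -> None:
        self.root = TrieNode()

    def record(self, query: str, times: int = 1) -> None:
        """Register a search query, creating one trie node per character."""
        node = self.root
        for ch in query.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.count += times

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        """Return recorded queries starting with prefix, most popular first."""
        prefix = prefix.lower()
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found = []
        stack = [(node, prefix)]
        while stack:  # walk the whole subtree under the prefix
            current, text = stack.pop()
            if current.count:
                found.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        found.sort(key=lambda item: (-item[0], item[1]))  # ties broken alphabetically
        return [text for _, text in found[:limit]]


ac = Autocomplete()
ac.record("go tutorial", 50)
ac.record("go tutorial for beginners", 120)
ac.record("golang concurrency", 30)
ac.record("system design", 80)

print(ac.suggest("go"))
print(ac.suggest("sys"))
```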

Step 9: Recommendation Engine

Recommendation (simplified):

  Two main approaches:

  1. Collaborative Filtering:
     "Users who watched X also watched Y"
     Based on viewing patterns of similar users.

  2. Content-Based Filtering:
     "This video is about Go programming, here are more Go videos"
     Based on video attributes (title, tags, category).

  In practice (YouTube's approach):
    1. Candidate generation:
       - Get 1000 candidate videos from multiple sources
       - "Popular in your country"
       - "Similar to what you watched"
       - "From channels you subscribe to"

    2. Ranking:
       - ML model scores each candidate
       - Features: watch history, click-through rate, video freshness
       - Top 20-50 videos shown to the user

  For interview purposes: mention both approaches and say
  "the ranking model considers watch history, engagement, and
  content similarity." Do not go deep into ML unless asked.
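
If the interviewer does ask for more detail, the "users who watched X also watched Y" idea can be shown with simple co-watch counting. This is a toy sketch with made-up watch histories and hypothetical function names. It is not YouTube's actual method, which relies on learned models rather than raw counts.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_co_watch(histories: dict[str, list[str]]) -> dict[str, Counter]:
    """For every video, count how many users also watched each other video."""
    co_watch: dict[str, Counter] = defaultdict(Counter)
    for watched in histories.values():
        for a, b in permutations(set(watched), 2):
            co_watch[a][b] += 1
    return co_watch


def recommend(user: str, histories: dict[str, list[str]],
              co_watch: dict[str, Counter], limit: int = 3) -> list[str]:
    """Score unseen videos by how often they co-occur with the user's history."""
    seen = set(histories[user])
    scores: Counter = Counter()
    for video in seen:
        for other, count in co_watch.get(video, {}).items():
            if other not in seen:
                scores[other] += count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video for video, _ in ranked[:limit]]


histories = {
    "alice": ["go_basics", "go_concurrency"],
    "bob": ["go_basics", "go_concurrency", "go_generics"],
    "carol": ["go_basics", "system_design"],
    "dave": ["go_basics"],
}

co_watch = build_co_watch(histories)
print(recommend("dave", histories, co_watch))
print(recommend("alice", histories, co_watch))
```

In the two-stage design above, something like this would be one candidate source among several, and the ranking model would then reorder its output.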

Step 10: Complete Architecture

Video Upload Pipeline:

  [Creator] --chunks--> [Upload Service]
                              |
                        [Original Storage (S3)]
                              |
                        [Kafka: video_uploaded]
                              |
                    [Transcoding Service (DAG)]
                       /    |    \
               [360p] [720p] [1080p] [4K]
                       \    |    /
                    [Transcoded Storage (S3)]
                              |
                    [CDN Replication]
                              |
                    [Update Metadata: "ready"]

Video Streaming Pipeline:

  [Viewer] --> [CDN Edge]
                   |
              [Cache hit?] -- yes --> [Stream segments]
                   |
                  no
                   |
              [CDN Regional Hub]
                   |
              [Cache hit?] -- yes --> [Stream + cache at edge]
                   |
                  no
                   |
              [Origin Storage (S3)]
                   |
              [Stream + cache at hub + edge]

Supporting Services:

  [Metadata Service] --> [PostgreSQL + Redis]
  [Search Service] --> [Elasticsearch]
  [Recommendation Service] --> [ML Model + Feature Store]
  [Analytics Service] --> [Kafka + ClickHouse]
  [Notification Service] --> [APNs / FCM]

Handling Scale

Scaling Strategy:

  Upload Service:
    - Horizontally scaled (stateless)
    - 500K uploads/day = ~6 uploads/sec
    - 10-20 upload servers is enough

  Transcoding Service:
    - The most compute-intensive part
    - Each video takes 10-30 minutes to transcode
    - 500K videos/day = ~350 videos/min
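    - ~350 videos/min * 10-30 min each = 3,500-10,500 videos in flight,
      so plan for thousands of transcoding workers, not hundreds
    - Use spot instances (AWS) for cost savings (often around 70% cheaper)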
    - Priority queue: process popular channels first

  CDN:
    - Use a major CDN provider (Akamai, Cloudflare, AWS CloudFront)
    - Or build your own (YouTube, Netflix)
    - Netflix built Open Connect: custom servers in ISP data centers
    - YouTube uses Google's private network

  Database:
    - Metadata: PostgreSQL with read replicas
    - View counts: Redis + periodic flush to PostgreSQL
    - Comments: sharded by video_id
    - Search: Elasticsearch cluster

  Storage:
    - Hot storage (S3 Standard): popular videos, recent uploads
    - Cold storage (S3 Glacier): old, rarely viewed videos
    - Lifecycle policy: move to cold after 90 days of no views
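
The upload and transcoding figures above can be sanity-checked with Little's Law (items in flight = arrival rate * time each item spends in the system). The sketch below uses only the numbers from this section and made-up function names. How many videos a single worker can process at once is left open, so the result is a count of videos in flight rather than a machine count.

```python
MINUTES_PER_DAY = 24 * 60
SECONDS_PER_DAY = MINUTES_PER_DAY * 60


def arrivals_per_minute(items_per_day: int) -> float:
    """Average arrival rate, assuming uploads are spread evenly over the day."""
    return items_per_day / MINUTES_PER_DAY


def in_flight(arrival_rate_per_min: float, minutes_per_item: float) -> float:
    """Little's Law: average number of items being processed at any moment."""
    return arrival_rate_per_min * minutes_per_item


uploads_per_day = 500_000
rate = arrivals_per_minute(uploads_per_day)
low = in_flight(rate, 10)   # every video takes 10 minutes
high = in_flight(rate, 30)  # every video takes 30 minutes

print(f"{uploads_per_day / SECONDS_PER_DAY:.1f} uploads/sec")
print(f"{rate:.0f} videos/min arriving")
print(f"{low:.0f} to {high:.0f} videos being transcoded at any moment")
```

Since uploads are not evenly spread in practice, the queue in front of the transcoders absorbs bursts, and the worker pool can be sized closer to the average than to the peak.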

Common Mistakes

  1. Streaming the original file. Always transcode to multiple resolutions and use adaptive bitrate streaming. The original file is too large and only in one format.

  2. Not using a CDN. Serving video from a central origin server adds latency and costs a fortune in bandwidth. CDN is not optional for video streaming.

  3. Synchronous transcoding. Transcoding takes minutes. Never make the user wait. Use async processing with a message queue and notify when ready.

  4. Counting views synchronously. Updating a database counter for every view at 5 billion views/day would kill the database. Use in-memory counting with periodic batch updates.

Interview Tips

  1. Separate the two pipelines early. “There are two main flows: the upload/transcoding pipeline and the streaming pipeline. Let me design each.”

  2. Mention adaptive bitrate streaming. “I will use HLS or DASH with adaptive bitrate. The player adjusts quality based on the viewer’s bandwidth.”

  3. Discuss the DAG for transcoding. “Transcoding is a directed acyclic graph of tasks: split into segments, transcode each in parallel, generate thumbnails, extract audio.”

  4. Talk about CDN strategy. “Popular videos are pushed to all CDN edges. Long-tail videos are cached on-demand.”

  5. Mention cost optimization. “90% of videos get 5% of views. I will use cheaper storage and fewer transcoded formats for long-tail content.”

What’s Next?

In the next article, System Design #17: Design a File Storage System, you will learn:

  • Block storage and file chunking
  • File sync across devices
  • Deduplication to save storage
  • How Google Drive and Dropbox handle file sync

This is part 16 of the System Design Tutorial series. Follow along to learn system design from scratch.