In the previous article, you designed a news feed. Now let us design a video streaming service like YouTube or Netflix.
Video streaming is a complex system with two major pipelines: uploading and processing videos, and streaming them to viewers. Let us break it down step by step.
Step 1: Requirements
Functional Requirements
- Upload videos
- Stream/watch videos
- Search for videos
- Like, comment, and subscribe
- Video recommendations
- Multiple video quality options (360p, 720p, 1080p, 4K)
Non-Functional Requirements
- High availability — videos should always be watchable
- Low latency — video should start playing within 2 seconds
- Smooth playback — no buffering on stable connections
- Support 1 billion daily active users
- Support 5 billion video views per day
Step 2: Estimation
Daily Active Users: 1 billion
Video views per day: 5 billion
Videos uploaded per day: 500,000
Average video size (original): 500 MB
Average video duration: 5 minutes
Upload storage per day:
500,000 videos * 500 MB = 250 TB/day (original files)
After transcoding (multiple resolutions + formats):
Each video -> 5 resolutions * 3 formats = 15 versions (assuming one extra rung, such as 480p, beyond the four resolutions listed in the requirements)
Average transcoded version: 100 MB
500,000 * 15 * 100 MB = 750 TB/day (transcoded files)
Total storage per day: ~1 PB/day
Total storage per year: ~365 PB/year
Streaming bandwidth:
5 billion views/day
Average bitrate: 5 Mbps (1080p)
Average watch time: 3 minutes
Total data streamed: 5B views * 5 Mbps * 180 sec = 4.5 exabits/day
Average bandwidth: 4.5 exabits / 86,400 sec = ~52 Tbps (peak traffic will be higher)
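The arithmetic above can be reproduced with a short script. This is only a sketch of the estimate, using decimal units and exactly the inputs listed in this step; the variable names are illustrative.

```python
# Back-of-envelope estimates using the figures from this section.
# Decimal units (1 TB = 10**12 bytes), which is normal for rough sizing.
MB, TB, PB = 10**6, 10**12, 10**15

uploads_per_day = 500_000
original_size = 500 * MB
versions_per_video = 15        # 5 resolutions * 3 formats
transcoded_size = 100 * MB     # average per transcoded version

original_per_day = uploads_per_day * original_size
transcoded_per_day = uploads_per_day * versions_per_video * transcoded_size
total_per_day = original_per_day + transcoded_per_day

views_per_day = 5_000_000_000
bitrate_bps = 5_000_000        # 5 Mbps
watch_seconds = 3 * 60

bits_per_day = views_per_day * bitrate_bps * watch_seconds
average_bps = bits_per_day / 86_400   # seconds in a day

print(f"Originals:   {original_per_day / TB:.0f} TB/day")
print(f"Transcoded:  {transcoded_per_day / TB:.0f} TB/day")
print(f"Total:       {total_per_day / PB:.1f} PB/day")
print(f"Streamed:    {bits_per_day / 1e18:.1f} exabits/day")
print(f"Average:     {average_bps / 1e12:.0f} Tbps")
```

Changing any single input (for example the average watch time) shows quickly which assumption dominates the result.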
Step 3: Two Main Pipelines
A video streaming service has two distinct pipelines that work independently.
Pipeline 1: Video Upload + Processing
[Creator uploads video] --> [Process] --> [Store in multiple formats]
Pipeline 2: Video Streaming
[Viewer requests video] --> [Serve from CDN] --> [Adaptive playback]
These are separate systems with different requirements:
Upload: write-heavy, can be slow (minutes), needs processing
Streaming: read-heavy, must be fast (milliseconds), needs low latency
Step 4: Video Upload Pipeline
Upload Flow
Video Upload Flow:
1. Creator selects a video file on their device
2. Client splits the file into chunks (5 MB each)
3. Chunks are uploaded in parallel to the upload service
4. Upload service reassembles chunks and stores the original
5. Upload service sends a "video uploaded" event to Kafka
6. Transcoding pipeline picks up the event
Why chunked upload?
- Resume interrupted uploads (re-upload only failed chunks)
- Parallel upload for faster speeds
- Progress tracking per chunk
[Client] --chunks--> [Upload Service] --store--> [Original Storage (S3)]
|
[Kafka: "video_uploaded"]
|
[Transcoding Pipeline]
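The chunking, resume, and reassembly steps above can be sketched in a few functions. This is a minimal in-memory illustration, not a real upload client: the function names and the per-chunk SHA-256 checksum are assumptions added for the example, and network transfer is left out entirely.

```python
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB, as in the upload flow above


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[dict]:
    """Client side: split a file's bytes into numbered chunks with a checksum each."""
    chunks = []
    for index, start in enumerate(range(0, len(data), chunk_size)):
        body = data[start:start + chunk_size]
        chunks.append({
            "index": index,
            "body": body,
            "sha256": hashlib.sha256(body).hexdigest(),  # assumed integrity check
        })
    return chunks


def chunks_to_retry(chunks: list[dict], acknowledged: set[int]) -> list[int]:
    """Resume support: only chunks the server has not acknowledged are re-sent."""
    return [c["index"] for c in chunks if c["index"] not in acknowledged]


def reassemble(chunks: list[dict]) -> bytes:
    """Server side: verify each checksum, then join the chunks in index order."""
    ordered = sorted(chunks, key=lambda c: c["index"])
    for c in ordered:
        if hashlib.sha256(c["body"]).hexdigest() != c["sha256"]:
            raise ValueError(f"chunk {c['index']} is corrupted")
    return b"".join(c["body"] for c in ordered)
```

Because each chunk carries its own index, chunks can arrive in any order from parallel uploads and still be reassembled correctly.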
Video Transcoding
Transcoding converts the original video into multiple resolutions and formats so it plays on any device and network speed.
Transcoding Pipeline (DAG):
Original video (4K, 2 GB, MOV format)
|
[Split into segments] (10-second chunks for parallel processing)
|
+----+----+----+----+
| | | | |
v v v v v
[Segment 1] [Segment 2] [Segment 3] ... [Segment N]
Each segment is transcoded independently:
|
+----+----+----+
| | | |
v v v v
[360p] [720p] [1080p] [4K]
H.264 H.264 H.264 H.264
H.265 H.265 H.265 H.265
VP9 VP9 VP9 VP9
|
[Generate thumbnails] (every 10 seconds for preview scrubbing)
[Extract audio tracks] (separate audio for different languages)
[Generate subtitles] (auto-generated captions with speech-to-text)
|
[Reassemble segments into complete video files]
|
[Store all versions in blob storage (S3)]
|
[Update video metadata: "transcoding complete"]
|
[Replicate to CDN edge servers]
Output for one 5-minute video:
12 video files (4 resolutions * 3 formats)
+ thumbnails
+ audio tracks
+ subtitles
Total: ~3 GB stored per video
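One way to picture the fan-out in this DAG is to enumerate the independent transcoding tasks. The sketch below is illustrative only: `plan_transcode_tasks` is a hypothetical helper, and it assumes resolutions above the source height are skipped, since upscaling adds file size without adding detail.

```python
import math
from itertools import product

# Resolution ladder and codec list taken from the diagram above.
LADDER = {"360p": 360, "720p": 720, "1080p": 1080, "4K": 2160}
CODECS = ["H.264", "H.265", "VP9"]
SEGMENT_SECONDS = 10


def plan_transcode_tasks(duration_s: float, source_height: int) -> list[tuple[int, str, str]]:
    """Return one (segment, resolution, codec) task per independent unit of work."""
    segment_count = math.ceil(duration_s / SEGMENT_SECONDS)
    # Assumption: never produce a rendition taller than the source video.
    rungs = [name for name, height in LADDER.items() if height <= source_height]
    return list(product(range(segment_count), rungs, CODECS))


# A 5-minute 4K upload: 30 segments * 4 resolutions * 3 codecs.
tasks = plan_transcode_tasks(duration_s=300, source_height=2160)
print(len(tasks))
```

Every task in the list can run on a different worker, which is why splitting into segments shortens the wall-clock time of transcoding so much.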
Why Multiple Formats?
Video Formats:
H.264 (AVC):
- Universal compatibility (plays on everything)
- Older codec, larger file sizes
- Default choice for maximum device support
H.265 (HEVC):
- 30-50% smaller files than H.264 at same quality
- Requires newer devices
- Used by Apple devices, newer Android phones
VP9:
- Google's open-source codec
- Similar compression to H.265
- Used by YouTube, Chrome, Android
AV1 (newest):
- Roughly 30% smaller files than H.265 at similar quality
- Open-source, royalty-free
- Slow to encode but great quality
- YouTube is migrating to AV1 for popular videos
Step 5: Video Streaming Pipeline
Adaptive Bitrate Streaming
Adaptive bitrate streaming is the key technology behind smooth video playback. The video player automatically switches between quality levels based on the viewer’s bandwidth.
Adaptive Bitrate Streaming:
Video is split into small segments (2-10 seconds each).
Each segment is available in multiple quality levels.
Manifest file (playlist):
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=400000,RESOLUTION=640x360
video_360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=1500000,RESOLUTION=1280x720
video_720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1920x1080
video_1080p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=15000000,RESOLUTION=3840x2160
video_4k/playlist.m3u8
The player:
1. Downloads the manifest file
2. Starts with a low quality segment
3. Measures download speed
4. If bandwidth is high: switch to higher quality
5. If bandwidth drops: switch to lower quality
6. Switches happen per segment (every 2-10 seconds)
User experience:
Fast network: 1080p or 4K playback
Slow network: 360p or 720p (but no buffering!)
Network fluctuates: quality adjusts automatically
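The player's switching logic can be sketched as a throughput-based selection. Real players also consider buffer level and other signals; this minimal version only uses measured download speed, and the `SAFETY_FACTOR` of 0.8 is an assumed value chosen for illustration.

```python
# Bitrate ladder from the manifest above: (bandwidth in bits/sec, playlist).
RENDITIONS = [
    (400_000, "video_360p/playlist.m3u8"),
    (1_500_000, "video_720p/playlist.m3u8"),
    (4_000_000, "video_1080p/playlist.m3u8"),
    (15_000_000, "video_4k/playlist.m3u8"),
]

SAFETY_FACTOR = 0.8  # assumed headroom so small bandwidth dips do not cause stalls


def measure_throughput_bps(segment_bytes: int, download_seconds: float) -> float:
    """Estimate bandwidth from how long the last segment took to download."""
    return segment_bytes * 8 / download_seconds


def choose_rendition(throughput_bps: float) -> str:
    """Pick the highest rendition whose bitrate fits within the bandwidth budget."""
    budget = throughput_bps * SAFETY_FACTOR
    best = RENDITIONS[0][1]  # always fall back to the lowest quality
    for bandwidth, playlist in RENDITIONS:
        if bandwidth <= budget:
            best = playlist
    return best
```

The player would call `choose_rendition` before requesting each new segment, so a quality change takes effect at the next segment boundary.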
Streaming Protocols
HLS (HTTP Live Streaming):
- Created by Apple
- Uses .m3u8 manifest files
- Segment size: 6-10 seconds (default)
- Supported by: Safari, iOS, most devices
- Most widely used streaming protocol
DASH (Dynamic Adaptive Streaming over HTTP):
- Open standard (ISO)
- Uses .mpd manifest files
- Segment size: 2-10 seconds
- Supported by: Chrome, Firefox, Android
- Used by YouTube, Netflix
Both protocols:
- Use HTTP (works with standard CDNs and firewalls)
- Support adaptive bitrate
- Split video into segments
YouTube uses DASH. Netflix uses both HLS and DASH.
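Since an HLS manifest is plain text, the packaging step can generate it with simple string formatting. The helper below is a hypothetical sketch that emits only the two tags used in the example manifest earlier; production playlists usually carry more attributes.

```python
def build_master_playlist(variants: list[tuple[int, str, str]]) -> str:
    """Build an HLS master playlist from (bandwidth, resolution, uri) tuples."""
    lines = ["#EXTM3U"]
    for bandwidth, resolution, uri in variants:
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth},RESOLUTION={resolution}")
        lines.append(uri)  # the URI line must directly follow its STREAM-INF tag
    return "\n".join(lines) + "\n"


print(build_master_playlist([
    (400_000, "640x360", "video_360p/playlist.m3u8"),
    (1_500_000, "1280x720", "video_720p/playlist.m3u8"),
]))
```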
Step 6: CDN (Content Delivery Network)
A CDN is the backbone of video streaming. It stores copies of video segments on edge servers close to viewers.
CDN Architecture:
Without CDN:
Viewer in Tokyo --> Origin server in US --> 200ms latency per segment
Buffering, slow start
With CDN:
Viewer in Tokyo --> CDN edge in Tokyo --> 5ms latency per segment
Smooth playback, instant start
CDN Strategy:
Popular videos: cached on edge servers worldwide (push)
Long-tail videos: cached on-demand when requested (pull)
[Viewer] --> [CDN Edge Server (Tokyo)]
|
| Cache miss? Fetch from regional hub
v
[CDN Regional Hub (Singapore)]
|
| Cache miss? Fetch from origin
v
[Origin Storage (US)]
Cache layers:
L1: CDN edge (closest to user) -- serves ~70% of requests
L2: CDN regional hub -- serves ~20% of requests (about two-thirds of edge misses)
L3: Origin storage -- the remaining ~10% of requests reach here
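The layered lookup can be sketched with two small LRU caches in front of an origin store. This is an in-memory illustration under simplifying assumptions: the class and function names are made up for the example, the origin is a plain dictionary, and real CDNs use more elaborate eviction and routing.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache standing in for an edge or hub server."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


def fetch_segment(key, edge, hub, origin):
    """Return (data, layer that served it), filling caches on the way back."""
    data = edge.get(key)
    if data is not None:
        return data, "edge"
    data = hub.get(key)
    if data is not None:
        edge.put(key, data)
        return data, "hub"
    data = origin[key]  # origin storage always has the segment
    hub.put(key, data)
    edge.put(key, data)
    return data, "origin"
```

The first request for a segment travels all the way to the origin; later requests from the same area are served by the edge until the entry is evicted.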
Cost Optimization
Video Popularity Distribution:
Power-law distribution (even more skewed than the classic 80/20 Pareto rule):
Top 1% of videos: 80% of all views
Top 10% of videos: 95% of all views
Remaining 90% of videos: 5% of views (long tail)
Cost strategy:
Popular videos (top 10%):
- Cached on all CDN edge servers worldwide
- Transcoded in all formats and resolutions
- Highest priority for CDN replication
Long-tail videos (bottom 90%):
- Cached only in the region where viewers are
- Transcoded in fewer formats (H.264 only)
- Lower resolution options only (up to 1080p)
- Stored on cheaper storage (S3 Infrequent Access)
Under these assumptions, this can save roughly 50-70% on CDN and storage costs.
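The tiering rules above amount to a small policy function. The sketch below is hypothetical: `delivery_policy` and its `popularity_percentile` input are names invented for the example, and the 10% cutoff simply mirrors the split used in this section.

```python
def delivery_policy(popularity_percentile: float) -> dict:
    """Map a video's popularity rank to encoding, caching, and storage choices.

    popularity_percentile: 0.0 is the most-viewed video, 100.0 the least.
    """
    if popularity_percentile <= 10.0:
        return {
            "codecs": ["H.264", "H.265", "VP9"],
            "max_resolution": "4K",
            "cdn": "push to all edges",
            "storage": "S3 Standard",
        }
    return {
        "codecs": ["H.264"],
        "max_resolution": "1080p",
        "cdn": "pull on demand, regional only",
        "storage": "S3 Infrequent Access",
    }
```

A background job could re-evaluate this policy periodically, so a long-tail video that suddenly becomes popular gets the additional codecs and wider CDN distribution.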
Step 7: Video Metadata
Video metadata (title, description, view count, likes) is stored separately from video files.
Video Metadata:
Database: PostgreSQL (or MySQL)
Strong consistency for core metadata (title, status, ownership); counters such as views are eventually consistent
Full-text search capability
Schema:
videos:
| id      | title            | description | uploader_id | status      |
|---------|------------------|-------------|-------------|-------------|
| vid_001 | "Go Tutorial #1" | "Learn..."  | user_123    | published   |
| vid_002 | "System Design"  | "How to..." | user_456    | transcoding |
video_stats:
| video_id | views | likes | dislikes | comments |
|-----------|------------|----------|----------|----------|
| vid_001 | 1,234,567 | 45,000 | 200 | 3,400 |
View count problem:
Updating view count on every view would overwhelm the database.
Solution: batch updates.
1. Count views in Redis: INCR views:vid_001
2. Every 60 seconds, flush Redis counts to PostgreSQL
3. View count is eventually consistent (acceptable)
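The batching idea can be shown without any external services. In this sketch a `Counter` stands in for the Redis keys and another for the `video_stats` table; the `ViewCounter` class is a hypothetical illustration, and a real system would run `flush` on a 60-second timer.

```python
from collections import Counter


class ViewCounter:
    """In-memory stand-in for Redis INCR plus a periodic flush to the database."""

    def __init__(self):
        self.pending = Counter()   # plays the role of Redis keys views:<video_id>
        self.database = Counter()  # plays the role of video_stats.views

    def record_view(self, video_id: str) -> None:
        """Called on every view: a cheap in-memory increment."""
        self.pending[video_id] += 1

    def flush(self) -> int:
        """Move all pending counts into the database; return rows written."""
        batch, self.pending = self.pending, Counter()
        for video_id, count in batch.items():
            self.database[video_id] += count  # one update per video, not per view
        return len(batch)


counter = ViewCounter()
for _ in range(1000):
    counter.record_view("vid_001")
counter.record_view("vid_002")
print(counter.flush())  # 1001 views become 2 database writes
```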
Step 8: Search
Video Search:
Technology: Elasticsearch
Index fields:
- title (highest weight)
- description
- tags
- channel name
- auto-generated captions
When a video is uploaded:
1. Extract metadata
2. Generate captions (speech-to-text)
3. Index in Elasticsearch
Query: "go tutorial for beginners"
--> Elasticsearch matches title, description, tags
--> Results ranked by: relevance + view count + recency + channel authority
--> Return top 20 results
Autocomplete:
- Trie data structure or Elasticsearch "completion suggester"
- Shows suggestions as the user types
- Based on popular search queries
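For the trie option, a minimal sketch looks like this. It stores each past query with a popularity count and returns the most popular completions of a prefix; the class names are illustrative, and a production system would precompute top suggestions per node rather than walk the subtree on every keystroke.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.count = 0  # > 0 marks the end of a stored query


class AutocompleteTrie:
    def __init__(self):
        self.root = TrieNode()

    def add_query(self, query: str, count: int = 1) -> None:
        """Record that a query was searched `count` times."""
        node = self.root
        for ch in query.lower():
            node = node.children.setdefault(ch, TrieNode())
        node.count += count

    def suggest(self, prefix: str, limit: int = 5) -> list[str]:
        """Return up to `limit` stored queries starting with prefix, most popular first."""
        node = self.root
        for ch in prefix.lower():
            if ch not in node.children:
                return []
            node = node.children[ch]
        matches = []
        stack = [(node, prefix.lower())]
        while stack:  # depth-first walk of the subtree under the prefix
            current, text = stack.pop()
            if current.count > 0:
                matches.append((current.count, text))
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        matches.sort(key=lambda m: (-m[0], m[1]))
        return [text for _, text in matches[:limit]]
```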
Step 9: Recommendation Engine
Recommendation (simplified):
Two main approaches:
1. Collaborative Filtering:
"Users who watched X also watched Y"
Based on viewing patterns of similar users.
2. Content-Based Filtering:
"This video is about Go programming, here are more Go videos"
Based on video attributes (title, tags, category).
In practice (YouTube's approach):
1. Candidate generation:
- Get 1000 candidate videos from multiple sources
- "Popular in your country"
- "Similar to what you watched"
- "From channels you subscribe to"
2. Ranking:
- ML model scores each candidate
- Features: watch history, click-through rate, video freshness
- Top 20-50 videos shown to the user
For interview purposes: mention both approaches and say
"the ranking model considers watch history, engagement, and
content similarity." Do not go deep into ML unless asked.
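The two-stage shape (candidate generation, then ranking) can be sketched in a few lines. Everything here is a simplification: the function names are hypothetical, and a weighted sum of features stands in for the ML ranking model, which in practice would be a trained model.

```python
def generate_candidates(sources: dict[str, list[str]], watched: set[str]) -> list[str]:
    """Merge candidate lists, dropping duplicates and already-watched videos."""
    seen = set(watched)
    candidates = []
    for videos in sources.values():
        for video_id in videos:
            if video_id not in seen:
                seen.add(video_id)
                candidates.append(video_id)
    return candidates


def rank(candidates: list[str], features: dict[str, dict[str, float]],
         weights: dict[str, float], top_n: int = 20) -> list[str]:
    """Score each candidate and keep the best top_n (stand-in for the ML model)."""
    def score(video_id: str) -> float:
        video_features = features.get(video_id, {})
        return sum(w * video_features.get(name, 0.0) for name, w in weights.items())
    return sorted(candidates, key=score, reverse=True)[:top_n]
```

Keeping the stages separate means the cheap candidate step can scan many sources, while the expensive scoring step only runs on a bounded list.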
Step 10: Complete Architecture
Video Upload Pipeline:
[Creator] --chunks--> [Upload Service]
|
[Original Storage (S3)]
|
[Kafka: video_uploaded]
|
[Transcoding Service (DAG)]
/ | \
[360p] [720p] [1080p] [4K]
\ | /
[Transcoded Storage (S3)]
|
[CDN Replication]
|
[Update Metadata: "ready"]
Video Streaming Pipeline:
[Viewer] --> [CDN Edge]
|
[Cache hit?] -- yes --> [Stream segments]
|
no
|
[CDN Regional Hub]
|
[Cache hit?] -- yes --> [Stream + cache at edge]
|
no
|
[Origin Storage (S3)]
|
[Stream + cache at hub + edge]
Supporting Services:
[Metadata Service] --> [PostgreSQL + Redis]
[Search Service] --> [Elasticsearch]
[Recommendation Service] --> [ML Model + Feature Store]
[Analytics Service] --> [Kafka + ClickHouse]
[Notification Service] --> [APNs / FCM]
Handling Scale
Scaling Strategy:
Upload Service:
- Horizontally scaled (stateless)
- 500K uploads/day = ~6 uploads/sec
- 10-20 upload servers are enough
Transcoding Service:
- The most compute-intensive part
- Each video takes 10-30 minutes to transcode
- 500K videos/day = ~350 videos/min
- 350 videos/min * 10-30 min each = roughly 3,500-10,500 videos in flight, so plan for thousands of transcoding workers, not hundreds
- Use spot instances (AWS) for cost savings (70% cheaper)
- Priority queue: process popular channels first
CDN:
- Use a major CDN provider (Akamai, Cloudflare, AWS CloudFront)
- Or build your own (YouTube, Netflix)
- Netflix built Open Connect: custom servers in ISP data centers
- YouTube uses Google's private network
Database:
- Metadata: PostgreSQL with read replicas
- View counts: Redis + periodic flush to PostgreSQL
- Comments: sharded by video_id
- Search: Elasticsearch cluster
Storage:
- Hot storage (S3 Standard): popular videos, recent uploads
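- Cold storage (S3 Glacier Instant Retrieval or S3 Infrequent Access): old, rarely viewed videos that must still start playing immediately; archive tiers with retrieval delays of minutes to hours would break the availability requirement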
- Lifecycle policy: move to cold after 90 days of no views
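The upload and transcoding figures above follow from simple rate arithmetic. The sketch below applies Little's law (items in flight = arrival rate * time per item) to the transcoding numbers; it assumes one worker handles one video at a time, which is a simplification given that segments can be processed in parallel.

```python
SECONDS_PER_DAY = 86_400
MINUTES_PER_DAY = 24 * 60

uploads_per_day = 500_000
uploads_per_second = uploads_per_day / SECONDS_PER_DAY
videos_per_minute = uploads_per_day / MINUTES_PER_DAY


def concurrent_jobs(arrivals_per_minute: float, minutes_per_job: float) -> float:
    """Little's law: average number of jobs in flight at steady state."""
    return arrivals_per_minute * minutes_per_job


print(f"Uploads/sec:        {uploads_per_second:.1f}")
print(f"Videos/min:         {videos_per_minute:.0f}")
print(f"In flight (10 min): {concurrent_jobs(videos_per_minute, 10):.0f}")
print(f"In flight (30 min): {concurrent_jobs(videos_per_minute, 30):.0f}")
```

The in-flight count is an average at steady state, so extra capacity is still needed for upload spikes and for retries when spot instances are reclaimed.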
Common Mistakes
Streaming the original file. Always transcode to multiple resolutions and use adaptive bitrate streaming. The original file is too large and only in one format.
Not using a CDN. Serving video from a central origin server adds latency and costs a fortune in bandwidth. CDN is not optional for video streaming.
Synchronous transcoding. Transcoding takes minutes. Never make the user wait. Use async processing with a message queue and notify when ready.
Counting views synchronously. Updating a database counter for every view at 5 billion views/day would kill the database. Use in-memory counting with periodic batch updates.
Interview Tips
Separate the two pipelines early. “There are two main flows: the upload/transcoding pipeline and the streaming pipeline. Let me design each.”
Mention adaptive bitrate streaming. “I will use HLS or DASH with adaptive bitrate. The player adjusts quality based on the viewer’s bandwidth.”
Discuss the DAG for transcoding. “Transcoding is a directed acyclic graph of tasks: split into segments, transcode each in parallel, generate thumbnails, extract audio.”
Talk about CDN strategy. “Popular videos are pushed to all CDN edges. Long-tail videos are cached on-demand.”
Mention cost optimization. “90% of videos get 5% of views. I will use cheaper storage and fewer transcoded formats for long-tail content.”
Related Articles
- System Design #15: Design a News Feed — Fan-out strategies and ranking
- System Design #4: Caching — CDN and caching strategies
- System Design #3: Load Balancers — Distributing traffic across servers
- System Design #7: Message Queues — Kafka for async transcoding
What’s Next?
In the next article, System Design #17: Design a File Storage System, you will learn:
- Block storage and file chunking
- File sync across devices
- Deduplication to save storage
- How Google Drive and Dropbox handle file sync
This is part 16 of the System Design Tutorial series. Follow along to learn system design from scratch.