The best way to learn system design is by working through real problems. This page presents three classic system design case studies, each following the structured interview framework: requirements gathering, estimation, high-level design, detailed design, and scaling considerations.
URL Shortener
Design a service like bit.ly that converts long URLs into short, unique links and redirects users. Covers hashing, base62 encoding, and read-heavy optimization.
Chat System
Design a real-time messaging system like WhatsApp supporting 1:1 and group chat. Covers WebSockets, message delivery guarantees, and presence detection.
News Feed
Design a social media feed like Twitter's home timeline. Covers fan-out strategies, ranking algorithms, and feed generation at scale.
Functional Requirements:
Non-Functional Requirements:
Out of Scope: User accounts, rate limiting per user, link preview generation.
Traffic:
- 100M new URLs created per month (writes)
- Read:write ratio = 100:1
- 10B redirects per month (reads)
- Writes: 100M / (30 * 24 * 3600) ≈ 40 URLs/sec
- Reads: 10B / (30 * 24 * 3600) ≈ 3,800 reads/sec
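These back-of-envelope numbers can be checked in a few lines (a sketch; the 30-day month matches the estimates above):

```python
# Sanity-check the traffic estimates above (assumes a 30-day month).
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000

writes_per_sec = 100_000_000 / SECONDS_PER_MONTH        # new URLs
reads_per_sec = 10_000_000_000 / SECONDS_PER_MONTH      # redirects

print(f"Writes: ~{writes_per_sec:.0f} URLs/sec")        # ~39, rounded to 40
print(f"Reads:  ~{reads_per_sec:.0f} redirects/sec")    # ~3,858, rounded to 3,800
```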
Storage:
- Each URL record: ~500 bytes (short URL, long URL, timestamps, metadata)
- 100M * 500 bytes = 50 GB/month
- 5 years: 50 GB * 60 months = 3 TB total
Memory (caching top 20%):
- 10B requests/month ≈ 330M requests/day
- Cache top 20% of daily URLs: 66M * 500 bytes ≈ 33 GB
- Fits in a single Redis instance or small cluster

```
Client ──POST/GET──► API Server ──────► Database (URL Store)
   ▲                     │
   └──302 Redirect       └────────────► Cache (Redis)
```

API Design:
```
POST /api/v1/shorten
Request:
{
  "long_url": "https://...",
  "custom_alias": "my-link",
  "expiry": "2029-01-01"
}
Response:
{
  "short_url": "https://short.ly/aB3xK9",
  "expires_at": "2029-01-01T00:00:00Z"
}
```
```
GET /{short_code}
Response: 302 Redirect to original URL
Headers:  Location: https://original-long-url.com/path
```
```
GET /api/v1/stats/{short_code}
Response:
{
  "short_url": "...",
  "long_url": "...",
  "clicks": 12345,
  "created_at": "..."
}
```

We need to generate a unique, short string for each URL. A 7-character base62 string provides 62^7 ≈ 3.5 trillion unique combinations, more than enough.
Approach 1: Hash + Truncate
```
long_url = "https://www.example.com/very/long/path?query=value"
hash = MD5(long_url) = "a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6"
take first 7 chars → "a1b2c3d" (but this is hex, not base62)
convert to base62   → "aB3xK9p"
```
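A minimal sketch of this approach in Python, using `hashlib`. As a variant of the hex-truncation shown above, it interprets the leading digest bytes as an integer and re-encodes them in base62 directly; collision handling is omitted here.

```python
import hashlib

# Same base62 alphabet as the encoder below: digits, lowercase, uppercase.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def hash_to_short_code(long_url: str, length: int = 7) -> str:
    """Hash the URL and map the digest into a fixed-length base62 code."""
    digest = hashlib.md5(long_url.encode()).digest()
    # Interpret the first 8 bytes as an integer, then take `length`
    # base62 digits from it. Deterministic: same URL -> same code.
    num = int.from_bytes(digest[:8], "big")
    code = []
    while len(code) < length:
        code.append(ALPHABET[num % 62])
        num //= 62
    return "".join(reversed(code))
```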
Problem: Hash collisions are possible with truncation.
Solution: Check the database. If a collision occurs, append a counter and rehash.

Approach 2: Counter-Based with Base62 Encoding
```python
import string

ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)  # 62

def encode_base62(num: int) -> str:
    """Convert a number to a base62 string."""
    if num == 0:
        return ALPHABET[0]
    result = []
    while num > 0:
        result.append(ALPHABET[num % BASE])
        num //= BASE
    return ''.join(reversed(result))

def decode_base62(s: str) -> int:
    """Convert a base62 string back to a number."""
    num = 0
    for char in s:
        num = num * BASE + ALPHABET.index(char)
    return num

# Examples:
# encode_base62(1000000)     → "4c92"
# encode_base62(56800235584) → "1000000" (62^6: the first 7-char code)
```

Approach 3: Pre-Generated Key Service (Recommended)
The Key Generation Service pre-generates millions of unique keys into a key database that tracks which keys are used and unused. Each app server fetches a batch of unused keys, hands them out from its in-memory batch when creating new short URLs, and requests a new batch when running low.

```
┌──────────────┐      ┌────────────────────────┐
│   Key Gen    │─────►│      Key Database      │
│   Service    │      │ Used:   [aB3, xK9...]  │
└──────┬───────┘      │ Unused: [pQ7, mN2...]  │
       │              └────────────────────────┘
       │ batch of unused keys
       ▼
┌──────────────┐
│  App Server  │  takes a key from its local batch when
│ (in-memory   │  creating a new short URL; requests a
│  key batch)  │  new batch when running low
└──────────────┘
```

```sql
CREATE TABLE urls (
    id          BIGSERIAL PRIMARY KEY,
    short_code  VARCHAR(7) UNIQUE NOT NULL,
    long_url    TEXT NOT NULL,
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    expires_at  TIMESTAMP,
    click_count BIGINT DEFAULT 0,
    user_id     BIGINT  -- optional, for registered users
);
```
```sql
CREATE INDEX idx_expires_at ON urls(expires_at) WHERE expires_at IS NOT NULL;
```

Database choice: Use a NoSQL key-value store (DynamoDB or Cassandra) for the URL mapping, since the access pattern is a simple key-value lookup. Use a relational database for analytics if complex queries are needed.
1. Client requests GET /aB3xK9p
2. Check Redis cache for key "aB3xK9p"
3. Cache HIT → redirect immediately (< 10 ms)
4. Cache MISS → query database → store in cache → redirect
5. Increment click counter (async, via message queue)
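The read path above is a cache-aside lookup. A sketch, with plain dicts standing in for the Redis cache and the URL store:

```python
def resolve_short_code(code, cache, db):
    """Return (long_url, source) for a short code, or (None, None) if unknown."""
    long_url = cache.get(code)
    if long_url is not None:
        return long_url, "cache"   # hit: redirect immediately
    long_url = db.get(code)
    if long_url is None:
        return None, None          # unknown code -> respond 404
    cache[code] = long_url         # populate cache for the next request
    return long_url, "db"
```

The first lookup for a code pays the database round trip; every subsequent lookup is served from the cache until the entry is evicted.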
```
Client ──GET /aB3xK9p──► App Server ──1. check cache──► Redis
   ▲                          │
   └──302 (Location: ...)     └──2. on miss: query──► DB
```

| Code | Meaning | Browser Behavior | Use When |
|---|---|---|---|
| 301 | Moved Permanently | Browser caches redirect | You want to reduce server load |
| 302 | Found (Temporary) | Browser asks server every time | You need accurate click analytics |
For a URL shortener, use 302 if analytics are important (so every click hits the server), or 301 for better performance.
Full Architecture at Scale:
```
Clients ──► CDN (caches 302 redirects)
                 │
           Load Balancer
                 │
    App Servers (each holding a local key batch)
                 │
    Redis Cluster (caches hot short URLs)
                 │ (cache miss)
           DB Shards 1..3
                 │
    Kafka (click events) ──► Analytics Pipeline / Click Aggregation
```

Scaling strategies:
Functional Requirements:
Non-Functional Requirements:
Users: 500M monthly active users, 100M daily active
Messages: 50 messages/day per user
- 100M * 50 = 5B messages/day
- 5B / 86,400 ≈ 58,000 messages/sec
Storage per message: ~200 bytes (text + metadata)
- 5B * 200 bytes = 1 TB/day
- 365 TB/year
Connections: 100M concurrent WebSocket connections
- Each connection: ~10 KB memory
- Total: 100M * 10 KB = 1 TB memory for connections
- Need ~1,000 servers (each handling 100K connections)

```
User A (sender)   ──WebSocket──► Chat Server Cluster ──► Message Queue (Kafka)
User B (receiver) ◄─WebSocket──  Chat Server Cluster ──► Message Store (Cassandra)
```

Each user maintains a persistent WebSocket connection to a chat server. A connection registry tracks which user is connected to which server.
```
User A ──WS──► Chat Server 1        Chat Server 2 ──WS──► User B
                    │                     │
                    ▼                     ▼
            ┌─────────────────────────────────┐
            │   Connection Registry (Redis)   │
            │   user_A → chat_server_1        │
            │   user_B → chat_server_2        │
            │   user_C → chat_server_1        │
            └─────────────────────────────────┘
```

User A sends "Hello" to User B:
1. User A ──(WebSocket)──► Chat Server 1
2. Chat Server 1:
   a. Generate message_id (timestamp + sequence)
   b. Store message in database
   c. Look up User B's connection in registry
   d. User B is on Chat Server 2
3. Chat Server 1 ──(message queue)──► Chat Server 2
4. Chat Server 2 ──(WebSocket)──► User B
5. User B sends delivery ACK ──► Chat Server 2
6. Chat Server 2 ──► Update message status to "delivered"
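Steps 2c-3 reduce to a registry lookup followed by one of three routes. A sketch, with a dict standing in for the Redis registry; `publish_fn` (message-queue handoff) and `deliver_fn` (local WebSocket push) are hypothetical callbacks:

```python
def route_message(receiver, registry, local_server, publish_fn, deliver_fn):
    """Route a message toward the receiver's chat server."""
    server = registry.get(receiver)
    if server is None:
        return "offline"              # store + send push notification instead
    if server == local_server:
        deliver_fn(receiver)          # receiver is on this server: push directly
        return "local"
    publish_fn(server, receiver)      # cross-server: hand off via message queue
    return "forwarded"
```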
Timeline:

```
User A                    Server                    User B
  │── msg "Hello" ──────────►│                         │
  │                          │── store in DB           │
  │◄──── ACK (sent) ─────────│                         │
  │                          │── forward ─────────────►│
  │                          │◄──── ACK (delivered) ───│
  │◄──── delivered ──────────│                         │
  │                          │◄──── ACK (read) ────────│
  │◄──── read ───────────────│                         │
```

For group messages, the message is sent once and fanned out to all group members.
User A sends "Hi everyone" to Group (A, B, C, D):
```
User A ──► Chat Server 1 ──► Kafka ──► User B (online, on Server 2)
                                   ──► User C (online, on Server 3)
                                   ──► User D (offline → push notification)
```
Small groups (< 500):
→ Fan-out on write: send to each member's queue
Large groups / channels (500+):
→ Fan-out on read: members pull when they come online

```sql
-- Messages table (in Cassandra for write-heavy workload)
-- Partition key: conversation_id (ensures all messages in a
--   conversation are on the same partition for efficient reads)
-- Clustering key: message_id (sorted by timestamp)
CREATE TABLE messages (
    conversation_id UUID,
    message_id      TIMEUUID,  -- timestamp-based UUID
    sender_id       UUID,
    content         TEXT,
    message_type    TEXT,      -- 'text', 'image', 'system'
    status          TEXT,      -- 'sent', 'delivered', 'read'
    created_at      TIMESTAMP,
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

-- Conversation list per user
CREATE TABLE user_conversations (
    user_id           UUID,
    conversation_id   UUID,
    last_message_at   TIMESTAMP,
    last_message_text TEXT,
    unread_count      INT,
    PRIMARY KEY (user_id, last_message_at)
) WITH CLUSTERING ORDER BY (last_message_at DESC);
```

Why Cassandra?
Approach: Heartbeat-based detection
User A connects:
→ Set "user_A:status" = "online" in Redis (with 30s TTL)
→ Client sends a heartbeat every 10 seconds
→ Each heartbeat resets the TTL

User A disconnects (graceful):
→ Delete "user_A:status" from Redis
→ Notify relevant users via pub/sub

User A disconnects (crash/network loss):
→ Heartbeat stops
→ TTL expires after 30 seconds
→ Status automatically becomes "offline"
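The three scenarios above can be simulated with a small in-memory tracker. This is illustrative only; in production the state lives in Redis (SETEX with a 30s TTL, reset on each heartbeat):

```python
import time

class PresenceTracker:
    """In-memory stand-in for Redis presence keys with a TTL."""

    def __init__(self, ttl_seconds: float = 30.0):
        self._ttl = ttl_seconds
        self._expires = {}  # user_id -> expiry timestamp

    def heartbeat(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        self._expires[user_id] = now + self._ttl    # each beat resets the TTL

    def disconnect(self, user_id):
        self._expires.pop(user_id, None)            # graceful disconnect

    def is_online(self, user_id, now=None) -> bool:
        now = time.monotonic() if now is None else now
        expiry = self._expires.get(user_id)
        return expiry is not None and expiry > now  # expired TTL reads as offline
```

A crash needs no cleanup step: once heartbeats stop, the TTL lapses and the user reads as offline automatically.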
```
User A ──heartbeat (every 10s)──► Chat Server ──SETEX 30s TTL──► Redis
```

When a user is offline, messages are stored and delivered when they reconnect:
1. User B sends a message to User A (offline)
2. Chat Server stores the message in the database with status = "sent"
3. Send a push notification to User A's mobile device
4. When User A comes online:
   a. Query all messages with status = "sent" for User A's conversations
   b. Deliver all pending messages via WebSocket
   c. Update status to "delivered"

Full Architecture:
```
Mobile Apps ──► API Gateway (REST: auth, profile, etc.)
            ──► WS Load Balancer (sticky sessions by user)
                      │
        Chat Servers 1..N (100K connections each)
                      │
        Redis Cluster (connection registry + presence)
                      │
        Kafka Cluster (message routing between servers)
                      │
        Cassandra Cluster (message persistence;
          sharded by conversation_id, RF=3 for durability)
                      │
        Push Notification Service (APNs for iOS, FCM for Android)
```

Key scaling decisions:
Functional Requirements:
Non-Functional Requirements:
Users: 500M monthly active, 200M daily active
Posts: 10M new posts per day
Follows: Average user follows 200 people
Feed: Each user views their feed ~10 times/day
Feed generation:
- 200M users * 10 views = 2B feed requests/day
- 2B / 86,400 ≈ 23,000 feed requests/sec
Post writes:
- 10M posts/day ≈ 115 posts/sec
Storage per post: ~1 KB (text, metadata, references)
- 10M * 1 KB = 10 GB/day
- 3.65 TB/year

```
Client ──POST──► Post Service ──► Post Store
                      │
               Fan-out Service
                      │
Client ◄──GET──  Feed Service ──► Feed Cache (Redis)
```
Approach 1: Fan-Out on Write (Push Model)
When a post is created, immediately push it to every follower's feed cache.
Alice posts "Hello world!" (Alice has 1000 followers)
1. Store post in posts table
2. Fetch Alice's follower list: [Bob, Carol, Dave, ... 997 more]
3. For each follower, prepend the post to their feed cache:
   - LPUSH feed:Bob post_id_123
   - LPUSH feed:Carol post_id_123
   - LPUSH feed:Dave post_id_123
   - ... (1000 operations)
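The fan-out step can be sketched with a deque per feed standing in for the Redis lists (`appendleft` plays the role of LPUSH); the follower map and names are illustrative:

```python
from collections import defaultdict, deque

def fan_out_on_write(post_id, author, followers, feeds, max_feed_len=1000):
    """Push a new post to every follower's feed cache (newest first)."""
    for follower in followers[author]:
        feeds[follower].appendleft(post_id)   # LPUSH feed:<follower> <post_id>
        while len(feeds[follower]) > max_feed_len:
            feeds[follower].pop()             # trim the oldest entries

# Alice posts; each of her followers gets the post prepended to their feed.
followers = {"alice": ["bob", "carol", "dave"]}
feeds = defaultdict(deque)
fan_out_on_write("post_123", "alice", followers, feeds)
```

The `max_feed_len` trim mirrors the bounded-feed strategy used later in the design (keep only the most recent entries per user).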
```
Alice ──post──► Post Service ──fan-out──► Feed Cache
                                          feed:Bob   [123, 120, 118, ...]
                                          feed:Carol [123, 122, 115, ...]
                                          feed:Dave  [123, 119, 117, ...]
```

Pros: Feed reads are instant (pre-computed); simple read path.
Cons: Write amplification for users with millions of followers; wasted work for inactive users.
Approach 2: Fan-Out on Read (Pull Model)
When a user opens their feed, fetch posts from all the people they follow and merge them.
Bob opens his feed (Bob follows 200 people):
1. Fetch Bob's following list: [Alice, Charlie, Eve, ... 197 more]
2. For each followed user, fetch their recent posts:
   - SELECT * FROM posts WHERE user_id = Alice ORDER BY created_at DESC LIMIT 20
   - SELECT * FROM posts WHERE user_id = Charlie ORDER BY created_at DESC LIMIT 20
   - ... (200 queries)
3. Merge all posts and sort by relevance/time
4. Return top 20
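The merge step (3) can be sketched with `heapq.merge`, assuming each followed user's recent posts arrive as `(timestamp, post_id)` pairs already sorted newest-first; the data layout here is illustrative:

```python
import heapq
from itertools import islice

def build_feed_on_read(following, posts_by_user, limit=20):
    """Merge per-user post streams (each sorted newest-first) into one feed."""
    per_user = (posts_by_user.get(u, []) for u in following)
    # heapq.merge keeps the already-sorted streams in order without
    # materializing all posts; inputs are descending, hence reverse=True.
    merged = heapq.merge(*per_user, key=lambda p: p[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```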
```
Bob ──GET /feed──► Feed Service ──200 queries──► Post DB
    ◄──sorted feed──  (merge)
```

Pros: No write amplification; no wasted work for inactive users.
Cons: Slow feed reads (many queries to merge); high read latency.
Approach 3: Hybrid (Recommended)
Use fan-out on write for most users, fan-out on read for celebrities (users with millions of followers).
Hybrid Strategy:
Post by a regular user (< 10K followers):
→ Fan-out on write (push to all follower feeds)
Post by a celebrity (> 10K followers):
→ Do NOT fan out on write
→ When a user fetches their feed:
  1. Read the pre-computed feed from cache (regular users' posts)
  2. Fetch recent posts from followed celebrities
  3. Merge and rank
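The hybrid read path can be sketched as a merge of the two sources. This sketch sorts by timestamp; a production system would rank by a relevance score instead. Inputs are illustrative `(timestamp, post_id)` pairs:

```python
def hybrid_feed(precomputed, celebrity_posts, limit=20):
    """Combine the pre-built feed (pushed at write time) with celebrity
    posts fetched at read time, newest first."""
    merged = sorted(precomputed + celebrity_posts,
                    key=lambda p: p[0], reverse=True)
    return [post_id for _, post_id in merged[:limit]]
```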
```
Feed generation when a user opens their feed:
1. Get pre-built feed from cache         ──► [P1, P3, P5]
2. Get celebrity posts (fan-out on read) ──► [P2, P4]
3. Merge + rank                          ──► [P2, P1, P4, P3, P5]
4. Return top 20 posts
```

```sql
-- Users table
CREATE TABLE users (
    user_id        BIGSERIAL PRIMARY KEY,
    username       VARCHAR(50) UNIQUE NOT NULL,
    display_name   VARCHAR(100),
    follower_count BIGINT DEFAULT 0,
    is_celebrity   BOOLEAN DEFAULT FALSE  -- true if follower_count > 10K
);
```
```sql
-- Posts table
CREATE TABLE posts (
    post_id       BIGSERIAL PRIMARY KEY,
    user_id       BIGINT REFERENCES users(user_id),
    content       TEXT NOT NULL,
    media_urls    TEXT[],
    like_count    BIGINT DEFAULT 0,
    retweet_count BIGINT DEFAULT 0,
    reply_count   BIGINT DEFAULT 0,
    created_at    TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE INDEX idx_posts_user_time ON posts(user_id, created_at DESC);

-- Follows table (who follows whom)
CREATE TABLE follows (
    follower_id BIGINT,
    followee_id BIGINT,
    created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (follower_id, followee_id)
);

CREATE INDEX idx_followee ON follows(followee_id);
```

Key:   "feed:{user_id}"
Value: Sorted Set of post_ids scored by timestamp (or relevance score)
```
ZADD feed:bob 1700000001 "post_123"
ZADD feed:bob 1700000050 "post_456"
ZADD feed:bob 1700000100 "post_789"

# Fetch feed (latest 20 posts)
ZREVRANGE feed:bob 0 19

# Trim old entries (keep last 1000)
ZREMRANGEBYRANK feed:bob 0 -1001
```

A simple chronological feed is straightforward, but a ranked feed improves engagement. A basic ranking formula:
```python
import math

# Note: `now()` and `get_interaction_count()` are assumed helpers
# (current time, and how often `user` engages with the post's author).
def calculate_relevance_score(post, user):
    """Score a post for a specific user's feed."""
    score = 0.0

    # Recency (decays over time)
    age_hours = (now() - post.created_at).total_seconds() / 3600
    recency_score = 1.0 / (1.0 + age_hours)
    score += recency_score * 40  # weight: 40%

    # Engagement signals
    engagement = (
        post.like_count * 1.0
        + post.retweet_count * 2.0
        + post.reply_count * 3.0
    )
    engagement_score = math.log(1 + engagement)
    score += engagement_score * 30  # weight: 30%

    # Relationship strength (how often user interacts with author)
    interaction_count = get_interaction_count(user.id, post.author_id)
    relationship_score = math.log(1 + interaction_count)
    score += relationship_score * 20  # weight: 20%

    # Content type boost
    if post.has_media:
        score += 5  # weight: 10%
    if post.is_thread:
        score += 3

    return score
```

1. User creates a post via API
2. Post Service:
   a. Validate content (length, prohibited content)
   b. Store in posts table
   c. Upload media to object storage (S3)
   d. Publish event to Kafka topic "new_posts"
3. Fan-out Service (consumes from Kafka):
   a. Check if the author is a celebrity (follower_count > 10K)
   b. If regular user: fetch follower list, push to each feed cache
   c. If celebrity: skip fan-out (handled at read time)
4. Notification Service (consumes from Kafka):
   a. Send push notifications to users with notifications enabled
5. Search Indexing Service (consumes from Kafka):
   a. Index the post in Elasticsearch for search
```
User ──POST──► Post Service ──event──► Kafka ──► Fan-out Service ──► Redis Cluster (feed caches)
                                             ──► Notification Service
                                             ──► Search Indexer
```

Full Architecture:
```
Clients (mobile apps, web browsers, third-party via API)
   │
  CDN (images, videos, static assets)
   │
API Gateway / Load Balancer (auth, rate limiting, routing)
   │
Post Service            Feed Service               User Service
   │                        │                          │
Post DB (sharded)    Redis Feed Cache Cluster    User DB (sharded)
   │
Kafka Cluster ──► Fan-out Workers (auto-scaled)
              ──► Notification Workers
              ──► Search Index Workers ──► Elasticsearch (post search)
              ──► Analytics Pipeline
```

Key scaling decisions:
| Component | Strategy | Rationale |
|---|---|---|
| Post DB | Shard by user_id | Co-locate a user's posts for efficient retrieval |
| Feed cache | Redis Cluster, shard by user_id | Even distribution of feed data |
| Fan-out workers | Auto-scale based on Kafka lag | Handle spikes when celebrities post |
| Celebrity detection | Threshold-based (> 10K followers) | Avoid massive fan-out for celebrities |
| Feed TTL | Keep last 1000 posts per user, expire after 7 days | Bound memory usage |
| Media storage | S3 + CDN | Scalable blob storage with global delivery |
| Search | Elasticsearch cluster | Full-text search across all posts |
Hot partition mitigation: When a celebrity with 50M followers posts, the fan-out on read approach means followers query the celebrity's posts table partition. Solve this by caching the celebrity's recent posts in Redis and replicating across multiple read replicas.
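The mitigation can be sketched as a small read-through cache sitting in front of the hot partition; `fetch_fn` is a hypothetical stand-in for the real celebrity-posts query, and the class name is illustrative:

```python
import time

class CelebrityPostCache:
    """Serve a celebrity's recent posts from cache so follower reads
    skip the hot database partition."""

    def __init__(self, fetch_fn, ttl_seconds: float = 30.0):
        self._fetch = fetch_fn
        self._ttl = ttl_seconds
        self._cache = {}  # user_id -> (expires_at, posts)

    def recent_posts(self, user_id, now=None):
        now = time.monotonic() if now is None else now
        entry = self._cache.get(user_id)
        if entry and entry[0] > now:
            return entry[1]            # cache hit: no DB read
        posts = self._fetch(user_id)   # miss: one DB read, then reuse
        self._cache[user_id] = (now + self._ttl, posts)
        return posts
```

With a short TTL, millions of follower reads collapse into one database query per celebrity per TTL window.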
These topics apply to all three case studies and most system design problems:
Monitoring and Alerting
Every system needs metrics (latency p50/p95/p99, error rates, throughput), logs (structured, searchable), and alerts (PagerDuty, Slack). Use tools like Prometheus, Grafana, Datadog, and ELK stack.
Security
Authentication (OAuth 2.0, JWT), authorization (RBAC), input validation, rate limiting, encryption in transit (TLS) and at rest (AES-256), and protection against common attacks (SQL injection, XSS, CSRF).
Deployment and CI/CD
Blue-green deployments, canary releases, feature flags, automated testing pipelines, infrastructure as code (Terraform), and containerization (Docker, Kubernetes).
Data Privacy and Compliance
GDPR, CCPA, data retention policies, right to deletion, data anonymization, audit logging, and geographic data residency requirements.