System design is the process of defining the architecture, components, modules, interfaces, and data flow of a system to satisfy specified requirements. It bridges the gap between problem definition and implementation, requiring you to think about scalability, reliability, availability, and maintainability from day one.
Whether you are preparing for a senior engineering interview or architecting a production system that serves millions of users, mastering system design is non-negotiable.
Fundamentals
Client-server model, networking basics, HTTP/HTTPS, DNS, TCP/UDP, API design patterns (REST, GraphQL, gRPC), and core trade-offs like latency vs throughput and the CAP theorem.
Databases
SQL vs NoSQL, ACID properties, sharding, replication, indexing strategies, normalization and denormalization, and a practical database selection guide.
Scalability
Horizontal and vertical scaling, load balancing algorithms, caching strategies (CDN, application, database), message queues, and rate limiting patterns.
Case Studies
Full design walkthroughs for real-world systems: URL Shortener, Chat System, and News Feed — each with requirements, architecture diagrams, and scaling considerations.
Caching & CDNs
Cache invalidation strategies, write-through vs write-back, cache eviction policies (LRU, LFU, FIFO), and CDN architecture for global content delivery.
Microservices
Service decomposition, inter-service communication, service discovery, API gateways, circuit breakers, and distributed tracing in microservice architectures.
Use this step-by-step framework to structure any system design interview. Spending the right amount of time in each phase is critical.
Never jump into designing before understanding the problem. Ask clarifying questions:
Example questions for "Design a URL Shortener":
- How many URLs per day? (write volume)
- How many redirects per day? (read volume)
- How long should shortened URLs remain valid?
- Should users be able to customize short URLs?
- Do we need analytics (click tracking)?
Quantify the scale to guide design decisions:
Example estimation:
- 100M new URLs/month → ~40 URLs/sec (writes)
- Read:write ratio of 100:1 → ~4,000 reads/sec
- Each URL record ~500 bytes → 50 GB/month → 600 GB/year
- Cache the hottest 20% of a year's data → ~120 GB of memory
Sketch the major components and how they interact.
Dive deep into the most critical components.
Identify and address potential issues.
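The arithmetic behind the URL shortener estimate can be reproduced in a few lines, which makes it easy to rerun when an input changes. This is a minimal sketch assuming a 30-day month and decimal units (1 GB = 10^9 bytes); the function name `estimate` is illustrative, not part of any library.

```python
# Back-of-the-envelope capacity estimate for the URL shortener example.
# Assumptions: 30-day month, decimal units (1 GB = 1e9 bytes).
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds


def estimate(new_urls_per_month: int, read_write_ratio: int,
             bytes_per_record: int, cache_fraction: float) -> dict:
    writes_per_sec = new_urls_per_month / SECONDS_PER_MONTH
    reads_per_sec = writes_per_sec * read_write_ratio
    storage_per_month = new_urls_per_month * bytes_per_record
    storage_per_year = storage_per_month * 12
    cache_bytes = storage_per_year * cache_fraction  # hottest slice of one year's data
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "storage_gb_per_month": storage_per_month / 1e9,
        "storage_gb_per_year": storage_per_year / 1e9,
        "cache_gb": cache_bytes / 1e9,
    }


if __name__ == "__main__":
    for name, value in estimate(100_000_000, 100, 500, 0.20).items():
        print(f"{name}: {value:,.1f}")
```

The unrounded rates (about 38.6 writes/sec and 3,858 reads/sec) round to the ~40 and ~4,000 figures in the example; in an interview, rounding early like this keeps the mental math fast without changing any design decision.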
| Concept | Description | Why It Matters |
|---|---|---|
| Horizontal Scaling | Adding more machines to distribute load | Enables near-linear capacity growth |
| Vertical Scaling | Adding more resources (CPU, RAM) to a single machine | Simpler but has hard upper limits |
| Load Balancing | Distributing requests across multiple servers | Prevents overloading any single server |
| Caching | Storing frequently accessed data in fast storage | Cuts latency and database load, often by 10-100x for cache-friendly workloads |
| CDN | Content Delivery Network for static assets | Serves content from geographically close servers |
| Database Sharding | Splitting data across multiple database instances | Enables horizontal database scaling |
| Replication | Maintaining copies of data across nodes | Increases availability and read throughput |
| CAP Theorem | Consistency, Availability, Partition tolerance: during a network partition, a distributed system must give up either consistency or availability | Guides database and architecture decisions |
| Consistent Hashing | Hash ring for distributing data across nodes | Minimizes data movement when nodes change |
| Message Queue | Asynchronous communication between services | Decouples components and handles traffic spikes |
| Rate Limiting | Throttling request frequency per client | Prevents abuse and ensures fair resource usage |
| Circuit Breaker | Stops calling a failing dependency after repeated errors, then retries it after a cooldown | Prevents cascading failures and improves resilience in distributed systems |
| API Gateway | Single entry point for all client requests | Handles auth, routing, rate limiting, and logging |
| Idempotency | Same request produces same result if repeated | Critical for retry logic and exactly-once semantics |
| Eventual Consistency | Data will converge to consistent state over time | Enables higher availability at the cost of staleness |
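To make the consistent hashing row concrete, here is a minimal sketch of a hash ring with virtual nodes. It assumes SHA-256 truncated to 64 bits as the ring hash and 100 virtual nodes per server; both are illustrative choices, and the node names (`db-a` and so on) are hypothetical. It compares how many keys move when a fifth server joins against naive `hash % N` placement.

```python
import bisect
import hashlib


def ring_hash(key: str) -> int:
    """Stable 64-bit ring position (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Maps each key to the first virtual node clockwise from the key's hash."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes                 # virtual nodes per server (smooths load)
        self._positions: list[int] = []      # sorted ring positions
        self._owner: dict[int, str] = {}     # ring position -> server name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def remove_node(self, node: str) -> None:
        for i in range(self.vnodes):
            pos = ring_hash(f"{node}#{i}")
            self._positions.remove(pos)
            del self._owner[pos]

    def get_node(self, key: str) -> str:
        if not self._positions:
            raise LookupError("ring has no nodes")
        # First position strictly after the key's hash, wrapping around the ring.
        idx = bisect.bisect_right(self._positions, ring_hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


if __name__ == "__main__":
    keys = [f"url:{i}" for i in range(10_000)]
    ring = ConsistentHashRing(["db-a", "db-b", "db-c", "db-d"])
    before = {k: ring.get_node(k) for k in keys}

    ring.add_node("db-e")
    moved = sum(before[k] != ring.get_node(k) for k in keys)

    # Naive placement for comparison: server index = hash % number_of_servers.
    naive_moved = sum(ring_hash(k) % 4 != ring_hash(k) % 5 for k in keys)

    print(f"consistent hashing: {moved / len(keys):.1%} of keys moved")
    print(f"hash % N:           {naive_moved / len(keys):.1%} of keys moved")
```

Going from four servers to five, the ring should move roughly a fifth of the keys, and only to the new server, while `hash % N` reassigns most of them. Virtual nodes matter here: with a single position per server, the new node's share (and therefore the amount of data that moves) can be badly uneven.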
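These approximate latency and throughput numbers support back-of-the-envelope estimates during system design. Treat them as orders of magnitude rather than exact figures, since real values depend on hardware and workload.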
| Operation | Approximate latency |
|---|---|
| L1 cache reference | 0.5 ns |
| L2 cache reference | 7 ns |
| Main memory reference | 100 ns |
| SSD random read (4 KB) | 150 µs |
| Read 1 MB sequentially from memory | 250 µs |
| Read 1 MB sequentially from SSD | 1 ms |
| Read 1 MB sequentially from HDD | 20 ms |
| Send packet CA → Netherlands → CA | 150 ms |
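A quick way to use the latency table is to turn its rows into time budgets. This minimal sketch assumes sequential throughput stays constant beyond 1 MB, treats 1 GB as 1,000 MB, and uses a hypothetical 200 ms page-load budget; the function names are illustrative.

```python
# Approximate figures from the latency table, converted to seconds.
SEQ_READ_1MB_S = {"memory": 250e-6, "ssd": 1e-3, "hdd": 20e-3}
SSD_RANDOM_READ_S = 150e-6     # one 4 KB random read
ROUND_TRIP_CA_NL_S = 150e-3    # one California-Netherlands round trip


def sequential_read_seconds(megabytes: float, medium: str) -> float:
    """Scale the 1 MB figure linearly (assumes throughput stays constant)."""
    return megabytes * SEQ_READ_1MB_S[medium]


def serial_ssd_reads_within(budget_s: float) -> int:
    """Random SSD reads that fit in a latency budget if issued one after another."""
    return int(budget_s / SSD_RANDOM_READ_S)


if __name__ == "__main__":
    for medium in SEQ_READ_1MB_S:
        seconds = sequential_read_seconds(1_000, medium)
        print(f"1 GB sequential from {medium}: {seconds:.2f} s")

    # Hypothetical 200 ms budget, minus one cross-Atlantic round trip.
    remaining = 0.200 - ROUND_TRIP_CA_NL_S
    print("serial SSD reads left in the budget:", serial_ssd_reads_within(remaining))
```

Scanning 1 GB takes about a quarter second from memory, about a second from SSD, and about 20 seconds from HDD. The budget calculation shows why cross-region round trips dominate user-facing latency: a single one consumes most of a 200 ms budget before any storage work begins.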
| Component | Approx. requests/sec | Notes |
|---|---|---|
| Single web server | 1,000-10,000 | Depends on complexity |
| Single database | 5,000-10,000 | Read-heavy workloads |
| Redis/Memcached | 100,000+ | In-memory operations |
| Kafka (single broker) | 100,000+ | Append-only log |
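The throughput table is most useful when converted into a server count. This minimal sketch takes the ~4,000 reads/sec from the URL shortener estimate, assumes a hypothetical 3x peak-to-average factor, and assumes a policy of running each server at no more than 50% of its rated capacity. It uses the low end of each range in the table to stay conservative; `servers_needed` is an illustrative helper.

```python
import math


def servers_needed(peak_rps: float, per_server_rps: float,
                   target_utilization: float = 0.5) -> int:
    """Smallest server count that keeps each server at or below target_utilization."""
    if peak_rps <= 0:
        return 1
    return math.ceil(peak_rps / (per_server_rps * target_utilization))


if __name__ == "__main__":
    average_reads = 4_000   # reads/sec from the URL shortener estimate
    peak_factor = 3         # hypothetical: peak traffic is 3x the average
    peak = average_reads * peak_factor

    # Low end of each range in the throughput table.
    print("web servers:", servers_needed(peak, 1_000))
    print("database replicas (if every read hit the DB):", servers_needed(peak, 5_000))
    print("cache nodes:", servers_needed(peak, 100_000))
```

Under these assumptions the same peak that would need several database replicas fits on a single cache node, which is why read-heavy designs usually put a cache in front of the database (though a real deployment would still run more than one cache node for availability).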
Week 1-2: Fundamentals
Start with networking basics, the client-server model, and API design. Understand the core trade-offs that underpin every design decision.
Week 3-4: Storage & Data
Deep dive into database selection, schema design, indexing, sharding, and replication. These concepts appear in every system design problem.
Week 5-6: Scaling Patterns
Learn load balancing, caching, message queues, and other patterns that enable systems to handle millions of users.
Week 7-8: Practice
Apply everything by working through complete case studies. Practice the interview framework with real problems.