Databases

Choosing the right database is one of the most consequential decisions in system design. It affects scalability, consistency, latency, and operational complexity. This page covers the core concepts you need to make informed database decisions.


SQL vs NoSQL

Relational Databases (SQL)

Relational databases store data in tables with predefined schemas. They use Structured Query Language (SQL) for querying and enforce relationships through foreign keys.

-- Schema definition
CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    email VARCHAR(255) UNIQUE NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE posts (
    id SERIAL PRIMARY KEY,
    user_id INTEGER REFERENCES users(id),
    title VARCHAR(255) NOT NULL,
    body TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Query with JOIN
SELECT u.name, p.title, p.created_at
FROM users u
JOIN posts p ON u.id = p.user_id
WHERE u.id = 123
ORDER BY p.created_at DESC
LIMIT 10;

Examples: PostgreSQL, MySQL, Oracle, SQL Server, SQLite

Non-Relational Databases (NoSQL)

NoSQL databases provide flexible schemas and are designed for specific access patterns. There are four main categories:

Document Stores

Store data as JSON-like documents. Each document can have a different structure.

// MongoDB document
{
  "_id": "user_123",
  "name": "Alice",
  "email": "alice@example.com",
  "posts": [
    {
      "title": "My First Post",
      "body": "Hello world!",
      "tags": ["intro", "personal"],
      "created_at": "2024-01-15T10:30:00Z"
    }
  ],
  "preferences": {
    "theme": "dark",
    "notifications": true
  }
}

Examples: MongoDB, CouchDB, Amazon DocumentDB

Key-Value Stores

Simplest NoSQL model. Optimized for fast lookups by key.

Key: "session:abc123" → Value: {"user_id": 123, "expires": 1700000000}
Key: "user:123:profile" → Value: {"name": "Alice", "avatar": "..."}
Key: "rate:ip:10.0.0.1" → Value: 47 (request count)

Examples: Redis, Memcached, Amazon DynamoDB, etcd
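
To make the model concrete, here is a minimal in-memory sketch of the patterns above (sessions with a TTL, rate counters). The `KVStore` class and its method names are invented for illustration, not a real client API, though `set`/`get`/`incr` mirror common Redis commands:

```python
import time

class KVStore:
    """Toy in-memory key-value store with optional per-key TTL."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None):
        expires_at = time.monotonic() + ttl if ttl else None
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() > expires_at:
            del self._data[key]  # lazily expire on access
            return None
        return value

    def incr(self, key):
        # Counter pattern used for rate limiting
        # (a real store would preserve the key's TTL here)
        value = (self.get(key) or 0) + 1
        self.set(key, value)
        return value

store = KVStore()
store.set("session:abc123", {"user_id": 123}, ttl=1800)
store.incr("rate:ip:10.0.0.1")  # → 1
```

Real stores add persistence, eviction policies, and network access; Redis, for instance, combines this lazy expiry-on-read with an active background sweep.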

Wide-Column Stores

Store data in column families. Each row can have different columns.

Row Key: "user_123"
┌──────────────────────────────────────────────────┐
│ Column Family: "profile"                         │
│ name: "Alice" │ email: "alice@ex.com" │ age: 30  │
├──────────────────────────────────────────────────┤
│ Column Family: "activity"                        │
│ last_login: "2024-01-15" │ post_count: 42        │
└──────────────────────────────────────────────────┘

Examples: Apache Cassandra, HBase, Google Bigtable

Graph Databases

Optimized for data with complex relationships. Nodes represent entities, edges represent relationships.

(Alice)──FRIENDS_WITH──►(Bob)
   │                      │
AUTHORED                LIKED
   │                      │
   ▼                      ▼
(Post1)◄──COMMENTED──(Comment1)

Examples: Neo4j, Amazon Neptune, ArangoDB
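
The diagram above can be sketched as an adjacency list; a graph database stores edges natively, so multi-hop traversals avoid the repeated self-joins a relational schema would need. A toy Python sketch (node and relation names come from the diagram; the `traverse` helper is invented):

```python
from collections import deque

# Adjacency list: node -> relation -> neighbor nodes
edges = {
    "Alice": {"FRIENDS_WITH": ["Bob"], "AUTHORED": ["Post1"]},
    "Bob": {"LIKED": ["Comment1"]},
    "Comment1": {"COMMENTED": ["Post1"]},
}

def traverse(start, relation, depth):
    """Collect nodes reachable via `relation` edges within `depth` hops (BFS)."""
    seen, frontier = set(), deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == depth:
            continue  # do not expand past the hop limit
        for neighbor in edges.get(node, {}).get(relation, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, d + 1))
    return seen
```

A query language like Cypher expresses the same traversal declaratively, e.g. matching a `FRIENDS_WITH*1..2` path instead of hand-writing the BFS.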

SQL vs NoSQL Comparison

| Feature | SQL | NoSQL |
|---|---|---|
| Schema | Fixed, predefined | Flexible, dynamic |
| Query language | SQL (standardized) | Database-specific APIs |
| Scaling | Primarily vertical | Primarily horizontal |
| Joins | Native, efficient | Limited or none |
| Transactions | Full ACID support | Varies (some support ACID) |
| Consistency | Strong by default | Eventual by default (configurable) |
| Data model | Tables with rows | Documents, key-value, columns, graphs |
| Best for | Structured data, complex queries, transactions | Unstructured data, high write volume, flexible schema |

ACID Properties

ACID properties guarantee that database transactions are processed reliably.

| Property | Definition | Example |
|---|---|---|
| Atomicity | All operations in a transaction succeed or all fail | A bank transfer debits one account AND credits another — both or neither |
| Consistency | A transaction brings the DB from one valid state to another | An account balance cannot go below zero if that constraint exists |
| Isolation | Concurrent transactions do not interfere with each other | Two people buying the last item do not both succeed |
| Durability | Once committed, data survives crashes | After "transfer complete," data persists even if the server crashes |
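
Atomicity and consistency can be seen in a few lines with SQLite, which ships with Python. This is a sketch of the bank-transfer example; the `balance` schema is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balance (user TEXT PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO balance VALUES ('Alice', 100), ('Bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    # `with conn` wraps the block in a transaction: commit on success,
    # rollback on any exception -- both updates apply or neither does.
    with conn:
        conn.execute("UPDATE balance SET amount = amount - ? WHERE user = ?",
                     (amount, src))
        (remaining,) = conn.execute(
            "SELECT amount FROM balance WHERE user = ?", (src,)).fetchone()
        if remaining < 0:
            raise ValueError("insufficient funds")  # triggers rollback
        conn.execute("UPDATE balance SET amount = amount + ? WHERE user = ?",
                     (amount, dst))

try:
    transfer(conn, "Alice", "Bob", 150)  # fails: Alice only has 100
except ValueError:
    pass
# Alice's debit was rolled back along with the failed transfer
```

After the failed call, Alice still has 100 and Bob still has 0 — the partial debit never becomes visible.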

Isolation Levels

Higher isolation provides stronger guarantees but lower concurrency (throughput).

| Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads | Performance |
|---|---|---|---|---|
| Read Uncommitted | Possible | Possible | Possible | Fastest |
| Read Committed | Prevented | Possible | Possible | Fast |
| Repeatable Read | Prevented | Prevented | Possible | Moderate |
| Serializable | Prevented | Prevented | Prevented | Slowest |

Dirty Read:

TX1: UPDATE balance SET amount = 0 WHERE user = 'Alice'
TX2: SELECT amount FROM balance WHERE user = 'Alice'  → reads 0
TX1: ROLLBACK  (amount goes back to 100)

TX2 read invalid data!

Non-Repeatable Read:

TX1: SELECT amount FROM balance WHERE user = 'Alice'  → 100
TX2: UPDATE balance SET amount = 50 WHERE user = 'Alice'
TX2: COMMIT
TX1: SELECT amount FROM balance WHERE user = 'Alice'  → 50

TX1 got different results for the same query!

Phantom Read:

TX1: SELECT * FROM orders WHERE amount > 100  → 5 rows
TX2: INSERT INTO orders (amount) VALUES (200)
TX2: COMMIT
TX1: SELECT * FROM orders WHERE amount > 100  → 6 rows

TX1 sees a new "phantom" row!

Indexing

Indexes are data structures that speed up data retrieval at the cost of additional storage and slower writes.

How Indexes Work

Without an index, the database performs a full table scan (O(n)). With an index, it can locate data in O(log n) or O(1).

Without Index (Full Table Scan):

┌────┬────────┬──────────────┐
│ id │ name   │ email        │  ← scan every row to find
├────┼────────┼──────────────┤    email = "dave@ex.com"
│ 1  │ Alice  │ alice@ex.com │
│ 2  │ Bob    │ bob@ex.com   │  O(n) - slow!
│ 3  │ Carol  │ carol@ex.com │
│ 4  │ Dave   │ dave@ex.com  │  ← found!
│ .. │ ...    │ ...          │
└────┴────────┴──────────────┘

With B-Tree Index on email:

         ┌─────────────┐
         │   carol@    │
         └──┬──────┬───┘
            │      │
   ┌────────▼┐  ┌─▼────────┐
   │ alice@  │  │ dave@    │
   │ bob@    │  │ eve@     │
   └─────────┘  └──────────┘

O(log n) - fast!

Index Types

| Index Type | Structure | Best For |
|---|---|---|
| B-Tree | Balanced tree | Range queries, sorting, equality (=, <, >, BETWEEN) |
| Hash | Hash table | Exact match only (=) — O(1) lookup |
| Composite | Multi-column B-Tree | Queries filtering on multiple columns |
| Full-text | Inverted index | Text search (avoids slow LIKE '%keyword%' scans) |
| Spatial | R-Tree | Geographic queries (nearby points) |
| Covering | Includes all queried columns | Eliminates the table lookup entirely |

Index Best Practices

-- Create an index on frequently queried columns
CREATE INDEX idx_users_email ON users(email);

-- Composite index for multi-column queries
-- Column order matters! Follows the leftmost prefix rule.
CREATE INDEX idx_posts_user_date ON posts(user_id, created_at DESC);

-- This index supports:
--   WHERE user_id = 123 ✓
--   WHERE user_id = 123 AND created_at > '2024' ✓
--   WHERE created_at > '2024' ✗ (can't use the index)

-- Partial index (only index active users)
CREATE INDEX idx_active_users ON users(email) WHERE active = true;

-- Check which indexes exist (PostgreSQL)
SELECT indexname, tablename FROM pg_indexes WHERE schemaname = 'public';
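
The leftmost prefix rule can be verified empirically. A sketch using SQLite's EXPLAIN QUERY PLAN (the `posts` schema loosely mirrors the example above; exact plan wording varies by SQLite version):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "created_at TEXT, body TEXT)")
conn.execute("CREATE INDEX idx_posts_user_date ON posts(user_id, created_at)")

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row describes the strategy
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Filtering on the leading column: the composite index is used (SEARCH)
p1 = plan("SELECT * FROM posts WHERE user_id = 123")

# Filtering only on the second column: full table scan, index unusable
p2 = plan("SELECT * FROM posts WHERE created_at > '2024'")
```

The same check works in PostgreSQL with `EXPLAIN ANALYZE`, as shown in the query optimization checklist below.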

When NOT to index:

  • Columns with very low cardinality (e.g., boolean is_active)
  • Tables with very few rows
  • Columns that are rarely queried
  • Write-heavy tables where index maintenance overhead is too high

Normalization vs Denormalization

Normalization

Normalization organizes data to reduce redundancy and ensure data integrity. Each fact is stored in exactly one place.

Normalized Design (3NF):

users             posts                  comments
┌────┬───────┐    ┌────┬────────┬────┐   ┌────┬────────┬────┐
│ id │ name  │    │ id │ title  │u_id│   │ id │ text   │p_id│
├────┼───────┤    ├────┼────────┼────┤   ├────┼────────┼────┤
│ 1  │ Alice │    │ 10 │ Post A │ 1  │   │100 │ Great! │ 10 │
│ 2  │ Bob   │    │ 11 │ Post B │ 2  │   │101 │ Thanks │ 10 │
└────┴───────┘    └────┴────────┴────┘   └────┴────────┴────┘

To get "Alice's posts with comments":
→ JOIN users + posts + comments (3 tables)
→ Data integrity guaranteed
→ Updates only need to change one row

Denormalization

Denormalization intentionally introduces redundancy to optimize read performance by reducing the number of joins needed.

Denormalized Design:

posts_with_details
┌────┬────────┬─────────────┬─────────┬───────────────┐
│ id │ title  │ author_name │ comment │ comment_count │
├────┼────────┼─────────────┼─────────┼───────────────┤
│ 10 │ Post A │ Alice       │ Great!  │ 2             │
│ 10 │ Post A │ Alice       │ Thanks  │ 2             │
│ 11 │ Post B │ Bob         │ (null)  │ 0             │
└────┴────────┴─────────────┴─────────┴───────────────┘

To get "Alice's posts with comments":
→ Single table read (no JOINs!)
→ But if Alice changes her name, must update many rows
→ comment_count might get out of sync

When to Use Each

| Aspect | Normalization | Denormalization |
|---|---|---|
| Read speed | Slower (requires JOINs) | Faster (pre-joined data) |
| Write speed | Faster (update one place) | Slower (update many places) |
| Storage | Less (no redundancy) | More (duplicated data) |
| Data integrity | High (single source of truth) | Risk of inconsistency |
| Best for | Write-heavy, transactional systems | Read-heavy, analytics systems |

Replication

Replication maintains copies of data across multiple nodes for availability, fault tolerance, and read scalability.

Leader-Follower (Master-Slave) Replication

          Writes
Client ──────────►┌────────┐
                  │ Leader │
Client ──────────►│(Master)│
                  └───┬────┘
                      │ Replication (async or sync)
      ┌───────────────┼───────────────┐
      │               │               │
 ┌────▼────┐     ┌────▼────┐     ┌────▼────┐
 │Follower │     │Follower │     │Follower │
 │ (Slave) │     │ (Slave) │     │ (Slave) │
 │ Node 1  │     │ Node 2  │     │ Node 3  │
 └─────────┘     └─────────┘     └─────────┘
      ▲               ▲               ▲
    Reads           Reads           Reads
  • All writes go to the leader
  • Reads can go to any follower (read scaling)
  • Replication can be synchronous (strong consistency, slower) or asynchronous (eventual consistency, faster)
  • If the leader fails, a follower is promoted (failover)

Leader-Leader (Multi-Master) Replication

Client ──────────►┌─────────┐◄────────── Client
                  │ Leader  │
                  │ Node 1  │
                  └────┬────┘
                       │ Bidirectional
                       │ Replication
                  ┌────▼────┐
Client ──────────►│ Leader  │◄────────── Client
                  │ Node 2  │
                  └─────────┘
  • Both nodes accept writes
  • Requires conflict resolution (last-write-wins, merge, custom logic)
  • Useful for multi-region deployments where each region needs write access

Replication Comparison

| Strategy | Write Throughput | Read Throughput | Consistency | Complexity |
|---|---|---|---|---|
| Single leader, sync | Lower | High | Strong | Low |
| Single leader, async | Moderate | High | Eventual | Low |
| Multi-leader | High | High | Eventual | High |
| Leaderless (quorum) | High | High | Tunable | Moderate |
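
For the leaderless row, the tunable consistency comes from quorum overlap: if writes are acknowledged by W replicas and reads query R replicas with R + W > N, the two sets must intersect, so a read always touches at least one replica holding the latest write. A toy sketch (class and method names invented; real systems like Dynamo also handle replica failures and read repair):

```python
class QuorumStore:
    """Toy leaderless store: N replicas, write quorum W, read quorum R."""

    def __init__(self, n=3, w=2, r=2):
        assert r + w > n, "quorums must overlap for strong reads"
        self.n, self.w, self.r = n, w, r
        self.replicas = [{} for _ in range(n)]

    def write(self, key, value, version):
        # Acknowledge after W replicas have the write (here: the first W)
        for replica in self.replicas[: self.w]:
            replica[key] = (version, value)

    def read(self, key):
        # Query R replicas and keep the highest-versioned answer;
        # worst case for overlap: the *last* R replicas
        answers = [rep.get(key) for rep in self.replicas[self.n - self.r :]]
        return max((a for a in answers if a is not None), default=None)

store = QuorumStore(n=3, w=2, r=2)
store.write("k", "v1", version=1)
store.write("k", "v2", version=2)
```

Even though the read quorum and write quorum were chosen adversarially here, they share one replica, so `read("k")` still returns version 2.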

Sharding (Partitioning)

Sharding splits data across multiple database instances so no single database holds all the data. This enables horizontal scaling of databases.

Sharding Strategies

1. Range-Based Sharding

Partition data by ranges of a key (e.g., user IDs 1–1,000,000 on Shard 1, 1,000,001–2,000,000 on Shard 2).

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│     Shard 1     │  │     Shard 2     │  │     Shard 3     │
│   Users A-H     │  │   Users I-P     │  │   Users Q-Z     │
│  (30% of data)  │  │  (45% of data)  │  │  (25% of data)  │
└─────────────────┘  └─────────────────┘  └─────────────────┘
                            ↑ UNEVEN!
  • Pros: Simple to implement, supports range queries
  • Cons: Can create hot spots (uneven distribution)

2. Hash-Based Sharding

Apply a hash function to the shard key and use modular arithmetic.

shard_id = hash(user_id) % num_shards

┌─────────────────┐  ┌─────────────────┐  ┌─────────────────┐
│     Shard 1     │  │     Shard 2     │  │     Shard 3     │
│  hash % 3 == 0  │  │  hash % 3 == 1  │  │  hash % 3 == 2  │
│ (~33% of data)  │  │ (~33% of data)  │  │ (~33% of data)  │
└─────────────────┘  └─────────────────┘  └─────────────────┘
                     Even distribution!
  • Pros: Even distribution of data
  • Cons: Range queries require querying all shards, resharding is expensive (use consistent hashing to mitigate)
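
A sketch of the routing function itself. One subtlety: the hash must be stable across processes, so Python's built-in hash() (which is randomized per interpreter run for strings) is unsuitable; a digest such as MD5 works:

```python
import hashlib

NUM_SHARDS = 3

def shard_for(key: str) -> int:
    # Stable hash: every app server maps the same key to the same shard
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# Keys spread roughly evenly across shards
counts = [0] * NUM_SHARDS
for i in range(9000):
    counts[shard_for(f"user_{i}")] += 1
```

Note the resharding cost this implies: changing `NUM_SHARDS` remaps almost every key, which is exactly what consistent hashing is designed to avoid.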

3. Directory-Based Sharding

A lookup service maps each key to its shard.

┌──────────────────┐
│ Shard Directory  │
│ user_123 → S1    │
│ user_456 → S2    │
│ user_789 → S1    │
└────────┬─────────┘
    ┌────┼────┐
    ▼    ▼    ▼
  [S1] [S2] [S3]
  • Pros: Flexible, can rebalance without changing logic
  • Cons: Directory is a single point of failure and bottleneck

Sharding Challenges

| Challenge | Description | Mitigation |
|---|---|---|
| Cross-shard queries | JOINs across shards are expensive | Denormalize, application-level joins |
| Resharding | Adding shards requires data migration | Consistent hashing, virtual shards |
| Hotspots | Uneven access patterns overload some shards | Better shard keys, splitting hot shards |
| Referential integrity | Foreign keys do not span shards | Application-level enforcement |
| Unique constraints | Global uniqueness is harder | Distributed ID generation (e.g., Snowflake IDs) |

Database Selection Guide

Decision Tree

Start: What kind of data do you have?
├── Structured data with relationships?
│   ├── Need ACID transactions?
│   │   ├── Yes → PostgreSQL / MySQL
│   │   └── No  → Consider NoSQL for performance
│   ├── Complex queries and reporting?
│   │   └── Yes → PostgreSQL (best query planner)
│   └── Simple key-value access pattern?
│       └── Consider DynamoDB or Redis
├── Semi-structured (JSON, varying fields)?
│   ├── Need flexible schema?
│   │   └── Yes → MongoDB / CouchDB
│   └── Need full-text search?
│       └── Yes → Elasticsearch
├── High write throughput, time-series?
│   ├── IoT / metrics data?
│   │   └── Yes → InfluxDB / TimescaleDB
│   └── Event logging at scale?
│       └── Yes → Cassandra / ScyllaDB
├── Graph/relationship-heavy data?
│   ├── Social networks, recommendations?
│   │   └── Yes → Neo4j / Amazon Neptune
│   └── Knowledge graphs?
│       └── Yes → Neo4j
├── Caching / session storage?
│   └── Yes → Redis / Memcached
└── Search and analytics?
    └── Yes → Elasticsearch / ClickHouse

Database Comparison Matrix

| Database | Type | Consistency | Scaling | Best For |
|---|---|---|---|---|
| PostgreSQL | Relational | Strong (ACID) | Vertical + read replicas | General purpose, complex queries, GIS |
| MySQL | Relational | Strong (ACID) | Vertical + read replicas | Web applications, WordPress, simple queries |
| MongoDB | Document | Eventual (tunable) | Horizontal (sharding) | Flexible schema, rapid prototyping |
| Redis | Key-Value | Strong (single node) | Cluster mode | Caching, sessions, leaderboards, pub/sub |
| Cassandra | Wide-Column | Eventual (tunable) | Horizontal (peer-to-peer) | High write throughput, time-series, IoT |
| DynamoDB | Key-Value/Document | Eventual (tunable) | Automatic | Serverless, predictable performance at scale |
| Elasticsearch | Search engine | Eventual | Horizontal | Full-text search, log analytics |
| Neo4j | Graph | Strong (ACID) | Limited horizontal | Social networks, fraud detection |
| ClickHouse | Column-oriented | Eventual | Horizontal | Analytics, OLAP, aggregation queries |
| InfluxDB | Time-series | Eventual | Horizontal | Metrics, monitoring, IoT |

Database Scaling Patterns

Read Replicas

Add follower databases that handle read queries, offloading the primary.

Application Server
├── Writes ──► Primary DB
└── Reads ───► Read Replica 1
          ───► Read Replica 2
          ───► Read Replica 3

Best for: Read-heavy workloads (e.g., 90% reads, 10% writes).
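
A sketch of how an application might route queries under this pattern. The class is invented for illustration, and the connection values are placeholder strings standing in for real per-node connections:

```python
import itertools

class RoutingPool:
    """Send writes to the primary, round-robin reads across replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas)

    def connection(self, query: str):
        # Crude classification: only SELECTs are safe to send to a replica
        if query.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary

pool = RoutingPool(primary="primary-db", replicas=["replica-1", "replica-2"])
```

Because replication is typically asynchronous, a read immediately after a write may not see it; read-your-own-writes often requires pinning a session to the primary briefly after a write.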

Connection Pooling

Reuse database connections instead of creating new ones for each request.

import psycopg2
import psycopg2.pool

# Without pooling: new connection per request (expensive)
def handle_request():
    conn = psycopg2.connect(...)  # TCP + auth handshake, ~50ms overhead
    # ... execute query ...
    conn.close()

# With pooling: reuse connections (fast)
pool = psycopg2.pool.ThreadedConnectionPool(minconn=5, maxconn=20, ...)

def handle_request():
    conn = pool.getconn()  # near-zero cost, hands out an idle connection
    try:
        # ... execute query ...
        pass
    finally:
        pool.putconn(conn)  # always return the connection to the pool

Query Optimization Checklist

-- 1. Use EXPLAIN ANALYZE to understand query plans
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123 AND status = 'active';

-- 2. Add indexes for frequently filtered columns
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- 3. Avoid SELECT * — fetch only needed columns
SELECT id, total, created_at FROM orders WHERE user_id = 123;

-- 4. Use pagination instead of fetching all rows
SELECT * FROM orders WHERE user_id = 123
ORDER BY created_at DESC
LIMIT 20 OFFSET 40;

-- 5. Use cursor-based pagination for large datasets
SELECT * FROM orders
WHERE user_id = 123 AND created_at < '2024-01-15'
ORDER BY created_at DESC
LIMIT 20;
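
Item 5's cursor-based (keyset) pagination can be run end to end with SQLite (the `orders` schema and data are invented). Unlike OFFSET, each page seeks directly to the cursor via the index instead of skipping rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, "
             "created_at TEXT)")
conn.execute("CREATE INDEX idx_orders_user_date ON orders(user_id, created_at)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, 123, f"2024-01-{i:02d}") for i in range(1, 11)])

def page(user_id, before=None, limit=3):
    """Fetch one page; `before` is the created_at cursor from the prior page."""
    if before is None:  # first page: no cursor yet
        sql = ("SELECT id, created_at FROM orders WHERE user_id = ? "
               "ORDER BY created_at DESC LIMIT ?")
        return conn.execute(sql, (user_id, limit)).fetchall()
    sql = ("SELECT id, created_at FROM orders WHERE user_id = ? "
           "AND created_at < ? ORDER BY created_at DESC LIMIT ?")
    return conn.execute(sql, (user_id, before, limit)).fetchall()

first = page(123)                        # newest orders: ids 10, 9, 8
second = page(123, before=first[-1][1])  # next page: ids 7, 6, 5
```

One caveat: if `created_at` values can tie, the cursor should be the pair `(created_at, id)` so rows are never skipped or repeated.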

Summary

SQL vs NoSQL

SQL for structured data, strong consistency, and complex queries. NoSQL for flexible schemas, horizontal scaling, and specific access patterns. Often the best approach is a polyglot persistence strategy — use the right database for each use case.

ACID and Isolation

ACID guarantees reliability for transactions. Understand isolation levels to balance correctness and performance. Most production systems use Read Committed as a practical default.

Indexing

Indexes are the single most impactful optimization for read performance. B-Tree indexes cover most use cases. Always analyze query patterns before creating indexes — every index has a write cost.

Sharding and Replication

Replication for read scaling and availability. Sharding for write scaling and storage. Both add operational complexity — scale vertically first, then add read replicas, then shard only when necessary.