Connection Pooling & Load Balancing
Opening a new database connection or TCP socket for every request is expensive. Connection pooling maintains a set of reusable connections that are checked out when needed and returned when done, amortizing the setup cost across many requests. Load balancing distributes those requests across multiple servers. Together, pooling and load balancing are essential for building scalable, high-throughput systems.
Why Connection Pooling Matters
The Cost of a New Connection
Establishing a database connection involves multiple round trips:
Without pooling (every request):
```
App ──── TCP handshake (3-way) ────► DB    ~1 ms
App ──── TLS handshake ────────────► DB    ~2 ms
App ──── Authentication ───────────► DB    ~1 ms
App ──── Query ────────────────────► DB    ~5 ms
App ──── Close connection ─────────► DB    ~0.5 ms
                                    Total: ~9.5 ms
```
With pooling (reused connection):
```
App ──── Checkout from pool ───────► Pool  ~0.01 ms
App ──── Query ────────────────────► DB    ~5 ms
App ──── Return to pool ───────────► Pool  ~0.01 ms
                                    Total: ~5.02 ms
```
Savings: ~47% latency reduction per request. At 1,000 req/s, that eliminates roughly 4,480 ms of connection overhead every second.

Connection Pool Architecture
```
┌──────────────────────────────────────────────────┐
│ Application                                      │
│                                                  │
│  Thread 1 ─┐                                     │
│  Thread 2 ─┤   ┌────────────────────────────┐    │
│  Thread 3 ─┼──►│      Connection Pool       │    │
│  Thread 4 ─┤   │                            │    │
│  Thread N ─┘   │  ┌────┐ ┌────┐ ┌────┐      │    │
│                │  │Conn│ │Conn│ │Conn│ ...  │    │
│                │  │ 1  │ │ 2  │ │ 3  │      │    │
│                │  └──┬─┘ └──┬─┘ └──┬─┘      │    │
│                └─────┼──────┼──────┼────────┘    │
└──────────────────────┼──────┼──────┼─────────────┘
                       │      │      │
                       ▼      ▼      ▼
                ┌──────────────────────┐
                │       Database       │
                └──────────────────────┘
```
Pool States:
- ● Active (checked out by a thread)
- ○ Idle (available for checkout)
- ✕ Closed (removed from pool)

Database Connection Pools
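Before turning to library pools, the checkout/checkin cycle above can be sketched with a thread-safe queue. This is a minimal illustration, not a production pool; `FakeConnection` and `SimpleConnectionPool` are hypothetical names.

```python
import queue

class FakeConnection:
    """Stand-in for a real database connection (illustrative only)."""
    _counter = 0
    def __init__(self):
        FakeConnection._counter += 1
        self.id = FakeConnection._counter

class SimpleConnectionPool:
    """Bounded pool: connections are pre-created, checked out, returned."""
    def __init__(self, size: int):
        self._pool = queue.Queue(maxsize=size)   # queue.Queue is thread-safe
        for _ in range(size):
            self._pool.put(FakeConnection())     # Pre-create all connections

    def checkout(self, timeout: float = 5.0) -> FakeConnection:
        # Blocks until a connection is free; raises queue.Empty on timeout
        return self._pool.get(timeout=timeout)

    def checkin(self, conn: FakeConnection) -> None:
        self._pool.put(conn)                     # Return for reuse

pool = SimpleConnectionPool(size=2)
c1 = pool.checkout()      # conn 1
c2 = pool.checkout()      # conn 2 (pool now empty)
pool.checkin(c1)          # conn 1 goes back
c3 = pool.checkout()      # reuses conn 1 -- no new connection is created
print(c3.id == c1.id)     # True
```

Only two connections are ever created no matter how many checkout/checkin cycles run; that reuse is the entire point of pooling.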
```python
# SQLAlchemy connection pool (most common in Python)
from sqlalchemy import create_engine, text

engine = create_engine(
    "postgresql://user:pass@localhost:5432/mydb",
    pool_size=10,         # Steady-state pool size
    max_overflow=20,      # Extra connections beyond pool_size
    pool_timeout=30,      # Seconds to wait for a connection
    pool_recycle=1800,    # Recycle connections after 30 min
    pool_pre_ping=True,   # Test connections before use
)

# Usage: connections are automatically pooled
with engine.connect() as conn:
    result = conn.execute(
        text("SELECT * FROM users WHERE id = :id"), {"id": 42}
    )
    user = result.fetchone()
# Connection returned to pool automatically
```
```python
# psycopg2 pool (lower-level)
from psycopg2 import pool

connection_pool = pool.ThreadedConnectionPool(
    minconn=5,    # Minimum connections maintained
    maxconn=20,   # Maximum connections allowed
    host="localhost",
    database="mydb",
    user="user",
    password="pass",
)

# Checkout
conn = connection_pool.getconn()
try:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE id = %s", (42,))
    user = cursor.fetchone()
    conn.commit()
finally:
    # Return to pool (do not close!)
    connection_pool.putconn(conn)
```

```javascript
// pg (node-postgres) built-in pool
const { Pool } = require('pg');

const pool = new Pool({
  host: 'localhost',
  database: 'mydb',
  user: 'user',
  password: 'pass',
  max: 20,                        // Maximum connections
  idleTimeoutMillis: 30000,       // Close idle connections after 30s
  connectionTimeoutMillis: 5000,  // Timeout waiting for connection
});

// Method 1: Automatic checkout/return
async function getUser(userId) {
  const result = await pool.query(
    'SELECT * FROM users WHERE id = $1',
    [userId]
  );
  return result.rows[0];  // Connection returned to pool automatically
}

// Method 2: Manual checkout (for transactions)
async function transferFunds(from, to, amount) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(
      'UPDATE accounts SET balance = balance - $1 WHERE id = $2',
      [amount, from]
    );
    await client.query(
      'UPDATE accounts SET balance = balance + $1 WHERE id = $2',
      [amount, to]
    );
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();  // Return to pool
  }
}

// Monitor pool health
pool.on('error', (err) => {
  console.error('Idle client error:', err.message);
});

setInterval(() => {
  console.log({
    total: pool.totalCount,
    idle: pool.idleCount,
    waiting: pool.waitingCount,
  });
}, 10000);
```

```java
// HikariCP: the fastest Java connection pool
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb");
config.setUsername("user");
config.setPassword("pass");

// Pool sizing
config.setMinimumIdle(5);                  // Minimum idle connections
config.setMaximumPoolSize(20);             // Maximum total connections
config.setConnectionTimeout(30000);        // 30s wait for connection
config.setIdleTimeout(600000);             // 10 min idle before close
config.setMaxLifetime(1800000);            // 30 min max connection age
config.setLeakDetectionThreshold(60000);   // Warn if held > 60s

// Connection validation
config.setConnectionTestQuery("SELECT 1");

HikariDataSource dataSource = new HikariDataSource(config);

// Usage with try-with-resources
try (Connection conn = dataSource.getConnection()) {
    PreparedStatement stmt = conn.prepareStatement(
        "SELECT * FROM users WHERE id = ?"
    );
    stmt.setInt(1, 42);
    ResultSet rs = stmt.executeQuery();
    if (rs.next()) {
        String name = rs.getString("name");
    }
} // Connection returned to pool automatically

// Monitor pool metrics
HikariPoolMXBean poolMXBean = dataSource.getHikariPoolMXBean();
System.out.println("Active: " + poolMXBean.getActiveConnections());
System.out.println("Idle: " + poolMXBean.getIdleConnections());
System.out.println("Waiting: " + poolMXBean.getThreadsAwaitingConnection());
```

Pool Sizing
Choosing the right pool size is critical. Too few connections cause request queueing; too many overwhelm the database.
The Formula
A widely cited guideline from the HikariCP documentation:
```
Optimal pool size = (core_count * 2) + effective_spindle_count

Where:
  core_count              = number of CPU cores on the DB server
  effective_spindle_count = number of disk spindles (SSDs count as 0)

Example:
  DB server: 4 cores, SSD storage
  Pool size = (4 * 2) + 0 = 8 connections
```
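The arithmetic above fits in a one-line helper (the function name is ours, not part of any library):

```python
def optimal_pool_size(core_count: int, effective_spindle_count: int = 0) -> int:
    """HikariCP sizing guideline; pass 0 spindles for SSD storage."""
    return (core_count * 2) + effective_spindle_count

print(optimal_pool_size(4))      # 8  (the example above: 4 cores, SSD)
print(optimal_pool_size(8, 2))   # 18 (8 cores, 2 spinning disks)
```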
This is a starting point. Benchmark and adjust based on actual workload.

Monitoring Pool Health
| Metric | Healthy | Warning | Critical |
|---|---|---|---|
| Active connections | less than 70% of max | 70-90% of max | greater than 90% of max |
| Wait time for connection | less than 10ms | 10-100ms | greater than 100ms |
| Connection creation rate | Low (pool reuses) | Moderate | High (pool too small or leaking) |
| Idle connections | Matches minIdle | Fluctuates normally | 0 (pool exhausted) |
| Connection age | Within maxLifetime | Approaching limit | Exceeding limit |
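A monitoring check might map live metrics onto the table's levels. The sketch below hard-codes the table's thresholds for the first two rows; the function name and metric names are illustrative, and real alerting would also cover creation rate, idle count, and connection age.

```python
def pool_health(active: int, max_size: int, wait_ms: float) -> str:
    """Classify pool health from utilization and checkout wait time."""
    utilization = active / max_size
    if utilization > 0.9 or wait_ms > 100:   # > 90% of max, or > 100 ms wait
        return "critical"
    if utilization >= 0.7 or wait_ms >= 10:  # 70-90% of max, or 10-100 ms wait
        return "warning"
    return "healthy"

print(pool_health(active=5,  max_size=20, wait_ms=2))    # healthy
print(pool_health(active=15, max_size=20, wait_ms=5))    # warning
print(pool_health(active=19, max_size=20, wait_ms=150))  # critical
```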
HTTP Keep-Alive
HTTP keep-alive (persistent connections) reuses TCP connections for multiple HTTP requests instead of opening a new connection for each request.
```
Without Keep-Alive:
  Request 1: TCP handshake → HTTP request → response → close
  Request 2: TCP handshake → HTTP request → response → close
  Request 3: TCP handshake → HTTP request → response → close

With Keep-Alive:
  TCP handshake → HTTP request 1 → response 1
                → HTTP request 2 → response 2
                → HTTP request 3 → response 3
                → ... (connection reused)
                → close (after idle timeout)
```
Savings: roughly 2 round trips per request (TCP handshake plus TLS handshake).

HTTP/1.1 vs HTTP/2
| Feature | HTTP/1.1 Keep-Alive | HTTP/2 |
|---|---|---|
| Connections per host | 6-8 (browser limit) | 1 (multiplexed) |
| Request ordering | Sequential per connection | Parallel streams |
| Head-of-line blocking | Yes (per connection) | No (at HTTP level) |
| Header compression | No | HPACK |
| Server push | No | Yes |
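Connection reuse can be observed with nothing but the standard library: one `http.client.HTTPConnection` carries several HTTP/1.1 requests to a throwaway local server. The handler and request paths are illustrative.

```python
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"        # HTTP/1.1 => keep-alive by default
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *_):           # silence per-request logging
        pass

server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One TCP connection, three HTTP requests
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
statuses = []
for i in range(3):
    conn.request("GET", f"/item/{i}")
    resp = conn.getresponse()
    statuses.append(resp.status)
    resp.read()                          # drain body so the socket is reusable
conn.close()
server.shutdown()
print(statuses)                          # [200, 200, 200]
```

Because the handler declares HTTP/1.1 and sends `Content-Length`, the socket stays open between requests; only the first request pays the TCP handshake.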
Load Balancing Algorithms
When you have multiple backend servers, a load balancer distributes incoming requests across them. The algorithm determines which server handles each request.
```
            ┌──────────┐
            │  Client  │
            └────┬─────┘
                 │
            ┌────┴─────┐
            │   Load   │
            │ Balancer │
            └────┬─────┘
        ┌────────┼────────┐
        ▼        ▼        ▼
  ┌─────────┐ ┌─────────┐ ┌─────────┐
  │Server A │ │Server B │ │Server C │
  └─────────┘ └─────────┘ └─────────┘
```

Round Robin
Sends each request to the next server in sequence.
```
Request 1 → Server A
Request 2 → Server B
Request 3 → Server C
Request 4 → Server A (wraps around)
Request 5 → Server B
...
```

- Pros: Simple, even distribution when servers are identical
- Cons: Ignores server load; a slow server gets the same traffic as a fast one
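Round robin is essentially `itertools.cycle` over the server list:

```python
from itertools import cycle

# Endless round-robin iterator over the backends
servers = cycle(["server-a", "server-b", "server-c"])

picks = [next(servers) for _ in range(5)]
print(picks)
# ['server-a', 'server-b', 'server-c', 'server-a', 'server-b']
```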
Weighted Round Robin
Like round robin, but servers with higher weights receive more requests.
```
Weights: A=5, B=3, C=2  (total=10)

Server A handles 50% of requests
Server B handles 30% of requests
Server C handles 20% of requests
```

Least Connections
Sends each request to the server with the fewest active connections.
```
Active connections: A=12, B=8, C=15

Next request → Server B (fewest active)

After: A=12, B=9, C=15
```

- Pros: Adapts to server speed; slow servers naturally get fewer requests
- Cons: More overhead (must track active connections)
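Selection itself is just a `min` over the connection counts; the counts below mirror the example above:

```python
# Current active-connection counts per server (illustrative)
active = {"A": 12, "B": 8, "C": 15}

def pick_least_connections(active: dict) -> str:
    """Return the server with the fewest active connections."""
    return min(active, key=active.get)

server = pick_least_connections(active)
active[server] += 1              # the new request is now counted
print(server, active)            # B {'A': 12, 'B': 9, 'C': 15}
```

The bookkeeping (incrementing on dispatch, decrementing on completion) is the real cost mentioned under Cons.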
IP Hash
Hashes the client’s IP address to determine which server handles the request. The same client always goes to the same server (session affinity).
```
hash(client_ip) % num_servers = server_index

hash("192.168.1.100") % 3 = 1  → Server B (always)
hash("10.0.0.50")     % 3 = 0  → Server A (always)
```

- Pros: Session stickiness without cookies
- Cons: Uneven distribution if some IPs are more active; adding/removing servers reshuffles most clients
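A minimal IP-hash sketch. Note that `hashlib.md5` is used rather than Python's built-in `hash()`, which is salted per process and therefore not stable across restarts; which server a given IP lands on depends on the hash function, so the mapping below is not claimed to match the example numbers above.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]

def pick_by_ip(client_ip: str) -> str:
    """Deterministically map a client IP to one backend."""
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

# The same client IP always maps to the same server:
print(pick_by_ip("192.168.1.100") == pick_by_ip("192.168.1.100"))  # True
```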
Consistent Hashing
Maps both servers and keys to a hash ring. Each key is handled by the next server clockwise on the ring. Adding or removing a server only redistributes keys near that server, not all keys.
```
Hash Ring:

          0
          │
   S3 ────┼──── S1     S1 handles keys from S3 to S1
          │            S2 handles keys from S1 to S2
          │            S3 handles keys from S2 to S3
          S2
```

Adding S4 between S1 and S2: only keys between S1 and S4 move from S2 to S4. All other keys stay on their current server.

Algorithm Comparison
| Algorithm | Distribution | Session Affinity | Adaptability | Complexity |
|---|---|---|---|---|
| Round Robin | Even (static) | No | None | Very low |
| Weighted Round Robin | Proportional | No | None | Low |
| Least Connections | Adaptive | No | High | Medium |
| IP Hash | Varies | Yes (by IP) | Low | Low |
| Consistent Hashing | Even (dynamic) | Yes (by key) | High | Medium |
| Random | Statistically even | No | None | Very low |
```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    """Consistent hashing with virtual nodes."""

    def __init__(self, nodes=None, replicas=150):
        self.replicas = replicas
        self.ring = {}          # hash -> node
        self.sorted_keys = []

        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        for i in range(self.replicas):
            virtual_key = f"{node}:vn{i}"
            h = self._hash(virtual_key)
            self.ring[h] = node
            self.sorted_keys.append(h)
        self.sorted_keys.sort()

    def remove_node(self, node: str):
        for i in range(self.replicas):
            virtual_key = f"{node}:vn{i}"
            h = self._hash(virtual_key)
            del self.ring[h]
            self.sorted_keys.remove(h)

    def get_node(self, key: str) -> str:
        if not self.ring:
            return None
        h = self._hash(key)
        idx = bisect_right(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0  # Wrap around
        return self.ring[self.sorted_keys[idx]]

# Usage
ring = ConsistentHashRing(["server-a", "server-b", "server-c"])
print(ring.get_node("user:123"))  # server-b
print(ring.get_node("user:456"))  # server-a

# Adding a server only moves ~1/N of the keys
ring.add_node("server-d")
print(ring.get_node("user:123"))  # may still be server-b
```

```javascript
const crypto = require('crypto');

class ConsistentHashRing {
  constructor(nodes = [], replicas = 150) {
    this.replicas = replicas;
    this.ring = new Map();
    this.sortedKeys = [];

    nodes.forEach(node => this.addNode(node));
  }

  _hash(key) {
    return crypto.createHash('md5')
      .update(key)
      .digest('hex');
  }

  addNode(node) {
    for (let i = 0; i < this.replicas; i++) {
      const virtualKey = `${node}:vn${i}`;
      const hash = this._hash(virtualKey);
      this.ring.set(hash, node);
      this.sortedKeys.push(hash);
    }
    this.sortedKeys.sort();
  }

  removeNode(node) {
    for (let i = 0; i < this.replicas; i++) {
      const virtualKey = `${node}:vn${i}`;
      const hash = this._hash(virtualKey);
      this.ring.delete(hash);
      this.sortedKeys = this.sortedKeys.filter(k => k !== hash);
    }
  }

  getNode(key) {
    if (this.ring.size === 0) return null;
    const hash = this._hash(key);
    let idx = this.sortedKeys.findIndex(k => k >= hash);
    if (idx === -1) idx = 0; // Wrap around
    return this.ring.get(this.sortedKeys[idx]);
  }
}

const ring = new ConsistentHashRing([
  'server-a', 'server-b', 'server-c'
]);
console.log(ring.getNode('user:123')); // server-b
console.log(ring.getNode('user:456')); // server-a
```

Thread Pools
A thread pool maintains a fixed set of worker threads that process tasks from a shared queue. This avoids the overhead of creating and destroying threads for each task.
```
┌─────────────────────────────────────────────────┐
│                  Thread Pool                    │
│                                                 │
│   ┌──────────────────────────────────────┐      │
│   │             Task Queue               │      │
│   │   [Task5] [Task4] [Task3] [Task2]    │      │
│   └──────────────────┬───────────────────┘      │
│                      │                          │
│          ┌───────────┼───────────┐              │
│          ▼           ▼           ▼              │
│    ┌─────────┐  ┌─────────┐  ┌─────────┐        │
│    │Worker 1 │  │Worker 2 │  │Worker 3 │        │
│    │(Task 1) │  │ (idle)  │  │(Task 6) │        │
│    └─────────┘  └─────────┘  └─────────┘        │
└─────────────────────────────────────────────────┘
```

Workers pull tasks from the queue. Idle workers wait for new tasks. If all workers are busy, tasks queue up.

Thread Pool Sizing
| Workload Type | Recommended Pool Size | Reason |
|---|---|---|
| CPU-bound | Number of CPU cores | More threads cause context-switching overhead |
| I/O-bound | Cores * (1 + wait_time / compute_time) | Threads spend time waiting, so more threads keep CPUs busy |
| Mixed | Profile and tune | Start with 2x cores and adjust |
```
Example: I/O-bound workload
  CPU cores: 4
  Average wait time per task:    200 ms (DB query)
  Average compute time per task:  10 ms

  Pool size = 4 * (1 + 200/10) = 4 * 21 = 84 threads
```

This keeps all 4 cores busy despite the long I/O waits.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fetch_url(url: str) -> str:
    """Simulate an I/O-bound task."""
    time.sleep(0.5)  # Simulate network latency
    return f"Response from {url}"

urls = [f"https://api.example.com/item/{i}" for i in range(20)]

# ThreadPoolExecutor: built-in thread pool
with ThreadPoolExecutor(max_workers=10) as executor:
    # Submit all tasks
    future_to_url = {
        executor.submit(fetch_url, url): url
        for url in urls
    }

    # Collect results as they complete
    for future in as_completed(future_to_url):
        url = future_to_url[future]
        try:
            result = future.result()
            print(f"{url}: {result}")
        except Exception as exc:
            print(f"{url} raised {exc}")

# Without pool: 20 * 0.5s = 10 seconds
# With 10 threads: ~1 second (20/10 * 0.5s)

# ProcessPoolExecutor: for CPU-bound tasks
from concurrent.futures import ProcessPoolExecutor

def compute_heavy(n: int) -> int:
    return sum(i ** 2 for i in range(n))

with ProcessPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(compute_heavy, [10**6] * 8))
```

```javascript
// Node.js worker_threads pool (for CPU-bound work)
const { Worker } = require('worker_threads');
const os = require('os');

class ThreadPool {
  constructor(workerScript, poolSize = os.cpus().length) {
    this.workerScript = workerScript;
    this.poolSize = poolSize;
    this.workers = [];
    this.queue = [];

    for (let i = 0; i < poolSize; i++) {
      this._createWorker();
    }
  }

  _createWorker() {
    const worker = new Worker(this.workerScript);
    worker.busy = false;
    worker.on('message', (result) => {
      worker.busy = false;
      worker.resolve(result);
      this._processQueue();
    });
    worker.on('error', (err) => {
      worker.busy = false;
      worker.reject(err);
      this._processQueue();
    });
    this.workers.push(worker);
  }

  execute(data) {
    return new Promise((resolve, reject) => {
      const task = { data, resolve, reject };
      const freeWorker = this.workers.find(w => !w.busy);
      if (freeWorker) {
        freeWorker.busy = true;
        freeWorker.resolve = resolve;
        freeWorker.reject = reject;
        freeWorker.postMessage(data);
      } else {
        this.queue.push(task);
      }
    });
  }

  _processQueue() {
    if (this.queue.length === 0) return;
    const freeWorker = this.workers.find(w => !w.busy);
    if (!freeWorker) return;
    const task = this.queue.shift();
    freeWorker.busy = true;
    freeWorker.resolve = task.resolve;
    freeWorker.reject = task.reject;
    freeWorker.postMessage(task.data);
  }
}
```

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class ThreadPoolExample {
    public static void main(String[] args) throws Exception {
        // Fixed thread pool: best for known, stable workloads
        ExecutorService fixedPool = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors()
        );

        // Custom thread pool with full control
        ThreadPoolExecutor customPool = new ThreadPoolExecutor(
            4,                                  // Core pool size
            16,                                 // Maximum pool size
            60L,                                // Keep-alive time for idle threads
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<>(1000),    // Task queue
            new ThreadPoolExecutor.CallerRunsPolicy()  // Backpressure
        );

        // Submit tasks
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            final int taskId = i;
            futures.add(customPool.submit(() -> {
                Thread.sleep(100);  // Simulate work
                return "Result-" + taskId;
            }));
        }

        // Collect results
        for (Future<String> future : futures) {
            System.out.println(future.get());
        }

        // Monitor pool health
        System.out.printf(
            "Pool: active=%d, completed=%d, queue=%d%n",
            customPool.getActiveCount(),
            customPool.getCompletedTaskCount(),
            customPool.getQueue().size()
        );

        // Virtual threads (Java 21+): no pool needed
        try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    // Each virtual thread is lightweight (~1 KB)
                    Thread.sleep(100);
                    return "done";
                });
            }
        }

        customPool.shutdown();
        fixedPool.shutdown();
    }
}
```

Connection Pool Anti-Patterns
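The classic anti-pattern is the connection leak: a connection is checked out and, on an error path, never returned, until the pool runs dry. The sketch below uses a hypothetical `LeakyPool` stand-in to make the failure observable.

```python
import queue

class LeakyPool:
    """Toy pool used to demonstrate a connection leak (illustrative)."""
    def __init__(self, size: int):
        self._q = queue.Queue()
        for i in range(size):
            self._q.put(f"conn-{i}")
    def checkout(self):
        return self._q.get(timeout=0.1)  # raises queue.Empty when exhausted
    def checkin(self, conn):
        self._q.put(conn)

pool = LeakyPool(size=2)

def leaky_query(pool):
    conn = pool.checkout()
    raise RuntimeError("query failed")   # BUG: conn is never checked in
    pool.checkin(conn)                   # unreachable

for _ in range(2):
    try:
        leaky_query(pool)
    except RuntimeError:
        pass

# Both connections leaked: the next checkout times out.
try:
    pool.checkout()
    exhausted = False
except queue.Empty:
    exhausted = True
print(exhausted)   # True

# Fix: always return the connection in a finally block.
def safe_query(pool):
    conn = pool.checkout()
    try:
        raise RuntimeError("query failed")
    finally:
        pool.checkin(conn)               # returned even on error
```

The fix is mechanical: pair every checkout with a `finally` block or context manager, which is exactly what the `with engine.connect()`, `try/finally putconn`, and try-with-resources examples earlier in this article do.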
Real-World Architecture
```
┌─────────┐     ┌──────────────┐     ┌──────────────┐     ┌──────────┐
│         │     │              │     │              │     │          │
│ Clients │────►│ Load Balancer│────►│ App Servers  │────►│ Database │
│         │     │   (L7/L4)    │     │  (Pool: 20)  │     │ (max: 100│
│         │     │              │     │              │     │  conn)   │
└─────────┘     └──────────────┘     └──────────────┘     └──────────┘

Sizing example:
  5 app servers * 20 connections each = 100 DB connections
  Database max_connections = 100 (match!)

If you add a 6th server: 6 * 20 = 120 > 100
Solution: use a connection proxy (PgBouncer, ProxySQL)

┌──────────┐     ┌───────────┐     ┌──────────┐
│ App (20) │────►│           │     │          │
│ App (20) │────►│ PgBouncer │────►│ Database │
│ App (20) │────►│ (pool: 50)│     │ (max:100)│
│ App (20) │────►│           │     │          │
│ App (20) │────►│           │     │          │
└──────────┘     └───────────┘     └──────────┘
  100 app-side      50 actual DB
  connections       connections (multiplexed)
```

Summary
| Concept | Key Takeaway |
|---|---|
| Connection pool | Reuses connections to avoid per-request setup overhead |
| Pool sizing | Start with (cores * 2) + spindles; benchmark and adjust |
| HTTP keep-alive | Reuses TCP connections for multiple HTTP requests |
| Round robin | Simple, even distribution; ignores server load |
| Least connections | Adaptive; sends requests to the least-busy server |
| Consistent hashing | Minimal key redistribution when servers are added/removed |
| Thread pool | Pre-created worker threads that process tasks from a shared queue |
| Connection proxy | Multiplexes many app connections into fewer DB connections |