Skip to content

Transactions, ACID & NoSQL

What Is a Transaction?

A transaction is a sequence of one or more database operations that are treated as a single, indivisible unit of work. Either all operations succeed and are permanently saved, or none of them take effect.

The classic example is a bank transfer:

-- Transfer $500 from Alice's account to Bob's account
BEGIN;
UPDATE accounts SET balance = balance - 500 WHERE owner = 'Alice';
UPDATE accounts SET balance = balance + 500 WHERE owner = 'Bob';
COMMIT;

If the system crashes between the two UPDATE statements, without transactions you would have $500 vanish into thin air — Alice’s balance decreased but Bob’s never increased. Transactions prevent this.

ACID Properties

Every reliable relational database guarantees four properties for transactions, collectively known as ACID:

Atomicity — “All or Nothing”

All operations within a transaction either complete successfully or are entirely rolled back. There is no partial state.

BEGIN;
INSERT INTO orders (customer_id, order_date) VALUES (1, '2024-03-15');
-- Suppose this fails due to a constraint violation:
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (999, 9999, 1); -- product_id 9999 doesn't exist
ROLLBACK; -- Automatically rolled back; the order is NOT created

Consistency — “Valid State to Valid State”

A transaction moves the database from one valid state to another. All constraints (foreign keys, unique constraints, check constraints) must be satisfied after the transaction completes. If any constraint is violated, the entire transaction is rolled back.

-- The database enforces that account balances cannot go negative
ALTER TABLE accounts ADD CONSTRAINT positive_balance CHECK (balance >= 0);
BEGIN;
UPDATE accounts SET balance = balance - 10000 WHERE owner = 'Alice';
-- If Alice only has $500, this violates the CHECK constraint
-- The transaction is rolled back, maintaining consistency
COMMIT;

Isolation — “Concurrent Transactions Do Not Interfere”

Even when multiple transactions execute concurrently, each transaction behaves as if it were running alone. The isolation level determines exactly how much of other transactions’ work is visible.

Durability — “Committed Data Survives Failures”

Once a transaction is committed, the changes are permanently saved — even if the system crashes immediately afterward. Databases achieve this through write-ahead logging (WAL): changes are written to a durable log before being applied to the actual data files.

Transaction Lifecycle:
BEGIN ──→ Operation 1 ──→ Operation 2 ──→ ... ──→ COMMIT
│ │
│ (changes are tentative) │
│ ↓
│ Changes are
│ permanent (durable)
└──→ ROLLBACK (at any point before COMMIT)
All changes undone

Isolation Levels and Concurrency Anomalies

Isolation levels represent a trade-off between correctness and performance. Higher isolation prevents more anomalies but reduces concurrency (throughput).

Concurrency Anomalies

Three types of problems can occur when transactions run concurrently:

Dirty Read: Transaction A reads data that Transaction B has modified but not yet committed. If B rolls back, A has read data that never existed.

Timeline:
T1: BEGIN
T1: UPDATE products SET price = 25.00 WHERE id = 1; (was 20.00)
T2: BEGIN
T2: SELECT price FROM products WHERE id = 1;
T2: → reads 25.00 ← DIRTY READ (T1 hasn't committed)
T1: ROLLBACK (price goes back to 20.00)
T2: ... uses 25.00, which was never real

Non-Repeatable Read: Transaction A reads a row, Transaction B modifies and commits it, then A reads the same row again and gets a different value.

Timeline:
T1: BEGIN
T1: SELECT price FROM products WHERE id = 1; → 20.00
T2: BEGIN
T2: UPDATE products SET price = 25.00 WHERE id = 1;
T2: COMMIT
T1: SELECT price FROM products WHERE id = 1; → 25.00 ← DIFFERENT!
T1: (same query, different result within the same transaction)

Phantom Read: Transaction A queries a set of rows, Transaction B inserts or deletes rows that match A’s query, then A runs the same query again and gets different rows.

Timeline:
T1: BEGIN
T1: SELECT COUNT(*) FROM orders WHERE status = 'pending'; → 5
T2: BEGIN
T2: INSERT INTO orders (status) VALUES ('pending');
T2: COMMIT
T1: SELECT COUNT(*) FROM orders WHERE status = 'pending'; → 6 ← PHANTOM!

Isolation Level Matrix

Isolation LevelDirty ReadNon-Repeatable ReadPhantom ReadPerformance
Read UncommittedPossiblePossiblePossibleFastest
Read CommittedPreventedPossiblePossibleFast
Repeatable ReadPreventedPreventedPossibleModerate
SerializablePreventedPreventedPreventedSlowest

Setting Isolation Levels

-- Set for a single transaction
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT * FROM accounts WHERE owner = 'Alice';
UPDATE accounts SET balance = balance - 100 WHERE owner = 'Alice';
COMMIT;
-- Set for the entire session
SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL REPEATABLE READ;

MVCC — Multi-Version Concurrency Control

Most modern databases (PostgreSQL, MySQL InnoDB, Oracle) use MVCC to provide isolation without heavy locking. Instead of blocking readers when a writer modifies data, MVCC keeps multiple versions of each row.

How MVCC Works:
Row: product_id=1, price=20.00
T1 (writer): UPDATE price = 25.00
┌─────────────────────────────────────────────┐
│ Version 1: price=20.00 (visible to T2) │
│ Version 2: price=25.00 (visible only to T1 │
│ until COMMIT) │
└─────────────────────────────────────────────┘
T2 (reader): SELECT price → sees 20.00 (Version 1)
(T2 is never blocked by T1's write)
T1: COMMIT
┌─────────────────────────────────────────────┐
│ Version 1: price=20.00 (old, can be GC'd) │
│ Version 2: price=25.00 (now visible to all) │
└─────────────────────────────────────────────┘
T2: SELECT price → sees 25.00 (at Read Committed level)
or still 20.00 (at Repeatable Read / snapshot level)

Key benefits of MVCC:

  • Readers never block writers
  • Writers never block readers
  • Only writers block other writers (on the same row)

Transaction Code Examples

-- Bank transfer with error handling
BEGIN;
SAVEPOINT before_transfer;
UPDATE accounts SET balance = balance - 500
WHERE owner = 'Alice' AND balance >= 500;
-- Check if the debit succeeded (row was found and updated)
-- In a real application, your code checks the affected row count
UPDATE accounts SET balance = balance + 500
WHERE owner = 'Bob';
-- If anything went wrong:
-- ROLLBACK TO SAVEPOINT before_transfer;
COMMIT;

NoSQL Databases

NoSQL (“Not Only SQL”) databases emerged to handle use cases where relational databases struggle: massive scale, flexible schemas, high write throughput, and specific access patterns. They trade some relational guarantees for other advantages.

Document Databases

Store data as JSON/BSON documents. Each document is self-contained and can have a different structure.

Best for: Content management, user profiles, product catalogs, event logging

Representative database: MongoDB

// MongoDB document example
{
"_id": ObjectId("65a1b2c3d4e5f6a7b8c9d0e1"),
"name": "Alice Johnson",
"email": "alice@example.com",
"addresses": [
{
"type": "home",
"street": "123 Main St",
"city": "New York",
"zip": "10001"
},
{
"type": "work",
"street": "456 Office Blvd",
"city": "New York",
"zip": "10018"
}
],
"orders": [
{
"order_id": "ORD-001",
"date": ISODate("2024-01-15"),
"items": [
{ "product": "Laptop", "price": 999.99, "qty": 1 },
{ "product": "Mouse", "price": 29.99, "qty": 1 }
],
"total": 1029.98
}
]
}
// MongoDB queries
// Find customers in New York
db.customers.find({ "addresses.city": "New York" });
// Aggregation: total spending per customer
db.customers.aggregate([
{ $unwind: "$orders" },
{ $group: {
_id: "$name",
totalSpent: { $sum: "$orders.total" },
orderCount: { $sum: 1 }
}},
{ $sort: { totalSpent: -1 } }
]);

Key-Value Stores

The simplest NoSQL model: every entry is a key mapped to a value. Values are opaque to the database — all operations are by key.

Best for: Caching, session storage, real-time leaderboards, rate limiting, pub/sub messaging

Representative database: Redis

# Redis command examples
# Simple key-value
SET user:1001:name "Alice Johnson"
GET user:1001:name
# → "Alice Johnson"
# Hash (structured data under one key)
HSET user:1001 name "Alice Johnson" email "alice@example.com" city "New York"
HGET user:1001 name
# → "Alice Johnson"
HGETALL user:1001
# → {"name": "Alice Johnson", "email": "alice@example.com", "city": "New York"}
# Sorted set (leaderboard)
ZADD leaderboard 1500 "Alice" 2300 "Bob" 1800 "Carol"
ZREVRANGE leaderboard 0 2 WITHSCORES
# → [("Bob", 2300), ("Carol", 1800), ("Alice", 1500)]
# Expiring keys (caching with TTL)
SET session:abc123 "{\"user_id\": 1001}" EX 3600
# → Key expires automatically after 3600 seconds (1 hour)
# Atomic increment (rate limiting)
INCR api:rate:user:1001
EXPIRE api:rate:user:1001 60
# → Increment counter, expire after 60 seconds

Column-Family Stores

Data is organized by column families (groups of columns). Rows can have different columns, and data is stored column-by-column on disk rather than row-by-row, making them excellent for write-heavy workloads and time-series data.

Best for: Time-series data, IoT sensor data, messaging systems, activity feeds, large-scale logging

Representative database: Apache Cassandra

-- Cassandra CQL (Cassandra Query Language) examples
-- Create a keyspace (like a database)
CREATE KEYSPACE iot_data
WITH replication = {
'class': 'NetworkTopologyStrategy',
'datacenter1': 3
};
-- Table for sensor readings (partitioned by sensor, clustered by time)
CREATE TABLE sensor_readings (
sensor_id UUID,
reading_time TIMESTAMP,
temperature DOUBLE,
humidity DOUBLE,
pressure DOUBLE,
PRIMARY KEY (sensor_id, reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
-- Insert a reading
INSERT INTO sensor_readings (sensor_id, reading_time, temperature, humidity, pressure)
VALUES (uuid(), '2024-03-15T10:30:00Z', 22.5, 45.2, 1013.25);
-- Query: last 24 hours of readings for a specific sensor
SELECT reading_time, temperature, humidity
FROM sensor_readings
WHERE sensor_id = a1b2c3d4-e5f6-7890-abcd-ef1234567890
AND reading_time > '2024-03-14T10:30:00Z';

Graph Databases

Store data as nodes (entities) and edges (relationships). Both nodes and edges can have properties. Traversing relationships is a first-class, highly optimized operation.

Best for: Social networks, recommendation engines, fraud detection, knowledge graphs, access control

Representative database: Neo4j

// Neo4j Cypher query language examples
// Create nodes and relationships
CREATE (alice:Person {name: 'Alice', age: 30})
CREATE (bob:Person {name: 'Bob', age: 28})
CREATE (laptop:Product {name: 'Laptop', price: 999.99})
CREATE (alice)-[:FRIENDS_WITH {since: 2020}]->(bob)
CREATE (alice)-[:PURCHASED {date: '2024-01-15'}]->(laptop)
CREATE (bob)-[:PURCHASED {date: '2024-02-20'}]->(laptop);
// Find Alice's friends
MATCH (alice:Person {name: 'Alice'})-[:FRIENDS_WITH]->(friend)
RETURN friend.name;
// → "Bob"
// Friends of friends (2 hops)
MATCH (alice:Person {name: 'Alice'})-[:FRIENDS_WITH*2]->(fof)
WHERE fof <> alice
RETURN DISTINCT fof.name;
// Recommendation: products bought by Alice's friends but not by Alice
MATCH (alice:Person {name: 'Alice'})-[:FRIENDS_WITH]->(friend)-[:PURCHASED]->(product)
WHERE NOT (alice)-[:PURCHASED]->(product)
RETURN product.name, COUNT(friend) AS friend_count
ORDER BY friend_count DESC;
Graph Visualization:
(Alice)──FRIENDS_WITH──▶(Bob)
│ │
PURCHASED PURCHASED
│ │
▼ ▼
(Laptop) (Laptop)

CAP Theorem

The CAP theorem (Brewer’s theorem) states that a distributed database system can provide at most two out of three guarantees simultaneously:

Consistency
/\
/ \
/ \
/ CA \
/ \
/ CP \
/____________\
Availability ──── Partition
Tolerance
CA: Consistent + Available (not partition tolerant)
→ Traditional RDBMS (single node)
CP: Consistent + Partition-tolerant (may sacrifice availability)
→ MongoDB (with majority write concern), HBase, etcd
AP: Available + Partition-tolerant (may sacrifice consistency)
→ Cassandra, DynamoDB, CouchDB
PropertyMeaning
ConsistencyEvery read receives the most recent write (all nodes see the same data)
AvailabilityEvery request receives a response (no timeouts or errors)
Partition ToleranceThe system continues operating despite network partitions

BASE vs ACID

NoSQL databases often follow the BASE model instead of ACID:

PropertyACIDBASE
Full nameAtomicity, Consistency, Isolation, DurabilityBasically Available, Soft state, Eventually consistent
ConsistencyStrong (immediate)Eventual (converges over time)
AvailabilityMay block during contentionPrioritizes availability
Design goalCorrectness above allAvailability and partition tolerance
Best forFinancial transactions, inventorySocial feeds, analytics, caching

Eventual Consistency in Practice

Eventual Consistency Example (3-node cluster):
Write "price = 25.00" to Node A:
Time 0: Node A: 25.00 Node B: 20.00 Node C: 20.00
Time 1: Node A: 25.00 ──▶ Node B: 25.00 Node C: 20.00
Time 2: Node A: 25.00 Node B: 25.00 ──▶ Node C: 25.00
After replication completes, all nodes are consistent.
During replication, different clients may read different values
depending on which node they connect to.

Decision Framework: SQL vs NoSQL

Use this framework to guide your database choice:

QuestionSQL (Relational)NoSQL
Is your data highly structured with clear relationships?Strong fitPossible but requires denormalization
Do you need complex JOINs and ad-hoc queries?Strong fitLimited or no JOIN support
Do you need strict ACID transactions?Strong fitLimited (some offer per-doc ACID)
Is your schema evolving rapidly?Migrations requiredFlexible schema — strong fit
Do you need horizontal scaling for writes?Difficult (sharding is complex)Designed for this — strong fit
Is your access pattern simple key-based lookups?OverkillPerfect fit (key-value store)
Are relationships between entities the primary query pattern?Possible with JOINsGraph database — strong fit
Do you need sub-millisecond latency?Possible with tuningKey-value stores excel

Practical Decision Tree

Start Here
├── Do you need ACID transactions across multiple entities?
│ └── YES → Use a relational database (PostgreSQL, MySQL)
├── Is your primary access pattern key-based lookup?
│ └── YES → Key-Value store (Redis for cache, DynamoDB for persistence)
├── Is your data document-shaped with nested objects?
│ └── YES → Document database (MongoDB)
├── Are you storing time-series or high-volume append-only data?
│ └── YES → Column-family store (Cassandra, TimescaleDB)
├── Are relationships the core of your data model?
│ └── YES → Graph database (Neo4j)
└── Not sure?
└── Start with PostgreSQL — it handles JSON, full-text search,
and most workloads well. Add specialized stores as needs emerge.

Next Steps