Memory Management
Memory management is one of the most critical functions of an operating system. The OS must efficiently allocate and deallocate memory for processes, protect processes from accessing each other's memory, and provide the illusion that each process has a large, contiguous address space, even when physical memory is limited and fragmented. Understanding these concepts is essential for debugging performance issues, optimizing systems, and passing OS exams and interviews.
Memory Hierarchy
Modern computer systems use a hierarchy of memory types, trading off speed, size, and cost.
| Level | Latency | Typical Size |
|---|---|---|
| CPU Registers | < 1 ns | ~1 KB |
| L1 Cache | ~1 ns | ~64 KB |
| L2 Cache | ~4 ns | ~256 KB |
| L3 Cache | ~10 ns | ~8-32 MB |
| Main Memory (RAM) | ~100 ns | ~8-64 GB |
| SSD | ~100 us | ~256 GB-4 TB |
| HDD | ~10 ms | ~1-20 TB |

Levels near the top are faster, smaller, and more expensive per byte; levels near the bottom are slower, larger, and cheaper.

The OS primarily manages the interaction between main memory (RAM) and secondary storage (disk/SSD), using techniques like virtual memory and paging.
Address Binding
Programs reference memory through addresses. The binding of program instructions and data to actual memory addresses can happen at three stages:
| Stage | When | Flexibility | Example |
|---|---|---|---|
| Compile Time | Addresses determined at compile time | None β must recompile to change | MS-DOS .COM programs |
| Load Time | Addresses determined when program is loaded into memory | Moderate β can load at different addresses | Relocatable code |
| Execution Time | Addresses translated on every memory access at runtime | Maximum β process can be moved during execution | All modern OSes use this |
Modern systems use execution-time binding with hardware support from the Memory Management Unit (MMU).
Logical vs Physical Addresses
| Concept | Logical (Virtual) Address | Physical Address |
|---|---|---|
| Generated by | CPU during program execution | Memory unit (actual RAM location) |
| Visible to | User programs | Hardware only |
| Range | 0 to max virtual address space | 0 to max physical memory |
| Translation | MMU translates logical → physical | Hardware operates on physical addresses |
CPU --(logical address)--> MMU --(physical address)--> Memory (RAM)
Example with base register (simplest form):

Logical address: 346
Base register: 14000
Physical address: 14000 + 346 = 14346

Memory Allocation Strategies
When a process needs a contiguous block of memory, the OS must find a suitable free block (hole). Three common strategies:
Memory state (alternating allocated blocks and free holes):

[ In Use 8 KB | Free 20 KB | In Use 6 KB | Free 12 KB | In Use 4 KB | Free 18 KB ]

Request: 15 KB block

| Strategy | Description | Selection | Result |
|---|---|---|---|
| First Fit | Allocate the first hole that is big enough | Scan from start, pick first fit | Uses 20 KB hole (fast, leaves 5 KB fragment) |
| Best Fit | Allocate the smallest hole that is big enough | Search all holes, pick tightest fit | Uses 18 KB hole (leaves 3 KB fragment) |
| Worst Fit | Allocate the largest hole | Search all holes, pick largest | Uses 20 KB hole (leaves 5 KB fragment) |
First Fit is generally the fastest and produces results comparable to Best Fit. Best Fit tends to produce the smallest leftover fragments. Worst Fit performs poorly in practice.
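The three strategies can be sketched as a small search over the free-hole list above (hole sizes in KB; function and variable names are illustrative, and this is a minimal sketch rather than a real allocator):

```python
def first_fit(holes, request):
    """Return the index of the first hole large enough, or None."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    """Return the index of the smallest hole large enough, or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None

def worst_fit(holes, request):
    """Return the index of the largest hole, if large enough, or None."""
    size, i = max((size, i) for i, size in enumerate(holes))
    return i if size >= request else None

holes = [20, 12, 18]              # the free holes from the diagram, in KB
print(first_fit(holes, 15))       # index 0: the 20 KB hole
print(best_fit(holes, 15))        # index 2: the 18 KB hole (tightest fit)
print(worst_fit(holes, 15))       # index 0: the 20 KB hole (largest)
```

Note that Best Fit and Worst Fit must scan every hole, while First Fit can stop at the first match, which is why First Fit is generally fastest.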
Fragmentation
| Type | Description | Cause | Solution |
|---|---|---|---|
| External Fragmentation | Total free memory is sufficient, but it is not contiguous | Repeated allocation and deallocation of variable-size blocks | Compaction, paging |
| Internal Fragmentation | Allocated memory is slightly larger than requested; the extra space is wasted | Fixed-size allocation units (e.g., pages) | Smaller allocation units (trade-off with overhead) |
External Fragmentation:

[ Used 4 KB | Free 3 KB | Used 6 KB | Free 4 KB | Used 2 KB | Free 5 KB | Used 8 KB ]

Total free: 3 + 4 + 5 = 12 KB, but the largest contiguous block is only 5 KB. A request for 10 KB fails despite 12 KB free!
Internal Fragmentation:

Page size = 4 KB
Process needs 14,500 bytes
Allocated: 4 pages = 16,384 bytes
Wasted: 16,384 - 14,500 = 1,884 bytes (internal fragmentation)

Paging
Paging eliminates external fragmentation by dividing memory into fixed-size blocks:
- Logical memory is divided into fixed-size blocks called pages
- Physical memory is divided into fixed-size blocks called frames (same size as pages)
- A page table maps each page to its corresponding frame
Typical page size: 4 KB (most common on x86 systems).
Logical Address Space (Process View) vs Physical Memory (RAM):

Page Table (page → frame): 0 → 2, 1 → 4, 2 → 1, 3 → 5

The process sees a contiguous logical address space (code, data, heap, stack), but its pages are scattered across non-contiguous physical frames, interleaved with OS data and pages belonging to other processes.

Address Translation with Paging
A logical address is split into two parts:
Logical Address (m bits): | Page Number (p bits) | Page Offset (d bits) |

The page number indexes the page table, which supplies a frame number; the offset passes through unchanged.

Physical Address: | Frame Number (f bits) | Page Offset (d bits) |

Example: Page size = 4 KB (2^12), logical address = 13,500
- Page number = 13,500 / 4,096 = 3 (integer division)
- Page offset = 13,500 % 4,096 = 1,212
- Look up page 3 in page table, find frame number (e.g., frame 5)
- Physical address = (5 * 4,096) + 1,212 = 21,692
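The four translation steps above can be written in a few lines (the page table entries other than page 3 → frame 5 are assumptions for the sketch):

```python
PAGE_SIZE = 4096  # 4 KB pages (2^12)

def translate(logical_addr, page_table):
    """Split a logical address into (page, offset) and map page -> frame."""
    page = logical_addr // PAGE_SIZE    # high bits: page number
    offset = logical_addr % PAGE_SIZE   # low bits: offset, unchanged
    frame = page_table[page]            # page table lookup
    return frame * PAGE_SIZE + offset

# Illustrative page table; page 3 -> frame 5 as in the example above
page_table = {0: 2, 1: 4, 2: 1, 3: 5}
print(translate(13_500, page_table))    # (5 * 4096) + 1212 = 21692
```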
Translation Lookaside Buffer (TLB)
Without a cache, every memory access in a paged system requires two physical memory accesses: one to read the page table entry and one to access the actual data, doubling memory access time. The TLB is a fast hardware cache that stores recently used page table entries.
TLB lookup flow:

1. CPU generates a logical address (page, offset)
2. Check the TLB for the page number
   - TLB hit (fast!): the frame number is available immediately
   - TLB miss (slower): access the page table in RAM, then update the TLB
3. Use frame number + offset to access physical memory

TLB Performance
- TLB hit ratio (h): Fraction of address translations found in TLB (typically 95-99%)
- TLB access time: ~1 ns
- Memory access time: ~100 ns
Effective Access Time (EAT):
EAT = h * (TLB_time + memory_time) + (1 - h) * (TLB_time + 2 * memory_time)
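This formula can be checked numerically; the timings below are the illustrative values from this section:

```python
def effective_access_time(hit_ratio, tlb_ns=1.0, mem_ns=100.0):
    """EAT = h * (TLB + mem) + (1 - h) * (TLB + 2 * mem), in nanoseconds."""
    hit = tlb_ns + mem_ns        # TLB hit: one memory access
    miss = tlb_ns + 2 * mem_ns   # TLB miss: page table access + data access
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(effective_access_time(0.98))   # about 103 ns for a 98% hit ratio
```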
Example with h = 0.98:

EAT = 0.98 * (1 + 100) + 0.02 * (1 + 200)
    = 0.98 * 101 + 0.02 * 201
    = 98.98 + 4.02 = 103 ns (only 3% overhead vs no paging!)

Segmentation
Segmentation divides the logical address space into variable-sized segments based on the logical structure of a program (code, data, stack, heap, etc.).
Logical Address Space (Segmented):

Segment 0: Code (4 KB)
Segment 1: Data (6 KB)
Segment 2: Stack (8 KB)
Segment 3: Heap (10 KB)

Segment Table (illustrative, non-overlapping base/limit values in bytes):

| Seg | Base | Limit |
|---|---|---|
| 0 | 1400 | 4000 |
| 1 | 13400 | 6000 |
| 2 | 5400 | 8000 |
| 3 | 19400 | 10000 |

Logical address: (segment, offset)
Physical address = base[segment] + offset (if offset < limit; otherwise → trap to the OS)

Paging vs Segmentation
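Segment translation with the limit check can be sketched as follows (the segment table values are illustrative, and a Python exception stands in for the hardware trap):

```python
# Illustrative segment table: segment -> (base, limit), values assumed
SEGMENT_TABLE = {0: (1400, 4000), 1: (13400, 6000),
                 2: (5400, 8000), 3: (19400, 10000)}

def seg_translate(segment, offset):
    """Physical address = base + offset, trapping if offset >= limit."""
    base, limit = SEGMENT_TABLE[segment]
    if offset >= limit:
        raise MemoryError(f"offset {offset} exceeds limit {limit}: trap to OS")
    return base + offset

print(seg_translate(0, 346))    # 1400 + 346 = 1746
```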
| Feature | Paging | Segmentation |
|---|---|---|
| Block size | Fixed (e.g., 4 KB pages) | Variable (segments vary in size) |
| External fragmentation | None | Yes |
| Internal fragmentation | Yes (last page may be partially filled) | None |
| Programmer visibility | Transparent to programmer | Programmer-aware (code, data, stack) |
| Sharing | Page-level sharing | Segment-level sharing (more natural) |
Modern systems like x86-64 primarily use paging (segmentation is largely vestigial in 64-bit mode).
Virtual Memory
Virtual memory allows processes to use more memory than is physically available. Not all of a process's pages need to be in RAM simultaneously, only those currently being accessed. Pages that are not in RAM are stored on disk in a swap space (or page file).
Benefits
- Programs can be larger than physical memory
- More processes can run concurrently (each needs only a fraction of its pages in RAM)
- Simplifies memory allocation (every process gets a clean 0-to-max address space)
- Enables memory-mapped files and shared libraries
Demand Paging
With demand paging, a page is loaded into memory only when it is accessed (not preloaded). If a process accesses a page that is not in RAM, a page fault occurs.
Page Fault Handling:
1. CPU generates address → MMU checks page table
2. Page table entry has "valid bit = 0" → PAGE FAULT (trap to OS)
3. OS checks: is this a valid address?
   - No → segmentation fault (terminate process)
   - Yes → the page is on disk
4. OS finds a free frame in RAM (if no free frame → run a page replacement algorithm)
5. OS reads the page from disk into the free frame
6. OS updates the page table (set valid bit = 1, record the frame number)
7. Restart the instruction that caused the page fault

Page Fault Performance
Page faults are expensive because disk I/O is involved (milliseconds vs nanoseconds for RAM).
Effective Access Time with page faults:
EAT = (1 - p) * memory_access_time + p * page_fault_time
Where:
- p = page fault rate (0 ≤ p ≤ 1)
- memory_access_time ≈ 100 ns
- page_fault_time ≈ 8 ms (= 8,000,000 ns)
Example with p = 0.001 (1 in 1,000 accesses):

EAT = 0.999 * 100 + 0.001 * 8,000,000
    = 99.9 + 8,000
    = 8,099.9 ns (about 80x slower than with no page faults!)
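The same arithmetic can be packaged as a small calculator (timings are the illustrative values from this section):

```python
def eat_with_faults(p, mem_ns=100.0, fault_ns=8_000_000.0):
    """EAT = (1 - p) * memory_access_time + p * page_fault_time, in ns."""
    return (1 - p) * mem_ns + p * fault_ns

print(eat_with_faults(0.001))   # about 8,100 ns -- roughly 80x slower than RAM

# Largest fault rate p that keeps EAT under 110 ns (10% slowdown):
p_max = (110 - 100) / (8_000_000 - 100)
print(p_max)                    # about 1.25e-06: one fault per ~800,000 accesses
```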
To keep the slowdown below 10%:

110 > (1 - p) * 100 + p * 8,000,000
10 > p * 7,999,900
p < 0.00000125 (about 1 fault per 800,000 accesses)

Page Replacement Algorithms
When a page fault occurs and there are no free frames, the OS must choose a victim page to evict from RAM. The goal is to minimize the number of future page faults.
FIFO (First-In, First-Out)
Replace the oldest page in memory (the one that has been in RAM the longest).
Example: 3 frames, reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
Ref:   7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
F1:    7  7  7  2  2  2  2  4  4  4  0  0  0  0  0  0  0  7  7  7
F2:       0  0  0  0  3  3  3  2  2  2  2  2  1  1  1  1  1  0  0
F3:          1  1  1  1  0  0  0  3  3  3  3  3  2  2  2  2  2  1
Fault: *  *  *  *     *  *  *  *  *  *        *  *        *  *  *
Total page faults: 15

FIFO anomaly (Belady's anomaly): With FIFO, increasing the number of frames can sometimes increase page faults! This counter-intuitive behavior does not occur with LRU or Optimal.
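Belady's anomaly is easy to demonstrate with a short FIFO simulation; the reference string below is the classic demonstration (the function name is ours):

```python
from collections import deque

def fifo_faults(refs, num_frames):
    """Count page faults for FIFO replacement on a reference string."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue                        # hit: nothing to do
        faults += 1
        if len(frames) == num_frames:
            frames.remove(queue.popleft())  # evict the oldest resident page
        frames.add(page)
        queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults -- more frames, MORE faults!
```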
Optimal (OPT)
Replace the page that will not be used for the longest time in the future. This is provably optimal (fewest page faults possible) but requires future knowledge, so it is used only as a benchmark.
Example: Same reference string, 3 frames
Ref:   7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
F1:    7  7  7  2  2  2  2  2  2  2  2  2  2  2  2  2  2  7  7  7
F2:       0  0  0  0  0  0  4  4  4  0  0  0  0  0  0  0  0  0  0
F3:          1  1  1  3  3  3  3  3  3  3  3  1  1  1  1  1  1  1
Fault: *  *  *  *     *     *        *        *           *
Total page faults: 9 (optimal; no algorithm can do better with 3 frames)

LRU (Least Recently Used)
Replace the page that has not been used for the longest time (uses past behavior to predict future). LRU is a good practical approximation of Optimal.
Example: Same reference string, 3 frames
Ref:   7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
F1:    7  7  7  2  2  2  2  4  4  4  0  0  0  1  1  1  1  1  1  1
F2:       0  0  0  0  0  0  0  0  3  3  3  3  3  3  0  0  0  0  0
F3:          1  1  1  3  3  3  2  2  2  2  2  2  2  2  2  7  7  7
Fault: *  *  *  *     *     *  *  *  *        *     *     *
Total page faults: 12

Clock Algorithm (Second Chance)
A practical approximation of LRU that uses a circular list and a reference bit. Much cheaper to implement than true LRU.
Clock Algorithm (the hand sweeps a circular list of pages, each with a reference bit):

clock hand → P2 (ref=1) → P1 (ref=1) → P3 (ref=0) → P5 (ref=0) → back to P2
Algorithm:
1. Check the page at the clock hand
2. If reference bit = 0 → replace this page
3. If reference bit = 1 → set the bit to 0, advance the clock hand
4. Repeat until a victim is found

The reference bit is set to 1 by hardware whenever a page is accessed. The clock algorithm gives each page a "second chance" before eviction.
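The steps above can be sketched as a victim-selection routine over a circular list (a minimal sketch; page names and reference bits are illustrative):

```python
def clock_evict(frames, hand):
    """Return (victim_index, new_hand) using the clock/second-chance rule.

    frames: list of [page, ref_bit] entries treated as a circular list.
    """
    while True:
        page, ref = frames[hand]
        if ref == 0:
            return hand, (hand + 1) % len(frames)   # evict this page
        frames[hand][1] = 0                          # second chance: clear bit
        hand = (hand + 1) % len(frames)              # advance the hand

# Pages with reference bits; the hand starts at index 0 (P2)
frames = [["P2", 1], ["P1", 1], ["P3", 0], ["P5", 0]]
victim, hand = clock_evict(frames, 0)
print(frames[victim][0])   # P3: first page reached with reference bit 0
```

Note the loop always terminates: after one full sweep every reference bit has been cleared, so the second sweep must find a victim.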
Page Replacement Comparison
| Algorithm | Page Faults (example above) | Belady's Anomaly | Implementation | Performance |
|---|---|---|---|---|
| FIFO | 15 | Yes | Simple queue | Poor |
| Optimal | 9 | No | Requires future knowledge | Best (theoretical) |
| LRU | 12 | No | Stack or counter-based | Good (close to Optimal) |
| Clock | ~12-13 | No | Circular list + ref bit | Good (practical LRU approx) |
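The fault counts in the table can be reproduced with a small policy-driven simulator (a sketch; the `simulate`/policy interface is our own construction, not a standard API):

```python
def simulate(refs, num_frames, policy):
    """Count page faults for a replacement policy over a reference string."""
    frames, loaded, last_used, faults = [], {}, {}, 0
    for t, page in enumerate(refs):
        if page not in frames:
            faults += 1
            if len(frames) < num_frames:
                frames.append(page)              # free frame available
            else:
                victim = policy(frames, t, refs, loaded, last_used)
                frames[frames.index(victim)] = page
            loaded[page] = t                     # time the page entered RAM
        last_used[page] = t                      # time of most recent access
    return faults

def fifo(frames, t, refs, loaded, last_used):
    return min(frames, key=lambda p: loaded[p])      # oldest load time

def lru(frames, t, refs, loaded, last_used):
    return min(frames, key=lambda p: last_used[p])   # least recently used

def opt(frames, t, refs, loaded, last_used):
    future = refs[t + 1:]
    # Farthest next use; pages never referenced again sort last
    return max(frames, key=lambda p: future.index(p) if p in future else len(refs))

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
for name, policy in [("FIFO", fifo), ("LRU", lru), ("OPT", opt)]:
    print(name, simulate(refs, 3, policy))   # FIFO 15, LRU 12, OPT 9
```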
Thrashing
Thrashing occurs when a process spends more time paging (swapping pages between RAM and disk) than executing. This happens when the process does not have enough frames allocated to hold its working set of pages.
CPU Utilization vs Degree of Multiprogramming:
CPU Utilization vs Degree of Multiprogramming:

CPU utilization rises as the degree of multiprogramming (number of processes) increases, up to an optimal point. Beyond that point thrashing begins: processes no longer have enough frames, page faults dominate, and utilization collapses.

Working Set Model
The working set W(t, delta) of a process at time t is the set of pages accessed in the most recent delta time units. The OS should allocate at least enough frames to hold each process's working set.
- If sum of all working sets > total frames available, the OS should suspend a process to free frames
- The working set window (delta) is typically set to 10,000-100,000 memory references
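The working set is straightforward to compute from a reference trace (a sketch; the trace below is illustrative, and delta here counts references rather than wall-clock time units):

```python
def working_set(trace, t, delta):
    """W(t, delta): distinct pages referenced in the last delta references."""
    window = trace[max(0, t - delta + 1): t + 1]
    return set(window)

trace = [1, 2, 1, 3, 4, 4, 3, 3, 2, 2]
print(working_set(trace, t=9, delta=4))   # {2, 3}: only pages 3 and 2 are hot
print(working_set(trace, t=4, delta=5))   # {1, 2, 3, 4}: early phase is wider
```

A process whose allocation drops below the size of this set will fault continuously, which is exactly the thrashing condition described above.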
Worked Example: Address Translation
Given: Page size = 4 KB (4096 bytes), process has 4 pages, and the following page table:
| Page | Frame | Valid |
|---|---|---|
| 0 | 5 | 1 |
| 1 | 3 | 1 |
| 2 | - | 0 |
| 3 | 8 | 1 |
Translate logical address 9000:
- Page number = floor(9000 / 4096) = 2
- Offset = 9000 mod 4096 = 9000 - 2*4096 = 808
- Look up page 2: Valid bit = 0 β PAGE FAULT
- OS loads page 2 from disk, assigns frame (say frame 6), updates table
- Now: Physical address = 6 * 4096 + 808 = 25,384
Translate logical address 5000:
- Page number = floor(5000 / 4096) = 1
- Offset = 5000 mod 4096 = 904
- Look up page 1: Frame = 3, Valid = 1 β OK
- Physical address = 3 * 4096 + 904 = 13,192
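Both translations above can be verified with a short sketch (page table values from the table above; a Python exception stands in for the OS trap, and frame 6 is the example's assumed choice for the faulted-in page):

```python
PAGE_SIZE = 4096

# Page table from the worked example: page -> (frame, valid bit)
page_table = {0: (5, 1), 1: (3, 1), 2: (None, 0), 3: (8, 1)}

def translate(addr):
    """Translate a logical address, signalling a page fault on valid bit 0."""
    page, offset = divmod(addr, PAGE_SIZE)
    frame, valid = page_table[page]
    if not valid:
        raise LookupError(f"page fault: page {page} not in RAM")
    return frame * PAGE_SIZE + offset

print(translate(5000))          # 3 * 4096 + 904 = 13192

try:
    translate(9000)             # page 2 has valid bit 0
except LookupError as e:
    print(e)                    # the OS would now load the page from disk
    page_table[2] = (6, 1)      # say the OS places it in frame 6
    print(translate(9000))      # 6 * 4096 + 808 = 25384
```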
Key Takeaways
- The memory hierarchy trades off speed, size, and cost; the OS manages the interaction between RAM and disk
- Paging eliminates external fragmentation by dividing memory into fixed-size pages and frames
- The TLB caches recent page table entries, reducing the double-memory-access penalty to near-zero
- Virtual memory lets processes use more memory than physically available through demand paging
- Page replacement algorithms determine which page to evict on a page fault; LRU is the best practical choice, Optimal is the theoretical benchmark
- FIFO is simple but suffers from Belady's anomaly; LRU and Clock do not
- Thrashing occurs when processes do not have enough frames; the working set model helps prevent it
- Address translation: split the logical address into page number and offset, look up the frame in the page table, combine frame and offset