Memory Management
Memory management is one of the most critical functions of an operating system. The OS must efficiently allocate and deallocate memory for processes, protect processes from accessing each other's memory, and provide the illusion that each process has a large, contiguous address space, even when physical memory is limited and fragmented. Understanding these concepts is essential for debugging performance issues, optimizing systems, and passing OS exams and interviews.
Memory Hierarchy
Modern computer systems use a hierarchy of memory types, trading off speed, size, and cost.
| Level | Latency | Typical Size |
|---|---|---|
| CPU Registers | < 1 ns | ~1 KB |
| L1 Cache | ~1 ns | ~64 KB |
| L2 Cache | ~4 ns | ~256 KB |
| L3 Cache | ~10 ns | ~8-32 MB |
| Main Memory (RAM) | ~100 ns | ~8-64 GB |
| SSD | ~100 us | ~256 GB-4 TB |
| HDD | ~10 ms | ~1-20 TB |

Levels near the top are faster, smaller, and more expensive per byte; levels near the bottom are slower, larger, and cheaper.

The OS primarily manages the interaction between main memory (RAM) and secondary storage (disk/SSD), using techniques like virtual memory and paging.
Address Binding
Programs reference memory through addresses. The binding of program instructions and data to actual memory addresses can happen at three stages:
| Stage | When | Flexibility | Example |
|---|---|---|---|
| Compile Time | Addresses determined at compile time | None β must recompile to change | MS-DOS .COM programs |
| Load Time | Addresses determined when program is loaded into memory | Moderate β can load at different addresses | Relocatable code |
| Execution Time | Addresses translated on every memory access at runtime | Maximum β process can be moved during execution | All modern OSes use this |
Modern systems use execution-time binding with hardware support from the Memory Management Unit (MMU).
Logical vs Physical Addresses
| Concept | Logical (Virtual) Address | Physical Address |
|---|---|---|
| Generated by | CPU during program execution | Memory unit (actual RAM location) |
| Visible to | User programs | Hardware only |
| Range | 0 to max virtual address space | 0 to max physical memory |
| Translation | MMU translates logical → physical | Hardware operates on physical addresses |
CPU --(logical address)--> MMU --(physical address)--> Memory (RAM)
Example with base register (simplest form):

Logical address: 346
Base register: 14000
Physical address: 14000 + 346 = 14346

Memory Allocation Strategies
When a process needs a contiguous block of memory, the OS must find a suitable free block (hole). Three common strategies:
Memory state (alternating allocated blocks and free holes):

[ In Use 8 KB | Free 20 KB | In Use 6 KB | Free 12 KB | In Use 4 KB | Free 18 KB ]

Request: 15 KB block

| Strategy | Description | Selection | Result |
|---|---|---|---|
| First Fit | Allocate the first hole that is big enough | Scan from start, pick first fit | Uses 20 KB hole (fast, leaves 5 KB fragment) |
| Best Fit | Allocate the smallest hole that is big enough | Search all holes, pick tightest fit | Uses 18 KB hole (leaves 3 KB fragment) |
| Worst Fit | Allocate the largest hole | Search all holes, pick largest | Uses 20 KB hole (leaves 5 KB fragment) |
First Fit is generally the fastest and produces results comparable to Best Fit. Best Fit tends to produce the smallest leftover fragments. Worst Fit performs poorly in practice.
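The three strategies can be sketched as a small search over the free-hole list above (hole sizes in KB; function and variable names are illustrative, and this is a minimal sketch rather than a real allocator):

```python
def first_fit(holes, request):
    """Return the index of the first hole large enough, or None."""
    for i, size in enumerate(holes):
        if size >= request:
            return i
    return None

def best_fit(holes, request):
    """Return the index of the smallest hole large enough, or None."""
    candidates = [(size, i) for i, size in enumerate(holes) if size >= request]
    return min(candidates)[1] if candidates else None

def worst_fit(holes, request):
    """Return the index of the largest hole, if large enough, or None."""
    size, i = max((size, i) for i, size in enumerate(holes))
    return i if size >= request else None

holes = [20, 12, 18]              # the free holes from the diagram, in KB
print(first_fit(holes, 15))       # index 0: the 20 KB hole
print(best_fit(holes, 15))        # index 2: the 18 KB hole (tightest fit)
print(worst_fit(holes, 15))       # index 0: the 20 KB hole (largest)
```

Note that Best Fit and Worst Fit must scan every hole, while First Fit can stop at the first match, which is why First Fit is generally fastest.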
Fragmentation
| Type | Description | Cause | Solution |
|---|---|---|---|
| External Fragmentation | Total free memory is sufficient, but it is not contiguous | Repeated allocation and deallocation of variable-size blocks | Compaction, paging |
| Internal Fragmentation | Allocated memory is slightly larger than requested; the extra space is wasted | Fixed-size allocation units (e.g., pages) | Smaller allocation units (trade-off with overhead) |
External Fragmentation:

[ Used 4 KB | Free 3 KB | Used 6 KB | Free 4 KB | Used 2 KB | Free 5 KB | Used 8 KB ]

Total free: 3 + 4 + 5 = 12 KB, but the largest contiguous block is only 5 KB. A request for 10 KB fails despite 12 KB free!
Internal Fragmentation:

Page size = 4 KB
Process needs 14,500 bytes
Allocated: 4 pages = 16,384 bytes
Wasted: 16,384 - 14,500 = 1,884 bytes (internal fragmentation)

Paging
Paging eliminates external fragmentation by dividing memory into fixed-size blocks:
- Logical memory is divided into fixed-size blocks called pages
- Physical memory is divided into fixed-size blocks called frames (same size as pages)
- A page table maps each page to its corresponding frame
Typical page size: 4 KB (most common on x86 systems).
Logical Address Space (Process View) vs Physical Memory (RAM):

Page Table (page → frame): 0 → 2, 1 → 4, 2 → 1, 3 → 5

The process sees a contiguous logical address space (code, data, heap, stack), but its pages are scattered across non-contiguous physical frames, interleaved with OS data and pages belonging to other processes.

Address Translation with Paging
A logical address is split into two parts:
Logical Address (m bits): | Page Number (p bits) | Page Offset (d bits) |

The page number indexes the page table, which supplies a frame number; the offset passes through unchanged.

Physical Address: | Frame Number (f bits) | Page Offset (d bits) |

Example: Page size = 4 KB (2^12), logical address = 13,500
- Page number = 13,500 / 4,096 = 3 (integer division)
- Page offset = 13,500 % 4,096 = 1,212
- Look up page 3 in page table, find frame number (e.g., frame 5)
- Physical address = (5 * 4,096) + 1,212 = 21,692
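The four translation steps above can be written in a few lines (the page table entries other than page 3 → frame 5 are assumptions for the sketch):

```python
PAGE_SIZE = 4096  # 4 KB pages (2^12)

def translate(logical_addr, page_table):
    """Split a logical address into (page, offset) and map page -> frame."""
    page = logical_addr // PAGE_SIZE    # high bits: page number
    offset = logical_addr % PAGE_SIZE   # low bits: offset, unchanged
    frame = page_table[page]            # page table lookup
    return frame * PAGE_SIZE + offset

# Illustrative page table; page 3 -> frame 5 as in the example above
page_table = {0: 2, 1: 4, 2: 1, 3: 5}
print(translate(13_500, page_table))    # (5 * 4096) + 1212 = 21692
```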
Translation Lookaside Buffer (TLB)
Without a cache, every memory access in a paged system requires two physical memory accesses: one to read the page table entry and one to access the actual data, doubling memory access time. The TLB is a fast hardware cache that stores recently used page table entries.
TLB lookup flow:

1. CPU generates a logical address (page, offset)
2. Check the TLB for the page number
   - TLB hit (fast!): the frame number is available immediately
   - TLB miss (slower): access the page table in RAM, then update the TLB
3. Use frame number + offset to access physical memory

TLB Performance
- TLB hit ratio (h): Fraction of address translations found in TLB (typically 95-99%)
- TLB access time: ~1 ns
- Memory access time: ~100 ns
Effective Access Time (EAT):
EAT = h * (TLB_time + memory_time) + (1 - h) * (TLB_time + 2 * memory_time)
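This formula can be checked numerically; the timings below are the illustrative values from this section:

```python
def effective_access_time(hit_ratio, tlb_ns=1.0, mem_ns=100.0):
    """EAT = h * (TLB + mem) + (1 - h) * (TLB + 2 * mem), in nanoseconds."""
    hit = tlb_ns + mem_ns        # TLB hit: one memory access
    miss = tlb_ns + 2 * mem_ns   # TLB miss: page table access + data access
    return hit_ratio * hit + (1 - hit_ratio) * miss

print(effective_access_time(0.98))   # about 103 ns for a 98% hit ratio
```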
Example with h = 0.98:

EAT = 0.98 * (1 + 100) + 0.02 * (1 + 200)
    = 0.98 * 101 + 0.02 * 201
    = 98.98 + 4.02 = 103 ns (only 3% overhead vs no paging!)

Segmentation
Segmentation divides the logical address space into variable-sized segments based on the logical structure of a program (code, data, stack, heap, etc.).
Logical Address Space (Segmented):

Segment 0: Code (4 KB)
Segment 1: Data (6 KB)
Segment 2: Stack (8 KB)
Segment 3: Heap (10 KB)

Segment Table (illustrative, non-overlapping base/limit values in bytes):

| Seg | Base | Limit |
|---|---|---|
| 0 | 1400 | 4000 |
| 1 | 13400 | 6000 |
| 2 | 5400 | 8000 |
| 3 | 19400 | 10000 |

Logical address: (segment, offset)
Physical address = base[segment] + offset (if offset < limit; otherwise → trap to the OS)

Paging vs Segmentation
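Segment translation with the limit check can be sketched as follows (the segment table values are illustrative, and a Python exception stands in for the hardware trap):

```python
# Illustrative segment table: segment -> (base, limit), values assumed
SEGMENT_TABLE = {0: (1400, 4000), 1: (13400, 6000),
                 2: (5400, 8000), 3: (19400, 10000)}

def seg_translate(segment, offset):
    """Physical address = base + offset, trapping if offset >= limit."""
    base, limit = SEGMENT_TABLE[segment]
    if offset >= limit:
        raise MemoryError(f"offset {offset} exceeds limit {limit}: trap to OS")
    return base + offset

print(seg_translate(0, 346))    # 1400 + 346 = 1746
```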
| Feature | Paging | Segmentation |
|---|---|---|
| Block size | Fixed (e.g., 4 KB pages) | Variable (segments vary in size) |
| External fragmentation | None | Yes |
| Internal fragmentation | Yes (last page may be partially filled) | None |
| Programmer visibility | Transparent to programmer | Programmer-aware (code, data, stack) |
| Sharing | Page-level sharing | Segment-level sharing (more natural) |
Modern systems like x86-64 primarily use paging (segmentation is largely vestigial in 64-bit mode).
Virtual Memory
Virtual memory allows processes to use more memory than is physically available. Not all of a process's pages need to be in RAM simultaneously, only those currently being accessed. Pages that are not in RAM are stored on disk in a swap space (or page file).
Benefits
- Programs can be larger than physical memory
- More processes can run concurrently (each needs only a fraction of its pages in RAM)
- Simplifies memory allocation (every process gets a clean 0-to-max address space)
- Enables memory-mapped files and shared libraries
Demand Paging
With demand paging, a page is loaded into memory only when it is accessed (not preloaded). If a process accesses a page that is not in RAM, a page fault occurs.
Page Fault Handling:
1. CPU generates address → MMU checks page table
2. Page table entry has "valid bit = 0" → PAGE FAULT (trap to OS)
3. OS checks: is this a valid address?
   - No → segmentation fault (terminate process)
   - Yes → the page is on disk
4. OS finds a free frame in RAM (if no free frame → run a page replacement algorithm)
5. OS reads the page from disk into the free frame
6. OS updates the page table (set valid bit = 1, record the frame number)
7. Restart the instruction that caused the page fault

Page Fault Performance
Page faults are expensive because disk I/O is involved (milliseconds vs nanoseconds for RAM).
Effective Access Time with page faults:
EAT = (1 - p) * memory_access_time + p * page_fault_time
Where:
- p = page fault rate (0 ≤ p ≤ 1)
- memory_access_time ≈ 100 ns
- page_fault_time ≈ 8 ms (= 8,000,000 ns)
Example with p = 0.001 (1 in 1,000 accesses):

EAT = 0.999 * 100 + 0.001 * 8,000,000
    = 99.9 + 8,000
    = 8,099.9 ns (about 80x slower than with no page faults!)
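The same arithmetic can be packaged as a small calculator (timings are the illustrative values from this section):

```python
def eat_with_faults(p, mem_ns=100.0, fault_ns=8_000_000.0):
    """EAT = (1 - p) * memory_access_time + p * page_fault_time, in ns."""
    return (1 - p) * mem_ns + p * fault_ns

print(eat_with_faults(0.001))   # about 8,100 ns -- roughly 80x slower than RAM

# Largest fault rate p that keeps EAT under 110 ns (10% slowdown):
p_max = (110 - 100) / (8_000_000 - 100)
print(p_max)                    # about 1.25e-06: one fault per ~800,000 accesses
```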
To keep the slowdown below 10%:

110 > (1 - p) * 100 + p * 8,000,000
10 > p * 7,999,900
p < 0.00000125 (about 1 fault per 800,000 accesses)

Page Replacement Algorithms
When a page fault occurs and there are no free frames, the OS must choose a victim page to evict from RAM. The goal is to minimize the number of future page faults.
FIFO (First-In, First-Out)
Replace the oldest page in memory (the one that has been in RAM the longest).
Example: 3 frames, reference string: 7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1
Ref:   7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
F1:    7  7  7  2  2  2  2  4  4  4  0  0  0  0  0  0  0  7  7  7
F2:       0  0  0  0  3  3  3  2  2  2  2  2  1  1  1  1  1  0  0
F3:          1  1  1  1  0  0  0  3  3  3  3  3  2  2  2  2  2  1
Fault: *  *  *  *     *  *  *  *  *  *        *  *        *  *  *
Total page faults: 15

FIFO anomaly (Belady's anomaly): With FIFO, increasing the number of frames can sometimes increase page faults! This counter-intuitive behavior does not occur with LRU or Optimal.
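Belady's anomaly is easy to demonstrate with a short FIFO simulation; the reference string below is the classic demonstration (the function name is ours):

```python
from collections import deque

def fifo_faults(refs, num_frames):
    """Count page faults for FIFO replacement on a reference string."""
    frames, queue, faults = set(), deque(), 0
    for page in refs:
        if page in frames:
            continue                        # hit: nothing to do
        faults += 1
        if len(frames) == num_frames:
            frames.remove(queue.popleft())  # evict the oldest resident page
        frames.add(page)
        queue.append(page)
    return faults

refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
print(fifo_faults(refs, 3))   # 9 faults
print(fifo_faults(refs, 4))   # 10 faults -- more frames, MORE faults!
```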
Optimal (OPT)
Replace the page that will not be used for the longest time in the future. This is provably optimal (fewest page faults possible) but requires future knowledge, so it is used only as a benchmark.
Example: Same reference string, 3 frames
Ref:   7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
F1:    7  7  7  2  2  2  2  2  2  2  2  2  2  2  2  2  2  7  7  7
F2:       0  0  0  0  0  0  4  4  4  0  0  0  0  0  0  0  0  0  0
F3:          1  1  1  3  3  3  3  3  3  3  3  1  1  1  1  1  1  1
Fault: *  *  *  *     *     *        *        *           *
Total page faults: 9 (optimal; no algorithm can do better with 3 frames)

LRU (Least Recently Used)
Replace the page that has not been used for the longest time (uses past behavior to predict future). LRU is a good practical approximation of Optimal.
Example: Same reference string, 3 frames
Ref:   7  0  1  2  0  3  0  4  2  3  0  3  2  1  2  0  1  7  0  1
F1:    7  7  7  2  2  2  2  4  4  4  0  0  0  1  1  1  1  1  1  1
F2:       0  0  0  0  0  0  0  0  3  3  3  3  3  3  0  0  0  0  0
F3:          1  1  1  3  3  3  2  2  2  2  2  2  2  2  2  7  7  7
Fault: *  *  *  *     *     *  *  *  *        *     *     *
Total page faults: 12

Clock Algorithm (Second Chance)
A practical approximation of LRU that uses a circular list and a reference bit. Much cheaper to implement than true LRU.
Clock Algorithm (the hand sweeps a circular list of pages, each with a reference bit):

clock hand → P2 (ref=1) → P1 (ref=1) → P3 (ref=0) → P5 (ref=0) → back to P2
Algorithm:
1. Check the page at the clock hand
2. If reference bit = 0 → replace this page
3. If reference bit = 1 → set the bit to 0, advance the clock hand
4. Repeat until a victim is found

The reference bit is set to 1 by hardware whenever a page is accessed. The clock algorithm gives each page a "second chance" before eviction.
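The steps above can be sketched as a victim-selection routine over a circular list (a minimal sketch; page names and reference bits are illustrative):

```python
def clock_evict(frames, hand):
    """Return (victim_index, new_hand) using the clock/second-chance rule.

    frames: list of [page, ref_bit] entries treated as a circular list.
    """
    while True:
        page, ref = frames[hand]
        if ref == 0:
            return hand, (hand + 1) % len(frames)   # evict this page
        frames[hand][1] = 0                          # second chance: clear bit
        hand = (hand + 1) % len(frames)              # advance the hand

# Pages with reference bits; the hand starts at index 0 (P2)
frames = [["P2", 1], ["P1", 1], ["P3", 0], ["P5", 0]]
victim, hand = clock_evict(frames, 0)
print(frames[victim][0])   # P3: first page reached with reference bit 0
```

Note the loop always terminates: after one full sweep every reference bit has been cleared, so the second sweep must find a victim.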
Page Replacement Comparison
| Algorithm | Page Faults (example above) | Belady's Anomaly | Implementation | Performance |
|---|---|---|---|---|
| FIFO | 15 | Yes | Simple queue | Poor |
| Optimal | 9 | No | Requires future knowledge | Best (theoretical) |
| LRU | 12 | No | Stack or counter-based | Good (close to Optimal) |
| Clock | ~12-13 | No | Circular list + ref bit | Good (practical LRU approx) |
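The fault counts in the table can be reproduced with a small policy-driven simulator (a sketch; the `simulate`/policy interface is our own construction, not a standard API):

```python
def simulate(refs, num_frames, policy):
    """Count page faults for a replacement policy over a reference string."""
    frames, loaded, last_used, faults = [], {}, {}, 0
    for t, page in enumerate(refs):
        if page not in frames:
            faults += 1
            if len(frames) < num_frames:
                frames.append(page)              # free frame available
            else:
                victim = policy(frames, t, refs, loaded, last_used)
                frames[frames.index(victim)] = page
            loaded[page] = t                     # time the page entered RAM
        last_used[page] = t                      # time of most recent access
    return faults

def fifo(frames, t, refs, loaded, last_used):
    return min(frames, key=lambda p: loaded[p])      # oldest load time

def lru(frames, t, refs, loaded, last_used):
    return min(frames, key=lambda p: last_used[p])   # least recently used

def opt(frames, t, refs, loaded, last_used):
    future = refs[t + 1:]
    # Farthest next use; pages never referenced again sort last
    return max(frames, key=lambda p: future.index(p) if p in future else len(refs))

refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
for name, policy in [("FIFO", fifo), ("LRU", lru), ("OPT", opt)]:
    print(name, simulate(refs, 3, policy))   # FIFO 15, LRU 12, OPT 9
```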
Thrashing
Thrashing occurs when a process spends more time paging (swapping pages between RAM and disk) than executing. This happens when the process does not have enough frames allocated to hold its working set of pages.
CPU Utilization vs Degree of Multiprogramming:
CPU Utilization vs Degree of Multiprogramming:

CPU utilization rises as the degree of multiprogramming (number of processes) increases, up to an optimal point. Beyond that point thrashing begins: processes no longer have enough frames, page faults dominate, and utilization collapses.

Working Set Model
The working set W(t, delta) of a process at time t is the set of pages accessed in the most recent delta time units. The OS should allocate at least enough frames to hold each process's working set.
- If sum of all working sets > total frames available, the OS should suspend a process to free frames
- The working set window (delta) is typically set to 10,000-100,000 memory references
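The working set is straightforward to compute from a reference trace (a sketch; the trace below is illustrative, and delta here counts references rather than wall-clock time units):

```python
def working_set(trace, t, delta):
    """W(t, delta): distinct pages referenced in the last delta references."""
    window = trace[max(0, t - delta + 1): t + 1]
    return set(window)

trace = [1, 2, 1, 3, 4, 4, 3, 3, 2, 2]
print(working_set(trace, t=9, delta=4))   # {2, 3}: only pages 3 and 2 are hot
print(working_set(trace, t=4, delta=5))   # {1, 2, 3, 4}: early phase is wider
```

A process whose allocation drops below the size of this set will fault continuously, which is exactly the thrashing condition described above.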
Worked Example: Address Translation
Given: Page size = 4 KB (4096 bytes), process has 4 pages, and the following page table:
| Page | Frame | Valid |
|---|---|---|
| 0 | 5 | 1 |
| 1 | 3 | 1 |
| 2 | - | 0 |
| 3 | 8 | 1 |
Translate logical address 9000:
- Page number = floor(9000 / 4096) = 2
- Offset = 9000 mod 4096 = 9000 - 2*4096 = 808
- Look up page 2: Valid bit = 0 β PAGE FAULT
- OS loads page 2 from disk, assigns frame (say frame 6), updates table
- Now: Physical address = 6 * 4096 + 808 = 25,384
Translate logical address 5000:
- Page number = floor(5000 / 4096) = 1
- Offset = 5000 mod 4096 = 904
- Look up page 1: Frame = 3, Valid = 1 β OK
- Physical address = 3 * 4096 + 904 = 13,192
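Both translations above can be verified with a short sketch (page table values from the table above; a Python exception stands in for the OS trap, and frame 6 is the example's assumed choice for the faulted-in page):

```python
PAGE_SIZE = 4096

# Page table from the worked example: page -> (frame, valid bit)
page_table = {0: (5, 1), 1: (3, 1), 2: (None, 0), 3: (8, 1)}

def translate(addr):
    """Translate a logical address, signalling a page fault on valid bit 0."""
    page, offset = divmod(addr, PAGE_SIZE)
    frame, valid = page_table[page]
    if not valid:
        raise LookupError(f"page fault: page {page} not in RAM")
    return frame * PAGE_SIZE + offset

print(translate(5000))          # 3 * 4096 + 904 = 13192

try:
    translate(9000)             # page 2 has valid bit 0
except LookupError as e:
    print(e)                    # the OS would now load the page from disk
    page_table[2] = (6, 1)      # say the OS places it in frame 6
    print(translate(9000))      # 6 * 4096 + 808 = 25384
```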
Key Takeaways
- The memory hierarchy trades off speed, size, and cost; the OS manages the interaction between RAM and disk
- Paging eliminates external fragmentation by dividing memory into fixed-size pages and frames
- The TLB caches recent page table entries, reducing the double-memory-access penalty to near-zero
- Virtual memory lets processes use more memory than physically available through demand paging
- Page replacement algorithms determine which page to evict on a page fault; LRU is the best practical choice, Optimal is the theoretical benchmark
- FIFO is simple but suffers from Belady's anomaly; LRU and Clock do not
- Thrashing occurs when processes do not have enough frames; the working set model helps prevent it
- Address translation: split the logical address into page number and offset, look up the frame in the page table, combine frame and offset