
Hash Tables

A hash table (hash map) is a data structure that maps keys to values using a hash function. It provides O(1) average-case time complexity for insert, delete, and search operations.

How Hash Tables Work

  1. Hash Function: Converts a key into an array index
  2. Array Storage: Values are stored at the computed index
  3. Collision Handling: Multiple keys may hash to the same index
Key: "apple" → hash("apple") = 3
Key: "banana" → hash("banana") = 7
Index:    0    1    2     3      4    5    6     7
        ┌────┬────┬────┬───────┬────┬────┬────┬────────┐
        │    │    │    │"apple"│    │    │    │"banana"│
        └────┴────┴────┴───────┴────┴────┴────┴────────┘
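The mapping above can be reproduced with Python's built-in `hash()`. Note that string hashing is randomized per interpreter process (unless `PYTHONHASHSEED` is set), so the exact slots will differ from the diagram and between runs:

```python
# Map keys into an 8-slot table using the built-in hash()
table_size = 8

for key in ("apple", "banana"):
    index = hash(key) % table_size  # exact slot varies per run
    print(f"{key!r} -> slot {index}")

# Within one run, the same key always lands in the same slot
assert hash("apple") % table_size == hash("apple") % table_size
```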

Time Complexity

| Operation | Average | Worst |
|-----------|---------|-------|
| Insert    | O(1)    | O(n)  |
| Delete    | O(1)    | O(n)  |
| Search    | O(1)    | O(n)  |
| Space     | O(n)    | O(n)  |

The worst case occurs when many keys collide (a poor hash function or a high load factor).
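The degradation is easy to provoke. The sketch below uses a hypothetical `BadKey` class whose hash is a constant, so every key lands in the same bucket and dict operations fall back to a linear equality scan over that bucket; results stay correct, only the speed suffers:

```python
class BadKey:
    """Hypothetical key whose hash always collides."""
    def __init__(self, name):
        self.name = name

    def __hash__(self):
        return 42  # constant hash -> every key collides

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.name == other.name

# All 100 keys share one bucket; lookups degrade to O(n) scans
d = {BadKey(f"k{i}"): i for i in range(100)}
print(d[BadKey("k7")])  # still correct, just slower: 7
```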

Basic Operations

# Using built-in dict
hash_map = {}

# Insert / Update
hash_map["apple"] = 5
hash_map["banana"] = 3

# Access
count = hash_map["apple"]         # 5
count = hash_map.get("grape", 0)  # 0 (default if not found)

# Check existence
if "apple" in hash_map:
    print("Found!")

# Delete
del hash_map["apple"]
hash_map.pop("banana", None)  # Safe delete

# Iterate
for key, value in hash_map.items():
    print(f"{key}: {value}")

# Common methods
keys = list(hash_map.keys())
values = list(hash_map.values())
items = list(hash_map.items())

Hash Function

A good hash function should:

  • Be deterministic (same key → same hash)
  • Distribute keys uniformly
  • Be fast to compute
def simple_hash(key: str, size: int) -> int:
    """Simple hash function for strings."""
    hash_value = 0
    for char in key:
        hash_value = (hash_value * 31 + ord(char)) % size
    return hash_value

# Example
print(simple_hash("apple", 10))  # Some index in 0-9

Collision Handling

1. Chaining (Separate Chaining)

Store colliding elements in a linked list at each index.

class HashTableChaining:
    def __init__(self, size=10):
        self.size = size
        self.table = [[] for _ in range(size)]

    def _hash(self, key):
        return hash(key) % self.size

    def put(self, key, value):
        index = self._hash(key)
        # Update value if key already exists
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)
                return
        self.table[index].append((key, value))

    def get(self, key):
        index = self._hash(key)
        for k, v in self.table[index]:
            if k == key:
                return v
        return None

    def remove(self, key):
        index = self._hash(key)
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                del self.table[index][i]
                return True
        return False

2. Open Addressing

Find another empty slot when collision occurs.

class HashTableOpenAddressing:
    def __init__(self, size=10):
        self.size = size
        self.keys = [None] * size
        self.values = [None] * size

    def _hash(self, key):
        return hash(key) % self.size

    def _probe(self, index):
        """Linear probing: step to the next slot."""
        return (index + 1) % self.size

    def put(self, key, value):
        index = self._hash(key)
        start = index
        while self.keys[index] is not None:
            if self.keys[index] == key:
                self.values[index] = value
                return
            index = self._probe(index)
            if index == start:
                # Probed every slot: table is full
                raise RuntimeError("Hash table is full")
        self.keys[index] = key
        self.values[index] = value

    def get(self, key):
        index = self._hash(key)
        start = index
        while self.keys[index] is not None:
            if self.keys[index] == key:
                return self.values[index]
            index = self._probe(index)
            if index == start:
                # Probed every slot without finding the key
                break
        return None

Load Factor & Rehashing

Load Factor = number of entries / table size

When load factor exceeds threshold (typically 0.75), rehash:

  1. Create larger table (usually 2x)
  2. Reinsert all elements
class DynamicHashTable:
    def __init__(self):
        self.size = 8
        self.count = 0
        self.load_threshold = 0.75
        self.table = [[] for _ in range(self.size)]

    def _rehash(self):
        old_table = self.table
        self.size *= 2
        self.table = [[] for _ in range(self.size)]
        self.count = 0
        for bucket in old_table:
            for key, value in bucket:
                self.put(key, value)

    def put(self, key, value):
        if self.count / self.size >= self.load_threshold:
            self._rehash()
        # Chaining insert, as in HashTableChaining above
        index = hash(key) % self.size
        for i, (k, v) in enumerate(self.table[index]):
            if k == key:
                self.table[index][i] = (key, value)
                return
        self.table[index].append((key, value))
        self.count += 1

Common Patterns

1. Frequency Counter

Count occurrences of elements.

from collections import Counter

def frequency_count(arr):
    """
    Count frequency of each element.
    Time: O(n), Space: O(n)

    Algorithm:
    1. Create empty hash map
    2. Iterate through array
    3. For each element, increment its count
    """
    freq = {}
    for num in arr:
        freq[num] = freq.get(num, 0) + 1
    return freq

# Or use Counter (built-in)
freq = Counter([1, 2, 2, 3, 3, 3])
# Counter({3: 3, 2: 2, 1: 1})

# Most common elements
most_common = freq.most_common(2)  # [(3, 3), (2, 2)]

# Example usage
arr = [1, 2, 2, 3, 3, 3, 4]
result = frequency_count(arr)
# {1: 1, 2: 2, 3: 3, 4: 1}

2. Two Sum

Find two numbers that sum to target.

Algorithm Visualization:
nums = [2, 7, 11, 15], target = 9

Step 1: num = 2
        complement = 9 - 2 = 7
        seen = {} → 7 not found
        seen = {2: 0}

Step 2: num = 7
        complement = 9 - 7 = 2
        seen = {2: 0} → 2 FOUND at index 0!
        Return [0, 1]
def two_sum(nums: list, target: int) -> list:
    """
    Find indices of two numbers summing to target.
    Time: O(n), Space: O(n)

    Algorithm:
    1. Create hash map to store value -> index
    2. For each number, calculate complement (target - num)
    3. If complement exists in map, return both indices
    4. Otherwise, store current number and index
    """
    seen = {}  # value -> index
    for i, num in enumerate(nums):
        complement = target - num
        if complement in seen:
            return [seen[complement], i]
        seen[num] = i
    return []

# Example: nums = [2, 7, 11, 15], target = 9
# Returns: [0, 1] (2 + 7 = 9)

3. Group Anagrams

Group strings by their sorted characters.

Algorithm Visualization:
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
Step 1: "eat" → sorted = "aet" → groups["aet"] = ["eat"]
Step 2: "tea" → sorted = "aet" → groups["aet"] = ["eat", "tea"]
Step 3: "tan" → sorted = "ant" → groups["ant"] = ["tan"]
Step 4: "ate" → sorted = "aet" → groups["aet"] = ["eat", "tea", "ate"]
Step 5: "nat" → sorted = "ant" → groups["ant"] = ["tan", "nat"]
Step 6: "bat" → sorted = "abt" → groups["abt"] = ["bat"]
Output: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
from collections import defaultdict

def group_anagrams(strs: list) -> list:
    """
    Group anagrams together.
    Time: O(n * k log k) where k is max string length
    Space: O(n * k)

    Algorithm:
    1. Create hash map with sorted string as key
    2. For each string, sort it to create key
    3. Add original string to list at that key
    4. Return all values (groups)
    """
    groups = defaultdict(list)
    for s in strs:
        # Sort characters to create canonical form
        key = tuple(sorted(s))
        groups[key].append(s)
    return list(groups.values())

# Alternative: O(n * k) using character count as key
def group_anagrams_optimal(strs: list) -> list:
    groups = defaultdict(list)
    for s in strs:
        # Count characters (26 lowercase letters)
        count = [0] * 26
        for c in s:
            count[ord(c) - ord('a')] += 1
        key = tuple(count)
        groups[key].append(s)
    return list(groups.values())

4. Subarray Sum Equals K

Count subarrays with sum equal to k using prefix sum technique.

Algorithm Visualization:
nums = [1, 1, 1], k = 2

Index:  0  1  2
nums:   1  1  1
prefix: 1  2  3

Step 1: prefix = 1, need prefix - k = -1
        map = {0: 1} → -1 not found
        map = {0: 1, 1: 1}
        count = 0

Step 2: prefix = 2, need prefix - k = 0
        map = {0: 1, 1: 1} → 0 FOUND (1 time)
        Subarray [0..1] sums to 2 ✓
        map = {0: 1, 1: 1, 2: 1}
        count = 1

Step 3: prefix = 3, need prefix - k = 1
        map has 1 (1 time)
        Subarray [1..2] sums to 2 ✓
        count = 2

Answer: 2 subarrays sum to k
def subarray_sum(nums: list, k: int) -> int:
    """
    Count subarrays summing to k.
    Time: O(n), Space: O(n)

    Algorithm:
    1. Use prefix sum: sum of subarray [i..j] = prefix[j] - prefix[i-1]
    2. If prefix[j] - k = prefix[i-1], then subarray [i..j] sums to k
    3. Use hash map to count prefix sums seen so far
    4. For each prefix sum, check if (prefix - k) was seen before
    """
    count = 0
    prefix_sum = 0
    prefix_counts = {0: 1}  # Handle subarrays starting at index 0
    for num in nums:
        prefix_sum += num
        # If (prefix_sum - k) exists, we found subarrays
        count += prefix_counts.get(prefix_sum - k, 0)
        # Store current prefix sum
        prefix_counts[prefix_sum] = prefix_counts.get(prefix_sum, 0) + 1
    return count

# Example: nums = [1, 1, 1], k = 2
# Returns: 2 (subarrays [0,1] and [1,2])

5. LRU Cache

Least Recently Used cache with O(1) operations.

LRU Cache Structure (capacity = 3):

Hash Map + Doubly Linked List:

  Hash Map                     Doubly Linked List
┌─────────────┐                (Most Recent → Least Recent)
│ key1 → node1│
│ key2 → node2│──────→ [head] ↔ [node3] ↔ [node2] ↔ [node1] ↔ [tail]
│ key3 → node3│                  newest                oldest
└─────────────┘

Operations:
- get(key): O(1) - lookup in map, move to front
- put(key, val): O(1) - add to front, evict from back if full
from collections import OrderedDict

class LRUCache:
    """
    LRU Cache using OrderedDict.
    Time: O(1) for get and put
    Space: O(capacity)
    """
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        # Move to end (most recently used)
        self.cache.move_to_end(key)
        return self.cache[key]

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self.cache.move_to_end(key)
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            # Remove least recently used (first item)
            self.cache.popitem(last=False)
# Implementation with a manual doubly linked list
class Node:
    def __init__(self, key=0, val=0):
        self.key = key
        self.val = val
        self.prev = None
        self.next = None

class LRUCacheManual:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = {}  # key -> Node
        # Dummy head and tail
        self.head = Node()
        self.tail = Node()
        self.head.next = self.tail
        self.tail.prev = self.head

    def _remove(self, node):
        node.prev.next = node.next
        node.next.prev = node.prev

    def _add_to_front(self, node):
        node.next = self.head.next
        node.prev = self.head
        self.head.next.prev = node
        self.head.next = node

    def get(self, key: int) -> int:
        if key not in self.cache:
            return -1
        node = self.cache[key]
        self._remove(node)
        self._add_to_front(node)
        return node.val

    def put(self, key: int, value: int) -> None:
        if key in self.cache:
            self._remove(self.cache[key])
        node = Node(key, value)
        self.cache[key] = node
        self._add_to_front(node)
        if len(self.cache) > self.capacity:
            lru = self.tail.prev
            self._remove(lru)
            del self.cache[lru.key]

6. First Unique Character

Find first non-repeating character.

Algorithm Visualization:
s = "leetcode"
Step 1: Count frequencies
l: 1, e: 3, t: 1, c: 1, o: 1, d: 1
Step 2: Find first char with count = 1
Index 0: 'l' → count = 1 → FOUND!
Return: 0
s = "aabb"
Frequencies: a: 2, b: 2
No char with count = 1
Return: -1
def first_unique_char(s: str) -> int:
    """
    Return index of first non-repeating character.
    Time: O(n), Space: O(1) - limited alphabet (26 letters)

    Algorithm:
    1. First pass: count frequency of each character
    2. Second pass: find first character with count = 1
    """
    freq = {}
    # Count frequencies
    for char in s:
        freq[char] = freq.get(char, 0) + 1
    # Find first unique
    for i, char in enumerate(s):
        if freq[char] == 1:
            return i
    return -1

# Using Counter
from collections import Counter

def first_unique_char_v2(s: str) -> int:
    freq = Counter(s)
    for i, char in enumerate(s):
        if freq[char] == 1:
            return i
    return -1

# Example: "leetcode" → 0 (first 'l' is unique)
# Example: "loveleetcode" → 2 (first 'v' is unique)

HashSet

A set stores unique elements with O(1) lookup.

# Create a set
s = set()
s = {1, 2, 3}

# Add element
s.add(4)

# Remove element
s.remove(4)   # Raises KeyError if not found
s.discard(4)  # Silent if not found

# Check membership
if 2 in s:
    print("Found!")

# Set operations
a = {1, 2, 3}
b = {2, 3, 4}
union = a | b           # {1, 2, 3, 4}
intersection = a & b    # {2, 3}
difference = a - b      # {1}
symmetric_diff = a ^ b  # {1, 4}

Applications

  1. Caching - LRU cache, memoization
  2. Indexing - Database indexing, search engines
  3. Counting - Frequency analysis, histograms
  4. Deduplication - Remove duplicates efficiently
  5. Symbol Tables - Compilers, interpreters
  6. Routing Tables - Network routing
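As a concrete instance of the caching application above, memoization is just a hash map from arguments to previously computed results. A minimal sketch (the `memo` parameter name is illustrative):

```python
def fib(n, memo=None):
    """Fibonacci with a hash-map cache: O(n) instead of O(2^n)."""
    if memo is None:
        memo = {}
    if n in memo:
        return memo[n]  # O(1) cache hit
    if n < 2:
        return n
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]

print(fib(50))  # 12586269025 (instant; naive recursion would take ages)
```

Python's standard library offers the same idea ready-made as `functools.lru_cache`, which also bounds the cache size LRU-style, exactly like the LRU Cache pattern above.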

Practice Problems

Easy

| Problem | Pattern | Companies |
|---------|---------|-----------|
| Two Sum | Hash Map | Amazon, Google, Meta |
| Valid Anagram | Frequency Count | Amazon, Microsoft |
| Contains Duplicate | Hash Set | Amazon, Apple |
| First Unique Character | Frequency Count | Amazon, Bloomberg |

Medium

| Problem | Pattern | Companies |
|---------|---------|-----------|
| Group Anagrams | Hash by Sorted Key | Amazon, Meta |
| Subarray Sum Equals K | Prefix Sum + Hash | Google, Meta |
| Longest Consecutive Sequence | Hash Set | Google, Amazon |
| LRU Cache | Hash + Doubly Linked List | Amazon, Google, Meta |

Hard

| Problem | Pattern | Companies |
|---------|---------|-----------|
| LFU Cache | Hash + Frequency Buckets | Amazon, Google |
| Substring with Concatenation | Sliding Window + Hash | Amazon |
| Minimum Window Substring | Sliding Window + Hash | Meta, Google |

Key Takeaways

  1. O(1) average case - Hash tables excel at lookups
  2. Good hash function - Key to uniform distribution
  3. Handle collisions - Chaining or open addressing
  4. Watch load factor - Rehash when too full
  5. HashSet for uniqueness - O(1) membership testing

Next Steps