
Hashing & Digital Signatures

Hashing is a one-way function that takes an input of any size and produces a fixed-size output (the hash or digest). Unlike encryption, hashing cannot be reversed — you cannot recover the original input from the hash. This property makes hashing essential for password storage, data integrity verification, and digital signatures.


Cryptographic Hash Functions

A cryptographic hash function must satisfy several key properties:

| Property | Description | What It Prevents |
| --- | --- | --- |
| Deterministic | Same input always produces the same output | Nothing (required for utility) |
| Fixed output size | Any input produces a hash of the same length | N/A (design property) |
| Preimage resistance | Given a hash, it is infeasible to find the original input | Reversing password hashes |
| Second preimage resistance | Given an input, it is infeasible to find a different input with the same hash | Forging documents |
| Collision resistance | It is infeasible to find ANY two inputs with the same hash | Creating fraudulent certificates |
| Avalanche effect | A tiny change in input produces a completely different hash | Detecting even single-bit changes |

Input: "Hello, World!"
SHA-256: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
Input: "Hello, World?" (only the last character changed)
SHA-256: 287ecf3a9a38cf8da72e133afdb8daefe13a1f0d536b15a6e093a7ad73557fc4
Completely different output from a one-character change (avalanche effect).
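
The avalanche effect can also be measured rather than eyeballed. The following is a minimal sketch using only the standard library; the helper name `bit_difference` is made up for this illustration. For a well-behaved hash, roughly half of the 256 output bits should flip.

```python
import hashlib

def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# Same two inputs as in the example above: only the last character differs.
changed = bit_difference(b"Hello, World!", b"Hello, World?")
print(f"{changed} of 256 output bits differ")
```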

Common Hash Functions

| Algorithm | Output Size | Status | Use Case |
| --- | --- | --- | --- |
| MD5 | 128 bits | Broken (collisions found) | Legacy checksums only |
| SHA-1 | 160 bits | Broken (practical collision demonstrated in 2017) | Legacy, being phased out |
| SHA-256 | 256 bits | Secure | General purpose, certificates, blockchain |
| SHA-384 | 384 bits | Secure | Higher security requirements |
| SHA-512 | 512 bits | Secure | High-security applications |
| SHA-3 (Keccak) | 224-512 bits | Secure | Alternative to SHA-2 family |
| BLAKE2 | 8-512 bits (BLAKE2b) | Secure | Fast hashing, file integrity |
| BLAKE3 | 256 bits (default) | Secure | Extremely fast, parallelizable |
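
The output sizes in the table can be checked against Python's `hashlib`. This is a small sketch, assuming a standard CPython build; BLAKE3 is left out because it is not part of the standard library, and the broken algorithms are skipped on purpose.

```python
import hashlib

# Algorithm names as hashlib spells them.
names = ["sha256", "sha384", "sha512", "sha3_256", "blake2b"]

# digest_size is reported in bytes; convert to bits to match the table.
digest_bits = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in digest_bits.items():
    print(f"{name:10s} {bits} bits")
```

Note that `blake2b` reports its maximum size (512 bits) by default; a shorter digest can be requested with the `digest_size` argument of `hashlib.blake2b`.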

Hashing in Practice

import hashlib

# SHA-256 hashing
message = b"Transfer $10,000 to account 12345"
digest = hashlib.sha256(message).hexdigest()
print(f"SHA-256: {digest}")

# Verify integrity
received_message = b"Transfer $10,000 to account 12345"
is_intact = hashlib.sha256(received_message).hexdigest() == digest
print(f"Integrity: {'VALID' if is_intact else 'TAMPERED'}")

# Detect tampering
tampered_message = b"Transfer $99,999 to account 12345"
is_tampered = hashlib.sha256(tampered_message).hexdigest() != digest
print(f"Tamper detected: {is_tampered}")

# File hashing (for verifying downloads)
def hash_file(filepath: str) -> str:
    """Compute SHA-256 hash of a file in chunks."""
    sha256 = hashlib.sha256()
    with open(filepath, "rb") as f:
        # Read in 8 KiB chunks so large files never sit fully in memory
        while chunk := f.read(8192):
            sha256.update(chunk)
    return sha256.hexdigest()

HMAC (Hash-based Message Authentication Code)

A plain hash verifies integrity (data was not changed) but not authenticity (data came from a trusted source). An attacker can change the message and recompute the hash. HMAC solves this by incorporating a secret key into the hash computation.

Plain Hash (no authentication):
Attacker intercepts: message + hash
Attacker modifies message, recomputes hash
Receiver cannot detect the forgery
HMAC (authenticated):
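HMAC = Hash((key XOR opad) || Hash((key XOR ipad) || message))
(the key is padded to the hash block size; ipad and opad are fixed constants)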
Attacker cannot recompute HMAC without the secret key
Receiver verifies: recompute HMAC with shared key and compare
┌──────────┐ ┌──────────┐
│ Message │───▶│ │
│ │ │ HMAC │───▶ Authentication Tag
│ Secret │───▶│ Function │ (fixed size, e.g., 256 bits)
│ Key │ │ │
└──────────┘ └──────────┘
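
To make the construction concrete, here is a sketch of HMAC-SHA256 built from `hashlib` alone, following the RFC 2104 definition, and checked against the standard `hmac` module. The function name `hmac_sha256` and the demo key are illustrative only; in real code, use the `hmac` module directly.

```python
import hashlib
import hmac

def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 assembled by hand, for illustration only."""
    block_size = 64  # SHA-256 processes input in 64-byte blocks
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()   # long keys are hashed first
    key = key.ljust(block_size, b"\x00")     # then zero-padded to one block
    inner_key = bytes(b ^ 0x36 for b in key)  # key XOR ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # key XOR opad
    inner = hashlib.sha256(inner_key + message).digest()
    return hashlib.sha256(outer_key + inner).digest()

key = b"demo-key"  # hypothetical key for the demo
msg = b"hello"
matches_stdlib = hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest()
print(f"Matches hmac module: {matches_stdlib}")
```

The two nested hash calls with differently masked keys are what prevent an attacker from extending or recomputing the tag without knowing the key.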

HMAC Use Cases

| Use Case | How HMAC Is Used |
| --- | --- |
| API authentication | Client signs requests with a shared secret; server verifies |
| JWT signing | HS256 algorithm uses HMAC-SHA256 to sign tokens |
| Webhook verification | Service sends HMAC of payload; receiver verifies authenticity |
| Cookie integrity | Server HMACs cookie values to detect client-side tampering |
| Message authentication | Sender attaches HMAC; receiver verifies before processing |

import hmac
import hashlib

# Shared secret between sender and receiver
secret_key = b"super-secret-api-key-2024"

# Sender: create HMAC for a message
message = b'{"action": "transfer", "amount": 10000}'
tag = hmac.new(secret_key, message, hashlib.sha256).hexdigest()
print(f"HMAC tag: {tag}")

# Receiver: verify the HMAC
received_message = b'{"action": "transfer", "amount": 10000}'
expected_tag = hmac.new(
    secret_key, received_message, hashlib.sha256
).hexdigest()

# Use constant-time comparison to prevent timing attacks
is_valid = hmac.compare_digest(tag, expected_tag)
print(f"HMAC valid: {is_valid}")

# Detect tampering
tampered = b'{"action": "transfer", "amount": 99999}'
tampered_tag = hmac.new(
    secret_key, tampered, hashlib.sha256
).hexdigest()
is_tampered = not hmac.compare_digest(tag, tampered_tag)
print(f"Tamper detected: {is_tampered}")
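
The cookie-integrity use case from the table can be sketched in the same way. The `value.tag` cookie format, the helper names `sign_value` and `verify_value`, and the secret below are all assumptions made for this illustration, not a standard; web frameworks usually ship their own signed-cookie helpers.

```python
import hashlib
import hmac
from typing import Optional

SECRET = b"demo-cookie-secret"  # hypothetical; load from configuration in practice

def sign_value(value: str) -> str:
    """Return the value with an HMAC-SHA256 tag appended after a dot."""
    tag = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}.{tag}"

def verify_value(signed: str) -> Optional[str]:
    """Return the original value if the tag is valid, otherwise None."""
    value, sep, tag = signed.rpartition(".")
    if not sep:
        return None  # no tag present at all
    expected = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    # Compare as bytes so unexpected non-ASCII input cannot raise TypeError
    if hmac.compare_digest(tag.encode(), expected.encode()):
        return value
    return None

cookie = sign_value("user_id=42")
print(verify_value(cookie))
```

A client that edits the value without knowing `SECRET` cannot produce a matching tag, so `verify_value` returns `None` for the modified cookie.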

Password Hashing

Storing passwords requires a fundamentally different approach than general-purpose hashing. Password hashes must be slow by design to resist brute-force attacks.

Why SHA-256 Is Wrong for Passwords

| Factor | SHA-256 | bcrypt/Argon2 |
| --- | --- | --- |
| Speed | Billions of hashes/second on a GPU | Thousands of hashes/second (by design) |
| Salt | Must be added manually | Built-in, automatic |
| Cost factor | Fixed | Tunable (increase over time as hardware improves) |
| Memory usage | Minimal | Configurable (Argon2), which resists GPU attacks |
| Brute-force 8-char password | Seconds to minutes | Years to centuries |

Attacker with SHA-256:
  10 billion hashes/second (modern GPU)
  8-character password (lowercase + digits) = 36^8 ≈ 2.8 trillion combinations
  Time to try every combination: ~280 seconds (under 5 minutes)
Attacker with bcrypt (cost=12):
  ~1,000 hashes/second (same GPU, bcrypt is intentionally slow)
  Time to try every combination: ~2.8 billion seconds ≈ 89 years
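
The arithmetic behind these estimates is easy to reproduce. The sketch below takes the hash rates quoted above as given (they are rough assumptions, not benchmarks) and computes the time to exhaust the whole keyspace; on average an attacker needs about half that.

```python
keyspace = 36 ** 8  # 26 lowercase letters + 10 digits, 8 characters

sha256_rate = 10_000_000_000  # hashes/second, the GPU figure assumed above
bcrypt_rate = 1_000           # hashes/second at cost=12, as assumed above

seconds_per_year = 365 * 24 * 3600

sha256_seconds = keyspace / sha256_rate
bcrypt_years = keyspace / bcrypt_rate / seconds_per_year

print(f"Keyspace: {keyspace:,} combinations")
print(f"SHA-256:  {sha256_seconds:.0f} seconds")
print(f"bcrypt:   {bcrypt_years:.1f} years")
```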

Password Hashing Algorithms

| Algorithm | Memory-Hard | Recommended | Notes |
| --- | --- | --- | --- |
| Argon2id | Yes | Best choice | Argon2 won the Password Hashing Competition (2015) |
| bcrypt | No | Good | Widely supported, battle-tested since 1999 |
| scrypt | Yes | Good | Memory-hard, but more complex to tune |
| PBKDF2 | No | Acceptable | NIST approved, but not memory-hard |
| SHA-256 (raw) | No | Never for passwords | Far too fast |
| MD5 | No | Never | Broken, absurdly fast |

Password Hashing in Practice

# --- Argon2id (recommended) ---
# Requires the argon2-cffi package
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher(
    time_cost=3,        # Number of iterations
    memory_cost=65536,  # 64 MiB of memory (value is given in KiB)
    parallelism=4,      # Number of parallel threads
)

# Hash a password (salt is generated automatically)
password = "correct-horse-battery-staple"
hashed = ph.hash(password)
print(f"Argon2id hash: {hashed}")
# Output: $argon2id$v=19$m=65536,t=3,p=4$...

# Verify a password
try:
    ph.verify(hashed, password)
    print("Password is correct")
except VerifyMismatchError:
    print("Password is incorrect")

# Check if rehashing is needed (cost params changed)
if ph.check_needs_rehash(hashed):
    hashed = ph.hash(password)  # Rehash with new params

# --- bcrypt (widely available alternative) ---
import bcrypt

password_bytes = b"correct-horse-battery-staple"

# Hash with automatic salt generation
# cost factor 12 = 2^12 = 4096 iterations
salt = bcrypt.gensalt(rounds=12)
hashed_bcrypt = bcrypt.hashpw(password_bytes, salt)
print(f"bcrypt hash: {hashed_bcrypt.decode()}")

# Verify
is_valid = bcrypt.checkpw(password_bytes, hashed_bcrypt)
print(f"Password valid: {is_valid}")

Password Storage Checklist

| Practice | Why |
| --- | --- |
| Use Argon2id or bcrypt | Intentionally slow, resists GPU attacks |
| Never store plain text | A single breach exposes every user |
| Never use reversible encryption | Attacker with the key gets all passwords |
| Use unique salts | Prevents rainbow table and batch attacks |
| Tune cost parameters | Target 250ms-1s per hash on your hardware |
| Increase cost over time | Hardware gets faster; rehash on login |
| Enforce strong passwords | Check against breached password lists (e.g., HaveIBeenPwned) |
| Implement rate limiting | Prevent online brute-force attacks |
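
Where neither Argon2id nor bcrypt can be installed, several checklist items (unique salt, tunable cost, constant-time comparison) can still be shown with the standard library's PBKDF2. This is a sketch under stated assumptions: the `iterations$salt$hash` storage format and the function names are invented for the example, and the default of 600,000 iterations is only a starting point to tune against the timing target above.

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 600_000) -> str:
    """Derive a PBKDF2-HMAC-SHA256 hash with a fresh random salt."""
    salt = os.urandom(16)  # unique per password
    derived = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Store the cost and salt next to the hash so they can change over time
    return f"{iterations}${salt.hex()}${derived.hex()}"

def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and cost, then compare."""
    iterations, salt_hex, hash_hex = stored.split("$")
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(derived, bytes.fromhex(hash_hex))

record = hash_password("correct-horse-battery-staple")
print(verify_password("correct-horse-battery-staple", record))
```

Because the iteration count is stored with each record, a login handler can detect old records and rehash them at a higher cost after a successful verification.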

Digital Signatures

A digital signature provides three guarantees: the message was created by a specific sender (authentication), it has not been modified since it was signed (integrity), and the sender cannot later deny having signed it (non-repudiation).

┌────────────────────────────────────────────────────────────┐
│                     Digital Signature                      │
│                                                            │
│ Signing (Sender):                                          │
│   1. Hash the message: digest = SHA-256(message)           │
│   2. Sign the digest with the sender's PRIVATE key:        │
│      signature = Sign(digest, private_key)                 │
│   3. Send: message + signature                             │
│                                                            │
│ Verification (Receiver):                                   │
│   1. Hash the received message: digest = SHA-256(message)  │
│   2. Check the signature with the sender's PUBLIC key:     │
│      valid = Verify(digest, signature, public_key)         │
│   3. If valid   → message is authentic and unmodified      │
│      If invalid → message was tampered or sender is fake   │
│                                                            │
│   ┌──────────┐  Private Key  ┌───────────┐                 │
│   │ Message  │──────────────▶│ Signature │                 │
│   │ (hashed) │     SIGN      │           │                 │
│   └──────────┘               └───────────┘                 │
│        │                           │                       │
│        │        Public Key         │                       │
│        └──────────VERIFY───────────┘                       │
│          Match? → Valid signature                          │
└────────────────────────────────────────────────────────────┘

Signing is often described as "encrypting the hash with the private key", but that picture only fits textbook RSA. Schemes such as Ed25519 and ECDSA involve no encryption or decryption step: verification is a check that returns valid or invalid.
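
The hash-then-sign flow can be traced end to end with a deliberately insecure toy. The sketch below uses textbook RSA with two small, well-known Mersenne primes and no padding, so it must never be used for real signatures; the function names are invented for the illustration. It only shows which key is used at which step.

```python
import hashlib

# Toy parameters: two Mersenne primes. Far too small, and unpadded, for real use.
p, q = 2**61 - 1, 2**89 - 1
n = p * q                           # public modulus
e = 65537                           # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)

def digest_as_int(message: bytes) -> int:
    """Step 1: hash the message and reduce the digest into the range of n."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message: bytes) -> int:
    """Step 2 (sender): combine the digest with the PRIVATE exponent."""
    return pow(digest_as_int(message), d, n)

def verify(message: bytes, signature: int) -> bool:
    """Receiver: check the signature using only the PUBLIC values (e, n)."""
    return pow(signature, e, n) == digest_as_int(message)

message = b"Release v2.1.0 is approved"
signature = sign(message)
print(verify(message, signature))
print(verify(b"Release v2.1.0 is rejected", signature))
```

Real RSA signatures add a padding scheme such as PSS around the digest, and modern schemes such as Ed25519 work quite differently internally, but the sign-with-private, verify-with-public split is the same.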

Signing vs Encryption

| Property | Encryption | Digital Signature |
| --- | --- | --- |
| Purpose | Confidentiality (hide content) | Authentication and integrity |
| Who uses the private key | Recipient (to decrypt) | Sender (to sign) |
| Who uses the public key | Sender (to encrypt) | Receiver (to verify) |
| Non-repudiation | No | Yes |

Digital Signatures in Practice

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ed25519, padding, rsa

# Generate Ed25519 key pair (fast, secure, simple)
private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Sign a message
message = b"Release v2.1.0 is approved for production deployment"
signature = private_key.sign(message)
print(f"Signature: {signature.hex()[:40]}...")

# Verify the signature (anyone with the public key can verify)
try:
    public_key.verify(signature, message)
    print("Signature is VALID: message is authentic")
except InvalidSignature:
    print("Signature is INVALID: message was tampered")

# Tamper detection
tampered = b"Release v2.1.0 is approved for STAGING deployment"
try:
    public_key.verify(signature, tampered)
    print("Signature is VALID")
except InvalidSignature:
    print("Signature is INVALID: tampering detected!")

# --- RSA-PSS signatures (for RSA key pairs) ---
rsa_private = rsa.generate_private_key(
    public_exponent=65537, key_size=2048
)
rsa_public = rsa_private.public_key()

# The same PSS parameters must be used for signing and verifying
pss = padding.PSS(
    mgf=padding.MGF1(hashes.SHA256()),
    salt_length=padding.PSS.MAX_LENGTH,
)

# Sign with RSA-PSS
rsa_signature = rsa_private.sign(message, pss, hashes.SHA256())

# Verify (raises InvalidSignature on failure)
rsa_public.verify(rsa_signature, message, pss, hashes.SHA256())
print("RSA-PSS signature verified")

Real-World Applications of Digital Signatures

| Application | How Signatures Are Used |
| --- | --- |
| Code signing | OS verifies that software came from the claimed developer |
| TLS certificates | CA signs the server’s certificate to prove identity |
| Git commits | GPG/SSH signatures prove who authored a commit |
| Package managers | Tools such as npm, pip, and apt check signatures or hashes to detect tampered packages |
| Email (S/MIME, PGP) | Prove the sender’s identity and message integrity |
| JWT tokens | RS256/ES256 signatures prevent token forgery |
| Blockchain | Transaction signatures prove ownership of funds |
| PDF documents | Digital signatures for legally binding documents |

Quick Reference: Choosing the Right Primitive

| Need | Primitive | Algorithm |
| --- | --- | --- |
| Verify file integrity | Hash | SHA-256 |
| Store user passwords | Password hash | Argon2id or bcrypt |
| Authenticate API requests | HMAC | HMAC-SHA256 |
| Prove message authenticity | Digital signature | Ed25519 |
| Sign software releases | Digital signature | Ed25519 or RSA-PSS |
| Verify data in transit | MAC (within TLS) | Poly1305 or GCM tag |
| Check for accidental corruption | Hash or CRC | SHA-256 or CRC-32 |
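
The last row deserves one caveat: CRC-32 only guards against accidental corruption. A short sketch with the standard `zlib` module shows why. For equal-length inputs, CRC-32 behaves predictably under XOR, so the checksum of a modified message can be computed from the old checksum and the change alone. The message contents below are just sample data.

```python
import zlib

original = b"Transfer $10,000 to account 12345"
modified = b"Transfer $99,999 to account 12345"

# The XOR difference between the two messages, and an all-zero message
# of the same length.
delta = bytes(x ^ y for x, y in zip(original, modified))
zeros = bytes(len(original))

# Predict the new CRC without ever feeding the modified message to crc32.
predicted = zlib.crc32(original) ^ zlib.crc32(delta) ^ zlib.crc32(zeros)

print(predicted == zlib.crc32(modified))  # prints True
```

A cryptographic hash such as SHA-256 offers no comparable shortcut, which is why it is the choice whenever the change might be deliberate.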

Next Steps