Caching Strategies: The Two Hardest Problems in Computer Science
There are only two hard things in computer science: cache invalidation and naming things. Here's how to get the first one less wrong.
February 23, 2026 · 6 min · 1261 words · Rob Washington
Phil Karlton’s famous quote about hard problems in computer science exists because caching is genuinely difficult. Not the mechanics — putting data in Redis is easy. The hard part is knowing when that data is wrong.
Get caching right and your application feels instant. Get it wrong and users see stale data, inconsistent state, or worse — data that was never supposed to be visible to them.
The most common pattern is cache-aside (lazy loading): check the cache first, fall back to the database on a miss, and populate the cache with the result.
```python
def get_user(user_id):
    # Check cache
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # Cache miss - fetch from database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)

    # Populate cache
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user
```
Pros:

- Simple to understand
- Cache only contains data that's actually requested
- Database is the source of truth

Cons:

- First request is always slow (cache miss)
- Stale data if the database is updated without cache invalidation
Write-through: update the cache as part of every write.

```python
def update_user(user_id, data):
    # Update database
    db.execute("UPDATE users SET name = %s WHERE id = %s", (data['name'], user_id))

    # Update cache with the fresh row
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user
```
Pros:

- Cache always consistent with the database
- No stale reads after writes

Cons:

- Increased write latency (two writes per operation)
The lighter alternative: delete the cache entry on write and let the next read repopulate it.

```python
def update_user(user_id, data):
    db.execute("UPDATE users SET name = %s WHERE id = %s", (data['name'], user_id))
    redis.delete(f"user:{user_id}")  # Invalidate cache; next read repopulates it
```
The challenge: knowing all the places data might be cached.
```python
def update_user(user_id, data):
    db.execute("UPDATE users SET name = %s WHERE id = %s", (data['name'], user_id))

    # Invalidate all related caches
    redis.delete(f"user:{user_id}")
    redis.delete(f"user_profile:{user_id}")
    redis.delete(f"user_settings:{user_id}")

    # Invalidate aggregates that include this user
    user = db.query("SELECT team_id FROM users WHERE id = %s", user_id)
    redis.delete(f"team_members:{user['team_id']}")
```
This gets messy fast. Consider pub/sub for complex invalidation:
```python
def update_user(user_id, data):
    db.execute("UPDATE users SET name = %s WHERE id = %s", (data['name'], user_id))
    redis.publish('user_updated', json.dumps({'user_id': user_id}))

# Subscriber handles all invalidations
def handle_user_updated(message):
    # redis-py delivers the published payload under the 'data' key
    user_id = json.loads(message['data'])['user_id']
    redis.delete(f"user:{user_id}")
    redis.delete(f"user_profile:{user_id}")
    # ... etc
```
Include a version in cache keys that changes on update:
```python
def get_user_cache_key(user_id):
    version = db.query("SELECT cache_version FROM users WHERE id = %s", user_id)
    return f"user:{user_id}:v{version}"

def update_user(user_id, data):
    db.execute("""
        UPDATE users
        SET name = %s, cache_version = cache_version + 1
        WHERE id = %s
    """, (data['name'], user_id))
    # Old cache key automatically becomes orphaned
```
Simplest of all: TTL-based expiration. Old entries naturally expire via TTL, so no explicit invalidation is needed — at the cost of serving data that may be up to one TTL stale.
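The same idea works in-process too. A minimal sketch of a dict-backed TTL cache (the `TTLCache` name and lazy-eviction-on-read design are illustrative, not from the post):

```python
import time

class TTLCache:
    """Minimal TTL cache: entries silently expire ttl seconds after set()."""
    def __init__(self, ttl):
        self.ttl = ttl
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict expired entries on read
            return None
        return value
```

Redis does the same bookkeeping for you when you use `SETEX` or `EXPIRE`.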
When a popular cache key expires, hundreds of requests simultaneously hit the database — the thundering herd (or cache stampede) problem:
Solutions:
Lock while rebuilding:
```python
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # Try to acquire lock (nx + ex in one atomic call, so a crash
    # between SETNX and EXPIRE can't leave a lock with no expiry)
    lock_key = f"lock:user:{user_id}"
    if redis.set(lock_key, "1", nx=True, ex=10):
        try:
            user = db.query("SELECT * FROM users WHERE id = %s", user_id)
            redis.setex(f"user:{user_id}", 3600, json.dumps(user))
            return user
        finally:
            redis.delete(lock_key)
    else:
        # Another request is rebuilding - wait and retry
        time.sleep(0.1)
        return get_user(user_id)
```
Stale-while-revalidate:
```python
def get_user(user_id):
    cached = redis.get(f"user:{user_id}")
    if cached:
        data = json.loads(cached)
        if data['_expires_at'] < time.time():
            # Expired but usable - refresh async
            queue.enqueue('refresh_user_cache', user_id)
        return data

    # Hard miss - must fetch synchronously
    return fetch_and_cache_user(user_id)
```
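Stale-while-revalidate assumes the cached payload carries its own soft expiry, with the hard Redis TTL set longer so an "expired but usable" window exists. A sketch of the payload side (`SOFT_TTL`, `HARD_TTL`, and `build_cache_payload` are illustrative names, not from the post):

```python
import json
import time

SOFT_TTL = 3600       # after this, serve stale and refresh in the background
HARD_TTL = 2 * 3600   # Redis would evict the key entirely after this

def build_cache_payload(user):
    """Embed a soft expiry so readers can detect 'stale but usable' entries."""
    payload = dict(user)
    payload['_expires_at'] = time.time() + SOFT_TTL
    return json.dumps(payload)

# A fetch_and_cache_user sketch would then be:
#   user = db.query("SELECT * FROM users WHERE id = %s", user_id)
#   redis.setex(f"user:{user_id}", HARD_TTL, build_cache_payload(user))
#   return user
```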
Multi-level caching puts a small in-process cache (L1) in front of Redis (L2). Python's `lru_cache` works as a simple L1:

```python
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_user_l1(user_id):
    # Check L2 (Redis)
    cached = redis.get(f"user:{user_id}")
    if cached:
        return json.loads(cached)

    # Miss - fetch from database
    user = db.query("SELECT * FROM users WHERE id = %s", user_id)
    redis.setex(f"user:{user_id}", 3600, json.dumps(user))
    return user
```
Invalidation becomes harder — now you need to clear both levels:
```python
def invalidate_user(user_id):
    get_user_l1.cache_clear()  # Clear entire L1 (crude but safe)
    redis.delete(f"user:{user_id}")  # Clear L2
```
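If clearing the whole L1 on every invalidation is too crude, a hand-rolled L1 supports per-key invalidation, which `lru_cache` does not expose. A minimal sketch (the `L1Cache` name is illustrative):

```python
from collections import OrderedDict

class L1Cache:
    """Tiny LRU map that, unlike functools.lru_cache, allows per-key invalidation."""
    def __init__(self, maxsize=1000):
        self.maxsize = maxsize
        self._data = OrderedDict()

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as recently used
            return self._data[key]
        return None

    def set(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

    def invalidate(self, key):
        self._data.pop(key, None)  # clear just this key, not the whole cache
```

`invalidate_user` would then call `l1.invalidate(f"user:{user_id}")` instead of `cache_clear()`.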
Key design matters too. Prefer structured, readable keys over opaque hashes:

```python
# Bad: opaque hash
key = hashlib.md5(f"{user_id}{query}".encode()).hexdigest()

# Good: structured and readable
key = f"user:{user_id}:profile"
key = f"search:products:category={cat}:page={page}"
key = f"api:v2:users:{user_id}:posts:limit=10"
```
Include version prefix for easy bulk invalidation:
```python
CACHE_VERSION = "v3"

def cache_key(parts):
    return f"{CACHE_VERSION}:{':'.join(parts)}"

# Deploying a breaking cache format change? Bump CACHE_VERSION.
# Old keys naturally expire - no migration needed.
```
Metrics worth tracking:

- Hit rate: percentage of requests served from cache (target: >90%)
- Miss rate: requests that fall through to the database
- Eviction rate: keys removed due to memory pressure
- Latency: cache read/write times
```python
def get_user(user_id):
    start = time.time()
    cached = redis.get(f"user:{user_id}")
    if cached:
        metrics.increment('cache.hit', tags=['type:user'])
        metrics.timing('cache.latency', time.time() - start)
        return json.loads(cached)

    metrics.increment('cache.miss', tags=['type:user'])
    # ... fetch from database
```
Low hit rate? Either your TTLs are too short, your working set is too large for your cache, or your access patterns are too random for caching to help.
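Hit rate is simply hits divided by total reads. A trivial helper (illustrative, not from the post) makes the >90% target concrete:

```python
def hit_rate(hits, misses):
    """Fraction of cache reads served from cache; 0.0 when there's no traffic."""
    total = hits + misses
    return hits / total if total else 0.0
```

With the counters from the instrumented `get_user` above, `hit_rate(90, 10)` is 0.9 — right at the edge of the target.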
Caching is a trade-off between speed and correctness. Perfect consistency means no caching. Perfect speed means permanent caching. Everything else is finding the right balance for your use case.
Start with cache-aside and TTL expiration. Add event-based invalidation for data that needs freshness. Use multi-level caching for high-traffic paths. Monitor constantly.
And remember: the best cache invalidation strategy is one simple enough that you can reason about it at 3am during an incident.