Caching is one of the most powerful performance optimizations available. It’s also one of the easiest to get wrong. The classic joke—“there are only two hard things in computer science: cache invalidation and naming things”—exists for a reason.
Let’s cut through the complexity.
## When to Cache
Not everything should be cached. Before adding a cache, ask:
- Is this data read more than written? Caching write-heavy data creates invalidation nightmares.
- Is computing this expensive? Database queries, API calls, complex calculations—good candidates.
- Can I tolerate stale data? If not, caching gets complicated fast.
- Is this a hot path? Cache what’s accessed frequently, not everything.
```python
# Good cache candidate: expensive query, rarely changes
@cache(ttl=3600)
def get_product_catalog():
    return db.query("SELECT * FROM products WHERE active = true")

# Bad cache candidate: changes every request
@cache(ttl=60)  # Don't do this
def get_user_cart(user_id):
    return db.query("SELECT * FROM carts WHERE user_id = ?", user_id)
```
## Where to Cache
Caching happens at multiple layers. Each has tradeoffs.
### Browser Cache
The closest cache to your user. Set proper headers:
```python
@app.route('/static/<path:filename>')
def static_files(filename):
    response = send_file(filename)
    # Cache static assets for 1 year
    response.headers['Cache-Control'] = 'public, max-age=31536000, immutable'
    return response

@app.route('/api/user/profile')
def user_profile():
    response = jsonify(get_profile())
    # Don't cache private data in shared caches
    response.headers['Cache-Control'] = 'private, max-age=60'
    return response
```
### CDN Cache
For static assets and cacheable API responses. Key considerations:
- Cache keys matter: Include relevant query params, exclude tracking params
- Vary headers: Tell CDNs when to serve different versions
- Purge strategy: How will you invalidate when content changes?
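The cache-key point is worth making concrete. A minimal Python sketch of key normalization: drop tracking parameters and sort the rest so that equivalent URLs share one cache entry (the parameter list and helper name here are illustrative, not from any particular CDN):

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

# Params that should never fragment the cache (illustrative list)
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

def normalize_cache_key(url: str) -> str:
    """Build a CDN-style cache key: path plus sorted, non-tracking query params."""
    parts = urlsplit(url)
    params = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    params.sort()  # order-insensitive: ?a=1&b=2 keys the same as ?b=2&a=1
    query = urlencode(params)
    return f"{parts.path}?{query}" if query else parts.path
```

With this, `/api/products?utm_source=x&page=2` and `/api/products?page=2` map to the same cache entry instead of doubling your miss rate.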
```nginx
# Nginx example: cache API responses at the edge
location /api/products {
    proxy_cache api_cache;
    proxy_cache_valid 200 10m;
    proxy_cache_key $request_uri;
    add_header X-Cache-Status $upstream_cache_status;
}
```
### Application Cache
In-memory caching for hot data. Fast but limited by memory:
```python
from functools import lru_cache
from cachetools import TTLCache

# Simple LRU cache (no TTL, no distributed support)
@lru_cache(maxsize=1000)
def get_user_permissions(user_id: int) -> list:
    return db.query_permissions(user_id)

# TTL cache for time-sensitive data
config_cache = TTLCache(maxsize=100, ttl=300)

def get_feature_flags():
    if 'flags' not in config_cache:
        config_cache['flags'] = fetch_flags_from_service()
    return config_cache['flags']
```
### Distributed Cache (Redis/Memcached)
Shared across application instances. The workhorse of production caching:
```python
import redis
import json

r = redis.Redis(host='localhost', port=6379, db=0)

def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"

    # Try cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # Cache miss: fetch and store
    profile = db.get_user(user_id)
    r.setex(cache_key, 3600, json.dumps(profile))  # 1 hour TTL
    return profile
```
### Database Query Cache
Let the database cache query results. Often overlooked:
```sql
-- PostgreSQL: check if query is being cached
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM products WHERE category = 'electronics';
-- Look for "Buffers: shared hit" vs "shared read"
-- Hits mean data came from PostgreSQL's buffer cache
```
## Cache Invalidation Strategies
Here’s where things get interesting.
### Time-Based (TTL)
The simplest approach. Set an expiration, accept staleness:
```python
# Data is stale for up to 5 minutes after a change
r.setex("product:123", 300, json.dumps(product))
```
When to use: Data where slight staleness is acceptable. Analytics, catalogs, configuration.
### Event-Based
Invalidate when data changes:
```python
def update_product(product_id: int, data: dict):
    db.update_product(product_id, data)

    # Immediately invalidate cache
    r.delete(f"product:{product_id}")

    # Also invalidate any lists containing this product
    r.delete("products:all")
    r.delete(f"products:category:{data['category']}")
```
When to use: Data that must be fresh. User profiles, inventory counts, permissions.
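One wrinkle: the deletes above only clear the shared cache. If each app instance also keeps an in-process cache (like the `lru_cache` examples earlier), those copies need to hear about the change too, typically via a pub/sub channel. A sketch of the pattern, using a tiny in-memory bus as a stand-in for something like Redis pub/sub so it runs self-contained (`LocalBus` and the channel name are assumptions for illustration):

```python
from collections import defaultdict

class LocalBus:
    """In-memory stand-in for a pub/sub channel (e.g. Redis pub/sub)."""
    def __init__(self):
        self.subscribers = defaultdict(list)
    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)
    def publish(self, channel, message):
        for cb in self.subscribers[channel]:
            cb(message)

bus = LocalBus()
local_cache = {"product:1": {"name": "widget"}}

# Each app instance subscribes and drops its local copy on change events
bus.subscribe("invalidate", lambda key: local_cache.pop(key, None))

def update_product(product_id, data):
    # db.update_product(product_id, data)  # write to the source of truth first
    bus.publish("invalidate", f"product:{product_id}")
```

After `update_product(1, ...)`, every subscribed instance has evicted its stale `product:1` entry.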
### Write-Through
Update cache and database together:
```python
def update_user_settings(user_id: str, settings: dict):
    # Update both atomically (as much as possible)
    db.update_user_settings(user_id, settings)
    r.setex(f"user:settings:{user_id}", 3600, json.dumps(settings))
```
When to use: Critical data where cache misses are expensive.
### Cache-Aside (Lazy Loading)
The most common pattern. Cache on read, invalidate on write:
```python
def get_data(key):
    # 1. Check cache
    cached = cache.get(key)
    if cached:
        return cached

    # 2. Cache miss: load from source
    data = database.get(key)

    # 3. Populate cache
    cache.set(key, data, ttl=3600)
    return data

def update_data(key, value):
    # 1. Update source
    database.set(key, value)

    # 2. Invalidate cache (don't update—avoids race conditions)
    cache.delete(key)
```
## Common Pitfalls
### The Thundering Herd
When cache expires, hundreds of requests hit the database simultaneously:
```python
import json
import threading

_locks = {}
_locks_guard = threading.Lock()  # protects the _locks registry itself

def get_with_lock(key: str, fetch_func, ttl: int = 3600):
    cached = r.get(key)
    if cached:
        return json.loads(cached)

    # Use a per-key lock so only one request recomputes the value
    with _locks_guard:
        lock = _locks.setdefault(f"lock:{key}", threading.Lock())

    with lock:
        # Double-check after acquiring the lock
        cached = r.get(key)
        if cached:
            return json.loads(cached)
        data = fetch_func()
        r.setex(key, ttl, json.dumps(data))
        return data
```
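Note that a `threading.Lock` only serializes requests within one process. Across multiple app instances, the usual approach is to put the lock in the shared cache itself with an atomic set-if-not-exists (in Redis, `SET key value NX EX ttl`). A sketch of that idea, run here against a tiny in-memory stand-in so it is self-contained; `MiniRedis` mimics only the two calls the lock needs, and `try_fetch_lock` is a hypothetical helper name:

```python
import time
import uuid

class MiniRedis:
    """Tiny in-memory stand-in for the two Redis calls the lock needs."""
    def __init__(self):
        self.data = {}
    def set(self, key, value, nx=False, ex=None):
        # Mirrors redis-py semantics: SET key value NX EX -> True or None
        now = time.monotonic()
        entry = self.data.get(key)
        if entry and entry[1] > now and nx:
            return None  # key exists and is unexpired; NX refuses to overwrite
        self.data[key] = (value, now + (ex or float("inf")))
        return True
    def delete(self, key):
        self.data.pop(key, None)

r = MiniRedis()

def try_fetch_lock(key: str, ttl: int = 10):
    """Return a lock token if this caller won the right to recompute `key`."""
    token = str(uuid.uuid4())
    if r.set(f"lock:{key}", token, nx=True, ex=ttl):
        return token
    return None  # another instance is recomputing; serve stale data or wait
```

The TTL on the lock matters: if the holder crashes mid-recompute, the lock expires on its own instead of wedging the key forever.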
### Cache Stampede on Cold Start
After a deploy or cache flush, everything is a cache miss:
```python
import json
import random

def warm_cache():
    """Run after deploy to pre-populate critical caches."""
    popular_products = db.get_popular_products(limit=100)
    for product in popular_products:
        r.setex(f"product:{product.id}", 3600, json.dumps(product))

    # Stagger TTLs to prevent synchronized expiration
    for key, data in critical_configs.items():
        ttl = 3600 + random.randint(0, 600)  # 1 hour + 0-10 minutes
        r.setex(key, ttl, json.dumps(data))
```
### Caching Nulls
Don’t let cache misses for non-existent data hit your database repeatedly:
```python
def get_user(user_id: str):
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)

    if cached == b"NULL":  # Explicit null marker
        return None
    if cached:
        return json.loads(cached)

    user = db.get_user(user_id)
    if user is None:
        r.setex(cache_key, 300, "NULL")  # Cache the miss, shorter TTL
    else:
        r.setex(cache_key, 3600, json.dumps(user))
    return user
```
## Monitoring Your Cache
You can’t improve what you don’t measure:
```python
from prometheus_client import Counter, Histogram

cache_hits = Counter('cache_hits_total', 'Cache hits', ['cache_name'])
cache_misses = Counter('cache_misses_total', 'Cache misses', ['cache_name'])
cache_latency = Histogram('cache_operation_seconds', 'Cache operation latency',
                          ['cache_name', 'operation'])

def cached_get(key: str, cache_name: str = "default"):
    with cache_latency.labels(cache_name, 'get').time():
        result = r.get(key)
    if result:
        cache_hits.labels(cache_name).inc()
    else:
        cache_misses.labels(cache_name).inc()
    return result
```
Track these metrics:
- Hit rate: Should be >90% for effective caches
- Latency: p50, p95, p99
- Memory usage: Are you approaching limits?
- Eviction rate: High evictions mean you need more memory or better TTLs
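The hit-rate threshold above falls straight out of the two counters. A trivial helper, useful as an alert condition (the function names are illustrative, not from any monitoring library):

```python
def hit_rate(hits: int, misses: int) -> float:
    """Cache hit rate as a fraction; 0.0 when there is no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0

def cache_is_effective(hits: int, misses: int, threshold: float = 0.9) -> bool:
    """True when the hit rate meets the >90%-style target from above."""
    return hit_rate(hits, misses) >= threshold
```

Guarding against zero traffic matters: a freshly deployed service has no hits and no misses, and a naive division would crash the alerting job.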
## Start Simple
Don’t over-engineer caching from day one:
- Measure first: Profile your app. Find the actual bottlenecks.
- Start with TTL: Simple time-based expiration handles most cases.
- Add invalidation when needed: Only add complexity when staleness becomes a problem.
- Monitor everything: Cache problems are silent until they’re catastrophic.
The best cache is one you understand completely. Complexity in caching leads to bugs that are nearly impossible to reproduce and debug.
Cache wisely. Your database will thank you—until your cache fails at 2 AM.