Caching is one of the most powerful performance optimizations available. It’s also one of the easiest to get wrong. The classic joke—“there are only two hard things in computer science: cache invalidation and naming things”—exists for a reason.

Let’s cut through the complexity.

When to Cache

Not everything should be cached. Before adding a cache, ask:

  1. Is this data read more than written? Caching write-heavy data creates invalidation nightmares.
  2. Is computing this expensive? Database queries, API calls, complex calculations—good candidates.
  3. Can I tolerate stale data? If not, caching gets complicated fast.
  4. Is this a hot path? Cache what’s accessed frequently, not everything.
# Good cache candidate: expensive query, rarely changes
@cache(ttl=3600)
def get_product_catalog():
    return db.query("SELECT * FROM products WHERE active = true")

# Bad cache candidate: changes every request
@cache(ttl=60)  # Don't do this
def get_user_cart(user_id):
    return db.query("SELECT * FROM carts WHERE user_id = ?", user_id)
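The @cache decorator above is illustrative rather than from any particular library. A minimal in-memory version might look like this (a hypothetical sketch; a production version would typically sit in front of Redis and also handle keyword arguments):

```python
import time
import functools

def cache(ttl: int):
    """Hypothetical in-memory TTL cache decorator, keyed on positional args."""
    def decorator(func):
        store = {}  # args tuple -> (expires_at, value)

        @functools.wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            entry = store.get(args)
            if entry and entry[0] > now:
                return entry[1]  # still fresh: serve from cache
            value = func(*args)
            store[args] = (now + ttl, value)
            return value
        return wrapper
    return decorator
```

The point of the sketch: a cache decorator is just a dict plus an expiry check, which is why the "is this worth caching?" questions above matter more than the mechanism.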

Where to Cache

Caching happens at multiple layers. Each has tradeoffs.

Browser Cache

The closest cache to your user. Set proper headers:

@app.route('/static/<path:filename>')
def static_files(filename):
    response = send_file(filename)
    # Cache static assets for 1 year
    response.headers['Cache-Control'] = 'public, max-age=31536000, immutable'
    return response

@app.route('/api/user/profile')
def user_profile():
    response = jsonify(get_profile())
    # Don't cache private data in shared caches
    response.headers['Cache-Control'] = 'private, max-age=60'
    return response

CDN Cache

For static assets and cacheable API responses. Key considerations:

  • Cache keys matter: Include relevant query params, exclude tracking params
  • Vary headers: Tell CDNs when to serve different versions
  • Purge strategy: How will you invalidate when content changes?
# Nginx example: cache API responses at edge
location /api/products {
    proxy_cache api_cache;
    proxy_cache_valid 200 10m;
    proxy_cache_key $request_uri;
    add_header X-Cache-Status $upstream_cache_status;
}
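The "cache keys matter" point can be made concrete: normalize the key so tracking parameters and parameter order don't fragment the cache. A sketch (the helper name and tracking-param list are assumptions, not any CDN's API):

```python
from urllib.parse import urlsplit, urlencode, parse_qsl

# Params that never change the response body (assumed list; tune per site)
TRACKING_PARAMS = ('utm_', 'fbclid', 'gclid')

def cache_key(url: str) -> str:
    parts = urlsplit(url)
    # Drop tracking params so /p?id=1&utm_source=mail shares an entry with /p?id=1
    params = [(k, v) for k, v in parse_qsl(parts.query)
              if not k.startswith(TRACKING_PARAMS)]
    params.sort()  # canonical order: ?a=1&b=2 and ?b=2&a=1 map to one entry
    return f"{parts.path}?{urlencode(params)}" if params else parts.path
```

Without this normalization, every unique utm_source value creates its own cache entry and your hit rate quietly collapses.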

Application Cache

In-memory caching for hot data. Fast but limited by memory:

from functools import lru_cache
from cachetools import TTLCache

# Simple LRU cache (no TTL, no distributed support)
@lru_cache(maxsize=1000)
def get_user_permissions(user_id: int) -> list:
    return db.query_permissions(user_id)

# TTL cache for time-sensitive data
config_cache = TTLCache(maxsize=100, ttl=300)

def get_feature_flags():
    # Note: cachetools caches aren't thread-safe; guard with a lock if shared across threads
    if 'flags' not in config_cache:
        config_cache['flags'] = fetch_flags_from_service()
    return config_cache['flags']

Distributed Cache (Redis/Memcached)

Shared across application instances. The workhorse of production caching:

import redis
import json

r = redis.Redis(host='localhost', port=6379, db=0)

def get_user_profile(user_id: str) -> dict:
    cache_key = f"user:profile:{user_id}"
    
    # Try cache first
    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)
    
    # Cache miss: fetch and store
    profile = db.get_user(user_id)
    r.setex(cache_key, 3600, json.dumps(profile))  # 1 hour TTL
    return profile

Database Query Cache

Let the database cache query results. Often overlooked:

-- PostgreSQL: check if query is being cached
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM products WHERE category = 'electronics';

-- Look for "Buffers: shared hit" vs "shared read"
-- Hits mean data came from PostgreSQL's buffer cache

Cache Invalidation Strategies

Here’s where things get interesting.

Time-Based (TTL)

The simplest approach. Set an expiration, accept staleness:

# Data is stale for up to 5 minutes after change
r.setex("product:123", 300, json.dumps(product))

When to use: Data where slight staleness is acceptable. Analytics, catalogs, configuration.

Event-Based

Invalidate when data changes:

def update_product(product_id: int, data: dict):
    db.update_product(product_id, data)
    # Immediately invalidate cache
    r.delete(f"product:{product_id}")
    # Also invalidate any lists containing this product
    r.delete("products:all")
    r.delete(f"products:category:{data['category']}")

When to use: Data that must be fresh. User profiles, inventory counts, permissions.

Write-Through

Update cache and database together:

def update_user_settings(user_id: str, settings: dict):
    # Update both atomically (as much as possible)
    db.update_user_settings(user_id, settings)
    r.setex(f"user:settings:{user_id}", 3600, json.dumps(settings))

When to use: Critical data where cache misses are expensive.

Cache-Aside (Lazy Loading)

The most common pattern. Cache on read, invalidate on write:

def get_data(key):
    # 1. Check cache
    cached = cache.get(key)
    if cached:
        return cached
    
    # 2. Cache miss: load from source
    data = database.get(key)
    
    # 3. Populate cache
    cache.set(key, data, ttl=3600)
    return data

def update_data(key, value):
    # 1. Update source
    database.set(key, value)
    
    # 2. Invalidate cache (don't update—avoids race conditions)
    cache.delete(key)

Common Pitfalls

The Thundering Herd

When cache expires, hundreds of requests hit the database simultaneously:

import threading

# Process-local locks: this coordinates threads within ONE process only.
# Across multiple app instances you'd need a distributed lock (e.g. Redis SET NX).
_locks = {}
_locks_guard = threading.Lock()

def get_with_lock(key: str, fetch_func, ttl: int = 3600):
    cached = r.get(key)
    if cached:
        return json.loads(cached)
    
    # Use a per-key lock to prevent thundering herd
    with _locks_guard:  # avoid a race where two threads create separate locks
        lock = _locks.setdefault(f"lock:{key}", threading.Lock())
    
    with lock:
        # Double-check after acquiring the lock: another thread may have filled it
        cached = r.get(key)
        if cached:
            return json.loads(cached)
        
        data = fetch_func()
        r.setex(key, ttl, json.dumps(data))
        return data

Cache Stampede on Cold Start

After a deploy or cache flush, everything is a cache miss:

import random

def warm_cache():
    """Run after deploy to pre-populate critical caches"""
    popular_products = db.get_popular_products(limit=100)
    for product in popular_products:
        r.setex(f"product:{product.id}", 3600, json.dumps(product))
    
    # Stagger TTLs to prevent synchronized expiration
    for key, data in critical_configs.items():  # critical_configs: app-defined dict of hot config
        ttl = 3600 + random.randint(0, 600)  # 1 hour + 0-10 minutes of jitter
        r.setex(key, ttl, json.dumps(data))

Caching Nulls

Don’t let cache misses for non-existent data hit your database repeatedly:

def get_user(user_id: str):
    cache_key = f"user:{user_id}"
    cached = r.get(cache_key)
    
    if cached == b"NULL":  # Explicit null marker
        return None
    if cached:
        return json.loads(cached)
    
    user = db.get_user(user_id)
    if user is None:
        r.setex(cache_key, 300, "NULL")  # Cache the miss, shorter TTL
    else:
        r.setex(cache_key, 3600, json.dumps(user))
    return user

Monitoring Your Cache

You can’t improve what you don’t measure:

from prometheus_client import Counter, Histogram

cache_hits = Counter('cache_hits_total', 'Cache hits', ['cache_name'])
cache_misses = Counter('cache_misses_total', 'Cache misses', ['cache_name'])
cache_latency = Histogram('cache_operation_seconds', 'Cache operation latency', 
                          ['cache_name', 'operation'])

def cached_get(key: str, cache_name: str = "default"):
    with cache_latency.labels(cache_name, 'get').time():
        result = r.get(key)
    
    if result:
        cache_hits.labels(cache_name).inc()
    else:
        cache_misses.labels(cache_name).inc()
    
    return result

Track these metrics:

  • Hit rate: usually above 90% for an effective cache, though the right target depends on your read/write mix
  • Latency: p50, p95, p99
  • Memory usage: Are you approaching limits?
  • Eviction rate: High evictions mean you need more memory or better TTLs
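
Hit rate falls straight out of the two counters above. As a sanity check on the arithmetic (plain numbers here, rather than a Prometheus query):

```python
def hit_rate(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there is no traffic yet."""
    total = hits + misses
    return hits / total if total else 0.0

# 9,500 hits against 500 misses is a 95% hit rate, above the 90% guideline
```

In production you would compute the same ratio over the Prometheus counters, e.g. rate(cache_hits_total[5m]) divided by the sum of the hit and miss rates.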

Start Simple

Don’t over-engineer caching from day one:

  1. Measure first: Profile your app. Find the actual bottlenecks.
  2. Start with TTL: Simple time-based expiration handles most cases.
  3. Add invalidation when needed: Only add complexity when staleness becomes a problem.
  4. Monitor everything: Cache problems are silent until they’re catastrophic.

The best cache is one you understand completely. Complexity in caching leads to bugs that are nearly impossible to reproduce and debug.
Cache wisely. Your database will thank you—until your cache fails at 2 AM.