Not all failures are permanent. Retry patterns help distinguish transient hiccups from real problems.

Exponential Backoff

import time
import random

def retry_with_backoff(func, max_retries=5, base_delay=1):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            
            delay = base_delay * (2 ** attempt)
            jitter = random.uniform(0, delay * 0.1)
            time.sleep(delay + jitter)

Each retry waits longer: 1s, 2s, 4s, then 8s; the final attempt re-raises instead of sleeping. Jitter prevents a thundering herd of clients all retrying in lockstep.
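To see the loop in action, here is a self-contained sketch (the helper is restated so the snippet runs on its own; `base_delay` is shrunk so the demo finishes quickly, and `flaky` is a stand-in service that fails twice before succeeding):

```python
import random
import time

def retry_with_backoff(func, max_retries=5, base_delay=1):
    # same helper as above
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            jitter = random.uniform(0, delay * 0.1)
            time.sleep(delay + jitter)

calls = {"count": 0}

def flaky():
    # stand-in for a transient failure: errors twice, then succeeds
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient hiccup")
    return "ok"

print(retry_with_backoff(flaky, base_delay=0.01))  # ok
```

The first two calls raise, the loop backs off briefly each time, and the third call returns normally.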

With tenacity

import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60)
)
def call_api():
    response = requests.get("https://api.example.com", timeout=10)
    response.raise_for_status()  # turn 5xx responses into exceptions so they trigger a retry
    return response
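The wait this decorator produces is roughly multiplier × 2ⁿ, clamped between `min` and `max`. This sketch approximates that schedule (the exact indexing of the first wait can differ between tenacity versions, so treat it as illustrative):

```python
def clamped_exponential(multiplier=1, min_wait=1, max_wait=60, retries=8):
    # approximate wait before retry n: multiplier * 2**n, clamped to [min_wait, max_wait]
    return [max(min_wait, min(max_wait, multiplier * 2 ** n)) for n in range(retries)]

print(clamped_exponential())  # [1, 2, 4, 8, 16, 32, 60, 60]
```

The `max` cap matters: without it, attempt 10 would already mean waiting over 17 minutes.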

Retry Only Transient Errors

from tenacity import retry, retry_if_exception_type, stop_after_attempt

@retry(
    retry=retry_if_exception_type((ConnectionError, TimeoutError)),
    stop=stop_after_attempt(3)
)
def fetch_data():
    return external_service.get()

Don’t retry 400 Bad Request — that won’t fix itself.
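For HTTP APIs, the transient/permanent line is usually drawn by status code. A minimal classifier (this particular status set is a common convention, not a standard):

```python
TRANSIENT_STATUSES = {408, 429, 500, 502, 503, 504}

def is_transient(status_code):
    # 5xx, timeouts, and rate limits may succeed on retry;
    # other 4xx codes mean the request itself is wrong
    return status_code in TRANSIENT_STATUSES

print(is_transient(503), is_transient(400))  # True False
```

Note that 429 Too Many Requests is retryable but deserves extra care: honor the Retry-After header if the server sends one.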

Idempotency Matters

Only retry operations that are safe to repeat:

  • GET requests ✓
  • Database reads ✓
  • Idempotent writes (with unique keys) ✓
  • Non-idempotent POST × (might create duplicates)
def create_order_idempotent(order_id, data):
    # Use order_id as the idempotency key: a retried insert becomes a no-op
    return db.execute(
        "INSERT INTO orders (id, ...) VALUES (%s, ...) ON CONFLICT (id) DO NOTHING",
        (order_id,)
    )
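The same first-write-wins behavior can be sketched without a database, using a dict as a stand-in for the orders table (all names here are illustrative):

```python
orders = {}  # stand-in for the orders table

def create_order(order_id, data):
    # mirrors ON CONFLICT (id) DO NOTHING: only the first write lands
    orders.setdefault(order_id, data)
    return orders[order_id]

create_order("ord-1", {"item": "book"})
create_order("ord-1", {"item": "book"})  # a retried call creates no duplicate
print(len(orders))  # 1
```

Because the caller supplies the key, a retry of the same logical operation collides with its own earlier attempt instead of creating a second order.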

The Checklist

  1. Is the error transient? (network, timeout, 503)
  2. Is the operation idempotent?
  3. Is backoff configured?
  4. Is there a max retry limit?
  5. Are you logging retry attempts?
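Items 1 and 3–5 can be folded into one helper (a sketch with illustrative defaults; item 2, idempotency, remains the caller's responsibility):

```python
import logging
import random
import time

log = logging.getLogger("retry")

def retry_transient(func, transient=(ConnectionError, TimeoutError),
                    max_retries=3, base_delay=1.0):
    for attempt in range(max_retries):
        try:
            return func()
        except transient as exc:            # 1. only catch transient errors
            if attempt == max_retries - 1:  # 4. hard retry limit
                raise
            # 3. exponential backoff with jitter
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay * 0.1)
            # 5. log every retry so failures are visible before they become outages
            log.warning("attempt %d failed (%s); retrying in %.2fs",
                        attempt + 1, exc, delay)
            time.sleep(delay)
```

Anything outside the `transient` tuple, such as a `ValueError` from a malformed request, propagates immediately instead of being retried.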

Retries are powerful but dangerous. Use them deliberately.