Not all failures are permanent. Retry patterns help distinguish transient hiccups from real problems.
Exponential Backoff

```python
import time
import random

def retry_with_backoff(func, max_retries=5, base_delay=1):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            delay = base_delay * (2 ** attempt)
            jitter = random.uniform(0, delay * 0.1)
            time.sleep(delay + jitter)
```
Each retry waits roughly twice as long as the last: 1s, 2s, 4s, 8s (with five attempts, the final failure is raised rather than slept on). Jitter spreads clients out so a fleet of them doesn't retry in lockstep: the thundering herd problem.
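A quick way to exercise the helper above: wrap a function that fails twice and then succeeds. Here `flaky` and the `calls` counter are made-up demo names, and `base_delay` is shrunk so the example runs fast:

```python
import random
import time

# Same helper as above, repeated so this snippet runs on its own
def retry_with_backoff(func, max_retries=5, base_delay=1):
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))

calls = {"n": 0}

def flaky():
    # Hypothetical service call: fails twice, then succeeds
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient blip")
    return "ok"

# base_delay shrunk so the demo finishes in a fraction of a second
print(retry_with_backoff(flaky, base_delay=0.01))  # ok
```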
With tenacity

```python
import requests
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=1, max=60)
)
def call_api():
    return requests.get("https://api.example.com")
```
Retry Only Transient Errors

```python
from tenacity import retry, retry_if_exception_type, stop_after_attempt

@retry(
    retry=retry_if_exception_type((ConnectionError, TimeoutError)),
    stop=stop_after_attempt(3)
)
def fetch_data():
    return external_service.get()
```
Don’t retry 400 Bad Request — that won’t fix itself.
Idempotency Matters
Only retry operations that are safe to repeat:
- GET requests ✓
- Database reads ✓
- Idempotent writes (with unique keys) ✓
- Non-idempotent POST × (might create duplicates)
```python
def create_order_idempotent(order_id, data):
    # Use order_id as the idempotency key: a repeated insert is a no-op
    return db.execute(
        "INSERT INTO orders (id, ...) VALUES (%s, ...) ON CONFLICT (id) DO NOTHING",
        order_id
    )
```
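The same idea without a database: a minimal in-memory sketch (the `orders` dict stands in for the table) showing that replaying the call with the same key creates nothing new:

```python
orders = {}

def create_order_idempotent(order_id, data):
    # A second call with the same order_id is a no-op,
    # mirroring ON CONFLICT (id) DO NOTHING
    if order_id not in orders:
        orders[order_id] = data
    return orders[order_id]

create_order_idempotent("ord-1", {"item": "widget"})
create_order_idempotent("ord-1", {"item": "widget"})  # retried call: no duplicate
print(len(orders))  # 1
```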
The Checklist
- Is the error transient? (network, timeout, 503)
- Is the operation idempotent?
- Is backoff configured?
- Is there a max retry limit?
- Are you logging retry attempts?
Retries are powerful but dangerous. Use them deliberately.