Webhooks are deceptively simple: someone sends you HTTP requests, you process them. What could go wrong?
Everything. Webhooks are inbound attack surface, and most implementations have gaps you could drive a truck through.
The Obvious One: Signature Verification
Most webhook providers sign their payloads. Stripe uses HMAC-SHA256. GitHub uses HMAC-SHA1 or SHA256. Slack uses its own signing scheme.
You’ve probably implemented this:
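```python
import hashlib
import hmac


def verify_github_signature(payload: bytes, signature: str, secret: str) -> bool:
    # GitHub sends "sha256=<hex digest>" in the X-Hub-Signature-256 header.
    # Stripe's Stripe-Signature header uses a different "t=...,v1=..." format.
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    # compare_digest runs in constant time, which avoids timing side channels
    return hmac.compare_digest(f"sha256={expected}", signature)
```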
Good. But this is table stakes. What else?
The Timestamp Check You’re Probably Skipping
Signatures prevent tampering. They don’t prevent replay attacks. Someone captures a valid webhook, replays it a thousand times, and your system processes the same order/event/action repeatedly.
Providers include timestamps for this reason:
```python
import hashlib
import hmac
import time


def verify_webhook(payload: bytes, timestamp: str, signature: str, secret: str) -> bool:
    # Reject malformed timestamps instead of raising
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False

    # Reject if too old or too far in the future (5 minute tolerance)
    if abs(time.time() - sent_at) > 300:
        return False

    # Build the signed payload (provider-specific format).
    # Work in bytes so a non-UTF-8 body cannot raise during decoding.
    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(
        secret.encode(),
        signed_payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)
```
The 5-minute window is a balance. Too tight, and clock skew causes legitimate rejections. Too loose, and replay windows widen.
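The "provider-specific format" matters in practice. Stripe, for example, delivers the timestamp and signature together in one Stripe-Signature header of the form t=<unix time>,v1=<hex digest>, and signs the string "<timestamp>.<raw body>". Below is a minimal sketch of parsing and checking that header. The function names parse_stripe_header and verify_stripe_header are illustrative, not part of any library; Stripe's official SDKs ship their own verification helper, which is the safer choice in production.

```python
import hashlib
import hmac
import time


def parse_stripe_header(header: str) -> tuple[str, list[str]]:
    """Split 't=...,v1=...' into the timestamp and every v1 signature."""
    timestamp = ""
    signatures = []
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if key == "t":
            timestamp = value
        elif key == "v1":
            signatures.append(value)
    return timestamp, signatures


def verify_stripe_header(payload: bytes, header: str, secret: str,
                         tolerance: int = 300) -> bool:
    timestamp, signatures = parse_stripe_header(header)
    if not signatures:
        return False
    try:
        sent_at = int(timestamp)
    except ValueError:
        return False
    if abs(time.time() - sent_at) > tolerance:
        return False

    signed_payload = timestamp.encode() + b"." + payload
    expected = hmac.new(secret.encode(), signed_payload, hashlib.sha256).hexdigest()
    # Compare as bytes so a non-ASCII header value cannot raise TypeError.
    # Several v1 entries can be present at once; any match is accepted.
    return any(
        hmac.compare_digest(expected.encode(), candidate.encode())
        for candidate in signatures
    )
```

Always pass the raw request bytes as payload. Re-serializing parsed JSON can change whitespace or key order, and the signature will no longer match.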
Idempotency: Because Webhooks Lie
Here’s the dirty secret: webhook providers will retry. A lot. Network hiccup? Retry. Your server was slow? Retry. Their system glitched? Retry.
Your handler must be idempotent:
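```python
import logging

logger = logging.getLogger(__name__)

# `redis` is assumed to be a redis.asyncio.Redis client created elsewhere


async def handle_payment_webhook(event: dict):
    event_id = event['id']
    key = f"webhook:processed:{event_id}"

    # SET with NX and EX claims the event and sets the TTL in one atomic step,
    # so a crash cannot leave a key behind without an expiry
    if await redis.set(key, "1", nx=True, ex=86400 * 7):  # 7 days
        try:
            await actually_process_payment(event)
        except Exception:
            # Release the claim so the provider's retry can be processed
            await redis.delete(key)
            raise
    else:
        logger.info(f"Duplicate webhook ignored: {event_id}")

    # Return 200 - we've handled it (even if by ignoring)
    return {"status": "ok"}
```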
The Redis key prevents duplicate processing. The 7-day TTL handles the case where a provider retries days later (yes, this happens).
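For unit tests, or for a single-process service without Redis, the same claim-then-process idea can be sketched with an in-memory store. This is an illustration only: ProcessedEvents and process_once are hypothetical names, and the store is neither shared across processes nor durable across restarts.

```python
import time
from typing import Callable, Optional


class ProcessedEvents:
    """In-memory stand-in for the Redis key: remembers event IDs until a TTL expires."""

    def __init__(self, ttl_seconds: float = 86400 * 7):
        self.ttl = ttl_seconds
        self._expiry: dict[str, float] = {}

    def claim(self, event_id: str, now: Optional[float] = None) -> bool:
        """Return True only for the first caller to claim this event ID."""
        now = time.time() if now is None else now
        # Drop expired entries so memory does not grow without bound
        self._expiry = {k: v for k, v in self._expiry.items() if v > now}
        if event_id in self._expiry:
            return False
        self._expiry[event_id] = now + self.ttl
        return True

    def release(self, event_id: str) -> None:
        """Forget a claim, e.g. after processing failed, so a retry can run."""
        self._expiry.pop(event_id, None)


def process_once(store: ProcessedEvents, event: dict,
                 process: Callable[[dict], object]) -> str:
    if not store.claim(event["id"]):
        return "duplicate"
    try:
        process(event)
    except Exception:
        store.release(event["id"])
        raise
    return "processed"
```

Releasing the claim on failure is a design choice: it trades a possible second attempt for not silently losing an event whose processing crashed.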
The IP Allowlist Trap
Some guides recommend allowlisting webhook source IPs. Stripe publishes theirs. GitHub publishes theirs. Problem solved?
Not quite:
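- IPs change: Providers update their ranges. Your allowlist rots.
- IP spoofing exists: It is mostly practical for UDP-based protocols and much harder over the TCP connections webhooks use, but a source address alone still proves nothing about who built the request.
- Maintenance burden: Now you’re managing yet another config that can break silently.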
IP allowlisting is defense in depth, not primary defense. Signature verification is what actually matters. If you do allowlist, automate updates:
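```python
import httpx


async def update_stripe_allowlist():
    # Stripe publishes their IPs.
    # httpx.get is synchronous, so use AsyncClient for an awaitable request.
    async with httpx.AsyncClient() as client:
        resp = await client.get("https://stripe.com/files/ips/ips_webhooks.json")
    # Fail loudly rather than applying an empty or partial list
    resp.raise_for_status()
    ips = resp.json()['WEBHOOKS']
    await update_firewall_rules(ips)
```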
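If the check happens in application code rather than at the firewall, the standard library's ipaddress module handles both single addresses and CIDR ranges. A small sketch, with build_allowlist and is_allowed as illustrative names and documentation-range addresses as sample data:

```python
import ipaddress


def build_allowlist(entries: list[str]) -> list:
    """Parse IPs or CIDR ranges; a bare IP becomes a single-address network."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def is_allowed(source_ip: str, allowlist: list) -> bool:
    try:
        address = ipaddress.ip_address(source_ip)
    except ValueError:
        # Anything that is not a valid IP address is rejected outright
        return False
    # An IPv4 address is never "in" an IPv6 network, and vice versa
    return any(address in network for network in allowlist)


allowlist = build_allowlist(["192.0.2.0/24", "198.51.100.7"])
```

Behind a load balancer or reverse proxy, the socket's peer address is the proxy, not the provider, so the address has to come from a forwarding header that your own proxy sets and that clients cannot override.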
Secret Rotation Without Downtime
Your webhook secret will eventually need rotation. A leak, an expiry policy, paranoia: it doesn’t matter why.
The naive approach:
- Generate new secret
- Update your code
- Update the provider
- Hope you did it fast enough
The correct approach: support multiple secrets simultaneously:
```python
import os

WEBHOOK_SECRETS = [
    os.environ['WEBHOOK_SECRET_CURRENT'],
    os.environ.get('WEBHOOK_SECRET_PREVIOUS', ''),  # May be empty
]


def verify_signature(payload: bytes, signature: str) -> bool:
    # verify_with_secret is whichever single-secret check your provider needs
    for secret in WEBHOOK_SECRETS:
        # Skip empty entries: an empty secret must never be accepted as a key
        if secret and verify_with_secret(payload, signature, secret):
            return True
    return False
```
Rotation becomes:
- Generate new secret, set as WEBHOOK_SECRET_CURRENT
- Move old secret to WEBHOOK_SECRET_PREVIOUS
- Deploy your code
- Update the provider (now using new secret)
- After confirmation, clear WEBHOOK_SECRET_PREVIOUS
Zero downtime. No race conditions.
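To see why the overlap window works, here is a self-contained sketch of multi-secret verification using plain HMAC-SHA256 over the body. The helpers sign and verify_with_any are hypothetical, and the signing format is deliberately simplified compared with any real provider's.

```python
import hashlib
import hmac


def sign(payload: bytes, secret: str) -> str:
    return hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()


def verify_with_any(payload: bytes, signature: str, secrets: list[str]) -> bool:
    """Accept the signature if it matches any non-empty configured secret."""
    for secret in secrets:
        if not secret:
            # HMAC with an empty key is computable by anyone, so never try it
            continue
        if hmac.compare_digest(sign(payload, secret).encode(), signature.encode()):
            return True
    return False


payload = b'{"id": "evt_123"}'
during_rotation = ["new_secret", "old_secret"]  # current, previous
after_rotation = ["new_secret", ""]             # previous cleared
```

During rotation, a webhook signed with either secret verifies, so it does not matter whether the provider has switched yet. Once the previous secret is cleared, only the new one is accepted.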
Queue, Don’t Process Inline
Webhook handlers should be fast. Really fast. Providers have timeout windows (Stripe: 20 seconds, GitHub: 10 seconds). Miss the window, they’ll retry, you’ll get duplicates, chaos ensues.
The pattern:
```python
import json

# Request and Response are assumed to come from Starlette/FastAPI,
# where the raw body is read with `await request.body()`


async def webhook_handler(request: Request):
    body = await request.body()
    signature = request.headers.get('X-Signature', '')

    # 1. Verify signature (fast)
    if not verify_signature(body, signature):
        return Response(status_code=401)

    # 2. Parse minimally (fast)
    event_id = extract_event_id(body)

    # 3. Dedupe check (fast)
    if await already_processed(event_id):
        return Response(status_code=200)

    # 4. Queue for async processing (fast)
    await queue.enqueue('process_webhook', body)

    # 5. Return immediately
    return Response(status_code=200)


# Separate worker processes the queue
async def process_webhook(payload: bytes):
    event = json.loads(payload)
    # Actual business logic here - can take as long as needed
    await handle_event(event)
```
The webhook endpoint’s job is to acknowledge receipt, not to process. Processing happens asynchronously.
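The split between acknowledging and processing can be demonstrated end to end with nothing but asyncio. This is a sketch under a loud assumption: an in-process asyncio.Queue loses its contents if the process dies, so a real deployment would put a durable queue in its place. The names acknowledge, worker, and main are illustrative.

```python
import asyncio
import json
import logging

logger = logging.getLogger(__name__)


async def acknowledge(queue: asyncio.Queue, body: bytes) -> int:
    """Endpoint side: enqueue the raw body and return a status code immediately."""
    await queue.put(body)
    return 200


async def worker(queue: asyncio.Queue, handle_event) -> None:
    """Worker side: pull bodies off the queue and run the slow business logic."""
    while True:
        body = await queue.get()
        try:
            await handle_event(json.loads(body))
        except Exception:
            # Log and keep going so one bad event does not kill the worker
            logger.exception("webhook processing failed")
        finally:
            # Mark the item done even on failure so queue.join() cannot hang
            queue.task_done()


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    handled = []

    async def handle_event(event: dict) -> None:
        await asyncio.sleep(0.01)  # stands in for slow processing
        handled.append(event["id"])

    task = asyncio.create_task(worker(queue, handle_event))
    status = await acknowledge(queue, b'{"id": "evt_123"}')
    assert status == 200   # acknowledged before processing has finished
    await queue.join()     # wait for the worker to drain the queue
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)
    return handled
```

Running asyncio.run(main()) returns the list of handled event IDs once the worker has drained the queue.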
Logging That Actually Helps
When something goes wrong with webhooks, you need to debug. Log everything:
```python
import logging
import uuid

logger = logging.getLogger(__name__)


async def webhook_handler(request: Request):
    request_id = str(uuid.uuid4())
    body = await request.body()  # Starlette/FastAPI-style request (assumed)
    signature = request.headers.get('X-Signature', 'missing')

    logger.info("webhook_received", extra={
        "request_id": request_id,
        "provider": detect_provider(request),
        "signature_header": signature,
        "timestamp_header": request.headers.get('X-Timestamp', 'missing'),
        "content_length": len(body),
        "source_ip": request.client.host,
    })

    if not verify_signature(body, signature):
        logger.warning("webhook_signature_failed", extra={
            "request_id": request_id,
            # Decode so the preview is readable text rather than a bytes repr
            "body_preview": body[:500].decode("utf-8", errors="replace"),  # For debugging
        })
        return Response(status_code=401)

    # ... rest of handler
```
In production, this logging has saved me countless hours. “Why didn’t this webhook process?” becomes answerable.
The Endpoint Enumeration Problem
Your webhook endpoint exists at /webhooks/stripe or /api/webhooks/payments. Attackers will find it. They’ll probe it. They’ll send garbage.
Mitigation:
```python
import os

# Use non-obvious paths
WEBHOOK_PATH = f"/webhooks/{os.environ['WEBHOOK_PATH_SECRET']}/stripe"


# Rate limit aggressively on failed signatures.
# rate_limit and RateLimitExceeded stand in for your framework's limiter.
@rate_limit(key="ip", limit=10, window=60)  # 10 failures per minute per IP
async def webhook_handler(request: Request):
    body = await request.body()
    signature = request.headers.get('X-Signature', '')
    if not verify_signature(body, signature):
        # Count against rate limit
        raise RateLimitExceeded()
    # ...
```
A path secret adds obscurity (not security, but it reduces noise). Aggressive rate limiting on failures stops probing.
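Counting only failures, rather than all requests, keeps legitimate bursts from the provider unaffected. One way to sketch that is a per-IP sliding window held in memory. FailureLimiter is a hypothetical class, not an existing library API, and a multi-instance deployment would need shared state instead of a local dict.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class FailureLimiter:
    """Sliding-window counter of failed signature checks per source IP."""

    def __init__(self, limit: int = 10, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._failures = defaultdict(deque)

    def _prune(self, ip: str, now: float) -> None:
        # Discard failures that have aged out of the window
        failures = self._failures[ip]
        while failures and failures[0] <= now - self.window:
            failures.popleft()

    def record_failure(self, ip: str, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._prune(ip, now)
        self._failures[ip].append(now)

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        self._prune(ip, now)
        return len(self._failures[ip]) >= self.limit
```

In a handler, the intended flow is: return 429 early when is_blocked(ip) is true, otherwise verify the signature and call record_failure(ip) before returning 401 on a mismatch.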
Testing Webhooks Properly
You can’t test webhooks by hoping they work in production. Options:
Local development:
```bash
# Stripe CLI forwards webhooks to localhost
stripe listen --forward-to localhost:8000/webhooks/stripe

# ngrok exposes local server
ngrok http 8000
```
Integration tests:
```python
import hashlib
import hmac
import time


def test_webhook_signature_verification():
    secret = "test_secret"
    payload = b'{"id": "evt_123", "type": "payment.succeeded"}'
    timestamp = str(int(time.time()))

    # Generate valid signature
    signed_payload = f"{timestamp}.{payload.decode()}"
    signature = hmac.new(
        secret.encode(),
        signed_payload.encode(),
        hashlib.sha256
    ).hexdigest()

    # Test acceptance
    assert verify_webhook(payload, timestamp, signature, secret)

    # Test rejection with wrong secret
    assert not verify_webhook(payload, timestamp, signature, "wrong_secret")

    # Test rejection with old timestamp
    old_timestamp = str(int(time.time()) - 600)
    assert not verify_webhook(payload, old_timestamp, signature, secret)
```
Replay testing:
```python
# Store raw webhooks for replay in staging
async def webhook_handler(request: Request):
    if settings.STORE_WEBHOOKS_FOR_REPLAY:
        await store_raw_webhook(request)
    # ... normal processing
```
The Checklist
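Before shipping a webhook integration:
- Signatures are verified with a constant-time comparison
- Timestamps are checked to limit the replay window
- Handlers are idempotent, keyed on the event ID
- IP allowlists, if used, are updated automatically
- Multiple secrets are supported so rotation needs no downtime
- Events are queued and processed outside the request
- Receipt and signature failures are logged with a request ID
- Failed signatures are rate limited
- Signature verification is covered by integration tests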
Webhooks are a trust boundary. Treat them like one.