Every API needs rate limiting. Without it, one enthusiastic script kiddie or a bug in a client application can take down your entire service. The question isn’t whether to rate limit — it’s how to do it without making your API frustrating to use.
The Naive Approach (And Why It Fails)
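As a concrete sketch (Python, with illustrative names), a fixed-window counter looks like this:

```python
import time

class FixedWindowLimiter:
    """Naive fixed-window limiter: at most `limit` requests per window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # key -> (window_start, count)

    def allow(self, key, now=None):
        now = time.time() if now is None else now
        window_start = int(now // self.window) * self.window
        start, count = self.counts.get(key, (window_start, 0))
        if start != window_start:
            # New window: every user's counter resets at the same instant
            start, count = window_start, 0
        if count >= self.limit:
            return False
        self.counts[key] = (start, count + 1)
        return True
```

Note the reset line: every key's quota comes back at the exact same wall-clock boundary, which is what synchronizes the retry spike.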
Fixed limits per time window are simple to implement and almost always wrong. They create the “thundering herd” problem: all your users hit the limit at minute :00, back off, retry at :01, and create a synchronized spike that’s worse than no limit at all.
Token Bucket: The Industry Standard
The token bucket algorithm is what most production APIs actually use. Imagine a bucket that holds tokens. Each request consumes a token. Tokens refill at a steady rate.
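A minimal in-process token bucket, sketched in Python (a production version would live in Redis or similar so limits hold across servers):

```python
import time

class TokenBucket:
    """Token bucket: bursts up to `capacity`, sustained average of
    `refill_rate` tokens per second."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity  # start full, so idle users can burst
        self.last_refill = time.monotonic()

    def allow(self, cost=1.0):
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill at a steady rate, never beyond capacity
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

The `cost` parameter matters later: it lets expensive operations consume more than one token per request.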
This allows bursts (up to the bucket capacity) while maintaining a steady average rate. A user can make 100 requests instantly if they’ve been idle, but sustained usage averages out to the refill rate.
Sliding Window: Smoother Than Fixed Windows
If you want time-based limits without the thundering herd, use sliding windows:
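One way to do this is a sliding-window log, sketched here in Python (it stores a timestamp per request, so for very high limits a sliding-window *counter* approximation is cheaper):

```python
from collections import deque
import time

class SlidingWindowLimiter:
    """Sliding-window log: at most `limit` requests in any trailing
    `window_seconds`-long interval."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = {}  # key -> deque of request times

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        log = self.timestamps.setdefault(key, deque())
        # Drop requests that have slid out of the trailing window
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True
```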
The window slides with time, so there’s no magic moment when everyone’s limits reset simultaneously.
The Headers That Make Rate Limiting Bearable
Rate limiting without communication is just rejection. Good APIs tell clients exactly where they stand:
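The conventional trio looks like this (the `X-RateLimit-*` names are a widespread convention rather than a standard, and exact names vary between APIs). These headers belong on every response, not just rejections:

```python
def rate_limit_headers(limit, remaining, reset_epoch):
    """Build the standard trio of rate-limit headers."""
    return {
        "X-RateLimit-Limit": str(limit),          # requests allowed per window
        "X-RateLimit-Remaining": str(remaining),  # requests left right now
        "X-RateLimit-Reset": str(reset_epoch),    # Unix time the window resets
    }
```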
When you do reject a request:
The Retry-After header is crucial. It tells clients exactly when to try again instead of forcing them to guess (or worse, retry immediately in a loop).
Differentiated Limits
Not all requests are equal. Not all users are equal.
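A sketch of cost-based accounting, with illustrative routes and prices (the routes, costs, and tier capacities here are invented for the example):

```python
# Illustrative token costs: expensive operations drain the budget faster.
OPERATION_COSTS = {
    "GET /items": 1,          # cheap lookup
    "GET /search": 5,         # database-heavy query
    "POST /ai/complete": 25,  # AI inference
}

TIER_CAPACITY = {"free": 100, "pro": 1000}  # bucket size per user tier

def request_cost(route):
    return OPERATION_COSTS.get(route, 1)  # default to the cheapest cost

def charge(balance, route):
    """Deduct a route's cost from a user's token balance.
    Returns the new balance, or None if the request should be rejected."""
    cost = request_cost(route)
    if balance < cost:
        return None
    return balance - cost
```

Plugged into a token bucket, this is just passing `request_cost(route)` as the `cost` argument per request.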
Expensive operations (database-heavy queries, AI inference, file processing) should cost more “tokens” than simple lookups. This lets you protect your infrastructure while still allowing high volumes of cheap requests.
Graceful Degradation Over Hard Failures
Instead of immediately returning 429, consider degraded responses:
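One shape this can take, sketched for a hypothetical search endpoint (the backend stub and the cost threshold are assumptions for illustration):

```python
def run_full_search(query):
    # Stand-in for a real (expensive) search backend
    return [f"fresh result for {query!r}"]

def handle_search(query, tokens_remaining, cache):
    """Degrade before rejecting: full search when under the limit,
    cached results when over, a hard 429 only as a last resort."""
    if tokens_remaining >= 5:  # 5 = assumed cost of a full search
        return {"status": 200, "results": run_full_search(query), "degraded": False}
    if query in cache:  # over the limit, but we have something stale to serve
        return {"status": 200, "results": cache[query], "degraded": True}
    return {"status": 429, "error": "rate_limit_exceeded"}
```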
Users experiencing degraded service are less frustrated than users hitting a brick wall. They can still function, just with reduced capability.
Client-Side: Be a Good Citizen
If you’re consuming rate-limited APIs, build resilience into your client:
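A retry loop along these lines covers both cases: honor Retry-After when the server provides it, and fall back to exponential backoff with full jitter when it doesn't (`request_fn` here stands in for whatever HTTP call your client makes):

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5):
    """Retry on 429: honor Retry-After when present, otherwise use
    exponential backoff with full jitter."""
    for attempt in range(max_retries):
        status, headers, body = request_fn()
        if status != 429:
            return status, body
        retry_after = headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)  # the server told us exactly when
        else:
            # Full jitter: random delay in [0, 2^attempt], capped at 60s,
            # so retries from many clients don't synchronize
            delay = random.uniform(0, min(60, 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError("rate limited: retries exhausted")
```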
Exponential backoff with jitter prevents synchronized retries. Respecting Retry-After prevents wasted requests.
The Human Element
Technical implementation is half the battle. The other half is communication:
- Document your limits clearly. Don’t make users discover them through 429 errors.
- Provide usage dashboards. Let users see their consumption before they hit limits.
- Alert before cutting off. Email when users approach 80% of their quota.
- Make upgrades easy. If someone needs more capacity, the path should be obvious.
Rate limiting is a conversation between your API and its consumers. Done well, it’s invisible to legitimate users and protective against abuse. Done poorly, it’s a constant source of frustration and support tickets.
The best rate limiter is one your users never notice — until they try to abuse it.