API Rate Limit Handler
Handle API rate limits gracefully without losing data or degrading UX.
Usage
- Read the API's rate limit documentation: limits per second, minute, hour, and day
- Parse rate limit headers from responses (X-RateLimit-Remaining, Retry-After)
- Implement exponential backoff with jitter for retries
- Add request queuing to spread requests evenly across time windows
- Cache responses to reduce unnecessary API calls
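The header-parsing step above can be sketched as follows. This is a minimal sketch, not a definitive implementation: header names vary by provider (X-RateLimit-Reset as a Unix timestamp is an assumption here; some APIs use epoch milliseconds or seconds-until-reset), so check your API's documentation.

```python
import email.utils
import time

def retry_delay_from_headers(headers, default=1.0):
    """Return seconds to wait before the next request, based on
    rate-limit headers. Falls back to `default` if none are present."""
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        try:
            # Delta-seconds form, e.g. "120"
            return float(retry_after)
        except ValueError:
            # HTTP-date form, e.g. "Wed, 21 Oct 2025 07:28:00 GMT"
            when = email.utils.parsedate_to_datetime(retry_after)
            return max(0.0, when.timestamp() - time.time())
    # Assumed here: X-RateLimit-Reset as a Unix timestamp (varies by API)
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        return max(0.0, float(reset) - time.time())
    return default
```

Pair this with X-RateLimit-Remaining: when it drops to zero, wait until the computed delay elapses instead of sending the next request.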
Examples
- Exponential backoff with jitter: On a 429 response: wait = min(base_delay * 2^(attempt-1) + random(0, 1000ms), max_delay). With base_delay = 1s: retry 1 waits ~1s, retry 2 ~2s, retry 3 ~4s. Cap at 5 retries and a 30s max delay. Jitter prevents a thundering herd when many clients retry simultaneously
- Token bucket implementation: Allow 100 requests per minute. Bucket starts full (100 tokens). Each request consumes 1 token. Tokens refill at 100/60 ≈ 1.67 per second. If the bucket is empty, queue the request until a token is available. This smooths bursts naturally
- Response caching strategy: Cache GET responses with TTL matching data freshness needs. User profile: cache 5 minutes. Product catalog: cache 1 hour. Static config: cache 24 hours. Use ETags for conditional requests — returns 304 Not Modified (no body, no rate limit cost on many APIs)
- Monitoring dashboard: Track: requests per minute (current vs limit), 429 error count (should be near zero), average retry count per request, cache hit ratio (target >60%). Alert when usage exceeds 80% of limit — proactive, not reactive
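The backoff-with-jitter example above can be sketched in a few lines. The `send` callable and its `.status_code` attribute are hypothetical placeholders for your HTTP client of choice.

```python
import random
import time

def backoff_delay(attempt, base_delay=1.0, max_delay=30.0):
    """Delay before retry `attempt` (1-based): exponential growth
    plus up to 1s of random jitter, capped at max_delay."""
    exponential = base_delay * 2 ** (attempt - 1)
    return min(exponential + random.uniform(0, 1.0), max_delay)

def with_retries(send, max_retries=5):
    """Call `send()` (hypothetical: returns a response object with a
    .status_code attribute) and retry on 429 with backoff."""
    for attempt in range(1, max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response
        time.sleep(backoff_delay(attempt))
    raise RuntimeError(f"still rate limited after {max_retries} retries")
```

In production, prefer the Retry-After header over the computed delay when the server provides one.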
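The token bucket example above might look like this non-blocking sketch (a queueing variant would sleep instead of returning False; capacity and rate here mirror the 100-per-minute example but are adjustable):

```python
import time

class TokenBucket:
    """Token bucket rate limiter: holds up to `capacity` tokens,
    refilled continuously at `rate` tokens per second."""

    def __init__(self, capacity=100, rate=100 / 60):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)  # bucket starts full
        self.last = time.monotonic()

    def _refill(self, now):
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now

    def try_acquire(self):
        """Consume one token if available; return True on success,
        False if the caller should wait or queue the request."""
        self._refill(time.monotonic())
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Continuous refill (rather than topping up once per minute) is what smooths bursts: a client that pauses briefly accrues fractional tokens and never slams into a hard window boundary.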
Guidelines
- Always respect Retry-After headers — ignoring them can get your API key banned permanently
- Different endpoints often have different rate limits — don't assume one limit applies to all endpoints
- Implement circuit breakers: after 5 consecutive failures, stop sending requests for 60 seconds instead of hammering a failing API
- Log every 429 response with the endpoint and timestamp — patterns reveal optimization opportunities
- For batch operations, prefer bulk/batch API endpoints over individual calls when available
- Pre-calculate if your use case fits within rate limits before building. If you need 10,000 calls/hour and the limit is 1,000, caching and batching may not be enough — you need a different architecture
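The circuit breaker guideline above (5 consecutive failures, 60-second pause) can be sketched as a small state holder; this is one simple shape, assuming the caller checks `allow()` before each request and reports outcomes with `record()`:

```python
import time

class CircuitBreaker:
    """Open after `threshold` consecutive failures; reject requests
    for `cooldown` seconds, then let one probe request through."""

    def __init__(self, threshold=5, cooldown=60.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        """Return True if a request may be sent right now."""
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            # Half-open: reset and let a probe request test the API
            self.opened_at = None
            self.failures = 0
            return True
        return False

    def record(self, success):
        """Report the outcome of a request."""
        if success:
            self.failures = 0
            self.opened_at = None
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
```

When `allow()` returns False, fail fast or serve from cache instead of hammering the failing API.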