# Rate Limiting & Throttling Patterns
Token buckets, sliding windows, and distributed rate limiting. Protect your APIs without frustrating legitimate users.
Rate limiting is the difference between a stable API and a crashed one. But bad rate limiting frustrates users, breaks integrations, and costs you customers. The goal isn't just protection; it's invisible protection.
Here's how we rate limit across 28 Workers handling millions of requests.
| Algorithm | Best For | Pros | Cons |
|---|---|---|---|
| Fixed Window | Simple APIs | Easy to implement | Burst at boundaries |
| Sliding Window | Most APIs | Smooth limiting | More storage |
| Token Bucket | Bursty traffic | Allows bursts | Complex tuning |
| Leaky Bucket | Steady output | Constant rate | No burst allowed |
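To make the "burst at boundaries" drawback concrete, here is a minimal in-memory fixed-window counter (a simplified sketch for illustration, not the Workers code shown later). A client capped at N requests per window can send N requests just before the window boundary and N more just after it: 2N requests in a span far shorter than one window.

```typescript
// Minimal in-memory fixed-window counter (illustrative only).
const counters = new Map<string, { window: number; count: number }>();

function fixedWindowAllow(key: string, limit: number, windowMs: number, now: number): boolean {
  const window = Math.floor(now / windowMs);
  const entry = counters.get(key);

  // New key or new window: reset the counter
  if (!entry || entry.window !== window) {
    counters.set(key, { window, count: 1 });
    return true;
  }

  if (entry.count >= limit) return false;
  entry.count++;
  return true;
}
```

With a limit of 5 per 60s window, five requests at t=59.9s and five more at t=60.1s are all allowed: 10 requests in 200ms, double the intended rate. The sliding-window pattern below smooths this out.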
## Pattern 1: Sliding Window Counter
The best balance of accuracy and simplicity for most APIs:
**sliding-window.ts**

```typescript
interface RateLimitResult {
  allowed: boolean;
  remaining: number;
  resetAt: number;
}

async function checkRateLimit(
  key: string,
  limit: number,
  windowMs: number,
  env: Env
): Promise<RateLimitResult> {
  const now = Date.now();
  const currentWindow = Math.floor(now / windowMs);
  const previousWindow = currentWindow - 1;

  // Get counts for the current and previous windows
  const [current, previous] = await Promise.all([
    env.KV.get(`ratelimit:${key}:${currentWindow}`),
    env.KV.get(`ratelimit:${key}:${previousWindow}`)
  ]);

  const currentCount = parseInt(current || '0', 10);
  const previousCount = parseInt(previous || '0', 10);

  // Weight the previous window by how much of it still overlaps the
  // sliding window (the sliding-window-counter approximation)
  const windowProgress = (now % windowMs) / windowMs;
  const weightedCount = currentCount + previousCount * (1 - windowProgress);

  const allowed = weightedCount < limit;
  if (allowed) {
    // Increment the counter. Note: KV get-then-put is not atomic, so
    // concurrent requests can slightly undercount.
    await env.KV.put(
      `ratelimit:${key}:${currentWindow}`,
      (currentCount + 1).toString(),
      { expirationTtl: Math.ceil(windowMs / 1000) * 2 }
    );
  }

  return {
    allowed,
    remaining: Math.max(0, Math.floor(limit - weightedCount)),
    resetAt: (currentWindow + 1) * windowMs
  };
}
```
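A worked example of the weighted-count approximation, with hypothetical numbers: a 60s window with a limit of 100, where the previous window saw 80 requests, the current window has 30 so far, and we are 25% of the way into the current window.

```typescript
// The sliding-window estimate: count the current window fully, plus the
// fraction of the previous window that still overlaps the sliding window.
function weightedCount(currentCount: number, previousCount: number, windowProgress: number): number {
  return currentCount + previousCount * (1 - windowProgress);
}

// 30 + 80 * 0.75 = 90, still under the limit of 100, so the request passes
const estimate = weightedCount(30, 80, 0.25);
```

The approximation assumes requests in the previous window were evenly distributed, which is why it needs less storage than a true sliding log while staying close to the exact count.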
## Pattern 2: Token Bucket
Allow bursts while maintaining average rate:
**token-bucket.ts**

```typescript
interface TokenBucket {
  tokens: number;
  lastRefill: number;
}

async function tokenBucketLimit(
  key: string,
  maxTokens: number,
  refillRate: number, // tokens per second
  tokensNeeded: number,
  env: Env
): Promise<RateLimitResult> {
  const now = Date.now();

  // Get current bucket state
  const bucketKey = `bucket:${key}`;
  let bucket = await env.KV.get(bucketKey, 'json') as TokenBucket | null;
  if (!bucket) {
    bucket = { tokens: maxTokens, lastRefill: now };
  }

  // Refill tokens accrued since the last request
  const elapsed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(maxTokens, bucket.tokens + elapsed * refillRate);
  bucket.lastRefill = now;

  // Check if we have enough tokens
  const allowed = bucket.tokens >= tokensNeeded;
  if (allowed) {
    bucket.tokens -= tokensNeeded;
  }

  // Save bucket state
  await env.KV.put(bucketKey, JSON.stringify(bucket), {
    expirationTtl: 3600 // 1 hour
  });

  return {
    allowed,
    remaining: Math.floor(bucket.tokens),
    resetAt: now + ((maxTokens - bucket.tokens) / refillRate) * 1000 // time until full
  };
}
```
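The refill math in isolation, as a pure in-memory sketch (no KV), with hypothetical numbers: a bucket of 10 tokens refilling at 2 tokens/second.

```typescript
interface Bucket {
  tokens: number;
  lastRefill: number; // ms timestamp
}

// Refill by elapsed time, then try to take `needed` tokens.
function take(bucket: Bucket, maxTokens: number, refillRate: number, needed: number, now: number): boolean {
  const elapsed = (now - bucket.lastRefill) / 1000;
  bucket.tokens = Math.min(maxTokens, bucket.tokens + elapsed * refillRate);
  bucket.lastRefill = now;
  if (bucket.tokens < needed) return false;
  bucket.tokens -= needed;
  return true;
}
```

Starting full, a client can spend all 10 tokens at once (the burst the table promised). One second later only 2 tokens have refilled, so a request needing 3 is rejected while one needing 2 passes: bursts are allowed, but the long-run average stays at the refill rate.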
## Pattern 3: Rate Limit Middleware
**rate-limit-middleware.ts**

```typescript
type Handler = (request: Request, env: Env, ctx: ExecutionContext) => Promise<Response>;

interface RateLimitConfig {
  limit: number;
  windowMs: number;
  keyGenerator: (request: Request) => string;
}

function rateLimit(config: RateLimitConfig) {
  return (handler: Handler): Handler => {
    return async (request, env, ctx) => {
      const key = config.keyGenerator(request);
      const result = await checkRateLimit(key, config.limit, config.windowMs, env);

      // Rate limit headers added to every response
      const headers = {
        'X-RateLimit-Limit': config.limit.toString(),
        'X-RateLimit-Remaining': result.remaining.toString(),
        'X-RateLimit-Reset': Math.ceil(result.resetAt / 1000).toString()
      };

      if (!result.allowed) {
        const retryAfter = Math.ceil((result.resetAt - Date.now()) / 1000);
        return new Response(JSON.stringify({
          error: 'RATE_LIMITED',
          message: 'Too many requests',
          retryAfter
        }), {
          status: 429,
          headers: {
            ...headers,
            'Retry-After': retryAfter.toString(),
            'Content-Type': 'application/json'
          }
        });
      }

      const upstream = await handler(request, env, ctx);

      // Re-wrap the response so its headers are mutable
      // (responses passed through fetch are immutable in Workers)
      const response = new Response(upstream.body, upstream);
      Object.entries(headers).forEach(([k, v]) => {
        response.headers.set(k, v);
      });
      return response;
    };
  };
}

// Usage with different strategies
const apiLimiter = rateLimit({
  limit: 100,
  windowMs: 60000, // 100 requests per minute
  keyGenerator: (req) => req.headers.get('X-API-Key') || getIP(req)
});

const authLimiter = rateLimit({
  limit: 5,
  windowMs: 300000, // 5 attempts per 5 minutes
  keyGenerator: (req) => `auth:${getIP(req)}`
});
```
## User Experience Tip
Always return helpful 429 responses with Retry-After headers. Tell users exactly when they can try again. Include remaining quota in every response so clients can throttle themselves.
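On the client side, those headers let a well-behaved caller back off automatically. A sketch of a retrying fetch wrapper that honours `Retry-After` (the helper name is illustrative, not part of the API above):

```typescript
// Retry a request when rate limited, waiting as long as the server asked.
async function fetchWithRetry(url: string, maxRetries = 3): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url);
    if (response.status !== 429) return response;

    // Wait the number of seconds the Retry-After header specifies
    const retryAfter = parseInt(response.headers.get('Retry-After') || '1', 10);
    await new Promise((resolve) => setTimeout(resolve, retryAfter * 1000));
  }
  throw new Error('Still rate limited after retries');
}
```

Clients can go further and throttle themselves proactively by watching `X-RateLimit-Remaining`, never hitting the 429 at all.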
## Rate Limiting Checklist
- Use sliding window for most APIs (best accuracy)
- Use token bucket if you need to allow controlled bursts
- Always return X-RateLimit-* headers on every response
- Include Retry-After header on 429 responses
- Rate limit by API key first, IP address as fallback
- Set different limits for different endpoints
- Use higher limits for authenticated users
- Monitor rate limit hits to tune thresholds
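A sketch of how per-endpoint limits from the checklist can be wired up: a route table keyed by method and path, with a conservative default (all route names and numbers here are illustrative).

```typescript
// Hypothetical per-endpoint limit table with a default fallback.
const limits: Record<string, { limit: number; windowMs: number }> = {
  'POST /auth/login':  { limit: 5,   windowMs: 300000 }, // brute-force protection
  'POST /api/orders':  { limit: 60,  windowMs: 60000 },
  'GET /api/products': { limit: 600, windowMs: 60000 }   // cheap reads get more headroom
};

function limitFor(method: string, path: string): { limit: number; windowMs: number } {
  return limits[`${method} ${path}`] ?? { limit: 100, windowMs: 60000 };
}
```

Authenticated tiers fit the same shape: resolve the caller's plan first, then look up limits in a per-tier table instead of a single global one.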
Good rate limiting is invisible to legitimate users. They should never notice it exists, until it saves your API from a traffic spike or attack.