API Rate Limiting: Token Bucket vs Sliding Window Guide

Master API rate limiting with token bucket and sliding window algorithms. Compare implementation strategies, performance trade-offs, and real-world examples.

By PropTechUSA AI

When your API traffic spikes from 1,000 to 100,000 requests per minute in seconds, the choice between token bucket and sliding window rate limiting can mean the difference between seamless scaling and catastrophic failure. Understanding these algorithms isn't just academic: it's critical for building resilient systems that handle real-world traffic patterns.

Understanding API Rate Limiting Fundamentals

Why Rate Limiting Matters in Modern APIs

API rate limiting serves as your first line of defense against abuse, ensures fair resource allocation, and maintains service quality under varying load conditions. Without proper rate limiting, a single misbehaving client can overwhelm your infrastructure, affecting all users.

Modern applications, especially in PropTech where real estate data feeds and property search APIs handle massive concurrent requests, require sophisticated rate limiting strategies. The choice between token bucket and sliding window algorithms directly impacts user experience, resource utilization, and system resilience.

Core Rate Limiting Concepts

Before diving into specific algorithms, let's establish the foundational concepts:

  • Rate: The number of requests allowed within a specific time window
  • Burst capacity: The maximum number of requests that can be processed immediately
  • Backpressure: The mechanism for handling excess requests
  • Fairness: How evenly the rate limit distributes across time

These concepts form the basis for evaluating different rate limiting approaches and their suitability for various use cases.
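To make the terminology concrete, the four concepts can be collected into a small configuration shape. The field names below are illustrative for this sketch, not taken from any particular library:

```typescript
// Illustrative config shape tying the four concepts together.
// Field names are assumptions for this sketch, not a standard API.
interface RateLimitConfig {
  ratePerWindow: number;                  // rate: allowed requests per window
  windowSeconds: number;                  // the time window the rate applies to
  burstCapacity: number;                  // requests that may be processed immediately
  onExcess: "reject" | "queue" | "shed";  // backpressure strategy for overflow
}

// Example: a search endpoint allowing 100 requests/minute with a burst of 20
const searchApiLimits: RateLimitConfig = {
  ratePerWindow: 100,
  windowSeconds: 60,
  burstCapacity: 20,
  onExcess: "reject"
};
```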

Traffic Pattern Considerations

Real-world API traffic rarely follows predictable patterns. Consider these common scenarios:

  • Bursty traffic: Mobile apps making batch requests after network reconnection
  • Periodic spikes: Property listing updates triggering simultaneous API calls
  • Steady streams: Real-time data feeds requiring consistent throughput

Each pattern demands different rate limiting characteristics, influencing your algorithm choice.

Token Bucket Algorithm Deep Dive

How Token Bucket Works

The token bucket algorithm operates on a simple but powerful principle: tokens are added to a bucket at a steady rate, and each request consumes one token. When the bucket is empty, requests are either queued or rejected.

```typescript
class TokenBucket {
  private tokens: number;
  private lastRefill: number;
  private readonly capacity: number;
  private readonly refillRate: number; // tokens per second

  constructor(capacity: number, refillRate: number) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  private refill(): void {
    const now = Date.now();
    const timePassed = (now - this.lastRefill) / 1000;
    // Accumulate fractional tokens: flooring here would let frequent calls
    // reset lastRefill before a whole token accrues, starving the bucket
    this.tokens = Math.min(this.capacity, this.tokens + timePassed * this.refillRate);
    this.lastRefill = now;
  }

  public tryConsume(tokens: number = 1): boolean {
    this.refill();
    if (this.tokens >= tokens) {
      this.tokens -= tokens;
      return true;
    }
    return false;
  }

  // Expose the current (refreshed) token count for reporting
  public get availableTokens(): number {
    this.refill();
    return Math.floor(this.tokens);
  }
}
```

Token Bucket Advantages

The token bucket algorithm excels in several key areas:

  • Burst handling: Accumulated tokens allow for natural traffic bursts
  • Smooth rate limiting: Steady token replenishment provides consistent long-term rates
  • Memory efficiency: Requires minimal state tracking
  • Implementation simplicity: Straightforward logic with few edge cases
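These properties are easy to see in action. The sketch below is a self-contained miniature token bucket with an injectable clock (a testing convenience, not part of the implementation above) so the burst-then-refill behavior can be demonstrated deterministically:

```typescript
// Self-contained token bucket with an injectable clock, so burst
// behavior can be demonstrated without waiting on real time.
class DemoTokenBucket {
  private tokens: number;
  private last: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSec: number,
    private readonly now: () => number = Date.now
  ) {
    this.tokens = capacity;
    this.last = now();
  }

  tryConsume(n: number = 1): boolean {
    // Refill fractionally based on elapsed time, capped at capacity
    const t = this.now();
    this.tokens = Math.min(this.capacity, this.tokens + ((t - this.last) / 1000) * this.refillPerSec);
    this.last = t;
    if (this.tokens >= n) {
      this.tokens -= n;
      return true;
    }
    return false;
  }
}
```

With a capacity of 5 and a refill of 1 token per second, a burst of 5 requests succeeds immediately, the 6th is rejected, and capacity returns as time advances.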

Real-World Token Bucket Implementation

Consider implementing token bucket rate limiting for a property search API:

```typescript
class PropertySearchRateLimiter {
  private buckets = new Map<string, TokenBucket>();

  constructor(
    private readonly baseRate: number = 100, // requests per minute
    private readonly burstCapacity: number = 20
  ) {}

  public async checkRateLimit(clientId: string): Promise<{
    allowed: boolean;
    remainingTokens: number;
    resetTime: number;
  }> {
    let bucket = this.buckets.get(clientId);
    if (!bucket) {
      bucket = new TokenBucket(
        this.burstCapacity,
        this.baseRate / 60 // convert to a per-second refill rate
      );
      this.buckets.set(clientId, bucket);
      // Note: in production, evict idle buckets to bound memory
    }

    const allowed = bucket.tryConsume(1);
    return {
      allowed,
      remainingTokens: bucket.availableTokens, // assumes a public accessor on TokenBucket
      resetTime: Date.now() + 60 * 1000 // approximate: the next full minute
    };
  }
}
```

Sliding Window Algorithm Implementation

Understanding Sliding Window Mechanics

The sliding window algorithm maintains a more precise view of request distribution over time. Instead of using tokens, it tracks actual request timestamps and enforces limits based on requests within a moving time window.

```typescript
class SlidingWindowRateLimiter {
  private requestLogs = new Map<string, number[]>();
  private readonly windowSizeMs: number;
  private readonly maxRequests: number;

  constructor(windowSizeSeconds: number, maxRequests: number) {
    this.windowSizeMs = windowSizeSeconds * 1000;
    this.maxRequests = maxRequests;
  }

  public checkRateLimit(clientId: string): {
    allowed: boolean;
    requestsInWindow: number;
    windowResetTime: number;
  } {
    const now = Date.now();
    const windowStart = now - this.windowSizeMs;

    // Get or create the request log for this client
    let requests = this.requestLogs.get(clientId) || [];

    // Drop requests that have fallen outside the current window
    requests = requests.filter(timestamp => timestamp > windowStart);
    this.requestLogs.set(clientId, requests);

    // Allow the request only if the window still has headroom
    const allowed = requests.length < this.maxRequests;
    if (allowed) {
      requests.push(now);
    }

    return {
      allowed,
      requestsInWindow: requests.length,
      // The window "resets" when the oldest logged request expires
      windowResetTime: requests.length > 0 ? Math.min(...requests) + this.windowSizeMs : now
    };
  }

  // Periodically purge stale entries to keep memory bounded
  public cleanup(): void {
    const cutoff = Date.now() - this.windowSizeMs;
    for (const [clientId, requests] of this.requestLogs.entries()) {
      const validRequests = requests.filter(timestamp => timestamp > cutoff);
      if (validRequests.length === 0) {
        this.requestLogs.delete(clientId);
      } else {
        this.requestLogs.set(clientId, validRequests);
      }
    }
  }
}
```

Sliding Window Variants

Several sliding window implementations offer different trade-offs:

Fixed Window Counter: Simpler implementation with less precision
```typescript
class FixedWindowCounter {
  private windows = new Map<string, { count: number; windowStart: number }>();

  public checkRateLimit(clientId: string, limit: number, windowMs: number): boolean {
    const now = Date.now();
    // Align window boundaries to multiples of windowMs
    const windowStart = Math.floor(now / windowMs) * windowMs;

    const window = this.windows.get(clientId);
    if (!window || window.windowStart !== windowStart) {
      this.windows.set(clientId, { count: 1, windowStart });
      return true;
    }

    if (window.count < limit) {
      window.count++;
      return true;
    }
    return false;
  }
}
```

Sliding Window Log: The most accurate variant, but memory-intensive — the SlidingWindowRateLimiter above, which stores every request timestamp, is an example of this approach

Performance Optimization Strategies

For high-throughput scenarios, consider these optimizations:

  • Bucketed timestamps: Group requests into sub-windows to reduce memory usage
  • Probabilistic counting: Use data structures like HyperLogLog for approximate counting
  • Distributed caching: Implement rate limiting state in Redis or similar systems
💡
Pro Tip
For PropTech applications handling property feed updates, consider using bucketed sliding windows with 10-second sub-windows within a 5-minute rate limiting window. This balances accuracy with memory efficiency.
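A minimal sketch of the bucketed approach from the tip above, assuming fixed-size sub-windows (the class and parameter names are illustrative):

```typescript
// Bucketed sliding window: requests are counted per fixed sub-window
// ("bucket") rather than logged individually, trading a little accuracy
// for O(window / bucket) memory per client instead of O(requests).
class BucketedSlidingWindow {
  private buckets = new Map<number, number>(); // bucket start (ms) -> request count

  constructor(
    private readonly windowMs: number = 5 * 60 * 1000, // e.g. 5-minute window
    private readonly bucketMs: number = 10 * 1000,     // e.g. 10-second sub-windows
    private readonly maxRequests: number = 1000
  ) {}

  public tryRequest(now: number = Date.now()): boolean {
    const bucketStart = Math.floor(now / this.bucketMs) * this.bucketMs;
    const windowStart = now - this.windowMs;

    // Evict buckets that have fully left the window
    for (const start of this.buckets.keys()) {
      if (start + this.bucketMs <= windowStart) this.buckets.delete(start);
    }

    // Sum the surviving buckets to approximate requests in the window
    let total = 0;
    for (const count of this.buckets.values()) total += count;
    if (total >= this.maxRequests) return false;

    this.buckets.set(bucketStart, (this.buckets.get(bucketStart) ?? 0) + 1);
    return true;
  }
}
```

The accuracy loss is bounded by one sub-window: a request near a bucket boundary may count against the limit slightly longer or shorter than its exact timestamp would.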

Algorithm Comparison and Best Practices

Performance and Resource Trade-offs

Choosing between token bucket and sliding window algorithms requires understanding their performance characteristics:

Token Bucket Performance:
  • Memory usage: O(1) per client
  • CPU overhead: Minimal, constant-time operations
  • Accuracy: Good for burst handling, less precise for sustained rates
  • Best for: APIs with natural traffic bursts, resource-constrained environments

Sliding Window Performance:
  • Memory usage: O(n) where n is requests per window
  • CPU overhead: Higher due to timestamp filtering
  • Accuracy: Excellent for precise rate enforcement
  • Best for: APIs requiring strict rate compliance, billing-sensitive applications

Implementation Decision Matrix

Use this decision framework to choose the right algorithm:

| Scenario | Token Bucket | Sliding Window |
|----------|--------------|----------------|
| High burst tolerance needed | ✅ | ❌ |
| Strict rate compliance required | ❌ | ✅ |
| Memory constraints | ✅ | ❌ |
| Billing accuracy critical | ❌ | ✅ |
| Simple implementation preferred | ✅ | ❌ |
| Detailed analytics needed | ❌ | ✅ |

Hybrid Approaches

Advanced implementations often combine both algorithms:

```typescript
class HybridRateLimiter {
  // Note: this sketch shares one bucket across all clients; a per-client
  // version would keep a Map of buckets as in PropertySearchRateLimiter
  private tokenBucket: TokenBucket;
  private slidingWindow: SlidingWindowRateLimiter;

  constructor(
    burstCapacity: number,
    sustainedRate: number, // requests per second
    windowSeconds: number
  ) {
    this.tokenBucket = new TokenBucket(burstCapacity, sustainedRate);
    this.slidingWindow = new SlidingWindowRateLimiter(windowSeconds, sustainedRate * windowSeconds);
  }

  public checkRateLimit(clientId: string): boolean {
    // First gate on the token bucket for burst capacity
    const burstAllowed = this.tokenBucket.tryConsume(1);
    if (!burstAllowed) return false;

    // Then gate on the sliding window for the sustained rate.
    // Caveat: if this check rejects, the token consumed above is not
    // refunded — a production version would return it to the bucket.
    const sustainedCheck = this.slidingWindow.checkRateLimit(clientId);
    return sustainedCheck.allowed;
  }
}
```

Monitoring and Observability

Implement comprehensive monitoring for your rate limiting system:

```typescript
interface RateLimitMetrics {
  totalRequests: number;
  rejectedRequests: number;
  averageTokensRemaining: number;
  p95ResponseTime: number;
  topClientsByVolume: Array<{ clientId: string; requestCount: number }>;
}

class RateLimitMonitor {
  private metrics: RateLimitMetrics = {
    totalRequests: 0,
    rejectedRequests: 0,
    averageTokensRemaining: 0,
    p95ResponseTime: 0,
    topClientsByVolume: []
  };

  public recordRequest(allowed: boolean, tokensRemaining: number, responseTime: number): void {
    this.metrics.totalRequests++;
    if (!allowed) this.metrics.rejectedRequests++;

    // Update running averages and percentiles
    this.updateMetrics(tokensRemaining, responseTime);
  }

  private updateMetrics(tokensRemaining: number, responseTime: number): void {
    // Implementation for updating running statistics
  }
}
```

⚠️
Warning
Avoid implementing rate limiting as an afterthought. Design your API architecture with rate limiting considerations from the start to prevent performance bottlenecks and ensure smooth scaling.

Advanced Considerations and Future-Proofing

Distributed Rate Limiting Challenges

As your API scales across multiple servers, coordinating rate limits becomes complex. Consider these approaches:

Centralized State Management:
```typescript
// Assumes an ioredis-style client whose eval(script, numKeys, ...args)
// runs a Lua script atomically on the Redis server
class DistributedTokenBucket {
  constructor(private redis: RedisClient) {}

  async tryConsume(clientId: string, tokens: number = 1): Promise<boolean> {
    const script = `
      local key = KEYS[1]
      local capacity = tonumber(ARGV[1])
      local refill_tokens = tonumber(ARGV[2])
      local interval = tonumber(ARGV[3])
      local requested = tonumber(ARGV[4])

      local bucket = redis.call('HMGET', key, 'tokens', 'last_refill')
      local current_tokens = tonumber(bucket[1]) or capacity
      local now = tonumber(redis.call('TIME')[1])
      local last_refill = tonumber(bucket[2]) or now

      local elapsed = now - last_refill
      local new_tokens = math.min(capacity, current_tokens + (elapsed * refill_tokens / interval))

      if new_tokens >= requested then
        new_tokens = new_tokens - requested
        redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
        redis.call('EXPIRE', key, interval * 2)
        return 1
      else
        redis.call('HMSET', key, 'tokens', new_tokens, 'last_refill', now)
        redis.call('EXPIRE', key, interval * 2)
        return 0
      end
    `;

    const result = await this.redis.eval(
      script,
      1,
      `rate_limit:${clientId}`,
      100, // capacity
      60,  // tokens added per interval
      60,  // interval in seconds
      tokens
    );

    return result === 1;
  }
}
```

Integration with Modern API Gateways

When implementing rate limiting at PropTechUSA.ai, we've found that integrating with existing API gateway solutions provides the best balance of performance and maintainability. Consider these integration patterns:

  • Header-based communication: Pass rate limit information via HTTP headers
  • Middleware chains: Implement rate limiting as reusable middleware
  • Circuit breaker integration: Combine rate limiting with circuit breaker patterns
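As a sketch of the header-based pattern, a framework-agnostic middleware might look like the following. The limiter shape and the `X-RateLimit-*` / `Retry-After` header names are common conventions assumed here, not taken from a specific gateway:

```typescript
// Minimal interfaces standing in for a real limiter and response object
interface Limiter {
  checkRateLimit(clientId: string): { allowed: boolean; remainingTokens: number; resetTime: number };
}
interface Res {
  setHeader(name: string, value: string): void;
  status(code: number): Res;
  end(): void;
}

// Middleware: attach rate-limit state to response headers, and short-circuit
// with 429 Too Many Requests when the limit is exhausted
function rateLimitMiddleware(limiter: Limiter) {
  return (clientId: string, res: Res, next: () => void): void => {
    const result = limiter.checkRateLimit(clientId);
    res.setHeader("X-RateLimit-Remaining", String(result.remainingTokens));
    res.setHeader("X-RateLimit-Reset", String(Math.ceil(result.resetTime / 1000)));

    if (!result.allowed) {
      res.setHeader("Retry-After", String(Math.max(0, Math.ceil((result.resetTime - Date.now()) / 1000))));
      res.status(429).end();
      return;
    }
    next();
  };
}
```

Exposing the remaining quota in headers lets well-behaved clients back off before they hit the limit, which reduces rejected traffic overall.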

Adaptive Rate Limiting

Advanced systems implement adaptive rate limiting that adjusts based on system load:

```typescript
class AdaptiveRateLimiter {
  private baseRate: number;
  private currentRate: number;
  private systemLoad: number = 0;

  constructor(baseRate: number) {
    this.baseRate = baseRate;
    this.currentRate = baseRate;
  }

  public updateSystemMetrics(cpuUsage: number, memoryUsage: number, responseTime: number): void {
    // Weighted load factor: CPU 40%, memory 30%, response time 30%
    // (cpuUsage and memoryUsage expressed as fractions in [0, 1])
    this.systemLoad = cpuUsage * 0.4 + memoryUsage * 0.3 + (responseTime / 1000) * 0.3;

    // Adjust the effective rate based on system load
    if (this.systemLoad > 0.8) {
      this.currentRate = this.baseRate * 0.5; // shed load aggressively under pressure
    } else if (this.systemLoad < 0.3) {
      this.currentRate = this.baseRate * 1.2; // allow extra headroom under light load
    } else {
      this.currentRate = this.baseRate;
    }
  }

  public getCurrentRate(): number {
    return this.currentRate;
  }
}
```

Testing and Validation Strategies

Comprehensive testing ensures your rate limiting implementation performs correctly under various conditions:

  • Load testing: Validate behavior under expected traffic patterns
  • Burst testing: Ensure proper handling of traffic spikes
  • Edge case testing: Test boundary conditions and error scenarios
  • Long-running tests: Verify stability over extended periods
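For example, a burst test can be as simple as firing N back-to-back requests at a limiter and counting how many get through. The helper below accepts any object with a `tryConsume` method, matching the TokenBucket sketch earlier:

```typescript
// Burst-test helper: fire `burstSize` back-to-back requests and
// report how many the limiter accepted.
function burstTest(limiter: { tryConsume(n?: number): boolean }, burstSize: number): number {
  let allowed = 0;
  for (let i = 0; i < burstSize; i++) {
    if (limiter.tryConsume(1)) allowed++;
  }
  return allowed;
}
```

In a real suite you would assert that the accepted count is close to the configured burst capacity, then wait one refill interval and verify that capacity recovers.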

Conclusion and Implementation Roadmap

Choosing between token bucket and sliding window algorithms for API rate limiting isn't just a technical decision—it's a strategic choice that impacts user experience, system performance, and operational costs. Token bucket algorithms excel in scenarios requiring burst tolerance and resource efficiency, while sliding window approaches provide superior accuracy and control for strict rate enforcement.

For most PropTech applications, we recommend starting with a token bucket implementation for its simplicity and burst-handling capabilities, then evolving to hybrid approaches as requirements become more sophisticated. The key is understanding your specific traffic patterns, performance requirements, and operational constraints.

💡
Pro Tip
Start simple with token bucket rate limiting, monitor your traffic patterns closely, and iterate based on real-world usage data. Most applications benefit from the burst tolerance that token buckets provide.

Ready to implement robust rate limiting for your APIs? At PropTechUSA.ai, we've helped dozens of property technology companies build scalable, resilient API infrastructures. Our team can guide you through architecture decisions, implementation strategies, and performance optimization techniques tailored to your specific use case. Contact us to discuss how we can accelerate your API development journey.
