API Rate Limiting Algorithms Explained

Token bucket, leaky bucket, and sliding window explained with practical trade-offs for production APIs.


Rate limiting protects your API from abuse and ensures fair resource allocation. This guide compares the most common algorithms and their trade-offs.

Why Rate Limit

Rate limiting serves multiple purposes:

  • Prevent DoS attacks
  • Ensure fair usage among clients
  • Control infrastructure costs
  • Maintain service quality during traffic spikes

Token Bucket

The token bucket algorithm allows short bursts while enforcing an average rate over time.

How It Works

  1. Bucket holds tokens (e.g., 100 tokens)
  2. Tokens replenish at fixed rate (e.g., 10/second)
  3. Each request consumes 1 token
  4. If no tokens available, request is rejected

class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // maximum tokens the bucket can hold
    this.tokens = capacity;       // start full so an initial burst is allowed
    this.refillRate = refillRate; // tokens added per second
    this.lastRefill = Date.now();
  }

  // Try to take one token; returns true if the request may proceed.
  consume() {
    this.refill();
    if (this.tokens >= 1) { // tokens can be fractional after refill, so check for a whole token
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  // Add tokens for the time elapsed since the last refill, capped at capacity.
  refill() {
    const now = Date.now();
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillRate);
    this.lastRefill = now;
  }
}
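
To see the burst-then-throttle behavior, drive the bucket faster than it refills. A minimal sketch using the numbers from the example above (the loop and names are illustrative, not part of any API):

// 100-token capacity, refilling at 10 tokens/second.
const bucket = new TokenBucket(100, 10);

// A burst of 105 requests arrives at once: the first 100 drain the
// bucket and succeed, the last 5 are rejected until tokens refill.
let allowed = 0;
for (let i = 0; i < 105; i++) {
  if (bucket.consume()) allowed++;
}
console.log(`${allowed} of 105 requests allowed`); // 100 of 105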

Best For

APIs that need to handle occasional bursts while maintaining average rate.

Leaky Bucket

Requests enter a queue and are processed at a constant rate, smoothing bursts into a steady stream.

Implementation

class LeakyBucket {
  constructor(capacity, leakRate, processRequest) {
    this.capacity = capacity;             // maximum queued requests
    this.queue = [];
    this.processRequest = processRequest; // handler invoked as requests leak out

    // Drain the queue at a steady leakRate requests per second.
    this.timer = setInterval(() => this.leak(), 1000 / leakRate);
  }

  // Enqueue a request; returns false if the bucket is full.
  add(request) {
    if (this.queue.length < this.capacity) {
      this.queue.push(request);
      return true;
    }
    return false; // queue full, request is shed
  }

  // Process exactly one queued request per tick.
  leak() {
    if (this.queue.length > 0) {
      this.processRequest(this.queue.shift());
    }
  }
}
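
A minimal sketch of driving the queue; the logging handler stands in for whatever actually serves the request:

// Queue up to 5 requests, draining at 2 per second.
const bucket = new LeakyBucket(5, 2, req => console.log('processed request', req.id));

// 8 requests arrive in a burst; 5 queue up and drain at the steady
// rate, the remaining 3 are shed immediately.
for (let i = 1; i <= 8; i++) {
  if (!bucket.add({ id: i })) {
    console.log(`request ${i} rejected: queue full`);
  }
}

The interval keeps running for the lifetime of the bucket; call clearInterval(bucket.timer) when shutting down.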

Best For

Smoothing out traffic spikes and maintaining predictable output rate.

Sliding Window

Track the timestamp of every request in a rolling window (the sliding window log approach) for precise rate limiting.


class SlidingWindow {
  constructor(limit, windowMs) {
    this.limit = limit;       // max requests per window
    this.windowMs = windowMs; // window length in milliseconds
    this.requests = [];       // timestamps of recent requests
  }

  // Returns true if the request fits within the rolling window.
  allow() {
    const now = Date.now();
    const windowStart = now - this.windowMs;

    // Drop timestamps that have aged out of the window.
    this.requests = this.requests.filter(t => t > windowStart);

    if (this.requests.length < this.limit) {
      this.requests.push(now);
      return true;
    }
    return false;
  }
}
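
A short sketch of the rolling-window check (the values are illustrative):

// At most 3 requests per rolling one-second window.
const limiter = new SlidingWindow(3, 1000);

// Four back-to-back requests: the first three are allowed, the fourth
// is rejected until one of the earlier timestamps ages out.
for (let i = 1; i <= 4; i++) {
  console.log(`request ${i}: ${limiter.allow() ? 'allowed' : 'rejected'}`);
}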

Best For

Precise rate limiting without the boundary problem of fixed windows, where a client straddling two windows can briefly send up to twice the limit.

Algorithm Comparison

Algorithm        Memory  Bursts    Complexity
Token Bucket     O(1)    Allowed   Low
Leaky Bucket     O(n)    Smoothed  Medium
Sliding Window   O(n)    Allowed   Medium

Here n is the queue length (leaky bucket) or the number of timestamps tracked per window (sliding window).

Recommendations

  • Token Bucket: Most APIs (good balance)
  • Leaky Bucket: When steady output rate is critical
  • Sliding Window: When precision matters most

Summary

Choose token bucket for most use cases as it balances simplicity with flexibility. Use leaky bucket when you need predictable processing rates. Implement sliding window when you need precise control without edge cases. All algorithms can be distributed using Redis for multi-server deployments.
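
As one example of that last point, the sliding window maps naturally onto a Redis sorted set. This is a minimal sketch assuming the ioredis client and a per-client key scheme; both are assumptions, not something this guide prescribes:

const Redis = require('ioredis');
const redis = new Redis(); // assumes Redis reachable on localhost:6379

// Shared sliding-window check: timestamps live in a sorted set keyed per
// client, so every server instance counts against the same window.
async function allowRequest(clientId, limit, windowMs) {
  const now = Date.now();
  const key = `ratelimit:${clientId}`; // hypothetical key scheme

  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, now - windowMs)  // drop timestamps outside the window
    .zadd(key, now, `${now}:${Math.random()}`) // record this request (unique member)
    .zcard(key)                                // count requests now in the window
    .pexpire(key, windowMs)                    // let idle keys expire
    .exec();

  const count = results[2][1]; // result of zcard
  return count <= limit;       // note: rejected requests still occupy the window in this sketch
}

Under heavy contention the same steps are often moved into a Lua script so the check-and-record runs atomically on the Redis side.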