A rate limiter controls the rate of requests that a client can make to a service, ensuring fairness, protecting against abuse, and managing resource usage effectively. It enforces policies like:

  • Maximum requests per second (e.g., 10 requests per second).
  • Burst limits (e.g., 50 requests allowed in a short burst, followed by enforcement of the steady rate).

Rate limiting is essential in APIs, distributed systems, and web services to ensure predictable behavior under varying loads.


Core Algorithms

Rate limiting typically relies on algorithms that efficiently track and enforce limits. Key algorithms include:

Token Bucket

  • Concept: A bucket is filled with tokens at a steady rate (e.g., 10 tokens per second). Each request consumes one token. If the bucket is empty, the request is denied.
  • Burst Handling: The bucket can hold more tokens than the refill rate, allowing short bursts of requests.
  • Implementation:
    • Increment tokens periodically.
    • Decrement tokens on each request, denying requests if the bucket is empty.

Leaky Bucket

  • Concept: Requests are added to a queue (the “bucket”). The queue processes requests at a fixed rate, “leaking” them out. Excess requests are discarded.
  • Implementation:
    • Add requests to the bucket.
    • Process requests from the bucket at a constant rate.
    • Drop requests if the bucket is full.

Fixed Window Counter

  • Concept: Divide time into discrete windows (e.g., 1 second). Count the number of requests in the current window and deny requests exceeding the limit.
  • Limitation: Can allow bursts at the edges of windows (e.g., if requests hit near the end of one window and the start of another).

Sliding Window Log

  • Concept: Maintain a log of timestamps for each request. Remove old timestamps and enforce limits based on recent activity.
  • Limitation: Memory-intensive for high traffic.

Sliding Window Counter

  • Concept: Combine the fixed window counter with fractional counts for the current and previous windows, smoothing out edge cases.

Distributed Rate Limiting

In distributed systems, rate limiting must coordinate across multiple servers or regions. Key techniques include:

  1. Centralized Counter with Redis:

    • Use Redis to store request counters for each client or key.
    • Example for Fixed Window Counter:
      def rate_limit(client_id, max_requests, window_seconds):
          key = f"rate_limit:{client_id}"
          count = redis.incr(key)
          if count == 1:
              redis.expire(key, window_seconds)
          return count <= max_requests
    • Pros: Simple and effective.
    • Cons: Requires Redis availability.
  2. Token Bucket with Redis:

    • Redis stores the token count and refill timestamp. A Lua script atomically handles token consumption and refill logic to avoid race conditions.
  3. Clustered Databases:

    • Use a strongly consistent database like Spanner or CockroachDB to maintain counters.
  4. Client-Side Rate Limiting:

    • Push rate-limiting logic to the client. Servers enforce limits only for non-cooperative clients.

Advanced Considerations

  1. Burst Tolerance:

    • Token Bucket handles bursts better than Fixed Window Counter by allowing token accumulation.
  2. Fairness Across Regions:

    • Use a global synchronization strategy (e.g., Redis or multi-region databases) to prevent clients from exploiting regional inconsistencies.
  3. Fallback Mechanisms:

    • When the rate limiter fails (e.g., Redis is down), implement default behavior to avoid complete service denial.
  4. Retry-After Headers:

    • Include Retry-After headers in API responses to inform clients when they can safely retry.