One misbehaving client sending 10,000 requests per second can bring your API to its knees. Rate limiting prevents that. It's not about being mean to users - it's about keeping the service alive for everyone.
Why Rate Limit?
Left unchecked, a single client can exhaust CPU, database connections, and bandwidth, degrading the service for every other user. Rate limiting enforces fairness and protects capacity.
Common Algorithms
Fixed Window
Count requests in fixed time windows (e.g., per minute):
```typescript
class FixedWindowLimiter {
  private counts: Map<string, { count: number; windowStart: number }> = new Map();

  isAllowed(key: string, limit: number, windowMs: number): boolean {
    const now = Date.now();
    // Align the window to a fixed boundary (e.g., the top of each minute)
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = this.counts.get(key);

    // First request in this window: reset the counter
    if (!entry || entry.windowStart !== windowStart) {
      this.counts.set(key, { count: 1, windowStart });
      return true;
    }

    if (entry.count >= limit) {
      return false;
    }

    entry.count++;
    return true;
  }
}
```
Problem: bursts at the window boundary. A user makes 100 requests at 11:59:59, then 100 more at 12:00:00. That's 200 requests in 2 seconds, double the nominal limit.
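The boundary problem is easy to reproduce. A sketch with a bare fixed-window counter (60-second window, limit of 100), feeding it two bursts on either side of the boundary:

```typescript
const windowMs = 60_000;
const limit = 100;
const counts = new Map<number, number>();

function allowed(nowMs: number): boolean {
  const window = Math.floor(nowMs / windowMs);
  const count = counts.get(window) ?? 0;
  if (count >= limit) return false;
  counts.set(window, count + 1);
  return true;
}

let passed = 0;
// 100 requests at t = 59.5s (end of window 0)
for (let i = 0; i < 100; i++) if (allowed(59_500)) passed++;
// 100 more at t = 60.5s (start of window 1) - fresh counter, all pass
for (let i = 0; i < 100; i++) if (allowed(60_500)) passed++;
console.log(passed); // 200 - double the limit within one second
```

Each batch lands in a different window, so each sees a fresh counter. The sliding window below smooths this out.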
Sliding Window
Smoother limiting using weighted average:
```typescript
class SlidingWindowLimiter {
  private counts: Map<string, number> = new Map();

  isAllowed(key: string, limit: number, windowMs: number): boolean {
    const now = Date.now();
    const currentWindow = Math.floor(now / windowMs);
    const previousWindow = currentWindow - 1;

    const currentCount = this.getCount(key, currentWindow);
    const previousCount = this.getCount(key, previousWindow);

    // Weight the previous window by how much of it still overlaps
    // the sliding window ending now
    const elapsed = now % windowMs;
    const weight = elapsed / windowMs;
    const estimated = previousCount * (1 - weight) + currentCount;

    if (estimated >= limit) {
      return false;
    }

    this.increment(key, currentWindow);
    return true;
  }

  private getCount(key: string, window: number): number {
    return this.counts.get(`${key}:${window}`) ?? 0;
  }

  private increment(key: string, window: number): void {
    const k = `${key}:${window}`;
    this.counts.set(k, (this.counts.get(k) ?? 0) + 1);
  }
}
```
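A worked example helps: with a limit of 100, suppose the previous window saw 84 requests and the current window has 20 so far. Early in the current window the previous window still dominates the estimate; late in the window it barely counts. The weighted estimate as a standalone function (same formula as above):

```typescript
// Standalone version of the sliding-window estimate used above.
function slidingEstimate(
  previousCount: number,
  currentCount: number,
  elapsedMs: number,
  windowMs: number
): number {
  const weight = elapsedMs / windowMs; // how far into the current window
  return previousCount * (1 - weight) + currentCount;
}

console.log(slidingEstimate(84, 20, 18_000, 60_000)); // ≈ 78.8 (30% elapsed)
console.log(slidingEstimate(84, 20, 54_000, 60_000)); // ≈ 28.4 (90% elapsed)
```

The same counts yield very different estimates depending on position in the window, which is exactly what smooths out the boundary burst.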
Token Bucket
Tokens refill over time. Each request consumes a token:
```typescript
class TokenBucket {
  private buckets: Map<string, { tokens: number; lastRefill: number }> = new Map();

  isAllowed(
    key: string,
    maxTokens: number,
    refillRate: number // tokens per second
  ): boolean {
    const now = Date.now();
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = { tokens: maxTokens, lastRefill: now };
      this.buckets.set(key, bucket);
    }

    // Refill tokens based on time elapsed
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(maxTokens, bucket.tokens + elapsed * refillRate);
    bucket.lastRefill = now;

    if (bucket.tokens < 1) {
      return false;
    }

    bucket.tokens--;
    return true;
  }
}
```
Best for: Allowing bursts while maintaining average rate.
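A nice side effect of the token bucket: when a request is rejected, you can compute exactly how long until the next token is available, which maps directly onto a Retry-After header. A minimal sketch (the one-token deficit divided by the refill rate; `retryAfterSeconds` is an illustrative helper, not part of the class above):

```typescript
// Seconds until the bucket holds at least one token again.
// tokens: current (possibly fractional) token count
// refillRate: tokens added per second
function retryAfterSeconds(tokens: number, refillRate: number): number {
  if (tokens >= 1) return 0; // a token is already available
  return (1 - tokens) / refillRate;
}

console.log(retryAfterSeconds(0.25, 0.5)); // 1.5 - need 0.75 tokens at 0.5/sec
console.log(retryAfterSeconds(2, 0.5));    // 0 - request can proceed now
```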
Redis Implementation
The in-memory limiters above break down once you run more than one server - each instance tracks its own counts. For distributed systems, keep the state in Redis:
```typescript
// Sliding-window log using a Redis sorted set.
// Assumes an ioredis-style client in scope as `redis`.
async function checkRateLimit(
  userId: string,
  limit: number,
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
  const key = `ratelimit:${userId}`;
  const now = Math.floor(Date.now() / 1000);
  const windowStart = now - windowSeconds;

  // Remove old entries, add current, count total - in one round trip
  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, windowStart)
    .zadd(key, now, `${now}-${Math.random()}`)
    .zcard(key)
    .expire(key, windowSeconds)
    .exec();

  // ioredis returns [error, result] pairs; index 2 is the ZCARD reply
  const count = results![2][1] as number;
  const allowed = count <= limit;

  return {
    allowed,
    remaining: Math.max(0, limit - count),
    resetAt: now + windowSeconds
  };
}
```
Response Headers
Tell clients their limits:
```typescript
app.use(async (req, res, next) => {
  const result = await checkRateLimit(req.userId, 100, 60);

  res.set({
    'X-RateLimit-Limit': '100',
    'X-RateLimit-Remaining': result.remaining.toString(),
    'X-RateLimit-Reset': result.resetAt.toString(),
  });

  if (!result.allowed) {
    res.set('Retry-After', '60');
    return res.status(429).json({
      error: 'Too many requests',
      retryAfter: 60
    });
  }

  next();
});
```
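On the client side, those headers tell a well-behaved caller when to retry. A hedged sketch of header parsing - `backoffMs` is illustrative, and the header names follow the `X-RateLimit` convention used above (lower-cased, as most HTTP clients normalize them):

```typescript
// Parse rate-limit response headers into a wait time in milliseconds.
function backoffMs(headers: Record<string, string>, nowMs = Date.now()): number {
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined) {
    return Number(retryAfter) * 1000; // Retry-After is in seconds
  }
  const reset = Number(headers["x-ratelimit-reset"]); // unix seconds
  return Math.max(0, reset * 1000 - nowMs);
}
```

Retry-After takes precedence when present; otherwise the reset timestamp gives an upper bound on the wait.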
Different Limits for Different Things
```typescript
const limits = {
  // Per user
  'user:requests': { limit: 100, window: 60 },    // 100/min
  'user:uploads': { limit: 10, window: 3600 },    // 10/hour

  // Per IP (for unauthenticated)
  'ip:requests': { limit: 30, window: 60 },       // 30/min

  // Per API key (for B2B)
  'apikey:requests': { limit: 1000, window: 60 }, // 1000/min

  // Global (protect the service)
  'global:requests': { limit: 10000, window: 1 }, // 10k/sec
};
```
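In practice these limits are checked together: a request must pass its user limit, the IP limit, and the global limit, and the most restrictive one wins. A minimal in-memory sketch (fixed windows for brevity; `hit`, `checkAll`, and the shape of `LimitRule` are illustrative, not a standard API):

```typescript
interface LimitRule { limit: number; window: number } // window in seconds

const counters = new Map<string, { count: number; windowId: number }>();

// Fixed-window hit counter for a single scope (key).
function hit(key: string, rule: LimitRule, now = Date.now()): boolean {
  const windowId = Math.floor(now / (rule.window * 1000));
  const entry = counters.get(key);
  if (!entry || entry.windowId !== windowId) {
    counters.set(key, { count: 1, windowId });
    return true;
  }
  if (entry.count >= rule.limit) return false;
  entry.count++;
  return true;
}

// A request passes only if every applicable scope allows it.
function checkAll(checks: Array<[string, LimitRule]>, now = Date.now()): boolean {
  return checks.every(([key, rule]) => hit(key, rule, now));
}

const ok = checkAll([
  ["user:42", { limit: 100, window: 60 }],
  ["ip:203.0.113.7", { limit: 30, window: 60 }],
  ["global", { limit: 10_000, window: 1 }],
]);
```

One caveat: `every` short-circuits, so earlier scopes have already been counted by the time a later scope rejects. A production limiter would check all scopes first, then commit.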
Rate Limit by Cost
Not all endpoints are equal:
```typescript
const endpointCosts: Record<string, number> = {
  'GET /users': 1,
  'POST /users': 5,
  'GET /reports/generate': 50, // Expensive operation
  'POST /ai/generate': 100,    // Very expensive
};

async function checkCostBasedLimit(userId: string, endpoint: string) {
  const cost = endpointCosts[endpoint] ?? 1;
  const result = await checkRateLimit(userId, 1000, 60); // 1000 points/min

  if (result.remaining < cost) {
    return { allowed: false, remaining: result.remaining };
  }

  // Consume points (a helper that decrements the user's point balance)
  await consumePoints(userId, cost);
  return { allowed: true, remaining: result.remaining - cost };
}
```
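The same idea works with a plain points budget per user. A self-contained, in-memory sketch using a fixed window (`PointsLimiter` is an illustrative name, not a library class):

```typescript
class PointsLimiter {
  private used = new Map<string, { points: number; windowId: number }>();

  constructor(private budget: number, private windowMs: number) {}

  // Returns true and consumes `cost` points if the budget allows it.
  consume(key: string, cost: number, now = Date.now()): boolean {
    const windowId = Math.floor(now / this.windowMs);
    const entry = this.used.get(key);
    const spent = entry && entry.windowId === windowId ? entry.points : 0;
    if (spent + cost > this.budget) return false;
    this.used.set(key, { points: spent + cost, windowId });
    return true;
  }
}

const limiter = new PointsLimiter(1000, 60_000); // 1000 points/min
limiter.consume("user:42", 50, 0);  // report generation → true, 950 left
limiter.consume("user:42", 900, 0); // true, 50 left
limiter.consume("user:42", 100, 0); // false - only 50 points remain
```

Cheap endpoints barely dent the budget; one expensive AI call costs as much as a hundred simple reads.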
Quick Checklist
- [ ] Rate limit by user AND by IP
- [ ] Return proper 429 status code
- [ ] Include rate limit headers
- [ ] Log rate limit hits for monitoring
- [ ] Have different tiers (free vs paid)
- [ ] Document limits in API docs
- [ ] Alert when users frequently hit limits
Rate limiting is about fairness. Good clients get reliable service. Bad clients get blocked before they hurt everyone else. It's one of those things that seems unnecessary until you really need it.
