One misbehaving client sending 10,000 requests per second can bring your API to its knees. Rate limiting prevents that. It's not about being mean to users - it's about keeping the service alive for everyone.
Why Rate Limit?
Left unchecked, a single client can exhaust CPU, database connections, and bandwidth, degrading the service for every other user. Rate limiting enforces fairness and protects capacity.
Common Algorithms
Fixed Window
Count requests in fixed time windows (e.g., per minute):
```typescript
class FixedWindowLimiter {
  private counts: Map<string, { count: number; windowStart: number }> = new Map();

  isAllowed(key: string, limit: number, windowMs: number): boolean {
    const now = Date.now();
    // Align the window to a fixed boundary (e.g., the top of each minute)
    const windowStart = Math.floor(now / windowMs) * windowMs;
    const entry = this.counts.get(key);

    // First request in this window: reset the counter
    if (!entry || entry.windowStart !== windowStart) {
      this.counts.set(key, { count: 1, windowStart });
      return true;
    }

    if (entry.count >= limit) {
      return false;
    }

    entry.count++;
    return true;
  }
}
```
Problem: bursts at the window boundary. A user makes 100 requests at 11:59:59, then 100 more at 12:00:00. That's 200 requests in 2 seconds, double the nominal limit.
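The boundary problem is easy to reproduce. A sketch with a bare fixed-window counter (60-second window, limit of 100), feeding it two bursts on either side of the boundary:

```typescript
const windowMs = 60_000;
const limit = 100;
const counts = new Map<number, number>();

function allowed(nowMs: number): boolean {
  const window = Math.floor(nowMs / windowMs);
  const count = counts.get(window) ?? 0;
  if (count >= limit) return false;
  counts.set(window, count + 1);
  return true;
}

let passed = 0;
// 100 requests at t = 59.5s (end of window 0)
for (let i = 0; i < 100; i++) if (allowed(59_500)) passed++;
// 100 more at t = 60.5s (start of window 1) - fresh counter, all pass
for (let i = 0; i < 100; i++) if (allowed(60_500)) passed++;
console.log(passed); // 200 - double the limit within one second
```

Each batch lands in a different window, so each sees a fresh counter. The sliding window below smooths this out.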
Sliding Window
Smoother limiting using weighted average:
```typescript
class SlidingWindowLimiter {
  private counts: Map<string, number> = new Map();

  isAllowed(key: string, limit: number, windowMs: number): boolean {
    const now = Date.now();
    const currentWindow = Math.floor(now / windowMs);
    const previousWindow = currentWindow - 1;

    const currentCount = this.getCount(key, currentWindow);
    const previousCount = this.getCount(key, previousWindow);

    // Weight the previous window by how much of it still overlaps
    // the sliding window ending now
    const elapsed = now % windowMs;
    const weight = elapsed / windowMs;
    const estimated = previousCount * (1 - weight) + currentCount;

    if (estimated >= limit) {
      return false;
    }

    this.increment(key, currentWindow);
    return true;
  }

  private getCount(key: string, window: number): number {
    return this.counts.get(`${key}:${window}`) ?? 0;
  }

  private increment(key: string, window: number): void {
    const k = `${key}:${window}`;
    this.counts.set(k, (this.counts.get(k) ?? 0) + 1);
  }
}
```
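A worked example helps: with a limit of 100, suppose the previous window saw 84 requests and the current window has 20 so far. Early in the current window the previous window still dominates the estimate; late in the window it barely counts. The weighted estimate as a standalone function (same formula as above):

```typescript
// Standalone version of the sliding-window estimate used above.
function slidingEstimate(
  previousCount: number,
  currentCount: number,
  elapsedMs: number,
  windowMs: number
): number {
  const weight = elapsedMs / windowMs; // how far into the current window
  return previousCount * (1 - weight) + currentCount;
}

console.log(slidingEstimate(84, 20, 18_000, 60_000)); // ≈ 78.8 (30% elapsed)
console.log(slidingEstimate(84, 20, 54_000, 60_000)); // ≈ 28.4 (90% elapsed)
```

The same counts yield very different estimates depending on position in the window, which is exactly what smooths out the boundary burst.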
Token Bucket
Tokens refill over time. Each request consumes a token:
```typescript
class TokenBucket {
  private buckets: Map<string, { tokens: number; lastRefill: number }> = new Map();

  isAllowed(
    key: string,
    maxTokens: number,
    refillRate: number // tokens per second
  ): boolean {
    const now = Date.now();
    let bucket = this.buckets.get(key);
    if (!bucket) {
      bucket = { tokens: maxTokens, lastRefill: now };
      this.buckets.set(key, bucket);
    }

    // Refill tokens based on time elapsed
    const elapsed = (now - bucket.lastRefill) / 1000;
    bucket.tokens = Math.min(maxTokens, bucket.tokens + elapsed * refillRate);
    bucket.lastRefill = now;

    if (bucket.tokens < 1) {
      return false;
    }

    bucket.tokens--;
    return true;
  }
}
```
Best for: Allowing bursts while maintaining average rate.
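A nice side effect of the token bucket: when a request is rejected, you can compute exactly how long until the next token is available, which maps directly onto a Retry-After header. A minimal sketch (the one-token deficit divided by the refill rate; `retryAfterSeconds` is an illustrative helper, not part of the class above):

```typescript
// Seconds until the bucket holds at least one token again.
// tokens: current (possibly fractional) token count
// refillRate: tokens added per second
function retryAfterSeconds(tokens: number, refillRate: number): number {
  if (tokens >= 1) return 0; // a token is already available
  return (1 - tokens) / refillRate;
}

console.log(retryAfterSeconds(0.25, 0.5)); // 1.5 - need 0.75 tokens at 0.5/sec
console.log(retryAfterSeconds(2, 0.5));    // 0 - request can proceed now
```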
Redis Implementation
The in-memory limiters above break down once you run more than one server - each instance tracks its own counts. For distributed systems, keep the state in Redis:
```typescript
// Sliding-window log using a Redis sorted set.
// Assumes an ioredis-style client in scope as `redis`.
async function checkRateLimit(
  userId: string,
  limit: number,
  windowSeconds: number
): Promise<{ allowed: boolean; remaining: number; resetAt: number }> {
  const key = `ratelimit:${userId}`;
  const now = Math.floor(Date.now() / 1000);
  const windowStart = now - windowSeconds;

  // Remove old entries, add current, count total - in one round trip
  const results = await redis
    .multi()
    .zremrangebyscore(key, 0, windowStart)
    .zadd(key, now, `${now}-${Math.random()}`)
    .zcard(key)
    .expire(key, windowSeconds)
    .exec();

  // ioredis returns [error, result] pairs; index 2 is the ZCARD reply
  const count = results![2][1] as number;
  const allowed = count <= limit;

  return {
    allowed,
    remaining: Math.max(0, limit - count),
    resetAt: now + windowSeconds
  };
}
```
Response Headers
Tell clients their limits:
```typescript
app.use(async (req, res, next) => {
  const result = await checkRateLimit(req.userId, 100, 60);

  res.set({
    'X-RateLimit-Limit': '100',
    'X-RateLimit-Remaining': result.remaining.toString(),
    'X-RateLimit-Reset': result.resetAt.toString(),
  });

  if (!result.allowed) {
    res.set('Retry-After', '60');
    return res.status(429).json({
      error: 'Too many requests',
      retryAfter: 60
    });
  }

  next();
});
```
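On the client side, those headers tell a well-behaved caller when to retry. A hedged sketch of header parsing - `backoffMs` is illustrative, and the header names follow the `X-RateLimit` convention used above (lower-cased, as most HTTP clients normalize them):

```typescript
// Parse rate-limit response headers into a wait time in milliseconds.
function backoffMs(headers: Record<string, string>, nowMs = Date.now()): number {
  const retryAfter = headers["retry-after"];
  if (retryAfter !== undefined) {
    return Number(retryAfter) * 1000; // Retry-After is in seconds
  }
  const reset = Number(headers["x-ratelimit-reset"]); // unix seconds
  return Math.max(0, reset * 1000 - nowMs);
}
```

Retry-After takes precedence when present; otherwise the reset timestamp gives an upper bound on the wait.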
Different Limits for Different Things
```typescript
const limits = {
  // Per user
  'user:requests': { limit: 100, window: 60 },    // 100/min
  'user:uploads': { limit: 10, window: 3600 },    // 10/hour

  // Per IP (for unauthenticated)
  'ip:requests': { limit: 30, window: 60 },       // 30/min

  // Per API key (for B2B)
  'apikey:requests': { limit: 1000, window: 60 }, // 1000/min

  // Global (protect the service)
  'global:requests': { limit: 10000, window: 1 }, // 10k/sec
};
```
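In practice these limits are checked together: a request must pass its user limit, the IP limit, and the global limit, and the most restrictive one wins. A minimal in-memory sketch (fixed windows for brevity; `hit`, `checkAll`, and the shape of `LimitRule` are illustrative, not a standard API):

```typescript
interface LimitRule { limit: number; window: number } // window in seconds

const counters = new Map<string, { count: number; windowId: number }>();

// Fixed-window hit counter for a single scope (key).
function hit(key: string, rule: LimitRule, now = Date.now()): boolean {
  const windowId = Math.floor(now / (rule.window * 1000));
  const entry = counters.get(key);
  if (!entry || entry.windowId !== windowId) {
    counters.set(key, { count: 1, windowId });
    return true;
  }
  if (entry.count >= rule.limit) return false;
  entry.count++;
  return true;
}

// A request passes only if every applicable scope allows it.
function checkAll(checks: Array<[string, LimitRule]>, now = Date.now()): boolean {
  return checks.every(([key, rule]) => hit(key, rule, now));
}

const ok = checkAll([
  ["user:42", { limit: 100, window: 60 }],
  ["ip:203.0.113.7", { limit: 30, window: 60 }],
  ["global", { limit: 10_000, window: 1 }],
]);
```

One caveat: `every` short-circuits, so earlier scopes have already been counted by the time a later scope rejects. A production limiter would check all scopes first, then commit.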
Rate Limit by Cost
Not all endpoints are equal:
```typescript
const endpointCosts: Record<string, number> = {
  'GET /users': 1,
  'POST /users': 5,
  'GET /reports/generate': 50, // Expensive operation
  'POST /ai/generate': 100,    // Very expensive
};

async function checkCostBasedLimit(userId: string, endpoint: string) {
  const cost = endpointCosts[endpoint] ?? 1;
  const result = await checkRateLimit(userId, 1000, 60); // 1000 points/min

  if (result.remaining < cost) {
    return { allowed: false, remaining: result.remaining };
  }

  // Consume points (a helper that decrements the user's point balance)
  await consumePoints(userId, cost);
  return { allowed: true, remaining: result.remaining - cost };
}
```
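The same idea works with a plain points budget per user. A self-contained, in-memory sketch using a fixed window (`PointsLimiter` is an illustrative name, not a library class):

```typescript
class PointsLimiter {
  private used = new Map<string, { points: number; windowId: number }>();

  constructor(private budget: number, private windowMs: number) {}

  // Returns true and consumes `cost` points if the budget allows it.
  consume(key: string, cost: number, now = Date.now()): boolean {
    const windowId = Math.floor(now / this.windowMs);
    const entry = this.used.get(key);
    const spent = entry && entry.windowId === windowId ? entry.points : 0;
    if (spent + cost > this.budget) return false;
    this.used.set(key, { points: spent + cost, windowId });
    return true;
  }
}

const limiter = new PointsLimiter(1000, 60_000); // 1000 points/min
limiter.consume("user:42", 50, 0);  // report generation → true, 950 left
limiter.consume("user:42", 900, 0); // true, 50 left
limiter.consume("user:42", 100, 0); // false - only 50 points remain
```

Cheap endpoints barely dent the budget; one expensive AI call costs as much as a hundred simple reads.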
Quick Checklist
- [ ] Rate limit by user AND by IP
- [ ] Return proper 429 status code
- [ ] Include rate limit headers
- [ ] Log rate limit hits for monitoring
- [ ] Have different tiers (free vs paid)
- [ ] Document limits in API docs
- [ ] Alert when users frequently hit limits
Rate limiting is about fairness. Good clients get reliable service. Bad clients get blocked before they hurt everyone else. It's one of those things that seems unnecessary until you really need it.
