LLM APIs are flaky: rate limits, timeouts, content filters, random 500s. If you don't handle errors gracefully, your AI features will frustrate users constantly.
Common Failures
Some failures are retryable, some aren't. Handle them differently:
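- Rate limits (429): retryable, after backing off
- Server errors (500, 502, 503, 504): retryable
- Timeouts: usually worth another attempt, but cap the total wait
- Content filter blocks and other 4xx errors: not retryable; fail fast and show a fallback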
Basic Retry Logic
// Small helper used throughout: resolve after ms milliseconds
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function callWithRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < maxRetries; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Don't retry non-retryable errors
      if (!isRetryable(err)) throw err;
      // Exponential backoff: 1s, 2s, 4s, ...
      const delay = Math.pow(2, i) * 1000;
      await sleep(delay);
    }
  }
  throw lastError;
}

function isRetryable(err: unknown): boolean {
  const e = err as { status?: number; code?: number };
  const code = e.status ?? e.code;
  return code !== undefined && [429, 500, 502, 503, 504].includes(code);
}
Rate Limit Handling
Rate limits are guaranteed to happen. Respect the retry-after header:
async function handleRateLimit(response: Response): Promise<boolean> {
  // Retry-After is typically the number of seconds to wait
  const retryAfter = response.headers.get("retry-after");
  if (retryAfter) {
    const waitMs = parseInt(retryAfter, 10) * 1000;
    await sleep(waitMs);
    return true; // Should retry
  }
  return false;
}
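Here's one way this might be wired into a request loop. makeRequest is a hypothetical callback standing in for whatever performs the underlying HTTP call and returns a fetch Response:

async function requestWithRateLimitRetry(
  makeRequest: () => Promise<Response>,
  maxAttempts = 3
): Promise<Response> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const response = await makeRequest();
    if (response.status !== 429) return response;
    const shouldRetry = await handleRateLimit(response);
    if (!shouldRetry) break; // No Retry-After header: fall back to exponential backoff instead
  }
  throw new Error("Rate limited");
}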
Better yet, track your own request budget with a simple client-side limiter so you rarely hit the limit in the first place:
class RateLimiter {
  private tokens: number;
  private lastRefill: number;

  constructor(private tokensPerMinute: number) {
    this.tokens = tokensPerMinute;
    this.lastRefill = Date.now();
  }

  async acquire() {
    this.refill();
    if (this.tokens <= 0) {
      // Out of budget: wait until the next one-minute window starts
      const waitTime = 60000 - (Date.now() - this.lastRefill);
      await sleep(waitTime);
      this.refill();
    }
    this.tokens--;
  }

  private refill() {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    if (elapsed >= 60000) {
      // Reset the full budget once per minute
      this.tokens = this.tokensPerMinute;
      this.lastRefill = now;
    }
  }
}
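Usage might look like this; llm.complete is the same assumed client call used elsewhere in this post, and 60 requests per minute is a made-up quota:

const limiter = new RateLimiter(60); // assume a 60 requests/minute quota

async function complete(prompt: string): Promise<string> {
  await limiter.acquire(); // wait for budget before sending the request
  return callWithRetry(() => llm.complete(prompt));
}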
Graceful Degradation
When AI fails, have a backup:
async function getAIResponse(prompt: string): Promise<string> {
  try {
    return await callWithRetry(() => llm.complete(prompt));
  } catch (err) {
    // Log for monitoring
    logger.error("AI call failed", { error: err, prompt });

    // Return graceful fallback
    if (isContentFiltered(err)) {
      return "I can't help with that request.";
    }
    if (isQuotaExceeded(err)) {
      return "AI features are temporarily unavailable.";
    }
    return "Something went wrong. Please try again.";
  }
}
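isContentFiltered and isQuotaExceeded depend on your provider's error shape. A minimal sketch, assuming the error carries an HTTP status and an error code string (field names and code values vary by provider, so check their docs):

// Hypothetical helpers: adjust the checks to match your provider's error format.
function isContentFiltered(err: unknown): boolean {
  const e = err as { status?: number; code?: string };
  return e.code === "content_filter"; // assumed code name
}

function isQuotaExceeded(err: unknown): boolean {
  const e = err as { status?: number; code?: string };
  // A 429 with a quota/billing code is different from ordinary rate limiting
  return e.status === 429 && e.code === "insufficient_quota"; // assumed code name
}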
Never show raw error messages to users. They don't care about 429 Too Many Requests.
Timeout Protection
LLM calls can hang. Always set timeouts:
async function withTimeout<T>(
  promise: Promise<T>,
  ms: number
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Request timeout")), ms);
  });
  try {
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer); // don't leave the timer running after the race settles
  }
}

// Usage
const response = await withTimeout(
  llm.complete(prompt),
  30000 // 30 second max
);
Circuit Breaker Pattern
If an API keeps failing, stop hammering it:
class CircuitBreaker {
  private failures = 0;
  private lastFailure = 0;
  private state: "closed" | "open" | "half-open" = "closed";

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === "open") {
      // After a 30s cooldown, let one request through to probe the API
      if (Date.now() - this.lastFailure > 30000) {
        this.state = "half-open";
      } else {
        throw new Error("Circuit breaker is open");
      }
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (err) {
      this.onFailure();
      throw err;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = "closed";
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    // Trip the breaker after 5 consecutive failures
    if (this.failures >= 5) {
      this.state = "open";
    }
  }
}
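Putting the pieces together, one plausible arrangement is a shared breaker wrapping the timed, retried call (llm.complete is still the assumed client method):

const breaker = new CircuitBreaker();

async function safeComplete(prompt: string): Promise<string> {
  // Breaker on the outside so a dead API stops consuming retries;
  // retries in the middle; a 30s timeout on each individual attempt.
  return breaker.call(() =>
    callWithRetry(() => withTimeout(llm.complete(prompt), 30000))
  );
}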
Quick Checklist
Before shipping AI features:
- [ ] Retry logic with exponential backoff
- [ ] Timeout on all LLM calls
- [ ] User-friendly error messages
- [ ] Rate limit handling
- [ ] Fallback for when AI is down
- [ ] Error logging for debugging
- [ ] Alerts for quota/billing issues
LLM APIs will fail. Plan for it. Your 2am self will thank you.
