Network requests fail. Databases hiccup. Third-party APIs go down. The question isn't if, it's when.
Naive retry logic makes things worse. Smart retry logic makes your system resilient.
The Wrong Way
// Don't do this
async function fetchData() {
  while (true) {
    try {
      return await api.call();
    } catch (e) {
      // Retry immediately, forever
      console.log('Retrying...');
    }
  }
}
This hammers the failing service, potentially making the outage worse. It will also spin forever if the issue isn't transient.
Exponential Backoff
Wait longer between each retry:
// Helper used throughout: resolve after `ms` milliseconds
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function fetchWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3
): Promise<T> {
  let attempt = 0;
  while (attempt < maxRetries) {
    try {
      return await fn();
    } catch (error) {
      attempt++;
      if (attempt >= maxRetries) throw error;
      // Exponential backoff: 1s, 2s, 4s, 8s...
      const delay = Math.pow(2, attempt - 1) * 1000;
      await sleep(delay);
    }
  }
  throw new Error('Max retries exceeded');
}
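One subtlety in the loop above is worth calling out: the `await` inside the `try` is load-bearing. If the function returned the promise without awaiting it, the rejection would surface after control has already left the `try`, and the `catch` would never run. A minimal sketch of the difference (the function names here are illustrative):

```typescript
// Without `await`, the rejection happens after we've already left
// the try block, so the catch never runs and the caller sees the error.
async function broken(fn: () => Promise<string>): Promise<string> {
  try {
    return fn(); // rejection is NOT caught here
  } catch {
    return 'recovered';
  }
}

// With `await`, the rejection is raised inside the try block,
// so the catch runs and we can recover (or retry).
async function fixed(fn: () => Promise<string>): Promise<string> {
  try {
    return await fn(); // rejection IS caught here
  } catch {
    return 'recovered';
  }
}
```

This is why the retry helpers in this article consistently `return await fn()` rather than `return fn()`.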
Add Jitter
If 1,000 clients all fail at the same moment and retry on the same schedule, you get a thundering herd. Add randomness to spread the retries out:
function getBackoffDelay(attempt: number, baseDelay = 1000): number {
  const exponentialDelay = Math.pow(2, attempt) * baseDelay;
  // Add jitter: 50-100% of calculated delay
  const jitter = exponentialDelay * (0.5 + Math.random() * 0.5);
  // Cap at 30 seconds
  return Math.min(jitter, 30000);
}
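A common variant, described in the AWS Architecture Blog as "full jitter", randomizes over the whole window instead of just the top half. Each individual delay is less predictable, but the aggregate load on a recovering service is flatter. A sketch:

```typescript
// "Full jitter": pick uniformly from [0, cappedExponentialDelay)
// instead of from 50-100% of it.
function getFullJitterDelay(
  attempt: number,
  baseDelay = 1000,
  maxDelay = 30000
): number {
  const cap = Math.min(Math.pow(2, attempt) * baseDelay, maxDelay);
  return Math.random() * cap;
}
```

The trade-off is higher variance per client; whether that matters depends on how tight your latency budget is.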
Only Retry Retryable Errors
Not all errors should be retried:
function isRetryable(error: any): boolean {
  // Network errors - retry
  if (error.code === 'ECONNRESET') return true;
  if (error.code === 'ETIMEDOUT') return true;
  // HTTP status codes
  const status = error.status;
  if (status === 429) return true; // Rate limited
  if (status >= 500) return true; // Server errors, including 503
  // Don't retry client errors (4xx) or anything unrecognized
  return false;
}
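A few spot checks make the policy concrete. The error shapes below are simplified plain objects; real HTTP clients attach `code` and `status` in library-specific ways. The classifier is restated inline (with an illustrative type) so the snippet runs on its own:

```typescript
// Compact restatement of the retry policy for a runnable spot check.
// MaybeHttpError is an illustrative shape, not a real library type.
type MaybeHttpError = { code?: string; status?: number };

function isRetryable(error: MaybeHttpError): boolean {
  if (error.code === 'ECONNRESET' || error.code === 'ETIMEDOUT') return true;
  const status = error.status;
  if (status === 429) return true; // Rate limited: slow down and retry
  if (status !== undefined && status >= 500) return true; // Server errors
  return false; // 4xx client errors and anything unrecognized
}
```

Note the default: anything not positively identified as transient is treated as permanent. Retrying a 400 will never succeed; failing fast is the right call.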
async function fetchWithSmartRetry<T>(fn: () => Promise<T>): Promise<T> {
  let attempt = 0;
  while (attempt < 3) {
    try {
      return await fn();
    } catch (error) {
      if (!isRetryable(error)) throw error;
      attempt++;
      const delay = getBackoffDelay(attempt);
      await sleep(delay);
    }
  }
  throw new Error('Max retries exceeded');
}
Respect Retry-After Headers
Some APIs tell you exactly when to retry:
function getRetryDelay(response: Response, attempt: number): number {
  const retryAfter = response.headers.get('Retry-After');
  if (retryAfter) {
    // Could be delta-seconds or an HTTP-date
    const seconds = parseInt(retryAfter, 10);
    if (!isNaN(seconds)) return seconds * 1000;
    const date = Date.parse(retryAfter);
    if (!isNaN(date)) return date - Date.now();
  }
  // Fall back to exponential backoff
  return getBackoffDelay(attempt);
}
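Per RFC 9110, `Retry-After` carries either delta-seconds (e.g. `120`) or an HTTP-date. Factoring the parsing out of the `Response`-based helper makes it easy to unit-test in isolation; this sketch also clamps past dates to zero, an addition to guard against clock skew:

```typescript
// Parse a Retry-After header value into a delay in milliseconds.
// Returns null when the value is unparseable, so the caller can
// fall back to exponential backoff.
function parseRetryAfter(
  value: string,
  now: number = Date.now()
): number | null {
  // Case 1: delta-seconds, e.g. "120"
  const seconds = parseInt(value, 10);
  if (!isNaN(seconds)) return seconds * 1000;
  // Case 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(value);
  if (!isNaN(date)) return Math.max(0, date - now);
  return null;
}
```

Injecting `now` as a parameter is what makes the date branch testable without freezing the system clock.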
Complete Implementation
interface RetryConfig {
  maxRetries: number;
  baseDelay: number;
  maxDelay: number;
  shouldRetry: (error: Error) => boolean;
}

async function withRetry<T>(
  fn: () => Promise<T>,
  config: Partial<RetryConfig> = {}
): Promise<T> {
  const {
    maxRetries = 3,
    baseDelay = 1000,
    maxDelay = 30000,
    shouldRetry = isRetryable
  } = config;
  let lastError: Error | undefined;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;
      if (attempt >= maxRetries || !shouldRetry(lastError)) {
        throw lastError;
      }
      const delay = Math.min(
        getBackoffDelay(attempt, baseDelay),
        maxDelay
      );
      console.log(`Retry ${attempt + 1}/${maxRetries} in ${delay}ms`);
      await sleep(delay);
    }
  }
  throw lastError;
}
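To see the retry accounting end to end, here is a compact, self-contained harness: the delay machinery is stubbed out so it runs instantly, and the flaky service (fails twice, then succeeds) is invented for the demo:

```typescript
// Minimal restatement of withRetry with the delay removed, so the
// demo runs instantly. The config surface mirrors RetryConfig above.
interface DemoRetryConfig {
  maxRetries: number;
  shouldRetry: (error: Error) => boolean;
}

async function demoWithRetry<T>(
  fn: () => Promise<T>,
  config: Partial<DemoRetryConfig> = {}
): Promise<T> {
  const { maxRetries = 3, shouldRetry = () => true } = config;
  let lastError: Error | undefined;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;
      if (attempt >= maxRetries || !shouldRetry(lastError)) throw lastError;
    }
  }
  throw lastError;
}

// A fake service that fails twice, then succeeds on the third call.
let calls = 0;
const flaky = async (): Promise<string> => {
  calls++;
  if (calls < 3) throw new Error('transient');
  return 'ok';
};
```

With the default of 3 retries, `demoWithRetry(flaky)` absorbs both transient failures and resolves on the third call.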
Closing Thoughts
Retries are about being a good citizen. Back off when services struggle, add jitter to avoid stampedes, and know when to give up. Your system - and the services you depend on - will be more reliable for it.
