Error Handling & Recovery Patterns
Graceful degradation, retry logic, user-friendly errors, and recovery strategies for systems that fail gracefully.
Every system fails. The difference between good and great systems isn't whether they fail; it's how they recover. A well-handled error builds trust. A raw stack trace destroys it.
Here's how we handle errors across 28 Workers processing millions of requests monthly.
Rule 1: Never Expose Raw Errors
❌ Bad
{
  "error": "TypeError: Cannot read property 'address' of undefined at getProperty (/src/handlers.js:142:23)"
}
✅ Good
{
  "error": {
    "code": "PROPERTY_NOT_FOUND",
    "message": "We couldn't find that property. Please check the address and try again.",
    "requestId": "req_abc123"
  }
}
error-handler.ts
class AppError extends Error {
  constructor(
    public code: string,
    public userMessage: string,
    public statusCode: number = 500,
    public context?: Record<string, unknown>
  ) {
    super(userMessage);
    // Make logs show "AppError" rather than the generic "Error"
    this.name = 'AppError';
  }
}

const ERRORS = {
  PROPERTY_NOT_FOUND: new AppError(
    'PROPERTY_NOT_FOUND',
    "We couldn't find that property. Please check the address.",
    404
  ),
  RATE_LIMITED: new AppError(
    'RATE_LIMITED',
    'Too many requests. Please wait a moment and try again.',
    429
  ),
  AI_UNAVAILABLE: new AppError(
    'AI_UNAVAILABLE',
    'Our AI assistant is temporarily busy. Your message has been saved.',
    503
  )
};

function handleError(error: unknown, requestId: string): Response {
  // The body is JSON in both branches, so label it as such
  const headers = { 'Content-Type': 'application/json' };

  // Known application error: safe to show its code and user-facing message
  if (error instanceof AppError) {
    return new Response(JSON.stringify({
      error: {
        code: error.code,
        message: error.userMessage,
        requestId
      }
    }), { status: error.statusCode, headers });
  }

  // Unknown error - log full details, return generic message
  console.error('Unhandled error', { error, requestId });
  return new Response(JSON.stringify({
    error: {
      code: 'INTERNAL_ERROR',
      message: 'Something went wrong. Please try again or contact support.',
      requestId
    }
  }), { status: 500, headers });
}
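One caveat with the `ERRORS` table: each entry is a single `AppError` instance created at module load, so every throw shares one object, one `context`, and (in V8-based runtimes, where the stack is captured at construction) one stack trace. A minimal sketch of an alternative, assuming per-request context is worth logging, stores factories instead of instances. The `errors` object and its method names are hypothetical, and `AppError` is repeated in compact form only so the block runs on its own.

```typescript
// Compact copy of AppError so this sketch is self-contained.
class AppError extends Error {
  constructor(
    public code: string,
    public userMessage: string,
    public statusCode: number = 500,
    public context?: Record<string, unknown>
  ) {
    super(userMessage);
    this.name = 'AppError';
  }
}

// Hypothetical factories: every call builds a fresh error with its own context.
const errors = {
  propertyNotFound: (context?: Record<string, unknown>) =>
    new AppError(
      'PROPERTY_NOT_FOUND',
      "We couldn't find that property. Please check the address.",
      404,
      context
    ),
  rateLimited: (context?: Record<string, unknown>) =>
    new AppError(
      'RATE_LIMITED',
      'Too many requests. Please wait a moment and try again.',
      429,
      context
    )
};

// Usage: attach details that are useful in logs.
const notFound = errors.propertyNotFound({ propertyId: 'prop_123' });
```

Because `handleError` only copies `code`, `userMessage`, and `requestId` into the response, anything placed in `context` stays on the server side unless a logger explicitly writes it out.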
Pattern 2: Retry with Exponential Backoff
retry.ts
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  maxDelayMs: number;
  retryableErrors: number[]; // HTTP status codes worth retrying
}

async function withRetry<T>(
  fn: () => Promise<T>,
  options: RetryOptions = {
    maxAttempts: 3,
    baseDelayMs: 100,
    maxDelayMs: 5000,
    retryableErrors: [408, 429, 500, 502, 503, 504]
  }
): Promise<T> {
  // Caught values are `unknown` in TypeScript, so keep that type here
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      // Don't retry non-retryable errors
      if (!isRetryable(error, options.retryableErrors)) {
        throw error;
      }
      // Don't delay on last attempt
      if (attempt < options.maxAttempts) {
        const delay = calculateDelay(attempt, options);
        await sleep(delay);
      }
    }
  }
  // All attempts failed with retryable errors: surface the last one
  throw lastError;
}

function calculateDelay(attempt: number, options: RetryOptions): number {
  // Exponential backoff: 100ms, 200ms, 400ms, 800ms... (for baseDelayMs = 100)
  const exponentialDelay = options.baseDelayMs * Math.pow(2, attempt - 1);
  // Add jitter (±25%) to prevent thundering herd
  const jitter = exponentialDelay * 0.25 * (Math.random() * 2 - 1);
  // Cap at max delay
  return Math.min(exponentialDelay + jitter, options.maxDelayMs);
}
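`withRetry` calls `isRetryable` and `sleep` without showing them. One possible shape is sketched here, assuming failed calls throw an error that carries a numeric HTTP `status`. The `HttpError` class is hypothetical and not part of the listing above; the real helpers may look different.

```typescript
// Hypothetical error type: wraps a failed HTTP call and keeps its status code.
class HttpError extends Error {
  constructor(public status: number, message: string = `HTTP ${status}`) {
    super(message);
    this.name = 'HttpError';
  }
}

// Resolve after `ms` milliseconds.
function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// An error is retryable only if it exposes a numeric status on the allow-list.
function isRetryable(error: unknown, retryableStatuses: number[]): boolean {
  if (typeof error !== 'object' || error === null) return false;
  const status = (error as { status?: unknown }).status;
  return typeof status === 'number' && retryableStatuses.includes(status);
}
```

Under this definition an error without a `status`, such as the `TypeError` that `fetch` throws on a network failure, is not retried. Whether that is the right call depends on the upstream service.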
| Status Code | Retry? | Reason |
|---|---|---|
| 400 Bad Request | ❌ No | Client error, won't change on retry |
| 401/403 Auth Error | ❌ No | Credentials won't magically become valid |
| 404 Not Found | ❌ No | Resource doesn't exist |
| 408 Timeout | ✅ Yes | Transient, may succeed on retry |
| 429 Rate Limited | ✅ Yes | Wait and retry (respect Retry-After) |
| 500 Server Error | ✅ Yes | Server may recover |
| 503 Unavailable | ✅ Yes | Temporary overload |
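Respecting `Retry-After` means reading the header before falling back to computed backoff. The HTTP specification allows two forms for its value: a non-negative number of seconds, or an HTTP date. A minimal parsing sketch follows; the function name `parseRetryAfter` is hypothetical, and how the result is combined with `calculateDelay` is left open.

```typescript
// Convert a Retry-After header value into a wait time in milliseconds.
// Returns null when the header is missing or cannot be parsed.
function parseRetryAfter(
  headerValue: string | null,
  nowMs: number = Date.now()
): number | null {
  if (headerValue === null) return null;
  const trimmed = headerValue.trim();

  // Form 1: delay in whole seconds, e.g. "120"
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;

  // A date in the past means "retry now"
  return Math.max(0, dateMs - nowMs);
}

// Usage: with a fetch Response this would be
// parseRetryAfter(response.headers.get('Retry-After'))
const waitMs = parseRetryAfter('120');
```

A caller could then wait for the larger of this value and the jittered backoff delay, so that the server's request is never undercut.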
Pattern 3: Graceful Degradation
graceful-degradation.ts
async function getPropertyValuation(propertyId: string, env: Env) {
  // Level 1: Try full AI valuation
  try {
    return await aiValuation(propertyId, env);
  } catch (e) {
    console.warn('AI valuation failed, trying algorithm', e);
  }

  // Level 2: Fall back to algorithmic valuation
  try {
    return await algorithmicValuation(propertyId, env);
  } catch (e) {
    console.warn('Algorithmic valuation failed, trying cache', e);
  }

  // Level 3: Return cached valuation (even if stale)
  // Guarded like the levels above, so a KV outage cannot skip the later fallbacks
  try {
    const cached = await env.KV.get(`valuation:${propertyId}`, 'json');
    if (cached) {
      return { ...cached, stale: true, disclaimer: 'Based on previous analysis' };
    }
  } catch (e) {
    console.warn('Cache lookup failed, trying comparables', e);
  }

  // Level 4: Return range estimate from comparable properties
  try {
    const comps = await getNearbyComps(propertyId, env);
    if (comps.length > 0) {
      const avg = comps.reduce((s, c) => s + c.price, 0) / comps.length;
      return {
        estimate: { low: avg * 0.9, high: avg * 1.1 },
        confidence: 'low',
        disclaimer: 'Estimate based on nearby properties'
      };
    }
  } catch (e) {
    console.warn('Comparables lookup failed', e);
  }

  // Level 5: Be honest about failure
  return {
    estimate: null,
    message: 'Unable to generate valuation. A specialist will contact you shortly.',
    needsManualReview: true
  };
}
User Trust Tip
When degrading, tell users what's different. "Based on previous analysis" is honest. Silently serving stale data erodes trust when users notice inconsistencies.
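The same ladder can also be expressed as data, which makes it straightforward to report which level actually answered. The sketch below is a generic version under stated assumptions: `Level`, `Served`, and `firstAvailable` are hypothetical names, and a level signals "nothing to offer" by resolving to `null`.

```typescript
interface Level<T> {
  name: string;
  run: () => Promise<T | null>;
}

interface Served<T> {
  value: T;
  source: string;    // which level produced the value
  degraded: boolean; // true when anything but the first level answered
}

// Try each level in order; a throw or a null result moves on to the next one.
async function firstAvailable<T>(levels: Level<T>[]): Promise<Served<T> | null> {
  for (let i = 0; i < levels.length; i++) {
    const level = levels[i];
    try {
      const value = await level.run();
      if (value !== null) {
        return { value, source: level.name, degraded: i > 0 };
      }
    } catch (error) {
      console.warn(`${level.name} failed, trying next level`, error);
    }
  }
  // Every level failed or had nothing: the caller decides how to say so
  return null;
}
```

A caller can use `degraded` and `source` to choose the disclaimer shown to the user, and treat a `null` result as the cue for the honest "manual review" response.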
Pattern 4: Error Boundaries
error-boundary.ts
// Wrap risky operations so they don't crash the whole request
async function safeExecute<T>(
  operation: () => Promise<T>,
  fallback: T,
  errorContext: string
): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    console.error(`${errorContext} failed`, error);
    return fallback;
  }
}

// Usage: Non-critical features fail silently
const response = {
  property: await getProperty(id), // Critical - let errors bubble
  // Non-critical - use fallbacks
  analytics: await safeExecute(
    () => getAnalytics(id),
    { views: 0, saves: 0 },
    'Analytics fetch'
  ),
  recommendations: await safeExecute(
    () => getRecommendations(id),
    [],
    'Recommendations fetch'
  ),
  aiInsights: await safeExecute(
    () => generateInsights(id),
    null,
    'AI insights'
  )
};
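The usage above awaits each non-critical call in turn. Since those calls do not depend on one another, they could also run concurrently. A sketch under that assumption follows; `loadExtras` and the stub data sources are hypothetical stand-ins, and `safeExecute` is repeated in compact form so the block runs on its own.

```typescript
// Compact copy of safeExecute so this sketch is self-contained.
async function safeExecute<T>(
  operation: () => Promise<T>,
  fallback: T,
  errorContext: string
): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    console.error(`${errorContext} failed`, error);
    return fallback;
  }
}

// Hypothetical stand-ins for the real data sources.
interface Analytics { views: number; saves: number; }
const getAnalytics = async (_id: string): Promise<Analytics> => ({ views: 12, saves: 3 });
const getRecommendations = async (_id: string): Promise<string[]> => {
  throw new Error('recommendation service down');
};

// Start both non-critical fetches at once and wait for both to settle.
async function loadExtras(id: string) {
  const [analytics, recommendations] = await Promise.all([
    safeExecute(() => getAnalytics(id), { views: 0, saves: 0 }, 'Analytics fetch'),
    safeExecute<string[]>(() => getRecommendations(id), [], 'Recommendations fetch')
  ]);
  return { analytics, recommendations };
}
```

`Promise.all` normally rejects as soon as one input rejects, but `safeExecute` never rejects, so a failing feature only produces its fallback and the others are unaffected.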
Error Handling Checklist
- All errors caught at request boundary
- User-facing messages are helpful, not technical
- Full error details logged with request context
- Request IDs included in error responses
- Retry logic with exponential backoff + jitter
- Graceful degradation for each critical feature
- Error boundaries around non-critical operations
- Monitoring for error rates and types
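The first and fourth checklist items can share one wrapper: generate a request ID, run the handler, and route anything it throws to a single error formatter. The sketch below is deliberately generic over request and response types; `withRequestBoundary` and `defaultRequestId` are hypothetical names, and the ID generator is a placeholder rather than a recommendation.

```typescript
type Handler<Req, Res> = (request: Req, requestId: string) => Promise<Res>;

// Placeholder ID generator; a real deployment would more likely use a UUID
// or reuse a trace ID supplied by the platform.
const defaultRequestId = (): string =>
  `req_${Math.random().toString(36).slice(2, 10)}`;

function withRequestBoundary<Req, Res>(
  handler: Handler<Req, Res>,
  onError: (error: unknown, requestId: string) => Res,
  newRequestId: () => string = defaultRequestId
): (request: Req) => Promise<Res> {
  return async (request: Req): Promise<Res> => {
    const requestId = newRequestId();
    try {
      return await handler(request, requestId);
    } catch (error) {
      // Last line of defense: nothing thrown by the handler escapes as a raw error
      return onError(error, requestId);
    }
  };
}
```

In a Worker, `Req` and `Res` would be `Request` and `Response`, and `onError` could be the `handleError` function from Rule 1, so every response, successful or not, can be tied back to one request ID in the logs.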
Error handling isn't about preventing failures; it's about maintaining user trust when failures happen. Every error is an opportunity to show users you're in control.