Reliability Production

Error Handling & Recovery Patterns

Graceful degradation, retry logic, user-friendly errors, and recovery strategies for systems that fail gracefully.

📖 11 min read · January 24, 2026

Every system fails. The difference between good and great systems isn't whether they fail; it's how they recover. A well-handled error builds trust. A raw stack trace destroys it.

Here's how we handle errors across 28 Workers processing millions of requests monthly.

Rule 1: Never Expose Raw Errors

โŒ Bad
{ "error": "TypeError: Cannot read property 'address' of undefined at getProperty (/src/handlers.js:142:23)" }
โœ“ Good
{ "error": { "code": "PROPERTY_NOT_FOUND", "message": "We couldn't find that property. Please check the address and try again.", "requestId": "req_abc123" } }
error-handler.ts

    // Application error with a stable code and a message that is safe to show users.
    class AppError extends Error {
      constructor(
        public code: string,
        public userMessage: string,
        public statusCode: number = 500,
        public context?: Record<string, unknown>
      ) {
        super(userMessage);
        this.name = 'AppError';
      }
    }

    const ERRORS = {
      PROPERTY_NOT_FOUND: new AppError(
        'PROPERTY_NOT_FOUND',
        "We couldn't find that property. Please check the address.",
        404
      ),
      RATE_LIMITED: new AppError(
        'RATE_LIMITED',
        'Too many requests. Please wait a moment and try again.',
        429
      ),
      AI_UNAVAILABLE: new AppError(
        'AI_UNAVAILABLE',
        'Our AI assistant is temporarily busy. Your message has been saved.',
        503
      )
    };

    function handleError(error: unknown, requestId: string): Response {
      const headers = { 'Content-Type': 'application/json' };

      // Known application error: return its code and user-facing message
      if (error instanceof AppError) {
        return new Response(JSON.stringify({
          error: {
            code: error.code,
            message: error.userMessage,
            requestId
          }
        }), { status: error.statusCode, headers });
      }

      // Unknown error: log full details, return a generic message
      console.error('Unhandled error', { error, requestId });
      return new Response(JSON.stringify({
        error: {
          code: 'INTERNAL_ERROR',
          message: 'Something went wrong. Please try again or contact support.',
          requestId
        }
      }), { status: 500, headers });
    }
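AppError accepts a `context` argument, but shared constants cannot carry per-request details. One possible way to use it, sketched here under assumptions, is to create a fresh error per failure and keep two views of it: a public body for the client and a fuller entry for the logs. The names `propertyNotFound`, `toPublicBody`, and `toLogEntry` are hypothetical and not part of the handler above.

```typescript
// Minimal copy of the article's AppError so this sketch is self-contained.
class AppError extends Error {
  readonly code: string;
  readonly userMessage: string;
  readonly statusCode: number;
  readonly context: Record<string, unknown> | undefined;

  constructor(
    code: string,
    userMessage: string,
    statusCode: number = 500,
    context?: Record<string, unknown>
  ) {
    super(userMessage);
    // Keeps `instanceof AppError` working even when compiled to ES5.
    Object.setPrototypeOf(this, AppError.prototype);
    this.code = code;
    this.userMessage = userMessage;
    this.statusCode = statusCode;
    this.context = context;
  }
}

// Hypothetical factory: a fresh instance per failure, with debugging context attached.
function propertyNotFound(address: string): AppError {
  return new AppError(
    'PROPERTY_NOT_FOUND',
    "We couldn't find that property. Please check the address.",
    404,
    { address }
  );
}

interface PublicErrorBody {
  error: { code: string; message: string; requestId: string };
}

// What the client sees: code, friendly message, request ID. Never context or stack.
function toPublicBody(error: unknown, requestId: string): PublicErrorBody {
  if (error instanceof AppError) {
    return { error: { code: error.code, message: error.userMessage, requestId } };
  }
  return {
    error: {
      code: 'INTERNAL_ERROR',
      message: 'Something went wrong. Please try again or contact support.',
      requestId
    }
  };
}

interface ErrorLogEntry {
  requestId: string;
  code: string;
  detail: string;
  context: Record<string, unknown> | undefined;
  stack: string | undefined;
}

// What the logs see: the details needed to debug, keyed by the same request ID.
function toLogEntry(error: unknown, requestId: string): ErrorLogEntry {
  if (error instanceof AppError) {
    return {
      requestId,
      code: error.code,
      detail: error.message,
      context: error.context,
      stack: error.stack
    };
  }
  const stack = error instanceof Error ? error.stack : undefined;
  return { requestId, code: 'INTERNAL_ERROR', detail: String(error), context: undefined, stack };
}
```

The request ID is the only value that appears in both views, which lets support match a user's report to the full log entry without the response revealing anything internal.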

Pattern 2: Retry with Exponential Backoff

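retry.ts

    interface RetryOptions {
      maxAttempts: number;
      baseDelayMs: number;
      maxDelayMs: number;
      retryableErrors: number[]; // HTTP status codes worth retrying
    }

    async function withRetry<T>(
      fn: () => Promise<T>,
      options: RetryOptions = {
        maxAttempts: 3,
        baseDelayMs: 100,
        maxDelayMs: 5000,
        retryableErrors: [408, 429, 500, 502, 503, 504]
      }
    ): Promise<T> {
      let lastError: unknown;

      for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
        try {
          return await fn();
        } catch (error) {
          lastError = error;

          // Don't retry non-retryable errors
          if (!isRetryable(error, options.retryableErrors)) {
            throw error;
          }

          // Don't delay after the last attempt
          if (attempt < options.maxAttempts) {
            const delay = calculateDelay(attempt, options);
            await sleep(delay);
          }
        }
      }

      throw lastError;
    }

    function calculateDelay(attempt: number, options: RetryOptions): number {
      // Exponential backoff: 100ms, 200ms, 400ms, 800ms...
      const exponentialDelay = options.baseDelayMs * Math.pow(2, attempt - 1);

      // Add jitter (±25%) to prevent thundering herd
      const jitter = exponentialDelay * 0.25 * (Math.random() * 2 - 1);

      // Cap at max delay
      return Math.min(exponentialDelay + jitter, options.maxDelayMs);
    }

    // Assumption: failures carry an HTTP status in a `status` or `statusCode`
    // property (AppError uses `statusCode`). Anything else is not retried.
    function isRetryable(error: unknown, retryableErrors: number[]): boolean {
      const { status, statusCode } = (error ?? {}) as { status?: unknown; statusCode?: unknown };
      const code = typeof status === 'number' ? status : statusCode;
      return typeof code === 'number' && retryableErrors.includes(code);
    }

    function sleep(ms: number): Promise<void> {
      return new Promise((resolve) => setTimeout(resolve, ms));
    }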
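To make the backoff schedule concrete, the sketch below recomputes the same formula as `calculateDelay` with the random source passed in, so the bounds can be checked deterministically. `backoffDelay` and `backoffBounds` are hypothetical helper names used only for this illustration.

```typescript
// Same formula as calculateDelay, with an injectable random source.
// `random` is expected to return a value in [0, 1), like Math.random.
function backoffDelay(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number,
  random: () => number = Math.random
): number {
  const exponentialDelay = baseDelayMs * Math.pow(2, attempt - 1);
  const jitter = exponentialDelay * 0.25 * (random() * 2 - 1);
  return Math.min(exponentialDelay + jitter, maxDelayMs);
}

// Lower bound and upper limit of the delay for a given attempt.
// The upper limit is only reached exactly when the cap applies, because random() < 1.
function backoffBounds(
  attempt: number,
  baseDelayMs: number,
  maxDelayMs: number
): [number, number] {
  const low = backoffDelay(attempt, baseDelayMs, maxDelayMs, () => 0);
  const high = backoffDelay(attempt, baseDelayMs, maxDelayMs, () => 1);
  return [low, high];
}

// With baseDelayMs = 100 and maxDelayMs = 5000:
//   backoffBounds(1, 100, 5000) -> [75, 125]
//   backoffBounds(2, 100, 5000) -> [150, 250]
//   backoffBounds(3, 100, 5000) -> [300, 500]
//   backoffBounds(7, 100, 5000) -> [4800, 5000]  (6400 ms nominal, capped at 5000)
```

With the default `maxAttempts` of 3, `withRetry` sleeps only after attempts 1 and 2, so retries add less than 375 ms of waiting in the worst case.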
| Status Code        | Retry? | Reason                                   |
|--------------------|--------|------------------------------------------|
| 400 Bad Request    | ❌ No  | Client error, won't change on retry      |
| 401/403 Auth Error | ❌ No  | Credentials won't magically become valid |
| 404 Not Found      | ❌ No  | Resource doesn't exist                   |
| 408 Timeout        | ✓ Yes  | Transient, may succeed on retry          |
| 429 Rate Limited   | ✓ Yes  | Wait and retry (respect Retry-After)     |
| 500 Server Error   | ✓ Yes  | Server may recover                       |
| 503 Unavailable    | ✓ Yes  | Temporary overload                       |
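The table says to respect Retry-After on a 429, which the retry helper does not do on its own. A minimal sketch of one way to honor it follows. `retryAfterMs` and `delayFor429` are hypothetical names, and `Date.parse` is more lenient than the HTTP date grammar, so treat the parsing as an approximation.

```typescript
// Parse a Retry-After header value into milliseconds to wait.
// The header is either a whole number of seconds or an HTTP date.
// Returns null when the header is missing or cannot be parsed.
function retryAfterMs(header: string | null, nowMs: number = Date.now()): number | null {
  if (header === null) return null;
  const value = header.trim();
  if (value === '') return null;

  // Form 1: delay in seconds, e.g. "120"
  if (/^\d+$/.test(value)) {
    return Number(value) * 1000;
  }

  // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const dateMs = Date.parse(value);
  if (isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}

// Hypothetical policy for combining the server's hint with our own backoff:
// - no hint: use the backoff delay
// - hint longer than the cap: return null, meaning "do not retry in this request"
// - otherwise: wait at least as long as the server asked
function delayFor429(
  header: string | null,
  backoffMs: number,
  maxDelayMs: number
): number | null {
  const serverHint = retryAfterMs(header);
  if (serverHint === null) return Math.min(backoffMs, maxDelayMs);
  if (serverHint > maxDelayMs) return null;
  return Math.min(Math.max(serverHint, backoffMs), maxDelayMs);
}
```

Returning `null` instead of clamping the hint to `maxDelayMs` is a deliberate choice in this sketch: retrying earlier than the server asked would likely produce another 429, so it may be better to surface RATE_LIMITED to the caller.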

Pattern 3: Graceful Degradation

graceful-degradation.ts

    async function getPropertyValuation(propertyId: string, env: Env) {
      // Level 1: Try full AI valuation
      try {
        return await aiValuation(propertyId, env);
      } catch (e) {
        console.warn('AI valuation failed, trying algorithm', e);
      }

      // Level 2: Fall back to algorithmic valuation
      try {
        return await algorithmicValuation(propertyId, env);
      } catch (e) {
        console.warn('Algorithmic valuation failed, trying cache', e);
      }

      // Level 3: Return cached valuation (even if stale)
      try {
        const cached = await env.KV.get(`valuation:${propertyId}`, 'json');
        if (cached) {
          return {
            ...cached,
            stale: true,
            disclaimer: 'Based on previous analysis'
          };
        }
      } catch (e) {
        console.warn('Cache lookup failed, trying comparable properties', e);
      }

      // Level 4: Return range estimate from comparable properties
      try {
        const comps = await getNearbyComps(propertyId, env);
        if (comps.length > 0) {
          const avg = comps.reduce((s, c) => s + c.price, 0) / comps.length;
          return {
            estimate: { low: avg * 0.9, high: avg * 1.1 },
            confidence: 'low',
            disclaimer: 'Estimate based on nearby properties'
          };
        }
      } catch (e) {
        console.warn('Comparable lookup failed', e);
      }

      // Level 5: Be honest about failure
      return {
        estimate: null,
        message: 'Unable to generate valuation. A specialist will contact you shortly.',
        needsManualReview: true
      };
    }
User Trust Tip
When degrading, tell users what's different. "Based on previous analysis" is honest. Silently serving stale data erodes trust when users notice inconsistencies.
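The same ladder can be expressed as a small generic helper that records which level answered, so the caller can always tell users when a result is degraded. This is a sketch under assumptions: `firstAvailable`, `Strategy`, and `Outcome` are hypothetical names, and a strategy resolving `null` is taken to mean "nothing available at this level".

```typescript
interface Strategy<T> {
  source: string;                // label for logs and disclaimers, e.g. 'ai' or 'cache'
  run: () => Promise<T | null>;  // resolve null when this level has nothing to offer
}

interface Outcome<T> {
  value: T | null;     // null when every strategy failed
  source: string | null;
  degraded: boolean;   // true unless the first strategy answered
  failures: string[];  // sources that threw or returned null, in order
}

// Try each strategy in order and return the first non-null result.
async function firstAvailable<T>(strategies: Strategy<T>[]): Promise<Outcome<T>> {
  const failures: string[] = [];
  let index = 0;

  for (const strategy of strategies) {
    try {
      const value = await strategy.run();
      if (value !== null) {
        return { value, source: strategy.source, degraded: index > 0, failures };
      }
    } catch {
      // A thrown error is treated like "nothing available"; fall through to the next level.
    }
    failures.push(strategy.source);
    index++;
  }

  // Every level failed: report that honestly instead of throwing.
  return { value: null, source: null, degraded: true, failures };
}
```

A caller could attach a disclaimer whenever `degraded` is true and log `failures` alongside the request ID, which keeps the "tell users what's different" rule in one place.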

Pattern 4: Error Boundaries

error-boundary.ts

    // Wrap risky operations so they don't crash the whole request
    async function safeExecute<T>(
      operation: () => Promise<T>,
      fallback: T,
      errorContext: string
    ): Promise<T> {
      try {
        return await operation();
      } catch (error) {
        console.error(`${errorContext} failed`, error);
        return fallback;
      }
    }

    // Usage: Non-critical features fail silently
    const response = {
      property: await getProperty(id), // Critical - let errors bubble

      // Non-critical - use fallbacks
      analytics: await safeExecute(
        () => getAnalytics(id),
        { views: 0, saves: 0 },
        'Analytics fetch'
      ),
      recommendations: await safeExecute(
        () => getRecommendations(id),
        [],
        'Recommendations fetch'
      ),
      aiInsights: await safeExecute(
        () => generateInsights(id),
        null,
        'AI insights'
      )
    };
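Because `safeExecute` never rejects, the non-critical calls do not have to run one after another. The sketch below starts them together with `Promise.all`. It restates `safeExecute` so it is self-contained and routes logging through an injectable `report` callback. Both `report` and `loadExtras`, along with the `Analytics` shape, are assumptions made for illustration.

```typescript
// Same contract as safeExecute above: never rejects, returns the fallback on failure.
// `report` is a hypothetical hook standing in for console.error or a real logger.
async function safeExecute<T>(
  operation: () => Promise<T>,
  fallback: T,
  errorContext: string,
  report: (context: string, error: unknown) => void = () => {}
): Promise<T> {
  try {
    return await operation();
  } catch (error) {
    report(errorContext, error);
    return fallback;
  }
}

// Hypothetical data shape, for illustration only.
interface Analytics {
  views: number;
  saves: number;
}

async function loadExtras(
  getAnalytics: () => Promise<Analytics>,
  getRecommendations: () => Promise<string[]>,
  report?: (context: string, error: unknown) => void
): Promise<{ analytics: Analytics; recommendations: string[] }> {
  // Both calls start before either is awaited. Promise.all cannot reject here
  // because safeExecute converts every failure into its fallback.
  const [analytics, recommendations] = await Promise.all([
    safeExecute(getAnalytics, { views: 0, saves: 0 }, 'Analytics fetch', report),
    safeExecute(getRecommendations, [], 'Recommendations fetch', report)
  ]);
  return { analytics, recommendations };
}
```

The critical call (`getProperty` in the example above) would stay outside this helper so its errors still bubble up to the request-level handler.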

Error Handling Checklist

  • All errors caught at request boundary
  • User-facing messages are helpful, not technical
  • Full error details logged with request context
  • Request IDs included in error responses
  • Retry logic with exponential backoff + jitter
  • Graceful degradation for each critical feature
  • Error boundaries around non-critical operations
  • Monitoring for error rates and types

Error handling isn't about preventing failures; it's about maintaining user trust when failures happen. Every error is an opportunity to show users you're in control.
