
Designing for Model Failure:
AI System Resilience Patterns

What happens when Claude or OpenAI goes down? Production patterns for fallback chains, graceful degradation, and keeping your AI-powered system running.

📖 12 min read · January 24, 2026

AI APIs fail. Claude has outages. OpenAI returns 429s. Every provider has maintenance windows. If your production system assumes 100% availability, you're building a time bomb.

This article covers the resilience architecture we run in production to handle AI failures gracefully, without customer impact.

The Failure Modes

Before building resilience, understand what can go wrong:

  • 🚫 Complete Outage: API returns 500/503
  • ⏱️ Rate Limiting: 429 Too Many Requests
  • 🐌 Degraded Performance: responses taking 30s+
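The first step toward handling these modes is telling them apart at the call site. A minimal sketch of an error classifier (the `classifyFailure` helper and its thresholds are illustrative, not from the original system):

```typescript
// Hypothetical helper: map an AI API error onto the failure modes above.
type FailureMode = 'outage' | 'rate-limit' | 'degraded' | 'unknown';

function classifyFailure(status: number | null, latencyMs: number): FailureMode {
  if (status === 429) return 'rate-limit';               // Too Many Requests
  if (status === 500 || status === 503) return 'outage'; // provider down
  if (latencyMs > 30_000) return 'degraded';             // responses taking 30s+
  return 'unknown';
}
```

Rate limits and outages usually deserve different handling: a 429 often clears in seconds, while a 503 justifies tripping a circuit breaker (Pattern 2).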

Pattern 1: The Fallback Chain

Never depend on a single AI provider. Build a chain of fallbacks:

AI Provider Fallback Chain

🟢 Claude (Primary) → 🟡 GPT-4 (Fallback 1) → 🟡 Gemini (Fallback 2) → 🔴 Cached/Static (Last Resort)
ai-fallback.ts
```typescript
interface AIProvider {
  name: string;
  complete: (prompt: string) => Promise<string>;
  isHealthy: () => Promise<boolean>;
}

const providers: AIProvider[] = [
  { name: 'claude', complete: claudeComplete, isHealthy: claudeHealth },
  { name: 'openai', complete: openaiComplete, isHealthy: openaiHealth },
  { name: 'gemini', complete: geminiComplete, isHealthy: geminiHealth },
];

export async function aiComplete(prompt: string): Promise<string> {
  for (const provider of providers) {
    try {
      // Skip unhealthy providers
      if (!await provider.isHealthy()) {
        console.log(`Skipping ${provider.name}: unhealthy`);
        continue;
      }

      const result = await withTimeout(
        provider.complete(prompt),
        10000 // 10 second timeout
      );

      // Log which provider succeeded
      await logProviderUsage(provider.name, 'success');
      return result;
    } catch (error) {
      console.error(`${provider.name} failed:`, error);
      await logProviderUsage(provider.name, 'failure');
      // Continue to next provider
    }
  }

  // All providers failed - return cached/static response
  return getCachedResponse(prompt);
}
```
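The `withTimeout` helper used above isn't shown; a minimal sketch of one plausible implementation:

```typescript
// Sketch: reject if the underlying call doesn't settle within `ms` milliseconds.
// The timer is cleared on settle so it doesn't leak.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}
```

Without this, a hung provider would stall the whole chain; with it, each provider gets at most its timeout budget before the loop moves on.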

Pattern 2: Circuit Breaker

Don't keep hammering a failing service. Implement circuit breakers that "trip" after repeated failures:

circuit-breaker.ts
```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: CircuitState = 'closed';
  private failures = 0;
  private lastFailure = 0;
  private threshold = 5;   // Open after 5 failures
  private timeout = 30000; // Try again after 30s

  async call<T>(fn: () => Promise<T>): Promise<T> {
    // If open, check if we should try again
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'open';
    }
  }
}
```
Why Circuit Breakers Matter
Without circuit breakers, a failing provider consumes your timeout budget on every request. With circuit breakers, you fail fast and skip to healthy providers immediately. Response time during outages drops from 10+ seconds to milliseconds.
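Patterns 1 and 2 compose naturally: give each provider its own breaker and wrap the completion call, so an open breaker throws instantly and the fallback loop moves on. A self-contained sketch (the inlined `CircuitBreaker` is a trimmed stand-in for the class above; `completeWithBreaker` is a hypothetical helper):

```typescript
// Trimmed stand-in for the CircuitBreaker class above, so this sketch runs on its own.
class CircuitBreaker {
  private failures = 0;
  private open = false;
  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.open) throw new Error('Circuit breaker is open');
    try {
      const result = await fn();
      this.failures = 0;
      return result;
    } catch (e) {
      if (++this.failures >= 5) this.open = true; // trip after 5 failures
      throw e;
    }
  }
}

// One breaker per provider, created lazily.
const breakers = new Map<string, CircuitBreaker>();
function breakerFor(name: string): CircuitBreaker {
  let b = breakers.get(name);
  if (!b) { b = new CircuitBreaker(); breakers.set(name, b); }
  return b;
}

// Wrap each completion call; an open breaker throws immediately,
// so the fallback chain skips the provider in milliseconds.
async function completeWithBreaker(
  name: string,
  complete: (prompt: string) => Promise<string>,
  prompt: string
): Promise<string> {
  return breakerFor(name).call(() => complete(prompt));
}
```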

Pattern 3: Response Caching

Many AI requests are repetitive. Cache responses to reduce API calls and provide instant fallbacks:

ai-cache.ts
```typescript
async function cachedAIComplete(
  prompt: string,
  env: Env
): Promise<string> {
  // Generate cache key from prompt hash
  const cacheKey = `ai:${await hashPrompt(prompt)}`;

  // Check cache first
  const cached = await env.KV.get(cacheKey);
  if (cached) {
    await logMetric('ai_cache_hit');
    return cached;
  }

  // Cache miss - call AI
  try {
    const response = await aiComplete(prompt);

    // Cache for 1 hour, plus a longer-lived stale copy for outage fallback
    await env.KV.put(cacheKey, response, { expirationTtl: 3600 });
    await env.KV.put(cacheKey + ':stale', response, { expirationTtl: 86400 });

    return response;
  } catch (error) {
    // If AI fails, try to return stale cache
    const stale = await env.KV.get(cacheKey + ':stale');
    if (stale) {
      await logMetric('ai_stale_cache_used');
      return stale;
    }
    throw error;
  }
}

// Hash prompt for consistent cache keys
async function hashPrompt(prompt: string): Promise<string> {
  const encoder = new TextEncoder();
  const data = encoder.encode(prompt);
  const hash = await crypto.subtle.digest('SHA-256', data);
  return btoa(String.fromCharCode(...new Uint8Array(hash))).slice(0, 16);
}
```

Pattern 4: Graceful Degradation

When AI fails, don't crash. Degrade gracefully to a simpler experience:

graceful-degradation.ts
```typescript
interface ChatResponse {
  message: string;
  source: 'ai' | 'template' | 'fallback';
  degraded: boolean;
}

async function handleChat(userMessage: string): Promise<ChatResponse> {
  // Try AI first
  try {
    const aiResponse = await aiComplete(userMessage);
    return { message: aiResponse, source: 'ai', degraded: false };
  } catch (error) {
    console.error('AI failed, trying templates');
  }

  // Try template matching
  const template = matchTemplate(userMessage);
  if (template) {
    return { message: template, source: 'template', degraded: true };
  }

  // Last resort: generic fallback
  return {
    message: `Thanks for your message. Our AI assistant is temporarily unavailable. Please call us at 1-888-784-3881 or try again in a few minutes.`,
    source: 'fallback',
    degraded: true
  };
}

// Pre-defined templates for common questions
function matchTemplate(message: string): string | null {
  const templates: Record<string, string> = {
    'pricing': 'Our services start at $999. Visit /pricing for details.',
    'contact': 'You can reach us at 1-888-784-3881 or /contact.',
    'hours': 'We respond to inquiries 24/7 via our AI system.',
  };

  for (const [keyword, response] of Object.entries(templates)) {
    if (message.toLowerCase().includes(keyword)) {
      return response;
    }
  }
  return null;
}
```
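The `degraded` flag is there so the caller can surface a notice rather than silently serving canned text. A hypothetical rendering helper (`renderChat` and its banner wording are illustrative):

```typescript
// Same shape as the ChatResponse returned by handleChat above.
interface ChatResponse {
  message: string;
  source: 'ai' | 'template' | 'fallback';
  degraded: boolean;
}

// Prepend a notice whenever the response came from a fallback path,
// so users see reduced quality, not a broken product.
function renderChat(response: ChatResponse): string {
  const banner = response.degraded
    ? '[Limited mode: our AI assistant is temporarily unavailable]\n'
    : '';
  return banner + response.message;
}
```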

Pattern 5: Health Monitoring

Proactively monitor provider health instead of discovering failures on user requests:

health-monitor.ts
```typescript
// Run every minute via Cron Trigger
export async function scheduled(event: ScheduledEvent, env: Env) {
  const providers = ['claude', 'openai', 'gemini'];

  for (const provider of providers) {
    const health = await checkProviderHealth(provider);

    // Store health status in KV
    await env.KV.put(`health:${provider}`, JSON.stringify({
      healthy: health.ok,
      latency: health.latency,
      checkedAt: Date.now()
    }), { expirationTtl: 300 });

    // Alert if provider is down
    if (!health.ok) {
      await sendSlackAlert(`⚠️ ${provider} is unhealthy: ${health.error}`);
    }
  }
}

async function checkProviderHealth(provider: string) {
  const start = Date.now();
  try {
    await testCompletion(provider);
    return { ok: true, latency: Date.now() - start };
  } catch (error) {
    return {
      ok: false,
      error: (error as Error).message,
      latency: Date.now() - start
    };
  }
}
```
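The `testCompletion` probe called above isn't shown. One plausible sketch: send a tiny fixed prompt to each provider and treat any non-2xx response as unhealthy (the endpoint map and request shape here are placeholders, not real provider APIs):

```typescript
// Placeholder endpoints -- substitute real provider clients or proxy routes.
const PROBE_ENDPOINTS: Record<string, string> = {
  claude: 'https://example.com/probe/claude',
  openai: 'https://example.com/probe/openai',
  gemini: 'https://example.com/probe/gemini',
};

async function testCompletion(provider: string): Promise<void> {
  const url = PROBE_ENDPOINTS[provider];
  if (!url) throw new Error(`Unknown provider: ${provider}`);

  // A tiny fixed prompt keeps the probe cheap; any non-2xx counts as unhealthy.
  const res = await fetch(url, {
    method: 'POST',
    body: JSON.stringify({ prompt: 'ping', max_tokens: 1 }),
  });
  if (!res.ok) throw new Error(`${provider} probe failed: HTTP ${res.status}`);
}
```

Using a fixed prompt also keeps probe latency comparable across checks, which makes the stored `latency` field meaningful over time.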

Implementation Checklist

  • Multiple AI providers configured with fallback order
  • Circuit breakers on each provider
  • Response caching with stale-while-revalidate
  • Graceful degradation to templates/static responses
  • Health monitoring with alerting
  • Timeouts on all AI calls (10s recommended)
  • Metrics logging for provider usage and failures

The goal isn't 100% AI uptime; it's 100% system uptime. AI is a feature, not a dependency. When it fails, users should notice quality degradation, not system failure.

Related Articles

Building AI Chatbots That Actually Convert
Real-Time Data Pipelines at the Edge
API Gateway Patterns at the Edge

Need Help Building Resilient AI Systems?

We architect production AI systems that handle failure gracefully.

→ Get Architecture Help