Designing for Model Failure:
AI System Resilience Patterns
What happens when Claude or OpenAI goes down? Production patterns for fallback chains, graceful degradation, and keeping your AI-powered system running.
AI APIs fail. Claude has outages. OpenAI rate-limits you. Anthropic schedules maintenance windows. If your production system assumes 100% availability, you're building a time bomb.
This is the resilience architecture we run in production to handle AI failures gracefully, without customer impact.
The Failure Modes
Before building resilience, understand what can go wrong:
- Hard outages: the provider's API is down and every call fails
- Rate limiting: 429 responses during traffic spikes or when you hit quota
- Slow responses: requests that hang long enough that you need a timeout
- Maintenance windows: planned downtime you don't control
Pattern 1: The Fallback Chain
Never depend on a single AI provider. Build a chain of fallbacks:
interface AIProvider {
  name: string;
  complete: (prompt: string) => Promise<string>;
  isHealthy: () => Promise<boolean>;
}

const providers: AIProvider[] = [
  { name: 'claude', complete: claudeComplete, isHealthy: claudeHealth },
  { name: 'openai', complete: openaiComplete, isHealthy: openaiHealth },
  { name: 'gemini', complete: geminiComplete, isHealthy: geminiHealth },
];

export async function aiComplete(prompt: string): Promise<string> {
  for (const provider of providers) {
    try {
      // Skip unhealthy providers
      if (!(await provider.isHealthy())) {
        console.log(`Skipping ${provider.name}: unhealthy`);
        continue;
      }

      const result = await withTimeout(
        provider.complete(prompt),
        10000 // 10 second timeout
      );

      // Log which provider succeeded
      await logProviderUsage(provider.name, 'success');
      return result;
    } catch (error) {
      console.error(`${provider.name} failed:`, error);
      await logProviderUsage(provider.name, 'failure');
      // Continue to next provider
    }
  }

  // All providers failed - return cached/static response
  return getCachedResponse(prompt);
}
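The withTimeout helper above is referenced but not shown, and it isn't part of any SDK here. A minimal sketch of one way to implement it:
// Minimal sketch of a timeout wrapper (assumed helper, not from an SDK).
// Rejects if the wrapped promise doesn't settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms
    );
    promise
      .then((value) => {
        clearTimeout(timer);
        resolve(value);
      })
      .catch((err) => {
        clearTimeout(timer);
        reject(err);
      });
  });
}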
Pattern 2: Circuit Breaker
Don't keep hammering a failing service. Implement circuit breakers that "trip" after repeated failures:
type CircuitState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: CircuitState = 'closed';
  private failures = 0;
  private lastFailure = 0;
  private threshold = 5;   // Open after 5 failures
  private timeout = 30000; // Try again after 30s

  async call<T>(fn: () => Promise<T>): Promise<T> {
    // If open, check if we should try again
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'open';
    }
  }
}
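One way to wire this into Pattern 1's fallback chain is to keep one breaker per provider and wrap each call. The wiring below is a sketch; the breakers map and getBreaker helper are illustrative, not from the original code:
// Sketch: one CircuitBreaker per provider, keyed by provider name (assumed wiring)
const breakers = new Map<string, CircuitBreaker>();

function getBreaker(name: string): CircuitBreaker {
  let breaker = breakers.get(name);
  if (!breaker) {
    breaker = new CircuitBreaker();
    breakers.set(name, breaker);
  }
  return breaker;
}

// Inside the fallback loop, the provider call becomes:
// const result = await getBreaker(provider.name).call(() =>
//   withTimeout(provider.complete(prompt), 10000)
// );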
Pattern 3: Response Caching
Many AI requests are repetitive. Cache responses to reduce API calls and provide instant fallbacks:
async function cachedAIComplete(
  prompt: string,
  env: Env
): Promise<string> {
  // Generate cache key from prompt hash
  const cacheKey = `ai:${await hashPrompt(prompt)}`;

  // Check cache first
  const cached = await env.KV.get(cacheKey);
  if (cached) {
    await logMetric('ai_cache_hit');
    return cached;
  }

  // Cache miss - call AI
  try {
    const response = await aiComplete(prompt);

    // Cache for 1 hour, plus a longer-lived stale copy for outage fallback
    await env.KV.put(cacheKey, response, {
      expirationTtl: 3600
    });
    await env.KV.put(cacheKey + ':stale', response, {
      expirationTtl: 86400 // 24 hours
    });

    return response;
  } catch (error) {
    // If AI fails, try to return stale cache
    const stale = await env.KV.get(cacheKey + ':stale');
    if (stale) {
      await logMetric('ai_stale_cache_used');
      return stale;
    }
    throw error;
  }
}

// Hash prompt for consistent cache keys
async function hashPrompt(prompt: string): Promise<string> {
  const encoder = new TextEncoder();
  const data = encoder.encode(prompt);
  const hash = await crypto.subtle.digest('SHA-256', data);
  return btoa(String.fromCharCode(...new Uint8Array(hash))).slice(0, 16);
}
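For context, Env and env.KV above are standard Cloudflare Workers KV bindings. A minimal sketch of how this plugs into a Worker (the binding name and request shape here are illustrative):
// Sketch of the Workers setup assumed above (binding name is illustrative)
interface Env {
  KV: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = (await request.json()) as { prompt: string };
    const answer = await cachedAIComplete(prompt, env);
    return new Response(JSON.stringify({ answer }), {
      headers: { 'Content-Type': 'application/json' },
    });
  },
};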
Pattern 4: Graceful Degradation
When AI fails, don't crash. Degrade gracefully to a simpler experience:
interface ChatResponse {
  message: string;
  source: 'ai' | 'template' | 'fallback';
  degraded: boolean;
}

async function handleChat(userMessage: string): Promise<ChatResponse> {
  // Try AI first
  try {
    const aiResponse = await aiComplete(userMessage);
    return { message: aiResponse, source: 'ai', degraded: false };
  } catch (error) {
    console.error('AI failed, trying templates');
  }

  // Try template matching
  const template = matchTemplate(userMessage);
  if (template) {
    return { message: template, source: 'template', degraded: true };
  }

  // Last resort: generic fallback
  return {
    message: `Thanks for your message. Our AI assistant is temporarily unavailable. Please call us at 1-888-784-3881 or try again in a few minutes.`,
    source: 'fallback',
    degraded: true
  };
}

// Pre-defined templates for common questions
function matchTemplate(message: string): string | null {
  const templates: Record<string, string> = {
    'pricing': 'Our services start at $999. Visit /pricing for details.',
    'contact': 'You can reach us at 1-888-784-3881 or /contact.',
    'hours': 'We respond to inquiries 24/7 via our AI system.',
  };

  for (const [keyword, response] of Object.entries(templates)) {
    if (message.toLowerCase().includes(keyword)) {
      return response;
    }
  }

  return null;
}
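The degraded flag exists so the frontend can react, for example by showing a notice that answers may be limited. One way to surface it is as a response header; the endpoint and header name below are illustrative, not from the original code:
// Sketch: expose the degraded flag to the client (endpoint and header name are illustrative)
async function chatEndpoint(request: Request): Promise<Response> {
  const { message } = (await request.json()) as { message: string };
  const result = await handleChat(message);

  return new Response(JSON.stringify(result), {
    headers: {
      'Content-Type': 'application/json',
      'X-AI-Degraded': result.degraded ? 'true' : 'false',
    },
  });
}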
Pattern 5: Health Monitoring
Proactively monitor provider health instead of discovering failures on user requests:
// Run every minute via Cron Trigger
export async function scheduled(event: ScheduledEvent, env: Env) {
  const providers = ['claude', 'openai', 'gemini'];

  for (const provider of providers) {
    const health = await checkProviderHealth(provider);

    // Store health status in KV
    await env.KV.put(`health:${provider}`, JSON.stringify({
      healthy: health.ok,
      latency: health.latency,
      checkedAt: Date.now()
    }), { expirationTtl: 300 });

    // Alert if provider is down
    if (!health.ok) {
      await sendSlackAlert(`⚠️ ${provider} is unhealthy: ${health.error}`);
    }
  }
}

async function checkProviderHealth(provider: string) {
  const start = Date.now();
  try {
    await testCompletion(provider);
    return { ok: true, latency: Date.now() - start };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: message, latency: Date.now() - start };
  }
}
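This is also where Pattern 1's isHealthy functions can come from: rather than probing a provider on every user request, they can read the health record the cron job just wrote. A sketch of that wiring (the makeHealthCheck helper is illustrative, not from the original code):
// Sketch: provider health check backed by the KV records written above (assumed wiring)
function makeHealthCheck(provider: string, env: Env) {
  return async (): Promise<boolean> => {
    const raw = await env.KV.get(`health:${provider}`);
    if (!raw) return true; // No recent check recorded: assume healthy rather than skip the provider
    const status = JSON.parse(raw) as { healthy: boolean; latency: number; checkedAt: number };
    return status.healthy;
  };
}

// e.g. const claudeHealth = makeHealthCheck('claude', env);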
Implementation Checklist
- Multiple AI providers configured with fallback order
- Circuit breakers on each provider
- Response caching with stale-while-revalidate
- Graceful degradation to templates/static responses
- Health monitoring with alerting
- Timeouts on all AI calls (10s recommended)
- Metrics logging for provider usage and failures (see the sketch below)
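The logging helpers used throughout (logProviderUsage, logMetric) aren't shown in this post; a minimal sketch that emits structured log lines (swap in your metrics pipeline of choice) might be:
// Minimal sketch of the logging helpers referenced above (assumed implementations)
async function logProviderUsage(provider: string, outcome: 'success' | 'failure'): Promise<void> {
  console.log(JSON.stringify({ metric: 'ai_provider_usage', provider, outcome, at: Date.now() }));
}

async function logMetric(name: string): Promise<void> {
  console.log(JSON.stringify({ metric: name, at: Date.now() }));
}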
The goal isn't 100% AI uptime; it's 100% system uptime. AI is a feature, not a dependency. When it fails, users should notice quality degradation, not system failure.
Need Help Building Resilient AI Systems?
We architect production AI systems that handle failure gracefully.
→ Get Architecture Help