Designing for Model Failure:
AI System Resilience Patterns
What happens when Claude or OpenAI goes down? Production patterns for fallback chains, graceful degradation, and keeping your AI-powered system running.
AI APIs fail. Claude has outages. OpenAI rate-limits you. Anthropic schedules maintenance windows. If your production system assumes 100% availability, you're building a time bomb.
This is the resilience architecture we run in production to handle AI failures gracefully, without customer impact.
The Failure Modes
Before building resilience, understand what can go wrong:
- Hard outages: the provider's API is down and every call fails
- Rate limiting: 429 responses during traffic spikes or when you hit quota
- Slow responses: requests that hang long enough that you need a timeout
- Maintenance windows: planned downtime you don't control
Pattern 1: The Fallback Chain
Never depend on a single AI provider. Build a chain of fallbacks:
interface AIProvider {
  name: string;
  complete: (prompt: string) => Promise<string>;
  isHealthy: () => Promise<boolean>;
}

const providers: AIProvider[] = [
  { name: 'claude', complete: claudeComplete, isHealthy: claudeHealth },
  { name: 'openai', complete: openaiComplete, isHealthy: openaiHealth },
  { name: 'gemini', complete: geminiComplete, isHealthy: geminiHealth },
];

export async function aiComplete(prompt: string): Promise<string> {
  for (const provider of providers) {
    try {
      // Skip unhealthy providers
      if (!(await provider.isHealthy())) {
        console.log(`Skipping ${provider.name}: unhealthy`);
        continue;
      }

      const result = await withTimeout(
        provider.complete(prompt),
        10000 // 10 second timeout
      );

      // Log which provider succeeded
      await logProviderUsage(provider.name, 'success');
      return result;
    } catch (error) {
      console.error(`${provider.name} failed:`, error);
      await logProviderUsage(provider.name, 'failure');
      // Continue to next provider
    }
  }

  // All providers failed - return cached/static response
  return getCachedResponse(prompt);
}
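The withTimeout helper above is referenced but not shown, and it isn't part of any SDK here. A minimal sketch of one way to implement it:
// Minimal sketch of a timeout wrapper (assumed helper, not from an SDK).
// Rejects if the wrapped promise doesn't settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms
    );
    promise
      .then((value) => {
        clearTimeout(timer);
        resolve(value);
      })
      .catch((err) => {
        clearTimeout(timer);
        reject(err);
      });
  });
}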
Pattern 2: Circuit Breaker
Don't keep hammering a failing service. Implement circuit breakers that "trip" after repeated failures:
type CircuitState = 'closed' | 'open' | 'half-open';

class CircuitBreaker {
  private state: CircuitState = 'closed';
  private failures = 0;
  private lastFailure = 0;
  private threshold = 5;   // Open after 5 failures
  private timeout = 30000; // Try again after 30s

  async call<T>(fn: () => Promise<T>): Promise<T> {
    // If open, check if we should try again
    if (this.state === 'open') {
      if (Date.now() - this.lastFailure > this.timeout) {
        this.state = 'half-open';
      } else {
        throw new Error('Circuit breaker is open');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  private onSuccess() {
    this.failures = 0;
    this.state = 'closed';
  }

  private onFailure() {
    this.failures++;
    this.lastFailure = Date.now();
    if (this.failures >= this.threshold) {
      this.state = 'open';
    }
  }
}
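One way to wire this into Pattern 1's fallback chain is to keep one breaker per provider and wrap each call. The wiring below is a sketch; the breakers map and getBreaker helper are illustrative, not from the original code:
// Sketch: one CircuitBreaker per provider, keyed by provider name (assumed wiring)
const breakers = new Map<string, CircuitBreaker>();

function getBreaker(name: string): CircuitBreaker {
  let breaker = breakers.get(name);
  if (!breaker) {
    breaker = new CircuitBreaker();
    breakers.set(name, breaker);
  }
  return breaker;
}

// Inside the fallback loop, the provider call becomes:
// const result = await getBreaker(provider.name).call(() =>
//   withTimeout(provider.complete(prompt), 10000)
// );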
Pattern 3: Response Caching
Many AI requests are repetitive. Cache responses to reduce API calls and provide instant fallbacks:
async function cachedAIComplete(
  prompt: string,
  env: Env
): Promise<string> {
  // Generate cache key from prompt hash
  const cacheKey = `ai:${await hashPrompt(prompt)}`;

  // Check cache first
  const cached = await env.KV.get(cacheKey);
  if (cached) {
    await logMetric('ai_cache_hit');
    return cached;
  }

  // Cache miss - call AI
  try {
    const response = await aiComplete(prompt);

    // Cache for 1 hour, plus a longer-lived stale copy for outage fallback
    await env.KV.put(cacheKey, response, {
      expirationTtl: 3600
    });
    await env.KV.put(cacheKey + ':stale', response, {
      expirationTtl: 86400 // 24 hours
    });

    return response;
  } catch (error) {
    // If AI fails, try to return stale cache
    const stale = await env.KV.get(cacheKey + ':stale');
    if (stale) {
      await logMetric('ai_stale_cache_used');
      return stale;
    }
    throw error;
  }
}

// Hash prompt for consistent cache keys
async function hashPrompt(prompt: string): Promise<string> {
  const encoder = new TextEncoder();
  const data = encoder.encode(prompt);
  const hash = await crypto.subtle.digest('SHA-256', data);
  return btoa(String.fromCharCode(...new Uint8Array(hash))).slice(0, 16);
}
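For context, Env and env.KV above are standard Cloudflare Workers KV bindings. A minimal sketch of how this plugs into a Worker (the binding name and request shape here are illustrative):
// Sketch of the Workers setup assumed above (binding name is illustrative)
interface Env {
  KV: KVNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { prompt } = (await request.json()) as { prompt: string };
    const answer = await cachedAIComplete(prompt, env);
    return new Response(JSON.stringify({ answer }), {
      headers: { 'Content-Type': 'application/json' },
    });
  },
};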
Pattern 4: Graceful Degradation
When AI fails, don't crash. Degrade gracefully to a simpler experience:
interface ChatResponse {
  message: string;
  source: 'ai' | 'template' | 'fallback';
  degraded: boolean;
}

async function handleChat(userMessage: string): Promise<ChatResponse> {
  // Try AI first
  try {
    const aiResponse = await aiComplete(userMessage);
    return { message: aiResponse, source: 'ai', degraded: false };
  } catch (error) {
    console.error('AI failed, trying templates');
  }

  // Try template matching
  const template = matchTemplate(userMessage);
  if (template) {
    return { message: template, source: 'template', degraded: true };
  }

  // Last resort: generic fallback
  return {
    message: `Thanks for your message. Our AI assistant is temporarily unavailable. Please call us at 1-888-784-3881 or try again in a few minutes.`,
    source: 'fallback',
    degraded: true
  };
}

// Pre-defined templates for common questions
function matchTemplate(message: string): string | null {
  const templates: Record<string, string> = {
    'pricing': 'Our services start at $999. Visit /pricing for details.',
    'contact': 'You can reach us at 1-888-784-3881 or /contact.',
    'hours': 'We respond to inquiries 24/7 via our AI system.',
  };

  for (const [keyword, response] of Object.entries(templates)) {
    if (message.toLowerCase().includes(keyword)) {
      return response;
    }
  }

  return null;
}
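The degraded flag exists so the frontend can react, for example by showing a notice that answers may be limited. One way to surface it is as a response header; the endpoint and header name below are illustrative, not from the original code:
// Sketch: expose the degraded flag to the client (endpoint and header name are illustrative)
async function chatEndpoint(request: Request): Promise<Response> {
  const { message } = (await request.json()) as { message: string };
  const result = await handleChat(message);

  return new Response(JSON.stringify(result), {
    headers: {
      'Content-Type': 'application/json',
      'X-AI-Degraded': result.degraded ? 'true' : 'false',
    },
  });
}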
Pattern 5: Health Monitoring
Proactively monitor provider health instead of discovering failures on user requests:
// Run every minute via Cron Trigger
export async function scheduled(event: ScheduledEvent, env: Env) {
  const providers = ['claude', 'openai', 'gemini'];

  for (const provider of providers) {
    const health = await checkProviderHealth(provider);

    // Store health status in KV
    await env.KV.put(`health:${provider}`, JSON.stringify({
      healthy: health.ok,
      latency: health.latency,
      checkedAt: Date.now()
    }), { expirationTtl: 300 });

    // Alert if provider is down
    if (!health.ok) {
      await sendSlackAlert(`⚠️ ${provider} is unhealthy: ${health.error}`);
    }
  }
}

async function checkProviderHealth(provider: string) {
  const start = Date.now();
  try {
    await testCompletion(provider);
    return { ok: true, latency: Date.now() - start };
  } catch (error) {
    const message = error instanceof Error ? error.message : String(error);
    return { ok: false, error: message, latency: Date.now() - start };
  }
}
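This is also where Pattern 1's isHealthy functions can come from: rather than probing a provider on every user request, they can read the health record the cron job just wrote. A sketch of that wiring (the makeHealthCheck helper is illustrative, not from the original code):
// Sketch: provider health check backed by the KV records written above (assumed wiring)
function makeHealthCheck(provider: string, env: Env) {
  return async (): Promise<boolean> => {
    const raw = await env.KV.get(`health:${provider}`);
    if (!raw) return true; // No recent check recorded: assume healthy rather than skip the provider
    const status = JSON.parse(raw) as { healthy: boolean; latency: number; checkedAt: number };
    return status.healthy;
  };
}

// e.g. const claudeHealth = makeHealthCheck('claude', env);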
Implementation Checklist
- Multiple AI providers configured with fallback order
- Circuit breakers on each provider
- Response caching with stale-while-revalidate
- Graceful degradation to templates/static responses
- Health monitoring with alerting
- Timeouts on all AI calls (10s recommended)
- Metrics logging for provider usage and failures (see the sketch below)
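The logging helpers used throughout (logProviderUsage, logMetric) aren't shown in this post; a minimal sketch that emits structured log lines (swap in your metrics pipeline of choice) might be:
// Minimal sketch of the logging helpers referenced above (assumed implementations)
async function logProviderUsage(provider: string, outcome: 'success' | 'failure'): Promise<void> {
  console.log(JSON.stringify({ metric: 'ai_provider_usage', provider, outcome, at: Date.now() }));
}

async function logMetric(name: string): Promise<void> {
  console.log(JSON.stringify({ metric: name, at: Date.now() }));
}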
The goal isn't 100% AI uptime; it's 100% system uptime. AI is a feature, not a dependency. When it fails, users should notice quality degradation, not system failure.
Need Help Building Resilient AI Systems?
We architect production AI systems that handle failure gracefully.
→ Get Architecture Help