The happy path is easy to build. The Consilium's happy path — 11 Workers, all healthy, Anthropic's API nominal, all caches warm — is documented in posts 1 through 8. This post is for the other path. When one agent fails. When the model rate-limits. When you're deploying at 2pm on a Tuesday and something goes wrong. Here's exactly what happens and how to make sure it doesn't matter.
Circuit Breakers: The Pattern That Saved Us
Without a circuit breaker, a failing API dependency kills your latency. Every request attempts the primary, waits for the timeout (typically 10–30 seconds), then falls back. At concurrent load, those timeouts stack. Your P95 goes from 11 seconds to 40 seconds. Your queue builds. Users leave.
With a circuit breaker, the first few failures trip the circuit to OPEN state. Subsequent requests skip the primary entirely and route directly to the fallback. No timeout wait. The system degrades in quality (Haiku instead of Sonnet) but not in responsiveness. When the cooldown period expires, the circuit moves to HALF-OPEN — test requests are allowed through again. After enough consecutive successes (two, in our config), the circuit closes. If a test request fails, it re-opens.
```typescript
type CBState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

interface CircuitBreakerConfig {
  failureThreshold: number; // consecutive failures to trip — 3
  cooldownMs: number;       // wait before half-open test — 60_000
  successThreshold: number; // successes to close from half-open — 2
  model: string;            // which model this breaker guards
}

class CircuitBreaker {
  private state: CBState = 'CLOSED';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(private config: CircuitBreakerConfig) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      const elapsed = Date.now() - this.openedAt;
      if (elapsed < this.config.cooldownMs) {
        throw new Error('CIRCUIT_OPEN'); // immediate — no wait
      }
      this.state = 'HALF_OPEN';
      this.successes = 0; // fresh counter for the half-open trial
    }

    try {
      const result = await fn();
      // Success path — increment or close circuit
      if (this.state === 'HALF_OPEN') {
        this.successes++;
        if (this.successes >= this.config.successThreshold) {
          this.close();
        }
      } else {
        this.failures = 0; // reset on success
      }
      return result;
    } catch (err) {
      // Failure path — count and potentially open
      this.failures++;
      if (this.failures >= this.config.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }

  private close() {
    this.state = 'CLOSED';
    this.failures = 0;
    this.successes = 0;
  }
}
```
The Fallback Chain
```typescript
const MODELS = [
  'claude-sonnet-4-20250514',
  'claude-haiku-4-5-20251001',
] as const;

const breakers = new Map(
  MODELS.map(m => [m, new CircuitBreaker({
    failureThreshold: 3,
    cooldownMs: 60_000,
    successThreshold: 2,
    model: m,
  })] as const)
);

// callAnthropic and logFailover are shared helpers (not shown here)
export async function streamWithFallback(payload: Payload, env: Env, agentId: string) {
  for (const model of MODELS) {
    const breaker = breakers.get(model)!;
    try {
      return await breaker.call(() => callAnthropic({ ...payload, model }, env));
    } catch (err: any) {
      const isFinal = model === MODELS[MODELS.length - 1];
      if (isFinal) break; // exhausted all models — fall through to cache
      // Log the failover — visible in observability dashboard
      logFailover({ agentId, from: model, reason: err.message }, env);
    }
  }

  // Last resort: return cached response with staleness flag
  const cached = await env.KV.get(`stale:${agentId}`);
  if (cached) {
    const { response, cachedAt } = JSON.parse(cached);
    return new Response(
      JSON.stringify({ ...response, _degraded: true, _cachedAt: cachedAt }),
      { headers: { 'X-Degraded-Mode': 'true' } }
    );
  }

  // Truly nothing left — fail explicitly with structured error
  throw new Error('ALL_MODELS_FAILED');
}
```
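Here's roughly how an agent Worker would invoke the chain. This is a sketch, not code from the repo: it assumes callAnthropic resolves to a streaming Response, and buildPayload and AGENT_ID are hypothetical stand-ins for each agent's request parsing and identity.

```typescript
// Sketch only. Assumes callAnthropic resolves to a Response; buildPayload
// and env.AGENT_ID are hypothetical, not names from the actual codebase.
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const payload = await buildPayload(request); // hypothetical helper
    const response = await streamWithFallback(payload, env, env.AGENT_ID);

    // Degraded answers carry the header set in the cache path above,
    // so callers and dashboards can tell fresh from stale
    if (response.headers.get('X-Degraded-Mode') === 'true') {
      console.warn(`agent ${env.AGENT_ID}: served stale cached response`);
    }
    return response;
  },
};
```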
Four Degradation Tiers
Graceful degradation means every failure mode has a designed response — not an accident of how Promise.allSettled works. Four tiers, from minor to critical:
```typescript
type DegradationTier =
  | 'T0_NOMINAL' | 'T1_MINOR' | 'T2_DEGRADED' | 'T3_PARTIAL' | 'T4_CRITICAL';
type SettledResult = PromiseSettledResult<unknown>;

function getDegradationTier(results: SettledResult[]): DegradationTier {
  const failed = results.filter(r => r.status === 'rejected').length;

  if (failed === 0) return 'T0_NOMINAL'; // all 10 responded

  if (failed === 1) return 'T1_MINOR';
  // → 9-agent synthesis, failedAgents: ['agentId'] in tension map
  // → synthesis prompt: "Do not draw conclusions from absent domains"

  if (failed <= 3) return 'T2_DEGRADED';
  // → synthesis with explicit coverage gaps in output
  // → client shows: "Analysis missing: [domain names]"
  // → Round 2 disabled for degraded queries

  if (failed <= 6) return 'T3_PARTIAL';
  // → orchestrator skips tension map entirely
  // → returns parallel agent summaries only, no synthesis
  // → client renders "Partial analysis — full synthesis unavailable"

  return 'T4_CRITICAL';
  // → queued response with ETA from KV
  // → Slack alert fires
  // → PagerDuty webhook (if configured)
}

// T4 Slack alert — fires within 500ms of detection
async function alertT4(error: string, env: Env) {
  await fetch(env.SLACK_WEBHOOK, {
    method: 'POST',
    body: JSON.stringify({
      text: `🔴 T4 CRITICAL: ${error} — Consilium degraded`
    })
  });
}
```
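The T4 "queued response with ETA from KV" branch isn't shown above. Here's a minimal sketch of what it could look like; the KV keys, the 202 status, and the Retry-After header are assumptions, not the post's actual implementation.

```typescript
// Sketch of the T4 queued-response path. Key names ('queued:*', 't4:eta')
// and the response shape are assumptions; the post only says the ETA
// comes from KV.
async function queueForRetry(query: string, env: Env): Promise<Response> {
  const id = crypto.randomUUID();
  await env.KV.put(`queued:${id}`, JSON.stringify({ query, queuedAt: Date.now() }));

  // ETA maintained out-of-band, e.g. by whoever answers the Slack alert
  const eta = (await env.KV.get('t4:eta')) ?? '120'; // seconds

  return new Response(
    JSON.stringify({ queued: true, id, etaSeconds: Number(eta) }),
    { status: 202, headers: { 'Retry-After': eta } }
  );
}
```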
Two patterns working together. Promise.allSettled() means a failed agent can't block the others — the orchestrator processes whatever completed. Circuit breakers mean a failing model doesn't stall every request with a timeout — the circuit trips, subsequent requests route to fallback immediately. Without either pattern, 14 concurrent queries × 30-second timeout × 3 failing agents = 1,260 cumulative seconds of stalled requests. With both patterns, impact is +340ms average latency for 11 minutes.
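To make the first half of that concrete, here's a sketch of the fan-out as it might look in the orchestrator. AGENT_IDS, queryAgent, AgentResponse, and synthesize are hypothetical stand-ins for the orchestrator's real names.

```typescript
// Sketch of the orchestrator fan-out. queryAgent, AGENT_IDS, AgentResponse,
// and synthesize are hypothetical stand-ins, not code from the repo.
async function runConsilium(query: string, env: Env) {
  // allSettled never rejects: a failed agent becomes a 'rejected' entry,
  // not an exception that aborts the batch
  const results = await Promise.allSettled(
    AGENT_IDS.map(id => queryAgent(id, query, env))
  );

  const tier = getDegradationTier(results);
  if (tier === 'T4_CRITICAL') {
    const ok = results.filter(r => r.status === 'fulfilled').length;
    await alertT4(`only ${ok}/10 agents responded`, env);
  }

  const responses = results
    .filter((r): r is PromiseFulfilledResult<AgentResponse> => r.status === 'fulfilled')
    .map(r => r.value);
  const failedAgents = AGENT_IDS.filter((_, i) => results[i].status === 'rejected');

  // Synthesis proceeds with whatever completed; the failedAgents list
  // feeds the tension map and the "absent domains" prompt instruction
  return synthesize(responses, failedAgents, tier, env);
}
```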
Zero-Downtime Deploys
Deploying to 11 Workers simultaneously creates a window where the orchestrator expects an API contract that some agents haven't yet implemented. The safe deploy order: agents first, orchestrator last. Every agent Worker is deployed and verified before the orchestrator deploy begins.
```bash
#!/bin/bash
# deploy-all.sh — deploys 11 Workers in safe order
# Run: ./scripts/deploy-all.sh [--env production]

AGENTS=("vasquez" "webb" "chen" "okafor" "mitchell" "nakamura" "diallo" "harlow" "deleon" "cross")

# Phase 1: Deploy all 10 agent workers
# Orchestrator NOT deployed yet — old orchestrator still calls old agents fine
for agent in "${AGENTS[@]}"; do
  echo "Deploying $agent..."
  wrangler deploy --config "workers/$agent/wrangler.toml"
  if [ $? -ne 0 ]; then
    echo "DEPLOY FAILED: $agent — aborting. Orchestrator not touched."
    exit 1
  fi
  sleep 2 # brief settle window per agent
done

# Verify all agents responding before orchestrator deploy
echo "Running agent health checks..."
./scripts/health-check-agents.sh
if [ $? -ne 0 ]; then
  echo "HEALTH CHECK FAILED — orchestrator deploy blocked"
  exit 1
fi

# Phase 2: Deploy orchestrator last
# All agents already on new version — orchestrator deploy is safe
echo "Deploying orchestrator..."
wrangler deploy --config workers/orchestrator/wrangler.toml

echo "Deploy complete. 11/11 workers updated. Zero downtime."
```
The health check script sends a minimal probe request to each agent Worker — not a full query, just a GET /health that returns the Worker's version hash and circuit state. If any agent returns 500 or times out after its deploy, the orchestrator deploy is blocked. This catches the ~5% of deploys where a Worker builds successfully but fails at runtime — a missing env var, a broken import, a KV binding that wasn't updated. Catching it before the orchestrator deploy means the old orchestrator keeps running against the old working agents.
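The probe handler itself is tiny. Here's a sketch of the agent side; the response shape is an assumption (the post specifies only a version hash and circuit state), and getState() stands in for a small accessor the CircuitBreaker class would need to expose.

```typescript
// Sketch of the agent-side GET /health handler. The response shape is an
// assumption; getState() stands in for an accessor on CircuitBreaker, and
// VERSION_HASH is assumed to be injected at build/deploy time.
async function handleHealth(env: Env): Promise<Response> {
  return Response.json({
    version: env.VERSION_HASH,
    circuits: [...breakers.entries()].map(([model, breaker]) => ({
      model,
      state: breaker.getState(),
    })),
  });
}
```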
Promise.allSettled() not Promise.all(). A failed agent returns a rejected promise — the orchestrator catches it, marks it in the tension map's failedAgents field, and proceeds with the other nine responses. The synthesis prompt explicitly instructs the orchestrator not to draw conclusions from absent domains. One agent failure = T1, 9-agent synthesis. Two to three failures = T2, degraded. Four to six = T3, synthesis skipped, parallel summaries only. Seven or more = T4, critical.

Ask the Consilium something hard. The resilience architecture you just read about is running behind every response.
Open The Consilium