WORKERS: 11/11 OPERATIONAL/// ANTHROPIC API: NOMINAL/// CIRCUIT BREAKERS: ALL CLOSED/// CACHE HIT RATE: 88.4%/// LAST INCIDENT: 47 DAYS AGO/// FALLBACK ACTIVATIONS (7D): 3/// P95 LATENCY: 11.8s
Resilience Architecture · Post #09 · Production Hardening

99.97% Uptime on an 11-Worker AI System.

Fallback chains. Circuit breakers. Graceful degradation. Zero-downtime deploys. And the 429 storm that hit during peak traffic — and why nobody noticed.


The happy path is easy to build. The Consilium's happy path — 11 workers, all healthy, Anthropic's API nominal, all caches warm — is documented in posts 1 through 8. This post is for the other path. When one agent fails. When the model rate-limits. When you're deploying at 2pm on a Tuesday and something goes wrong. Here's exactly what happens and how to make sure it doesn't matter.

Circuit Breakers: The Pattern That Saved Us

Without a circuit breaker, a failing API dependency kills your latency. Every request attempts the primary, waits for the timeout (typically 10–30 seconds), then falls back. At concurrent load, those timeouts stack. Your P95 goes from 11 seconds to 40 seconds. Your queue builds. Users leave.

With a circuit breaker, the first few failures trip the circuit to OPEN state. Subsequent requests skip the primary entirely and route directly to the fallback. No timeout wait. The system degrades in quality (Haiku instead of Sonnet) but not in responsiveness. When the cool-down period expires, the circuit moves to HALF-OPEN — one test request through. If it succeeds, the circuit closes. If it fails, it re-opens.

// Circuit Breaker State Machine — Live Simulation
shared/circuit-breaker.ts — full implementation
Production Code
type CBState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

interface CircuitBreakerConfig {
  failureThreshold: number;   // consecutive failures to trip — 3
  cooldownMs: number;         // wait before half-open test — 60_000
  successThreshold: number;   // successes to close from half-open — 2
  model: string;             // which model this breaker guards
}

class CircuitBreaker {
  private state: CBState = 'CLOSED';
  private failures = 0;
  private successes = 0;
  private openedAt = 0;

  constructor(private readonly config: CircuitBreakerConfig) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'OPEN') {
      const elapsed = Date.now() - this.openedAt;
      if (elapsed < this.config.cooldownMs) {
        throw new Error('CIRCUIT_OPEN'); // immediate — no wait
      }
      this.state = 'HALF_OPEN';
      this.successes = 0; // fresh count for the half-open test window
    }

    try {
      const result = await fn();
      // Success path — increment or close circuit
      if (this.state === 'HALF_OPEN') {
        this.successes++;
        if (this.successes >= this.config.successThreshold) {
          this.close();
        }
      } else {
        this.failures = 0; // reset on success
      }
      return result;
    } catch (err) {
      // Failure path — count and potentially open;
      // a failed half-open test re-opens immediately
      this.failures++;
      if (this.state === 'HALF_OPEN' || this.failures >= this.config.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }

  private close() {
    this.state = 'CLOSED'; this.failures = 0; this.successes = 0;
  }
}

The Fallback Chain

// Multi-Model Fallback — Every Worker
1 · PRIMARY · claude-sonnet-4-20250514
WHEN: Circuit CLOSED · normal operation
Latency: P50 ~380ms TTFT · Full epistemic fingerprint quality
2 · FALLBACK · claude-haiku-4-5-20251001
WHEN: Primary returns 429/5xx/timeout OR circuit OPEN
Latency: P50 ~180ms TTFT · +200ms for failover handoff · Reduced nuance
3 · LAST RESORT · Cached degraded response
WHEN: Both models fail · All Anthropic endpoints unreachable
Returns last successful response for this agent + staleness timestamp · logs DEGRADED_MODE
shared/stream-with-fallback.ts — the full fallback function
Production
const MODELS = [
  'claude-sonnet-4-20250514',
  'claude-haiku-4-5-20251001',
] as const;

const breakers = new Map(MODELS.map(m => [m, new CircuitBreaker({
  failureThreshold: 3,
  cooldownMs: 60_000,
  successThreshold: 2,
  model: m,
})]));

export async function streamWithFallback(payload: Payload, env: Env, agentId: string) {
  for (const model of MODELS) {
    const breaker = breakers.get(model)!;
    try {
      return await breaker.call(() => callAnthropic({ ...payload, model }, env));
    } catch (err: any) {
      const isFinal = model === MODELS[MODELS.length - 1];
      if (isFinal) break; // exhausted all models — fall through to cache
      // Log the failover — visible in observability dashboard
      logFailover({ agentId, from: model, reason: err.message }, env);
    }
  }

  // Last resort: return cached response with staleness flag
  const cached = await env.KV.get(`stale:${agentId}`);
  if (cached) {
    const { response, cachedAt } = JSON.parse(cached);
    return new Response(
      JSON.stringify({ ...response, _degraded: true, _cachedAt: cachedAt }),
      { headers: { 'X-Degraded-Mode': 'true' } }
    );
  }

  // Truly nothing left — fail explicitly with structured error
  throw new Error('ALL_MODELS_FAILED');
}
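
The stale cache only works if something writes to it. Below is a minimal sketch of the write side, assuming a hypothetical cacheStaleResponse() helper invoked after each successful agent response; the key shape matches the read above, and the 24-hour TTL is an illustrative value rather than the production setting.

async function cacheStaleResponse(
  agentId: string,
  response: Record<string, unknown>,
  env: Env
): Promise<void> {
  // Key shape matches the read in streamWithFallback: `stale:${agentId}`.
  // expirationTtl (24h) is an assumed value: keep the last-known-good
  // response around long enough to cover a prolonged outage.
  await env.KV.put(
    `stale:${agentId}`,
    JSON.stringify({ response, cachedAt: new Date().toISOString() }),
    { expirationTtl: 60 * 60 * 24 }
  );
}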

Four Degradation Tiers

Graceful degradation means every failure mode has a designed response — not an accident of how Promise.allSettled works. Four tiers, from minor to critical:

T1 · 1 Agent Down
T2 · 3+ Agents Down
T3 · No Orchestrator
T4 · All Models Fail
orchestrator/degradation.ts — tier detection and response
Tiers
function getDegradationTier(results: SettledResult[]): DegradationTier {
  const failed = results.filter(r => r.status === 'rejected').length;

  if (failed === 0) return 'T0_NOMINAL'; // all 10 responded

  if (failed === 1) return 'T1_MINOR';
  // → 9-agent synthesis, failedAgents: ['agentId'] in tension map
  // → synthesis prompt: "Do not draw conclusions from absent domains"

  if (failed <= 3) return 'T2_DEGRADED';
  // → synthesis with explicit coverage gaps in output
  // → client shows: "Analysis missing: [domain names]"
  // → Round 2 disabled for degraded queries

  if (failed <= 6) return 'T3_PARTIAL';
  // → orchestrator skips tension map entirely
  // → returns parallel agent summaries only, no synthesis
  // → client renders "Partial analysis — full synthesis unavailable"

  return 'T4_CRITICAL';
  // → Queued response with ETA from KV
  // → Slack alert fires
  // → PagerDuty webhook (if configured)
}

// T4 Slack alert — fires in under 500ms of detection
async function alertT4(error: string, env: Env) {
  await fetch(env.SLACK_WEBHOOK, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text: `🔴 T4 CRITICAL: ${error} — Consilium degraded` })
  });
}
Post-Incident Report #003 · Duration: 11 minutes
The 429 Storm That Nobody Noticed
14:32:04
Concurrent user spike begins. 14 simultaneous queries hit the orchestrator — normal peak handling.
14:32:11
First 429s appear. Vasquez, Chen, and Webb all return 429 simultaneously — Anthropic's rate limit hit by the fan-out.
14:32:12
Circuit breakers trip to OPEN. All three agents' Sonnet breakers open. Haiku fallback activates instantly — no timeout wait.
14:32:13
7 remaining agents proceed on Sonnet. Fallback agents proceed on Haiku. Queries complete normally. Users see no error. Latency impact: +340ms average.
14:33:00
Rate limit pressure eases. Sonnet rate limit window resets.
14:33:12
Cool-down expires. Circuits move to HALF-OPEN. Test requests on Sonnet — all succeed.
14:43:18
All circuits CLOSED. Full Sonnet operation restored. Zero user-facing errors. Incident invisible to users.
Why allSettled + Circuit Breakers = No Downtime

Two patterns working together. Promise.allSettled() means a failed agent can't block the others — the orchestrator processes whatever completed. Circuit breakers mean a failing model doesn't stall every request with a timeout — the circuit trips, subsequent requests route to fallback immediately. Without either pattern, 14 concurrent queries × 30-second timeout × 3 failing agents = 1,260 cumulative seconds of stalled requests. With both patterns, impact is +340ms average latency for 11 minutes.
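
Here is a minimal sketch of how the two patterns meet in the orchestrator's fan-out. The callAgent() helper and AgentResponse type are assumptions for illustration; each agent Worker applies its own breaker and fallback internally via streamWithFallback().

async function fanOut(query: string, agentIds: string[], env: Env) {
  // allSettled: a rejected agent promise never blocks the others.
  const results = await Promise.allSettled(
    agentIds.map(id => callAgent(id, query, env)) // hypothetical per-agent fetch wrapper
  );

  // Tier detection from the settled results (see degradation.ts above).
  const tier = getDegradationTier(results);

  const responses = results
    .filter((r): r is PromiseFulfilledResult<AgentResponse> => r.status === 'fulfilled')
    .map(r => r.value);
  const failedAgents = agentIds.filter((_, i) => results[i].status === 'rejected');

  return { tier, responses, failedAgents };
}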

Zero-Downtime Deploys

Deploying to 11 Workers simultaneously creates a window where the orchestrator expects an API contract that some agents haven't yet implemented. The safe deploy order: agents first, orchestrator last. Every agent Worker is deployed and verified before the orchestrator deploy begins.

// Deploy Sequence — 11 Workers, 0 Downtime
scripts/deploy-all.sh — production deploy script
Deploy Script
#!/bin/bash
# deploy-all.sh — deploys 11 Workers in safe order
# Run: ./scripts/deploy-all.sh [--env production]

AGENTS=("vasquez" "webb" "chen" "okafor" "mitchell"
        "nakamura" "diallo" "harlow" "deleon" "cross")

# Phase 1: Deploy all 10 agent workers
# Orchestrator NOT deployed yet — old orchestrator still calls old agents fine
for agent in "${AGENTS[@]}"; do
  echo "Deploying $agent..."
  wrangler deploy --config workers/$agent/wrangler.toml
  if [ $? -ne 0 ]; then
    echo "DEPLOY FAILED: $agent — aborting. Orchestrator not touched."
    exit 1
  fi
  sleep 2 # brief settle window per agent
done

# Verify all agents responding before orchestrator deploy
echo "Running agent health checks..."
./scripts/health-check-agents.sh
if [ $? -ne 0 ]; then
  echo "HEALTH CHECK FAILED — orchestrator deploy blocked"
  exit 1
fi

# Phase 2: Deploy orchestrator last
# All agents already on new version — orchestrator deploy is safe
echo "Deploying orchestrator..."
wrangler deploy --config workers/orchestrator/wrangler.toml

echo "Deploy complete. 11/11 workers updated. Zero downtime."
The Health Check That Blocks Bad Deploys

The health check script sends a minimal probe request to each agent Worker — not a full query, just a GET /health that returns the Worker's version hash and circuit state. If any agent returns 500 or times out after its deploy, the orchestrator deploy is blocked. This catches the ~5% of deploys where a Worker builds successfully but fails at runtime — a missing env var, a broken import, a KV binding that wasn't updated. Catching it before the orchestrator deploy means the old orchestrator keeps running against the old working agents.
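
For reference, a sketch of what the probe hits on each agent Worker. The handleHealth() name, the VERSION constant, and the getState() accessor on the breaker are illustrative assumptions; the real handler returns the version hash and circuit state described above.

export async function handleHealth(request: Request): Promise<Response> {
  if (new URL(request.url).pathname !== '/health') {
    return new Response('Not found', { status: 404 });
  }
  return Response.json({
    version: VERSION, // build-time version hash (assumed to be injected at build)
    circuits: Object.fromEntries(
      // assumes a small getState() accessor on CircuitBreaker
      [...breakers].map(([model, breaker]) => [model, breaker.getState()])
    ),
    checkedAt: new Date().toISOString(),
  });
}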

Frequently Asked
What is a circuit breaker pattern in AI systems?
A circuit breaker monitors failure rate for a dependency and trips to "open" state when failures exceed a threshold, short-circuiting calls to that dependency for a cool-down period rather than letting each call fail with a timeout. For AI Workers, a circuit breaker tracks consecutive errors per model. When the circuit trips, it immediately returns CIRCUIT_OPEN — the caller routes to the fallback without waiting. After the cool-down, the circuit moves to HALF-OPEN: one test request through. Two successes close it. One failure re-opens it.
How does multi-model fallback work in a Cloudflare Worker?
Each Worker maintains a priority-ordered model list: Sonnet (primary), Haiku (fallback), cached stale response (last resort). On each call it tries the primary. If the primary returns 429, 5xx, or times out, it retries on the fallback. The circuit breaker sits above this — if the primary has tripped its circuit, the Worker skips straight to the fallback without attempting the primary. Failover adds ~200ms latency (a new Anthropic request) but keeps the system running at slightly reduced quality.
How do you deploy 11 Workers without downtime?
Deploy agents first, orchestrator last. Cloudflare deploys are atomic per Worker — traffic switches instantly to the new version. Deploying the orchestrator before agents creates a window where the orchestrator expects API changes the agents haven't shipped yet. Deploying agents first, verifying they're healthy via health checks, then deploying the orchestrator means there's never a moment where the orchestrator is calling a contract that doesn't exist. If any agent health check fails, the orchestrator deploy is blocked.
What happens when one agent Worker fails mid-query?
The orchestrator uses Promise.allSettled(), not Promise.all(). A failed agent returns a rejected promise — the orchestrator catches it, marks it in the tension map's failedAgents field, and proceeds with the other nine responses. The synthesis prompt explicitly instructs the orchestrator not to draw conclusions from absent domains. One agent failure = T1 tier, 9-agent synthesis. Two to three failures = T2 degraded. Four to six failures = T3, synthesis skipped, parallel summaries only. More than six = T4 critical.
How did the 429 storm happen and what stopped it?
14 concurrent queries pushed the fan-out into Anthropic's rate limit: three agent Workers (Vasquez, Chen, and Webb) returned 429 simultaneously when their calls landed in the same rate window. Without circuit breakers, each request would wait 30 seconds for the timeout before falling back — 14 × 30s × 3 failing agents = 1,260 seconds of stalled requests. With circuit breakers, the first three failures tripped the circuits to OPEN. Subsequent requests routed to Haiku instantly. User-visible impact: +340ms average latency for 11 minutes. Zero errors. Zero queue buildup.
// Circuit breakers armed. Fallbacks ready. 11/11 operational.
The System Stays Up.

Ask the Consilium something hard. The resilience architecture you just read about is running behind every response.

Open The Consilium