Every production system is a black box until you trace it. This is the full trace of a real Consilium query — the one that asked whether a specific off-market duplex represented a sound investment in a tightening credit environment. Here is everything that happened in the 11.3 seconds before the answer appeared.
## The Waterfall
Every row in that waterfall is a real operation. Nothing is padded. The long bars are the Anthropic API streaming calls — that's not overhead, that's the model generating tokens. The short bars at the start are Cloudflare infrastructure — routing, KV reads, auth checks. Under 50ms combined for all of them.
## The 11 Hops
```javascript
// T+0ms: Browser → Consilium Edge Worker (CF PoP: Dallas, TX)
const query = {
  message: "Is the off-market duplex at 14th and Monroe sound in tightening credit?",
  queryId: "q_8f3a2b1c",
  sessionId: "s_aa9f1e",
};
const { message, queryId, sessionId } = query;

// T+4ms: CORS check + shared secret auth — 2ms each
const authOk = req.headers.get('Authorization') === `Bearer ${env.CONSILIUM_SECRET}`;

// T+8ms: Rate limit check — KV read #1
const rateKey = `rate:${ip}:${Math.floor(Date.now() / 60000)}`;
const count = await env.KV.get(rateKey); // ~4ms KV read

// T+12ms: Session context load — KV read #2
const ctx = await env.KV.get(`ctx:${sessionId}`); // ~4ms KV read

// T+16ms: Fan-out begins — all 10 agent calls fire simultaneously
// None of these awaits — they're all in-flight at the same time
const agentCalls = AGENTS.map(id =>
  callWorker(`https://${id}.consilium.workers.dev`, {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${env.INTERNAL_KEY}` },
    body: JSON.stringify({ message, queryId }),
  })
);

// T+16ms→8,400ms: Agents stream individually as they complete
// Client renders each agent's summary as it arrives — not waiting for all 10
for await (const result of agentStreams) {
  emit('agent_complete', { agentId: result.id, summary: result.summary });
}

// T+8,400ms: Last agent completes — orchestrator phase begins
// T+8,450ms: Clash detection (Claude call #11a — 400ms)
// T+8,850ms: Round 2 triggered (Vasquez + Okafor re-queried)
emit('round2_triggered', {
  agents: ['vasquez', 'okafor'],
  tensionId: 't_001',
  severity: 8,
});

// T+8,850ms→10,200ms: Round 2 agent calls (2 workers, not 10)
// T+10,200ms: Synthesis begins streaming to client
// T+11,340ms: tension_map event fired — complete JSON
emit('tension_map', tensionMap); // validated TensionMap object

// T+11,340ms: Stream closes. Query complete.
```
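The "agents stream individually as they complete" loop depends on consuming the in-flight promises in completion order rather than array order. A minimal sketch of that pattern — `inCompletionOrder` is an illustrative helper, not a function from the Consilium codebase:

```javascript
// Wrap each promise so it resolves with its own index, then repeatedly
// race the remaining set — results are yielded in completion order.
async function* inCompletionOrder(promises) {
  const pending = new Map(
    promises.map((p, i) => [i, p.then(value => ({ i, value }))])
  );
  while (pending.size > 0) {
    const { i, value } = await Promise.race(pending.values());
    pending.delete(i);
    yield value;
  }
}

// Usage: fast agents render first, slow ones later.
async function demo() {
  const delay = (ms, v) => new Promise(r => setTimeout(() => r(v), ms));
  const order = [];
  for await (const v of inCompletionOrder([delay(30, 'slow'), delay(5, 'fast')])) {
    order.push(v);
  }
  return order; // completion order, not array order
}
```

This is why a 300ms agent can appear on screen eight seconds before the slowest one finishes.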
## Zero Cold Starts. Here's Why.
The trace shows 0 cold starts across 11 Workers. This is not luck — it's a direct consequence of the microservices architecture. Individual Workers are small (under 15KB each), and at the traffic volumes the Consilium serves, each Worker is already warm in Cloudflare's edge network before any query arrives. The monolith would have cold-started its entire 400KB bundle on each session; individual Workers warm up and stay warm independently.
The critical number in the trace: 98.6% of end-to-end latency is the Anthropic API generating tokens. The other 1.4% is everything else — routing, auth, KV reads, Worker initialization, SSE framing, orchestration logic. The infrastructure overhead on a production 11-Worker microservices architecture is under 160ms. The bottleneck is always and only the models.
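That split can be sanity-checked directly from the trace's own numbers — a back-of-the-envelope calculation, not instrumentation:

```javascript
// End-to-end time from the trace, and the claimed model share of latency.
const totalMs = 11340;     // T+11,340ms: stream closes
const modelShare = 0.986;  // 98.6% of latency is token generation

// Everything that isn't the model: routing, auth, KV, SSE framing, orchestration.
const infraMs = Math.round(totalMs * (1 - modelShare));
console.log(infraMs); // 159 — consistent with "under 160ms" of overhead
```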
## The Cost Breakdown
How $0.041 decomposes across 11 calls, with and without prompt caching:
| Agent | Input Tokens | Cache Read | Output Tokens | Cache Status | Cost |
|---|---|---|---|---|---|
| Vasquez | 142 | 2,240 | 380 | CACHED | $0.0034 |
| Webb | 156 | 2,190 | 410 | CACHED | $0.0037 |
| Chen | 138 | 2,280 | 360 | CACHED | $0.0032 |
| Okafor | 148 | 2,210 | 395 | CACHED | $0.0035 |
| Mitchell | 162 | 2,260 | 425 | CACHED | $0.0038 |
| Nakamura | 144 | 2,195 | 375 | CACHED | $0.0033 |
| Diallo | 158 | 2,220 | 405 | CACHED | $0.0036 |
| Harlow | 166 | 2,175 | 430 | CACHED | $0.0039 |
| DeLeon | 172 | 2,250 | 445 | CACHED | $0.0040 |
| Cross | 140 | 2,235 | 370 | CACHED | $0.0033 |
| Orchestrator | 4,820 | 0 | 680 | DYNAMIC | $0.0248 |
| TOTAL | 6,346 | 22,255 | 4,675 | — | $0.0405 |
The orchestrator's system prompt is dynamic — it includes all 10 agent responses as context, which changes on every call. That means no prompt caching applies: every orchestrator call pays full input cost on 4,820+ tokens. This is structural and unavoidable. The mitigation is keeping agent responses concise (structured summaries, not full essays) and using the clash detection pass to front-load the synthesis — the orchestrator only runs on the content it actually needs. The 10 agent calls combined cost $0.0357 with caching. Without caching they'd cost ~$0.28. The 88% aggregate cache hit rate is doing most of the cost work.
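Every row in the table above follows the same formula: full-price input tokens, plus cache reads at a steep discount, plus output tokens. A sketch of that decomposition — the `RATES` values below are illustrative placeholders, not Anthropic's published pricing, so substitute the real per-million-token rates for whichever model tier a deployment uses:

```javascript
// Illustrative $/MTok rates (assumed, not actual Anthropic pricing).
const RATES = { input: 3.0, cacheRead: 0.3, output: 15.0 };

// Cost of one API call: input at full price, cache reads at the
// discounted rate, output at the output rate.
function callCost({ input, cacheRead, output }, rates = RATES) {
  return (
    (input * rates.input + cacheRead * rates.cacheRead + output * rates.output) / 1e6
  );
}

// A cached agent call pays the cache-read rate on the shared system prompt;
// the orchestrator pays full input price because its prompt changes per query.
const cachedAgent = callCost({ input: 150, cacheRead: 2200, output: 400 });
const orchestrator = callCost({ input: 4820, cacheRead: 0, output: 680 });
```

Under these assumed rates the dynamic orchestrator call costs several times a cached agent call despite producing comparable output — the structural asymmetry the paragraph above describes.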
## The SSE Event Stream
847 SSE chunks across one query. Here's the event taxonomy — every event type the client receives and what triggers it:
```javascript
// Phase 1 events — fire as each agent completes (T+300ms → T+8.4s)
'agent_stream_start'  // agent began streaming — show loading state
'agent_chunk'         // text delta from agent's stream (most frequent)
'agent_complete'      // agent finished — render summary card

// Orchestrator phase events — fire after all 10 complete
'orchestrating'       // "Mapping tensions..." loading indicator
'round2_triggered'    // shows which 2 agents are re-querying and why
'round2_complete'     // round 2 responses received
'synthesis_chunk'     // streaming synthesis text delta

// Terminal events — one of these always closes the stream
'tension_map'         // complete validated TensionMap JSON
'stream_complete'     // all done — dismiss loading states
'error'               // { code, retry, agentId? } — client handles

// Event volume breakdown for this query:
// agent_chunk:       782 events (92.3% of total)
// agent_complete:     10 events
// synthesis_chunk:    48 events
// round2_triggered:    1 event
// tension_map:         1 event
// orchestrating:       1 event
// stream_complete:     1 event
// Other:               3 events
// ─────────────────────────────
// Total:             847 events
```
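On the client, this taxonomy collapses into a single dispatch over the event type. A minimal handler sketch — `makeDispatcher` and the `ui` callbacks are illustrative stand-ins for real rendering code, not the Consilium's actual client:

```javascript
// Dispatch table keyed by SSE event type. Unknown types are ignored,
// so the client tolerates new server-side events without breaking.
function makeDispatcher(ui) {
  const handlers = {
    agent_stream_start: d => ui.showLoading(d.agentId),
    agent_chunk:        d => ui.appendText(d.agentId, d.delta),
    agent_complete:     d => ui.renderSummary(d.agentId, d.summary),
    orchestrating:      () => ui.showStatus('Mapping tensions...'),
    round2_triggered:   d => ui.showRound2(d.agents),
    synthesis_chunk:    d => ui.appendSynthesis(d.delta),
    tension_map:        d => ui.renderTensionMap(d),
    stream_complete:    () => ui.dismissLoading(),
    error:              d => ui.showError(d.code, d.retry),
  };
  return (type, data) => (handlers[type] ?? (() => {}))(data);
}
```

Keeping unknown events as silent no-ops is what lets the server add event types later without a coordinated client deploy.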
Most people assume the Cloudflare routing layer adds meaningful latency. It adds 4–8ms total — round-trip included — because Workers run at the PoP closest to the user, not in a central data center. For a user in Dallas, the Consilium Worker runs in Dallas. The 10 agent Workers are called via Cloudflare Service Bindings — Worker-to-Worker calls that never leave Cloudflare's network. No DNS resolution, no TLS handshake, no public internet hops between the orchestrator and agents. The only calls that leave the network are the 11 Anthropic API calls, which go to Anthropic's servers from Cloudflare's edge.
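The fan-out over Service Bindings looks like an ordinary `fetch`, except the target is a binding on `env` rather than a public hostname. A sketch under assumed binding names — `env.VASQUEZ`, `env.WEBB`, and `fanOut` are illustrative; real binding names come from the project's wrangler configuration:

```javascript
// Fan one payload out to every bound agent Worker. Each env[name] is
// assumed to be a Fetcher binding declared in wrangler.toml.
async function fanOut(env, bindings, payload) {
  const body = JSON.stringify(payload);
  return Promise.allSettled(
    bindings.map(name =>
      // Worker-to-Worker call: stays inside Cloudflare's network —
      // no DNS lookup, no TLS handshake, no public internet hop.
      env[name].fetch('https://internal/agent', { method: 'POST', body })
    )
  );
}
```

The URL here is only a routing formality for the bound Worker; the request never resolves it over public DNS.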
Promise.allSettled(), not Promise.all(). If one agent throws or times out, the other nine complete and the orchestrator proceeds with the responses it received. The tension map includes a failedAgents field that lists any agents that didn't respond. The synthesis prompt is aware of this and explicitly instructs the orchestrator not to draw conclusions from absent agent domains. A single agent failure degrades the output but doesn't break the query.

Ask the Consilium something hard. Every millisecond you just read about is happening in real time.
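That degradation path can be sketched in a few lines — `collectAgentResponses` is an illustrative name, not the Consilium's actual function:

```javascript
// Settle all agent calls, then partition into responses and failures so
// the orchestrator can proceed and the tension map can report what's missing.
async function collectAgentResponses(calls) {
  const settled = await Promise.allSettled(
    calls.map(({ id, promise }) => promise.then(value => ({ id, value })))
  );
  const responses = [];
  const failedAgents = [];
  settled.forEach((result, i) => {
    if (result.status === 'fulfilled') responses.push(result.value);
    else failedAgents.push(calls[i].id); // surfaced in the tension map
  });
  return { responses, failedAgents };
}
```

With Promise.all(), the first rejection would have discarded nine good responses; allSettled keeps them and only the failed agent's domain goes unrepresented.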
Open The Consilium