Engineering Deep Dive · Request Trace · Post #07

The Full Stack of a $0.041 Query. Every Millisecond.

You type a question. Here is every hop, every cache read, every API call, every SSE chunk, and every orchestration step — across 11 AI workers — from your browser back to your screen. Annotated, traced, and priced.

$0.041 per query
11 workers · 10 agents · 1 orchestrator · 88% cache hit rate · 11.3s end-to-end
Workers Invoked: 11
KV Reads: 4
Anthropic API Calls: 11
SSE Chunks Streamed: 847
Round 2 Triggered: YES
Cold Start Count: 0

Every production system is a black box until you trace it. This is the full trace of a real Consilium query — the one that asked whether a specific off-market duplex represented a sound investment in a tightening credit environment. Here is everything that happened in the 11.3 seconds before the answer appeared.

The Waterfall

// Request Waterfall — Full Trace · Total: 11,340ms · $0.041
// (waterfall chart: per-operation timing bars spanning 0ms to 11.3s)

Every row in that waterfall is a real operation. Nothing is padded. The long bars are the Anthropic API streaming calls — that's not overhead, that's the model generating tokens. The short bars at the start are Cloudflare infrastructure — routing, KV reads, auth checks. Under 50ms combined for all of them.

The 11 Hops

// Network Topology — One Query's Journey
trace/hop-sequence.ts — annotated full call stack
// T+0ms: Browser → Consilium Edge Worker (CF PoP: Dallas, TX)
const query = {
  message: "Is the off-market duplex at 14th and Monroe sound in tightening credit?",
  queryId: "q_8f3a2b1c",
  sessionId: "s_aa9f1e",
};

// T+4ms: CORS check + shared secret auth — 2ms each
const authOk = req.headers.get('Authorization') === `Bearer ${env.CONSILIUM_SECRET}`;

// T+8ms: Rate limit check — KV read #1
const rateKey = `rate:${ip}:${Math.floor(Date.now()/60000)}`;
const count = await env.KV.get(rateKey); // ~4ms KV read

// T+12ms: Session context load — KV read #2
const ctx = await env.KV.get(`ctx:${sessionId}`); // ~4ms KV read

// T+16ms: Fan-out begins — all 10 agent calls fire simultaneously
// None of these awaits — they're all in-flight at the same time
// Agents are reached via Service Bindings (env[id] is a Fetcher bound to that agent Worker),
// so these calls never touch the public internet
const agentCalls = AGENTS.map(id =>
  env[id].fetch('https://agent.internal/query', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${env.INTERNAL_KEY}` },
    body: JSON.stringify({ message, queryId }),
  })
);

// T+16ms→8,400ms: Agents stream individually as they complete
// Client renders each agent's summary as it arrives — not waiting for all 10
// agentStreams: the in-flight calls above, consumed in completion order (see the sketch after this block)
for await (const result of agentStreams) {
  emit('agent_complete', { agentId: result.id, summary: result.summary });
}

// T+8,400ms: Last agent completes — orchestrator phase begins
// T+8,450ms: Clash detection (Claude call #11a — 400ms)
// T+8,850ms: Round 2 triggered (Vasquez + Okafor re-queried)
emit('round2_triggered', {
  agents: ['vasquez', 'okafor'],
  tensionId: 't_001',
  severity: 8,
});

// T+8,850ms→10,200ms: Round 2 agent calls (2 workers, not 10)
// T+10,200ms: Synthesis begins streaming to client
// T+11,340ms: tension_map event fired — complete JSON
emit('tension_map', tensionMap); // validated TensionMap object
// T+11,340ms: Stream closes. Query complete.
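
That "stream individually as they complete" step glosses over how completion order is actually observed. Here is a minimal sketch of one way to turn the in-flight agent calls into a completion-ordered stream; the helper name inCompletionOrder and the result shape are illustrative rather than lifted from the Consilium codebase.

// Sketch: yield settled results in the order they finish, not in array order
async function* inCompletionOrder<T>(promises: Promise<T>[]): AsyncGenerator<T> {
  // Tag each promise with its index so it can be removed from the pool once it settles
  const pool = new Map<number, Promise<readonly [number, T]>>();
  promises.forEach((p, i) => pool.set(i, p.then(v => [i, v] as const)));
  while (pool.size > 0) {
    const [i, value] = await Promise.race(pool.values()); // first one to finish wins
    pool.delete(i);
    yield value;
  }
}

// Usage in the fan-out phase above (result shape assumed):
// for await (const result of inCompletionOrder(agentCalls)) {
//   emit('agent_complete', { agentId: result.id, summary: result.summary });
// }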

Zero Cold Starts. Here's Why.

The trace shows 0 cold starts across 11 Workers. This is not luck — it's a direct consequence of the microservices architecture. Individual Workers are small (under 15KB each), and at the traffic volumes the Consilium runs at, each Worker is already warm in Cloudflare's edge network before any query arrives. The monolith would have cold-started the entire 400KB bundle on each session; individual Workers warm up and stay warm independently.

Cold Start Overhead: 0ms
Avg Worker Bundle: 14KB
Avg KV Read: 4ms
All Infra Overhead: 16ms
Total End-to-End: 11.3s
Time Spent in the Models: 98.6%

That last number is the critical one: 98.6% of end-to-end latency is the Anthropic API generating tokens. 1.4% is everything else — routing, auth, KV reads, Worker initialization, SSE framing, orchestration logic. The infrastructure overhead on a production 11-Worker microservices architecture is under 160ms. The bottleneck is always and only the models.

The Cost Breakdown

How $0.041 decomposes across 11 calls, with and without prompt caching:

Agent          Input Tokens   Cache Read   Output Tokens   Cache Status   Cost
Vasquez        142            2,240        380             CACHED         $0.0034
Webb           156            2,190        410             CACHED         $0.0037
Chen           138            2,280        360             CACHED         $0.0032
Okafor         148            2,210        395             CACHED         $0.0035
Mitchell       162            2,260        425             CACHED         $0.0038
Nakamura       144            2,195        375             CACHED         $0.0033
Diallo         158            2,220        405             CACHED         $0.0036
Harlow         166            2,175        430             CACHED         $0.0039
DeLeon         172            2,250        445             CACHED         $0.0040
Cross          140            2,235        370             CACHED         $0.0033
Orchestrator   4,820          0            680             DYNAMIC        $0.0248
TOTAL          6,346          22,255       4,675                          $0.0405
Why the Orchestrator Costs 60% of the Total

The orchestrator's system prompt is dynamic — it includes all 10 agent responses as context, which changes on every call. That means no prompt caching applies: every orchestrator call pays full input cost on 4,820+ tokens. This is structural and unavoidable. The mitigation is keeping agent responses concise (structured summaries, not full essays) and using the clash detection pass to front-load the synthesis — the orchestrator only runs on the content it actually needs. The 10 agent calls combined cost $0.0357 with caching. Without caching they'd cost ~$0.28. The 88% aggregate cache hit rate is doing most of the cost work.
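
For concreteness, this is roughly what that caching split looks like at the API level: a minimal sketch assuming the agents' static persona prompts carry a cache_control marker while the orchestrator's prompt, which embeds all 10 responses, is sent uncached. The model name, prompt constants, and the buildSynthesisPrompt helper are illustrative.

// Sketch: the same Messages API call, with and without a cache-eligible system prompt
async function callClaude(systemBlocks: unknown[], userMessage: string, apiKey: string) {
  return fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-20250514',   // illustrative
      max_tokens: 1024,
      stream: true,
      system: systemBlocks,
      messages: [{ role: 'user', content: userMessage }],
    }),
  });
}

// Agent call: the large persona prompt is static, so it is marked cache-eligible.
// After the first request, later calls read ~2,200 tokens from cache (the CACHED rows above).
const agentSystem = [
  { type: 'text', text: VASQUEZ_PERSONA_PROMPT, cache_control: { type: 'ephemeral' } },
];

// Orchestrator call: the prompt embeds all 10 agent responses, so it changes every query.
// A cache_control marker here would never hit, which is why the row above is DYNAMIC.
const orchestratorSystem = [
  { type: 'text', text: buildSynthesisPrompt(agentResponses) },
];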

The SSE Event Stream

847 SSE chunks across one query. Here's the event taxonomy — every event type the client receives and what triggers it:

client/sse-events.ts — full event taxonomy
// Phase 1 events — fire as each agent completes (T+300ms → T+8.4s)

'agent_stream_start'  // agent began streaming — show loading state
'agent_chunk'         // text delta from agent's stream (most frequent)
'agent_complete'      // agent finished — render summary card

// Orchestrator phase events — fire after all 10 complete

'orchestrating'       // "Mapping tensions..." loading indicator
'round2_triggered'    // shows which 2 agents are re-querying and why
'round2_complete'     // round 2 responses received
'synthesis_chunk'     // streaming synthesis text delta

// Terminal events — one of these always closes the stream

'tension_map'         // complete validated TensionMap JSON
'stream_complete'     // all done — dismiss loading states
'error'               // { code, retry, agentId? } — client handles

// Event volume breakdown for this query:
// agent_chunk:       782 events (92.3% of total)
// agent_complete:    10 events
// synthesis_chunk:   48 events
// round2_triggered:  1 event
// tension_map:       1 event
// orchestrating:     1 event
// stream_complete:   1 event
// Other:             3 events
// ─────────────────────────────
// Total:             847 events
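
On the client side, consuming that taxonomy is a loop over SSE frames and a switch on the event name. A minimal sketch of a reader, assuming the stream arrives on a fetch() response, frames follow the standard event:/data: framing, and each data payload is JSON; the endpoint path and handler names are illustrative.

// Sketch: parse the SSE stream and dispatch on the event taxonomy above
async function consumeQueryStream(res: Response, onEvent: (type: string, data: unknown) => void) {
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const frames = buffer.split('\n\n');   // SSE frames are separated by a blank line
    buffer = frames.pop() ?? '';           // keep any trailing partial frame for the next read
    for (const frame of frames) {
      let type = 'message';
      let data = '';
      for (const line of frame.split('\n')) {
        if (line.startsWith('event:')) type = line.slice(6).trim();
        if (line.startsWith('data:')) data += line.slice(5).trim();
      }
      if (data) onEvent(type, JSON.parse(data));
    }
  }
}

// Usage (endpoint path and handlers illustrative):
// const res = await fetch('/api/query', { method: 'POST', body: JSON.stringify(query) });
// await consumeQueryStream(res, (type, data) => {
//   if (type === 'agent_complete') renderAgentCard(data);
//   if (type === 'tension_map') renderTensionMap(data);
// });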
The Full Stack — Layer by Layer
The Layer That Surprises Everyone

Most people assume the Cloudflare routing layer adds meaningful latency. It adds 4–8ms total — round-trip included — because Workers run at the PoP closest to the user, not in a central data center. For a user in Dallas, the Consilium Worker runs in Dallas. The 11 agent Workers are called via Cloudflare Service Bindings — Worker-to-Worker calls that never leave Cloudflare's network. No DNS resolution, no TLS handshake, no public internet hops between the orchestrator and agents. The only calls that leave the network are the 11 Anthropic API calls, which go to Anthropic's servers from Cloudflare's edge.
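
For readers who haven't used them, this is roughly what the binding side looks like: a minimal sketch in which the binding names, the agent-vasquez service name, and the request path are illustrative, and the Fetcher type comes from @cloudflare/workers-types.

// Sketch: each agent Worker is exposed to the orchestrator as a Fetcher binding,
// declared per agent in the orchestrator's wrangler.toml, roughly:
//   [[services]]
//   binding = "VASQUEZ"        # the name that appears on env
//   service = "agent-vasquez"  # the deployed agent Worker it points at
interface Env {
  VASQUEZ: Fetcher;
  WEBB: Fetcher;
  // ...one Fetcher per agent Worker, plus KV namespaces and secrets
  INTERNAL_KEY: string;
}

// A binding call looks like an ordinary fetch(), but it is routed Worker-to-Worker
// inside Cloudflare's network: no DNS lookup, no TLS handshake, no public internet hop.
async function callAgent(agent: Fetcher, payload: unknown, key: string): Promise<unknown> {
  const res = await agent.fetch('https://agent.internal/query', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${key}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return res.json();
}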

Frequently Asked
Why does the total query cost $0.041 when individual agent calls seem cheap?
The orchestrator alone costs $0.025 — 60% of the total — because it processes all 10 agent responses as dynamic context on every call (no caching possible). The 10 agent calls combined cost $0.016 with 88% cache hit rate. Without prompt caching across all agents, the same query would cost approximately $0.31. The $0.041 figure is what caching actually buys you: a 7.5× cost reduction on the agents, with the orchestrator remaining the unavoidable cost center.
Why does the user see content at 300ms if all 10 agents take up to 8.4 seconds?
Two-phase streaming. Phase 1: all 10 agent calls fire simultaneously at T+16ms. Each agent streams its response as it completes — the first agent typically finishes around T+300ms, and the client renders that summary immediately. The user sees content building progressively rather than waiting for all 10 to finish. Phase 2 begins at T+8.4s when the last agent completes, triggering clash detection and synthesis. Total perceived latency is dominated by time-to-first-content (~300ms), not time-to-complete (~11s).
What are Cloudflare Service Bindings and why do they matter for this architecture?
Service Bindings allow one Cloudflare Worker to call another Worker directly without going over the public internet — no DNS, no TLS handshake, no egress cost. For the Consilium, the orchestrator calls all 10 agent Workers via Service Bindings. This means 10 internal API calls add roughly 2–4ms each in overhead rather than 20–60ms for public HTTPS calls. It also means agent Workers don't need public endpoints — they're inaccessible from the internet and can only be called by the orchestrator via binding.
What happens if one of the 10 agent Workers fails mid-query?
The orchestrator uses Promise.allSettled() not Promise.all(). If one agent throws or times out, the other nine complete and the orchestrator proceeds with the responses it received. The tension map includes a failedAgents field that lists any agents that didn't respond. The synthesis prompt is aware of this and explicitly instructs the orchestrator not to draw conclusions from absent agent domains. A single agent failure degrades the output but doesn't break the query.
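
As a sketch of that failure path: failedAgents is the field the answer names, AGENTS and agentCalls carry over from the trace above, and the AgentResponse type and synthesize helper are assumptions.

// Sketch: fan out, tolerate individual agent failures, record who never answered
const settled = await Promise.allSettled(agentCalls);

const responses = settled
  .filter((r): r is PromiseFulfilledResult<AgentResponse> => r.status === 'fulfilled')
  .map(r => r.value);

// Agents whose calls rejected or timed out, surfaced in the tension map
const failedAgents = AGENTS.filter((_, i) => settled[i].status === 'rejected');

// The synthesis prompt receives failedAgents explicitly, so the orchestrator is told
// not to draw conclusions from domains that never reported in
const tensionMap = await synthesize({ responses, failedAgents });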
How does Round 2 affect the total cost?
Round 2 re-queries only the two conflicting agents, not all ten. For this query, that added approximately $0.008 — two additional agent calls at cached rates. The orchestrator's clash detection pass (determining whether Round 2 is needed) costs roughly $0.003. Total Round 2 overhead: ~$0.011, or about 27% of the total query cost. Whether that cost is justified depends on the severity of the conflict — for a severity-8 factual contradiction on a load-bearing claim, it's the most valuable $0.011 in the query.
$0.041. 11 Workers. 11.3 Seconds. Watch It Run.

Ask the Consilium something hard. Every millisecond you just read about is happening in real time.

Open The Consilium