Engineering Deep Dive · Request Trace · Post #07

The Full Stack of a $0.041 Query. Every Millisecond.

You type a question. Here is every hop, every cache read, every API call, every SSE chunk, and every orchestration step — across 11 AI workers — from your browser back to your screen. Annotated, traced, and priced.

$0.041 per query
11 workers · 10 agents · 1 orchestrator · 88% cache hit rate · 11.3s end-to-end
Workers Invoked: 11
KV Reads: 4
Anthropic API Calls: 11
SSE Chunks Streamed: 847
Round 2 Triggered: YES
Cold Start Count: 0

Every production system is a black box until you trace it. This is the full trace of a real Consilium query — the one that asked whether a specific off-market duplex represented a sound investment in a tightening credit environment. Here is everything that happened in the 11.3 seconds before the answer appeared.

The Waterfall

// Request Waterfall — Full Trace · Total: 11,340ms · $0.041
// (waterfall chart: per-operation timing bars spanning 0ms to 11.3s)

Every row in that waterfall is a real operation. Nothing is padded. The long bars are the Anthropic API streaming calls — that's not overhead, that's the model generating tokens. The short bars at the start are Cloudflare infrastructure — routing, KV reads, auth checks. Under 50ms combined for all of them.

The 11 Hops

// Network Topology — One Query's Journey
trace/hop-sequence.ts — annotated full call stack
// T+0ms: Browser → Consilium Edge Worker (CF PoP: Dallas, TX)
const query = {
  message: "Is the off-market duplex at 14th and Monroe sound in tightening credit?",
  queryId: "q_8f3a2b1c",
  sessionId: "s_aa9f1e",
};

// T+4ms: CORS check + shared secret auth — 2ms each
const authOk = req.headers.get('Authorization') === `Bearer ${env.CONSILIUM_SECRET}`;

// T+8ms: Rate limit check — KV read #1
const rateKey = `rate:${ip}:${Math.floor(Date.now()/60000)}`;
const count = await env.KV.get(rateKey); // ~4ms KV read

// T+12ms: Session context load — KV read #2
const ctx = await env.KV.get(`ctx:${sessionId}`); // ~4ms KV read

// T+16ms: Fan-out begins — all 10 agent calls fire simultaneously
// None of these awaits — they're all in-flight at the same time
// Agents are reached via Service Bindings (env[id] is a Fetcher bound to that agent Worker),
// so these calls never touch the public internet
const agentCalls = AGENTS.map(id =>
  env[id].fetch('https://agent.internal/query', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${env.INTERNAL_KEY}` },
    body: JSON.stringify({ message, queryId }),
  })
);

// T+16ms→8,400ms: Agents stream individually as they complete
// Client renders each agent's summary as it arrives — not waiting for all 10
// agentStreams: the in-flight calls above, consumed in completion order (see the sketch after this block)
for await (const result of agentStreams) {
  emit('agent_complete', { agentId: result.id, summary: result.summary });
}

// T+8,400ms: Last agent completes — orchestrator phase begins
// T+8,450ms: Clash detection (Claude call #11a — 400ms)
// T+8,850ms: Round 2 triggered (Vasquez + Okafor re-queried)
emit('round2_triggered', {
  agents: ['vasquez', 'okafor'],
  tensionId: 't_001',
  severity: 8,
});

// T+8,850ms→10,200ms: Round 2 agent calls (2 workers, not 10)
// T+10,200ms: Synthesis begins streaming to client
// T+11,340ms: tension_map event fired — complete JSON
emit('tension_map', tensionMap); // validated TensionMap object
// T+11,340ms: Stream closes. Query complete.
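
That "stream individually as they complete" step glosses over how completion order is actually observed. Here is a minimal sketch of one way to turn the in-flight agent calls into a completion-ordered stream; the helper name inCompletionOrder and the result shape are illustrative rather than lifted from the Consilium codebase.

// Sketch: yield settled results in the order they finish, not in array order
async function* inCompletionOrder<T>(promises: Promise<T>[]): AsyncGenerator<T> {
  // Tag each promise with its index so it can be removed from the pool once it settles
  const pool = new Map<number, Promise<readonly [number, T]>>();
  promises.forEach((p, i) => pool.set(i, p.then(v => [i, v] as const)));
  while (pool.size > 0) {
    const [i, value] = await Promise.race(pool.values()); // first one to finish wins
    pool.delete(i);
    yield value;
  }
}

// Usage in the fan-out phase above (result shape assumed):
// for await (const result of inCompletionOrder(agentCalls)) {
//   emit('agent_complete', { agentId: result.id, summary: result.summary });
// }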

Zero Cold Starts. Here's Why.

The trace shows 0 cold starts across 11 Workers. This is not luck — it's a direct consequence of the microservices architecture. Individual Workers are small (under 15KB each), and at the traffic volumes the Consilium runs at, each Worker is already warm in Cloudflare's edge network before any query arrives. The monolith would have cold-started the entire 400KB bundle on each session; individual Workers warm up and stay warm independently.

Cold Start Overhead: 0ms
Avg Worker Bundle: 14KB
Avg KV Read: 4ms
All Infra Overhead: 16ms
Total End-to-End: 11.3s
Time Spent in the Models: 98.6%

That last number is the critical one: 98.6% of end-to-end latency is the Anthropic API generating tokens. 1.4% is everything else — routing, auth, KV reads, Worker initialization, SSE framing, orchestration logic. The infrastructure overhead on a production 11-Worker microservices architecture is under 160ms. The bottleneck is always and only the models.

The Cost Breakdown

How $0.041 decomposes across 11 calls, with and without prompt caching:

Agent          Input Tokens   Cache Read   Output Tokens   Cache Status   Cost
Vasquez        142            2,240        380             CACHED         $0.0034
Webb           156            2,190        410             CACHED         $0.0037
Chen           138            2,280        360             CACHED         $0.0032
Okafor         148            2,210        395             CACHED         $0.0035
Mitchell       162            2,260        425             CACHED         $0.0038
Nakamura       144            2,195        375             CACHED         $0.0033
Diallo         158            2,220        405             CACHED         $0.0036
Harlow         166            2,175        430             CACHED         $0.0039
DeLeon         172            2,250        445             CACHED         $0.0040
Cross          140            2,235        370             CACHED         $0.0033
Orchestrator   4,820          0            680             DYNAMIC        $0.0248
TOTAL          6,346          22,255       4,675                          $0.0405
Why the Orchestrator Costs 60% of the Total

The orchestrator's system prompt is dynamic — it includes all 10 agent responses as context, which changes on every call. That means no prompt caching applies: every orchestrator call pays full input cost on 4,820+ tokens. This is structural and unavoidable. The mitigation is keeping agent responses concise (structured summaries, not full essays) and using the clash detection pass to front-load the synthesis — the orchestrator only runs on the content it actually needs. The 10 agent calls combined cost $0.0357 with caching. Without caching they'd cost ~$0.28. The 88% aggregate cache hit rate is doing most of the cost work.
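
For concreteness, this is roughly what that caching split looks like at the API level: a minimal sketch assuming the agents' static persona prompts carry a cache_control marker while the orchestrator's prompt, which embeds all 10 responses, is sent uncached. The model name, prompt constants, and the buildSynthesisPrompt helper are illustrative.

// Sketch: the same Messages API call, with and without a cache-eligible system prompt
async function callClaude(systemBlocks: unknown[], userMessage: string, apiKey: string) {
  return fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: JSON.stringify({
      model: 'claude-sonnet-4-20250514',   // illustrative
      max_tokens: 1024,
      stream: true,
      system: systemBlocks,
      messages: [{ role: 'user', content: userMessage }],
    }),
  });
}

// Agent call: the large persona prompt is static, so it is marked cache-eligible.
// After the first request, later calls read ~2,200 tokens from cache (the CACHED rows above).
const agentSystem = [
  { type: 'text', text: VASQUEZ_PERSONA_PROMPT, cache_control: { type: 'ephemeral' } },
];

// Orchestrator call: the prompt embeds all 10 agent responses, so it changes every query.
// A cache_control marker here would never hit, which is why the row above is DYNAMIC.
const orchestratorSystem = [
  { type: 'text', text: buildSynthesisPrompt(agentResponses) },
];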

The SSE Event Stream

847 SSE chunks across one query. Here's the event taxonomy — every event type the client receives and what triggers it:

client/sse-events.ts — full event taxonomy
// Phase 1 events — fire as each agent completes (T+300ms → T+8.4s)

'agent_stream_start'  // agent began streaming — show loading state
'agent_chunk'         // text delta from agent's stream (most frequent)
'agent_complete'      // agent finished — render summary card

// Orchestrator phase events — fire after all 10 complete

'orchestrating'       // "Mapping tensions..." loading indicator
'round2_triggered'    // shows which 2 agents are re-querying and why
'round2_complete'     // round 2 responses received
'synthesis_chunk'     // streaming synthesis text delta

// Terminal events — one of these always closes the stream

'tension_map'         // complete validated TensionMap JSON
'stream_complete'     // all done — dismiss loading states
'error'               // { code, retry, agentId? } — client handles

// Event volume breakdown for this query:
// agent_chunk:       782 events (92.3% of total)
// agent_complete:    10 events
// synthesis_chunk:   48 events
// round2_triggered:  1 event
// tension_map:       1 event
// orchestrating:     1 event
// stream_complete:   1 event
// Other:             3 events
// ─────────────────────────────
// Total:             847 events
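
On the client side, consuming that taxonomy is a loop over SSE frames and a switch on the event name. A minimal sketch of a reader, assuming the stream arrives on a fetch() response, frames follow the standard event:/data: framing, and each data payload is JSON; the endpoint path and handler names are illustrative.

// Sketch: parse the SSE stream and dispatch on the event taxonomy above
async function consumeQueryStream(res: Response, onEvent: (type: string, data: unknown) => void) {
  const reader = res.body!.pipeThrough(new TextDecoderStream()).getReader();
  let buffer = '';
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += value;
    const frames = buffer.split('\n\n');   // SSE frames are separated by a blank line
    buffer = frames.pop() ?? '';           // keep any trailing partial frame for the next read
    for (const frame of frames) {
      let type = 'message';
      let data = '';
      for (const line of frame.split('\n')) {
        if (line.startsWith('event:')) type = line.slice(6).trim();
        if (line.startsWith('data:')) data += line.slice(5).trim();
      }
      if (data) onEvent(type, JSON.parse(data));
    }
  }
}

// Usage (endpoint path and handlers illustrative):
// const res = await fetch('/api/query', { method: 'POST', body: JSON.stringify(query) });
// await consumeQueryStream(res, (type, data) => {
//   if (type === 'agent_complete') renderAgentCard(data);
//   if (type === 'tension_map') renderTensionMap(data);
// });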
The Full Stack — Layer by Layer
The Layer That Surprises Everyone

Most people assume the Cloudflare routing layer adds meaningful latency. It adds 4–8ms total — round-trip included — because Workers run at the PoP closest to the user, not in a central data center. For a user in Dallas, the Consilium Worker runs in Dallas. The 11 agent Workers are called via Cloudflare Service Bindings — Worker-to-Worker calls that never leave Cloudflare's network. No DNS resolution, no TLS handshake, no public internet hops between the orchestrator and agents. The only calls that leave the network are the 11 Anthropic API calls, which go to Anthropic's servers from Cloudflare's edge.
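
For readers who haven't used them, this is roughly what the binding side looks like: a minimal sketch in which the binding names, the agent-vasquez service name, and the request path are illustrative, and the Fetcher type comes from @cloudflare/workers-types.

// Sketch: each agent Worker is exposed to the orchestrator as a Fetcher binding,
// declared per agent in the orchestrator's wrangler.toml, roughly:
//   [[services]]
//   binding = "VASQUEZ"        # the name that appears on env
//   service = "agent-vasquez"  # the deployed agent Worker it points at
interface Env {
  VASQUEZ: Fetcher;
  WEBB: Fetcher;
  // ...one Fetcher per agent Worker, plus KV namespaces and secrets
  INTERNAL_KEY: string;
}

// A binding call looks like an ordinary fetch(), but it is routed Worker-to-Worker
// inside Cloudflare's network: no DNS lookup, no TLS handshake, no public internet hop.
async function callAgent(agent: Fetcher, payload: unknown, key: string): Promise<unknown> {
  const res = await agent.fetch('https://agent.internal/query', {
    method: 'POST',
    headers: { 'Authorization': `Bearer ${key}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  return res.json();
}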

Frequently Asked
Why does the total query cost $0.041 when individual agent calls seem cheap?
The orchestrator alone costs $0.025 — 60% of the total — because it processes all 10 agent responses as dynamic context on every call (no caching possible). The 10 agent calls combined cost $0.016 with 88% cache hit rate. Without prompt caching across all agents, the same query would cost approximately $0.31. The $0.041 figure is what caching actually buys you: a 7.5× cost reduction on the agents, with the orchestrator remaining the unavoidable cost center.
Why does the user see content at 300ms if all 10 agents take up to 8.4 seconds?
Two-phase streaming. Phase 1: all 10 agent calls fire simultaneously at T+16ms. Each agent streams its response as it completes — the first agent typically finishes around T+300ms, and the client renders that summary immediately. The user sees content building progressively rather than waiting for all 10 to finish. Phase 2 begins at T+8.4s when the last agent completes, triggering clash detection and synthesis. Total perceived latency is dominated by time-to-first-content (~300ms), not time-to-complete (~11s).
What are Cloudflare Service Bindings and why do they matter for this architecture?
Service Bindings allow one Cloudflare Worker to call another Worker directly without going over the public internet — no DNS, no TLS handshake, no egress cost. For the Consilium, the orchestrator calls all 10 agent Workers via Service Bindings. This means 10 internal API calls add roughly 2–4ms each in overhead rather than 20–60ms for public HTTPS calls. It also means agent Workers don't need public endpoints — they're inaccessible from the internet and can only be called by the orchestrator via binding.
What happens if one of the 10 agent Workers fails mid-query?
The orchestrator uses Promise.allSettled() not Promise.all(). If one agent throws or times out, the other nine complete and the orchestrator proceeds with the responses it received. The tension map includes a failedAgents field that lists any agents that didn't respond. The synthesis prompt is aware of this and explicitly instructs the orchestrator not to draw conclusions from absent agent domains. A single agent failure degrades the output but doesn't break the query.
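
As a sketch of that failure path: failedAgents is the field the answer names, AGENTS and agentCalls carry over from the trace above, and the AgentResponse type and synthesize helper are assumptions.

// Sketch: fan out, tolerate individual agent failures, record who never answered
const settled = await Promise.allSettled(agentCalls);

const responses = settled
  .filter((r): r is PromiseFulfilledResult<AgentResponse> => r.status === 'fulfilled')
  .map(r => r.value);

// Agents whose calls rejected or timed out, surfaced in the tension map
const failedAgents = AGENTS.filter((_, i) => settled[i].status === 'rejected');

// The synthesis prompt receives failedAgents explicitly, so the orchestrator is told
// not to draw conclusions from domains that never reported in
const tensionMap = await synthesize({ responses, failedAgents });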
How does Round 2 affect the total cost?
Round 2 re-queries only the two conflicting agents, not all ten. For this query, that added approximately $0.008 — two additional agent calls at cached rates. The orchestrator's clash detection pass (determining whether Round 2 is needed) costs roughly $0.003. Total Round 2 overhead: ~$0.011, or about 27% of the total query cost. Whether that cost is justified depends on the severity of the conflict — for a severity-8 factual contradiction on a load-bearing claim, it's the most valuable $0.011 in the query.
$0.041. 11 Workers. 11.3 Seconds. Watch It Run.

Ask the Consilium something hard. Every millisecond you just read about is happening in real time.

Open The Consilium