Engineering Series Post #06
Architecture Deep Dive · The 11th Worker · Full Technical Specification

The Orchestrator.
How to Build an AI That Reads Ten Others and Finds Where They Actually Disagree.

The ten domain experts get all the attention. But the 11th worker is where the system earns its value. Full architecture: tension map JSON schema, clash detection algorithm, Round 2 trigger logic, the synthesis prompt, and the failure mode that makes false consensus the most dangerous bug in multi-agent AI.

Worker Position: 11th · orchestrator only
Inputs: 10 agent responses
Clash Threshold: 6/10 · Round 2 trigger
Output Format: JSON · structured tension map
R2 Trigger Rate: 28% · healthy: 20–40%
Justin Erickson — Founder & CTO, PropTechUSA.ai
// March 2026 · 18 min read · Post #06

Most multi-agent systems aggregate. They collect responses from multiple models and merge them into a summary. That's not orchestration — that's averaging. Real orchestration means understanding where agents agree, where they're in genuine tension, why the tension exists, and what should be done about it before presenting anything to the human.

Posts 1–5 reference the orchestrator constantly. This one opens it up completely. The tension map schema, the clash scoring algorithm, the exact condition that fires Round 2, the synthesis prompt, the streaming architecture, and the failure mode that makes the whole system worse if you get it wrong.

Aggregation vs. Orchestrated Disagreement

Here's the difference in concrete terms. You ask ten domain experts whether a real estate deal is sound. Aggregation returns: "Most agents found merit in the deal. Some concerns were raised around financing." Orchestrated disagreement returns: "The economist and the risk officer have a material irresolvable clash on whether the cap rate assumption is realistic. The legal analyst flags a title issue none of the other agents addressed. Eight of ten agents agree on the exit timeline. Round 2 has been triggered for the financing disagreement."

The first output sounds like a conclusion. The second output is a decision map. For high-stakes decisions, the clash between two agents on a specific causal claim is more valuable than ten agreeing paragraphs. The orchestrator's job is to surface that clash, not paper over it.

If your orchestrator always produces a clean synthesis, it's probably hiding something. Genuine multi-expert disagreement is messy. That mess is the signal.
— Justin Erickson, PropTechUSA.ai

The Tension Map Schema

The tension map is a required output field — not optional, not conditional. If the orchestrator returns a response without it, the client treats the response as invalid and retries. This structural constraint prevents the most common failure mode: the orchestrator summarizing without mapping.
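
Enforcing that constraint is a plain shape check before anything renders. Here's a minimal sketch of the guard, assuming a callOrchestrator wrapper and an illustrative retry limit; the production client may differ:

client/validate.ts — response guard (sketch)
function isTensionMap(body: unknown): body is TensionMap {
  const m = body as TensionMap;
  return !!m
    && typeof m.version === 'string'
    && typeof m.queryId === 'string'
    && Array.isArray(m.consensus)
    && Array.isArray(m.tensions)   // a missing tensions array invalidates the response
    && !!m.synthesis;
}

async function askOrchestrator(query: string, env: Env, maxRetries = 2): Promise<TensionMap> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const body = await callOrchestrator(query, env); // assumed wrapper around the model call
    if (isTensionMap(body)) return body;             // invalid shape: retry, don't render
  }
  throw new Error('No valid tension map after retries');
}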

types/tension-map.ts — the complete schema
Core Schema
interface TensionMap {

  // Required — validated on every orchestrator response
  version: string;          // schema version for backwards compat
  queryId: string;          // correlates with agent call logs
  generatedAt: number;      // unix timestamp
  round: 1 | 2;            // which synthesis pass produced this

  // Consensus zones — where agents meaningfully agree
  consensus: {
    claim: string;            // the agreed-upon assertion
    supportingAgents: string[];  // which agents hold this view
    confidence: number;         // 0–1: orchestrator's confidence in the consensus
    loadBearing: boolean;       // does this affect the conclusion?
  }[];

  // Tension entries — the core value of the system
  tensions: {
    id: string;                // unique — used in Round 2 targeting
    agentA: string;            // first agent in conflict
    agentB: string;            // second agent in conflict
    claimA: string;            // agentA's specific position
    claimB: string;            // agentB's specific position
    type: 'factual'|'interpretive'|'emphasis'; // clash type
    severity: number;           // 1–10: 6+ triggers Round 2
    loadBearing: boolean;      // affects conclusion?
    resolvable: boolean;       // can more info resolve it?
    recommendation: string;    // how should the human weigh this?
  }[];

  // Synthesis — the orchestrator's reading of the full landscape
  synthesis: {
    headline: string;          // one-sentence summary of state of play
    majorFindings: string[];   // top 3-5 substantive conclusions
    openQuestions: string[];   // unresolved after Round 2 (if applicable)
    confidenceProfile: {        // not a single score — per-domain confidence
      [agentId: string]: number  // each value in 0–1
    };
  };

  // Round 2 targeting — populated before R2 call, nulled after
  round2Target?: {
    tensionId: string;          // which tension to resolve
    agents: [string, string];   // only these two agents are re-queried
    prompt: string;            // the specific clash framed as a question
  };
}
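
To make the schema concrete, here's an abridged instance built from the cap rate clash described earlier in this post; every value is illustrative:

// Example instance (abridged; all values illustrative)
const example: TensionMap = {
  version: '1.0',
  queryId: 'q_example',
  generatedAt: Date.now(),
  round: 1,
  consensus: [{
    claim: 'The proposed exit timeline is realistic.',
    supportingAgents: ['economist', 'risk', 'legal' /* ... */],
    confidence: 0.9,
    loadBearing: true,
  }],
  tensions: [{
    id: 't_cap_rate',
    agentA: 'economist',
    agentB: 'risk',
    claimA: 'The cap rate assumption is realistic for this market.',
    claimB: 'The cap rate assumption is materially optimistic.',
    type: 'factual',
    severity: 8,
    loadBearing: true,
    resolvable: false,
    recommendation: 'Stress-test the deal under both cap rate assumptions.',
  }],
  synthesis: {
    headline: 'Sound structure with one unresolved cap rate clash.',
    majorFindings: ['Financing terms are defensible', 'Legal flags an unaddressed title issue'],
    openQuestions: ['Which cap rate scenario is more probable?'],
    confidenceProfile: { economist: 0.55, risk: 0.6, legal: 0.9 },
  },
  round2Target: {
    tensionId: 't_cap_rate',
    agents: ['economist', 'risk'],
    prompt: 'Address the opposing cap rate claim directly.',
  },
};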
Why confidenceProfile Is Per-Domain, Not Aggregate

A single confidence score on multi-agent output is meaningless — it collapses ten different epistemic contexts into one number. The orchestrator might be highly confident in the legal analysis (clear statute, unambiguous application) and deeply uncertain in the economic forecast (contested empirical assumptions). Separate confidence scores per agent expose the actual distribution of certainty. The human can then decide which domains to weight more heavily for this specific decision.
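
A short sketch of what the per-domain profile enables downstream; the helper and threshold are illustrative:

// Surface the domains the orchestrator is least sure about.
// A single averaged score would hide exactly this information.
function lowConfidenceDomains(profile: Record<string, number>, threshold = 0.6): string[] {
  return Object.entries(profile)
    .filter(([, score]) => score < threshold)
    .map(([agentId]) => agentId)
    .sort();
}

// lowConfidenceDomains({ legal: 0.9, economist: 0.45, risk: 0.6 })
// => ['economist'] despite an overall average near 0.65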

The Clash Detection Algorithm

Clash detection is the hardest part of the orchestrator to get right. Every pair of ten agents touching the same complex question will have hundreds of surface-level differences — different word choices, different emphasis, different framings. The algorithm has to distinguish those from genuine material disagreements.

// Clash Scoring — Live Example
Vasquez ↔ Okafor · Causal claim · 8/10
Chen ↔ Diallo · Interpretive · 7/10
Mitchell ↔ Webb · Framework · 5/10
Harlow ↔ Nakamura · Emphasis · 2/10
Round 2 triggered: 2 clashes ≥ 6/10 · load-bearing · factual type
orchestrator/clash-detection.ts — scoring algorithm
Core Algorithm
interface ClashScore {
  agentPair: [string, string];  // agent IDs: [agentA, agentB]
  claimA: string;        // agentPair[0]'s position, used in the Round 2 prompt
  claimB: string;        // agentPair[1]'s position
  severity: number;      // 1-10
  type: 'factual' | 'interpretive' | 'emphasis';
  loadBearing: boolean;
}

async function detectClashes(
  responses: AgentResponse[],
  env: Env
): Promise<ClashScore[]> {

  // Phase 1: Extract all claims from each response
  // The orchestrator reads each response and identifies
  // load-bearing assertions (claims that affect the conclusion)
  const claims = await extractClaims(responses, env);

  // Phase 2: Compare claims across agent pairs
  // Only compare load-bearing claims — peripheral diffs are noise
  const clashes: ClashScore[] = [];

  for (let i = 0; i < responses.length; i++) {
    for (let j = i + 1; j < responses.length; j++) {

      const pairClash = await scorePairClash({
        agentA: responses[i].agentId,
        agentB: responses[j].agentId,
        claimsA: claims[i].loadBearing,
        claimsB: claims[j].loadBearing,
      }, env);

      // Severity scoring by type:
      // Factual contradiction:    8-10 (direct truth claim conflict)
      // Interpretive divergence:  4-7  (same facts, different meaning)
      // Emphasis difference:      1-3  (same view, different priority)
      if (pairClash.severity > 0) clashes.push(pairClash);
    }
  }

  return clashes.sort((a, b) => b.severity - a.severity);
}

function shouldTriggerRound2(clashes: ClashScore[]): boolean {
  // Round 2 condition: 2+ load-bearing clashes scoring ≥ 6/10
  const highSeverity = clashes.filter(c =>
    c.severity >= 6 && c.loadBearing && c.type !== 'emphasis'
  );
  return highSeverity.length >= 2;
}

// If Round 2 triggers, only the two conflicting agents are re-queried
// Not all ten — targeted, not expensive
function buildRound2Prompt(clash: ClashScore): string {
  const [a, b] = clash.agentPair;  // agent IDs, not array indices
  return `
    ${a} argued: "${clash.claimA}"
    ${b} argued: "${clash.claimB}"

    These claims are in direct conflict on a load-bearing point.
    Address the opposing argument specifically.
    Do not restate your original position without engaging the challenge.
  `;
}
Emphasis Differences Are Not Clashes

Two agents can both agree that a risk exists but disagree on how prominently to flag it. That's an emphasis difference — score 1–3, never triggers Round 2. It belongs in the tension map for human visibility, but it's not a factual or interpretive disagreement. The algorithm must classify before scoring. Conflating emphasis differences with factual contradictions produces an R2 trigger rate that's too high and burns unnecessary API cost on noise.
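
One way to make that ordering structural rather than aspirational is to let the classification clamp the score. A sketch, with band values taken from the scoring comments above:

type ClashType = 'factual' | 'interpretive' | 'emphasis';

// Classification caps severity, so an emphasis difference can never
// masquerade as an 8/10 factual clash no matter how the scorer leans
const SEVERITY_BANDS: Record<ClashType, [number, number]> = {
  factual:      [8, 10],
  interpretive: [4, 7],
  emphasis:     [1, 3],
};

function clampSeverity(type: ClashType, rawScore: number): number {
  const [lo, hi] = SEVERITY_BANDS[type];
  return Math.min(hi, Math.max(lo, rawScore));
}

// Two agents both flag vacancy risk but weight it differently:
// classified 'emphasis', clampSeverity caps it at 3, Round 2 never fires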

The Synthesis Prompt

The synthesis prompt is the most carefully engineered part of the orchestrator. It has to produce structured JSON output, preserve unresolved conflicts, avoid false consensus, and render a useful decision map — all in one pass. Here's the exact production prompt:

orchestrator/synthesis-prompt.ts — production system prompt
System Prompt
const ORCHESTRATOR_SYSTEM_PROMPT = `
You are the Consilium Orchestrator. You receive responses from 10 domain
expert AI agents and produce a structured tension map.

YOUR CARDINAL RULES:

1. PRESERVE DISAGREEMENT. Do not synthesize away genuine conflict.
   If two agents disagree on a load-bearing claim, that conflict must
   appear in the tensions array regardless of how uncomfortable it is.

2. CLASSIFY BEFORE SCORING. Every disagreement is one of:
   - factual: directly contradictory truth claims (score 8-10)
   - interpretive: same facts, different meaning (score 4-7)
   - emphasis: same view, different priority (score 1-3)

3. STRUCTURED OUTPUT REQUIRED. Your entire response must be valid JSON
   matching the TensionMap schema. No prose, no preamble, no markdown.
   A response without a tensions array will be treated as invalid.

4. CONFIDENCE IS PER-DOMAIN. Do not produce a single confidence score.
   Rate each agent's domain contribution independently.

5. ROUND 2 ONLY FOR LOAD-BEARING FACTUAL CLASHES. Emphasis
   differences do not trigger Round 2. Cost is real.

WHAT A GOOD SYNTHESIS LOOKS LIKE:
- tensions array has 3-8 entries for a complex query
- At least one consensus entry per major topic area
- openQuestions lists what Round 2 did NOT resolve (honesty)
- headline is one sentence, no hedging, no "it depends"

WHAT A BAD SYNTHESIS LOOKS LIKE:
- Empty or single-item tensions array on a complex topic
- Headline that begins with "It depends" or "Both perspectives..."
- Confidence scores all above 0.85 on contested empirical claims
- openQuestions is empty after a contested Round 2
`;
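
For context, here's a sketch of how synthesize() might wire this prompt into a streaming model call via the @anthropic-ai/sdk streaming helper; the model binding, the AgentResponse text field, and the message framing are assumptions:

import Anthropic from '@anthropic-ai/sdk';

async function synthesize(
  responses: AgentResponse[],
  env: Env,
  onChunk: (text: string) => void
): Promise<TensionMap> {
  const client = new Anthropic({ apiKey: env.ANTHROPIC_API_KEY });

  // All agent responses (post-Round 2 if it fired) go in as one user turn
  const stream = client.messages.stream({
    model: env.ORCHESTRATOR_MODEL,   // assumed binding for the pinned model id
    max_tokens: 8192,
    system: ORCHESTRATOR_SYSTEM_PROMPT,
    messages: [{
      role: 'user',
      content: responses.map(r => `[${r.agentId}]\n${r.text}`).join('\n\n'),
    }],
  });

  stream.on('text', onChunk);              // forward tokens to the SSE writer
  const final = await stream.finalMessage();

  // Cardinal rule 3: the entire response must parse as a TensionMap
  const text = final.content
    .map(block => (block.type === 'text' ? block.text : ''))
    .join('');
  return JSON.parse(text) as TensionMap;
}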

The Streaming Architecture

The orchestrator can't start streaming until it has all 10 agent responses. That's a hard dependency — you can't detect clashes without all the inputs. But you also can't make the user wait 8–12 seconds staring at a blank screen. The two-phase streaming approach solves this without changing the underlying computation:

// Two-Phase Stream — Perceived Latency vs Actual Latency
orchestrator/streaming.ts — two-phase stream pattern
Streaming Architecture
async function streamOrchestratedResponse(message: string, env: Env) {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const enc = new TextEncoder();
  const emit = (event: string, data: any) =>
    writer.write(enc.encode(`event: ${event}\ndata: ${JSON.stringify(data)}\n\n`));

  // Run the pipeline without awaiting it, so the Response (and the stream)
  // reaches the client immediately instead of after all the work completes
  (async () => {
    // PHASE 1: Fan out to all 10 agents, stream summaries as they arrive
    // User sees content within ~300ms — not blank screen for 10 seconds
    const agentPromises = AGENTS.map(id =>
      callAgent(id, message, env).then(response => {
        emit('agent_complete', { agentId: id, summary: response.summary });
        return response;
      })
    );

    // Wait for all 10 — allSettled so one failure doesn't block synthesis
    const settled = await Promise.allSettled(agentPromises);
    const responses = settled
      .filter((r): r is PromiseFulfilledResult<AgentResponse> => r.status === 'fulfilled')
      .map(r => r.value);

    // PHASE 2: Clash detection + synthesis — begins after all 10 complete
    await emit('orchestrating', { message: 'Mapping tensions...', agentCount: responses.length });

    const clashes = await detectClashes(responses, env);
    let finalResponses = responses;

    if (shouldTriggerRound2(clashes)) {
      await emit('round2_triggered', { clashes: clashes.filter(c => c.severity >= 6) });
      finalResponses = await runRound2(responses, clashes, env);
    }

    // Stream the tension map as it generates (SSE from Claude)
    // JSON-encode each chunk so embedded newlines can't break SSE framing
    const tensionMap = await synthesize(finalResponses, env, (chunk: string) => {
      writer.write(enc.encode(`event: synthesis_chunk\ndata: ${JSON.stringify(chunk)}\n\n`));
    });

    // Final emit: validated tension map JSON
    if (!validateTensionMap(tensionMap)) {
      await emit('error', { code: 'INVALID_TENSION_MAP', retry: true });
    } else {
      await emit('tension_map', tensionMap);
    }

    await writer.close();
  })().catch(() => writer.abort());

  return new Response(readable, {
    headers: { 'Content-Type': 'text/event-stream', 'Cache-Control': 'no-cache' }
  });
}
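
On the client side, consuming the stream is plain SSE. A minimal sketch, assuming the endpoint is exposed on a GET route reachable by EventSource and that the render helpers exist:

client/stream.ts — SSE consumer (sketch; route and render helpers are assumptions)
const source = new EventSource('/api/consilium?q=' + encodeURIComponent(question));

source.addEventListener('agent_complete', (e) => {
  const { agentId, summary } = JSON.parse((e as MessageEvent).data);
  renderAgentCard(agentId, summary);
});

source.addEventListener('round2_triggered', (e) => {
  showRound2Banner(JSON.parse((e as MessageEvent).data).clashes);
});

source.addEventListener('tension_map', (e) => {
  renderTensionMap(JSON.parse((e as MessageEvent).data));
  source.close();   // tension_map is the final event
});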

The Tension Map Visualized

// Live Consilium Tension Map — Real Agent Relationships
// War Story · The False Consensus
When the Orchestrator Lied About Agreement

Three weeks after launching the orchestrator, I noticed the Round 2 trigger rate had dropped from 28% to under 5% over a four-day period. The system was still running. Agents were still responding. No errors in the logs.

The investigation: I pulled random tension maps from that period and read them. The synthesis was clean. Almost too clean. Complex queries with no major unresolved tensions. Confidence scores all above 0.85. The headline on one response: "All domain experts agree this represents a sound investment opportunity." Ten agents. Zero tension entries.

The cause was subtle. A prompt update to the synthesis system prompt had added a line intended to improve readability: "Prioritize producing a clear, actionable synthesis the user can act on immediately." That single instruction shifted the orchestrator's optimization target from "accurately represent the state of disagreement" to "produce something the user can act on." The model correctly inferred that a clean synthesis is more actionable than a messy tension map. So it produced clean syntheses. By suppressing the disagreements.

The fix: removed the readability instruction entirely, added the explicit anti-pattern rules now in the production prompt ("A bad synthesis looks like: empty tensions array on a complex topic"). Required minimum tension entries for queries above a complexity threshold. Added an automated check: if a query contains more than 800 tokens of agent responses and produces zero tension entries, flag for manual review.
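
A sketch of that automated check; the 800-token threshold comes from the paragraph above, and the queue binding is an assumption:

monitoring/false-consensus.ts — zero-tension flag (sketch)
function looksLikeFalseConsensus(map: TensionMap, agentResponseTokens: number): boolean {
  // A complex query that produces zero tension entries is suspicious,
  // not automatically wrong, so it routes to review instead of failing
  return agentResponseTokens > 800 && map.tensions.length === 0;
}

async function flagForReview(map: TensionMap, agentResponseTokens: number, env: Env) {
  if (looksLikeFalseConsensus(map, agentResponseTokens)) {
    await env.REVIEW_QUEUE.send({           // assumed queue binding
      queryId: map.queryId,
      reason: 'ZERO_TENSIONS_ON_COMPLEX_QUERY',
    });
  }
}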

The lesson: the orchestrator's synthesis incentive must be truth-first, not clarity-first. If you optimize for readable output, you get readable lies. The tension map exists precisely because the world is complicated. Making it look simple is the failure mode.

Frequently Asked
What is an AI orchestrator in a multi-agent system?
An orchestrator receives responses from multiple domain expert agents, analyzes them for agreement and disagreement, and synthesizes a structured output. In the Consilium, the 11th worker reads all 10 domain expert responses, generates a tension map showing material disagreements, triggers targeted Round 2 debate when clashes are substantive, and produces a final synthesis with explicit uncertainty attribution per domain.
What is a tension map?
A structured JSON document that identifies where agents agree, where they have surface-level differences, and where they have material irresolvable disagreements. Each tension entry includes the agents in conflict, the nature of the disagreement, a severity score (1–10), whether it's resolvable, and the orchestrator's recommendation for how the human should weigh the conflicting views. The tension map is what makes multi-agent output actionable rather than just voluminous.
How does Round 2 trigger logic work?
Round 2 triggers when the clash detection scores at least two agent pairs above 6/10 severity AND the disagreements are load-bearing (directly affecting the conclusion) AND the clash type is factual or interpretive — not emphasis. When triggered, only the two conflicting agents are re-queried with the specific clash framed as a challenge — not all ten. Targeted, not expensive. Typical R2 adds ~2 API calls, not 10.
What is the most dangerous orchestrator failure mode?
False consensus — the orchestrator synthesizes a clean answer where genuine disagreement exists, making the conflict invisible. It happens when the synthesis prompt is optimized for clarity or actionability rather than accuracy. The fix is structural: make the tensions array a required output field, add explicit anti-pattern rules to the system prompt ("empty tensions on a complex query is a bad synthesis"), and build automated monitoring that flags responses with zero tension entries above a complexity threshold.
How do you stream the orchestrator when it needs all 10 agents to finish first?
Two-phase streaming. Phase 1: fan out to all 10 agents simultaneously, stream individual agent summaries to the client as each one completes — user sees content within ~300ms of the first agent finishing. Phase 2: after all 10 complete, run clash detection and stream the tension map synthesis. The user sees the system working from the first agent response rather than staring at a blank screen for 10 seconds while all 10 agents and the orchestrator complete sequentially.
What's the difference between aggregation and orchestrated disagreement?
Aggregation combines responses into a summary — "here is what several perspectives say." Orchestrated disagreement maps where perspectives conflict, why, what type of conflict it is, and what it means for the decision. Aggregation loses the signal in the noise. Orchestrated disagreement makes the signal the output. For high-stakes decisions, knowing that the economist and the risk officer fundamentally disagree on a specific causal claim is more valuable than a synthesized paragraph that mentions both views neutrally.
// Running In Production
See the Orchestrator Do Its Job

Ask the Consilium something genuinely hard. The tension map is visible in the output — you'll see exactly which agents are in conflict and why.

Open The Consilium