The monolith wasn't a mistake. It was the fastest way to get from zero to working. But there's a point where every line added to a single file makes the whole system harder to understand, slower to deploy, and more expensive to run. I hit that point somewhere around line 7,000 and kept going anyway. This is what I learned when I finally stopped.
The Monolith Problem
The original PropTechUSA Consilium was one Cloudflare Worker. One file. Every agent — the economist, the legal analyst, the risk officer, all of them — lived inside a single `export default` handler. Routing logic, system prompts, streaming handlers, error recovery, caching, model fallback. Everything. 10,247 lines.
It worked. That's important to say. The monolith shipped the product.
```typescript
// This is what the routing logic looked like at line ~3,400
// Every agent, every model, every error path, every config — one file
export default {
  async fetch(req: Request, env: Env) {
    const path = new URL(req.url).pathname;

    // 847 lines of routing logic — grew by ~60 lines every time we added an agent
    if (path === '/economist') return handleEconomist(req, env);
    if (path === '/legal') return handleLegal(req, env);
    if (path === '/risk') return handleRisk(req, env);
    // ... 7 more blocks ...

    // Shared error handler — one bug here broke every agent simultaneously
    return new Response('Not found', { status: 404 });
  }
}

// handleEconomist was at line 1,240. handleLegal at 2,890.
// Finding anything required knowing the file like a map.
// Debugging anything required understanding everything.
```
But three things were slowly becoming impossible:

1. Deployment fear. You hesitate before deploying because a change in one agent might silently break another.
2. Archaeological debugging. Every bug investigation starts with searching the file rather than knowing where to look.
3. Cold start bloat. The entire bundle initializes on every cold start even when only one agent is being called — you're paying initialization cost for ten agents to serve one request.
The Decision To Blow It Up
The trigger wasn't a catastrophic failure. It was a small one. I was fixing a bug in the streaming handler for the economics agent. The fix was three lines. But I had to read 400 lines of surrounding context to be confident the fix wouldn't break the legal agent's streaming path, which used the same shared utility function, which was also called by the orchestrator's synthesis logic.
Three lines. Four hundred lines of reading. That's the monolith tax — you pay it on every change, forever, and it compounds as the file grows. I did the math: at the current growth rate, the file would hit 15,000 lines before I'd finished building the full ten-agent roster. The refactor wasn't optional anymore.
The New Architecture
The rule was simple: one domain, one Worker. Each of the ten domain expert agents became its own Cloudflare Worker — independent deployment, independent cache, independent environment variables, independent failure domain. An eleventh orchestrator Worker handles fanout and tension map synthesis.
The key insight is that Cloudflare Workers bill per CPU millisecond of actual compute, not per deployed Worker. Having 11 Workers instead of 1 costs exactly the same if total compute is identical — you only pay for what runs. The difference is isolation. Each Worker fails independently, caches independently, and can be deployed independently without touching any of the others.
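The arithmetic is worth making explicit. A back-of-envelope sketch, using a placeholder unit price rather than Cloudflare's published rate:

```typescript
// Back-of-envelope: cost scales with CPU-ms consumed, not with Worker count.
// pricePerCpuMs is a placeholder unit price, not Cloudflare's published rate.
const pricePerCpuMs = 1;

const totalCpuMs = 1_000 * 40; // e.g. 1,000 requests at ~40 CPU-ms each
const asMonolith = totalCpuMs * pricePerCpuMs;      // one Worker handles everything
const asElevenWorkers = totalCpuMs * pricePerCpuMs; // same compute, split across 11

console.log(asMonolith === asElevenWorkers); // true: Worker count never enters the formula
```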
```typescript
// 284 lines. Does one thing. Owns it completely.
import { VASQUEZ_SYSTEM_PROMPT } from './prompts';
import { streamWithFallback, verifyInternalAuth } from '../shared';

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    // Auth: only the orchestrator can call this Worker
    if (!verifyInternalAuth(req, env)) {
      return new Response('Unauthorized', { status: 401 });
    }

    const { message } = await req.json();

    // Cache: this prompt caches independently of all other agents
    return streamWithFallback({
      system: [{ type: 'text', text: VASQUEZ_SYSTEM_PROMPT, cache_control: { type: 'ephemeral' } }],
      messages: [{ role: 'user', content: message }],
      temperature: 0.3, // Vasquez: deterministic clinical reasoning
    }, env);
  }
};
```
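The Worker above leans on a few types that aren't shown in the excerpt. A minimal sketch of what they imply: `INTERNAL_KEY` and `ANALYTICS` appear in the shared layer below, while `ANTHROPIC_API_KEY`, `Payload`, and `Usage` are assumed shapes, not confirmed definitions.

```typescript
// Sketch of the bindings and types the Workers above assume.
// INTERNAL_KEY and ANALYTICS are used by the shared layer; the rest are assumptions.
interface Env {
  INTERNAL_KEY: string;              // shared secret checked by verifyInternalAuth
  ANTHROPIC_API_KEY: string;         // assumed name for the upstream API credential
  ANALYTICS: AnalyticsEngineDataset; // Workers Analytics Engine binding used by logUsage
}

// Shape of the request streamWithFallback forwards upstream (model is added per attempt)
interface Payload {
  system: Array<{ type: 'text'; text: string; cache_control?: { type: 'ephemeral' } }>;
  messages: Array<{ role: 'user' | 'assistant'; content: string }>;
  temperature?: number;
}

// Token accounting consumed by logUsage
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
}
```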
| Dimension | Monolith (Before) | 11 Workers (After) |
|---|---|---|
| Deployment | All-or-nothing. One deploy touches every agent. | Deploy any Worker independently. Others unaffected. |
| Failures | One bug in shared error handler = all agents down. | One Worker fails. Nine keep running. |
| Cold starts | Entire 10K-line bundle initializes on every cold start. | One 280-line Worker initializes. The other 9 aren't touched. |
| Debugging | Grep a 10K-line file. Understand entire system to fix any part. | Open one 280-line file. Entire context fits in one screen. |
| Prompt caching | One shared cache namespace. Cache invalidation cascades. | Each Worker caches its own system prompt independently at 90% savings. |
| Cost | Same API costs. Shared compute billing. | Same API costs. Identical billing per CPU-ms. |
The Shared Layer
Microservices don't mean no shared code. They mean shared code lives in a deliberate shared layer, not in the same file as everything else. For the Consilium Workers, there's a small shared package that every Worker imports: auth verification, the streaming + fallback handler, error event emitters, and the usage logger. Everything specific to an agent lives in that agent's Worker.
```typescript
// ~180 lines. The entire shared surface area.
// Every Worker imports from here. Nothing else is shared.
export async function streamWithFallback(payload: Payload, env: Env) {
  const models = ['claude-sonnet-4-20250514', 'claude-haiku-4-5-20251001'];
  for (const model of models) {
    try {
      return await callAnthropicStream({ ...payload, model }, env);
    } catch (e) {
      if (model === models.at(-1)) throw e;
    }
  }
}

export function verifyInternalAuth(req: Request, env: Env): boolean {
  const token = req.headers.get('Authorization')?.replace('Bearer ', '');
  return token === env.INTERNAL_KEY;
}

export function logUsage(usage: Usage, agentId: string, env: Env) {
  // Log to Analytics Engine — cache hit rates, token counts, latency
  env.ANALYTICS.writeDataPoint({
    blobs: [agentId],
    doubles: [usage.input_tokens, usage.output_tokens, usage.cache_read_input_tokens ?? 0],
  });
}
```
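The one function the excerpt references but doesn't show is `callAnthropicStream`. A minimal sketch of what it might look like, assuming the standard Anthropic Messages API and the `ANTHROPIC_API_KEY` binding from earlier; the `max_tokens` value is an arbitrary choice:

```typescript
// Hypothetical sketch of callAnthropicStream; not shown in the original excerpt.
// Assumes the standard Anthropic Messages API with streaming enabled.
async function callAnthropicStream(
  payload: Payload & { model: string },
  env: Env
): Promise<Response> {
  const upstream = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: JSON.stringify({ ...payload, max_tokens: 4096, stream: true }),
  });

  // Throw on non-2xx so streamWithFallback advances to the next model
  if (!upstream.ok) throw new Error(`Upstream error: ${upstream.status}`);

  // Pass the SSE stream straight through to the caller
  return new Response(upstream.body, {
    headers: { 'content-type': 'text/event-stream' },
  });
}
```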
The dividing line: if changing a piece of code would require redeploying every Worker anyway, it belongs in shared. If it's specific to one agent's behavior, reasoning, or configuration — it stays in that agent's Worker. When in doubt, keep it in the individual Worker. Premature abstraction into shared is how the monolith grows back.
The Refactor Timeline
Two weeks of evenings. The actual migration was straightforward once the seams were identified — the hard work was deciding what counted as a boundary.
The sequence:

1. `shared/index.ts` first — the streaming handler, auth verifier, usage logger. Tested in isolation before any agent touched it.
2. The orchestrator Worker — `Promise.allSettled` fanout (sketched below), tension map generation, Round 2 trigger logic. All the coordination that used to live buried in the monolith now lived explicitly in one place.
3. Finally, `git rm monolith/index.ts`. 10,247 lines gone. Replaced by 11 Workers averaging 280 lines each and a 180-line shared package. The deletion felt better than any feature ship.

One rule made the whole thing safe: don't cut over from monolith to microservices in one deployment. Run both in parallel — new Workers go live, old monolith routes stay active. Shift traffic gradually per agent. This means no big-bang release risk: if a new Worker has a bug, the monolith route is still there. Once you're confident in a Worker, remove the monolith route for that agent. On day 12, the last route was removed and the file was deleted.
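The fanout itself is small. A sketch of the shape it might take; the agent hostnames are placeholders, but the `Promise.allSettled` pattern and the Bearer auth match the shared layer above:

```typescript
// Illustrative orchestrator fanout. Agent URLs are placeholders;
// the auth header is what verifyInternalAuth expects on the other end.
const AGENT_HOSTS = ['economist', 'legal', 'risk' /* ...7 more... */];

async function fanout(message: string, env: Env) {
  const results = await Promise.allSettled(
    AGENT_HOSTS.map((host) =>
      fetch(`https://${host}.example.workers.dev/`, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${env.INTERNAL_KEY}`,
          'content-type': 'application/json',
        },
        body: JSON.stringify({ message }),
      })
    )
  );

  // allSettled, not all: one failed agent must not sink the other nine
  const ok = results.filter(
    (r): r is PromiseFulfilledResult<Response> =>
      r.status === 'fulfilled' && r.value.ok
  );

  // Tension map synthesis runs over whichever agents actually responded
  return { responses: ok.map((r) => r.value), failed: results.length - ok.length };
}
```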
The Numbers
The 97% cold start reduction comes from bundle size. The monolith initialized a ~400KB bundle; each individual Worker is under 15KB, roughly 96% smaller. V8 isolates initialize in time roughly proportional to bundle size — smaller bundle, faster cold start — so the bundle shrink maps almost directly onto the cold start number. At the call volumes the Consilium runs, this compounds into real perceived latency savings on the first request of each session.
The more important number is zero shared-state bugs since the refactor. In the monolith, three incidents in two months were caused by one agent's error handling path interfering with another's state. That class of bug is structurally impossible in the new architecture — Workers share no state at runtime. They communicate only through explicit API calls with defined contracts.
In Action
The Consilium runs all eleven Workers simultaneously — ten domain experts, one orchestrator, all streaming, all cached, all independent.
Open The Consilium