The monolith wasn't a mistake. It was the fastest way to get from zero to working. But there's a point where every line added to a single file makes the whole system harder to understand, slower to deploy, and more expensive to run. I hit that point somewhere around line 7,000 and kept going anyway. This is what I learned when I finally stopped.
The Monolith Problem
The original PropTechUSA Consilium was one Cloudflare Worker. One file. Every agent — the economist, the legal analyst, the risk officer, all of them — lived inside a single `export default` handler. Routing logic, system prompts, streaming handlers, error recovery, caching, model fallback. Everything. 10,247 lines.
It worked. That's important to say. The monolith shipped the product.
```typescript
// This is what the routing logic looked like at line ~3,400
// Every agent, every model, every error path, every config — one file
export default {
  async fetch(req: Request, env: Env) {
    const path = new URL(req.url).pathname;

    // 847 lines of routing logic — grew by ~60 lines every time we added an agent
    if (path === '/economist') return handleEconomist(req, env);
    if (path === '/legal') return handleLegal(req, env);
    if (path === '/risk') return handleRisk(req, env);
    // ... 7 more blocks ...

    // Shared error handler — one bug here broke every agent simultaneously
    return new Response('Not found', { status: 404 });
  }
}

// handleEconomist was at line 1,240. handleLegal at 2,890.
// Finding anything required knowing the file like a map.
// Debugging anything required understanding everything.
```
But three things were slowly becoming impossible:

1. Deployment fear. You hesitate before deploying because a change in one agent might silently break another.
2. Archaeological debugging. Every bug investigation starts with searching the file rather than knowing where to look.
3. Cold start bloat. The entire bundle initializes on every cold start even when only one agent is being called — you're paying initialization cost for ten agents to serve one request.
The Decision To Blow It Up
The trigger wasn't a catastrophic failure. It was a small one. I was fixing a bug in the streaming handler for the economics agent. The fix was three lines. But I had to read 400 lines of surrounding context to be confident the fix wouldn't break the legal agent's streaming path, which used the same shared utility function, which was also called by the orchestrator's synthesis logic.
Three lines. Four hundred lines of reading. That's the monolith tax — you pay it on every change, forever, and it compounds as the file grows. I did the math: at the current growth rate, the file would hit 15,000 lines before I'd finished building the full ten-agent roster. The refactor wasn't optional anymore.
The New Architecture
The rule was simple: one domain, one Worker. Each of the ten domain expert agents became its own Cloudflare Worker — independent deployment, independent cache, independent environment variables, independent failure domain. An eleventh orchestrator Worker handles fanout and tension map synthesis.
The key insight is that Cloudflare Workers bill per CPU millisecond of actual compute, not per deployed Worker. Having 11 Workers instead of 1 costs exactly the same if total compute is identical — you only pay for what runs. The difference is isolation. Each Worker fails independently, caches independently, and can be deployed independently without touching any of the others.
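The arithmetic is worth making explicit. A back-of-envelope sketch, using a placeholder unit price rather than Cloudflare's published rate:

```typescript
// Back-of-envelope: cost scales with CPU-ms consumed, not with Worker count.
// pricePerCpuMs is a placeholder unit price, not Cloudflare's published rate.
const pricePerCpuMs = 1;

const totalCpuMs = 1_000 * 40; // e.g. 1,000 requests at ~40 CPU-ms each
const asMonolith = totalCpuMs * pricePerCpuMs;      // one Worker handles everything
const asElevenWorkers = totalCpuMs * pricePerCpuMs; // same compute, split across 11

console.log(asMonolith === asElevenWorkers); // true: Worker count never enters the formula
```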
```typescript
// 284 lines. Does one thing. Owns it completely.
import { VASQUEZ_SYSTEM_PROMPT } from './prompts';
import { streamWithFallback, verifyInternalAuth } from '../shared';

export default {
  async fetch(req: Request, env: Env): Promise<Response> {
    // Auth: only the orchestrator can call this Worker
    if (!verifyInternalAuth(req, env)) {
      return new Response('Unauthorized', { status: 401 });
    }

    const { message } = await req.json();

    // Cache: this prompt caches independently of all other agents
    return streamWithFallback({
      system: [{ type: 'text', text: VASQUEZ_SYSTEM_PROMPT, cache_control: { type: 'ephemeral' } }],
      messages: [{ role: 'user', content: message }],
      temperature: 0.3, // Vasquez: deterministic clinical reasoning
    }, env);
  }
};
```
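The Worker above leans on a few types that aren't shown in the excerpt. A minimal sketch of what they imply: `INTERNAL_KEY` and `ANALYTICS` appear in the shared layer below, while `ANTHROPIC_API_KEY`, `Payload`, and `Usage` are assumed shapes, not confirmed definitions.

```typescript
// Sketch of the bindings and types the Workers above assume.
// INTERNAL_KEY and ANALYTICS are used by the shared layer; the rest are assumptions.
interface Env {
  INTERNAL_KEY: string;              // shared secret checked by verifyInternalAuth
  ANTHROPIC_API_KEY: string;         // assumed name for the upstream API credential
  ANALYTICS: AnalyticsEngineDataset; // Workers Analytics Engine binding used by logUsage
}

// Shape of the request streamWithFallback forwards upstream (model is added per attempt)
interface Payload {
  system: Array<{ type: 'text'; text: string; cache_control?: { type: 'ephemeral' } }>;
  messages: Array<{ role: 'user' | 'assistant'; content: string }>;
  temperature?: number;
}

// Token accounting consumed by logUsage
interface Usage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number;
}
```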
| Dimension | Monolith (Before) | 11 Workers (After) |
|---|---|---|
| Deployment | All-or-nothing. One deploy touches every agent. | Deploy any Worker independently. Others unaffected. |
| Failures | One bug in shared error handler = all agents down. | One Worker fails. Nine keep running. |
| Cold starts | Entire 10K-line bundle initializes on every cold start. | One 280-line Worker initializes. The other 9 aren't touched. |
| Debugging | Grep a 10K-line file. Understand entire system to fix any part. | Open one 280-line file. Entire context fits in one screen. |
| Prompt caching | One shared cache namespace. Cache invalidation cascades. | Each Worker caches its own system prompt independently at 90% savings. |
| Cost | Same API costs. Shared compute billing. | Same API costs. Identical billing per CPU-ms. |
The Shared Layer
Microservices don't mean no shared code. They mean shared code lives in a deliberate shared layer, not in the same file as everything else. For the Consilium Workers, there's a small shared package that every Worker imports: auth verification, the streaming + fallback handler, error event emitters, and the usage logger. Everything specific to an agent lives in that agent's Worker.
```typescript
// ~180 lines. The entire shared surface area.
// Every Worker imports from here. Nothing else is shared.
export async function streamWithFallback(payload: Payload, env: Env) {
  const models = ['claude-sonnet-4-20250514', 'claude-haiku-4-5-20251001'];
  for (const model of models) {
    try {
      return await callAnthropicStream({ ...payload, model }, env);
    } catch (e) {
      if (model === models.at(-1)) throw e;
    }
  }
}

export function verifyInternalAuth(req: Request, env: Env): boolean {
  const token = req.headers.get('Authorization')?.replace('Bearer ', '');
  return token === env.INTERNAL_KEY;
}

export function logUsage(usage: Usage, agentId: string, env: Env) {
  // Log to Analytics Engine — cache hit rates, token counts, latency
  env.ANALYTICS.writeDataPoint({
    blobs: [agentId],
    doubles: [usage.input_tokens, usage.output_tokens, usage.cache_read_input_tokens ?? 0],
  });
}
```
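The one function the excerpt references but doesn't show is `callAnthropicStream`. A minimal sketch of what it might look like, assuming the standard Anthropic Messages API and the `ANTHROPIC_API_KEY` binding from earlier; the `max_tokens` value is an arbitrary choice:

```typescript
// Hypothetical sketch of callAnthropicStream; not shown in the original excerpt.
// Assumes the standard Anthropic Messages API with streaming enabled.
async function callAnthropicStream(
  payload: Payload & { model: string },
  env: Env
): Promise<Response> {
  const upstream = await fetch('https://api.anthropic.com/v1/messages', {
    method: 'POST',
    headers: {
      'x-api-key': env.ANTHROPIC_API_KEY,
      'anthropic-version': '2023-06-01',
      'content-type': 'application/json',
    },
    body: JSON.stringify({ ...payload, max_tokens: 4096, stream: true }),
  });

  // Throw on non-2xx so streamWithFallback advances to the next model
  if (!upstream.ok) throw new Error(`Upstream error: ${upstream.status}`);

  // Pass the SSE stream straight through to the caller
  return new Response(upstream.body, {
    headers: { 'content-type': 'text/event-stream' },
  });
}
```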
The dividing line: if changing a piece of code would require redeploying every Worker anyway, it belongs in shared. If it's specific to one agent's behavior, reasoning, or configuration — it stays in that agent's Worker. When in doubt, keep it in the individual Worker. Premature abstraction into shared is how the monolith grows back.
The Refactor Timeline
Two weeks of evenings. The actual migration was straightforward once the seams were identified — the hard work was deciding what counted as a boundary.
The sequence:

1. `shared/index.ts` first — the streaming handler, auth verifier, usage logger. Tested in isolation before any agent touched it.
2. The orchestrator Worker — `Promise.allSettled` fanout (sketched below), tension map generation, Round 2 trigger logic. All the coordination that used to live buried in the monolith now lived explicitly in one place.
3. Finally, `git rm monolith/index.ts`. 10,247 lines gone. Replaced by 11 Workers averaging 280 lines each and a 180-line shared package. The deletion felt better than any feature ship.

One rule made the whole thing safe: don't cut over from monolith to microservices in one deployment. Run both in parallel — new Workers go live, old monolith routes stay active. Shift traffic gradually per agent. This means no big-bang release risk: if a new Worker has a bug, the monolith route is still there. Once you're confident in a Worker, remove the monolith route for that agent. On day 12, the last route was removed and the file was deleted.
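The fanout itself is small. A sketch of the shape it might take; the agent hostnames are placeholders, but the `Promise.allSettled` pattern and the Bearer auth match the shared layer above:

```typescript
// Illustrative orchestrator fanout. Agent URLs are placeholders;
// the auth header is what verifyInternalAuth expects on the other end.
const AGENT_HOSTS = ['economist', 'legal', 'risk' /* ...7 more... */];

async function fanout(message: string, env: Env) {
  const results = await Promise.allSettled(
    AGENT_HOSTS.map((host) =>
      fetch(`https://${host}.example.workers.dev/`, {
        method: 'POST',
        headers: {
          Authorization: `Bearer ${env.INTERNAL_KEY}`,
          'content-type': 'application/json',
        },
        body: JSON.stringify({ message }),
      })
    )
  );

  // allSettled, not all: one failed agent must not sink the other nine
  const ok = results.filter(
    (r): r is PromiseFulfilledResult<Response> =>
      r.status === 'fulfilled' && r.value.ok
  );

  // Tension map synthesis runs over whichever agents actually responded
  return { responses: ok.map((r) => r.value), failed: results.length - ok.length };
}
```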
The Numbers
The 97% cold start reduction comes from bundle size. The monolith initialized a ~400KB bundle; each individual Worker is under 15KB, roughly 96% smaller. V8 isolates initialize in time roughly proportional to bundle size — smaller bundle, faster cold start — so the bundle shrink maps almost directly onto the cold start number. At the call volumes the Consilium runs, this compounds into real perceived latency savings on the first request of each session.
The more important number is zero shared-state bugs since the refactor. In the monolith, three incidents in two months were caused by one agent's error handling path interfering with another's state. That class of bug is structurally impossible in the new architecture — Workers share no state at runtime. They communicate only through explicit API calls with defined contracts.
In Action
The Consilium runs all eleven Workers simultaneously — ten domain experts, one orchestrator, all streaming, all cached, all independent.
Open The Consilium