Cloudflare Workers Architecture

Building a Service Mesh on Cloudflare Workers

How to architect worker-to-worker communication at the edge. Service bindings vs HTTP, error handling, retry logic, and observability patterns.

12 min read · January 24, 2026

When you move from a monolithic worker to a distributed system, you need a communication layer. Traditional service meshes like Istio or Linkerd don't exist at the edge. You have to build your own.

This is the architecture pattern running in production across 28 workers, handling millions of requests with sub-50ms latency.

The Architecture

A service mesh at the edge looks different from traditional microservices. There's no central orchestrator, no sidecar proxies. Each worker is both a service and a potential mesh participant.

Edge Service Mesh Architecture

  Ingress: api-gateway • Auth: auth-worker • Analytics: metrics-worker • Logging: log-worker
                    ↓
  Core Services: leads • valuation • notify • crm
                    ↓
  Data Layer: KV • D1 • R2

Service Bindings vs HTTP Calls

There are two ways for workers to communicate: Service Bindings (direct invocation) and HTTP calls (network round-trip). The choice has significant implications.

Factor         | Service Bindings | HTTP Calls
---------------|------------------|--------------------
Latency        | <1ms overhead    | 5-15ms overhead
Cold Starts    | None             | Possible
Billing        | Single request   | Multiple requests
Configuration  | wrangler.toml    | None needed
Cross-account  | Not supported    | Fully supported
Debugging      | Harder to trace  | Standard HTTP tools

Rule of thumb: Use Service Bindings for internal, high-frequency calls. Use HTTP for external integrations and cross-account communication.

Service Binding Configuration

wrangler.toml
# Define service bindings in the calling worker

[[services]]
binding = "AUTH"
service = "auth-worker"

[[services]]
binding = "NOTIFY"
service = "notification-worker"

[[services]]
binding = "METRICS"
service = "metrics-worker"

Calling via Service Binding

api-gateway.ts
export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Service binding call - no network overhead
    const authResponse = await env.AUTH.fetch(
      new Request('https://auth/verify', {
        method: 'POST',
        headers: {
          // Fall back to an empty string: Headers init rejects null
          'Authorization': request.headers.get('Authorization') ?? ''
        }
      })
    );

    if (!authResponse.ok) {
      return new Response('Unauthorized', { status: 401 });
    }

    // Continue with the authenticated request...
  }
}
Performance Note
Service bindings share the same isolate when possible. The URL passed to env.SERVICE.fetch() is ignored for routing; it exists only for logging. The binding handles routing automatically.

Building a Request Router

The gateway worker needs to route requests to appropriate services. Here's a pattern that scales:

router.ts
type RouteHandler = (req: Request, env: Env) => Promise<Response>;

const routes: Record<string, RouteHandler> = {
  '/api/leads': (req, env) => env.LEADS.fetch(req),
  '/api/valuation': (req, env) => env.VALUATION.fetch(req),
  '/api/offers': (req, env) => env.OFFERS.fetch(req),
  '/api/notify': (req, env) => env.NOTIFY.fetch(req),
};

export async function route(request: Request, env: Env): Promise<Response> {
  const url = new URL(request.url);

  // Find matching route
  for (const [pattern, handler] of Object.entries(routes)) {
    if (url.pathname.startsWith(pattern)) {
      return handler(request, env);
    }
  }

  return new Response('Not Found', { status: 404 });
}

Error Handling & Retry Logic

Distributed systems fail. The question is how gracefully. Here's a retry wrapper with exponential backoff:

retry.ts
interface RetryOptions {
  maxAttempts: number;
  baseDelay: number;
  maxDelay: number;
}

const defaults: RetryOptions = { maxAttempts: 3, baseDelay: 100, maxDelay: 2000 };

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

export async function withRetry<T>(
  fn: () => Promise<T>,
  options: Partial<RetryOptions> = {}
): Promise<T> {
  const opts = { ...defaults, ...options };
  let lastError: Error | undefined;

  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;
      if (attempt === opts.maxAttempts) break;

      // Exponential backoff with jitter
      const delay = Math.min(
        opts.baseDelay * Math.pow(2, attempt - 1),
        opts.maxDelay
      );
      const jitter = delay * 0.1 * Math.random();
      await sleep(delay + jitter);
    }
  }

  throw lastError;
}
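One gotcha worth flagging: fetch() resolves successfully even when the upstream returns a 5xx, so a retry wrapper like the one above only sees network-level failures. A small adapter can convert retryable statuses into exceptions — a sketch, where fetchOrThrow is an illustrative name, not a platform API:

```typescript
// Sketch: fetch() resolves even on HTTP 5xx, so convert server errors
// into exceptions that a retry wrapper can act on. fetchOrThrow is an
// illustrative helper name, not part of the Workers API.
async function fetchOrThrow(doFetch: () => Promise<Response>): Promise<Response> {
  const res = await doFetch();
  // Only retry 5xx responses; a 4xx indicates a caller bug and will
  // fail the same way on every attempt.
  if (res.status >= 500) {
    throw new Error(`Upstream returned ${res.status}`);
  }
  return res;
}
```

Composed with the retry wrapper, a call site looks like `withRetry(() => fetchOrThrow(() => env.NOTIFY.fetch(req)))`.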

Circuit Breaker Pattern

For services that might be down, implement a circuit breaker to fail fast:

circuit-breaker.ts
interface CircuitState {
  failures: number;
  lastFailure: number;
  state: 'closed' | 'open' | 'half-open';
}

export class CircuitBreaker {
  private state: CircuitState = { failures: 0, lastFailure: 0, state: 'closed' };
  private threshold = 5;
  private timeout = 30000; // 30 seconds

  async call<T>(fn: () => Promise<T>, fallback?: () => T): Promise<T> {
    // Check if circuit should stay open
    if (this.state.state === 'open') {
      if (Date.now() - this.state.lastFailure < this.timeout) {
        if (fallback) return fallback();
        throw new Error('Circuit breaker is open');
      }
      this.state.state = 'half-open';
    }

    try {
      const result = await fn();
      this.reset();
      return result;
    } catch (error) {
      this.recordFailure();
      if (fallback) return fallback();
      throw error;
    }
  }

  private recordFailure() {
    this.state.failures++;
    this.state.lastFailure = Date.now();
    if (this.state.failures >= this.threshold) {
      this.state.state = 'open';
    }
  }

  private reset() {
    this.state = { failures: 0, lastFailure: 0, state: 'closed' };
  }
}
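In a mesh you typically want one breaker per upstream service, so a failing notify worker doesn't trip calls to the leads worker. A sketch of a per-service registry — MiniBreaker inlines a stripped-down stand-in for the class above so the example stands alone, and all names here are illustrative:

```typescript
// Sketch: one breaker instance per upstream service. MiniBreaker is a
// stripped-down stand-in for the CircuitBreaker class above; in the
// mesh you would reuse that class. Names are illustrative.
class MiniBreaker {
  private failures = 0;
  private open = false;
  constructor(private threshold = 5) {}

  async call<T>(fn: () => Promise<T>, fallback: () => T): Promise<T> {
    if (this.open) return fallback(); // fail fast while open
    try {
      const result = await fn();
      this.failures = 0;
      return result;
    } catch {
      if (++this.failures >= this.threshold) this.open = true;
      return fallback();
    }
  }
}

const breakers = new Map<string, MiniBreaker>();

// Lazily create and reuse one breaker per service name.
function breakerFor(service: string): MiniBreaker {
  let breaker = breakers.get(service);
  if (!breaker) {
    breaker = new MiniBreaker();
    breakers.set(service, breaker);
  }
  return breaker;
}
```

A call site might look like `breakerFor('notify').call(() => env.NOTIFY.fetch(req), () => new Response('queued', { status: 202 }))`. Note that this state lives in the isolate, so it resets whenever the isolate is evicted — acceptable for fail-fast behavior, but not a durable signal.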

Observability Layer

Without observability, distributed debugging is impossible. Every request through the mesh needs tracing:

tracing.ts
interface TraceContext {
  traceId: string;
  spanId: string;
  parentSpanId?: string;
  service: string;
  startTime: number;
}

export function createTrace(service: string, parentCtx?: TraceContext): TraceContext {
  return {
    traceId: parentCtx?.traceId || crypto.randomUUID(),
    spanId: crypto.randomUUID().slice(0, 8),
    parentSpanId: parentCtx?.spanId,
    service,
    startTime: Date.now()
  };
}

export function injectTraceHeaders(headers: Headers, ctx: TraceContext) {
  headers.set('x-trace-id', ctx.traceId);
  headers.set('x-span-id', ctx.spanId);
  if (ctx.parentSpanId) {
    headers.set('x-parent-span-id', ctx.parentSpanId);
  }
}

export async function logTrace(ctx: TraceContext, env: Env) {
  const duration = Date.now() - ctx.startTime;

  // Fire and forget to logging worker
  env.LOGGER.fetch(new Request('https://log/trace', {
    method: 'POST',
    body: JSON.stringify({ ...ctx, duration })
  }));
}

Production Metrics

After running this architecture in production:

  • 47ms P95 latency
  • 2.3M requests/month
  • $17 monthly cost

Common Pitfalls

  • Circular dependencies. Worker A calls B, B calls A. Use dependency injection and clear service boundaries.
  • Missing timeouts. Always set timeouts on service calls. Default to 10 seconds max.
  • No fallbacks. Every external call should have a degraded response path.
  • Over-fetching context. Don't pass the entire request through the mesh. Extract what's needed.
  • Ignoring cold starts. Even with bindings, first calls may be slower. Warm critical paths.
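For the "missing timeouts" pitfall above, one option is passing `AbortSignal.timeout(ms)` as the fetch signal; when you can't thread a signal through, a small racing wrapper works. A sketch, where withTimeout is our helper name, not a platform API:

```typescript
// Sketch for the "missing timeouts" pitfall: race a service call
// against a deadline. withTimeout is an illustrative helper name.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`Timed out after ${ms}ms`)),
      ms
    );
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}
```

A call site might read `await withTimeout(env.AUTH.fetch(req), 10_000)`. Note this abandons the slow call rather than canceling it; if you need true cancellation, prefer the AbortSignal route.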

Implementation Checklist

  • Service bindings configured for internal communication
  • HTTP calls only for external services
  • Retry logic with exponential backoff
  • Circuit breakers on critical paths
  • Trace context propagation across all calls
  • Centralized logging worker
  • Timeouts on every service call
  • Fallback responses defined

A service mesh isn't a product you install. It's a pattern you implement. At the edge, you build it yourself, but you also control it completely.


Next: Choosing the Right Storage

KV, D1, R2, Durable Objects: when to use each.

→ Read Storage Guide