DevOps Observability

Monitoring & Observability
at the Edge

Structured logging, real-time metrics, alerting strategies, and debugging patterns for 28 Cloudflare Workers in production.

📖 13 min read · January 24, 2026

You can't fix what you can't see. With 28 Workers processing requests across 300+ edge locations, observability isn't optional: it's the difference between "we noticed a 10% revenue drop" and "we fixed the bug before users noticed."

Here's the monitoring stack keeping our edge infrastructure visible and debuggable.

28 Workers Monitored · 2.1M Requests/Month · 23ms Avg Latency · 99.97% Success Rate

Pattern 1: Structured Logging

Every log entry follows the same structure. No exceptions:

logger.ts
```typescript
// ENV is injected at build time (e.g. via Wrangler vars or an esbuild define)
declare const ENV: string;

interface LogEntry {
  timestamp: string;
  level: 'debug' | 'info' | 'warn' | 'error';
  requestId: string;
  worker: string;
  environment: string;
  message: string;
  data?: Record<string, any>;
  error?: {
    name: string;
    message: string;
    stack?: string;
  };
  duration?: number;
  cf?: {
    colo: string;
    country: string;
  };
}

class Logger {
  constructor(
    private worker: string,
    private requestId: string,
    private cf?: IncomingRequestCfProperties
  ) {}

  info(message: string, data?: Record<string, any>) {
    this.log('info', message, data);
  }

  error(message: string, error: Error, data?: Record<string, any>) {
    this.log('error', message, {
      ...data,
      error: {
        name: error.name,
        message: error.message,
        stack: error.stack
      }
    });
  }

  private log(level: LogEntry['level'], message: string, data?: any) {
    const entry: LogEntry = {
      timestamp: new Date().toISOString(),
      level,
      requestId: this.requestId,
      worker: this.worker,
      environment: ENV,
      message,
      data,
      cf: this.cf
        ? { colo: this.cf.colo, country: this.cf.country }
        : undefined
    };
    // One JSON object per line: easy to parse with `wrangler tail` or a log drain
    console.log(JSON.stringify(entry));
  }
}
```

Pattern 2: Request Tracing

One request ID flows through all services:

tracing.ts
```typescript
type Handler = (
  request: Request,
  env: Env,
  ctx: ExecutionContext
) => Promise<Response>;

export function withTracing(handler: Handler): Handler {
  return async (request, env, ctx) => {
    // Reuse an upstream request ID if present; otherwise mint one
    const requestId = request.headers.get('X-Request-ID') || crypto.randomUUID();
    const startTime = Date.now();
    const logger = new Logger('api-gateway', requestId, request.cf);

    logger.info('Request started', {
      method: request.method,
      url: request.url,
      userAgent: request.headers.get('User-Agent')
    });

    try {
      const response = await handler(request, env, ctx);

      logger.info('Request completed', {
        status: response.status,
        duration: Date.now() - startTime
      });

      // Add tracing headers to the response. Note: spreading a Response
      // ({ ...response }) does NOT copy status/statusText, so set them explicitly.
      const headers = new Headers(response.headers);
      headers.set('X-Request-ID', requestId);
      headers.set('X-Response-Time', `${Date.now() - startTime}ms`);

      return new Response(response.body, {
        status: response.status,
        statusText: response.statusText,
        headers
      });
    } catch (error) {
      logger.error('Request failed', error as Error, {
        duration: Date.now() - startTime
      });
      throw error;
    }
  };
}
```

Pattern 3: Real-Time Metrics

Push metrics to Analytics Engine or external services:

metrics.ts
```typescript
export function trackMetrics(
  request: Request,
  response: Response,
  duration: number,
  ctx: ExecutionContext,
  env: Env
) {
  const datapoint = {
    // Dimensions (groupable)
    worker: 'api-gateway',
    method: request.method,
    path: new URL(request.url).pathname,
    status: response.status.toString(),
    statusGroup: Math.floor(response.status / 100) + 'xx',
    colo: request.cf?.colo || 'unknown',
    country: request.cf?.country || 'unknown',
    // Metrics (aggregatable)
    count: 1,
    duration,
    success: response.ok ? 1 : 0,
    error: response.ok ? 0 : 1
  };

  // Fire and forget: writeDataPoint is non-blocking and returns void,
  // so no await (or ctx.waitUntil) is needed
  env.ANALYTICS.writeDataPoint({
    blobs: [
      datapoint.worker,
      datapoint.method,
      datapoint.path,
      datapoint.statusGroup,
      datapoint.colo,
      datapoint.country
    ],
    doubles: [datapoint.duration, datapoint.count, datapoint.success, datapoint.error],
    indexes: [datapoint.status]
  });
}
```
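To read these datapoints back, Analytics Engine exposes a SQL-over-HTTP API. A hedged sketch of querying p95 latency per path is below; the dataset name `worker_metrics`, account ID, and token are placeholders, and the column names follow the positional mapping used by `trackMetrics` (blob3 = path, double1 = duration):

```typescript
// Sketch: query p95 latency per path from the Analytics Engine SQL API.
// ACCOUNT_ID / API_TOKEN / 'worker_metrics' are illustrative placeholders.
const SQL = `
  SELECT
    blob3 AS path,
    quantileWeighted(0.95)(double1, _sample_interval) AS p95_ms
  FROM worker_metrics
  WHERE timestamp > NOW() - INTERVAL '1' HOUR
  GROUP BY path
  ORDER BY p95_ms DESC
`;

async function queryP95(accountId: string, apiToken: string): Promise<unknown> {
  const res = await fetch(
    `https://api.cloudflare.com/client/v4/accounts/${accountId}/analytics_engine/sql`,
    {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiToken}` },
      body: SQL
    }
  );
  return res.json();
}
```

The `_sample_interval` weighting matters because Analytics Engine samples data at high volumes; weighted quantiles keep percentiles honest under sampling.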

Pattern 4: Health Checks with Cron

health-monitor.ts
```typescript
const ENDPOINTS = [
  { name: 'API Gateway', url: 'https://api.proptechusa.ai/health' },
  { name: 'Lead Processor', url: 'https://leads.proptechusa.ai/health' },
  { name: 'AI Chatbot', url: 'https://chat.proptechusa.ai/health' },
];

export default {
  async scheduled(event: ScheduledEvent, env: Env, ctx: ExecutionContext) {
    const results = await Promise.all(
      ENDPOINTS.map(async (endpoint) => {
        const start = Date.now();
        try {
          const res = await fetch(endpoint.url, {
            signal: AbortSignal.timeout(5000) // fail fast instead of hanging
          });
          return {
            name: endpoint.name,
            healthy: res.ok,
            latency: Date.now() - start,
            status: res.status
          };
        } catch (e) {
          // Caught values are `unknown` in TypeScript; narrow before reading .message
          return {
            name: endpoint.name,
            healthy: false,
            latency: Date.now() - start,
            error: e instanceof Error ? e.message : String(e)
          };
        }
      })
    );

    const unhealthy = results.filter(r => !r.healthy);
    if (unhealthy.length > 0) {
      await sendSlackAlert({
        text: `🚨 Health Check Failed`,
        blocks: unhealthy.map(r => ({
          type: 'section',
          text: {
            type: 'mrkdwn',
            text: `*${r.name}*: ${r.error || r.status}`
          }
        }))
      }, env);
    }
  }
};
```

Pattern 5: Error Alerting

alerting.ts
```typescript
async function sendSlackAlert(
  error: Error,
  context: {
    requestId: string;
    worker: string;
    url: string;
  },
  env: Env
) {
  await fetch(env.SLACK_WEBHOOK_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      blocks: [
        {
          type: 'header',
          text: { type: 'plain_text', text: '🚨 Production Error' }
        },
        {
          type: 'section',
          fields: [
            { type: 'mrkdwn', text: `*Worker:*\n${context.worker}` },
            { type: 'mrkdwn', text: `*Request ID:*\n\`${context.requestId}\`` },
            { type: 'mrkdwn', text: `*Error:*\n${error.message}` },
            { type: 'mrkdwn', text: `*URL:*\n${context.url}` }
          ]
        },
        {
          type: 'section',
          text: {
            type: 'mrkdwn',
            text: `\`\`\`${error.stack?.slice(0, 500)}\`\`\``
          }
        }
      ]
    })
  });
}
```
Alert Fatigue Prevention
Deduplicate alerts by error signature. Send one alert for the first occurrence, then aggregate. "This error occurred 47 times in the last 5 minutes" is more useful than 47 individual alerts.
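The deduplication described above can be sketched with a map keyed by error signature. Names like `shouldAlert` and the 5-minute window are illustrative, not from our production code; and since Worker isolates are ephemeral, a production version would likely back this with KV or a Durable Object rather than module state:

```typescript
// Sketch: suppress repeat alerts for the same error signature within a window.
// A module-level Map only dedupes per isolate; use KV/Durable Objects in production.
const WINDOW_MS = 5 * 60 * 1000;
const seen = new Map<string, { firstSeen: number; count: number }>();

// Hypothetical signature: error name + message + top stack frame
function errorSignature(error: Error): string {
  const topFrame = error.stack?.split('\n')[1]?.trim() ?? '';
  return `${error.name}:${error.message}:${topFrame}`;
}

// Returns true if an alert should fire now; otherwise just counts the occurrence
function shouldAlert(error: Error, now = Date.now()): boolean {
  const sig = errorSignature(error);
  const entry = seen.get(sig);
  if (!entry || now - entry.firstSeen > WINDOW_MS) {
    seen.set(sig, { firstSeen: now, count: 1 });
    return true; // first occurrence in this window: send the alert
  }
  entry.count++; // aggregate; report entry.count in a later summary alert
  return false;
}
```

A periodic summary ("this error occurred N times in the last 5 minutes") can then be emitted from the counts when the window rolls over.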

Observability Checklist

  • Structured JSON logs with consistent schema
  • Request IDs propagated through all services
  • Latency tracking at p50, p95, p99 percentiles
  • Error rate monitoring with alerting thresholds
  • Health checks running every minute via Cron
  • Slack alerts for critical errors and outages
  • Dashboard for real-time metrics visualization
  • Log retention for debugging historical issues
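For the latency-percentile item on the checklist, a minimal sketch of computing p50/p95/p99 from a batch of recorded durations using the nearest-rank method (in practice Analytics Engine or your dashboard tool computes these for you; the sample values here are made up):

```typescript
// Nearest-rank percentile over a sample of request durations (ms).
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank: take the ceil(p/100 * N)-th smallest value (1-indexed)
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const durations = [12, 18, 23, 25, 31, 47, 52, 88, 120, 410];
const summary = {
  p50: percentile(durations, 50),
  p95: percentile(durations, 95),
  p99: percentile(durations, 99)
};
```

The gap between p50 and p99 is the interesting signal: averages hide tail latency, which is exactly what edge users feel.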

Observability isn't about collecting data; it's about answering questions. "Why did that request fail?" should take seconds to answer, not hours of digging through logs.

Related Articles

28 Cloudflare Workers Architecture
CI/CD for Cloudflare Workers
Designing for Model Failure

Need Observability Setup?

We build monitoring systems that catch issues before users do.

→ Get Started