
Prompt Engineering for
Production Systems

Real prompts from production, not theory. System prompts, few-shot examples, structured outputs, and versioning strategies.

📖 14 min read · January 24, 2026

Most prompt engineering content is useless. "Be specific" and "provide context" don't help when you need 99.9% reliability at 2 million requests per month. This is what actually works in production.

These are real prompts running in production systems: the patterns that survived A/B testing, edge cases, and scale.

Pattern 1: The Production System Prompt

A good system prompt does three things: defines the role, sets constraints, and specifies output format. Here's our lead qualification prompt:

System Prompt: Lead Qualifier
You are a lead qualification assistant for a real estate investment company.

ROLE: Analyze incoming leads and extract structured data for our CRM.

CONSTRAINTS:
- Never invent information not present in the input
- If a field is unclear, use null instead of guessing
- Phone numbers must be 10 digits (US) or marked invalid
- Dates should be ISO 8601 format

OUTPUT: Always respond with valid JSON matching this schema:
{
  "name": string | null,
  "phone": string | null,
  "email": string | null,
  "property_address": string | null,
  "motivation": "high" | "medium" | "low" | "unknown",
  "timeline": string | null,
  "confidence": number (0-1)
}

If input is completely unintelligible, return {"error": "unparseable", "raw": "<original input>"}.

Pattern 2: Few-Shot Examples

Examples are worth a thousand words of instructions. Include 2-3 examples covering common cases and edge cases:

few-shot-examples.ts
const fewShotExamples = [
  {
    role: "user",
    content: "hi im john smith at 123 main st need to sell fast my number is 5551234567"
  },
  {
    role: "assistant",
    content: JSON.stringify({
      name: "John Smith",
      phone: "5551234567",
      email: null,
      property_address: "123 Main St",
      motivation: "high", // "sell fast" indicates urgency
      timeline: "immediate",
      confidence: 0.85
    })
  },
  {
    role: "user",
    content: "asdf keyboard smash 12345"
  },
  {
    role: "assistant",
    content: JSON.stringify({
      error: "unparseable",
      raw: "asdf keyboard smash 12345"
    })
  }
];
Critical: Always Include Edge Cases
Your few-shot examples should include at least one "bad input" example. Without it, the model will try to extract data from garbage, leading to hallucinated outputs that pollute your database.

Pattern 3: Structured Output Enforcement

JSON mode isn't enough. Validate and coerce outputs to your schema:

output-validation.ts
import Anthropic from '@anthropic-ai/sdk';
import { z } from 'zod';

const anthropic = new Anthropic();

const LeadSchema = z.object({
  name: z.string().nullable(),
  phone: z.string().regex(/^\d{10}$/).nullable(),
  email: z.string().email().nullable(),
  property_address: z.string().nullable(),
  motivation: z.enum(['high', 'medium', 'low', 'unknown']),
  timeline: z.string().nullable(),
  confidence: z.number().min(0).max(1)
});

type Lead = z.infer<typeof LeadSchema>;

async function qualifyLead(input: string): Promise<Lead> {
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 500,
    system: SYSTEM_PROMPT,
    messages: [...fewShotExamples, { role: 'user', content: input }]
  });

  const block = response.content[0];
  const text = block.type === 'text' ? block.text : '';

  try {
    const parsed = JSON.parse(text);
    return LeadSchema.parse(parsed); // Throws if invalid
  } catch (e) {
    // Log failure for prompt iteration
    await logPromptFailure(input, text, e);
    throw new Error('Output validation failed');
  }
}
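The `logPromptFailure` call above carries most of the iteration loop: a failure you didn't record is a failure you can't fix. A minimal sketch of such a helper (the `buildFailureRecord` shape and the version string are illustrative, not from the original code):

```typescript
// Hypothetical implementation of the logPromptFailure helper referenced in
// qualifyLead. It captures enough context to replay and debug the failure.
interface PromptFailure {
  promptVersion: string;
  input: string;
  rawOutput: string;
  error: string;
  timestamp: string;
}

function buildFailureRecord(
  promptVersion: string,
  input: string,
  rawOutput: string,
  e: unknown
): PromptFailure {
  return {
    promptVersion,
    input,
    rawOutput,
    error: e instanceof Error ? e.message : String(e),
    timestamp: new Date().toISOString()
  };
}

async function logPromptFailure(input: string, rawOutput: string, e: unknown) {
  const record = buildFailureRecord('2.3', input, rawOutput, e);
  // In production this would go to a durable sink (DB, queue, JSONL file);
  // console.error is the minimal stand-in here.
  console.error('prompt_failure', JSON.stringify(record));
}
```

Keeping the raw model output in the record matters: most validation failures turn out to be near-misses (markdown fences around JSON, a stray trailing comma) that suggest a specific prompt fix.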

Pattern 4: Token Optimization

Tokens cost money. At scale, every word matters:

Technique                 | Token Savings | Trade-off
Abbreviate instructions   | 20-30%        | Slight accuracy drop
Remove verbose examples   | 40-50%        | Weaker edge case handling
Use shorter field names   | 10-15%        | Reduced readability
Compress system prompt    | 25-35%        | Harder maintainability
Before: 847 tokens
You are a helpful assistant that analyzes real estate leads and extracts relevant information from them. Please carefully read the input and identify the following fields if they are present...
After: 312 tokens
Extract lead data. Output JSON only. Fields: name, phone (10 digits), email, address, motivation (high/med/low/unknown), timeline, confidence (0-1). Unknown = null. Bad input = {"error":"unparseable"}.
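You can't optimize what you don't measure. A rough sketch of a budget check, assuming the common ~4-characters-per-token heuristic for English text; for real numbers you'd use your provider's token-counting endpoint rather than this approximation:

```typescript
// Heuristic token estimate: ~4 characters per token for English prose.
// A stand-in only; exact counts come from the model provider's tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Percentage saved by a compressed prompt relative to the original.
function savingsPercent(before: string, after: string): number {
  const b = estimateTokens(before);
  const a = estimateTokens(after);
  return Math.round(((b - a) / b) * 100);
}
```

Wiring `estimateTokens` into a CI check that fails when a prompt edit blows past its token budget catches silent cost regressions before they ship.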

Pattern 5: Prompt Versioning

Prompts are code. Version them like code:

prompts/lead-qualifier/v2.3.ts
export const LEAD_QUALIFIER_PROMPT = {
  version: '2.3',
  model: 'claude-sonnet-4-20250514',
  system: `Extract lead data. Output JSON only...`,
  examples: [...],

  // Track changes
  changelog: [
    '2.3: Added confidence score',
    '2.2: Fixed phone validation edge case',
    '2.1: Reduced tokens by 40%',
    '2.0: Complete rewrite for Claude 3'
  ],

  // A/B test config
  testConfig: {
    enabled: true,
    variants: ['v2.2', 'v2.3'],
    metric: 'conversion_rate'
  }
};
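The `testConfig` above needs an assignment rule. One sketch, under the assumption that each lead has a stable ID: hash the ID so a lead always sees the same prompt variant, even across retries (the hash and function name here are illustrative, not from the original):

```typescript
// Deterministic A/B variant assignment: the same lead ID always maps to the
// same variant, so retries and re-processing don't cross-contaminate buckets.
function pickVariant(leadId: string, variants: string[]): string {
  // Simple 32-bit polynomial hash (Java-style); cryptographic strength is
  // unnecessary here, only stability and rough uniformity.
  let hash = 0;
  for (const ch of leadId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return variants[hash % variants.length];
}
```

Deterministic assignment also makes the experiment auditable: given the ID and variant list, you can reconstruct which prompt any historical lead saw.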

Pattern 6: Graceful Degradation

When the model fails, have a fallback:

fallback-chain.ts
async function qualifyWithFallback(input: string) {
  // Try AI extraction first
  try {
    return await qualifyLead(input);
  } catch (e) {
    console.log('AI extraction failed, trying regex');
  }

  // Fallback to regex extraction
  const phone = input.match(/\d{10}/)?.[0] || null;
  const email = input.match(/[\w.-]+@[\w.-]+/)?.[0] || null;

  return {
    name: null,
    phone,
    email,
    property_address: null,
    motivation: 'unknown',
    timeline: null,
    confidence: 0.3, // Low confidence for regex
    extraction_method: 'regex_fallback'
  };
}
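Many extraction failures are transient (rate limits, truncated output), so it's often worth one retry with backoff before dropping to the regex fallback. A generic sketch of such a wrapper (the `withRetry` helper is an assumption, not part of the original code):

```typescript
// Retry an async operation with exponential backoff before giving up.
// E.g. wrap the AI call: withRetry(() => qualifyLead(input)) and only fall
// back to regex extraction once the retries are exhausted.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 2,
  baseMs = 500
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (e) {
      lastErr = e;
      if (i < attempts - 1) {
        // 500ms, 1000ms, 2000ms, ... between attempts
        await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```

Keep the attempt count low: at 2 million requests a month, an aggressive retry policy multiplies both latency and cost during provider outages.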

Production Checklist

  • System prompt defines role, constraints, and output format
  • Few-shot examples cover common cases AND edge cases
  • Zod or similar validates all AI outputs
  • Prompts are versioned with changelogs
  • Token usage is monitored and optimized
  • Fallback extraction exists for failures
  • Failed extractions are logged for iteration
  • A/B testing infrastructure for prompt variants

The difference between a demo and production is error handling. Your prompt will fail. Plan for it.

