Structured Outputs: Making LLMs Reliable for Production

The biggest challenge with LLMs in production isn't capability—it's consistency. A model that gives brilliant, insightful responses 95% of the time but returns unparseable garbage 5% of the time is unusable for production systems. Structured outputs solve this problem by constraining model responses to predictable formats.

This isn't about limiting what LLMs can do. It's about making them reliable enough to build real systems around. When your application expects JSON with specific fields, you need JSON with those specific fields—every single time.

JSON Mode vs Function Calling vs Grammar Constraints

Different providers offer different mechanisms for structured outputs, each with distinct characteristics.

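JSON Mode tells the model to emit syntactically valid JSON but does not constrain the schema: field names, types, and nesting can vary from call to call. It is useful when you need JSON but can tolerate variable structures. With OpenAI's JSON mode, the messages must also ask for JSON explicitly (the API rejects requests whose context never mentions "JSON"), and a response cut off by the token limit can still be incomplete.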

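import OpenAI from "openai";
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: `${prompt}\n\nRespond with a JSON object.` }],
  response_format: { type: "json_object" }
});
// The content is a string: syntactically valid JSON, but with no guaranteed shape
const data: unknown = JSON.parse(response.choices[0].message.content ?? "{}");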
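Because JSON mode only guarantees syntax, the application still has to check the shape itself. A minimal sketch, assuming the Contact shape (name required, email optional) that the schema section defines later; parseContactJson is a hypothetical helper, not part of any SDK:

```typescript
// Shape the application expects; JSON mode does not enforce it.
interface Contact {
  name: string;
  email?: string;
}

// Parse raw model output and verify the shape by hand.
// Returns null for invalid JSON or for valid JSON with the wrong structure.
function parseContactJson(raw: string): Contact | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return null; // JSON, but not an object
  }
  const { name, email } = data as Record<string, unknown>;
  if (typeof name !== "string") return null; // required field missing or mistyped
  if (email === undefined) return { name };
  if (typeof email !== "string") return null; // optional field present but mistyped
  return { name, email };
}
```

A schema library such as Zod replaces this hand-written check with a declarative one, which is the direction the rest of this article takes.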

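Function Calling attaches a JSON Schema to a tool definition, and the model fills in arguments that follow it. This is the sweet spot for most applications. On OpenAI, the schema is only strictly enforced when the function sets strict: true (which in turn requires additionalProperties: false and every property listed in required); without it, the schema guides the model but does not guarantee conformance: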

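const tools = [{
  type: "function",
  function: {
    name: "extract_contact",
    // A short description helps the model decide when and how to call the tool
    description: "Extract contact details from the user's message",
    parameters: {
      type: "object",
      properties: {
        name: { type: "string" },
        email: { type: "string", format: "email" }
      },
      required: ["name"]
    }
  }
}];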
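Whether or not the provider enforces the schema, the arguments come back as a JSON string and are worth checking against the same parameters object. A minimal sketch of that check, assuming only the small subset of JSON Schema used above (an object with string properties and a required list); checkArgs is a hypothetical helper, and a full validator such as Ajv would be the realistic choice in production:

```typescript
// Tiny subset of JSON Schema: an object whose properties are all strings.
interface ObjectSchema {
  type: "object";
  properties: Record<string, { type: "string"; format?: string }>;
  required?: string[];
}

// Returns a list of problems; an empty list means the arguments passed.
// Note: "format" is not checked, and unknown fields are treated as errors
// (the behavior of additionalProperties: false).
function checkArgs(argsJson: string, schema: ObjectSchema): string[] {
  let args: unknown;
  try {
    args = JSON.parse(argsJson);
  } catch {
    return ["arguments are not valid JSON"];
  }
  if (typeof args !== "object" || args === null || Array.isArray(args)) {
    return ["arguments must be a JSON object"];
  }
  const obj = args as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of schema.required ?? []) {
    if (!(key in obj)) errors.push(`missing required field "${key}"`);
  }
  for (const [key, value] of Object.entries(obj)) {
    const prop = schema.properties[key];
    if (prop === undefined) {
      errors.push(`unexpected field "${key}"`);
    } else if (typeof value !== prop.type) {
      errors.push(`field "${key}" should be a ${prop.type}`);
    }
  }
  return errors;
}

// The parameters object from the extract_contact tool.
const contactParams: ObjectSchema = {
  type: "object",
  properties: { name: { type: "string" }, email: { type: "string", format: "email" } },
  required: ["name"]
};
```

Returning a list of messages rather than a boolean makes the failures easy to log, or to send back to the model on a retry.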

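Grammar Constraints (llama.cpp, Outlines) provide the strongest guarantees by constraining token generation at the decoding level: at each step, tokens that would violate the grammar are masked out before one is chosen, so the output is syntactically valid by construction. llama.cpp accepts grammars in its GBNF format; Outlines can constrain generation with regular expressions, JSON Schemas, and context-free grammars. The guarantee covers structure, not correctness: a well-formed value can still be wrong, and generation can still stop at a token limit.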
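A toy sketch makes the decoding-level idea concrete without a real model. Here a hypothetical scoreNext function stands in for the model's next-token scores, the vocabulary is a short list of string fragments, and the "grammar" is just a set of allowed values (an enum). Tokens that cannot extend the output toward some allowed value are skipped, and the best remaining token is chosen greedily. Real engines compile much richer grammars over subword vocabularies; this only shows the masking step:

```typescript
// Stand-in for a model: one score per vocabulary token, given the text so far.
type ScoreFn = (prefix: string, vocab: string[]) => number[];

// Greedy decoding restricted to the strings in `allowed`.
// Assumes the vocabulary contains no empty tokens.
function constrainedDecode(
  allowed: string[],
  vocab: string[],
  scoreNext: ScoreFn,
  maxSteps = 50
): string {
  let out = "";
  for (let step = 0; step < maxSteps; step++) {
    if (allowed.includes(out)) return out; // stop at the first complete allowed value
    const scores = scoreNext(out, vocab);
    let best = -1;
    for (let i = 0; i < vocab.length; i++) {
      // Mask: a token is viable only if some allowed value starts with the extended output.
      const viable = allowed.some((a) => a.startsWith(out + vocab[i]));
      if (viable && (best === -1 || scores[i] > scores[best])) best = i;
    }
    if (best === -1) throw new Error("no valid continuation");
    out += vocab[best];
  }
  throw new Error("step limit reached");
}
```

With allowed values high, low, and medium, and a scorer whose favorite fragments spell "urgent", the fragment "urg" is never viable, so decoding settles on "hi" and then "gh" and returns "high".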

Defining Schemas: Zod, JSON Schema, and Pydantic

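Your schema definition should be the single source of truth for both generation and validation. In TypeScript, Zod is a popular choice; in Python, Pydantic models fill the same role. Both can be converted to JSON Schema, the format providers accept for tool parameters: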

import { z } from "zod";

const ContactSchema = z.object({
  name: z.string().describe("Full name of the contact"),
  email: z.string().email().optional(),
  phone: z.string().optional()
});

type Contact = z.infer<typeof ContactSchema>;

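The .describe() calls matter: when the Zod schema is converted to JSON Schema (for example with the zod-to-json-schema package, or z.toJSONSchema() in Zod 4), each description becomes a description field in the schema sent to the model, telling it what the field represents.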
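Zod plus a JSON Schema converter handles this for you. To show what "single source of truth" means without tying the example to a particular Zod version, here is a dependency-free sketch: one hypothetical Spec object (string fields only) produces both the JSON Schema sent to the provider, descriptions included, and the check applied to responses, so the two cannot drift apart:

```typescript
// Hypothetical field spec: every field is a string, optionally omitted.
interface FieldSpec {
  description: string;
  optional?: boolean;
}
type Spec = Record<string, FieldSpec>;

// Generation side: JSON Schema for the provider, with descriptions carried through.
function toJsonSchema(spec: Spec) {
  return {
    type: "object",
    properties: Object.fromEntries(
      Object.entries(spec).map(
        ([key, field]) => [key, { type: "string", description: field.description }] as const
      )
    ),
    required: Object.keys(spec).filter((key) => !spec[key].optional),
    additionalProperties: false
  };
}

// Validation side: check a parsed response against the same spec.
function matchesSpec(spec: Spec, value: unknown): boolean {
  if (typeof value !== "object" || value === null || Array.isArray(value)) return false;
  const obj = value as Record<string, unknown>;
  for (const [key, field] of Object.entries(spec)) {
    const v = obj[key];
    if (v === undefined) {
      if (!field.optional) return false; // required field missing
    } else if (typeof v !== "string") {
      return false; // wrong type
    }
  }
  const known = new Set(Object.keys(spec));
  return Object.keys(obj).every((key) => known.has(key)); // no extra fields
}

// Mirrors ContactSchema from above.
const contactSpec: Spec = {
  name: { description: "Full name of the contact" },
  email: { description: "Email address", optional: true },
  phone: { description: "Phone number", optional: true }
};
```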

Error Handling: When Structured Outputs Fail

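Even with structured outputs, failures happen: network errors and rate limits, refusals, responses truncated at the token limit, and outputs that parse but fail validation. Robust error handling is essential: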

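import { z } from "zod";

// Hypothetical provider wrapper (not a real SDK call): sends the prompt and schema, returns parsed JSON
declare function callLLM(prompt: string, schema: z.ZodType): Promise<unknown>;

const delay = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function extractStructured<T>(
  prompt: string,
  schema: z.ZodType<T>,
  maxRetries = 3
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const response = await callLLM(prompt, schema);
      // Validate at the edge: throws a ZodError if the shape is wrong
      return schema.parse(response);
    } catch (error) {
      lastError = error;
      console.warn(`attempt ${attempt}/${maxRetries} failed`, error); // log for debugging
      // Exponential backoff between attempts: 200 ms, 400 ms, ...
      if (attempt < maxRetries) await delay(2 ** attempt * 100);
    }
  }
  // Only reached when every attempt failed
  throw lastError;
}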

Key patterns: retry with backoff, validate at the edge, and log failures for debugging.
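One variation on extractStructured, sketched here as an assumption rather than as the method above: when a response fails validation, append the error to the prompt so the next attempt has something concrete to correct, instead of resending the identical request. generate and validate are hypothetical injected functions, which also makes the loop easy to test with a fake model:

```typescript
// Result of validating a raw model response.
type Validation<T> = { ok: true; value: T } | { ok: false; error: string };

// Retry loop that feeds the validation error back to the model on the next attempt.
// `generate` is a hypothetical stand-in for any LLM call that returns raw text.
async function extractWithRepair<T>(
  prompt: string,
  generate: (prompt: string) => Promise<string>,
  validate: (raw: string) => Validation<T>,
  maxAttempts = 3
): Promise<T> {
  let currentPrompt = prompt;
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const raw = await generate(currentPrompt);
    const result = validate(raw);
    if (result.ok) return result.value;
    lastError = result.error;
    console.warn(`attempt ${attempt} failed validation: ${lastError}`); // log for debugging
    // Tell the model what was wrong instead of sending the same prompt again.
    currentPrompt = `${prompt}\n\nYour previous response was invalid (${lastError}). Return corrected JSON only.`;
  }
  throw new Error(`extraction failed after ${maxAttempts} attempts: ${lastError}`);
}
```

Backoff still matters for transient failures such as rate limits and timeouts; for validation failures, a changed prompt is usually more useful than waiting.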

Best Practices

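Start with function calling for most use cases. Use enums liberally for fixed value sets, since they turn free text into a closed choice. Provide examples in descriptions. Keep schemas focused: several focused calls tend to be more reliable than one complex call. Test with edge cases such as empty input, missing fields, and text that contains nothing to extract.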
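A hypothetical ticket-triage tool (not from any real API) illustrates the enum and example advice: the priority field is limited to a fixed list, each description carries a concrete example, and the same list backs a type guard that checks the returned value:

```typescript
// Single list of allowed values, reused for the schema and the runtime check.
const PRIORITIES = ["low", "medium", "high"] as const;
type Priority = (typeof PRIORITIES)[number];

// Hypothetical tool definition: enum constrains the value, descriptions give examples.
const triageTool = {
  type: "function",
  function: {
    name: "triage_ticket",
    description: "Classify a support ticket.",
    parameters: {
      type: "object",
      properties: {
        priority: {
          type: "string",
          enum: [...PRIORITIES],
          description: 'Urgency of the ticket, e.g. "high" for a full outage.'
        },
        summary: {
          type: "string",
          description: 'One-sentence summary, e.g. "Login page returns 500 for all users."'
        }
      },
      required: ["priority", "summary"]
    }
  }
};

// Receiving side: narrow an arbitrary value to the enum.
function isPriority(value: unknown): value is Priority {
  return typeof value === "string" && (PRIORITIES as readonly string[]).includes(value);
}
```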

The investment in structured output infrastructure pays dividends in reliability, debuggability, and developer confidence. When you can trust that LLM calls return exactly what you expect, building sophisticated AI applications becomes dramatically easier.