JSON Mode — Prompt Engineering Fundamentals (Chapter 10)

Many inference stacks support a JSON mode that constrains output to syntactically valid JSON. This reduces parsing errors, but it does not decide which fields appear; that still depends on the prompt.

JSON mode is activated differently depending on your inference stack:

Ollama:

curl -X POST http://localhost:11434/api/generate \
  -d '{"model": "llama3.1", "prompt": "...", "format": "json"}'
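The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names are hypothetical, and "stream": false is an addition to the curl example so that the server returns a single body. With streaming off, Ollama wraps the generated text in a "response" field, so the model's JSON has to be parsed in a second step.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same endpoint as the curl example


def build_request(prompt: str, model: str = "llama3.1") -> urllib.request.Request:
    """Build a non-streaming JSON-mode request for Ollama's /api/generate."""
    payload = {
        "model": model,
        "prompt": prompt,
        "format": "json",  # turn on JSON mode
        "stream": False,   # one response body instead of a stream of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_json(body: bytes):
    """Unwrap the API envelope, then parse the model's own JSON text."""
    envelope = json.loads(body)
    return json.loads(envelope["response"])


def generate_json(prompt: str):
    """Send the request; assumes an Ollama server is running locally."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        return extract_json(resp.read())
```

Keeping the request builder and the parser separate from the network call makes both easy to test without a running server.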

LM Studio: Enable JSON mode in the chat settings, or use the API:

curl -X POST http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "..."}], "response_format": {"type": "json_object"}}'

llama.cpp server:

curl -X POST http://localhost:8080/completion \
  -d '{"prompt": "...", "json_schema": {"type": "object", "properties": {"sentiment": {"type": "string"}}}}'

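Plain JSON mode, as in the Ollama and LM Studio examples above, does not enforce your schema; it only ensures the output is parseable JSON. The llama.cpp "json_schema" parameter is the exception: it constrains generation to the supplied schema. In either case, you still need to specify what fields you want: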

Return valid JSON with these fields:
- sentiment: "positive" | "negative" | "neutral"
- confidence: number between 0 and 1
- reason: one sentence explaining the classification

Input: [text]

Without a field specification, JSON mode can return output that parses cleanly but has the wrong structure: unexpected keys, missing values, or an object unrelated to the task.
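Because parseable JSON can still be the wrong JSON, it can help to check each response against the fields requested in the prompt. The following is a minimal sketch under that assumption; the function name and error messages are hypothetical, and the rules simply mirror the three fields listed above.

```python
import json

ALLOWED_SENTIMENTS = ("positive", "negative", "neutral")


def validate_classification(raw: str) -> dict:
    """Parse model output and check it against the fields requested in the prompt."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if malformed
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    if data.get("sentiment") not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unexpected sentiment: {data.get('sentiment')!r}")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0 <= confidence <= 1:
        raise ValueError("confidence must be between 0 and 1")

    reason = data.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        raise ValueError("reason must be a non-empty string")
    return data
```

Every failure surfaces as a ValueError, so a caller can catch one exception type and decide whether to retry the request or discard the response.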

Common issues with JSON mode:

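- Schema drift: the model invents fields that are not in your spec.
- Type errors: the model outputs a string where a number is expected (for example, "0.9" instead of 0.9).
- Incomplete objects: generation stops mid-object, typically when the output token limit is reached, leaving JSON that does not parse.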
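The three issues above can each be detected mechanically. The sketch below labels which problem a response has; it is illustrative only, and the EXPECTED_FIELDS spec, the label strings, and the function name are assumptions made for this example rather than part of any library.

```python
import json

# Hypothetical spec for illustration: field name -> accepted Python type(s)
EXPECTED_FIELDS = {"sentiment": str, "confidence": (int, float), "reason": str}


def diagnose(raw: str, expected: dict = EXPECTED_FIELDS) -> list[str]:
    """Return labels for the problems found in one model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Output cut off mid-generation usually fails to parse at all
        return ["incomplete_or_invalid_json"]
    if not isinstance(data, dict):
        return ["not_an_object"]

    issues = []
    extra = sorted(set(data) - set(expected))
    if extra:
        issues.append(f"schema_drift: unexpected fields {extra}")
    for name, types in expected.items():
        if name not in data:
            issues.append(f"missing_field: {name}")
        # No field in this spec is boolean, so a bool always counts as a type error
        elif isinstance(data[name], bool) or not isinstance(data[name], types):
            issues.append(f"type_error: {name}")
    return issues
```

An empty list means none of these checks failed. Logging the labels over many requests shows which failure mode dominates for a given model and prompt.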

To reduce schema drift, include a schema definition in the prompt:

Return JSON matching this schema exactly. Do not add fields not in this schema:

{
  "type": "object",
  "properties": {
    "entities": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "name": {"type": "string"},
          "type": {"enum": ["person", "organization", "location"]}
        },
        "required": ["name", "type"],
        "additionalProperties": false
      }
    }
  },
  "required": ["entities"],
  "additionalProperties": false
}
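A schema in the prompt is a request, not a guarantee, so the response is still worth checking on arrival. Below is a hand-written check for the entity schema above, using only the standard library. It is a sketch rather than a full JSON Schema validator, the function name is hypothetical, and it rejects extra fields to match the "do not add fields" instruction. A general-purpose validator such as the third-party jsonschema package would be an alternative.

```python
import json

ENTITY_TYPES = ("person", "organization", "location")


def validate_entities(raw: str) -> list[dict]:
    """Check a response against the entity schema; raise ValueError on mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != {"entities"}:
        raise ValueError("top level must be an object with only an 'entities' key")
    entities = data["entities"]
    if not isinstance(entities, list):
        raise ValueError("'entities' must be an array")

    for i, item in enumerate(entities):
        if not isinstance(item, dict):
            raise ValueError(f"entities[{i}] must be an object")
        if set(item) != {"name", "type"}:
            raise ValueError(f"entities[{i}] has unexpected or missing fields")
        if not isinstance(item["name"], str):
            raise ValueError(f"entities[{i}].name must be a string")
        if item["type"] not in ENTITY_TYPES:
            raise ValueError(f"entities[{i}].type is not an allowed value")
    return entities
```

An empty "entities" array passes, since the schema requires the key but sets no minimum length.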

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.