10. JSON Mode
Many models and inference stacks support a JSON mode that constrains output to syntactically valid JSON. This reduces parsing errors, but on its own it does not guarantee that the output contains the fields you asked for.
JSON mode is activated differently depending on your inference stack:
Ollama:
curl -X POST http://localhost:11434/api/generate \
-d '{"model": "llama3.1", "prompt": "...", "format": "json"}'
LM Studio: Enable JSON mode in the chat settings, or use the API:
curl -X POST http://localhost:1234/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "..."}], "response_format": {"type": "json_object"}}'
llama.cpp server:
curl -X POST http://localhost:8080/completion \
-d '{"prompt": "...", "json_schema": {"type": "object", "properties": {"sentiment": {"type": "string"}}}}'
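The three requests above can also be issued from a script. The following is a minimal Python sketch, not a reference client: the URLs and JSON fields are copied from the curl examples, the helper names `build_request` and `send` are hypothetical, and `"stream": False` is added to the Ollama body on the assumption that you want a single JSON response rather than a stream of chunks. Whether your server version accepts each field has not been verified here.

```python
import json
import urllib.request


def build_request(stack: str, prompt: str) -> tuple:
    """Return (url, body) for one of the stacks shown in the curl examples."""
    if stack == "ollama":
        # "stream": False is an addition to the curl example (see lead-in).
        return ("http://localhost:11434/api/generate",
                {"model": "llama3.1", "prompt": prompt,
                 "format": "json", "stream": False})
    if stack == "lmstudio":
        return ("http://localhost:1234/v1/chat/completions",
                {"messages": [{"role": "user", "content": prompt}],
                 "response_format": {"type": "json_object"}})
    if stack == "llamacpp":
        return ("http://localhost:8080/completion",
                {"prompt": prompt,
                 "json_schema": {"type": "object",
                                 "properties": {"sentiment": {"type": "string"}}}})
    raise ValueError(f"unknown stack: {stack}")


def send(stack: str, prompt: str, timeout: float = 60.0) -> dict:
    """POST the request to a locally running server and decode the reply."""
    url, body = build_request(stack, prompt)
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the payload construction separate from the network call makes it easy to inspect or log the exact request body before sending it.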
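Plain JSON mode (Ollama's "format": "json" or "response_format": {"type": "json_object"}) does not enforce your schema; it only ensures the output is parseable JSON. The llama.cpp json_schema parameter shown above is the exception, because it constrains generation to the supplied schema. In either case, you still need to tell the model which fields you want: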
Return valid JSON with these fields:
- sentiment: "positive" | "negative" | "neutral"
- confidence: number between 0 and 1
- reason: one sentence explaining the classification
Input: [text]
Without field specification, JSON mode can produce output that is syntactically valid but semantically useless, for example an object whose keys are not the ones your code expects.
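Because JSON mode only covers syntax, the field rules in the prompt above still have to be checked in code. The sketch below is one way to do that in Python using only the standard library; the function name `check_sentiment_reply` and the wording of the messages are illustrative assumptions, not part of any inference stack.

```python
import json

ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def check_sentiment_reply(raw: str) -> list:
    """Return a list of problems; an empty list means the reply matches the spec."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    sentiment = data.get("sentiment")
    if not isinstance(sentiment, str) or sentiment not in ALLOWED_SENTIMENTS:
        problems.append("sentiment must be positive, negative, or neutral")

    conf = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if (isinstance(conf, bool) or not isinstance(conf, (int, float))
            or not 0 <= conf <= 1):
        problems.append("confidence must be a number between 0 and 1")

    reason = data.get("reason")
    if not isinstance(reason, str) or not reason.strip():
        problems.append("reason must be a non-empty string")
    return problems
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that went wrong with a single reply.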
Common issues with JSON mode:
- Schema drift: The model invents fields not in your spec
- Type errors: The model outputs a string where a number is expected
- Incomplete objects: Generation stops early (for example, at the token limit), leaving truncated JSON that no longer parses
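The three issues listed above can be told apart programmatically. The following Python sketch assumes a flat object and a caller-supplied mapping of field names to Python types; the function name `diagnose` and the label strings are hypothetical.

```python
import json


def diagnose(raw: str, expected: dict) -> list:
    """Label problems in a model reply.

    `expected` maps each field name to a type or tuple of types,
    e.g. {"sentiment": str, "confidence": (int, float)}.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Truncated generations typically fail here: a brace or quote is left open.
        return ["incomplete or invalid JSON"]
    if not isinstance(data, dict):
        return ["type error: top-level value is not an object"]

    issues = []
    # Schema drift: keys the model invented.
    for key in sorted(data.keys() - expected.keys()):
        issues.append(f"schema drift: unexpected field '{key}'")
    for key, kind in expected.items():
        if key not in data:
            issues.append(f"missing field '{key}'")
            continue
        value = data[key]
        # bool passes isinstance(..., int), so treat it as a mismatch unless asked for.
        is_stray_bool = isinstance(value, bool) and kind is not bool
        if not isinstance(value, kind) or is_stray_bool:
            issues.append(f"type error: '{key}' has type {type(value).__name__}")
    return issues
```

A possible call is `diagnose(reply_text, {"sentiment": str, "confidence": (int, float)})`, where `reply_text` is the raw string returned by the model.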
To reduce schema drift, include a schema definition in the prompt:
Return JSON matching this schema exactly. Do not add fields not in this schema:
{
"type": "object",
"properties": {
"entities": {
"type": "array",
"items": {
"type": "object",
"properties": {
"name": {"type": "string"},
"type": {"enum": ["person", "organization", "location"]}
},
"required": ["name", "type"]
}
}
},
"required": ["entities"]
}
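A schema in the prompt is only a request, so the reply should still be checked against it. The sketch below is a deliberately small Python checker for the subset of JSON Schema used above (type, properties, required, items, enum); the name `violations` is hypothetical, and it reports unknown keys as errors to mirror the "do not add fields" instruction. If the third-party `jsonschema` package is available, it is a more complete alternative.

```python
import json

SCHEMA = {
    "type": "object",
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "type": {"enum": ["person", "organization", "location"]},
                },
                "required": ["name", "type"],
            },
        }
    },
    "required": ["entities"],
}

# Only the JSON types that appear in SCHEMA are mapped here.
JSON_TYPES = {"object": dict, "array": list, "string": str}


def violations(value, schema, path="$"):
    """Return a list of messages describing where value departs from schema."""
    found = []
    if "enum" in schema and value not in schema["enum"]:
        found.append(f"{path}: {value!r} not in {schema['enum']}")
    expected = schema.get("type")
    if expected and not isinstance(value, JSON_TYPES[expected]):
        return found + [f"{path}: expected {expected}"]
    if expected == "object":
        props = schema.get("properties", {})
        for key in schema.get("required", []):
            if key not in value:
                found.append(f"{path}: missing '{key}'")
        for key, item in value.items():
            if key in props:
                found += violations(item, props[key], f"{path}.{key}")
            else:
                found.append(f"{path}: unexpected '{key}'")
    elif expected == "array" and "items" in schema:
        for i, item in enumerate(value):
            found += violations(item, schema["items"], f"{path}[{i}]")
    return found
```

For example, `violations(json.loads(reply_text), SCHEMA)` returns an empty list when `reply_text` (a placeholder for the model's raw output) conforms to the schema.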
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Configure JSON mode in your inference stack and test a prompt with complex output requirements. Validate the JSON structure programmatically.