02. OpenAI API Format

Chapter 2 of 18 · 15 min

KEY INSIGHT

Understanding the exact JSON structure of OpenAI API requests and responses is essential for building compatible endpoints. The format is well documented, but subtle details such as null handling, default values, and field naming conventions cause most compatibility issues in practice.

### Request Structure

A chat completions request carries a `messages` array, a `model` identifier, and several optional parameters. Each entry in `messages` is an object with `role` and `content` fields. Roles include `system`, `user`, and `assistant`, and the model treats each one differently: `system` sets overall instructions, `user` carries the caller's input, and `assistant` holds earlier model replies.

```json
{
  "model": "llama3.2:latest",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain API design."}
  ],
  "temperature": 0.7,
  "max_tokens": 512,
  "stream": false
}
```

The `model` field identifies which model should process the request. In a local setup, this string might map to a local model file or a container image. The API layer is responsible for resolving this identifier.

### Response Structure

A non-streaming response follows this structure:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama3.2:latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "API design involves..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 45,
    "total_tokens": 65
  }
}
```

The `finish_reason` field indicates why generation stopped. Common values are `stop` (natural completion), `length` (hit `max_tokens`), and `content_filter` (content flagged).

Always include usage statistics, even for local models. Clients rely on token counts for cost tracking and analytics.

### Common Compatibility Pitfalls

Omitting the `usage` field breaks clients that expect to track token consumption. Using inconsistent field casing (camelCase instead of snake_case, for example `finishReason` instead of `finish_reason`) breaks clients that parse based on schema expectations. Returning a `finish_reason` value with incorrect casing, such as `"Stop"` instead of `"stop"`, causes validation failures in strict clients.
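
The pitfalls above can be caught with a small check before a response leaves your endpoint. The following is a minimal sketch, not an official validator: the function name `find_compat_problems` and the constants are hypothetical, the set of `finish_reason` values is limited to the three named in this chapter, and the check assumes a non-streaming response.

```python
import re

# Values named in this chapter; the full API defines more, so extend as needed.
KNOWN_FINISH_REASONS = {"stop", "length", "content_filter"}
REQUIRED_TOP_LEVEL = ("id", "object", "created", "model", "choices", "usage")
REQUIRED_USAGE = ("prompt_tokens", "completion_tokens", "total_tokens")

# A lowercase letter followed directly by an uppercase one suggests camelCase.
_CAMEL_CASE = re.compile(r"[a-z][A-Z]")


def _camel_case_keys(node, path=""):
    """Walk nested dicts and lists, returning paths of keys that look camelCase."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if _CAMEL_CASE.search(key):
                found.append(here)
            found.extend(_camel_case_keys(value, here))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            found.extend(_camel_case_keys(item, f"{path}[{i}]"))
    return found


def find_compat_problems(response):
    """Return a list of problem descriptions; an empty list means none were found."""
    problems = []

    for key in REQUIRED_TOP_LEVEL:
        if key not in response:
            problems.append(f"missing top-level field: {key}")

    usage = response.get("usage")
    if isinstance(usage, dict):
        for key in REQUIRED_USAGE:
            if key not in usage:
                problems.append(f"missing usage field: {key}")

    for i, choice in enumerate(response.get("choices", [])):
        reason = choice.get("finish_reason")
        # Exact, case-sensitive match: "Stop" is rejected just like a missing value.
        if reason not in KNOWN_FINISH_REASONS:
            problems.append(
                f"choices[{i}].finish_reason has unexpected value: {reason!r}"
            )

    for path in _camel_case_keys(response):
        problems.append(f"camelCase key where snake_case is expected: {path}")

    return problems
```

Calling `find_compat_problems` on the example response from this chapter should return an empty list. Dropping `usage` or renaming `finish_reason` to `finishReason` should produce one entry per problem, which makes the function usable as a quick assertion in endpoint tests.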

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
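
One way to capture those details is a short script that runs next to the exercise. This is a sketch using only the Python standard library; the function name `record_environment` and the `requests` package in the usage line are illustrative assumptions, so substitute whatever packages and data path you actually used.

```python
import json
import platform
from importlib import metadata


def record_environment(packages, data_path=None):
    """Collect runtime details worth noting beside an exercise result."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "packages": versions,
        "data_path": data_path,
    }


if __name__ == "__main__":
    # "requests" is only an example package name.
    print(json.dumps(record_environment(["requests"]), indent=2))
```

Saving the printed JSON alongside the observed output gives you a record to compare against if the result changes on another machine.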

EXERCISE

Create a Python dictionary that matches the chat completions response structure. Include all required fields with realistic placeholder values. Then validate it against the JSON schema for chat completions responses.