RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Local AI APIs and Integration

02. OpenAI API Format

Chapter 2 of 18 · 15 min
KEY INSIGHT

Understanding the exact JSON structure of OpenAI API requests and responses is essential for building compatible endpoints. The format is well-documented, but subtle details like null handling, default values, and field naming conventions cause most compatibility issues in practice.

### Request Structure

A chat completions request carries a `messages` array, a `model` identifier, and several optional parameters. Each object in the `messages` array has a `role` field and a `content` field. Roles include `system`, `user`, and `assistant`, and each one shapes the model's behavior differently: `system` sets the overall instructions, `user` carries the input to answer, and `assistant` holds earlier model replies.

```json
{
  "model": "llama3.2:latest",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain API design."}
  ],
  "temperature": 0.7,
  "max_tokens": 512,
  "stream": false
}
```

The `model` field identifies which model should process the request. In a local setup, this string might map to a local model file or a container image. The API layer is responsible for resolving this identifier.

### Response Structure

A non-streaming response follows this structure:

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "llama3.2:latest",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "API design involves..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 45,
    "total_tokens": 65
  }
}
```

The `finish_reason` field indicates why generation stopped. Common values are `stop` (natural completion), `length` (hit `max_tokens`), and `content_filter` (content flagged). Always include usage statistics, even for local models. Clients rely on token counts for cost tracking and analytics.

### Common Compatibility Pitfalls

Omitting the `usage` field breaks clients that expect to track token consumption. Using inconsistent field casing (camelCase instead of snake_case, such as `finishReason` for `finish_reason`) breaks clients that parse against a fixed schema. Returning a `finish_reason` value with incorrect casing, such as `"Stop"` instead of `"stop"`, causes validation failures in strict clients.
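The pitfalls above can be guarded against in code. The following is a minimal sketch, not a reference implementation: `normalize_request` and `build_chat_completion` are hypothetical helper names, the default values are assumptions chosen for illustration, and the allowed `finish_reason` set contains only the three values named in this chapter.

```python
import time
import uuid

# The three values named in this chapter; extend the set if your clients use others.
ALLOWED_FINISH_REASONS = {"stop", "length", "content_filter"}

# Assumed defaults for this sketch; mirror whatever your target clients expect.
REQUEST_DEFAULTS = {"temperature": 1.0, "max_tokens": None, "stream": False}


def normalize_request(body: dict) -> dict:
    """Check required fields and treat explicit nulls like omitted fields."""
    if not isinstance(body.get("model"), str):
        raise ValueError("'model' must be a string")
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("'messages' must be a non-empty list")
    for msg in messages:
        if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
            raise ValueError("each message needs 'role' and 'content'")

    normalized = {"model": body["model"], "messages": messages}
    for key, default in REQUEST_DEFAULTS.items():
        value = body.get(key)  # None covers both "missing" and explicit null
        normalized[key] = default if value is None else value
    return normalized


def build_chat_completion(model: str, content: str, prompt_tokens: int,
                          completion_tokens: int, finish_reason: str = "stop") -> dict:
    """Build a non-streaming response with snake_case keys and usage always set."""
    if finish_reason not in ALLOWED_FINISH_REASONS:
        # Rejects wrong casing such as "Stop" before it reaches a strict client.
        raise ValueError(f"unexpected finish_reason: {finish_reason!r}")
    return {
        "id": "chatcmpl-" + uuid.uuid4().hex,
        "object": "chat.completion",
        "created": int(time.time()),
        "model": model,
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": content},
                "finish_reason": finish_reason,
            }
        ],
        "usage": {
            "prompt_tokens": prompt_tokens,
            "completion_tokens": completion_tokens,
            "total_tokens": prompt_tokens + completion_tokens,
        },
    }
```

Building the response in one place means `usage` cannot be forgotten and `total_tokens` is always derived from the other two counts rather than reported separately.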

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
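One way to capture that record is a small helper built on the standard library. This is a sketch under assumptions: `environment_record` is a hypothetical name, and the package list, data path, and output string passed to it are placeholders you would replace with your own.

```python
import json
import platform
import sys
from importlib import metadata


def environment_record(packages, data_path, observed_output):
    """Collect runtime details worth noting beside an exercise result."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "executable": sys.executable,
        "packages": versions,
        "data_path": str(data_path),
        "observed_output": observed_output,
    }


if __name__ == "__main__":
    # Placeholder arguments; substitute the packages and paths you actually used.
    record = environment_record(["fastapi"], "./data", "finish_reason=stop")
    print(json.dumps(record, indent=2))
```

Constraints such as model size, CPU/GPU backend, or available memory are not detected by this helper, so add them to the record by hand.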

EXERCISE

Create a Python dictionary that matches the chat completions response structure. Include all required fields with realistic placeholder values. Then validate it against the JSON schema for chat completions responses.
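A possible starting point for the exercise is sketched below. It swaps in a hand-written shape check instead of a real JSON Schema validator, so it needs only the standard library. The shape tables are an illustrative subset derived from this chapter's example response, not the official schema, and `check_shape` and `validate_response` are hypothetical names.

```python
# Illustrative subset of the response shape, written by hand for this sketch.
RESPONSE_SHAPE = {"id": str, "object": str, "created": int,
                  "model": str, "choices": list, "usage": dict}
CHOICE_SHAPE = {"index": int, "message": dict, "finish_reason": str}
MESSAGE_SHAPE = {"role": str, "content": str}
USAGE_SHAPE = {"prompt_tokens": int, "completion_tokens": int, "total_tokens": int}


def check_shape(obj, shape, path="$"):
    """Return error strings for missing keys or values of the wrong type."""
    errors = []
    for key, expected in shape.items():
        if key not in obj:
            errors.append(f"{path}.{key}: missing")
        elif isinstance(obj[key], bool) or not isinstance(obj[key], expected):
            # bool is a subclass of int in Python, so it is rejected explicitly.
            errors.append(f"{path}.{key}: expected {expected.__name__}")
    return errors


def validate_response(resp):
    """Check the top level, each choice with its message, and the usage block."""
    errors = check_shape(resp, RESPONSE_SHAPE)
    if isinstance(resp.get("choices"), list):
        for i, choice in enumerate(resp["choices"]):
            where = f"$.choices[{i}]"
            if not isinstance(choice, dict):
                errors.append(f"{where}: expected dict")
                continue
            errors += check_shape(choice, CHOICE_SHAPE, where)
            if isinstance(choice.get("message"), dict):
                errors += check_shape(choice["message"], MESSAGE_SHAPE, where + ".message")
    if isinstance(resp.get("usage"), dict):
        errors += check_shape(resp["usage"], USAGE_SHAPE, "$.usage")
    return errors


# Placeholder values copied from the chapter's example response.
sample = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "created": 1700000000,
    "model": "llama3.2:latest",
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "API design involves..."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 20, "completion_tokens": 45, "total_tokens": 65},
}

if __name__ == "__main__":
    print(validate_response(sample) or "response shape looks OK")
```

An empty list from `validate_response` means the dictionary matches this subset. For the full exercise, replace the shape tables with a real JSON Schema and a validator of your choice.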
