06. Ollama REST API
The REST API exposes Ollama's functionality over HTTP. By default, the server listens on 127.0.0.1:11434. Endpoints that take a request body expect JSON, and responses are JSON (newline-delimited JSON when streaming).
Core Endpoints
Generate endpoint (POST /api/generate):
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Write a haiku about programming",
  "stream": false
}'
The stream parameter defaults to true. Setting it to false returns the complete response in one JSON object. Streaming returns newline-delimited JSON objects as tokens are generated.
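The same non-streaming call can be sketched in Python with only the standard library. The helper names build_generate_request and generate are illustrative, not part of Ollama. The sketch assumes a server is reachable at the default address and that the reply carries the generated text in its response field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address; adjust if yours differs


def build_generate_request(model, prompt, base_url=OLLAMA_URL):
    """Build (but do not send) a non-streaming POST to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt, base_url=OLLAMA_URL):
    """Send the request and return the generated text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request) as reply:
        body = json.load(reply)  # one complete JSON object, since stream is false
    return body["response"]
```

Calling generate("llama3.2:1b", "Write a haiku about programming") needs a running server with that model already pulled. Building the request separately from sending it makes the payload easy to inspect without any network access.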
Chat endpoint (POST /api/chat):
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:1b",
  "messages": [
    {"role": "user", "content": "What is a kernel?"},
    {"role": "assistant", "content": "A kernel is the core of an operating system."},
    {"role": "user", "content": "Give an example"}
  ],
  "stream": false
}'
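The chat endpoint is stateless: the server does not remember earlier turns, so the conversation history must be sent in the messages array of every request. To ask a question without any prior context, send only the latest message.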
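Because the history travels in the request body, the client has to keep the list of messages itself. The following is a minimal sketch of that bookkeeping. The Conversation class and its latest_only flag are hypothetical names introduced for illustration, and the code only builds the JSON body without sending it.

```python
import json


class Conversation:
    """Keeps the message history that each /api/chat request must carry."""

    def __init__(self, model):
        self.model = model
        self.messages = []

    def add(self, role, content):
        """Record one turn, e.g. role "user" or "assistant"."""
        self.messages.append({"role": role, "content": content})

    def payload(self, latest_only=False):
        """Return the JSON body for the next request.

        latest_only=True sends just the newest message, so the model
        sees no earlier context.
        """
        messages = self.messages[-1:] if latest_only else list(self.messages)
        return json.dumps(
            {"model": self.model, "messages": messages, "stream": False}
        )


chat = Conversation("llama3.2:1b")
chat.add("user", "What is a kernel?")
chat.add("assistant", "A kernel is the core of an operating system.")
chat.add("user", "Give an example")
full_body = chat.payload()                    # all three turns
single_body = chat.payload(latest_only=True)  # only "Give an example"
```

After each reply, the assistant's message would be appended with add("assistant", ...) before the next user turn, so the following request carries the whole exchange.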
Embeddings endpoint (POST /api/embeddings):
curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox jumps over the lazy dog"
}'
The response is a JSON object whose embedding field holds a vector of floating-point numbers representing the semantic embedding of the input text.
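Embedding vectors are typically compared rather than read directly. As an illustration, the sketch below computes cosine similarity between two vectors. The function name is an assumption of this example, and the short vectors in the usage lines are toy values, not real model output.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two embeddings returned by the endpoint.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions. Both vectors should come from the same embedding model, since vectors from different models are not comparable.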
API Server Configuration
By default, the API binds to 127.0.0.1:11434. To make it accessible from other machines, set the OLLAMA_HOST environment variable:
# Linux/macOS - bind to all interfaces
export OLLAMA_HOST=0.0.0.0
ollama serve
# Windows PowerShell
$env:OLLAMA_HOST = "0.0.0.0"
ollama serve
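This exposes the API on all network interfaces. Ollama itself does not authenticate requests, so anyone who can reach the port can use the server. For production deployments, put the service behind a reverse proxy that provides TLS and authentication.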
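After changing the bind address, it can help to confirm from another machine that the port actually accepts connections. The sketch below is a plain TCP reachability check and is not an Ollama feature. The is_reachable helper is a hypothetical name, and the address in the usage note is a placeholder for your server's real address.

```python
import socket


def is_reachable(host, port=11434, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hosts.
        return False
```

For example, is_reachable("192.168.1.50") run from a second machine would return False while the server is still bound to 127.0.0.1, assuming 192.168.1.50 is the server's LAN address. A successful connection only shows that the port is open. It does not verify that the listener is Ollama.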
Streaming Responses
Streaming returns data incrementally. Each chunk is a JSON object:
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Count to 5",
  "stream": true
}'
Output (abbreviated):
{"model":"llama3.2:1b","created_at":"2024-11-15T10:00:00Z","response":"1","done":false}
{"model":"llama3.2:1b","created_at":"2024-11-15T10:00:00Z","response":"2","done":false}
{"model":"llama3.2:1b","created_at":"2024-11-15T10:00:00Z","response":"3","done":false}
{"model":"llama3.2:1b","created_at":"2024-11-15T10:00:00Z","response":"...4...5","done":true}
Exercise: use curl to send a chat request whose messages array begins with a message of role "system" containing a system prompt. Verify that the response respects the system prompt by checking the style of the output.