RUNLOCALAI v38

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Ollama — Installation to Mastery

06. Ollama REST API

Chapter 6 of 20 · 20 min
KEY INSIGHT

The API is stateless: the server keeps no conversation state between requests. Even `/api/chat` only knows what you send it, so every request must include the full conversation history in its payload. For persistent sessions, manage the history on the client.

The REST API exposes Ollama's functionality over HTTP. By default, the server listens on port 11434 on localhost. Request bodies are JSON. Responses are either a single JSON object or, when streaming, a sequence of newline-delimited JSON objects.

Core Endpoints

Generate endpoint (POST /api/generate):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Write a haiku about programming",
  "stream": false
}'

The stream parameter defaults to true. Setting it to false returns the complete response in one JSON object. Streaming returns newline-delimited JSON objects as tokens are generated.
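The Python sketch below makes the same non-streaming call using only the standard library. It assumes the server is on its default local address and that `llama3.2:1b` has already been pulled. `build_generate_request` and `generate` are illustrative helper names, not part of any Ollama client library. Building the request separately keeps the payload easy to inspect.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
DEFAULT_BASE_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "llama3.2:1b",
                           base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a POST /api/generate request with streaming turned off."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3.2:1b",
             base_url: str = DEFAULT_BASE_URL, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the single JSON reply."""
    request = build_generate_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": false the whole completion is in the "response" field.
    return body["response"]
```

With the server running, `generate("Write a haiku about programming")` should return the haiku as one string. The text comes from the `response` field of the single JSON reply.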

Chat endpoint (POST /api/chat):

curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2:1b",
  "messages": [
    {"role": "user", "content": "What is a kernel?"},
    {"role": "assistant", "content": "A kernel is the core of an operating system."},
    {"role": "user", "content": "Give an example"}
  ],
  "stream": false
}'

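The server does not store the conversation. The client sends the full history in the `messages` array with every request, and the model sees only what that array contains. To ask a one-off question without prior context, send only the latest message.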
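Because the history lives in the request, a client that wants multi-turn conversations has to keep the `messages` list itself and resend it on each turn. The sketch below shows one way to do that. `ChatSession`, `http_transport`, and the injectable `transport` callable are hypothetical names chosen for illustration. The code assumes non-streaming replies, where the assistant's answer is in `message.content`.

```python
import json
import urllib.request
from typing import Callable, Dict, List

Message = Dict[str, str]
# Hypothetical transport type: takes a /api/chat payload, returns the decoded reply.
Transport = Callable[[dict], dict]


def http_transport(base_url: str = "http://localhost:11434",
                   timeout: float = 120.0) -> Transport:
    """Return a transport that POSTs each payload to /api/chat."""
    def send(payload: dict) -> dict:
        request = urllib.request.Request(
            f"{base_url}/api/chat",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.load(resp)
    return send


class ChatSession:
    """Keeps the conversation on the client and resends it on every turn."""

    def __init__(self, model: str, transport: Transport) -> None:
        self.model = model
        self.transport = transport
        self.messages: List[Message] = []

    def ask(self, content: str) -> str:
        user_turn = {"role": "user", "content": content}
        # The server is stateless, so each request carries the whole history.
        payload = {
            "model": self.model,
            "messages": self.messages + [user_turn],
            "stream": False,
        }
        reply = self.transport(payload)
        # Non-streaming /api/chat replies put the answer in reply["message"].
        answer = reply["message"]["content"]
        # Record both turns only after a successful reply, so a failed
        # request does not leave a dangling user message in the history.
        self.messages += [user_turn, {"role": "assistant", "content": answer}]
        return answer
```

For example, create `ChatSession("llama3.2:1b", http_transport())`, then call `ask("What is a kernel?")` followed by `ask("Give an example")`. The second call sends the same kind of three-message history as the curl request above. Passing the transport in keeps the history logic testable without a running server.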

Embeddings endpoint (POST /api/embeddings):

curl http://localhost:11434/api/embeddings -d '{
  "model": "nomic-embed-text",
  "prompt": "The quick brown fox jumps over the lazy dog"
}'

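Returns a JSON object whose `embedding` field is an array of floating-point numbers representing the semantic embedding of the input text. The model must be pulled first (`ollama pull nomic-embed-text`). Newer Ollama releases also provide `POST /api/embed`, which takes an `input` field (a string or a list of strings) and returns an `embeddings` array.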
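Embedding vectors are usually compared with cosine similarity: texts with similar meaning produce vectors that point in similar directions. The small standard-library sketch below assumes you have already read two vectors from the `embedding` field of separate `/api/embeddings` responses. `cosine_similarity` is an illustrative helper, not an Ollama API.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Scores close to 1.0 indicate closely related texts. Vectors from different embedding models are not comparable, so embed both texts with the same model.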

API Server Configuration

By default, the API binds to 127.0.0.1:11434. To make it accessible from other machines, set the OLLAMA_HOST environment variable:

# Linux/macOS - bind to all interfaces
export OLLAMA_HOST=0.0.0.0
ollama serve

# Windows PowerShell
$env:OLLAMA_HOST = "0.0.0.0"
ollama serve

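Binding to 0.0.0.0 exposes the API on all network interfaces. Ollama has no built-in authentication, so anyone who can reach the port can run models on your machine. For production deployments, put the service behind a reverse proxy that adds TLS and authentication. If Ollama already runs as a background service (the Linux systemd unit or the desktop app on macOS and Windows), set OLLAMA_HOST in that service's environment and restart it rather than setting it in your interactive shell.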
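After changing OLLAMA_HOST, check from the other machine that the server is reachable. The sketch below calls `GET /api/version`, which returns a small JSON object with a `version` field. Note that 0.0.0.0 is only the bind address: clients must connect to the server's real IP address or hostname. `check_ollama` and `version_url` are illustrative names, and the address in the usage note is a hypothetical placeholder.

```python
import json
import urllib.request


def version_url(host: str, port: int = 11434) -> str:
    """URL of Ollama's GET /api/version endpoint on the given host."""
    return f"http://{host}:{port}/api/version"


def check_ollama(host: str, port: int = 11434, timeout: float = 5.0) -> str:
    """Return the server's version string, or raise ConnectionError."""
    try:
        with urllib.request.urlopen(version_url(host, port), timeout=timeout) as resp:
            return json.load(resp)["version"]
    except OSError as exc:
        # URLError, refused connections, and timeouts are all OSError subclasses.
        raise ConnectionError(f"could not query Ollama at {host}:{port}") from exc
```

Calling `check_ollama("192.168.1.50")` returns the version string if the server answers. It raises `ConnectionError` if nothing answers, for example because of a wrong address, a firewall, or a server still bound to 127.0.0.1.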

Streaming Responses

With streaming enabled, the server sends the response incrementally as newline-delimited JSON, one object per line:

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "Count to 5",
  "stream": true
}'

Output:

{"model":"llama3.2:1b","created_at":"2024-11-15T10:00:00Z","response":"1","done":false}
{"model":"llama3.2:1b","created_at":"2024-11-15T10:00:00Z","response":"2","done":false}
{"model":"llama3.2:1b","created_at":"2024-11-15T10:00:00Z","response":"3","done":false}
{"model":"llama3.2:1b","created_at":"2024-11-15T10:00:00Z","response":"...4...5","done":true}
EXERCISE

Use curl to send a chat message to Ollama with a system prompt embedded in the messages array. Verify the response respects the system prompt by checking the output style.

← Chapter 5
Modelfiles: Customizing Models
Chapter 7 →
Ollama Python Client