Ollama LLM Integration — LangChain for Local AI (Chapter 2)

Ollama runs GGUF-quantized models locally behind a long-running HTTP server. By default it listens on localhost:11434 and exposes a REST API for chat and text completion. LangChain's Ollama integration talks to this API through either the ChatOllama class (for chat models) or the OllamaLLM class (for plain text completion). Use the langchain-ollama package for both; the older Ollama classes under langchain.llms and langchain_community are deprecated.

Verify Ollama is running first:

# Check if Ollama backend is available
curl -s http://localhost:11434/api/tags | head -20

If you see JSON listing available models, Ollama is up. If you see a connection error, start it:

# Linux/macOS
ollama serve

# Or, on Linux with systemd, start it as a background service
sudo systemctl start ollama
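
The same availability check can be done from Python before any LangChain object is constructed. The following is a minimal sketch using only the standard library. The helper names `model_names` and `list_installed_models` are hypothetical, and the code assumes the `/api/tags` response is a JSON object with a `models` array whose entries carry a `name` field.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # default address used in this chapter


def model_names(payload: str) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    data = json.loads(payload)
    return [entry["name"] for entry in data.get("models", [])]


def list_installed_models(base_url: str = OLLAMA_URL, timeout: float = 2.0):
    """Return installed model names, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return model_names(resp.read().decode("utf-8"))
    except OSError:
        # URLError, connection refused, and timeouts are all OSError subclasses.
        return None
```

Calling `list_installed_models()` returns `None` when nothing answers on the port, which is the cue to run `ollama serve`; an empty list means the server is up but no model has been pulled yet.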

Once running, list your installed models:

import subprocess

result = subprocess.run(
    ["ollama", "list"], capture_output=True, text=True
)
print(result.stdout)
# Output looks like this (IDs, sizes, and times will differ):
# NAME                    ID              SIZE      MODIFIED
# llama3.2:3b             a9e1f02f0de8    1.8GB     3 days ago
# mixtral:8x7b            7d4e0f02f1a9    26GB      6 days ago

Connect LangChain to the running Ollama instance:

from langchain_ollama import ChatOllama

# Initialize with a model you have installed
llm = ChatOllama(
    model="llama3.2:3b",
    base_url="http://localhost:11434",
    temperature=0.7,
    # Streaming is chosen per call: use llm.stream(...) instead of llm.invoke(...)
)

# Test the connection with a simple invocation
response = llm.invoke("Say hello in exactly three words.")
print(response.content)
# Expected: something like "Hello there, friend." (3 words)

The ChatOllama class returns AIMessage objects (part of langchain_core.messages). This matters because the components you chain after the model, such as output parsers and message history, expect objects that conform to LangChain's BaseMessage interface.

Common failure modes with the Ollama integration:

Error	Cause	Fix
`ConnectionError` or `httpx.ConnectError`	Ollama not running	`ollama serve` in another terminal
`ollama.ResponseError` (status 404, model not found)	Model not pulled	`ollama pull llama3.2:3b`
`ollama.ResponseError` (status 500)	Model loaded from a previous session with a mismatched context size	`ollama ps`, then `ollama stop <model-name>` or restart Ollama
Slow first response	Cold start loading model into VRAM	Keep Ollama running; first call always slow

The 500 error is the most insidious. Ollama reloads the model if the context window size changes between invocations or if the model was evicted. Check its status:

ollama ps
# NAME              ID              SIZE      PROCESSOR    UNTIL
# llama3.2:3b       a9e1f02f0de8    2.1GB     100% GPU     4 minutes from now

If the model is listed as loaded but requests still return 500, unload it and let it reload:

ollama stop llama3.2:3b
# Then try your LangChain call again; the model reloads automatically
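
Because the model reloads on the next request, a transient 500 or a slow cold start can often be absorbed by retrying the call. The sketch below shows one way to do that with exponential backoff. `call_with_retry` is a hypothetical helper, not part of LangChain or Ollama, and it catches every exception for brevity; in real code you would likely narrow that to the connection and response errors from the table above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on any exception with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")


# Usage with the llm object from earlier in this chapter (requires a running server):
# response = call_with_retry(lambda: llm.invoke("Say hello in exactly three words."))
```

The `sleep` parameter exists only so the delay can be replaced in tests; the default is `time.sleep`.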

For production-style deployments, run Ollama as a persistent service rather than starting it on demand. If Ollama restarts between calls, model load time dominates latency for interactive applications.
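
A persistent server still unloads idle models after a timeout, so it can help to preload the model and ask Ollama to keep it resident. The sketch below assumes the behavior described in Ollama's API documentation: a POST to `/api/generate` with a model name and no prompt loads the model, and a negative `keep_alive` keeps it in memory indefinitely. The function names `build_preload_request` and `preload` are hypothetical.

```python
import json
import urllib.request
from typing import Union


def build_preload_request(
    model: str,
    base_url: str = "http://localhost:11434",
    keep_alive: Union[int, str] = -1,
) -> urllib.request.Request:
    """Build a POST to /api/generate that loads a model without generating text."""
    # No "prompt" key: the request only asks the server to load the model.
    body = json.dumps({"model": model, "keep_alive": keep_alive}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def preload(model: str, timeout: float = 120.0) -> bool:
    """Send the preload request; return True if the server answered with 200."""
    try:
        with urllib.request.urlopen(build_preload_request(model), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Server unreachable, timed out, or returned an HTTP error status.
        return False
```

Calling `preload("llama3.2:3b")` once at application startup moves the cold-start cost out of the first user-facing request. A duration string such as `"30m"` can be passed as `keep_alive` instead of `-1` if the model should eventually be evicted.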