RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
  1. >
  2. Home
  3. /Learn
  4. /Courses
  5. /LangChain for Local AI
  6. /Ch. 2
LangChain for Local AI

02. Ollama LLM Integration

Chapter 2 of 18 · 25 min
KEY INSIGHT

The `langchain-ollama` integration connects LangChain to a local Ollama HTTP server; verify Ollama is running and the model is loaded before initializing `ChatOllama`.

Ollama runs GGUF-quantized models locally as a long-running HTTP server. By default it listens on localhost:11434 and exposes a REST API for chat completions. LangChain's Ollama integration connects to this API using either the ChatOllama class (for chat models) or OllamaLLM class (for legacy completion models). As of LangChain 0.1.x, use the langchain-ollama package rather than the older langchain.llms.ollama path, which was deprecated.

Verify Ollama is running first:

# Check if Ollama backend is available
curl -s http://localhost:11434/api/tags | head -20

If you see JSON listing available models, Ollama is up. If you see a connection error, start it:

# Linux/macOS
ollama serve

# Or start as a background service depending on your init system
sudo systemctl start ollama

Once running, list your installed models:

import json
import subprocess

result = subprocess.run(
    ["ollama", "list"], capture_output=True, text=True
)
print(result.stdout)
# Output looks like:
# NAME                    ID              SIZE      MODIFIED
# llama3.2:3b             a9e1f02f0de8    1.8GB     2024-12-01 10:00:00
# mixtral:8x7b            7d4e0f02f1a9    26GB      2024-11-28 08:00:00

Connect LangChain to the running Ollama instance:

from langchain_ollama import ChatOllama

# Initialize with a model you have installed
llm = ChatOllama(
    model="llama3.2:3b",
    base_url="http://localhost:11434",
    temperature=0.7,
    # Optional: stream all responses
    streaming=True,
)

# Test the connection with a simple invocation
response = llm.invoke("Say hello in exactly three words.")
print(response.content)
# Expected: something like "Hello there, friend." (3 words)

The ChatOllama class returns AIMessage objects (part of langchain_core.messages). This matters because downstream components like ChatPromptTemplate expect the message schema to conform to LangChain's BaseMessage interface.

Common failure modes with the Ollama integration:

Error Cause Fix
ConnectionError: HTTPConnectionPool Ollama not running ollama serve in another terminal
ValueError: model not found Model not pulled ollama pull llama3.2:3b
APIStatusError: 500 Model loaded from previous session, context mismatch ollama ps, then ollama kill <model-id> or restart
Slow first response Cold start loading model into VRAM Keep Ollama running; first call always slow

The 500 error is the most insidious. Ollama reloads the model if the context window size changes between invocations or if the model was evicted. Check its status:

ollama ps
# NAME              ID              SIZE      MODIFIED
# llama3.2:3b       a9e1f02f0de8    2.1GB     2 minutes ago

If the model shows no recent activity and you get 500s, kill and reload:

ollama kill llama3.2:3b
# Then try your LangChain call again—it will reload automatically

For production-style deployments, keep Ollama running as a persistent service rather than on-demand. The model load time dominates latency for interactive applications if you restart Ollama between calls.

EXERCISE

Write a Python script that checks Ollama availability, lists installed models, initializes ChatOllama, and prints a one-sentence response from the model. Handle the ConnectionError case explicitly with a message telling the operator to start Ollama.

← Chapter 1
What is LangChain?
Chapter 3 →
Prompt Templates