07. Ollama Python Client

Chapter 7 of 20 · 20 min

The Python library provides programmatic access to Ollama's API with typed objects and convenience methods. Install it with pip:

pip install ollama

Basic Usage

from ollama import chat
from ollama import ChatResponse

response: ChatResponse = chat(model='llama3.2:1b', messages=[
  {
    'role': 'user',
    'content': 'What is recursion?',
  },
])

print(response.message.content)

The chat function sends a POST request to http://localhost:11434/api/chat, the default local server address. By default it blocks until the model has generated the complete response.

Streaming Responses

from ollama import chat

stream = chat(model='llama3.2:1b', messages=[
  {'role': 'user', 'content': 'List the planets'}
], stream=True)

for chunk in stream:
    print(chunk['message']['content'], end='', flush=True)
print()

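With stream=True, chat returns a generator instead of a single response, so output can be shown as it is produced. Each chunk carries the next fragment of the reply and can be read with dictionary-style access, as above. Recent versions of the library yield typed ChatResponse objects that also support attribute access (chunk.message.content).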
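When the full reply is also needed after streaming (for logging, or to append it to the message history), the fragments can be accumulated while they are printed. The following is a minimal sketch, assuming each chunk supports the chunk['message']['content'] access used above. collect_stream is a hypothetical helper name, not part of the ollama library.

```python
from typing import Any, Iterable


def collect_stream(chunks: Iterable[Any], echo: bool = True) -> str:
    """Optionally print each streamed fragment and return the joined reply.

    Hypothetical helper, not an ollama API. Assumes every chunk exposes
    chunk['message']['content'], as in the streaming loop above.
    """
    parts = []
    for chunk in chunks:
        text = chunk['message']['content']
        if echo:
            # Show the fragment immediately, without a trailing newline.
            print(text, end='', flush=True)
        parts.append(text)
    if echo:
        print()  # finish the line once the stream is exhausted
    return ''.join(parts)
```

With a running server, this could be used as full_reply = collect_stream(chat(model='llama3.2:1b', messages=[...], stream=True)).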

Embeddings

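from ollama import embed

# embed() supersedes the older embeddings(model=..., prompt=...) call.
response = embed(model='nomic-embed-text', input='Hello world')
# The response holds one vector per input, so take the first.
embedding = response['embeddings'][0]
print(f'Embedding dimension: {len(embedding)}')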
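Embedding vectors are usually compared with each other rather than inspected directly. The following is a minimal sketch of cosine similarity in pure Python. cosine_similarity is a hypothetical helper, not part of the ollama library, and the short vectors are made-up stand-ins for real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same length')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError('cosine similarity is undefined for a zero vector')
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings (illustrative values only).
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(f'same direction: {same_direction:.3f}, orthogonal: {orthogonal:.3f}')
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions. For real use, pass two vectors returned by the embedding model in place of the toy lists.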

Error Handling

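import httpx
from ollama import ResponseError, chat

try:
    response = chat(model='nonexistent-model', messages=[
      {'role': 'user', 'content': 'Hello'}
    ])
except (ConnectionError, httpx.ConnectError):
    # Which of the two is raised depends on the library version.
    print("Cannot connect to Ollama. Is the server running?")
except ResponseError as e:
    print(f"API error: {e.status_code} - {e.error}")

The client raises ollama.ResponseError for API-level errors such as requesting a non-existent model; the exception exposes status_code and error. When the server is unreachable, recent versions of the library raise Python's built-in ConnectionError, while older releases let httpx.ConnectError propagate, so the example catches both.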
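Connection failures are often transient, for example while the server is still starting. The following is a minimal retry sketch under stated assumptions: call_with_retry is a hypothetical helper, not part of the ollama library, and the default retry_on=(ConnectionError,) is a guess about which exception the installed client raises. Pass a different tuple, such as (httpx.ConnectError,), if your version raises something else.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 3,
    delay_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
) -> T:
    """Call fn() up to `attempts` times, sleeping between failed tries.

    Hypothetical helper. Only exceptions listed in `retry_on` are retried;
    anything else propagates immediately.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            time.sleep(delay_seconds)
    raise AssertionError('unreachable')
```

A call such as call_with_retry(lambda: chat(model='llama3.2:1b', messages=[...])) would then retry only on connection failures. API-level errors like an unknown model are not retried, since repeating the same request would not help.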

Client Configuration

from ollama import Client

client = Client(host='http://localhost:11434')
# Or connect to a remote host
client = Client(host='http://192.168.1.100:11434')

response = client.chat(model='llama3.2:1b', messages=[
  {'role': 'user', 'content': 'Hello'}
])

The explicit client lets you target remote Ollama instances. By default, the client connects to http://localhost:11434.
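Hard-coding a remote address makes a script awkward to move between machines. The following is a minimal sketch that reads the host from an environment variable and falls back to the default address. resolve_host is a hypothetical helper, not part of the ollama library, and using OLLAMA_HOST as the variable name (mirroring the one the Ollama tooling uses) is an assumption of this sketch.

```python
import os
from typing import Mapping, Optional

DEFAULT_HOST = 'http://localhost:11434'


def resolve_host(
    env: Optional[Mapping[str, str]] = None,
    variable: str = 'OLLAMA_HOST',
) -> str:
    """Return the host URL from `variable`, or DEFAULT_HOST if it is unset.

    Hypothetical helper. A bare 'address:port' value gets an http:// prefix,
    and any trailing slash is removed.
    """
    env = os.environ if env is None else env
    value = env.get(variable, '').strip()
    if not value:
        return DEFAULT_HOST
    if '://' not in value:
        value = f'http://{value}'
    return value.rstrip('/')
```

The result can be passed straight to the constructor, as in client = Client(host=resolve_host()).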

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Write a Python script that uses the streaming chat function to echo tokens as they arrive, and measure the time to first token versus time to complete response.