03. First Model

Chapter 3 of 20 · 20 min

Pulling and running your first model confirms the installation works end-to-end. Start with a small model to avoid long download times and memory issues.

Pulling a Model

ollama pull llama3.2:1b

This downloads the model manifest and weights. The 1B-parameter variant of Llama 3.2 requires approximately 1.3 GB of disk space and runs comfortably on CPU-only machines with 8 GB of RAM. Larger models such as llama3.2:3b need about 2 GB of disk space and benefit from GPU acceleration.
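Before pulling, it can be worth confirming that the disk has room for the weights. The following is a minimal sketch, not an Ollama feature: has_room is a hypothetical helper, and checking the home directory assumes the model store lives on the same filesystem, which may not match your setup.

```python
import shutil
from pathlib import Path

GB = 1_000_000_000  # decimal gigabyte, matching the sizes quoted above


def has_room(path, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least required_gb free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * GB


# Assumption: models are stored on the same filesystem as the home directory.
home = Path.home()
target = home if home.exists() else Path.cwd()

for name, size_gb in [("llama3.2:1b", 1.3), ("llama3.2:3b", 2.0)]:
    status = "ok" if has_room(target, size_gb) else "not enough free space"
    print(f"{name}: needs ~{size_gb} GB -> {status}")
```

If your models are stored on a different drive, pass that location to has_room instead.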

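On Windows, the download progress appears in the terminal. If you see a connection error, check your proxy settings. The example below sets HTTPS_PROXY for the current PowerShell session; because the download is performed by the Ollama server rather than by the ollama pull client, the server must also be started with this variable set: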

$env:HTTPS_PROXY = "http://proxy.example.com:8080"
ollama pull llama3.2:1b

Running Interactively

ollama run llama3.2:1b

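This starts an interactive REPL. Type /bye or press Ctrl+D to end the session; typing exit is sent to the model as an ordinary prompt, and Ctrl+C only interrupts a response in progress. The first run loads the model into memory, which can take 10-30 seconds on CPU-only systems.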

First API Call

With the server running, make a REST request (the quoting shown is for a POSIX shell such as bash or zsh):

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2:1b",
  "prompt": "What is 2+2?",
  "stream": false
}'
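The same request can be sent from a script. The following is a sketch using only the Python standard library; it assumes the default address shown in the curl command, and build_generate_request and generate are illustrative helper names rather than part of any Ollama client.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # same endpoint as the curl call


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON body."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        result = generate("llama3.2:1b", "What is 2+2?")
        print(result["response"])
    except OSError as err:  # covers connection refused, HTTP errors, and timeouts
        print(f"Request failed; is the server running? ({err})")
```

Keeping request construction separate from sending makes it easy to inspect the payload without a running server.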

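The response JSON contains the generated text in response, a done flag, and token counts and timing fields. All durations are reported in nanoseconds. An abridged example: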

{
  "model": "llama3.2:1b",
  "response": "4",
  "done": true,
  "total_duration": 5234194500,
  "load_duration": 2109842500,
  "prompt_eval_count": 13,
  "prompt_eval_duration": 234000000,
  "eval_count": 6,
  "eval_duration": 2888000000
}
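Because the duration fields are in nanoseconds, a little arithmetic makes them easier to read. The sketch below converts the sample values above into seconds and tokens per second; seconds and tokens_per_second are illustrative helper names, not part of the API.

```python
NS_PER_SECOND = 1_000_000_000  # durations in the response are in nanoseconds

# Timing fields copied from the sample response above.
sample = {
    "total_duration": 5234194500,
    "load_duration": 2109842500,
    "prompt_eval_count": 13,
    "prompt_eval_duration": 234000000,
    "eval_count": 6,
    "eval_duration": 2888000000,
}


def seconds(ns: int) -> float:
    """Convert nanoseconds to seconds."""
    return ns / NS_PER_SECOND


def tokens_per_second(count: int, duration_ns: int) -> float:
    """Throughput: number of tokens divided by elapsed seconds."""
    return count / seconds(duration_ns)


prompt_rate = tokens_per_second(sample["prompt_eval_count"], sample["prompt_eval_duration"])
output_rate = tokens_per_second(sample["eval_count"], sample["eval_duration"])

print(f"total:      {seconds(sample['total_duration']):.2f} s")
print(f"model load: {seconds(sample['load_duration']):.2f} s")
print(f"prompt:     {prompt_rate:.1f} tokens/s")
print(f"generation: {output_rate:.1f} tokens/s")
```

For the sample values, about 2.1 of the 5.2 seconds are spent loading the model, and generation runs at roughly 2.1 tokens per second.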

Common failure: the server is not running. If you get Connection refused, start the server with ollama serve in another terminal, or check whether a previous instance is still running or has crashed. On Linux or macOS:

ps aux | grep ollama
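A process listing shows whether ollama exists as a process, but not whether it is accepting connections. As a complementary check that works the same way on any platform, the sketch below tests the TCP port directly. It assumes the default address localhost:11434 used in this chapter; server_is_listening is a hypothetical helper name.

```python
import socket


def server_is_listening(host: str = "localhost", port: int = 11434, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timeout, or unreachable host
        return False


if server_is_listening():
    print("Something is listening on localhost:11434.")
else:
    print("Nothing is listening on localhost:11434; try `ollama serve`.")
```

This only confirms that a port is open, not that the listener is Ollama, so treat a positive result as a hint rather than proof.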

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
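One way to keep such a record is to capture it in a small script. The following is a sketch under stated assumptions: checkpoint_record and its field names are hypothetical, the Ollama version is read by running ollama --version, and details such as the data path or the CPU/GPU backend still need to be added by hand.

```python
import json
import platform
import shutil
import subprocess


def ollama_version() -> str:
    """Return the output of `ollama --version`, or a placeholder if unavailable."""
    exe = shutil.which("ollama")
    if exe is None:
        return "ollama not found on PATH"
    try:
        done = subprocess.run([exe, "--version"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as err:
        return f"could not run ollama: {err}"
    return (done.stdout or done.stderr).strip()


def checkpoint_record(model: str, observed_output: str) -> dict:
    """Collect the facts worth noting beside an exercise."""
    return {
        "ollama_version": ollama_version(),
        "python": platform.python_version(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "model": model,
        "observed_output": observed_output,
    }


record = checkpoint_record("llama3.2:1b", "4")
print(json.dumps(record, indent=2))
```

Saving the printed JSON next to your notes makes it easier to explain later why a result differed between machines.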

EXERCISE

Run ollama run llama3.2:1b and ask the model to "List three programming languages." Then make the same request via curl with stream: false and compare the response structure.