03. First Model
Pulling and running your first model confirms the installation works end-to-end. Start with a small model to avoid long download times and memory issues.
Pulling a Model
ollama pull llama3.2:1b
This downloads the model manifest and weights. The 1B parameter variant of Llama 3.2 requires approximately 1.3 GB of disk space and runs comfortably on a CPU-only machine with 8 GB of RAM. Larger models such as llama3.2:3b need about 2 GB of disk space and benefit from GPU acceleration.
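Because the pull writes over a gigabyte to disk, it can be worth checking free space first. The following is a minimal sketch, assuming a POSIX shell with df and awk available. The helper name has_free_space_mb and the 2048 MB threshold are illustrative choices, not part of Ollama, and "$HOME" stands in for whichever directory your installation stores models in.

```shell
# has_free_space_mb DIR REQUIRED_MB
# Succeeds (exit 0) when the filesystem holding DIR has at least REQUIRED_MB free.
has_free_space_mb() {
  dir="$1"
  required_mb="$2"
  # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
  # Assumption: the filesystem name contains no spaces, so the columns line up.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((required_mb * 1024)) ]
}

# Example: ask for roughly 2 GB of headroom before pulling the 1.3 GB model.
if has_free_space_mb "$HOME" 2048; then
  echo "enough space for llama3.2:1b"
else
  echo "low disk space: free some room before pulling"
fi
```

After the pull completes, ollama list should show llama3.2:1b together with its size on disk.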
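On Windows, the download progress appears in the terminal. If you see a connection error, check your proxy settings. The download is performed by the Ollama server process, so the variable has to be visible to that process: set it in PowerShell, restart the server from the same session if it was already running, then retry the pull: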
$env:HTTPS_PROXY = "http://proxy.example.com:8080"
ollama pull llama3.2:1b
Running Interactively
ollama run llama3.2:1b
This starts an interactive REPL. Type /bye or press Ctrl+D to end the session; Ctrl+C interrupts a response that is still being generated. The first run loads the model into memory, which can take 10-30 seconds on CPU-only systems.
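The REPL is not the only way to use ollama run: passing the prompt as an argument prints a single reply and returns to the shell, which is handy in scripts. The wrapper below is a small sketch of that pattern. The function name ask_once and the OLLAMA_BIN variable are hypothetical; OLLAMA_BIN exists only so the command can be swapped out (for example in a dry run) and is not an Ollama setting.

```shell
# ask_once MODEL PROMPT...
# Sends one prompt to the model and prints the reply without entering the REPL.
# OLLAMA_BIN is a hypothetical override for the binary name, defaulting to "ollama".
ask_once() {
  model="$1"
  shift
  # All remaining arguments are joined into a single prompt string.
  "${OLLAMA_BIN:-ollama}" run "$model" "$*"
}
```

With the model already pulled, a call might look like: ask_once llama3.2:1b "List three programming languages."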
First API Call
With the server running, make a REST request:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2:1b",
"prompt": "What is 2+2?",
"stream": false
}'
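The response JSON contains the generated text in response, a done flag, and timing fields whose durations are reported in nanoseconds (the output below is abridged):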
{
"model": "llama3.2:1b",
"response": "4",
"done": true,
"total_duration": 5234194500,
"load_duration": 2109842500,
"prompt_eval_count": 13,
"prompt_eval_duration": 234000000,
"eval_count": 6,
"eval_duration": 2888000000
}
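The timing fields are easier to read once converted. As a sketch, assuming the durations are nanoseconds and that awk is available, generation speed is eval_count divided by eval_duration, scaled by 10^9. The helper names tokens_per_second and ns_to_seconds are illustrative, and the numbers are copied from the sample response above.

```shell
# tokens_per_second EVAL_COUNT EVAL_DURATION_NS
# Prints generated tokens per second, rounded to two decimals.
tokens_per_second() {
  awk -v count="$1" -v ns="$2" 'BEGIN { printf "%.2f\n", count / ns * 1e9 }'
}

# ns_to_seconds DURATION_NS
# Converts a nanosecond duration to seconds, rounded to two decimals.
ns_to_seconds() {
  awk -v ns="$1" 'BEGIN { printf "%.2f\n", ns / 1e9 }'
}

tokens_per_second 6 2888000000   # prints 2.08 (eval_count / eval_duration)
ns_to_seconds 5234194500         # prints 5.23 (total_duration)
ns_to_seconds 2109842500         # prints 2.11 (load_duration, the model load time)
```

In this sample, model loading accounts for about 2.1 of the 5.2 seconds, which is why a second request to an already loaded model tends to return faster.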
Common failure: the server is not running. If curl reports Connection refused, start the server with ollama serve in another terminal, or check whether a previous instance is still running or has crashed (on Linux or macOS):
ps aux | grep ollama
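A quicker check than reading the process list is to ask the server directly. The sketch below assumes curl is installed and that the server listens on the default address used in this chapter; the function name server_up and the 3-second timeout are illustrative choices.

```shell
# server_up [BASE_URL]
# Succeeds when an HTTP server answers at BASE_URL (default: the local Ollama port).
server_up() {
  base_url="${1:-http://localhost:11434}"
  # -s: quiet, -f: treat HTTP errors as failure, --max-time: do not hang forever.
  curl -sf --max-time 3 "$base_url/" > /dev/null
}

if server_up; then
  echo "Ollama server is reachable"
else
  echo "No server on port 11434: start one with 'ollama serve'"
fi
```

A running server answers the root path with a short plain-text status message, so the check does not need a model to be loaded.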
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Run ollama run llama3.2:1b and ask the model to "List three programming languages." Then make the same request via curl with stream: false and compare the response structure.
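To keep the checkpoint reproducible, the environment details can be captured in a small notes file alongside the observed output. This is only a sketch: the function name record_env and the default file name first-model-notes.txt are hypothetical, and the fields recorded are a starting point rather than a required format.

```shell
# record_env MODEL [FILE]
# Writes the date, model tag, OS, and Ollama version to FILE.
record_env() {
  model="$1"
  out="${2:-first-model-notes.txt}"   # hypothetical default file name
  {
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "model: $model"
    echo "os: $(uname -sm)"
    # Only query the version when the binary is actually on PATH.
    if command -v ollama > /dev/null 2>&1; then
      echo "ollama: $(ollama --version 2>&1)"
    else
      echo "ollama: not found on PATH"
    fi
  } > "$out"
}
```

Calling record_env llama3.2:1b before the exercise leaves a file to which the prompt, the reply, and constraints such as available memory or CPU/GPU backend can be appended by hand.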