08. Managing Models with the CLI

Chapter 8 of 15 · 20 min

The Ollama CLI works inside WSL2 (Linux binary) and in Windows CMD/PowerShell (native binary). The commands are identical.

List installed models:

ollama list
# NAME                ID              SIZE      MODIFIED
# llama3.2:1b         693f87b4fb18    1.3GB     2026-05-28
# codellama:7b        8c6aaf8e6c25    3.8GB     2026-05-27
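When scripting against the model list, the NAME column is usually the only part needed. The sketch below assumes the tabular layout shown above (one header row, name in the first column); model_names is a hypothetical helper, not an Ollama command.

```shell
# Print only the NAME column from `ollama list` output read on stdin.
# NR > 1 skips the header row; NF skips blank lines.
model_names() {
  awk 'NR > 1 && NF { print $1 }'
}

# Usage (requires a working Ollama install):
#   ollama list | model_names
```

Reading from stdin keeps the helper independent of Ollama, so it can be tried on saved output first.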

Pull a new model:

ollama pull mistral:7b-instruct-q4_0

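This downloads the model's layers: the weights plus small metadata files such as the prompt template. Progress is shown as a percentage alongside the amount transferred. If the download is interrupted, running ollama pull again resumes from the data already downloaded instead of starting over.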
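Because a repeated pull picks up where the previous one stopped, a small retry loop is often enough on an unreliable connection. This is a minimal sketch: pull_with_retry, its default of 3 attempts, and the 5-second delay are illustrative assumptions, not part of Ollama.

```shell
# pull_with_retry MODEL [MAX_ATTEMPTS] [DELAY_SECONDS]
# Hypothetical helper: re-runs `ollama pull` until it succeeds or the
# attempts run out. Each retry resumes the partial download.
pull_with_retry() {
  local model="$1"
  local max_attempts="${2:-3}"
  local delay_seconds="${3:-5}"
  local attempt=1

  while [ "$attempt" -le "$max_attempts" ]; do
    if ollama pull "$model"; then
      return 0
    fi
    echo "pull attempt $attempt of $max_attempts failed" >&2
    attempt=$((attempt + 1))
    # Pause before the next attempt, but not after the final failure.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay_seconds"
    fi
  done
  return 1
}

# Usage:
#   pull_with_retry mistral:7b-instruct-q4_0
```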

Remove a model:

ollama rm codellama:7b

Copy a model under a new name (the new tag points to the same weights, so it uses almost no extra disk space; the original name is kept):

ollama cp llama3.2:1b my-local-finetune
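Since ollama cp leaves the original name in place, a true rename takes a copy followed by a removal. The sketch below chains the two; rename_model is a hypothetical helper name, and it removes the old name only when the copy succeeded.

```shell
# rename_model OLD NEW
# Copy OLD to NEW, then remove OLD only if the copy succeeded.
rename_model() {
  local old="$1" new="$2"
  if [ -z "$old" ] || [ -z "$new" ]; then
    echo "usage: rename_model OLD NEW" >&2
    return 2
  fi
  ollama cp "$old" "$new" && ollama rm "$old"
}

# Usage:
#   rename_model llama3.2:1b my-local-finetune
```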

Run a model interactively:

ollama run llama3.2:1b

Inside the REPL, type /bye to exit. To run a single prompt without entering the REPL, pass it as an argument:

ollama run llama3.2:1b "Explain WSL2 in one sentence"

Check model details (quantization, parameters, context length):

ollama show llama3.2:1b

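Add the --verbose flag to a run command to print timing statistics after the response, including load duration and the prompt and generation rates in tokens per second: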

ollama run llama3.2:1b "Write a Python quicksort" --verbose
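To compare models or quantizations, it can help to pull out just the generation speed. The sketch below assumes the statistics contain a line of the form "eval rate: N tokens/s" and that they are written to stderr; both may differ between Ollama versions. eval_rate is a hypothetical helper.

```shell
# Print the generation speed (tokens/s) from --verbose statistics on stdin.
# The ^ anchor skips the separate "prompt eval rate:" line.
eval_rate() {
  awk '/^eval rate:/ { print $3 }'
}

# Usage (assumes the statistics go to stderr; the response is discarded):
#   ollama run llama3.2:1b "Write a Python quicksort" --verbose 2>&1 >/dev/null | eval_rate
```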

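If a model fails to load with an error such as "model requires more system memory than is available", check free -h inside WSL2 or Task Manager on Windows. The size shown by ollama list is the on-disk size of the quantized weights, which are loaded largely as-is rather than decompressed. At inference time the model needs roughly that much memory plus extra for the KV cache and runtime buffers, and that overhead grows with the context length.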
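The memory check can be turned into a quick pre-flight test. This is a rough sketch: fits_in_ram and available_gb are hypothetical helpers, and the default headroom multiplier of 1.2 is an assumed safety margin to tune for your context length, not a figure published by Ollama.

```shell
# fits_in_ram MODEL_GB AVAILABLE_GB [HEADROOM]
# Succeeds (exit 0) when MODEL_GB * HEADROOM <= AVAILABLE_GB.
# HEADROOM defaults to 1.2, an assumed margin for cache and buffers.
fits_in_ram() {
  awk -v model="$1" -v avail="$2" -v headroom="${3:-1.2}" \
    'BEGIN { exit !(model * headroom <= avail) }'
}

# Available memory in GB on Linux/WSL2, taken from the "available"
# column of `free -m` (7th field of the Mem: row).
available_gb() {
  free -m | awk '/^Mem:/ { printf "%.1f\n", $7 / 1024 }'
}

# Usage inside WSL2, with the size read from `ollama list`:
#   if fits_in_ram 3.8 "$(available_gb)"; then ollama run codellama:7b; fi
```

On native Windows there is no free command, so read the available figure from Task Manager and pass it as the second argument.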

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the Ollama version (ollama --version), the environment (WSL2 or native Windows), the model tag, and the observed output. If the result depends on model size, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Run ollama list, note a model's size, check your available RAM with free -h (WSL2) or Task Manager (Windows), then run the largest model that fits comfortably and confirm it loads without OOM errors.