08. Managing Models with CLI
The Ollama CLI works inside WSL2 (Linux binary) and in Windows CMD/PowerShell (native binary). The commands are identical.
List installed models:
ollama list
# NAME ID SIZE MODIFIED
# llama3.2:1b 693f87b4fb18 1.3GB 2026-05-28
# codellama:7b 8c6aaf8e6c25 3.8GB 2026-05-27
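For scripting, it can help to extract just the NAME column from this table. The following is a minimal sketch, assuming the layout shown above (one header row, with the model name as the first whitespace-separated field). The `model_names` function is a hypothetical helper, not part of Ollama.

```shell
# Hypothetical helper: print only the NAME column of `ollama list` output.
# Assumes one header row and the name in the first whitespace-separated field.
model_names() {
  awk 'NR > 1 && NF > 0 { print $1 }'
}

# Usage (requires a working Ollama install):
#   ollama list | model_names
```

Reading from standard input keeps the helper independent of Ollama itself, so it can be tried on saved output first.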
Pull a new model:
ollama pull mistral:7b-instruct-q4_0
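This downloads the model's manifest and layers, including the weights. Progress is shown as a percentage alongside the amount downloaded and the total size. If the download is interrupted, running the same ollama pull command again resumes from the data already downloaded instead of starting over.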
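On an unreliable connection, the retry can be automated. The sketch below is one possible wrapper, not an Ollama feature: `retry` and the `RETRY_DELAY` variable are hypothetical names introduced here, and the function simply re-runs whatever command it is given until it succeeds or the attempt limit is reached.

```shell
# Hypothetical helper: run a command up to N times, pausing between attempts.
# RETRY_DELAY (seconds) is an assumed knob for this sketch; it defaults to 5.
retry() {
  max_attempts=$1
  shift
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "${RETRY_DELAY:-5}"
  done
}

# Usage: try the pull up to 3 times; each new attempt picks up the partial download.
#   retry 3 ollama pull mistral:7b-instruct-q4_0
```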
Remove a model:
ollama rm codellama:7b
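Copy a model under a new name (this creates a second name that refers to the same underlying data; the original name is kept, so a true rename also needs ollama rm on the old name):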
ollama cp llama3.2:1b my-local-finetune
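Because ollama cp leaves the source name in place, a rename is a copy followed by a removal. The sketch below shows one way to chain the two steps safely. `rename_model` and the `OLLAMA_BIN` override are hypothetical names used only for illustration; the override exists so the commands can be previewed without touching any models.

```shell
# Hypothetical helper: "rename" a model by copying it, then removing the old name.
# OLLAMA_BIN is an assumed override for this sketch (useful for dry runs);
# it defaults to the real ollama binary.
rename_model() {
  old_name=$1
  new_name=$2
  # Remove the old name only if the copy succeeded.
  "${OLLAMA_BIN:-ollama}" cp "$old_name" "$new_name" &&
    "${OLLAMA_BIN:-ollama}" rm "$old_name"
}

# Dry run: print the commands instead of executing them.
#   OLLAMA_BIN=echo rename_model llama3.2:1b my-local-finetune
```

Chaining with && means a failed copy never leads to the original being deleted.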
Run a model interactively:
ollama run llama3.2:1b
Inside the REPL, type /bye to exit. To get a single answer without opening the REPL, pass the prompt as an argument:
ollama run llama3.2:1b "Explain WSL2 in one sentence"
Check model details (quantization, parameters, context length):
ollama show llama3.2:1b
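The --verbose flag on a run command prints timing statistics after the response, such as load time and the token counts and rates for prompt evaluation and generation: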
ollama run llama3.2:1b "Write a Python quicksort" --verbose
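If a model fails to load with an error such as "model requires more system memory than is available", check free memory with free -h inside WSL2 or with Task Manager on Windows. The size shown by ollama list is the on-disk size of the weights, which are already quantized rather than compressed. Loading a model needs roughly that much memory plus extra for the KV cache and runtime buffers, and that overhead grows with the context length, so budget noticeably more RAM than the listed size.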
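The memory check can be turned into a rough pre-flight test. The sketch below is an approximation under stated assumptions: `fits_in_ram` and `available_gb` are hypothetical helpers, and the default headroom multiplier of 1.5 is an illustrative guess rather than an Ollama rule, so adjust it for your context length and hardware.

```shell
# Hypothetical helper: does a model of SIZE_GB fit in AVAIL_GB of free memory?
# The headroom multiplier (default 1.5) is an assumption for this sketch;
# it reserves space for the KV cache and runtime buffers.
fits_in_ram() {
  size_gb=$1
  avail_gb=$2
  headroom=${3:-1.5}
  # awk handles the floating-point comparison; exit status 0 means "fits".
  awk -v s="$size_gb" -v a="$avail_gb" -v h="$headroom" \
    'BEGIN { exit !(s * h <= a) }'
}

# Available memory in GiB, read from /proc/meminfo (Linux and WSL2 only).
available_gb() {
  awk '/^MemAvailable:/ { printf "%.1f\n", $2 / 1024 / 1024 }' /proc/meminfo
}

# Usage inside WSL2, with the size taken from `ollama list`:
#   fits_in_ram 3.8 "$(available_gb)" && echo "should fit" || echo "likely too large"
```

On native Windows, read the available memory from Task Manager and pass it as the second argument instead of calling available_gb.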
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Run ollama list, note a model's size, check your available RAM with free -h (WSL2) or Task Manager (Windows), then run the largest model that fits comfortably and confirm it loads without OOM errors.