How to stop a running model and free up memory
Prerequisite: Ollama is running with a model currently loaded in memory.
What this does
Unloads an active model from RAM or VRAM so the memory becomes available to other processes. After completing this guide, the model no longer occupies memory on the host.
Steps
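- Exit interactive mode gracefully. Ends the current ollama run session without disrupting the server. The model itself stays in memory until its keep_alive timer expires (5 minutes by default), so use one of the next two steps if you need the memory back immediately.
  >>> /bye
  Expected output: The shell prompt returns; ollama ps still lists the model, and its UNTIL column shows when it will be unloaded.
- Stop the Ollama service entirely. Shuts down the server, unloading every model at once and freeing all associated memory. This applies to Linux installs where Ollama runs as the systemd service ollama.
  sudo systemctl stop ollama
  Expected output: No output; systemctl status ollama shows "inactive (dead)".
- Force an immediate model unload via the API. A request with no prompt and "keep_alive": 0 unloads the named model right away, which is useful when a remote client is holding it open. Replace llama3.2 with a name listed by ollama ps.
  curl -X POST http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}'
  Expected output: A JSON response with "done": true; the model no longer appears in ollama ps.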
- Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
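The API unload can be wrapped in a small script so it is easy to repeat. This is a minimal sketch, assuming a POSIX shell with curl on the path; the function names unload_payload and unload_model and the OLLAMA_URL variable are hypothetical choices for this example, not part of Ollama. It checks the reply for "done": true, the same signal the API step describes.

```shell
#!/bin/sh
# Sketch: unload one Ollama model through the HTTP API and confirm the reply.
# Assumptions: the server listens on localhost:11434 unless OLLAMA_URL is set
# (a hypothetical variable name), and model names contain no double quotes.

OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# unload_payload MODEL [KEEP_ALIVE]: build the JSON request body.
# Digits-only values (default 0) are sent as a number; anything else,
# such as 0s, is sent as a quoted duration string.
unload_payload() {
  model="$1"
  keep_alive="${2:-0}"
  case "$keep_alive" in
    *[!0-9]*)
      printf '{"model": "%s", "keep_alive": "%s"}' "$model" "$keep_alive" ;;
    *)
      printf '{"model": "%s", "keep_alive": %s}' "$model" "$keep_alive" ;;
  esac
}

# unload_model MODEL [KEEP_ALIVE]: send the request, print the reply, and
# succeed only if the reply contains "done": true.
unload_model() {
  reply=$(curl -s -X POST "$OLLAMA_URL/api/generate" \
    -d "$(unload_payload "$1" "$2")") || return 1
  printf '%s\n' "$reply"
  printf '%s' "$reply" | grep -Eq '"done": *true'
}
```

Usage: run unload_model llama3.2, or unload_model llama3.2 0s to send the string form mentioned under Common failures.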
Verification
ollama ps
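# Expected: only the header row (NAME, ID, SIZE, PROCESSOR, UNTIL), no running models
# If the service was stopped, ollama ps instead reports that it cannot connect to the server
# Then: free -h shows a larger "available" value than before unloading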
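The same checks can be scripted to compare memory before and after an unload. This is a sketch for Linux only, assuming /proc/meminfo is readable; mem_available_kb and count_running_models are hypothetical helper names. The MemAvailable field in /proc/meminfo is the figure free reports in its available column, expressed in kB.

```shell
#!/bin/sh
# Sketch (Linux only): helpers for the verification step.

# mem_available_kb [FILE]: print the MemAvailable value in kB from
# /proc/meminfo, or from another file with the same format.
mem_available_kb() {
  awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

# count_running_models: read `ollama ps` output on stdin and print the
# number of non-empty rows after the header line.
count_running_models() {
  awk 'NR > 1 && NF > 0 {n++} END {print n + 0}'
}

# Example flow, commented out so the helpers can be sourced safely:
# before=$(mem_available_kb)
# curl -s -X POST http://localhost:11434/api/generate \
#   -d '{"model": "llama3.2", "keep_alive": 0}' > /dev/null
# sleep 2   # brief pause in case memory is not released instantly (an assumption)
# after=$(mem_available_kb)
# echo "running models: $(ollama ps | count_running_models)"
# echo "MemAvailable change: $((after - before)) kB"
```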
Common failures
- /bye not recognized: /bye is a command only at the interactive Ollama >>> prompt, not in a regular shell. Inside a session, Ctrl+D also exits, and Ctrl+C interrupts a response that is still generating.
- Service stop affects other users: All active model sessions terminate at once; notify other users before stopping the service.
- keep_alive 0 not honored: Some Ollama versions may require the duration string "keep_alive": "0s" instead of the integer 0.
- Model reloads immediately: An external tool or script is re-querying the API and loading the model again.
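For the case where the model reloads immediately, a polling loop can show whether it comes back after an unload. This is a minimal sketch, assuming the ollama CLI is on the path and that ollama ps prints one header row followed by one row per loaded model with the name (including its tag) in the first column; model_loaded and watch_for_reload are hypothetical helpers, and the default of six checks five seconds apart is arbitrary.

```shell
#!/bin/sh
# Sketch: detect a model that is loaded again after being unloaded,
# which points to another client still calling the API.

# model_loaded NAME: read `ollama ps` output on stdin; succeed if NAME or
# NAME:latest appears in the first column of any row after the header.
model_loaded() {
  awk -v name="$1" 'NR > 1 && ($1 == name || $1 == (name ":latest")) {found = 1}
    END {exit !found}'
}

# watch_for_reload NAME [CHECKS] [INTERVAL_SECONDS]: poll ollama ps and
# report whether NAME reappears.
watch_for_reload() {
  model="$1"; checks="${2:-6}"; interval="${3:-5}"
  i=0
  while [ "$i" -lt "$checks" ]; do
    if ollama ps | model_loaded "$model"; then
      echo "$model is loaded again; look for a client or script still calling the API."
      return 1
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "$model stayed unloaded across $checks checks."
}
```

Usage: run watch_for_reload llama3.2 right after the curl unload request.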
Operator checkpoint
Before treating this as solved, write down the local runtime, the model or package version, the hardware and backend if relevant, and the verification output. That record keeps the guide useful as a Will-It-Run style decision rather than a one-off command transcript.
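The checkpoint can be captured with a short script that appends one timestamped record per run. This is a minimal sketch, assuming a POSIX shell; record_evidence, run_logged, and the default log name unload-evidence.log are hypothetical choices. Commands missing on a given machine (for example nvidia-smi without an NVIDIA GPU, or free on macOS) are logged as failed instead of stopping the script.

```shell
#!/bin/sh
# Sketch: append a timestamped evidence record to a log file.

# run_logged CMD...: print the command line, then its output, or a note
# if the command failed or is not installed.
run_logged() {
  echo "\$ $*"
  "$@" 2>&1 || echo "(command failed or not available)"
  echo
}

# record_evidence [LOGFILE]: collect runtime, model, and memory details.
record_evidence() {
  log="${1:-unload-evidence.log}"
  {
    echo "=== $(date -u '+%Y-%m-%dT%H:%M:%SZ') ==="
    run_logged uname -srm
    run_logged ollama --version
    run_logged ollama ps
    run_logged free -h
    # NVIDIA GPUs only: current VRAM usage per device.
    run_logged nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv
  } >> "$log"
}
```

Usage: run record_evidence before and after unloading to keep both snapshots in the same log.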