HOW-TO · INF

How to stop a running model and free up memory

Beginner · 5 min · By Fredoline Eruo

Target environment

Ubuntu 24.04 · Ollama 0.4.x

Prerequisites

Ollama running with a model currently loaded in memory

What this does

Unloads an active model from RAM or VRAM, reclaiming system memory for other processes. After completing this guide the model will no longer consume resources on the host.

Steps

  1. Exit interactive mode gracefully. Ends the current inference session without disrupting the server.

    >>> /bye
    

    Expected output: Shell prompt returns. The model stays loaded until its keep-alive window expires (5 minutes by default), so ollama ps may still list it until then.

  2. Stop the Ollama service entirely. Halts all loaded models at once and frees all associated memory.

    sudo systemctl stop ollama
    

    Expected output: No output; systemctl status ollama shows "inactive (dead)".

  3. Force immediate model unload via API. Useful when a remote client is holding the model open.

    curl -X POST http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}'
    

    Expected output: A JSON response containing "done": true; ollama ps no longer lists the model.

  • Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
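The API unload in step 3 and the evidence record can be combined in a small helper. The following is a minimal sketch, not part of Ollama: the function names `unload_payload` and `record_run` and the log file name `unload.log` are hypothetical, and the log layout is only one possible choice. The functions do not contact the server themselves; the commented lines at the end show how they might be used against a running instance.

```shell
#!/bin/sh
# Sketch: build the unload request from step 3 and log what was run.
# unload_payload and record_run are illustrative helpers, not Ollama commands.

# Print the JSON body that asks Ollama to unload the given model.
unload_payload() {
  printf '{"model": "%s", "keep_alive": 0}\n' "$1"
}

# Append one record to a log file: date, Ollama version, model, command, output.
# Arguments: log file, model name, command text, observed output.
record_run() {
  {
    printf 'date: %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    printf 'version: %s\n' "$(ollama --version 2>/dev/null || echo unknown)"
    printf 'model: %s\n' "$2"
    printf 'command: %s\n' "$3"
    printf 'output: %s\n\n' "$4"
  } >> "$1"
}

# Example use (requires a running Ollama server; not executed here):
#   body=$(unload_payload llama3.2)
#   out=$(curl -s http://localhost:11434/api/generate -d "$body")
#   record_run unload.log llama3.2 "POST /api/generate $body" "$out"
```

Keeping the payload builder separate from the logging step means the same record format can be reused for the /bye and systemctl routes.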

Verification

ollama ps
# Expected: only the header row, with no models listed
# Then: free -h shows a larger "available" value than before unloading
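To put a number on the memory comparison, the available-memory figure can be read directly from /proc/meminfo before and after unloading. This is a rough sketch under two assumptions: the helper names `mem_available_kib` and `reclaimed_mib` are made up for illustration, and the measurement covers system RAM only, so a model held in VRAM needs the GPU vendor's own tool instead.

```shell
#!/bin/sh
# Sketch: measure how much RAM became available after unloading a model.
# Both helpers are illustrative; they only read and compare numbers.

# Print MemAvailable in KiB from a meminfo-style file (default: /proc/meminfo).
mem_available_kib() {
  awk '/^MemAvailable:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Print the difference between two KiB readings in whole MiB.
# Arguments: reading before unloading, reading after unloading.
reclaimed_mib() {
  echo $(( ($2 - $1) / 1024 ))
}

# Example use:
#   before=$(mem_available_kib)
#   ... unload the model with one of the steps above ...
#   after=$(mem_available_kib)
#   echo "Reclaimed about $(reclaimed_mib "$before" "$after") MiB"
```

Other processes also allocate and release memory during the measurement, so the result is an estimate rather than the exact size of the model.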

Common failures

  • /bye not recognized - /bye only works at the interactive Ollama >>> prompt, not in a regular shell. Inside a session, Ctrl+D also exits.
  • service stop affects other users - All active model sessions terminate simultaneously; notify other users before stopping.
  • keep_alive 0 not honored - Ollama version may require "keep_alive": "0s" as a string instead of integer 0.
  • model reloads immediately - An external tool or script is re-querying the API, and each new request loads the model again.
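Two of the failures above can be checked from the shell. The sketch below assumes that `ollama ps` prints one header row followed by one row per loaded model; the helper names `count_loaded_models` and `unload_payload_string` are hypothetical. Running the count twice, a few seconds apart, gives a quick indication of whether something is reloading the model, and the second helper builds the string form of the keep_alive request.

```shell
#!/bin/sh
# Sketch: helpers for diagnosing "model reloads" and "keep_alive 0 not honored".
# Both function names are illustrative, not Ollama commands.

# Count loaded models from `ollama ps` output on stdin.
# Assumes line 1 is a header row and every later non-empty line is one model.
count_loaded_models() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

# Same unload request as step 3, with keep_alive given as a duration string.
unload_payload_string() {
  printf '{"model": "%s", "keep_alive": "0s"}\n' "$1"
}

# Example use (requires a running Ollama server; not executed here):
#   ollama ps | count_loaded_models
#   curl -s http://localhost:11434/api/generate -d "$(unload_payload_string llama3.2)"
```

If the count returns to a non-zero value shortly after an unload, a client is most likely still sending requests.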

Operator checkpoint

Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
