04. Multiple Models
Ollama can keep multiple models loaded at the same time, as long as enough memory is available. Each loaded model runs in its own process, and you can switch between models or query them in parallel via the API.
Listing Installed Models
ollama list
Output shows the model name, ID, size, and last modified date:
NAME            ID              SIZE     MODIFIED
llama3.2:1b     46536d0c3d4d    1.3GB    2024-11-15 10:23:41
llama3.2:3b     a3fe2398f87b    2.0GB    2024-11-15 11:45:12
codellama:7b    f4e2de43f668    3.8GB    2024-11-14 09:12:33
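The same listing is available programmatically from the REST API's /api/tags endpoint. The following is a minimal sketch, assuming the server listens on the default http://localhost:11434 address; the helper names fetch_tags and summarize_models are hypothetical, and treating the CLI's ID column as the first 12 characters of the digest is an assumption.

```python
import json
import urllib.request

# Assumed default server address, matching the curl examples in this chapter.
OLLAMA_URL = "http://localhost:11434"


def fetch_tags(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/tags and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/tags") as resp:
        return json.load(resp)


def summarize_models(payload: dict) -> list[tuple[str, str, str]]:
    """Turn an /api/tags payload into (name, short_id, size) rows."""
    rows = []
    for model in payload.get("models", []):
        # Digests may carry a "sha256:" prefix; drop it before shortening.
        digest = model.get("digest", "").removeprefix("sha256:")
        size_gb = model.get("size", 0) / 1e9  # the API reports sizes in bytes
        rows.append((model.get("name", "?"), digest[:12], f"{size_gb:.1f}GB"))
    return rows
```

With a server running, summarize_models(fetch_tags()) returns one row per installed model.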
Running Multiple Models Interactively
You can run several ollama run sessions in separate terminals. Each distinct model that gets loaded consumes its own memory. To free up resources for a new model:
# Stop a running model
ollama stop llama3.2:1b
# Check running models
ollama ps
The ollama ps output shows each loaded model, its memory footprint, whether it runs on the CPU or GPU, and when it will be unloaded if it stays idle:
NAME           ID              SIZE     PROCESSOR    UNTIL
llama3.2:3b    a3fe2398f87b    2.0GB    100% GPU     4 minutes from now
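The REST API exposes the same information at /api/ps, which is convenient for scripts. The following is a rough sketch, assuming the default server address; fetch_running, gpu_share, and describe_running are hypothetical helper names, and deriving the GPU percentage from the size_vram and size fields is an assumption about how the CLI computes its PROCESSOR column.

```python
import json
import urllib.request

# Assumed default server address.
OLLAMA_URL = "http://localhost:11434"


def fetch_running(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/ps and return the decoded JSON body (needs a running server)."""
    with urllib.request.urlopen(f"{base_url}/api/ps") as resp:
        return json.load(resp)


def gpu_share(model: dict) -> int:
    """Percentage of the loaded model held in GPU memory (0 to 100)."""
    size = model.get("size", 0)
    if size <= 0:
        return 0
    return round(100 * model.get("size_vram", 0) / size)


def describe_running(payload: dict) -> list[str]:
    """One human-readable line per loaded model."""
    lines = []
    for model in payload.get("models", []):
        name = model.get("name", "?")
        expires = model.get("expires_at", "unknown")
        lines.append(f"{name}: {gpu_share(model)}% GPU, unloads at {expires}")
    return lines
```

Calling describe_running(fetch_running()) before and after ollama stop is one way to confirm which models are still loaded.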
Parallel API Requests
The REST API handles concurrent requests. With two models running, you can send requests to different models in parallel:
curl http://localhost:11434/api/generate -d '{"model":"llama3.2:1b","prompt":"Hello","stream":false}' &
curl http://localhost:11434/api/generate -d '{"model":"codellama:7b","prompt":"def hello():","stream":false}' &
wait
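The backgrounded curl commands above can also be expressed in a script. This sketch uses a thread pool from the Python standard library; generate and generate_all are hypothetical helper names, and the server address is the assumed default. The send parameter exists only so the fan-out logic can be exercised without a server.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed default server address, as in the curl examples.
OLLAMA_URL = "http://localhost:11434"


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL) -> str:
    """POST a non-streaming /api/generate request and return the response text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


def generate_all(jobs, send=generate):
    """Run each (model, prompt) job in its own thread; results keep job order."""
    if not jobs:
        return []
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(send, model, prompt) for model, prompt in jobs]
        return [future.result() for future in futures]
```

For example, generate_all([("llama3.2:1b", "Hello"), ("codellama:7b", "def hello():")]) sends both prompts at once and returns the two replies in the same order as the jobs.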
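When there is not enough memory to load another model, Ollama queues new requests until a loaded model becomes idle and can be unloaded. If performance degrades with multiple models, stop unused models with ollama stop, or tune how many models and parallel requests the server allows through the OLLAMA_MAX_LOADED_MODELS and OLLAMA_NUM_PARALLEL environment variables.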
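A script can also release memory through the API by sending a generate request with no prompt and keep_alive set to 0, which asks the server to unload the model right away. The sketch below only builds the request; unload_request is a hypothetical helper name and the server address is the assumed default.

```python
import json
import urllib.request

# Assumed default server address.
OLLAMA_URL = "http://localhost:11434"


def unload_request(model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a /api/generate request that unloads `model` (keep_alive of 0)."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen, for example urllib.request.urlopen(unload_request("llama3.2:1b")), sends it to a running server.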
Copying and Removing Models
# Create a copy with a new name
ollama cp llama3.2:1b my-custom-llama
# Remove a model
ollama rm codellama:7b
Removing a model frees its disk space immediately, unless another model (such as a copy made with ollama cp) still shares the same layers. There is no trash folder; deletion is permanent.
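Copying and removing are also available over the REST API, through /api/copy and /api/delete. The following sketch only constructs the requests, so nothing is deleted until one is actually sent; copy_request and delete_request are hypothetical helper names, the server address is the assumed default, and the "model" field in the delete body is an assumption based on current API documentation (older releases may expect "name").

```python
import json
import urllib.request

# Assumed default server address.
OLLAMA_URL = "http://localhost:11434"


def _json_request(url: str, payload: dict, method: str) -> urllib.request.Request:
    """Build a JSON request with the given HTTP method."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def copy_request(source: str, destination: str, base_url: str = OLLAMA_URL):
    """Equivalent of `ollama cp source destination`."""
    payload = {"source": source, "destination": destination}
    return _json_request(f"{base_url}/api/copy", payload, "POST")


def delete_request(model: str, base_url: str = OLLAMA_URL):
    """Equivalent of `ollama rm model`; the deletion is permanent once sent."""
    return _json_request(f"{base_url}/api/delete", {"model": model}, "DELETE")
```

To perform the operation, pass the request to urllib.request.urlopen, for example urllib.request.urlopen(copy_request("llama3.2:1b", "my-custom-llama")).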
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
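With one model already loaded, pull a second small chat model (like llama3.2:1b), run it briefly via ollama run, then use ollama ps to confirm that both models are listed. Stop one with ollama stop and verify the other remains active. Depending on your Ollama version, an embedding-only model such as nomic-embed-text may not work with ollama run, so a chat model is the safer choice for this exercise.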