How to configure Ollama for concurrent multi-user access
Prerequisites: Ollama installed; multiple users or applications that need to share one instance.
What this does
By default Ollama listens only on localhost (127.0.0.1:11434). This guide configures it as a network service with concurrent request handling, a bounded request queue, and per-user authentication for team use.
Steps
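1. Bind Ollama to all network interfaces.

    # Linux/macOS
    OLLAMA_HOST=0.0.0.0:11434 ollama serve

   On Windows (PowerShell):

    $env:OLLAMA_HOST="0.0.0.0:11434"; ollama serve

2. Persist the binding as a system service.

    sudo systemctl edit ollama

   Add:

    [Service]
    Environment="OLLAMA_HOST=0.0.0.0:11434"

   Then:

    sudo systemctl daemon-reload && sudo systemctl restart ollama

3. Set concurrency limits to prevent resource exhaustion.

    sudo systemctl edit ollama

   This opens the same override file, so add the lines below the existing [Service] header:

    Environment="OLLAMA_NUM_PARALLEL=4"
    Environment="OLLAMA_MAX_QUEUE=16"

   OLLAMA_NUM_PARALLEL sets how many requests each loaded model processes at the same time. OLLAMA_MAX_QUEUE caps how many requests may wait; once the queue is full, the server rejects new requests with HTTP 503. The default queue length is 512, so a value of 16 makes the server shed excess load much sooner. Reload and restart the service as in step 2.

4. Add authentication via a reverse proxy. Install nginx and create a .htpasswd file with one entry per user:

    sudo apt install nginx apache2-utils
    sudo htpasswd -c /etc/nginx/.htpasswd user1
    sudo htpasswd /etc/nginx/.htpasswd user2

   Configure nginx. A listener marked ssl needs a certificate and key, so point the two ssl_ directives at your own files:

    server {
        listen 11435 ssl;
        ssl_certificate     /etc/nginx/ssl/ollama.crt;  # placeholder path
        ssl_certificate_key /etc/nginx/ssl/ollama.key;  # placeholder path
        location / {
            auth_basic "Ollama";
            auth_basic_user_file /etc/nginx/.htpasswd;
            proxy_pass http://localhost:11434;
        }
    }

5. Test concurrent access from multiple clients. The proxy port speaks TLS, so use https:

    # Client 1
    curl -u user1:pass https://server:11435/api/generate -d '{"model":"llama3","prompt":"Hello"}'
    # Client 2
    curl -u user2:pass https://server:11435/api/generate -d '{"model":"llama3","prompt":"Hi"}'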
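For scripted provisioning, where the interactive editor opened by systemctl edit is inconvenient, the same service settings can be written as a drop-in file. The sketch below only prints the fragment. The function name render_ollama_dropin is a hypothetical helper, not part of Ollama, and its defaults are the host, parallelism, and queue values used in the steps above.

```shell
# Sketch: print a systemd drop-in fragment for the ollama service.
# render_ollama_dropin is a hypothetical helper name (an assumption).
# Arguments (all optional): HOST:PORT, parallel requests per model, queue length.
render_ollama_dropin() {
  printf '[Service]\n'
  printf 'Environment="OLLAMA_HOST=%s"\n' "${1:-0.0.0.0:11434}"
  printf 'Environment="OLLAMA_NUM_PARALLEL=%s"\n' "${2:-4}"
  printf 'Environment="OLLAMA_MAX_QUEUE=%s"\n' "${3:-16}"
}
```

To apply it, something like sudo mkdir -p /etc/systemd/system/ollama.service.d followed by render_ollama_dropin | sudo tee /etc/systemd/system/ollama.service.d/override.conf should work, and then the daemon-reload and restart commands shown earlier. That path is where systemctl edit ollama stores its override by default.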
Verification
# Send parallel requests
curl -s http://server:11434/api/generate -d '{"model":"llama3","prompt":"test"}' &
curl -s http://server:11434/api/generate -d '{"model":"llama3","prompt":"test"}' &
wait
# Expected: Both requests complete without "server busy" errors
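Two requests rarely stress the limits, and eyeballing the output does not show which requests were rejected. A minimal sketch that sends N requests at once and counts the HTTP status codes is given below. The function names fire_parallel and tally_status_codes are hypothetical, and the model name llama3 and host server are carried over from the commands above.

```shell
# tally_status_codes: read one HTTP status code per line on stdin,
# print one "<code> <count>" line per distinct code.
tally_status_codes() {
  sort | uniq -c | awk '{ print $2, $1 }'
}

# fire_parallel BASE_URL N: send N concurrent /api/generate requests
# and print only the HTTP status code of each response.
fire_parallel() {
  url="$1"
  n="$2"
  i=0
  while [ "$i" -lt "$n" ]; do
    # "stream": false asks for a single JSON response instead of a stream.
    curl -s -o /dev/null -w '%{http_code}\n' "$url/api/generate" \
      -d '{"model":"llama3","prompt":"test","stream":false}' &
    i=$((i + 1))
  done
  wait  # block until every background curl has finished
}
```

Running fire_parallel http://server:11434 8 | tally_status_codes prints one line per status code. A line starting with 200 counts the successful requests; a line starting with 503 means some requests were rejected because the queue was full.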
Common failures
- Firewall blocking: Ensure port 11434 is open: sudo ufw allow 11434.
- TLS required: For external access, configure SSL via nginx/Caddy. Never expose unauthenticated Ollama to the internet.
- Queue full errors: Increase OLLAMA_MAX_QUEUE or scale horizontally with a load balancer and multiple Ollama instances.
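On the client side, queue-full rejections can also be absorbed by retrying with a growing delay. The sketch below assumes the rejection arrives as HTTP 503 and that the wrapped command prints only a status code. The function name retry_on_busy and the variable RETRY_BASE_DELAY are hypothetical names chosen for illustration.

```shell
# retry_on_busy MAX CMD...: run CMD (which must print an HTTP status code)
# until it prints something other than 503 or MAX attempts are used up.
# Prints the last status code; returns 0 on non-503, 1 if still busy.
retry_on_busy() {
  max="$1"
  shift
  attempt=1
  delay="${RETRY_BASE_DELAY:-1}"  # seconds before the first retry (assumption)
  while :; do
    code="$("$@")"
    if [ "$code" != "503" ]; then
      echo "$code"
      return 0
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "$code"
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))  # double the wait after each busy response
    attempt=$((attempt + 1))
  done
}
```

A call might wrap curl with -s -o /dev/null -w '%{http_code}' so that only the status code is printed, for example retry_on_busy 5 curl -s -o /dev/null -w '%{http_code}' http://server:11434/api/generate -d '{"model":"llama3","prompt":"test"}'. Retrying only helps with short bursts; sustained 503 responses still call for a longer queue or more instances.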
Operator checkpoint
Before treating this as solved, write down the local runtime, model or package version, hardware/backend if relevant, and the verification output. This keeps the guide useful as a Will-It-Run style decision instead of a one-off command transcript.
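One way to capture that record is a small script that writes the basics to a file. This is a sketch only: record_checkpoint and the default file name are hypothetical, the GPU line assumes an NVIDIA backend with nvidia-smi installed, and the verification output still has to be appended by hand.

```shell
# record_checkpoint [FILE]: write date, kernel, Ollama version and,
# when available, GPU details to FILE (default: ollama-checkpoint.txt).
record_checkpoint() {
  out="${1:-ollama-checkpoint.txt}"
  {
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "kernel: $(uname -srm)"
    if command -v ollama >/dev/null 2>&1; then
      echo "ollama: $(ollama --version 2>&1)"
    else
      echo "ollama: not found on PATH"
    fi
    # Hardware/backend: only reported when nvidia-smi exists (assumption).
    if command -v nvidia-smi >/dev/null 2>&1; then
      echo "gpu: $(nvidia-smi --query-gpu=name,driver_version --format=csv,noheader 2>&1)"
    fi
  } > "$out"
}
```

Calling record_checkpoint run1.txt and then appending the output of the verification commands to run1.txt keeps the environment and the result in one place.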