Systemd Service for AI — Local AI on Linux (Chapter 7)

Running Ollama or llama.cpp as a background process with nohup is fragile: nothing restarts it after a crash, and it does not come back after a reboot. Systemd manages the process lifecycle, restarts it on failure, collects its logs in the journal, and can enforce resource limits.

Create a systemd service for Ollama:

sudo nano /etc/systemd/system/ollama.service

[Unit]
Description=Ollama Service
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=ollama
Group=ollama
ExecStart=/usr/local/bin/ollama serve
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_GPU_OVERHEAD=0"
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

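Create the service account before enabling the unit. By default Ollama stores its key and downloaded models under the account's home directory (~/.ollama), so the account needs a writable home. The --user-group flag also creates the ollama group that Group= refers to:

sudo useradd --system --user-group --create-home --home-dir /usr/share/ollama --shell /usr/sbin/nologin ollama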
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
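If you automate these steps, the account creation can be made idempotent so rerunning the script does not fail on an existing user. This is a minimal sketch, assuming bash; ensure_service_user is a hypothetical helper, and its default home directory (/usr/share/<name>) is an assumption, since any writable path works.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a system account for a service only if it is missing.
# The default home directory (/usr/share/<name>) is an assumption; any writable path works.
ensure_service_user() {
  local name="$1" home="${2:-/usr/share/$1}"
  if id -u "$name" >/dev/null 2>&1; then
    echo "user $name already exists"
    return 0
  fi
  # --user-group also creates the matching group that Group= in the unit expects.
  sudo useradd --system --user-group --create-home --home-dir "$home" \
    --shell /usr/sbin/nologin "$name" || return 1
  echo "created user $name"
}
```

For example: ensure_service_user ollama && sudo systemctl daemon-reload && sudo systemctl enable --now ollama. Here enable --now combines the enable and start steps into one command.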

Verify:

sudo systemctl status ollama
# ● ollama.service - Ollama Service
#    Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
#    Active: active (running) since Mon 2026-05-25 10:00:00 UTC; 5s ago
journalctl -u ollama -f  # follow logs
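systemctl status can report active (running) a moment before the HTTP API accepts connections, so a script that calls the API right after starting the service may race it. A minimal readiness poll, assuming curl is installed and that the server answers GET /api/version (an Ollama API endpoint) at the given base URL; wait_for_ollama is a hypothetical name. With OLLAMA_HOST=0.0.0.0 the server also listens on 127.0.0.1, so the default URL below works locally.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the Ollama HTTP API until it responds or a timeout expires.
# Assumes curl is installed and the server listens on the given base URL.
wait_for_ollama() {
  local base_url="${1:-http://127.0.0.1:11434}" timeout_s="${2:-30}"
  local deadline=$(( SECONDS + timeout_s ))
  while (( SECONDS < deadline )); do
    # -f: treat HTTP errors as failure; -s: silent; --max-time bounds each attempt.
    if curl -fs --max-time 2 "$base_url/api/version" >/dev/null; then
      echo "ready: $base_url"
      return 0
    fi
    sleep 1
  done
  echo "timed out after ${timeout_s}s waiting for $base_url" >&2
  return 1
}
```

A script can then run sudo systemctl restart ollama && wait_for_ollama before issuing its first request.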

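Failure mode: the service fails immediately and systemctl status reports status=217/USER. The User=ollama directive references an account that does not exist. Create it with the useradd command above or, for debugging only, temporarily run the service as root. Exit status 203/EXEC is a different problem: systemd could not execute the ExecStart= binary, so check that /usr/local/bin/ollama exists and is executable.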
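To read the recorded exit status without scrolling through the journal, you can query the unit's ExecMainStatus property. This sketch assumes a systemd version that supports systemctl show --value; explain_exit_status is a hypothetical helper. It covers only a few of the codes listed under "Process Exit Codes" in systemd.exec(5).

```shell
#!/usr/bin/env bash
# Hypothetical helper: translate a few systemd-specific exit statuses into hints.
# Codes follow systemd.exec(5) "Process Exit Codes"; only a handful are listed here.
explain_exit_status() {
  case "$1" in
    0)   echo "0: exited successfully" ;;
    200) echo "200/CHDIR: WorkingDirectory= could not be entered" ;;
    203) echo "203/EXEC: ExecStart= binary missing or not executable" ;;
    216) echo "216/GROUP: Group= could not be applied (often a missing group)" ;;
    217) echo "217/USER: User= could not be applied (often a missing account)" ;;
    *)   echo "$1: not in this table; check systemd.exec(5) and the journal" ;;
  esac
}

# Usage on a real system (requires systemd):
# explain_exit_status "$(systemctl show -p ExecMainStatus --value ollama)"
```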

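Failure mode: the journal shows the process was killed for running out of memory (systemd typically reports Failed with result 'oom-kill', and journalctl -k shows the kernel's Out of memory message). Because MemoryMax is unlimited unless set, this is not a per-service limit being hit. The system as a whole ran out of RAM, usually after loading a model that does not fit. If the system has limited RAM and you want to protect other services, add MemoryMax=32G (adjusted to your hardware) under [Service]. When Ollama then exceeds the cap, the OOM killer acts within the Ollama service instead of on other processes.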
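Instead of editing the main unit file, you can put a limit like this in a drop-in, which is what sudo systemctl edit ollama creates interactively. Below is a scripted sketch that assumes the standard drop-in directory /etc/systemd/system/ollama.service.d/. The function name write_memory_limit and the file name memory.conf are hypothetical. The 32G value is the example from the text, not a recommendation. MemoryMax= relies on the unified cgroup hierarchy (cgroup v2), which current major distributions use by default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in that caps the service's memory.
# Files named <unit>.service.d/*.conf are merged over the main unit file.
write_memory_limit() {
  local limit="$1" dropin_dir="${2:-/etc/systemd/system/ollama.service.d}"
  mkdir -p "$dropin_dir" || return 1
  printf '[Service]\nMemoryMax=%s\n' "$limit" > "$dropin_dir/memory.conf"
}

# On a real system (as root), reload and restart so the limit takes effect:
# write_memory_limit 32G && systemctl daemon-reload && systemctl restart ollama
```

Keeping the limit in a drop-in leaves the main unit file untouched, so you can remove the limit later by deleting one file. An alternative is sudo systemctl set-property ollama MemoryMax=32G, which applies the limit and persists it for you.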

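Failure mode: the service restarts repeatedly. Restart=always relaunches the process after every exit, but it cannot fix a persistent cause such as the port already being in use or CUDA running out of memory, so the loop continues. Read the last log lines with journalctl -u ollama -n 50 to find the actual error before changing anything. A longer RestartSec (or an ExecStartPre=/bin/sleep 2 delay) only spaces out the attempts. It does not resolve the cause.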
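A restart loop is easier to break once you know which error keeps repeating. The sketch below scans recent log text for the two causes named above. Its grep patterns are assumptions based on the usual Go and CUDA error wording (bind: address already in use, out of memory) and may not match every Ollama version. classify_failure is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read log text on stdin and report a likely cause.
# The patterns are assumptions; adjust them to the messages your logs actually show.
classify_failure() {
  local logs
  logs="$(cat)"
  if grep -qi 'address already in use' <<<"$logs"; then
    echo "port-in-use: another process holds the port; check with: sudo ss -ltnp"
  elif grep -qi 'out of memory' <<<"$logs"; then
    echo "out-of-memory: try a smaller model or a lower OLLAMA_NUM_PARALLEL"
  else
    echo "unknown: read the full output of journalctl -u ollama -n 50"
  fi
}

# Usage on a real system:
# journalctl -u ollama -n 50 --no-pager | classify_failure
```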

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
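The sketch below appends those details to a notes file. It makes several assumptions: ollama --version prints the installed version, nvidia-smi is present only on NVIDIA systems, and the model data path is OLLAMA_MODELS or, if that is unset, /usr/share/ollama/.ollama/models (the usual location for a system service whose home is /usr/share/ollama). Adjust the path if yours differs. record_run_env and run-notes.txt are hypothetical names. The helper does not capture the observed output, so add that by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append environment details for a reproducible run record.
record_run_env() {
  local out="${1:-run-notes.txt}"
  {
    echo "date: $(date -u +%Y-%m-%dT%H:%M:%SZ)"
    echo "kernel: $(uname -sr)"
    if command -v ollama >/dev/null 2>&1; then
      # Join all lines, since the version text may share output with warnings.
      echo "ollama: $(ollama --version 2>&1 | tr '\n' ' ')"
    else
      echo "ollama: not installed"
    fi
    # Assumed default model path for a system service; OLLAMA_MODELS overrides it.
    echo "data path: ${OLLAMA_MODELS:-/usr/share/ollama/.ollama/models}"
    if command -v nvidia-smi >/dev/null 2>&1; then
      echo "gpu: $(nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | head -n1)"
    else
      echo "gpu: nvidia-smi not found"
    fi
    echo "memory: $(free -h 2>/dev/null | awk '/^Mem:/ {print $2 " total, " $7 " available"}')"
  } >> "$out"
}
```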