07. Systemd Service for AI
Running Ollama or llama.cpp as a background process with nohup is fragile. Systemd manages the process lifecycle, restarts it on failure, collects logs, and enforces resource limits.
Create a systemd service for Ollama:
sudo nano /etc/systemd/system/ollama.service
[Unit]
Description=Ollama Service
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=ollama
Group=ollama
ExecStart=/usr/local/bin/ollama serve
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_NUM_PARALLEL=4"
Environment="OLLAMA_GPU_OVERHEAD=0"
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
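Create the user before enabling. Ollama keeps its models and keys under the service user's home directory, so the account needs a home it can write to (the path below is the one the official install script uses):
sudo useradd --system --user-group --create-home --home-dir /usr/share/ollama --shell /usr/sbin/nologin ollama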
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
Verify:
sudo systemctl status ollama
# ● ollama.service - Ollama Service
# Loaded: loaded (/etc/systemd/system/ollama.service; enabled; vendor preset: enabled)
# Active: active (running) since Mon 2026-05-25 10:00:00 UTC; 5s ago
journalctl -u ollama -f # follow logs
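systemctl status only confirms that the process is alive. The sketch below also checks that the HTTP API responds. It assumes curl is installed and that the server is reachable on port 11434 (taken from OLLAMA_HOST above); wait_for_ollama and OLLAMA_URL are illustrative names, not part of Ollama.

```shell
# Sketch: poll the Ollama HTTP API until it answers, or give up.
# OLLAMA_URL and wait_for_ollama are illustrative names (assumptions).
OLLAMA_URL="${OLLAMA_URL:-http://127.0.0.1:11434}"

wait_for_ollama() {
    local attempts="${1:-10}"   # how many times to poll
    local delay="${2:-1}"       # seconds to wait between polls
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -f: treat HTTP errors as failure, -s: no progress output
        if curl -fs --max-time 2 "$OLLAMA_URL/api/version" >/dev/null 2>&1; then
            echo "ollama is answering at $OLLAMA_URL"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "ollama did not answer after $attempts attempts" >&2
    return 1
}
```

Calling wait_for_ollama 15 1 right after systemctl start ollama gives the server up to about 15 seconds to come up before the check reports a failure.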
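Failure mode: Service fails to start and systemctl status shows status=217/USER. The User=ollama directive references a user that does not exist. Create it first or temporarily use User=root for debugging. If the status is 203/EXEC instead, systemd could not execute the ExecStart binary: check that /usr/local/bin/ollama exists and is executable.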
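The number after status= identifies which setup step failed before Ollama itself ever ran. A minimal lookup sketch follows; explain_exit_status is an illustrative helper, and the three code/name pairs are the ones listed in systemd.exec(5).

```shell
# Sketch: translate the status= number systemd reports for a failed start.
# explain_exit_status is an illustrative helper, not a systemd tool.
explain_exit_status() {
    case "$1" in
        203) echo "203/EXEC: the ExecStart= binary could not be executed" ;;
        216) echo "216/GROUP: the Group= value could not be resolved" ;;
        217) echo "217/USER: the User= value could not be resolved" ;;
        *)   echo "$1: not in this table; see systemd.exec(5)" ;;
    esac
}

# On a live host, read the raw number first, e.g.:
#   systemctl show ollama -p ExecMainStatus
```

Only the codes relevant to this unit file are included; anything else falls through to the man page.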
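Failure mode: journalctl -u ollama shows the process was killed by the OOM killer. systemd's MemoryMax defaults to unlimited, so with no limit set this means the whole machine ran out of memory and the kernel picked Ollama as the process to kill. If the system has limited RAM and you want to protect other services, add MemoryMax=32G under [Service]; Ollama is then killed as soon as it crosses that limit, before it can starve everything else.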
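One way to set the limit without touching the main unit file is a drop-in. The sketch below writes one; write_memory_dropin and the file name memory.conf are illustrative choices, and the 32G default simply mirrors the value suggested above.

```shell
# Sketch: write a drop-in that caps the service's memory.
# write_memory_dropin and memory.conf are illustrative names (assumptions).
write_memory_dropin() {
    local dropin_dir="${1:-/etc/systemd/system/ollama.service.d}"
    local limit="${2:-32G}"
    mkdir -p "$dropin_dir" || return 1
    # Settings in a drop-in are merged into the unit's [Service] section.
    cat > "$dropin_dir/memory.conf" <<EOF
[Service]
MemoryMax=$limit
EOF
}

# On a real host (as root), reload and restart so the limit applies:
#   write_memory_dropin
#   systemctl daemon-reload && systemctl restart ollama
#   systemctl show ollama -p MemoryMax
```

Keeping the limit in a drop-in means the main unit can be replaced or regenerated without losing the cap.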
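Failure mode: Service restarts in a loop under Restart=always. The underlying cause (e.g., port already in use, CUDA out of memory) persists across restarts, so every attempt fails the same way. Read the most recent entries with journalctl -u ollama -n 50 before changing anything. If you need more time between attempts, raise RestartSec (already 10 seconds in this unit); adding ExecStartPre=/bin/sleep 2 also works but only adds 2 seconds on top of that.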
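The facts that usually explain a restart loop can be collected in one step. This is a rough sketch: restart_report is an illustrative helper, it assumes a systemd version that exposes the NRestarts property, and it assumes the ss tool from iproute2 is installed.

```shell
# Sketch: summarize restart count, last result, and port state.
# restart_report is an illustrative helper; 11434 comes from OLLAMA_HOST.
restart_report() {
    local unit="${1:-ollama}"
    local port="${2:-11434}"
    echo "restarts: $(systemctl show "$unit" -p NRestarts --value)"
    echo "last result: $(systemctl show "$unit" -p Result --value)"
    # If the unit is down but the port is taken, another process holds it.
    if ss -ltn | grep -q ":$port "; then
        echo "port $port: in use"
    else
        echo "port $port: free"
    fi
}
```

While the service is running normally, the port shows as in use by Ollama itself, so the port line is only informative between restart attempts or after systemctl stop ollama.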
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Create a systemd service for Ollama, start it, verify it is running with systemctl status, view logs with journalctl -u ollama, then trigger a simulated crash with kill -9 and observe the automatic restart.
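The crash step of this exercise can be sketched as a small function that compares the main PID before and after the kill. crash_and_check is an illustrative helper; the 12-second default wait assumes RestartSec=10 from the unit file in this chapter, and the function expects sudo rights for the kill.

```shell
# Sketch: kill the service's main process and confirm systemd restarts it.
# crash_and_check is an illustrative helper (assumption, not a systemd tool).
crash_and_check() {
    local unit="${1:-ollama}"
    local wait_secs="${2:-12}"      # slightly longer than RestartSec=10
    local old_pid new_pid
    old_pid="$(systemctl show "$unit" -p MainPID --value)"
    if [ -z "$old_pid" ] || [ "$old_pid" = "0" ]; then
        echo "$unit is not running" >&2
        return 1
    fi
    sudo kill -9 "$old_pid"         # simulate a hard crash
    sleep "$wait_secs"              # give systemd time to restart the unit
    new_pid="$(systemctl show "$unit" -p MainPID --value)"
    if [ -n "$new_pid" ] && [ "$new_pid" != "0" ] && [ "$new_pid" != "$old_pid" ]; then
        echo "restarted: pid $old_pid -> $new_pid"
    else
        echo "not restarted (pid ${new_pid:-unknown})" >&2
        return 1
    fi
}
```

Run crash_and_check with no arguments, then look at journalctl -u ollama -n 20 to see the kill and the restart recorded in the log.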