Headless Server Setup — Local AI on Linux (Chapter 8)

A headless AI server runs without a monitor, keyboard, or display server. SSH access, GPU persistence mode, and thermal management are the critical configuration areas.

SSH configuration for remote access:

sudo nano /etc/ssh/sshd_config

Port 22
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
ClientAliveInterval 60
ClientAliveCountMax 3

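Enforce key-based authentication. Copy your public key from the client while password login still works, then restart the SSH daemon on the server so the settings above take effect:

# On the client, before PasswordAuthentication no is applied
ssh-copy-id user@your-server-ip
# On the server (the unit is named ssh on Debian and Ubuntu)
sudo systemctl restart sshd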
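
Locking yourself out is the main risk when disabling password login, so it can help to confirm the file contains what you expect before restarting. The following is a minimal sketch, not an official OpenSSH tool: check_sshd_option is a hypothetical helper name, and it assumes whitespace-separated "Keyword value" lines outside any Match block and ignores Include files.

```shell
# Hypothetical helper: report whether KEY is set to EXPECTED in an
# sshd_config-style file. Commented-out lines are skipped, keywords are
# matched case-insensitively, and only the first occurrence counts
# (sshd uses the first value it obtains for each keyword).
check_sshd_option() {
    file=$1
    key=$2
    expected=$3
    actual=$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$file")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $key $actual"
    else
        echo "MISMATCH: $key is '${actual:-unset}', expected '$expected'"
        return 1
    fi
}

# Usage (on the server, before restarting the daemon):
#   check_sshd_option /etc/ssh/sshd_config PasswordAuthentication no
#   check_sshd_option /etc/ssh/sshd_config PermitRootLogin no
#   sudo sshd -t    # sshd's own syntax check of the configuration
```

Keep an existing SSH session open while testing a new login in a second terminal, so a mistake can still be reverted.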

GPU persistence mode keeps the NVIDIA driver loaded and the GPU initialized even when no process is using it, so the first inference after an idle period does not pay a driver re-initialization delay:

sudo nvidia-smi -pm ENABLED
# Enabled persistence mode for GPU 0
# Enabled persistence mode for GPU 1

Persistence mode resets at reboot. Reapply it at boot from /etc/rc.local or from a systemd oneshot service:

sudo nano /etc/systemd/system/gpu-persistence.service

[Unit]
Description=GPU Persistence Mode

[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm ENABLED
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
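
The unit file has no effect until systemd reloads its configuration and the unit is enabled. A minimal sketch of that step, assuming the file was saved as gpu-persistence.service as shown above; the run and enable_gpu_persistence_unit helper names and the DRY_RUN variable are illustrative and not part of systemd:

```shell
# Run a command, or only print it when DRY_RUN=1 (illustrative helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Load, enable, and verify the oneshot unit (hypothetical helper name).
enable_gpu_persistence_unit() {
    unit=${1:-gpu-persistence.service}
    # Make systemd pick up the new unit file.
    run sudo systemctl daemon-reload || return 1
    # Start the unit now and on every boot.
    run sudo systemctl enable --now "$unit" || return 1
    # A oneshot unit with RemainAfterExit=yes reports "active" after success.
    run systemctl is-active "$unit"
}

# Usage:
#   DRY_RUN=1 enable_gpu_persistence_unit   # preview the commands
#   enable_gpu_persistence_unit             # apply them
```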

Thermal management: AI workloads generate sustained heat. Cap the power limit and monitor temperature and power draw:

# Cap GPU 0 at 300 W (the RTX 3090 default is 350 W). Requires root; the limit resets at reboot.
sudo nvidia-smi -i 0 -pl 300
# Check current temperature and power draw
nvidia-smi -q -d TEMPERATURE,POWER
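
For unattended monitoring, the machine-readable query output of nvidia-smi is easier to script against than the -q report. The sketch below assumes input lines in the form produced by --query-gpu=index,temperature.gpu,power.draw,power.limit with --format=csv,noheader,nounits. The flag_hot_gpus name and the 83 C default threshold are illustrative assumptions, not vendor limits.

```shell
# Read "index, temperature, power draw, power limit" CSV lines on stdin,
# print one summary line per GPU, and mark GPUs at or above the threshold
# (degrees C, first argument, default 83). Returns non-zero if any GPU
# is at or above the threshold.
flag_hot_gpus() {
    limit=${1:-83}
    awk -F', *' -v limit="$limit" '
        { printf "GPU %s: %s C, %s W of %s W", $1, $2, $3, $4 }
        $2 + 0 >= limit { printf "  <-- at or above %s C", limit; hot = 1 }
        { print "" }
        END { exit hot }
    '
}

# Usage:
#   nvidia-smi --query-gpu=index,temperature.gpu,power.draw,power.limit \
#              --format=csv,noheader,nounits | flag_hot_gpus 80
```

The non-zero exit status makes the function usable from cron or a systemd timer to trigger an alert.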

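Failure mode: SSH works but nvidia-smi reports No devices were found when logged in over SSH. nvidia-smi talks to the kernel driver through NVML and does not need an X server, so an unset DISPLAY is unlikely to be the cause by itself; also check that the NVIDIA kernel module is loaded and that the /dev/nvidia* device nodes exist and are accessible to your user. If the error only appears through a wrapper or tool that does consult DISPLAY, run DISPLAY= nvidia-smi (empty value before the command) or invoke it from a script that unsets DISPLAY.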
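
When diagnosing this over SSH, a few quick checks can rule out basic driver and permission problems. This is a rough sketch under assumptions: check_nvidia_access is a hypothetical helper, it assumes the standard /proc/driver/nvidia/version file and the /dev/nvidiactl and /dev/nvidia0 device nodes, and passing these checks does not guarantee that nvidia-smi will find the GPU.

```shell
# Check that the driver's version file is readable (module loaded) and that
# the control node and first GPU node are readable and writable.
# Arguments (both optional, mainly for testing): device directory, version file.
check_nvidia_access() {
    dev_dir=${1:-/dev}
    version_file=${2:-/proc/driver/nvidia/version}
    status=0
    if [ -r "$version_file" ]; then
        echo "ok: kernel module loaded"
    else
        echo "problem: $version_file missing, kernel module not loaded"
        status=1
    fi
    for node in "$dev_dir/nvidiactl" "$dev_dir/nvidia0"; do
        if [ -r "$node" ] && [ -w "$node" ]; then
            echo "ok: $node accessible"
        else
            echo "problem: $node missing or not accessible to this user"
            status=1
        fi
    done
    return "$status"
}

# Usage (in the same SSH session where nvidia-smi fails):
#   check_nvidia_access
```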

Failure mode: on multi-GPU systems, persistence mode can end up enabled on one GPU and not on another without an obvious error. Verify each GPU individually: run nvidia-smi -q -i 0 | grep 'Persistence Mode', then repeat with -i 1 and so on.
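
The per-GPU check can also be scripted in one pass. The sketch below assumes input lines in the form produced by --query-gpu=index,persistence_mode with --format=csv,noheader, where the mode is printed as Enabled or Disabled; list_nonpersistent_gpus is an illustrative name.

```shell
# Read "index, persistence_mode" CSV lines on stdin and list every GPU whose
# mode is not "Enabled". Prints nothing and returns 0 when all are enabled;
# returns non-zero otherwise.
list_nonpersistent_gpus() {
    awk -F', *' '
        $2 != "Enabled" { print "GPU " $1 ": persistence mode is " $2; bad = 1 }
        END { exit bad }
    '
}

# Usage:
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader \
#       | list_nonpersistent_gpus
```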

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.