08. Headless Server Setup
A headless AI server runs without a monitor, keyboard, or display server. SSH access, GPU persistence mode, and thermal management are the critical configuration areas.
SSH configuration for remote access:
sudo nano /etc/ssh/sshd_config
Port 22
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
MaxAuthTries 3
ClientAliveInterval 60
ClientAliveCountMax 3
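Enforce key-based authentication. Copy your public key from the client first, while password login still works; restarting sshd with PasswordAuthentication no and no installed key locks you out:
ssh-copy-id user@your-server-ip
# Then, on the server (the unit is named ssh on Debian/Ubuntu):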
sudo systemctl restart sshd
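Before restarting the daemon it can help to confirm that the settings are actually in effect, since a later line or an included file may override them. The sketch below assumes the effective configuration is printed by sshd -T (lowercase keyword, then value); check_sshd_hardening is a hypothetical helper name, not part of OpenSSH.

```shell
#!/bin/sh
# Hypothetical helper (not part of OpenSSH): reads `sshd -T` style output on
# stdin and checks the three hardening settings used in this chapter.
check_sshd_hardening() {
    awk '
        BEGIN {
            want["passwordauthentication"] = "no"
            want["permitrootlogin"] = "no"
            want["pubkeyauthentication"] = "yes"
        }
        {
            key = tolower($1)
            if (key in want) {
                seen[key] = 1
                if (tolower($2) != want[key]) {
                    printf "FAIL %s=%s\n", key, $2
                    bad = 1
                }
            }
        }
        END {
            # Report any expected setting that never appeared in the input.
            for (k in want) {
                if (!(k in seen)) {
                    printf "MISSING %s\n", k
                    bad = 1
                }
            }
            if (!bad) print "OK"
            exit (bad ? 1 : 0)
        }
    '
}

# Typical use on the server, before restarting the daemon:
#   sudo sshd -t && sudo sshd -T | check_sshd_hardening
```

sshd -t only checks syntax; piping sshd -T through the helper checks the values. Keep the current SSH session open until a second key-based login succeeds.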
GPU persistence mode keeps the NVIDIA driver loaded and the GPU initialized even when no process is using it, so the first inference after an idle period does not pay a driver-initialization delay:
sudo nvidia-smi -pm ENABLED
# Enabled persistence mode for GPU 0
# Enabled persistence mode for GPU 1
Add this to /etc/rc.local or a systemd oneshot service so it survives reboots:
sudo nano /etc/systemd/system/gpu-persistence.service
[Unit]
Description=GPU Persistence Mode
[Service]
Type=oneshot
ExecStart=/usr/bin/nvidia-smi -pm ENABLED
RemainAfterExit=yes
[Install]
WantedBy=multi-user.target
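Writing the unit file does not activate it; systemd has to reload its configuration and the unit has to be enabled. A minimal sketch of those steps follows, assuming the file was saved as gpu-persistence.service as above. The function name and the RUN dry-run variable are introduced here for illustration only.

```shell
#!/bin/sh
# Registers and starts the unit defined above. RUN is an assumed dry-run hook
# introduced for this sketch: RUN=echo prints each command instead of running it.
enable_gpu_persistence_unit() {
    unit=gpu-persistence.service
    ${RUN:-} sudo systemctl daemon-reload &&          # pick up the new unit file
    ${RUN:-} sudo systemctl enable --now "$unit" &&   # start now and at every boot
    ${RUN:-} systemctl is-enabled "$unit"             # prints "enabled" on success
}

# Preview the commands without touching the system:
#   RUN=echo enable_gpu_persistence_unit
```

Running the function without RUN set executes the three commands for real and stops at the first one that fails.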
Thermal management: AI workloads generate sustained heat. Cap the power limit and monitor temperature and power draw:
# Cap GPU 0 at 300 W (the RTX 3090 default is 350 W). Requires root; the limit resets at reboot.
sudo nvidia-smi -i 0 -pl 300
# Check current temperature and power draw
nvidia-smi -q -d TEMPERATURE,POWER
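A cap can also be derived from the card's default limit instead of being hard-coded. The sketch below only does the arithmetic; capped_power_limit is a hypothetical helper, and the commented usage assumes the power.default_limit field of nvidia-smi --query-gpu reports the default in watts.

```shell
#!/bin/sh
# Hypothetical helper: prints a power cap in whole watts that is PCT percent
# below DEFAULT_WATTS. Usage: capped_power_limit DEFAULT_WATTS PCT
capped_power_limit() {
    awk -v d="$1" -v p="$2" 'BEGIN { printf "%d\n", d * (100 - p) / 100 }'
}

# Typical use for GPU 0 (applying the limit needs root):
#   default=$(nvidia-smi -i 0 --query-gpu=power.default_limit --format=csv,noheader,nounits)
#   sudo nvidia-smi -i 0 -pl "$(capped_power_limit "$default" 20)"
```

For a 350 W default and a 20% reduction the helper prints 280. The driver rejects values outside the card's supported range, so check the minimum and maximum limits in nvidia-smi -q -d POWER first.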
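Failure mode: SSH works but nvidia-smi reports No devices were found in the SSH session. nvidia-smi talks to the kernel driver through NVML and does not need an X server, so an unset DISPLAY is rarely the real cause. Check first that the nvidia kernel module is loaded (lsmod | grep nvidia) and that the /dev/nvidia* device nodes exist and are readable by your user. To rule out a wrapper or profile script that does consult DISPLAY, run DISPLAY= nvidia-smi (empty value before the command) or invoke it from a script that unsets DISPLAY.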
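Failure mode: persistence mode can fail silently on one GPU of a multi-GPU system, for example when the installed driver does not fully support that card (a host runs a single NVIDIA driver version, so GPUs cannot differ in driver version). Check each GPU individually: nvidia-smi -i 0 -q | grep 'Persistence Mode', then repeat with -i 1.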
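On machines with more than two GPUs, the per-GPU check is easier to script. The sketch below assumes the CSV output of nvidia-smi --query-gpu=index,persistence_mode, where each row looks like "0, Enabled"; gpus_without_persistence is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: reads "index, persistence_mode" CSV rows on stdin and
# prints the index of every GPU whose mode is not "Enabled".
gpus_without_persistence() {
    awk -F', *' 'NF >= 2 && $2 != "Enabled" { print $1 }'
}

# Typical use:
#   nvidia-smi --query-gpu=index,persistence_mode --format=csv,noheader | gpus_without_persistence
```

Empty output means every GPU reported persistence mode as enabled; any printed index is a GPU to inspect with nvidia-smi -i for that index.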
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
Configure SSH key-only access, enable GPU persistence mode as a systemd service, set a power limit 20% below default, and verify both configurations persist across a reboot.