Homelab: 24/7 local AI you don't babysit
For: Owners of a homelab box (Proxmox, bare metal, or NAS-adjacent) who want a stable always-on inference service.
By the end: A homelab inference service that survives a power cycle, reports its own health, and lets you reach it from anywhere securely.
A homelab inference service is a different problem from a workstation that runs a model when you ask. The box has to survive a power blip at 3am. The fans need to behave when nobody's watching. The model has to come back up after a kernel upgrade without you SSH'ing in. This path walks through the eight operational disciplines that turn a stack-on-a-shelf into a service you can ignore.
Pick the chassis and rule out the wrong ones
Used Threadripper, refurb Xeon, or prosumer ATX with a 1000W Platinum PSU: all fine. Mini-PCs with a soldered GPU and a laptop-sized PSU: not for 24/7 inference. The rule of thumb: at sustained 80% GPU load, your PSU should run at 60-70% of its rated wattage, not 90%. If you can't measure this, you can't run 24/7.
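The rule of thumb above is plain arithmetic, sketched here in Python. The function names and the 0.70 ceiling (the top of the 60-70% band) are illustrative choices, not part of any tool. The sketch also assumes you feed it a wall-meter reading; wall draw is slightly higher than what the PSU actually delivers, so the result errs on the cautious side.

```python
def psu_load_fraction(measured_watts: float, psu_rated_watts: float) -> float:
    """Return the fraction of the PSU's rated wattage currently in use."""
    if psu_rated_watts <= 0:
        raise ValueError("psu_rated_watts must be positive")
    return measured_watts / psu_rated_watts


def has_headroom(measured_watts: float, psu_rated_watts: float,
                 max_fraction: float = 0.70) -> bool:
    """True if sustained draw stays at or below the chosen ceiling.

    max_fraction=0.70 is an assumption taken from the 60-70% rule of thumb.
    """
    return psu_load_fraction(measured_watts, psu_rated_watts) <= max_fraction
```

For example, 650W measured on a 1000W unit is a 0.65 load fraction and passes; 900W on the same unit does not.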
Cooling is the other half. Run your model at full tilt for an hour and watch GPU temp. If it climbs past 83°C in the first 30 minutes you have a thermals problem now and a reliability problem in three months.
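One way to run that soak test unattended is to log temperatures and check them afterwards. The sketch below assumes an NVIDIA card with nvidia-smi on the PATH; the helper names, the one-minute sampling interval, and the sample format are illustrative. The 83°C limit and 30-minute window come from the text above.

```python
import subprocess
import time


def read_gpu_temp_c(gpu_index: int = 0) -> int:
    """Read the current GPU temperature in Celsius via nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=temperature.gpu", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip().splitlines()[0])


def soak(duration_min: float = 60, interval_s: float = 60,
         read_temp=read_gpu_temp_c) -> list:
    """Sample temperature for duration_min minutes; return (minute, temp_c) pairs."""
    samples = []
    start = time.monotonic()
    while True:
        elapsed_min = (time.monotonic() - start) / 60
        if elapsed_min > duration_min:
            return samples
        samples.append((elapsed_min, read_temp()))
        time.sleep(interval_s)


def first_breach_minute(samples, limit_c: float = 83, window_min: float = 30):
    """Return the minute of the first reading above limit_c inside the window, else None."""
    for minute, temp_c in samples:
        if minute <= window_min and temp_c > limit_c:
            return minute
    return None
```

Start the model under full load, call soak(), then pass the result to first_breach_minute(). Any value other than None means the thermals need attention before the box goes unattended.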
Lock the OS, lock the kernel
The single biggest cause of "homelab inference broke overnight" is an automatic kernel upgrade that left the GPU driver behind. Disable unattended-upgrades for kernel packages. When you do upgrade, do it on a schedule you choose, with a rollback plan you've tested.
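One common way to keep kernel packages from moving on their own is to put them on hold with apt-mark, which is a different mechanism from editing the unattended-upgrades configuration. The sketch below only audits that the holds are in place. The package names are assumptions for a stock Ubuntu install (Debian uses names such as linux-image-amd64), so adjust the list for your system.

```python
import subprocess

# Assumed metapackage names for Ubuntu; change these to match your distro.
KERNEL_PACKAGES = ["linux-image-generic", "linux-headers-generic"]


def held_packages() -> set:
    """Return the set of packages currently on hold, per `apt-mark showhold`."""
    out = subprocess.run(
        ["apt-mark", "showhold"],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.split())


def missing_holds(expected, held) -> list:
    """Return the expected packages that are not on hold, sorted by name."""
    return sorted(set(expected) - set(held))
```

Running missing_holds(KERNEL_PACKAGES, held_packages()) from a weekly check should return an empty list. Anything it does return is a package that can still be upgraded outside your chosen window.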
Distro choice matters less than the discipline. Ubuntu 22.04/24.04 LTS or Debian Stable both work. Avoid rolling distros for an unattended box.
Run inference as a managed service
Tmux sessions are not a service. Screen is not a service. A systemd unit (or a Docker container with the restart policy unless-stopped) is. Pick one and stick to it. The unit file is now part of your homelab definition, so version-control it.
Validate by killing the process and watching it come back. Then power-cycle the box and watch the model come up without intervention. If it doesn't, you don't have a homelab service yet.
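The kill-and-watch test is easier to trust when a script times the recovery. This is a minimal sketch that polls until the service answers again. The /health path and the 300-second timeout are assumptions; use whatever endpoint your inference server actually exposes.

```python
import http.client
import time
import urllib.request


def http_probe(url: str, timeout_s: float = 2.0) -> bool:
    """True if the URL answers with a 2xx status; False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        # urllib.error.URLError and HTTPError are OSError subclasses.
        return False


def wait_until_healthy(probe, timeout_s: float = 300.0, interval_s: float = 5.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True.

    Returns the seconds elapsed until the first healthy answer,
    or None if timeout_s passes first.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout_s:
            return None
        sleep(interval_s)
```

From a second machine, kill the process (or power-cycle the box) and then call wait_until_healthy(lambda: http_probe("http://<box-address>:8000/health")). A None result means the service did not come back on its own.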
Add observability before you need it
Set this up while everything is healthy. The whole point of observability is that when something breaks at 2am, you already have the dashboard, the alert, and the baseline. Adding metrics during an incident is not a strategy.
Minimum viable: GPU temperature, GPU memory utilization, request rate, error rate, system load. If your dashboard has fewer than five tiles, it's too sparse. If it has fifty, you'll never look at it.
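As a rough sketch of where the GPU and system numbers can come from, the snippet below reads them from nvidia-smi and the OS load average. It assumes an NVIDIA card and a Unix-like host, and the dictionary keys are illustrative names. Memory is reported here as used/total occupancy, which is one reasonable reading of "memory utilization". Request rate and error rate have to come from your inference server and are not covered.

```python
import os
import subprocess

# Field names accepted by `nvidia-smi --query-gpu`.
QUERY_FIELDS = ["temperature.gpu", "memory.used", "memory.total", "utilization.gpu"]


def parse_gpu_csv(line: str) -> dict:
    """Parse one `csv,noheader,nounits` line in QUERY_FIELDS order."""
    temp_c, mem_used, mem_total, util_pct = (float(x) for x in line.strip().split(","))
    return {
        "gpu_temp_c": temp_c,
        "gpu_mem_used_frac": mem_used / mem_total,
        "gpu_util_pct": util_pct,
    }


def collect_snapshot(gpu_index: int = 0) -> dict:
    """Collect GPU metrics plus the 1-minute system load average."""
    out = subprocess.run(
        ["nvidia-smi", f"--id={gpu_index}",
         "--query-gpu=" + ",".join(QUERY_FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    snapshot = parse_gpu_csv(out.splitlines()[0])
    snapshot["load_1m"] = os.getloadavg()[0]
    return snapshot
```

Feeding collect_snapshot() into whatever dashboard you use on a fixed interval gives you a baseline to compare against when something breaks.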
Power and UPS discipline
A 4090 + Threadripper at full tilt pulls 700-900W. Your UPS sizing must include real margin: a "1000VA" UPS is closer to 600W usable, because the VA rating is apparent power, not watts. The job of the UPS is not to keep the box running through an outage; it's to give you 5 clean minutes for an automatic shutdown.
Test it. With the model loaded and serving requests, pull the wall plug and watch what happens. If the box hard-crashes, you don't have a UPS, you have a heavy paperweight.
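The sizing arithmetic can be sketched as follows. The 0.6 power factor is an assumption derived from the "1000VA is closer to 600W" figure above, and the 20% margin is an illustrative choice; check your UPS datasheet for its real watt rating.

```python
def usable_watts(va_rating: float, power_factor: float = 0.6) -> float:
    """Estimate real power (W) from a VA rating. power_factor=0.6 is an assumption."""
    return va_rating * power_factor


def ups_can_carry(load_watts: float, va_rating: float,
                  power_factor: float = 0.6, margin: float = 0.2) -> bool:
    """True if the load fits under the usable wattage with `margin` left spare."""
    return load_watts <= usable_watts(va_rating, power_factor) * (1 - margin)
```

Under these assumptions an 800W load does not fit on a 1000VA unit: usable_watts(1000) is about 600W before any margin is applied. Whether the UPS can carry the load is a separate question from runtime, which still has to be confirmed by the pull-the-plug test.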
Remote access without exposing the model port
Do not port-forward 8000 to your model. Don't put it behind a basic-auth proxy and call it secure. Use a mesh VPN — Tailscale is the path of least resistance, WireGuard if you want full control. Then your phone, your laptop, and your other boxes reach the model on a private address that simply doesn't exist on the public internet.
Verify with a port scanner from outside your network. If 8000 responds, you did it wrong; go back.
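If you don't have a port scanner handy, a single TCP connection attempt does the same job for one port. This minimal sketch uses only the standard library; run it from a machine outside your network, such as a phone hotspot or a cheap VPS, against your public IP. The function name and timeout are illustrative.

```python
import socket


def port_is_open(host: str, port: int, timeout_s: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

From outside, port_is_open("<your-public-ip>", 8000) should return False. From a device on the mesh VPN, the same call against the box's private address should return True.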
Restart discipline and rollback plan
"It works, don't touch it" is not a maintenance plan. A real plan: schedule monthly maintenance windows, snapshot the system before changes (LVM, ZFS, or just a config backup), test rollback before you ever need it. The difference between a homelab and a "machine that runs sometimes" is exactly this discipline.
Document the runbook
Future-you, three months from now, has forgotten exactly why the GPU power limit is set to 350W and why the inference service depends on the local DNS server starting first. Document it. The runbook is for that future-you.
If the box dies and you have to rebuild on new hardware, the runbook is the spec. Treat it like infrastructure code, because it is.
Next recommended step
Settle the cadence questions: weekly health checks, monthly upgrades, quarterly hardware inspections.