COURSE · FND · B009
Local AI on Linux
Learn local AI on Linux through RunLocalAI's practical lens: Linux, CUDA, server and Docker, hardware fit, runtime settings, verification habits, and local-vs-cloud tradeoffs.
PREREQUISITES
- B001
- B003
Why this course exists
Linux is the only OS where you control the full stack: kernel parameters, driver versions, CUDA/ROCm compatibility, container runtimes, and service lifecycles. macOS ties you to Apple Silicon. Windows ties you to whatever NVIDIA and Microsoft decide. Linux gives you the hardware.
This course builds a Linux-based AI environment from the metal up. You will install GPU drivers, compile inference engines, containerize AI workloads, and set up production-grade remote access and monitoring. Every chapter uses real commands against real hardware scenarios.
What you will know after
- Install and verify NVIDIA or AMD GPU drivers on Ubuntu and Fedora
- Compile llama.cpp with hardware-specific SIMD flags and measure throughput
- Run Ollama as a systemd service with proper resource limits
- Deploy AI stacks with Docker Compose including GPU passthrough
- Tune kernel parameters for AI workloads and secure remote access
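One way to make "install and verify" concrete for the CUDA stack: `nvidia-smi` reports the highest CUDA version the driver supports, while `nvcc --version` reports the installed toolkit. The following is a minimal Python sketch, not taken from the course material. The function names are hypothetical, the sample strings are illustrative rather than captured from a real machine, and the check assumes the simplified rule that the toolkit version should not exceed the version the driver reports; NVIDIA's actual compatibility rules are more nuanced.

```python
import re

# Illustrative samples only (assumed shapes of real tool output).
SAMPLE_NVIDIA_SMI_HEADER = (
    "| NVIDIA-SMI 550.54.14   Driver Version: 550.54.14   CUDA Version: 12.4 |"
)
SAMPLE_NVCC_VERSION = "Cuda compilation tools, release 12.2, V12.2.140"


def _parse_version(pattern: str, text: str, label: str) -> tuple[int, int]:
    """Return (major, minor) from the first match of pattern in text."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"could not find {label} in the given output")
    return int(match.group(1)), int(match.group(2))


def parse_driver_cuda(nvidia_smi_output: str) -> tuple[int, int]:
    """Highest CUDA version the installed driver supports."""
    return _parse_version(
        r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output, "driver CUDA version"
    )


def parse_toolkit_cuda(nvcc_output: str) -> tuple[int, int]:
    """Version of the installed CUDA toolkit, as printed by nvcc."""
    return _parse_version(
        r"release\s+(\d+)\.(\d+)", nvcc_output, "toolkit release"
    )


def toolkit_fits_driver(driver: tuple[int, int], toolkit: tuple[int, int]) -> bool:
    """Simplified rule (an assumption): toolkit must not be newer than the driver's maximum."""
    return toolkit <= driver


driver = parse_driver_cuda(SAMPLE_NVIDIA_SMI_HEADER)
toolkit = parse_toolkit_cuda(SAMPLE_NVCC_VERSION)
print(f"driver supports up to CUDA {driver[0]}.{driver[1]}, "
      f"toolkit is {toolkit[0]}.{toolkit[1]}, "
      f"compatible: {toolkit_fits_driver(driver, toolkit)}")
```

On a real machine the two sample strings would be replaced with the captured output of `nvidia-smi` and `nvcc --version`, for example via `subprocess.run([...], capture_output=True, text=True).stdout`.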
CHAPTERS
- 01 Why Linux for AI (10 min): Linux is the only OS where you control every layer from the kernel ABI to the container runtime, giving you full GPU access with no virtualization overhead.
- 02 NVIDIA Driver Installation (20 min): The Ubuntu `.deb` driver package is preferred because it integrates with DKMS, which rebuilds the kernel module automatically after kernel updates.
- 03 CUDA Toolkit Setup (20 min): `nvidia-smi` reports the highest CUDA version the driver supports, not the installed toolkit version; the driver and the toolkit are separate installations that must be compatible.
- 04 ROCm for AMD GPUs (20 min): ROCm exposes GPUs through `/dev/kfd` rather than `/dev/nvidia*`, requires membership in the `video` and `render` groups, and is not visible to `nvidia-smi`.
- 05 Ollama on Linux (20 min): `ollama serve` runs as a foreground process by default; production deployments need a service manager such as systemd, covered in Chapter 7.
- 06 llama.cpp from Source (20 min): Compiling llama.cpp with `-DGGML_CUDA=ON` (`-DLLAMA_CUDA=ON` in older releases) and measuring throughput before and after GPU offloading quantifies how much the GPU accelerates your specific model and hardware.
- 07 Systemd Service for AI (20 min): Systemd turns a manually started process into a managed service with automatic restart, resource limits, and log collection, all of which are essential for server deployments.
- 08 Headless Server Setup (20 min): Headless servers require explicit GPU persistence mode, SSH key authentication, and thermal management; none of these is configured by default.
- 09 Docker on Linux for AI (20 min): Docker with `nvidia-container-toolkit` exposes GPU devices inside containers by mounting `/dev/nvidia*` and injecting the NVIDIA libraries through the OCI hooks system, with no emulation and no performance loss.
- 10 Docker Compose AI Stack (20 min): Docker Compose combines GPU inference, rate limiting, and monitoring into a single declarative stack whose services restart together and whose logs are centrally accessible.
- 11 Kernel Tuning for AI (20 min): Setting `vm.overcommit_memory=1`, allocating huge pages, and using a no-op I/O scheduler (`none` on current multi-queue kernels) are the three changes that measurably reduce model loading time and inference latency variance.
- 12 Firewall and Security (20 min): Docker writes published ports directly into iptables, bypassing UFW unless containers bind to 127.0.0.1 or UFW's `after.rules` file is configured.
- 13 Remote Access Setup (20 min): SSH tunnels encrypt all traffic and route it through port 22, so no additional ports need to be opened in the firewall; this is essential for reaching GPU monitoring UIs on remote servers.
- 14 Monitoring AI Services (20 min): GPU temperature, memory utilization, and inference latency form the minimum viable monitoring stack; you cannot optimize what you do not measure.
- 15 Linux AI Workstation Project (25 min): A production AI workstation is built by layering GPU drivers, runtime, inference engine, container orchestration, monitoring, and security; each layer must be verified independently before the next is integrated.
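The monitoring chapter names GPU temperature and memory utilization as part of the minimum viable stack. A minimal Python sketch of how those two signals might be collected follows; it is not taken from the course material. It assumes the input is the output of `nvidia-smi --query-gpu=temperature.gpu,memory.used,memory.total,utilization.gpu --format=csv,noheader,nounits`, one line per GPU, with every field numeric (some GPUs report `[N/A]` for certain fields, which this sketch does not handle). The `GpuSample` type and `parse_gpu_csv` function are hypothetical names, and the sample values are made up for illustration.

```python
from dataclasses import dataclass

# Fields requested from nvidia-smi, in the order the parser expects them.
QUERY_FIELDS = "temperature.gpu,memory.used,memory.total,utilization.gpu"


@dataclass
class GpuSample:
    temperature_c: int
    memory_used_mib: int
    memory_total_mib: int
    utilization_pct: int

    @property
    def memory_fraction(self) -> float:
        """Share of GPU memory in use, between 0.0 and 1.0."""
        return self.memory_used_mib / self.memory_total_mib


def parse_gpu_csv(csv_text: str) -> list[GpuSample]:
    """Parse one CSV line per GPU into GpuSample records."""
    samples = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
        temperature, used, total, utilization = (int(field) for field in fields)
        samples.append(GpuSample(temperature, used, total, utilization))
    return samples


# Made-up sample for two GPUs; real input would come from nvidia-smi.
sample_output = "62, 8192, 24576, 97\n41, 0, 24576, 0\n"
for index, gpu in enumerate(parse_gpu_csv(sample_output)):
    print(f"GPU {index}: {gpu.temperature_c} C, "
          f"{gpu.memory_fraction:.0%} memory used, "
          f"{gpu.utilization_pct}% utilization")
```

Keeping the parser separate from the command invocation means it can be exercised on any machine, including one without a GPU; inference latency, the third signal, would have to come from the inference server rather than from `nvidia-smi`.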