COURSE · FND · B007

Local AI on Windows

Learn local AI on Windows through RunLocalAI's practical lens: Windows, WSL2, CUDA and Docker, hardware fit, runtime settings, verification habits, and local-vs-cloud tradeoffs.

15 chapters · 5 h · Foundations track · By Fredoline Eruo
PREREQUISITES
  • B001
  • B003

Course B007: Local AI on Windows

Why this course exists

Running local AI on Windows is harder than it should be. The ecosystem splits across WSL2, native Windows binaries, Docker containers, and GUI tools that do not always agree on where files live or how memory gets allocated. This course builds a working local AI environment from scratch on Windows hardware, with concrete configuration values and commands you can copy-paste. We cover the setups that break, why they break, and how to fix them in under 15 minutes.

What you will know after

  • Install and configure WSL2 with GPU passthrough for CUDA workloads
  • Run Ollama, Docker AI containers, LM Studio, and Open WebUI on the same machine without port conflicts
  • Diagnose and fix the most common WSL2 failure modes: networking, memory leaks, GPU detection
  • Tune Windows Defender, power settings, and memory limits to prevent runtime crashes during long inference sessions
  • Configure PATH and environment variables so all tools are reachable from every shell
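
The port-conflict goal above can be checked before starting a tool. The following is a minimal sketch, not the course's own tooling: only port 11434 for Ollama comes from this outline, while 1234 for LM Studio's server and 3000 for an Open WebUI host mapping are assumptions you should adjust to your own setup. The names `DEFAULT_PORTS`, `port_in_use`, and `report` are hypothetical.

```python
import socket

# Only 11434 (Ollama) is stated in this course outline. The LM Studio and
# Open WebUI ports below are assumptions for illustration; change as needed.
DEFAULT_PORTS = {"Ollama": 11434, "LM Studio": 1234, "Open WebUI": 3000}


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def report(ports: dict = DEFAULT_PORTS) -> dict:
    """Map each tool name to whether its port is already taken."""
    return {name: port_in_use(port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, taken in report().items():
        print(f"{name}: {'in use' if taken else 'free'}")
```

Run it from the Windows host before launching a second runtime. A port reported as in use means the new tool needs a different port or the existing one must be stopped first.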
CHAPTERS
  1. Introduction to Local AI on Windows (10 min): Windows AI setup is a three-layer problem (Windows host, WSL2 layer, and tool runtime), and most failures occur at the boundaries between layers.
  2. WSL2 Installation and Configuration (20 min): WSL2 runs a real Linux kernel in a lightweight VM; it is not a compatibility layer. This means the kernel version matters, and the kernel is updated from Windows with `wsl --update`, not with `apt upgrade` inside the distribution.
  3. WSL2 Memory and Performance Tuning (20 min): The `.wslconfig` file is the single most important tuning knob for WSL2 stability. Set memory and swap in it before running any AI workload on a machine with less than 32 GB of RAM.
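  4. NVIDIA CUDA Setup with WSL2 (20 min): CUDA in WSL2 is not emulation; it is direct GPU access through the NVIDIA driver installed on Windows. Do not install a Linux NVIDIA driver inside the distribution. The CUDA toolkit inside WSL2 is a separate package with its own version number and update cycle, and it must be compatible with the Windows driver.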
  5. Docker Desktop for Windows (20 min): Docker Desktop on Windows runs as a WSL2 distribution named "docker-desktop". With the WSL2 backend it shares one utility VM with your other distributions, so the memory and CPU limits in `.wslconfig` apply to Docker containers and all other Linux tooling together. Budget for both when you set them.
  6. Running AI Containers in Docker (20 min): GPU access inside a Docker container depends on three components being correctly installed and connected: a WSL2-capable NVIDIA driver on Windows, the NVIDIA Container Toolkit, and the `--gpus all` flag in the `docker run` command. If any one is missing, GPU inference fails.
  7. Native Ollama on Windows (20 min): Running Ollama natively on Windows and inside Docker simultaneously is possible but requires explicit port management: the default port 11434 goes to whichever instance starts first.
  8. Managing Models with CLI (20 min): `ollama list` shows the model's size on disk, not the RAM it needs for inference. A 4 GB Q4 model needs roughly 6-8 GB of free RAM to run without swapping.
  9. LM Studio GUI Alternative (15 min): LM Studio's local server mimics the OpenAI API format, which makes it a drop-in replacement for applications designed for the OpenAI API, but you must start the server manually before those applications can connect.
  10. Path Configuration (20 min): The Windows PATH and the WSL2 Linux PATH are separate. By default WSL2 appends the Windows PATH to the Linux one, so Windows executables can be run from a WSL2 shell by their `.exe` name or through `/mnt/c/`, but Linux executables are invisible to Windows shells unless you invoke them through `wsl`.
  11. Antivirus Considerations (20 min): Antivirus real-time scanning is one of the most common causes of slow model downloads and intermittent binary execution failures on Windows. Adding exclusions for model storage and binary directories usually resolves the problem.
  12. WSL2 Troubleshooting (20 min): Most WSL2 failures resolve by shutting down WSL2 (`wsl --shutdown`), fixing the underlying issue, and restarting. WSL2 keeps no persistent state beyond the VHDX file, so a restart is safe, although it ends any running Linux processes.
  13. Performance Optimization (20 min): For models that do not fit entirely in GPU VRAM, system RAM speed matters more than raw GPU speed because the bottleneck is data transfer, not compute. NVMe storage speed mainly determines how fast a model loads.
  14. Open WebUI on Windows (20 min): Open WebUI running inside Docker connects to Ollama through `host.docker.internal`, which routes to the Windows host. If Ollama runs natively on Windows, the connection succeeds without additional configuration.
  15. Windows AI Tools Ecosystem (20 min): All these tools compete for the same GPU VRAM and system RAM. Running more than one simultaneously without proper resource limits causes out-of-memory (OOM) crashes. Designate one as the primary runtime and stop the others before switching contexts.
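
Chapter 3's advice to set memory and swap in `.wslconfig` can be sketched as a small generator. This is an illustration under stated assumptions, not the course's recommended configuration: the helper name `render_wslconfig` is hypothetical, and the numbers in the example are placeholders for a 16 GB machine. The `[wsl2]` section and its `memory`, `swap`, and `processors` keys are the standard `.wslconfig` settings.

```python
from typing import Optional


def render_wslconfig(memory_gb: int, swap_gb: int,
                     processors: Optional[int] = None) -> str:
    """Build the text of a .wslconfig file with a single [wsl2] section."""
    if memory_gb <= 0 or swap_gb < 0:
        raise ValueError("memory_gb must be positive and swap_gb non-negative")
    lines = ["[wsl2]", f"memory={memory_gb}GB", f"swap={swap_gb}GB"]
    if processors is not None:
        # Optional cap on the number of logical processors given to the VM.
        lines.append(f"processors={processors}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values for a 16 GB machine; not a course recommendation.
    print(render_wslconfig(memory_gb=10, swap_gb=4, processors=6), end="")
```

Save the output as `.wslconfig` in your Windows user profile folder, then run `wsl --shutdown` so the new limits apply on the next start.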