WSL2 Memory and Performance Tuning — Local AI on Windows (Chapter 3)

By default, WSL2 can use up to 50% of your system RAM; depending on the WSL version, that default may also be capped at 8 GB. On a machine with 32 GB this is usually fine. On a machine with 16 GB running Ollama, Docker, and a browser simultaneously, memory use inside WSL2 can grow until it hits the limit. At that point the Linux OOM killer terminates the largest process, which is often your inference session, and it does so without warning.

Check current memory use inside WSL2:

free -h
#               total        used        free      shared  buff/cache   available
# Mem:           7.7Gi       3.2Gi       4.4Gi       0.0Ki       1.0Gi       4.4Gi
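
For a long-running inference session it can help to check headroom from a script rather than by eye. The following is a minimal sketch, assuming the standard MemTotal and MemAvailable fields in /proc/meminfo. The function names and the 15% threshold are illustrative choices, not WSL2 or Ollama defaults.

```shell
#!/usr/bin/env bash
# Sketch: report MemAvailable as a percentage of MemTotal.
# mem_available_pct, warn_if_low_memory, and the 15% default are hypothetical.

mem_available_pct() {
    # Optional argument: path to a meminfo-style file (defaults to /proc/meminfo).
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0) printf "%d\n", (avail * 100) / total
            else exit 1
        }
    ' "$meminfo"
}

warn_if_low_memory() {
    # $1 = warning threshold in percent, $2 = optional meminfo path.
    local threshold="${1:-15}"
    local pct
    pct=$(mem_available_pct "${2:-/proc/meminfo}") || return 2
    if [ "$pct" -lt "$threshold" ]; then
        echo "WARNING: only ${pct}% of WSL2 memory is available"
        return 1
    fi
    echo "OK: ${pct}% of WSL2 memory is available"
}
```

Calling warn_if_low_memory with no arguments reads the live values. A non-zero exit status makes it easy to gate a model load in a wrapper script.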

The .wslconfig file at C:\Users\YOUR_USERNAME\.wslconfig controls WSL2's memory and CPU allocation. Create it if it does not exist:

[wsl2]
memory=12GB
processors=8
swap=8GB
localhostForwarding=true

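This limits WSL2 to 12 GB of RAM and 8 logical processors, with 8 GB of swap. The swap space is a separate virtual disk file, swap.vhdx, stored on the Windows side rather than inside the distribution's own disk. Its documented default location is %USERPROFILE%\AppData\Local\Temp\swap.vhdx, although the exact path can vary by WSL version, and the swapFile key in .wslconfig overrides it. The file grows dynamically. If swap fills up, or the Windows disk is too full (or over quota) for the file to grow, processes inside WSL2 can be killed or can fail without a clear error message.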
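
Because swap exhaustion tends to fail quietly, it may be worth checking swap use before loading a large model. This is a minimal sketch, assuming the SwapTotal and SwapFree fields in /proc/meminfo. The name swap_used_pct is a hypothetical helper, not a WSL2 tool.

```shell
#!/usr/bin/env bash
# Sketch: print the percentage of swap currently in use inside WSL2.
# swap_used_pct is a hypothetical helper name.

swap_used_pct() {
    # Optional argument: path to a meminfo-style file (defaults to /proc/meminfo).
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^SwapTotal:/ { total = $2 }
        /^SwapFree:/  { swapfree = $2 }
        END {
            # With swap disabled (SwapTotal is 0), report 0 instead of dividing by zero.
            if (total > 0) printf "%d\n", ((total - swapfree) * 100) / total
            else print "0"
        }
    ' "$meminfo"
}
```

A value that stays close to 100 suggests the model does not fit in the configured memory and is being paged, which is a sign to raise memory= or choose a smaller model.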

Apply changes by shutting down WSL2 from a Windows terminal (PowerShell or Command Prompt). This stops every running distribution:

wsl --shutdown

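Wait a few seconds for the VM to stop completely (Microsoft's documentation suggests about 8 seconds), then reopen Ubuntu. The new limits apply as soon as the VM starts again. Run free -h and nproc inside Ubuntu to confirm them.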
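
To compare what the file requests with what the restarted VM reports, the configured values can be read back from inside WSL2. This is a rough sketch, assuming the C: drive is mounted at /mnt/c (the usual default) and the file uses simple key=value lines. The name wslconfig_value is hypothetical, and YOUR_USERNAME is the same placeholder as above.

```shell
#!/usr/bin/env bash
# Sketch: read one key from the [wsl2] section of a .wslconfig file.
# wslconfig_value is a hypothetical helper. It handles simple values
# such as memory=12GB, not paths that contain spaces or "=".

wslconfig_value() {
    # $1 = key name, $2 = path to the .wslconfig file
    awk -F= -v key="$1" '
        /^\[/ { in_wsl2 = ($0 ~ /^\[wsl2\]/); next }   # track the current section
        in_wsl2 {
            k = $1; v = $2
            gsub(/[ \t]/, "", k)      # tolerate "memory = 12GB"
            gsub(/[ \t\r]/, "", v)    # drop spaces and Windows CR line endings
            if (k == key) { print v; exit }
        }
    ' "$2"
}
```

For example, wslconfig_value memory /mnt/c/Users/YOUR_USERNAME/.wslconfig prints the configured limit, which can then be compared with the total column of free -h. The reported total is normally slightly below the configured value.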

Networking setting: localhostForwarding=true (the default) lets you reach WSL2 services, such as Ollama on port 11434, from the Windows host at http://localhost:11434. If it is false, you must use the WSL2 internal IP address, which can change every time the WSL2 VM restarts, including after every wsl --shutdown.

To find the current WSL2 IP, run this inside WSL2:

hostname -I | awk '{print $1}'

To store it in an environment variable from PowerShell on the Windows side (the variable lasts only for the current PowerShell session):

$env:WSL_IP = (wsl.exe hostname -I | ForEach-Object { $_.Split(' ')[0] })
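
The same lookup can be wrapped on the Linux side to build a full service URL. This is a minimal sketch that takes the first address from hostname -I, as above. The name wsl_service_url is a hypothetical helper, and the optional second argument exists only so the function can be exercised without a live network.

```shell
#!/usr/bin/env bash
# Sketch: build a service URL from the first address reported by `hostname -I`.
# wsl_service_url is a hypothetical helper name, not part of WSL or Ollama.

wsl_service_url() {
    local port="$1"
    # Optional second argument: a space-separated address list, for testing.
    local ips="${2:-$(hostname -I)}"
    local first_ip
    first_ip=$(printf '%s\n' "$ips" | awk '{print $1; exit}')
    # Fail if no address was found instead of printing a broken URL.
    [ -n "$first_ip" ] || return 1
    printf 'http://%s:%s\n' "$first_ip" "$port"
}
```

Running wsl_service_url 11434 prints the address to try from Windows when localhost forwarding is off. Ollama listens on 127.0.0.1 by default, so reaching it through the WSL2 IP will likely also require setting OLLAMA_HOST=0.0.0.0 before starting the server.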

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.