RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Local AI on Linux

11. Kernel Tuning for AI

Chapter 11 of 15 · 20 min
KEY INSIGHT

Setting `vm.overcommit_memory=1`, allocating huge pages, and selecting the `none` I/O scheduler on NVMe are the three changes most likely to reduce model loading time and inference latency variance.

Linux kernel parameters affect AI workloads in three measurable ways: memory allocator behavior, CPU frequency-scaling latency, and I/O throughput for model loading.

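Memory settings are the highest priority. Model weights are loaded into RAM before GPU inference, and large models (70B+) can benefit from huge pages, provided the runtime actually requests them (for example through `MAP_HUGETLB` or a hugetlbfs mount). Memory reserved as huge pages is not available to ordinary allocations, so size the pool deliberately: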

# Check current huge page allocation
grep -E 'Hugepagesize|HugePages_Total|HugePages_Free' /proc/meminfo

# Allocate 128 2MB huge pages (256MB)
echo 128 | sudo tee /proc/sys/vm/nr_hugepages

# Make persistent across reboots
# Note: vm.overcommit_ratio is only enforced when vm.overcommit_memory = 2
sudo bash -c 'cat >> /etc/sysctl.conf << EOF
vm.nr_hugepages = 128
vm.overcommit_memory = 1
vm.overcommit_ratio = 95
EOF'
sudo sysctl -p
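
The pool size above follows from simple arithmetic: target size divided by huge page size, rounded up. A minimal sketch of that calculation, where `hugepages_needed` is a hypothetical helper name and the 2048 kB page size is an assumption you should confirm against `Hugepagesize` in `/proc/meminfo`:

```shell
# hugepages_needed <target_mib> <hugepage_kib>
# Hypothetical helper: prints how many huge pages cover the target size.
hugepages_needed() (
  target_kib=$(( $1 * 1024 ))
  page_kib=$2
  # Round up so the target is fully covered
  echo $(( (target_kib + page_kib - 1) / page_kib ))
)

# 256 MiB with 2 MiB pages, then 1 GiB with 2 MiB pages
hugepages_needed 256 2048
hugepages_needed 1024 2048
```

On a live system the second argument can be read with `awk '/^Hugepagesize:/ {print $2}' /proc/meminfo` instead of being hard-coded.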

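`overcommit_memory=1` tells the kernel to always let `malloc()` succeed, regardless of how much physical RAM is free. In the default heuristic mode (0) the kernel can refuse an allocation it considers an obvious overcommit, so loading a 70B parameter model (140GB of float16 weights) may fail with an out-of-memory error even though the model is only resident in memory temporarily. Mode 1 removes that check. The OOM killer still terminates processes if the pages actually touched exceed RAM plus swap.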
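
The 140GB figure is parameter count times bytes per parameter. A small sketch of the same arithmetic for other precisions, where `weights_gb` is a hypothetical helper and the result covers weights only (no KV cache, activations, or runtime overhead):

```shell
# weights_gb <params_in_billions> <bits_per_param>
# Hypothetical helper: approximate size of the weights in GB (10^9 bytes).
weights_gb() (
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
)

weights_gb 70 16   # float16
weights_gb 70 8    # 8-bit quantization
weights_gb 70 4    # 4-bit quantization
```

Comparing these numbers with `MemTotal` in `/proc/meminfo` shows quickly whether overcommit is being asked to paper over a real RAM shortfall.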

CPU governor for consistent latency:

# Set CPU governor to performance (no frequency scaling latency)
for cpu in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
  echo performance | sudo tee "$cpu" > /dev/null
done

# Or set for specific cores used for preprocessing
echo performance | sudo tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
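
To confirm the change took effect on every core, it can help to count CPUs per governor. A minimal sketch, where `governor_summary` is a hypothetical helper and the optional argument exists only so the function can be pointed at a different sysfs root:

```shell
# governor_summary [cpu_sysfs_root]
# Hypothetical helper: prints "<governor> <number of CPUs>" per governor in use.
governor_summary() (
  root=${1:-/sys/devices/system/cpu}
  for f in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
    # Skip silently when cpufreq is not exposed (glob did not match)
    [ -r "$f" ] && cat "$f"
  done | sort | uniq -c | awk '{ print $2, $1 }'
)

governor_summary
```

An empty result means no `cpufreq` files were readable, which points to a missing frequency scaling driver rather than a wrong governor.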

I/O scheduler for NVMe model storage:

# Check current scheduler for nvme0n1
cat /sys/block/nvme0n1/queue/scheduler
# Set to none (optimal for NVMe)
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler
# Persist (assumes /etc/rc.local exists, is executable, and is run at boot on your distro)
sudo bash -c 'cat >> /etc/rc.local << EOF
echo none > /sys/block/nvme0n1/queue/scheduler
EOF'
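
The scheduler file lists every available scheduler and marks the active one with square brackets, for example `[none] mq-deadline kyber`. A minimal sketch for extracting the active entry across all NVMe namespaces, where `active_scheduler` is a hypothetical helper name:

```shell
# active_scheduler: reads a scheduler line on stdin and prints the
# bracketed (currently active) entry. Hypothetical helper.
active_scheduler() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Report the active scheduler for each NVMe namespace that exists
for q in /sys/block/nvme*n*/queue/scheduler; do
  [ -r "$q" ] || continue
  dev=${q#/sys/block/}
  dev=${dev%%/*}
  echo "$dev: $(active_scheduler < "$q")"
done
```

Running this after a reboot is a quick way to check whether the persistence step actually ran.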

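Failure mode: Writing `vm.nr_hugepages` fails with `Cannot allocate memory`, or `HugePages_Total` ends up lower than the value you wrote. The kernel could not find enough free, contiguous memory, which is common on a long-running system with fragmented RAM. Release any unused huge pages with `echo 0 | sudo tee /proc/sys/vm/nr_hugepages`, then retry, ideally soon after boot. Alternatively, the system may not support huge pages at all (some VPS kernels disable them). Check with `grep -i hugetlb /boot/config-$(uname -r)`.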

Failure mode: Setting `overcommit_memory=1` lets the system run out of actual RAM and swap, at which point the OOM killer starts terminating processes. `overcommit_ratio=95` does not protect against this: the ratio is only enforced in strict mode (`overcommit_memory=2`) and is ignored in mode 1. Monitor with `watch -n 5 free -h`. If model weights plus CPU-side tensor allocations exceed physical RAM, you need more RAM or model quantization, not more overcommit.
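
A quick headroom check before loading can catch this case early. The sketch below compares an expected footprint against `MemAvailable`; `fits_in_ram` is a hypothetical helper, and the second argument exists only so it can read a file other than `/proc/meminfo`:

```shell
# fits_in_ram <needed_gib> [meminfo_path]
# Hypothetical helper: exit status 0 if MemAvailable covers the requested
# size, 1 otherwise. /proc/meminfo reports values in kB (1024 bytes).
fits_in_ram() (
  needed_gib=$1
  meminfo=${2:-/proc/meminfo}
  awk -v need="$needed_gib" '
    /^MemAvailable:/ { avail_gib = $2 / 1024 / 1024 }
    END {
      printf "available: %.1f GiB, needed: %.1f GiB\n", avail_gib, need
      exit ((avail_gib >= need) ? 0 : 1)
    }' "$meminfo"
)

if [ -r /proc/meminfo ]; then
  fits_in_ram 140 || echo "Not enough RAM: quantize the model or add memory"
fi
```

The estimate is deliberately coarse. It ignores swap and treats the footprint as a single number, which is enough to tell a tuning problem from a capacity problem.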

Failure mode: Changing the CPU governor requires root because it writes to per-CPU sysfs files. `tee: /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor: Permission denied` means you forgot `sudo`. If the `cpufreq` directory is missing altogether, the CPU does not expose frequency scaling, which is often the case in virtual machines and on some fixed-frequency server CPUs.

EXERCISE

Set `vm.overcommit_memory=1` and `vm.overcommit_ratio=90` in `sysctl.conf` (keeping in mind that the ratio only takes effect in mode 2), allocate 64 huge pages, and verify with `cat /proc/meminfo`. Time model loading before and after and compare.
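
For the timing part, a small wrapper keeps the before and after measurements comparable. This is a sketch under two assumptions: GNU `date` is available (for `%N` nanoseconds), and `elapsed_ms` is a hypothetical helper name. The `sleep 1` call is only a stand-in for whatever command loads your model.

```shell
# elapsed_ms <command> [args...]
# Hypothetical helper: runs the command and prints wall-clock time in ms.
# Assumes GNU date, which supports %N (nanoseconds).
elapsed_ms() (
  start=$(date +%s%N)
  "$@" > /dev/null 2>&1
  end=$(date +%s%N)
  echo $(( (end - start) / 1000000 ))
)

# Stand-in workload; replace with your own model-loading command
elapsed_ms sleep 1
```

A second run usually reads the weights from the page cache and looks much faster regardless of tuning. For a fair comparison, clear the cache between runs with `sync; echo 3 | sudo tee /proc/sys/vm/drop_caches`, or reboot.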
