System RAM Benefits — Hardware Planning for Local AI (Chapter 9)

System RAM plays a crucial role even in GPU-accelerated systems. Understanding when RAM matters prevents bottlenecks.

RAM Requirements by Configuration

Configuration	Minimum RAM	Recommended RAM
GPU inference, 7B	16GB	32GB
GPU inference, 13B	32GB	64GB
GPU inference, 34B	64GB	128GB
CPU-only, 7B	16GB	32GB
CPU-only, 13B	32GB	64GB

When RAM Bottlenecks Occur

RAM becomes critical during:

Model loading: Weights transferred from storage to RAM before GPU upload
CPU preprocessing: Tokenization, prompt processing
Quantization operations: CPU-based weight conversion
Multi-model serving: Multiple models in RAM simultaneously
Large context handling: Context buffer management

RAM Specifications for AI Workloads

Frequency: DDR5-5600 vs DDR5-6400 affects tokenization speed. The difference is 5-10% for typical workloads—not critical but measurable.

Channels: Always use dual-channel minimum. Single-channel halves memory bandwidth.

# Check RAM configuration on Linux
sudo dmidecode -t memory | grep -A 5 "Memory Device"

# Verify dual-channel on Windows
wmic memorychip get Manufacturer, Speed, Capacity, MemoryType

# Expected output at minimum:
# Channel: Dual
# Configured Memory Speed: 5600 MT/s
# Capacity: 16384 MB (per module)

Swap Usage Unexpected

If system RAM is insufficient for model loading, the system swaps to storage. This causes:

30-second model load times instead of 3 seconds
Complete system freeze during swapping
Disk wear if SSD
Inference failure if storage too slow

Configure swappiness carefully:

# Reduce swap tendency for AI workloads
sudo sysctl vm.swappiness=10

# Add to /etc/sysctl.conf for persistence
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf

Offloading Strategies

Modern inference servers support model offloading:

# llama.cpp with partial GPU offloading
./main -m model.gguf -ngl 24  # Offload 24 layers to GPU
# Remaining layers use system RAM/CPU

This technique allows larger models on smaller VRAM, at the cost of reduced speed.