07. Hardware Minimums
The Components That Matter
Running local AI requires certain hardware capabilities. Let's be specific about what you actually need.
RAM (System Memory)
RAM holds the model weights during inference. More RAM = larger models you can run.
Requirements by model size (quantized):
- 1-2B parameters: 4-8GB RAM
- 7B parameters: 8-16GB RAM
- 13B parameters: 16-32GB RAM
- 70B parameters: 48-64GB RAM (a 4-bit 70B model is roughly 40GB on its own)
Key point: You need RAM for both the model AND your operating system and other programs. With 16GB RAM, a quantized 7B model (4-8GB) plus the operating system can take up most of your memory, leaving little for a browser or other programs. 32GB is more comfortable.
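As a rough sanity check on the ranges above, weight memory is approximately parameter count times bits per weight, divided by 8. The sketch below is illustrative only: the function name `estimate_model_mb` and the flat 20% allowance for context cache and runtime buffers are assumptions, not measured figures.

```shell
# Rough estimate of the memory needed to hold a model, in MB.
# Assumption: memory ~= parameters * bits_per_weight / 8, plus a flat 20%
# allowance for context cache and runtime buffers (hypothetical figure).
estimate_model_mb() {
  local params_millions=$1   # e.g. 7000 for a 7B model
  local bits_per_weight=$2   # e.g. 4 for 4-bit quantization, 16 for FP16
  # millions of parameters * bits / 8 = millions of bytes = MB (decimal)
  local weights_mb=$(( params_millions * bits_per_weight / 8 ))
  echo $(( weights_mb + weights_mb / 5 ))
}

estimate_model_mb 7000 4    # 7B at 4-bit:  prints 4200
estimate_model_mb 13000 4   # 13B at 4-bit: prints 7800
estimate_model_mb 70000 4   # 70B at 4-bit: prints 42000
```

Whatever the estimate says, add a few GB on top for the operating system and other programs.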
GPU (Graphics Processing Unit)
GPUs accelerate inference dramatically. A mid-range GPU (RTX 3060) can run a 7B model at 20-30 tokens/second. The same model on CPU might run at 5 tokens/second.
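To put those throughput figures in terms of waiting time, the sketch below divides reply length by generation speed. The 500-token reply length and the function name `seconds_for_reply` are illustrative assumptions; the speeds are the ones quoted above.

```shell
# Seconds needed to generate a reply of a given length at a given speed.
# Integer math, rounded up to the next whole second.
seconds_for_reply() {
  local tokens=$1 tokens_per_second=$2
  echo $(( (tokens + tokens_per_second - 1) / tokens_per_second ))
}

seconds_for_reply 500 5    # CPU-only figure from the text:      prints 100
seconds_for_reply 500 25   # mid-range GPU figure from the text: prints 20
```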
GPU VRAM (Video RAM) is especially important:
- 4GB VRAM: Can run small models (1-3B) comfortably
- 8GB VRAM: Can run 7B models (quantized)
- 12GB VRAM: Can run 13B models comfortably, or 7B at higher quality
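- 16GB+ VRAM: Can run 13B models at higher quality; 30B-class models generally need about 24GB (quantized)
Important: Not all RAM is equal. A model runs at full GPU speed only if it fits in VRAM. System RAM is not a substitute: any part of the model that spills out of VRAM has to run on the CPU, which is much slower.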
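A quick way to apply the VRAM list above is to compare a model's size against your card's VRAM with some headroom. This is a minimal sketch: the function name `fits_in_vram`, the 1024 MB headroom, and the example model sizes (approximate 4-bit sizes for 7B and 13B models, taken as half a byte per parameter plus about 20%) are all assumptions.

```shell
# Does a model of model_mb fit entirely in vram_mb of VRAM?
# Assumption: keep about 1024 MB of VRAM free for context cache and the display.
fits_in_vram() {
  local model_mb=$1 vram_mb=$2 headroom_mb=1024
  if [ $(( model_mb + headroom_mb )) -le "$vram_mb" ]; then
    echo "fits"
  else
    echo "too large"
  fi
}

fits_in_vram 4200 8192    # ~7B at 4-bit on an 8GB card:   prints fits
fits_in_vram 7800 8192    # ~13B at 4-bit on an 8GB card:  prints too large
fits_in_vram 7800 12288   # ~13B at 4-bit on a 12GB card:  prints fits
```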
CPU
CPU-only inference is possible but slow. If you're running without a GPU:
- Modern multi-core CPU (Intel i5/i7 10th gen+, AMD Ryzen 5/7 3000+) works
- More cores help (parallelization)
- Clock speed matters less than core count for these workloads, and memory bandwidth often limits token generation more than either
Storage
Model files are large:
- Small models (1-2B): 1-4GB
- 7B models: 4-8GB
- 13B models: 8-16GB
- 70B models: 40-80GB
An SSD is much faster than HDD for loading models. Once loaded, storage speed matters less (model stays in RAM/VRAM).
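Before downloading a model, it can help to confirm there is room for it. The sketch below uses the POSIX `df -Pk` output format; the function name `has_free_space` and the 8GB threshold (the upper end of the 7B range above) are illustrative choices.

```shell
# Check whether a directory's filesystem has enough free space for a download.
# df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
has_free_space() {
  local dir=$1 needed_gb=$2
  local free_kb
  free_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  [ "$free_kb" -ge $(( needed_gb * 1024 * 1024 )) ]
}

if has_free_space "$HOME" 8; then
  echo "Enough room for a 7B model"
else
  echo "Free up space before downloading"
fi
```

Point the first argument at whichever directory your model runner actually stores files in.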
Realistic Minimum Configurations
Bare minimum (CPU only, slow):
- 16GB RAM
- Modern quad-core CPU
- 10GB free disk space
- Can run: TinyLlama (1.1B), Phi-2 (2.7B) at 3-8 tok/s
Comfortable minimum (GPU helps a lot):
- 16GB RAM
- GPU with 8GB+ VRAM (GTX 1080, RTX 3060, or better)
- 20GB free disk space
- Can run: Llama 3.1 8B (quantized) at 15-25 tok/s
Good experience (recommended):
- 32GB RAM
- GPU with 12GB+ VRAM (RTX 3060 12GB, RTX 4070)
- 50GB free disk space
- Can run: Llama 2 13B or Mistral 7B at 25-40 tok/s
High-end (approaching cloud quality):
- 64GB RAM
- GPU with 20GB+ VRAM (RTX 4090, A100)
- 100GB+ free disk space
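- Can run: Llama 3.1 70B (quantized) at 15-25 tok/s when the whole model fits in VRAM (roughly 40GB+, for example an 80GB A100); on a single 24GB card, part of the model runs on the CPU and speeds drop sharply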
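The four tiers above can be expressed as a simple lookup on total RAM and VRAM. This is a sketch, not a benchmark: the function name `hardware_tier` is hypothetical, the thresholds are copied from the lists above, and disk space and CPU are ignored.

```shell
# Map total RAM and VRAM (both in whole GB) to the tiers listed above.
# Pass 0 for VRAM if there is no dedicated GPU.
hardware_tier() {
  local ram_gb=$1 vram_gb=$2
  if [ "$ram_gb" -ge 64 ] && [ "$vram_gb" -ge 20 ]; then
    echo "high-end"
  elif [ "$ram_gb" -ge 32 ] && [ "$vram_gb" -ge 12 ]; then
    echo "good experience"
  elif [ "$ram_gb" -ge 16 ] && [ "$vram_gb" -ge 8 ]; then
    echo "comfortable minimum"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "bare minimum"
  else
    echo "below minimum"
  fi
}

hardware_tier 16 0    # prints: bare minimum
hardware_tier 32 12   # prints: good experience
```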
How to Check Your Hardware
On Windows:
- Press Ctrl+Shift+Esc to open Task Manager
- Go to Performance tab
- Check: Memory (total), GPU (VRAM available)
On macOS:
- Click Apple menu → About This Mac
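- Check: Memory, Graphics (Apple Silicon Macs list Chip instead of Graphics; their GPU shares system memory rather than having separate VRAM)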
On Linux:
# Check RAM
free -h
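# Check GPU (some NVIDIA GPUs are listed as "3D controller" rather than VGA)
lspci | grep -iE 'vga|3d|display'
# NVIDIA only: shows GPU model, driver version, and VRAM usage
nvidia-smi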
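If you would rather get just the numbers you need, the commands above can be condensed into a short summary. This sketch assumes Linux (it reads `/proc/meminfo`) and only reports VRAM for NVIDIA cards; the helper name `total_ram_gb` is illustrative.

```shell
# Print total RAM and, if an NVIDIA driver is installed, GPU name and VRAM.
# MemTotal in /proc/meminfo is reported in kB.
total_ram_gb() {
  awk '/^MemTotal:/ {printf "%.1f\n", $2 / 1024 / 1024}' /proc/meminfo
}

echo "RAM: $(total_ram_gb) GB"
if command -v nvidia-smi >/dev/null 2>&1; then
  # One line per GPU, e.g. name followed by total memory in MiB
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "GPU: nvidia-smi not found (no NVIDIA driver, or a different GPU vendor)"
fi
```

The reported RAM is usually slightly below the installed amount because the kernel and firmware reserve some memory.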
Check your current hardware using the methods above. Write down: total RAM, GPU model (or "none" for integrated graphics), and approximate VRAM if applicable. Then look up your GPU in a published GPU hierarchy (benchmark ranking) chart to see where it sits relative to the RTX 3060, which serves as the baseline here.