Hardware Minimums — What is Local AI — And Why It Matters (Chapter 7)

The Components That Matter

Running local AI requires certain hardware capabilities. Let's be specific about what you actually need.

RAM (System Memory)

RAM holds the model weights during inference. More RAM = larger models you can run.

Requirements by model size (quantized):

1-2B parameters: 4-8GB RAM
7B parameters: 8-16GB RAM
13B parameters: 16-32GB RAM
70B parameters: 32-64GB RAM

Key point: You need RAM for both the model AND your operating system and other programs. If you have 16GB RAM and want to run a 7B model, you'll have very little RAM left for anything else. 32GB is more comfortable.

GPU (Graphics Processing Unit)

GPUs accelerate inference dramatically. A mid-range GPU (RTX 3060) can run a 7B model at 20-30 tokens/second. The same model on CPU might run at 5 tokens/second.

GPU VRAM (Video RAM) is especially important:

4GB VRAM: Can run small models (1-3B) comfortably
8GB VRAM: Can run 7B models (quantized)
12GB VRAM: Can run 13B models comfortably, or 7B at higher quality
16GB+ VRAM: Can run larger models (30B+)

Important: Not all RAM is equal. If you have a GPU without enough VRAM, system RAM won't help—GPU inference needs VRAM specifically.

CPU

CPU-only inference is possible but slow. If you're running without a GPU:

Modern multi-core CPU (Intel i5/i7 10th gen+, AMD Ryzen 5/7 3000+) works
More cores help (parallelization)
Clock speed matters less than core count for these workloads

Storage

Model files are large:

Small models (1-2B): 1-4GB
7B models: 4-8GB
13B models: 8-16GB
70B models: 40-80GB

An SSD is much faster than HDD for loading models. Once loaded, storage speed matters less (model stays in RAM/VRAM).

Realistic Minimum Configurations

Bare minimum (CPU only, slow):

16GB RAM
Modern quad-core CPU
10GB free disk space
Can run: TinyLlama (1.1B), Phi-2 (2.7B) at 3-8 tok/s

Comfortable minimum (GPU helps a lot):

16GB RAM
GPU with 8GB+ VRAM (GTX 1080, RTX 3060, or better)
20GB free disk space
Can run: Llama 3.2 7B at 15-25 tok/s

Good experience (recommended):

32GB RAM
GPU with 12GB+ VRAM (RTX 3060 Ti, RTX 4070)
50GB free disk space
Can run: Llama 3.2 13B or Mistral 7B at 25-40 tok/s

High-end (approaching cloud quality):

64GB RAM
GPU with 20GB+ VRAM (RTX 4090, A100)
100GB+ free disk space
Can run: Llama 3.1 70B at 15-25 tok/s

How to Check Your Hardware

On Windows:

Press Ctrl+Shift+Esc to open Task Manager
Go to Performance tab
Check: Memory (total), GPU (VRAM available)

On macOS:

Click Apple menu → About This Mac
Check: Memory, Graphics

On Linux:

# Check RAM
free -h

# Check GPU
lspci | grep -i vga
nvidia-smi  # if NVIDIA