RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
What is Local AI — And Why It Matters

07. Hardware Minimums

Chapter 7 of 20 · 18 min
KEY INSIGHT

For useful local AI, you need 16GB+ RAM and ideally a GPU with 8GB+ VRAM—the minimums exist because model weights must fit in memory during inference, and GPUs accelerate this by 5-10x compared to CPU.

The Components That Matter

Running local AI requires certain hardware capabilities. Let's be specific about what you actually need.

RAM (System Memory)

RAM holds the model weights when inference runs on the CPU. More RAM lets you run larger models.

Requirements by model size (quantized):

  • 1-2B parameters: 4-8GB RAM
  • 7B parameters: 8-16GB RAM
  • 13B parameters: 16-32GB RAM
  • 70B parameters: 48-64GB RAM or more (the quantized file alone is about 40GB)

Key point: You need RAM for the model and for your operating system and other programs. With 16GB RAM, a quantized 7B model can take 4-8GB, and the OS, a browser, and other apps can claim much of the rest. 32GB is more comfortable.
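
The ranges above follow from simple arithmetic: the weights take roughly (parameters × bits per weight ÷ 8) bytes. Here is a minimal sketch, assuming 1 GB = 10^9 bytes. The function names and the 20% overhead allowance for context and runtime buffers are illustrative assumptions, not figures from any particular tool.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def ram_needed_gb(params_billion: float, bits_per_weight: float,
                  overhead: float = 1.2) -> float:
    """Weights plus an assumed 20% allowance for context and buffers."""
    return weights_gb(params_billion, bits_per_weight) * overhead


if __name__ == "__main__":
    # Compare 4-bit and 8-bit quantization for the sizes listed above.
    for size in (1, 7, 13, 70):
        print(f"{size}B: 4-bit ~{weights_gb(size, 4):.1f} GB, "
              f"8-bit ~{weights_gb(size, 8):.1f} GB")
```

Whatever the estimate says, leave several GB on top for the operating system and open applications.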

GPU (Graphics Processing Unit)

GPUs accelerate inference dramatically. A mid-range GPU (RTX 3060) can run a 7B model at 20-30 tokens/second. The same model on CPU might run at 5 tokens/second.
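
To make those rates concrete, the sketch below converts tokens per second into waiting time for one reply. The 500-token reply length is an assumption for illustration; the rates are the ones quoted above, and prompt processing time is ignored.

```python
def seconds_for_reply(n_tokens: int, tokens_per_second: float) -> float:
    """Time to generate n_tokens at a steady rate (ignores prompt processing)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return n_tokens / tokens_per_second


if __name__ == "__main__":
    reply = 500  # assumed reply length in tokens
    for label, rate in (("GPU, 25 tok/s", 25.0), ("CPU, 5 tok/s", 5.0)):
        print(f"{label}: {seconds_for_reply(reply, rate):.0f} s")
```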

GPU VRAM (Video RAM) is especially important:

  • 4GB VRAM: Can run small models (1-3B) comfortably
  • 8GB VRAM: Can run 7B models (quantized)
  • 12GB VRAM: Can run 13B models comfortably, or 7B at higher quality
  • 16GB+ VRAM: Can run 13B models at higher quality; 30B+ models at 4-bit generally need 24GB or more

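Important: Not all RAM is equal. GPU inference reads weights from VRAM, so system RAM is not a substitute. Some tools (llama.cpp, for example) can keep the layers that do not fit in VRAM in system RAM and run them on the CPU, but that is much slower than running fully on the GPU.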
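
The VRAM tiers above can be approximated with a quick fit check. This is a rough sketch, not a guarantee: the 1.5GB reserve for context and driver overhead is an assumption, and both function names are made up for this example.

```python
def fits_in_vram(params_billion: float, bits_per_weight: float,
                 vram_gb: float, reserve_gb: float = 1.5) -> bool:
    """True if the weights plus an assumed reserve fit in the given VRAM."""
    weights = params_billion * bits_per_weight / 8  # GB, 1 GB = 1e9 bytes
    return weights + reserve_gb <= vram_gb


def highest_precision_that_fits(params_billion: float, vram_gb: float,
                                options=(16, 8, 4)):
    """Return the largest bits-per-weight option that fits, or None."""
    for bits in sorted(options, reverse=True):
        if fits_in_vram(params_billion, bits, vram_gb):
            return bits
    return None


if __name__ == "__main__":
    for size, vram in ((3, 4), (7, 8), (7, 12), (13, 12)):
        bits = highest_precision_that_fits(size, vram)
        print(f"{size}B on {vram}GB VRAM: {bits}-bit" if bits
              else f"{size}B on {vram}GB VRAM: does not fit")
```

Under these assumptions a 7B model fits an 8GB card at 4-bit and a 12GB card at 8-bit, which is what "7B at higher quality" means in practice.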

CPU

CPU-only inference is possible but slow. If you're running without a GPU:

  • Modern multi-core CPU (Intel i5/i7 10th gen+, AMD Ryzen 5/7 3000+) works
  • More cores help (parallelization)
  • Clock speed matters less than core count for these workloads, though memory bandwidth often limits token generation before core count does

Storage

Model files are large:

  • Small models (1-2B): 1-4GB
  • 7B models: 4-8GB
  • 13B models: 8-16GB
  • 70B models: 40-80GB

An SSD is much faster than an HDD for loading models. Once a model is loaded, storage speed matters less because the model stays in RAM or VRAM.
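
Before downloading a model, it can help to confirm the drive has room for it. Here is a minimal sketch using Python's standard `shutil.disk_usage`; the helper names and the 2GB safety margin are assumptions for illustration.

```python
import shutil


def free_gb(path: str = ".") -> float:
    """Free space on the drive containing `path`, in GB (1 GB = 1e9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


def has_room_for(model_gb: float, path: str = ".",
                 margin_gb: float = 2.0) -> bool:
    """True if the model file plus an assumed safety margin fits on disk."""
    return free_gb(path) >= model_gb + margin_gb


if __name__ == "__main__":
    print(f"Free space here: {free_gb():.1f} GB")
    for name, size in (("7B (about 8GB)", 8), ("70B (about 40GB)", 40)):
        print(f"{name}: {'ok' if has_room_for(size) else 'not enough space'}")
```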

Realistic Minimum Configurations

Bare minimum (CPU only, slow):

  • 16GB RAM
  • Modern quad-core CPU
  • 10GB free disk space
  • Can run: TinyLlama (1.1B), Phi-2 (2.7B) at 3-8 tok/s

Comfortable minimum (GPU helps a lot):

  • 16GB RAM
  • GPU with 8GB+ VRAM (GTX 1080, RTX 3060, or better)
  • 20GB free disk space
  • Can run: Llama 3.1 8B at 15-25 tok/s

Good experience (recommended):

  • 32GB RAM
  • GPU with 12GB+ VRAM (RTX 3060 12GB, RTX 4070)
  • 50GB free disk space
  • Can run: Llama 2 13B or Mistral 7B at 25-40 tok/s

High-end (approaching cloud quality):

  • 64GB RAM
  • GPU with 20GB+ VRAM (RTX 4090, A100)
  • 100GB+ free disk space
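  • Can run: Llama 3.1 70B at 15-25 tok/s when the 4-bit weights (roughly 40GB) fit entirely in VRAM, as on an 80GB A100; a single 24GB RTX 4090 needs partial CPU offload and runs noticeably slower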
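
The four configurations above can be read as a simple lookup on RAM and VRAM. The sketch below encodes only the RAM and VRAM thresholds from those lists; the function name and the "below minimum" label are hypothetical, and CPU and disk space are ignored.

```python
def hardware_tier(ram_gb: float, vram_gb: float = 0.0) -> str:
    """Map RAM and VRAM to the tiers listed above (checked from the top down)."""
    if ram_gb >= 64 and vram_gb >= 20:
        return "high-end"
    if ram_gb >= 32 and vram_gb >= 12:
        return "good experience"
    if ram_gb >= 16 and vram_gb >= 8:
        return "comfortable minimum"
    if ram_gb >= 16:
        return "bare minimum"
    return "below minimum"


if __name__ == "__main__":
    for ram, vram in ((8, 0), (16, 0), (16, 8), (32, 12), (64, 24)):
        print(f"{ram}GB RAM, {vram}GB VRAM -> {hardware_tier(ram, vram)}")
```

A machine lands in the highest tier for which it meets both thresholds, so 32GB RAM with a 24GB card still counts as "good experience" until RAM reaches 64GB.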

How to Check Your Hardware

On Windows:

  1. Press Ctrl+Shift+Esc to open Task Manager
  2. Go to Performance tab
  3. Check: Memory (total installed RAM) and GPU (VRAM is listed as "Dedicated GPU memory")

On macOS:

  1. Click Apple menu → About This Mac
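  2. Check: Memory, and Graphics (Intel Macs) or Chip (Apple silicon, where the GPU shares unified memory with the CPU instead of having separate VRAM)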

On Linux:

# Check RAM
free -h

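# Check GPU (some GPUs are listed as "3D controller" rather than "VGA")
lspci | grep -iE 'vga|3d|display'
nvidia-smi  # if NVIDIA; shows GPU model and VRAM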
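
The same checks can be scripted. This is a best-effort sketch for Linux that reads `/proc/meminfo` and, when `nvidia-smi` is installed, asks it for each GPU's name and total memory. The helper names are made up for this example, and non-NVIDIA GPUs are not detected.

```python
import shutil
import subprocess


def parse_meminfo_total_gb(meminfo_text: str):
    """Return MemTotal from /proc/meminfo text in GB, or None if absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in KiB
            return kib * 1024 / 1e9
    return None


def parse_gpu_csv(csv_text: str):
    """Parse 'name, MiB' lines into (name, vram_gb) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        name, mib = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mib) * 1024 * 1024 / 1e9))
    return gpus


def detect():
    """Return (ram_gb or None, list of (gpu_name, vram_gb))."""
    try:
        with open("/proc/meminfo") as f:
            ram = parse_meminfo_total_gb(f.read())
    except OSError:
        ram = None  # not Linux, or /proc is unavailable
    gpus = []
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name,memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=False)
        if result.returncode == 0:
            gpus = parse_gpu_csv(result.stdout)
    return ram, gpus


if __name__ == "__main__":
    ram_gb, gpu_list = detect()
    print("RAM:", f"{ram_gb:.1f} GB" if ram_gb else "unknown")
    print("GPUs:", gpu_list or "none detected via nvidia-smi")
```

The values are printed in GB (10^9 bytes), so a card sold as 12GB shows up as about 12.9.
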
EXERCISE

Check your current hardware using the methods above. Write down: total RAM, GPU model (or "none" for integrated graphics), approximate VRAM if applicable. Then use a resource like "GPU hierarchy" to understand where your GPU sits relative to RTX 3060 as a baseline.
