Hardware Planning for Local AI

17. Future-Proofing

Chapter 17 of 20 · 15 min

Hardware planning should account for model size trends and upcoming capabilities. Future-proofing means avoiding premature obsolescence.

Model Size Trajectory

Current trends indicate ongoing growth:

Year	Popular Models	Typical Size (parameters)
2022	GPT-3 style	175B
2023	Llama 2	7B, 13B, 70B
2024	Llama 3	8B, 70B
2025 (projected)	100B+ models	70B-200B

The largest models keep growing, driven by research and commercial interest, while the 7B-70B range remains the practical target for most local hardware.

VRAM Requirements Forecast

VRAM demand rises with parameter count and context length, while better quantization pulls the per-model figure down:

Model	VRAM (2024)	VRAM (estimated 2026)
7B	14GB FP16 / 4GB INT4	3.5GB INT4
70B	140GB FP16 / 40GB INT4	35GB INT4
128B (est)	N/A	64GB INT4
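
The weight figures in a table like this follow from simple arithmetic: parameter count times bytes per weight. The sketch below is a rough estimate only; the function name `weight_vram_gb` is hypothetical, 1 GB is taken as 10^9 bytes, and the result ignores the KV cache, activations, and runtime overhead, so real usage will be somewhat higher.

```python
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # billions of parameters * bytes per parameter = gigabytes
    return params_billions * bytes_per_weight


for size in (7, 70, 128):
    fp16 = weight_vram_gb(size, 16)
    int4 = weight_vram_gb(size, 4)
    print(f"{size}B: {fp16:.1f} GB FP16, {int4:.1f} GB INT4")
```

Treat the output as a floor: a quantized file usually carries extra metadata and some layers kept at higher precision, which is why practical INT4 figures tend to sit a little above the raw number.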

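Context length increases compound VRAM requirements. The KV cache grows linearly with the number of tokens, so the upper end of each range is the plain multiple of the baseline; the lower end assumes savings such as cache quantization or sliding-window attention:

4K context: Baseline
32K context: 4-8x the baseline KV cache
128K context: 16-32x the baseline KV cache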
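
To make the scaling concrete, here is a minimal sketch of the usual KV cache estimate: two tensors (keys and values) per layer, each sized by KV heads, head dimension, and token count. The model shape used (32 layers, 32 KV heads, head dimension 128, FP16 cache) is an assumed 7B-class configuration for illustration, and `kv_cache_bytes` is a hypothetical helper, not part of any framework.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate KV cache size for one sequence with a full (non-windowed) cache."""
    # Leading 2 = one key tensor plus one value tensor per layer
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed 7B-class shape: 32 layers, 32 KV heads, head_dim 128, FP16 values
baseline = kv_cache_bytes(32, 32, 128, 4096)
for ctx in (4096, 32768, 131072):
    size = kv_cache_bytes(32, 32, 128, ctx)
    print(f"{ctx:>7} tokens: {size / 2**30:.1f} GiB ({size // baseline}x baseline)")
```

Models that use grouped-query attention have fewer KV heads than attention heads, which shrinks every row by the same factor; passing a smaller `n_kv_heads` shows the effect.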

Upgrade Roadmap

Current GPU	Upgrade Path	Timeline
RTX 3060 8GB	RTX 4060 Ti 16GB	Immediate
RTX 3060 12GB	RTX 4080	1-2 years
RTX 3080 10GB	RTX 4090 / RTX 5090	1-3 years
RTX 3090 24GB	RTX 5090 (when available)	2-4 years

Supporting Infrastructure

Prioritize investments in infrastructure that survives GPU upgrades:

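PSU headroom: Buy 1000W instead of 850W to leave room for a higher-power GPU
Motherboard PCIe 5.0: Next-gen GPUs can use the extra bandwidth; PCIe is backward compatible, so it is not strictly required
RAM capacity: 64GB now accommodates future workloads
Case airflow: More cooling capacity than the current GPU minimally requires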
Storage speed: PCIe 5.0 NVMe for faster model loading
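
The PSU recommendation can be checked with a quick headroom calculation: add up estimated component draw and apply a safety margin. This is a rough sketch; the wattages, the 30% margin, and the function name `recommended_psu_watts` are illustrative assumptions, not specifications for any particular card or build.

```python
def recommended_psu_watts(gpu_watts, cpu_watts, other_watts=100, headroom=0.3):
    """Sum estimated component draw and add a safety margin."""
    load = gpu_watts + cpu_watts + other_watts
    return load * (1 + headroom)


# Placeholder wattages for a current GPU and a hungrier future upgrade
for label, gpu_w in (("current GPU", 320), ("future GPU", 450)):
    need = recommended_psu_watts(gpu_w, cpu_watts=125)
    print(f"{label}: about {need:.0f} W recommended")
```

With these assumed numbers, an 850W unit covers the current card but not the upgrade, while a 1000W unit covers both. Substitute the rated power of your own components before relying on the result.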

Framework Evolution

AI frameworks evolve quickly:

llama.cpp: Active development, efficient quantization
vLLM: PagedAttention allocates the KV cache in small blocks, cutting memory lost to fragmentation and over-reservation
ExLlamaV2: Hardware-optimized kernels

A newer framework release can let the same GPU run larger models or longer contexts, so check for software updates before buying new hardware.

EXERCISE

Estimate your VRAM requirements in 2 years for your primary use case. Research the models likely to be available and calculate whether your planned build supports them.
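
One way to structure the calculation is a simple fit check: add the weight and KV cache estimates and compare the total with the usable VRAM of each candidate card. The sketch below assumes 10% of VRAM is reserved for runtime overhead, which is a guess you should adjust; `fits` and the example scenario are hypothetical.

```python
def fits(weights_gb, kv_cache_gb, vram_gb, overhead_fraction=0.1):
    """True if weights plus KV cache fit after reserving a share of VRAM for overhead."""
    usable = vram_gb * (1 - overhead_fraction)
    return weights_gb + kv_cache_gb <= usable


# Hypothetical scenario: 70B model at INT4 (35 GB of weights) with a 5 GB KV cache
for vram in (16, 24, 32, 48):
    verdict = "fits" if fits(35, 5, vram) else "does not fit"
    print(f"{vram} GB VRAM: {verdict}")
```

Replace the scenario with the model size, quantization, and context length you expect to need in two years, then rerun the check against the VRAM of your planned build.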