17. Future-Proofing
Chapter 17 of 20 · 15 min
Hardware planning should account for model size trends and upcoming capabilities. Future-proofing means avoiding premature obsolescence.
Model Size Trajectory
Current trends indicate ongoing growth:
| Year | Popular Models | Typical Size |
|---|---|---|
| 2022 | GPT-3 style | 175B |
| 2023 | Llama 2 | 7B, 13B, 70B |
| 2024 | Llama 3 | 8B, 70B |
| 2025 (projected) | 100B+ models | 70B-200B |
The trend toward larger models continues, driven by research and commercial interest.
VRAM Requirements Forecast
VRAM needs scale with parameter count, numeric precision, and context length:
| Model | VRAM (2024) | VRAM (estimated 2026) |
|---|---|---|
| 7B | 14GB FP16 / 4GB INT4 | 3.5GB INT4 |
| 70B | 140GB FP16 / 40GB INT4 | 35GB INT4 |
| 128B (est) | N/A | 64GB INT4 |
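The weight figures in the table follow from simple arithmetic: parameter count times bytes per parameter. A minimal sketch of that calculation, assuming weights only (no KV cache, activations, or runtime overhead); the function name `weight_vram_gb` is illustrative and not taken from any library:

```python
def weight_vram_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision and there is
    no quantization metadata, so real files are usually somewhat larger.
    """
    return params_billions * bits_per_param / 8


for size_b in (7, 70, 128):
    fp16 = weight_vram_gb(size_b, 16)
    int4 = weight_vram_gb(size_b, 4)
    print(f"{size_b}B: {fp16:.1f} GB at FP16, {int4:.1f} GB at INT4")
```

Practical INT4 figures such as 4GB for 7B or 40GB for 70B sit slightly above these numbers because quantization scales and runtime buffers add to the raw weight size.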
Context length increases compound VRAM requirements, because the KV cache grows roughly linearly with the number of tokens held in context:
- 4K context: Baseline
- 32K context: 4-8x KV cache
- 128K context: 16-32x KV cache
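To see where multipliers like these come from, the KV cache size can be estimated from a model's architecture. The sketch below assumes a Llama-2-7B-like configuration (32 layers, 32 KV heads, head dimension 128) with an FP16 cache and a single sequence; these defaults and the function name `kv_cache_bytes` are assumptions for illustration, and models that use grouped-query attention or a quantized cache need less.

```python
def kv_cache_bytes(
    seq_len: int,
    n_layers: int = 32,       # assumed: Llama-2-7B-like
    n_kv_heads: int = 32,     # assumed: no grouped-query attention
    head_dim: int = 128,      # assumed
    bytes_per_elem: int = 2,  # FP16 cache
) -> int:
    """Bytes needed to cache keys and values for one sequence."""
    # Factor 2 covers one key tensor and one value tensor per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return per_token * seq_len


baseline = kv_cache_bytes(4 * 1024)
for tokens in (4 * 1024, 32 * 1024, 128 * 1024):
    size = kv_cache_bytes(tokens)
    print(f"{tokens // 1024}K context: {size / 2**30:.0f} GiB "
          f"({size // baseline}x the 4K baseline)")
```

Under these assumptions the cache is 2 GiB at 4K, 16 GiB at 32K, and 64 GiB at 128K, which corresponds to the upper end of each range above.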
Upgrade Roadmap
| Current GPU | Upgrade Path | Timeline |
|---|---|---|
| RTX 3060 8GB | RTX 4060 Ti 16GB | Immediate |
| RTX 3060 12GB | RTX 4080 | 1-2 years |
| RTX 3080 10GB | RTX 4090 / RTX 5090 | 1-3 years |
| RTX 3090 24GB | RTX 5090 (when available) | 2-4 years |
Supporting Infrastructure
Prioritize investments in infrastructure that survives GPU upgrades:
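- PSU headroom: Buy 1000W instead of 850W to leave room for a higher-power GPU later
- Motherboard PCIe 5.0: Next-gen GPUs may use it; PCIe is backward compatible, so a newer card still works in an older slot, at reduced bandwidth
- RAM capacity: 64GB now accommodates future workloads
- Case airflow: Choose more cooling capacity than the current build minimally requires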
- Storage speed: PCIe 5.0 NVMe for faster model loading
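The PSU recommendation can be checked with a rough power budget. The sketch below is one way to do it, assuming a 30% headroom factor and a fixed list of common PSU sizes; the wattages in the usage lines are illustrative inputs, not measurements, and the function name `recommended_psu_watts` is hypothetical.

```python
def recommended_psu_watts(
    gpu_watts: float,
    cpu_watts: float,
    other_watts: float = 100.0,  # assumed: drives, fans, motherboard
    headroom: float = 0.30,      # assumed safety margin, not a standard
    sizes: tuple = (650, 750, 850, 1000, 1200, 1600),
) -> int:
    """Smallest listed PSU size that covers peak load plus headroom."""
    target = (gpu_watts + cpu_watts + other_watts) * (1 + headroom)
    for size in sizes:
        if size >= target:
            return size
    raise ValueError(f"No listed PSU covers {target:.0f} W")


# Illustrative inputs: a 320 W GPU today, a 450 W GPU after an upgrade.
print(recommended_psu_watts(gpu_watts=320, cpu_watts=150))
print(recommended_psu_watts(gpu_watts=450, cpu_watts=150))
```

With these inputs the smaller GPU fits a 750W unit while the larger one needs 1000W, which is the kind of gap that sizing for the upgrade path is meant to cover.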
Framework Evolution
AI frameworks evolve quickly:
- llama.cpp: Active development, efficient quantization
- vLLM: PagedAttention allocates the KV cache in fixed-size blocks, cutting memory lost to fragmentation and over-reservation
- ExLlamaV2: Hardware-optimized kernels
A newer framework release can let the same GPU run larger models or longer contexts, so check for software gains before buying hardware.
EXERCISE
Estimate your VRAM requirements in 2 years for your primary use case. Research the models likely to be available and calculate whether your planned build supports them.
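One way to structure the calculation is to add weights, KV cache, and a fixed overhead, then compare the total against the planned GPU. This is a minimal sketch: the 2GB overhead default is an assumed placeholder for activations and runtime buffers, and the function name `fits_in_vram` is hypothetical.

```python
def fits_in_vram(
    vram_gb: float,
    params_billions: float,
    bits_per_param: int,
    kv_cache_gb: float,
    overhead_gb: float = 2.0,  # assumed placeholder; measure for your stack
) -> tuple:
    """Return (fits, needed_gb) for a weights + KV cache + overhead estimate."""
    weights_gb = params_billions * bits_per_param / 8
    needed_gb = weights_gb + kv_cache_gb + overhead_gb
    return needed_gb <= vram_gb, needed_gb


# Example: a 24GB card, INT4 weights, 4GB reserved for the KV cache.
for size_b in (13, 70):
    ok, needed = fits_in_vram(24, size_b, 4, kv_cache_gb=4)
    print(f"{size_b}B needs about {needed:.1f} GB: {'fits' if ok else 'does not fit'}")
```

Replace the inputs with the model sizes and context lengths you expect to use in two years.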