What runs on Dual RTX 3090 (24 GB × 2)?
The reference dual-GPU local-AI rig. NVLink optional. 48 GB total / ~46 GB effective with tensor parallelism. The cheapest path to 70B-class models at 2025-2026 prices.
Step-by-step setup with NVLink bridge verification, vLLM tensor-parallel-2 configuration, and operator-grade failure modes.
PCIe plus an optional NVLink bridge between two RTX 3090s does NOT pool VRAM the way Apple unified memory does. Each card holds its share of the model weights via tensor parallelism (vLLM / SGLang) or pipeline parallelism (llama.cpp layer split). Effective VRAM is roughly the 48 GB total minus ~2 GB (about 1 GB per card) for activations and runtime overhead, or ~46 GB. Concretely: a 70B Q4 model (~40 GB weights) fits with ~6 GB of headroom for context and KV cache. Any claim of 48 GB of pooled memory is wrong.
Two cards with an NVLink bridge. NVLink (~112.5 GB/s bidirectional) keeps tensor parallelism efficient but does NOT pool memory. Each card holds its share of the weights via tensor or pipeline parallelism. Effective capacity is ~46 GB of the 48 GB total.
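The arithmetic behind those figures can be sketched in a few lines. This is a minimal illustration, not the site's own calculator: the function names are hypothetical, and the ~2 GB total overhead is the value implied by the ~46 GB effective figure quoted above.

```python
def effective_vram_gb(per_card_gb: float, cards: int, overhead_total_gb: float) -> float:
    """Usable VRAM across the combo after reserving runtime overhead.

    The overhead value is an assumption (about 2 GB total, matching the
    ~46 GB effective figure quoted for this combo).
    """
    return per_card_gb * cards - overhead_total_gb


def headroom_gb(effective_gb: float, weights_gb: float) -> float:
    """VRAM left for context and KV cache once the weights are loaded."""
    return effective_gb - weights_gb


effective = effective_vram_gb(per_card_gb=24, cards=2, overhead_total_gb=2)
headroom = headroom_gb(effective, weights_gb=40)  # 70B at Q4, ~40 GB of weights
weights_per_card = 40 / 2  # with an even split, each card holds half the weights

print(f"effective={effective} GB, headroom={headroom} GB, per-card weights={weights_per_card} GB")
```

Note that the headroom is also split across the two cards, so neither card ever sees the full remaining capacity as one contiguous pool.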
See the multi-GPU guide for topology tradeoffs, and the RunLocalAI Will-It-Run Framework for the citable fit-tier method.
Topology
- 2×rtx-3090
Models that fit comfortably (24)
Effective VRAM utilization ≤ 85% at the smallest production quant. Comfortable headroom for KV cache.
Borderline (12)
Fits but with little headroom. KV cache for long context may not fit; verify before deployment.
Effective VRAM utilization >113% — KV cache for long context will not fit. Cap context at ~4-8K or move to a larger combo.
Effective VRAM utilization >104% — KV cache for long context will not fit. Cap context at ~4-8K or move to a larger combo.
Not practical (8)
Model weights exceed effective combo VRAM. Even with the recommended split strategy, this configuration won't run cleanly. Drop to a smaller quant or move to a larger combo.
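The three tiers above can be expressed as a small classifier. This is a sketch under stated assumptions, not the Will-It-Run Framework itself: only the 85% comfortable threshold comes from this page. The names `FitTier` and `fit_tier`, and the reading of "utilization" as weights plus KV cache at the target context divided by effective VRAM, are assumptions made for illustration.

```python
from enum import Enum


class FitTier(Enum):
    COMFORTABLE = "comfortable"
    BORDERLINE = "borderline"
    NOT_PRACTICAL = "not practical"


def fit_tier(
    weights_gb: float,
    working_set_gb: float,
    effective_vram_gb: float,
    comfortable_max: float = 0.85,  # threshold quoted on this page
) -> FitTier:
    """Classify a model against a combo's effective VRAM.

    working_set_gb is assumed to mean weights plus KV cache at the
    target context length (an assumption, not the site's definition).
    """
    if weights_gb > effective_vram_gb:
        # The weights alone do not fit, regardless of context length.
        return FitTier.NOT_PRACTICAL
    if working_set_gb / effective_vram_gb <= comfortable_max:
        return FitTier.COMFORTABLE
    # Weights fit, but the full working set leaves little or no headroom.
    return FitTier.BORDERLINE


# Hypothetical inputs: 40 GB of weights, 48 GB working set at long context.
tier = fit_tier(weights_gb=40, working_set_gb=48, effective_vram_gb=46)
print(tier.value)
```

Under this reading, a borderline model can show utilization above 100% and still load, because the overflow comes from the long-context KV cache, which shrinks when context is capped.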
Benchmark opportunities
Estimates, not measurements. Pending benchmark targets for this combo. Once measured, results land in the catalog as benchmarks.
Reference benchmark for the dual-3090 NVLink prosumer build. vLLM tensor-parallel-2, AWQ-INT4, 8K context. Compare against dual-4090 PCIe (no NVLink) to isolate interconnect impact.
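A launch command for the configuration described above might be assembled as follows. This is a hedged sketch: the helper name and the model identifier are hypothetical placeholders, 8K is taken to mean 8192 tokens, and the flags shown are vLLM's standard `serve` options, which may vary between vLLM versions.

```python
import shlex


def vllm_serve_args(
    model: str,
    tensor_parallel_size: int = 2,   # one shard per RTX 3090
    quantization: str = "awq",       # AWQ-INT4 checkpoint
    max_model_len: int = 8192,       # 8K context, assumed to mean 8192 tokens
) -> list[str]:
    """Build the argument list for a vLLM OpenAI-compatible server."""
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--quantization", quantization,
        "--max-model-len", str(max_model_len),
    ]


# "your-org/your-70b-awq" is a placeholder, not a real checkpoint name.
args = vllm_serve_args("your-org/your-70b-awq")
print(shlex.join(args))
```

Building the command as a list keeps it safe to pass to `subprocess.run` without shell quoting issues, and makes it easy to swap the tensor-parallel size when comparing against another combo.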
Going deeper
- Full combo detail page — operational review with failure modes and runtime matrix.
- Multi-GPU buying guide — when multi-GPU is worth it and when it isn't.
- RunLocalAI Will-It-Run Framework — citable effective-VRAM, working-set, fit-tier, and evidence-tier method.
- Will-it-run home — single-card check + custom builds.