What runs on Dual RTX 4090 (24 GB × 2)?
Two consumer-flagship cards. PCIe 4.0 only — no NVLink on 4090. 48 GB total / ~45 GB effective with tensor parallelism. ~30% faster decode than dual 3090 at 2× the cost.
Key topics: PCIe peer-to-peer verification (no NVLink), the FP8 path, and vLLM tensor-parallel-2 over PCIe.
Critical: the RTX 4090 has no NVLink. NVIDIA removed the connector, so two 4090s communicate only over PCIe: typically PCIe 4.0 x8 each on a consumer board, x16 each on a workstation board. Cross-card bandwidth is therefore ~32 GB/s per direction at x16 (about half that at x8), vs 112 GB/s on dual 3090 NVLink. For tensor parallelism this matters: expect a ~10-20% throughput penalty vs an NVLink-equipped pair. Effective VRAM is total minus ~2-3 GB per card for activations and KV cache; concretely, 70B Q4 fits with marginal headroom. Two 4090s do not pool to 48 GB usable, because runtime overhead and per-card activations cost real VRAM.
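The bandwidth figures above can be checked with a few lines of arithmetic. This is a minimal sketch: the per-lane rate follows from PCIe 4.0's 16 GT/s signalling with 128b/130b encoding, the 112 GB/s NVLink figure is taken from the text at face value, and the function name is illustrative.

```python
# PCIe 4.0 signals at 16 GT/s per lane with 128b/130b encoding,
# which works out to roughly 1.97 GB/s per lane in each direction.
PCIE4_GB_PER_S_PER_LANE = 16 * (128 / 130) / 8

# NVLink figure for dual 3090 as quoted in the text (not measured here).
NVLINK_3090_GB_PER_S = 112


def pcie4_bandwidth(lanes):
    """Theoretical one-direction PCIe 4.0 bandwidth in GB/s for a given lane count."""
    return lanes * PCIE4_GB_PER_S_PER_LANE


for lanes in (8, 16):
    bw = pcie4_bandwidth(lanes)
    ratio = NVLINK_3090_GB_PER_S / bw
    print(f"x{lanes}: {bw:.1f} GB/s, NVLink figure is {ratio:.1f}x higher")
```

At x16 the ratio lands inside the quoted 3-4× range; at x8, the common consumer-board case, the gap roughly doubles. These are theoretical link rates, so real peer-to-peer throughput will be lower.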
PCIe-only multi-GPU. No NVLink means cross-card bandwidth is 32 GB/s — 3-4× slower than NVLink. Tensor parallelism still works but with ~10-20% throughput penalty. Effective 45 GB of total 48 GB.
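To make the 45-of-48 GB figure concrete, here is a rough sketch of the arithmetic. The 1.5 GB per-card overhead is back-derived from the 45 GB number (the ~2-3 GB per card quoted earlier would give 42-44 GB instead), and 4.5 bits per weight is an assumed average for a 4-bit quant including scales. Both are illustrative assumptions, not measurements.

```python
def effective_vram_gb(cards, vram_per_card_gb, overhead_per_card_gb):
    """Usable VRAM across all cards after subtracting per-card runtime overhead."""
    return cards * (vram_per_card_gb - overhead_per_card_gb)


def weights_gb(params_billion, bits_per_weight):
    """Approximate weight footprint in GB for a quantized model."""
    return params_billion * bits_per_weight / 8


# Assumption: 1.5 GB overhead per card, chosen to reproduce the 45 GB figure.
usable = effective_vram_gb(cards=2, vram_per_card_gb=24, overhead_per_card_gb=1.5)

# Assumption: about 4.5 bits per weight for a 70B model at a 4-bit quant.
weights = weights_gb(params_billion=70, bits_per_weight=4.5)

print(f"usable: {usable:.1f} GB")
print(f"70B Q4 weights: {weights:.1f} GB")
print(f"left for KV cache and activations: {usable - weights:.1f} GB")
```

Under these assumptions only a few GB remain once the weights are loaded, which is why 70B Q4 is described as fitting with marginal headroom.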
See the multi-GPU guide for topology tradeoffs, and the RunLocalAI Will-It-Run Framework for the citable fit-tier method.
Topology
- 2× RTX 4090
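One way to confirm how the two cards are linked is to read the matrix printed by `nvidia-smi topo -m`, where NVLink appears as `NV#` and PCIe paths appear as codes such as `PIX`, `PXB`, `PHB`, `NODE`, or `SYS`. The sketch below parses that text. The helper names and the sample matrix are hand-written for illustration and are not captured from a real machine.

```python
import re

# Link codes from the legend printed by `nvidia-smi topo -m`.
LINK_CODE = re.compile(r"NV\d+|PIX|PXB|PHB|NODE|SYS")


def gpu_links(topo_text):
    """Collect link codes from the GPU rows of a topology matrix."""
    links = []
    for line in topo_text.splitlines():
        tokens = line.split()
        # Only rows (and the header) start with a GPU label; the header
        # contributes nothing because it contains no link codes.
        if tokens and re.fullmatch(r"GPU\d+", tokens[0]):
            links.extend(t for t in tokens[1:] if LINK_CODE.fullmatch(t))
    return links


def has_nvlink(topo_text):
    """True if any GPU-to-GPU entry is an NVLink (NV#) connection."""
    return any(code.startswith("NV") for code in gpu_links(topo_text))


# Illustrative sample for a PCIe-only pair (hand-written, hypothetical).
SAMPLE_PCIE = """\
        GPU0    GPU1    CPU Affinity
GPU0     X      PHB     0-31
GPU1    PHB      X      0-31
"""

print(gpu_links(SAMPLE_PCIE))
print(has_nvlink(SAMPLE_PCIE))
```

On a dual-4090 build, no `NV#` entry should appear; which PCIe code shows up depends on the motherboard's slot wiring.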
Models that fit comfortably (24)
Effective VRAM utilization ≤ 85% at the smallest production quant. Comfortable headroom for KV cache.
Borderline (12)
Fits but with little headroom. KV cache for long context may not fit; verify before deployment.
Effective VRAM utilization >107% — KV cache for long context will not fit. Cap context at ~4-8K or move to a larger combo.
Effective VRAM utilization >109% — KV cache for long context will not fit. Cap context at ~4-8K or move to a larger combo.
Not practical (8)
Model weights exceed effective combo VRAM. Even with the recommended split strategy, this configuration won't run cleanly. Drop to a smaller quant or move to a larger combo.
Model weights exceed effective combo VRAM. Even with the recommended split strategy, this configuration won't run cleanly.
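The three tiers above can be read as a simple decision rule. The sketch below is one interpretation of the tier descriptions on this page, not the Will-It-Run Framework's actual method. The function name, the `working_set_gb` parameter (weights plus KV cache and activations at the target context), and the sample numbers are assumptions for illustration.

```python
def fit_tier(weights_gb, working_set_gb, effective_vram_gb, comfortable_max=0.85):
    """Classify a model into a fit tier from its footprint and the combo's usable VRAM.

    weights_gb:        model weights at the chosen quant
    working_set_gb:    weights + KV cache + activations at the target context
    effective_vram_gb: usable VRAM of the combo (45 GB for dual 4090 per this page)
    """
    # Weights alone do not fit: no context length makes this workable.
    if weights_gb > effective_vram_gb:
        return "not practical"
    utilization = working_set_gb / effective_vram_gb
    # At or below 85% leaves comfortable headroom for KV cache.
    if utilization <= comfortable_max:
        return "comfortable"
    # Weights fit, but long-context KV cache may not: cap context or verify.
    return "borderline"


print(fit_tier(weights_gb=20.0, working_set_gb=30.0, effective_vram_gb=45.0))
print(fit_tier(weights_gb=39.4, working_set_gb=48.2, effective_vram_gb=45.0))
print(fit_tier(weights_gb=60.0, working_set_gb=65.0, effective_vram_gb=45.0))
```

The second call corresponds to a utilization of about 107%, matching the borderline entries whose long-context KV cache will not fit.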
Benchmark opportunities
Estimates, not measurements. Pending benchmark targets for this combo. Once measured, results land in the catalog as benchmarks.
Reference benchmark for dual-4090 PCIe (no NVLink). Same model + quant as dual-3090 entry; the comparison reveals NVLink vs PCIe impact at tensor-parallel-2.
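A tensor-parallel-2 run like the one described could be launched with vLLM's `vllm serve` command. The sketch below only assembles the command line and does not start a server. The model identifier is a placeholder, and the context cap and memory fraction are illustrative defaults rather than tuned values for this combo.

```python
import shlex


def vllm_serve_command(model, tensor_parallel_size=2, max_model_len=8192,
                       gpu_memory_utilization=0.90):
    """Build an argv list for `vllm serve` with the model sharded across GPUs."""
    return [
        "vllm", "serve", model,
        # Shard the model across both cards (tensor-parallel-2).
        "--tensor-parallel-size", str(tensor_parallel_size),
        # Cap context so the KV cache fits in the remaining VRAM.
        "--max-model-len", str(max_model_len),
        # Fraction of each GPU's memory vLLM is allowed to claim.
        "--gpu-memory-utilization", str(gpu_memory_utilization),
    ]


# Placeholder model name (hypothetical); substitute a real quantized checkpoint.
cmd = vllm_serve_command("your-org/your-70b-quantized-model")
print(shlex.join(cmd))
```

Running the same command on a dual-3090 NVLink box with an identical model and quant would give the like-for-like comparison this benchmark target calls for.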
Going deeper
- Full combo detail page — operational review with failure modes and runtime matrix.
- Multi-GPU buying guide — when multi-GPU is worth it and when it isn't.
- RunLocalAI Will-It-Run Framework — citable effective-VRAM, working-set, fit-tier, and evidence-tier method.
- Will-it-run home — single-card check + custom builds.