What runs on Quad RTX 3090 (24 GB × 4)?
Four used 3090s in a homelab chassis. 96 GB total / ~88 GB effective. The cheapest path to 100B+ class models and high-concurrency 70B serving.
Step-by-step setup with WRX80/W790 motherboard, NVLink pair verification, vLLM tensor-parallel-4 + power/thermal warnings.
Four 3090s in a single chassis with PCIe + NVLink (paired bridges between cards 0-1 and 2-3) do not produce 96 GB of pooled VRAM. Tensor parallelism across 4 ranks with vLLM yields ~88 GB effective for model weights: 96 GB total minus ~2 GB per card for activations, KV cache, and runtime overhead. That envelope covers 100B-class dense models (Llama 3.1 100B-tier fits), but not every large MoE: DeepSeek V2.5 (236B total, 21B active) needs ~134 GB at Q4 and does NOT fit. The 88 GB envelope is the realistic ceiling for prosumer multi-GPU before you pay for datacenter hardware.
Two NVLink bridges, one per card pair (0-1 and 2-3). NVLink (~112.5 GB/s bidirectional) keeps tensor-parallel traffic fast but does NOT pool memory: each card holds its own shard of the weights via tensor or pipeline parallelism. ~88 GB effective of 96 GB total.
See the multi-GPU guide for the full math + tradeoffs.
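The effective-VRAM arithmetic above can be sketched in a few lines. The ~2 GB/card overhead is the figure from this page; the bits-per-weight values are illustrative assumptions (Q4-family quants typically land near 4.5 bpw), not exact quant sizes:

```python
# Effective VRAM for the quad-3090 combo and a rough weight-fit check.
CARDS = 4
VRAM_PER_CARD_GB = 24
OVERHEAD_PER_CARD_GB = 2  # activations, KV cache, runtime overhead (per this page)

def effective_vram_gb(cards=CARDS):
    """Usable budget for model weights across all ranks."""
    return cards * (VRAM_PER_CARD_GB - OVERHEAD_PER_CARD_GB)

def weight_size_gb(params_b, bits_per_weight):
    """Approximate weight footprint: parameters x bits, ignoring quant metadata."""
    return params_b * bits_per_weight / 8

def fits(params_b, bits_per_weight):
    return weight_size_gb(params_b, bits_per_weight) <= effective_vram_gb()

print(effective_vram_gb())                  # 88
print(round(weight_size_gb(236, 4.5), 1))   # 132.8 -- DeepSeek V2.5 at ~Q4
print(fits(236, 4.5))                       # False
print(fits(100, 4.5))                       # True -- ~56 GB for a 100B dense model
```

The same check generalizes to any combo: swap the card count and per-card figures, and compare against the weight footprint at your quant.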
Topology
- 4×rtx-3090
Models that fit comfortably (24)
Effective VRAM utilization ≤ 85% at the smallest production quant. Comfortable headroom for KV cache.
Borderline (8)
Fits but with little headroom. KV cache for long context may not fit; verify before deployment.
Effective VRAM utilization 91-109% across these models: KV cache for long context will not fit. Cap context at ~4-8K or move to a larger combo.
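The ~4-8K context cap follows from KV-cache arithmetic. A minimal sketch, assuming a Llama-3-70B-style geometry (80 layers, 8 KV heads via GQA, head dim 128, FP16 cache; these values are assumptions for illustration, not taken from this page):

```python
# FP16 KV-cache footprint per token and per sequence for an assumed
# 70B-class geometry: 80 layers, 8 KV heads (GQA), head_dim 128.
LAYERS, KV_HEADS, HEAD_DIM, BYTES_FP16 = 80, 8, 128, 2

def kv_bytes_per_token():
    # K and V each store kv_heads * head_dim values per layer.
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16

def kv_gib(context_len):
    return kv_bytes_per_token() * context_len / 1024**3

print(kv_bytes_per_token())      # 327680 bytes (~0.31 MiB) per token
print(round(kv_gib(8192), 2))    # 2.5 GiB for one 8K sequence
print(round(kv_gib(131072), 1))  # 40.0 GiB at 128K -- hence the cap
```

When weights already consume 90%+ of the 88 GB budget, even a couple of GiB of KV cache per long sequence is what tips the combo over, which is why capping context keeps these models usable.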
Not practical (8)
Model weights exceed effective combo VRAM. Even with the recommended split strategy, this configuration won't run cleanly. Drop to a smaller quant or move to a larger combo.
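A quick way to see when "drop to a smaller quant" can rescue one of these models: solve for the largest bits-per-weight the ~88 GB budget allows. Parameter counts below are illustrative assumptions:

```python
# Largest quant (bits per weight) that fits the effective budget,
# for a given parameter count. Ignores quant metadata overhead.
EFFECTIVE_GB = 88

def max_bits_per_weight(params_b):
    return EFFECTIVE_GB * 8 / params_b

print(round(max_bits_per_weight(236), 2))  # 2.98 -- below a usable Q4, so move combos
print(round(max_bits_per_weight(123), 2))  # 5.72 -- a Q4/Q5 quant fits
```

If the answer comes out below roughly 3 bpw, quality at that quant is usually poor enough that moving to a larger combo is the better call.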
Benchmark opportunities
Estimates, not measurements: benchmark targets for this combo are pending. Once measured, results land in the catalog as benchmarks.
Reasoning workload on quad-3090: an R1 distill produces 5-15× more tokens per query, so per-stream throughput drops versus a same-size non-reasoning model.
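The throughput hit can be put in rough numbers. A back-of-envelope sketch, assuming 25 tok/s per stream and 500-token answers (both values are assumptions for illustration, not measurements from this combo):

```python
# Queries/hour at fixed decode speed: a reasoning distill that emits
# N-times more tokens per query serves 1/N as many queries per stream.
def queries_per_hour(tokens_per_sec, tokens_per_query):
    return tokens_per_sec * 3600 / tokens_per_query

base = queries_per_hour(25, 500)        # non-reasoning: 500-token answers
reason = queries_per_hour(25, 500 * 10) # reasoning: 10x tokens (mid of 5-15x)
print(base, reason)  # 180.0 18.0
```

Token generation speed is unchanged; it is the token count per query that multiplies, so per-query latency and queries-per-hour scale linearly with the reasoning multiplier.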
Going deeper
- Full combo detail page — operational review with failure modes and runtime matrix.
- Multi-GPU buying guide — when multi-GPU is worth it and when it isn't.
- Will-it-run home — single-card check + custom builds.