What runs on RTX 4090 + RTX 3090 (asymmetric 24+24 GB)?
Asymmetric multi-GPU: a 4090 paired with a 3090, over PCIe 4.0 only (the 4090 has no NVLink). Different SM counts, different memory bandwidth. On most split strategies, throughput is bottlenecked by the slower card and effective VRAM falls short of the raw 48 GB total.
Asymmetric layer-split via llama.cpp — the operationally honest path when pair-matching isn't an option.
Mixed-GPU configurations are operationally honest about VRAM but compromised on throughput. The 4090 has 1008 GB/s of memory bandwidth vs 936 GB/s on the 3090: close enough for tensor parallelism to work, but the 4090's faster compute sits idle waiting for the 3090 at every layer boundary. Effective VRAM is roughly the total minus ~3 GB per card (more overhead than symmetric pairs, because the runtime loads slightly different weight shards on each device). Tensor parallelism technically works; pipeline parallelism (layer split) works better in practice: each card owns different layers, so there is no per-layer synchronization stall on the faster card.
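A back-of-envelope version of that capacity math, as a minimal sketch; the ~3 GB/card overhead is the working figure from above, not a measured constant:

```python
# Effective-VRAM estimate for this pair. The ~3 GB/card overhead is
# an assumed working figure (runtime buffers, CUDA context, shard slack).
CARDS_GB = [24, 24]          # RTX 4090 + RTX 3090
OVERHEAD_PER_CARD_GB = 3.0   # assumption from the text above

effective_gb = sum(c - OVERHEAD_PER_CARD_GB for c in CARDS_GB)
print(f"total {sum(CARDS_GB)} GB -> effective ~{effective_gb:.0f} GB")
# total 48 GB -> effective ~42 GB
```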
Asymmetric cards. Layer-split via llama.cpp distributes whole layers across the cards by ratio. 48 GB total, but only ~42 GB usable once per-card runtime overhead is subtracted; the slower card still sets the pace for its share of layers.
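For illustration, a minimal layer-split setup, assuming the llama-cpp-python bindings; the model path, context size, and the 50/50 ratio are placeholders (skew the ratio toward the 4090 if you want it holding more layers), and device order follows CUDA enumeration (set CUDA_DEVICE_ORDER=PCI_BUS_ID if it surprises you):

```python
# Minimal sketch: pipeline (layer) split across two cards with
# llama-cpp-python. Path and ratio are placeholders, not recommendations.
from llama_cpp import Llama, LLAMA_SPLIT_MODE_LAYER

llm = Llama(
    model_path="./model-Q4_K_M.gguf",   # placeholder GGUF path
    n_gpu_layers=-1,                    # offload every layer to GPU
    split_mode=LLAMA_SPLIT_MODE_LAYER,  # whole layers per device; avoids row-split's per-layer reduce
    tensor_split=[0.5, 0.5],            # share of layers per card (device 0, device 1)
    n_ctx=8192,
)

out = llm("Explain layer split in one sentence:", max_tokens=48)
print(out["choices"][0]["text"])
```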
See the multi-GPU guide for the full math + tradeoffs.
Topology
Models that fit comfortably (24)
Effective VRAM utilization ≤ 85% at the smallest production quant. Comfortable headroom for KV cache.
Borderline (12)
Fits but with little headroom. KV cache for long context may not fit; verify before deployment.
Each of the 12 models in this tier carries the same note: effective VRAM utilization >114% — KV cache for long context will not fit. Cap context at ~4-8K or move to a larger combo.
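Why the ~4-8K cap: KV cache grows linearly with context. A minimal sketch of the standard sizing formula, using a Llama-3-70B-style config (80 layers, 8 GQA KV heads, head_dim 128, fp16 cache) purely as an assumed example:

```python
# KV cache bytes = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes/elt.
# Config numbers below are a Llama-3-70B-style assumption, not a measurement.
def kv_cache_gb(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elt=2):
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elt / 1024**3

for ctx in (4096, 8192, 32768):
    print(f"{ctx:>6} tokens -> {kv_cache_gb(80, 8, 128, ctx):.1f} GB")
# ~1.3 GB at 4K, ~2.5 GB at 8K, ~10 GB at 32K: long context eats the headroom
```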
Not practical (8)
Model weights exceed effective combo VRAM. Even with the recommended split strategy, this configuration won't run cleanly. Drop to a smaller quant or move to a larger combo.
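The three tiers above reduce to a simple utilization rule. A minimal sketch, assuming the ~42 GB effective figure and the 85% comfortable ceiling used on this page; treating 100% as the borderline boundary is an assumption:

```python
# Fit tier as a function of weight size vs. effective VRAM.
# 42 GB and the 0.85 ceiling come from the notes above; the 1.0
# borderline/not-practical boundary is an assumed cutoff.
def fit_tier(weights_gb: float, effective_vram_gb: float = 42.0) -> str:
    util = weights_gb / effective_vram_gb
    if util <= 0.85:
        return "comfortable"    # headroom left for KV cache
    if util <= 1.00:
        return "borderline"     # fits; verify KV cache, cap context
    return "not practical"      # weights alone exceed effective VRAM

for gb in (35, 40, 48):
    print(f"{gb} GB weights -> {fit_tier(gb)}")
```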
Benchmark opportunities
Estimates, not measurements. These are pending benchmark targets for this combo; once measured, results land in the catalog as benchmarks.
MoE on a mixed-GPU rig via llama.cpp layer-split. Mixtral 8x22B (141B total, ~39B active) at Q4_K_M is ~80 GB, so it does NOT fit on dual 24 GB cards even with layer-split. Q3_K_M lands around ~64 GB by the same bits-per-weight math and is unlikely to fit either; a ~2-bit quant would be needed. Marked pending to verify quant fit before measurement.
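The quant-fit check is plain arithmetic: parameter count times bits per weight. A sketch, with assumed community-average bpw figures rather than exact file sizes:

```python
# GGUF size estimate: params * bits-per-weight / 8. The bpw values are
# rough community averages per quant type (assumptions, not specs).
BPW = {"Q4_K_M": 4.85, "Q3_K_M": 3.9, "IQ2_XXS": 2.1}
PARAMS = 141e9            # Mixtral 8x22B total params; MoE loads ALL experts
EFFECTIVE_VRAM_GB = 42

for quant, bpw in BPW.items():
    size_gb = PARAMS * bpw / 8 / 1024**3
    verdict = "fits" if size_gb <= EFFECTIVE_VRAM_GB else "does not fit"
    print(f"{quant}: ~{size_gb:.0f} GB -> {verdict}")
# Q4_K_M ~80 GB and Q3_K_M ~64 GB both exceed 42 GB; ~2-bit lands near ~34 GB
```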
Going deeper
- Full combo detail page — operational review with failure modes and runtime matrix.
- Multi-GPU buying guide — when multi-GPU is worth it and when it isn't.
- Will-it-run home — single-card check + custom builds.