Mixed GPUPCIeadvanced

What runs on RTX 4090 + RTX 3090 (asymmetric 24+24 GB)?

Asymmetric multi-GPU: a 4090 paired with a 3090. PCIe 4.0 only — different SM counts, different memory bandwidth. Effective VRAM is bottlenecked by the slower card on most split strategies.

At a glance

Effective VRAM

42 / 48 GB

Not pooled

Speed penalty

~35%

vs ideal single-card

Recommended runtime

llama-cpp

pipeline parallel

Setup difficulty

advanced

~800W peak

Models fit

Borderline

Not practical

Deployment recipe

Mixed RTX 4090 + 3090 workstation →

Asymmetric layer-split via llama.cpp — the operationally honest path when pair-matching isn't an option.

Memory budget

Total VRAM

48 GB

Effective for inference

42 GB

88% of total

Not pooled

Mixed-GPU configurations are operationally honest about VRAM but compromised on throughput. The 4090 has 1008 GB/s memory bandwidth vs 936 GB/s on 3090 — close enough for tensor parallelism to work, but the 4090's faster compute is bottlenecked waiting for the 3090 every layer. Effective VRAM is roughly total minus ~3 GB per card (more overhead than symmetric pairs because the runtime loads slightly different weight shards). Tensor parallelism technically works; pipeline parallelism (layer split) works better in practice — different cards do different layers, no synchronization stall on faster card.

Why total VRAM is not the whole story

Asymmetric cards. Layer-split via llama.cpp distributes by ratio. 48 GB total but 42 GB usable due to runtime overhead and the slower card's bottleneck.

See the multi-GPU guide for the full math + tradeoffs.

Topology

mixed-gpu

Interconnect

pcie~32 GB/s

Component count

2 units

Components

1×rtx-4090
1×rtx-3090

Recommended runtime

llama-cpp

Also: exllamav2, ollama

Recommended split strategy

pipeline-parallel

Also: layer-split

Setup difficulty

advanced

~800W peak

Models that fit comfortably (24)

Effective VRAM utilization ≤ 85% at the smallest production quant. Comfortable headroom for KV cache.

Nemotron 3 Super 49B

Fits

49B·AWQ-INT4 → 32 GB·76% of effective VRAM·~35% speed penalty vs ideal

Mixtral 8x7B Instruct

Fits

47B·Q4_K_M → 32 GB·76% of effective VRAM·~35% speed penalty vs ideal

Command R 35B

Fits

35B·Q4_K_M → 26 GB·62% of effective VRAM·~35% speed penalty vs ideal

Aya 23 35B

Fits

35B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

Yi 1.5 34B

Fits

34B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

Phind CodeLlama 34B v2

Fits

34B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

DeepSeek Coder V3

Fits

33B·AWQ-INT4 → 22 GB·52% of effective VRAM·~35% speed penalty vs ideal

Qwen 3 32B

Fits

32B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

DeepSeek R1 Distill Qwen 3 32B

Fits

32B·AWQ-INT4 → 22 GB·52% of effective VRAM·~35% speed penalty vs ideal

OLMo 2 32B

Fits

32B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

Qwen 3 Coder 32B

Fits

32B·AWQ-INT4 → 22 GB·52% of effective VRAM·~35% speed penalty vs ideal

QwQ 32B Preview

Fits

32B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

Aya Expanse 32B

Fits

32B·AWQ-INT4 → 22 GB·52% of effective VRAM·~35% speed penalty vs ideal

Qwen 2.5 32B Instruct

Fits

32B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

EXAONE 3.5 32B

Fits

32B·AWQ-INT4 → 22 GB·52% of effective VRAM·~35% speed penalty vs ideal

DeepSeek R1 Distill Qwen 32B

Fits

32B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

Qwen 2.5 Coder 32B Instruct

Fits

32B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

Magistral 32B

Fits

32B·AWQ-INT4 → 22 GB·52% of effective VRAM·~35% speed penalty vs ideal

Gemma 4 31B Dense

Fits

31B·Q4_K_M → 24 GB·57% of effective VRAM·~35% speed penalty vs ideal

Qwen 3 30B-A3B

Fits

30B·Q4_K_M → 22 GB·52% of effective VRAM·~35% speed penalty vs ideal

Nemotron 3 Nano (30B-A3B)

Fits

30B·Q4_K_M → 22 GB·52% of effective VRAM·~35% speed penalty vs ideal

Gemma 3 27B

Fits

27B·Q4_K_M → 20 GB·48% of effective VRAM·~35% speed penalty vs ideal

MedGemma 27B

Fits

27B·Q4_K_M → 20 GB·48% of effective VRAM·~35% speed penalty vs ideal

InternVL 2.5 26B

Fits

26B·Q4_K_M → 20 GB·48% of effective VRAM·~35% speed penalty vs ideal

Borderline (12)

Fits but with little headroom. KV cache for long context may not fit; verify before deployment.

Qwen 2.5 72B Instruct

Borderline

72B·Q4_K_M → 48 GB·114% of effective VRAM·~35% speed penalty vs ideal