


What runs on Quad RTX 3090 (24 GB × 4)?

Four used 3090s in a homelab chassis. 96 GB total / ~88 GB effective. The cheapest path to 100B+ class models and high-concurrency 70B serving.

At a glance
  • Effective VRAM: 88 / 96 GB (not pooled)
  • Speed penalty: ~10% vs ideal single-card
  • Recommended runtime: vLLM (tensor parallel)
  • Setup difficulty: advanced (~1400 W peak)
  • Models that fit: 24 · Borderline: 8 · Not practical: 8
Deployment recipe
Quad RTX 3090 workstation →

Step-by-step setup with a WRX80/W790 motherboard, NVLink pair verification, vLLM tensor-parallel-4, and power/thermal warnings.
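A minimal launch sketch of the tensor-parallel-4 serve command, assuming vLLM is installed and an AWQ-quantized 70B checkpoint is on disk. The model path is a placeholder, not a specific repo:

    # Serve a 70B AWQ model across all four cards via tensor parallelism.
    # <awq-70b-model> is a placeholder; substitute your checkpoint path or HF repo.
    vllm serve <awq-70b-model> \
      --tensor-parallel-size 4 \
      --quantization awq \
      --gpu-memory-utilization 0.90

Tensor-parallel-size 4 shards every layer across the cards, which is why the NVLink bridges matter: the all-reduce traffic that would otherwise saturate PCIe rides the bridged pairs.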

Memory budget
  • Total VRAM: 96 GB
  • Effective for inference: 88 GB (92% of total)
  • Not pooled

Four 3090s in a single chassis over PCIe plus NVLink (paired bridges between cards 0-1 and 2-3) does not produce 96 GB of pooled VRAM. Tensor parallelism across 4 ranks with vLLM yields ~88 GB effective for model weights: total minus ~2 GB per card for activations, KV cache, and runtime overhead. That envelope fits 100B-class dense models (Command R+ 104B, for example), but not the largest MoE models; DeepSeek V2.5 (236B total / 21B active) needs ~134 GB at Q4 and does not fit. The 88 GB envelope is the realistic ceiling for prosumer multi-GPU before you pay for datacenter hardware.
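The budget math as a reproducible sketch. The ~2 GB/card overhead and the weight sizes are this page's estimates, not measurements:

    # Effective-VRAM rule of thumb for 4x RTX 3090 (estimates from this page)
    PER_CARD=24; CARDS=4; OVERHEAD_PER_CARD=2
    TOTAL=$((PER_CARD * CARDS))                        # 96 GB
    EFFECTIVE=$((TOTAL - OVERHEAD_PER_CARD * CARDS))   # 88 GB for weights
    # Fit check: 70B at Q4_K_M is ~48 GB of weights -> fits (55% of 88 GB)
    # DeepSeek V2.5 at Q4 is ~134 GB -> does not fit (152% of 88 GB)
    echo "effective: ${EFFECTIVE} of ${TOTAL} GB"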

Why total VRAM is not the whole story

[Diagram: two 3090s joined by an NVLink bridge]

NVLink (~112.5 GB/s bidirectional) keeps tensor parallelism efficient but does NOT pool memory. Each card holds its own shard of the weights via tensor or pipeline parallelism, for 88 GB effective out of 96 GB total.

See the multi-GPU guide for the full math + tradeoffs.
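To confirm the paired bridges before trusting tensor parallelism, two standard nvidia-smi queries are enough; nothing here is page-specific:

    # Interconnect matrix: the 0-1 and 2-3 pairs should show an NV# entry,
    # everything else falls back to PCIe
    nvidia-smi topo -m
    # Per-link NVLink status and speeds
    nvidia-smi nvlink --status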

Topology
  • Topology: single-node multi-GPU
  • Interconnect: NVLink (~112.5 GB/s)
  • Component count: 4 units
  • Components: 4× RTX 3090
  • Recommended runtime: vLLM (also: SGLang, ExLlamaV2)
  • Recommended split strategy: tensor parallel (also: pipeline parallel, expert routing)
  • Setup difficulty: advanced, ~1400 W peak (power-limit sketch below)
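Four 3090s at their stock ~350 W limit is where the ~1400 W peak comes from. A common mitigation is per-card power capping; the 280 W figure below is an assumption about a reasonable cap, not a benchmarked sweet spot:

    # Cap each card at 280 W (re-apply after reboot, e.g. via a systemd unit)
    for i in 0 1 2 3; do
      sudo nvidia-smi -i "$i" -pl 280
    done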

Models that fit comfortably (24)

Effective VRAM utilization ≤ 85% at the smallest production quant, with comfortable headroom for KV cache. Every row carries the ~10% tensor-parallel speed penalty vs an ideal single card. Rows read: model · params · quant → weights · share of effective VRAM. A launch sketch for the Q4_K_M rows follows the list.

  • Command R+ 104B · 104B · Q4_K_M → 70 GB · 80%
  • Command R+ (Aug 2024) · 104B · AWQ-INT4 → 72 GB · 82%
  • Llama 3.2 90B Vision · 90B · AWQ-INT4 → 64 GB · 73%
  • Llama 3.2 90B Vision Instruct · 90B · Q4_K_M → 60 GB · 68%
  • InternVL 2.5 78B · 78B · Q4_K_M → 52 GB · 59%
  • Qwen 2.5 72B Instruct · 72B · Q4_K_M → 48 GB · 55%
  • Molmo 72B · 72B · Q4_K_M → 48 GB · 55%
  • Qwen 2.5-VL 72B · 72B · AWQ-INT4 → 48 GB · 55%
  • Qwen 3 72B · 72B · AWQ-INT4 → 48 GB · 55%
  • Qwen 2.5 Math 72B · 72B · Q4_K_M → 48 GB · 55%
  • DeepSeek R1 Distill Llama 70B · 70B · Q4_K_M → 48 GB · 55%
  • OpenBioLLM Llama 3 70B · 70B · Q4_K_M → 48 GB · 55%
  • Llama 3.3 70B Instruct · 70B · Q4_K_M → 48 GB · 55%
  • Dolphin 3 Llama 3.3 70B · 70B · AWQ-INT4 → 48 GB · 55%
  • Llama 3.1 Nemotron 70B Instruct · 70B · Q4_K_M → 48 GB · 55%
  • Tulu 3 70B · 70B · Q4_K_M → 48 GB · 55%
  • Hermes 4 Llama 3.3 70B · 70B · AWQ-INT4 → 48 GB · 55%
  • Hermes 3 Llama 3.1 70B · 70B · Q4_K_M → 48 GB · 55%
  • Llama 3.1 70B Instruct · 70B · Q4_K_M → 48 GB · 55%
  • Llama 4 70B · 70B · AWQ-INT4 → 48 GB · 55%
  • EVA Llama 3.3 70B · 70B · AWQ-INT4 → 48 GB · 55%
  • Jamba 1.5 Mini · 52B · Q4_K_M → 36 GB · 41%
  • Nemotron 3 Super 49B · 49B · AWQ-INT4 → 32 GB · 36%
  • Mixtral 8x7B Instruct · 47B · Q4_K_M → 32 GB · 36%
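The Q4_K_M rows are GGUF quants, which usually means llama.cpp rather than vLLM. A minimal multi-GPU sketch, assuming a CUDA build of llama.cpp and a downloaded 70B Q4_K_M file; the model path is a placeholder:

    # Spread layers across all four cards; llama.cpp splits by layer by default
    llama-server \
      -m /models/llama-3.3-70b-instruct.Q4_K_M.gguf \
      -ngl 99 \
      --split-mode layer

Add --tensor-split to weight the distribution if one card also drives a display and has less free VRAM than the others.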

Borderline (8)

Little or no headroom at the listed quant: KV cache for long context will not fit. For every row below, cap context at ~4-8K (sketch after the list) or move to a larger combo, and verify before deployment. Rows read: model · params · quant → weights · share of effective VRAM.

  • GLM-5 Pro · 144B · AWQ-INT4 → 96 GB · 109%
  • Mixtral 8x22B Instruct · 141B · Q4_K_M → 96 GB · 109%
  • WizardLM-2 8x22B · 141B · Q4_K_M → 96 GB · 109%
  • DBRX Base · 132B · Q4_K_M → 96 GB · 109%
  • DBRX Instruct · 132B · AWQ-INT4 → 96 GB · 109%
  • Mistral Large 2 (123B) · 123B · Q4_K_M → 88 GB · 100%
  • Nemotron 3 Super (120B-A12B) · 120B · Q4_K_M → 84 GB · 95%
  • Llama 4 Scout · 109B · Q4_K_M → 80 GB · 91%
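Capping context in vLLM is a single flag. The 8192 below matches the ~4-8K guidance above; pushing --gpu-memory-utilization past the 0.90 default assumes nothing else is resident on the cards, and the model path is a placeholder:

    # Borderline fits: trade context length for weight room
    vllm serve <120b-class-model> \
      --tensor-parallel-size 4 \
      --max-model-len 8192 \
      --gpu-memory-utilization 0.95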

Not practical (8)

Model weights exceed effective combo VRAM. Even with the recommended split strategy, none of these run cleanly; drop to a smaller quant or move to a larger combo. Rows read: model · params · quant → weights · share of effective VRAM.

  • DeepSeek V4 Pro (1.6T MoE) · 1600B · Q4_K_M → 1024 GB · 1164%
  • Kimi K2.6 · 1000B · Q4_K_M → 700 GB · 795%
  • Step-3 · 1000B · AWQ-INT4 → 640 GB · 727%
  • DeepSeek V4 · 745B · AWQ-INT4 → 480 GB · 545%
  • Mistral Medium 3.5 (675B MoE) · 675B · Q4_K_M → 448 GB · 509%
  • DeepSeek R1 (671B reasoning) · 671B · Q4_K_M → 420 GB · 477%
  • DeepSeek V3 (671B MoE) · 671B · Q4_K_M → 420 GB · 477%
  • Llama 4 405B · 405B · AWQ-INT4 → 280 GB · 318%

Benchmark opportunities (estimates, not measurements)

Pending benchmark targets for this combo. Once measured, results land in the catalog as benchmarks.

  • 4× RTX 3090 + DeepSeek R1 Distill Llama 70B (vLLM TP-4) · pending
    Estimate: 20-28 tok/s decode (with thinking-mode bloat). Reasoning workload on a quad-3090: the R1 distill produces 5-15× more tokens per query, so per-stream throughput drops vs a same-size non-reasoning model.
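One way to turn the pending estimate into a measurement. Recent vLLM builds ship a bench subcommand (older builds carry the same tool as benchmarks/benchmark_throughput.py in the vLLM repo); the model path and token counts here are placeholders approximating a decode-heavy reasoning workload:

    # Decode-heavy run: long outputs dominate, as with thinking-mode traffic
    vllm bench throughput \
      --model <r1-distill-llama-70b> \
      --tensor-parallel-size 4 \
      --input-len 256 \
      --output-len 2048 \
      --num-prompts 8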

Going deeper

  • Full combo detail page — operational review with failure modes and runtime matrix.
  • Multi-GPU buying guide — when multi-GPU is worth it and when it isn't.
  • Will-it-run home — single-card check + custom builds.