Single-node multi-GPUNVLinkintermediate

What runs on Dual RTX 3090 (24 GB × 2)?

The reference dual-GPU local-AI rig. NVLink optional. 48 GB total / ~46 GB effective with tensor parallelism. The cheapest path to 70B-class models at 2025-2026 prices.

At a glance

Effective VRAM

46 / 48 GB

Not pooled

Speed penalty

~10%

vs ideal single-card

Recommended runtime

vllm

tensor parallel

Setup difficulty

intermediate

~700W peak

Models fit

Borderline

Not practical

Deployment recipe

Dual RTX 3090 workstation →

Step-by-step setup with NVLink bridge verification, vLLM tensor-parallel-2 configuration, and operator-grade failure modes.

Memory budget

Total VRAM

48 GB

Effective for inference

46 GB

96% of total

Not pooled

PCIe + optional NVLink between two RTX 3090s does NOT pool VRAM the way Apple unified memory does. Each card holds its half of the model weights via tensor parallelism (vLLM / SGLang) or pipeline parallelism (llama.cpp layer split). Effective VRAM is roughly total minus ~2 GB per card for activations, KV cache, and runtime overhead. Concretely: a 70B Q4 model (~40 GB weights) fits with ~6 GB of headroom for context and KV. Anything claiming 48 GB pooled is wrong.

Why total VRAM is not the whole story

Two cards with NVLink bridge. NVLink (~112.5 GB/s bidirectional) keeps tensor-parallel efficient but does NOT pool memory. Each card holds its share via tensor or pipeline parallelism. Effective 46 GB of total 48 GB.

See the multi-GPU guide for topology tradeoffs, and the RunLocalAI Will-It-Run Framework for the citable fit-tier method.

Topology

single-node-multi-gpu

Interconnect

nvlink~112.5 GB/s

Component count

2 units

Components

2×rtx-3090

Recommended runtime

vllm

Also: sglang, exllamav2, tgi

Recommended split strategy

tensor-parallel

Also: pipeline-parallel

Setup difficulty

intermediate

~700W peak

Models that fit comfortably (24)

Effective VRAM utilization ≤ 85% at the smallest production quant. Comfortable headroom for KV cache.

Jamba 1.5 Mini

Fits

52B·Q4_K_M → 36 GB·78% of effective VRAM·~10% speed penalty vs ideal

Nemotron 3 Super 49B

Fits

49B·AWQ-INT4 → 32 GB·70% of effective VRAM·~10% speed penalty vs ideal

Mixtral 8x7B Instruct

Fits

47B·Q4_K_M → 32 GB·70% of effective VRAM·~10% speed penalty vs ideal

Mixtral 8X7B Instruct v0.1 GPTQ

Fits

46.7B·Q4_K_M → 33 GB·72% of effective VRAM·~10% speed penalty vs ideal

Falcon 40B Instruct

Fits

40B·Q4_K_M → 28 GB·61% of effective VRAM·~10% speed penalty vs ideal

ALIA 40b instruct 2601

Fits

40B·Q4_K_M → 28 GB·61% of effective VRAM·~10% speed penalty vs ideal

Mihenk LLM v2 35B (Turkish Financial)

Fits

35B·Q4_K_M → 25 GB·54% of effective VRAM·~10% speed penalty vs ideal

Command R 35B

Fits

35B·Q4_K_M → 26 GB·57% of effective VRAM·~10% speed penalty vs ideal

Aya 23 35B

Fits

35B·Q4_K_M → 24 GB·52% of effective VRAM·~10% speed penalty vs ideal

Phind CodeLlama 34B v2

Fits

34B·Q4_K_M → 24 GB·52% of effective VRAM·~10% speed penalty vs ideal

Yi 1.5 34B

Fits

34B·Q4_K_M → 24 GB·52% of effective VRAM·~10% speed penalty vs ideal

DeepSeek Coder V3

Fits

33B·AWQ-INT4 → 22 GB·48% of effective VRAM·~10% speed penalty vs ideal

Magistral 32B

Fits

32B·AWQ-INT4 → 22 GB·48% of effective VRAM·~10% speed penalty vs ideal

Qwen 2.5 32B Instruct

Fits

32B·Q4_K_M → 24 GB·52% of effective VRAM·~10% speed penalty vs ideal

Qwen3 Swallow 32B RL v0.2

Fits

32B·Q4_K_M → 23 GB·50% of effective VRAM·~10% speed penalty vs ideal

EXAONE 3.5 32B

Fits

32B·AWQ-INT4 → 22 GB·48% of effective VRAM·~10% speed penalty vs ideal

EXAONE 4.0.1 32B

Fits

32B·Q4_K_M → 23 GB·50% of effective VRAM·~10% speed penalty vs ideal

Aya Expanse 32B

Fits

32B·AWQ-INT4 → 22 GB·48% of effective VRAM·~10% speed penalty vs ideal

llm-jp 4 32B A3B Thinking

Fits

32B·Q4_K_M → 23 GB·50% of effective VRAM·~10% speed penalty vs ideal

EXAONE 3.5 32B Instruct AWQ

Fits

32B·Q4_K_M → 23 GB·50% of effective VRAM·~10% speed penalty vs ideal

OLMo 2 32B

Fits

32B·Q4_K_M → 24 GB·52% of effective VRAM·~10% speed penalty vs ideal

Qwen 3 Coder 32B

Fits

32B·AWQ-INT4 → 22 GB·48% of effective VRAM·~10% speed penalty vs ideal

EXAONE 3.5 32B Instruct

Fits

32B·Q4_K_M → 23 GB·50% of effective VRAM·~10% speed penalty vs ideal

Qwen 3 32B

Fits

32B·Q4_K_M → 24 GB·52% of effective VRAM·~10% speed penalty vs ideal

Borderline (12)

Fits but with little headroom. KV cache for long context may not fit; verify before deployment.

InternVL 2.5 78B

Borderline

78B·Q4_K_M → 52 GB·113% of effective VRAM·~10% speed penalty vs ideal

Effective VRAM utilization >113% — KV cache for long context will not fit. Cap context at ~4-8K or move to a larger combo.

Molmo 72B

Borderline

72B·Q4_K_M → 48 GB·104% of effective VRAM·~10% speed penalty vs ideal