AMD Radeon RX 9070 XT for local AI

What it does well

The RX 9070 XT is AMD's 2025 RDNA 4 mid-flagship at $700-900, and the strongest argument AMD has made for "skip CUDA" since the RX 7900 XTX. Its 16 GB of GDDR6 at ~640 GB/s is enough for 13B-class workloads at competitive decode speeds. ROCm 6+ matured enough through 2024 that day-to-day local AI on RDNA 4 + Linux works without the constant kernel-pinning dance the 7000 series demanded early on. llama.cpp, Ollama, and vLLM all support RDNA 4 via ROCm at day-zero or close to it. Power draw is reasonable at a 304 W TDP: most boards take two or three 8-pin PCIe connectors (check your specific model), and a 750 W PSU is comfortable.

Where it breaks

CUDA-locked stacks don't run. TensorRT-LLM, ExLlamaV2, and SGLang either lack working ROCm paths or have ones that fall short of quality parity with NVIDIA. If your workload, IDE plugin, or team deployment target is CUDA-first, the 9070 XT is fighting upstream.
16 GB caps daily-driver workloads at 13B-class. Same constraint as the RTX 5080. 32B-class needs 19-22 GB at Q4 — partial-offload territory. Pick the 7900 XTX (24 GB at similar pricing) if 32B-class is your goal.
Day-zero new model support lags CUDA. ROCm wheels for new architectures land hours-to-weeks after the CUDA paths are working. For frontier-model users this matters; for "Llama 3.1 8B daily driver" users it doesn't.
Windows ROCm is meaningfully worse than Linux ROCm. AMD's ROCm-on-Windows shipped in 2024 but remains second-tier. Linux is the production path; if you're on Windows + don't want WSL2, expect rougher edges.
Resale value uncertainty. AMD GPU resale historically lags NVIDIA. RDNA 4 is new enough that the secondary-market floor isn't established yet. Plan to keep the card for its lifetime, not flip it.
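
To make the 16 GB ceiling above concrete, here is a minimal back-of-envelope sketch of a VRAM estimate. The function names are hypothetical, and the defaults (about 4.5 bits per weight for a Q4-class quantization, 1.5 GB of KV cache, 1 GB of runtime overhead) are illustrative assumptions, not measurements from this card.

```python
def estimate_vram_gb(params_b, bits_per_weight=4.5, kv_cache_gb=1.5, overhead_gb=1.0):
    """Rough VRAM need in GB: quantized weights + KV cache + runtime overhead.

    All defaults are assumptions for illustration; real usage depends on the
    quantization format, context length, and backend.
    """
    # Billions of parameters * bytes per parameter = GB of weights.
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb + kv_cache_gb + overhead_gb


def fits(params_b, vram_gb=16.0, **kwargs):
    """True if the estimate fits entirely in VRAM (no partial offload)."""
    return estimate_vram_gb(params_b, **kwargs) <= vram_gb


for name, size_b in [("14B", 14), ("32B", 32), ("70B", 70)]:
    need = estimate_vram_gb(size_b)
    print(f"{name}: ~{need:.1f} GB needed, fits in 16 GB: {fits(size_b)}")
```

Under these assumptions a 14B model lands near 10 GB and fits, while a 32B model lands near 20 GB, inside the 19-22 GB range quoted above, and spills into partial offload.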

Ideal model range

Sweet spot: 13B-class at full 16-32K context — Qwen 2.5 14B, Phi 4 14B, R1 Distill Llama 8B at ~50-70 tok/s. Solid daily-driver capability.
Stretch: 32B-class at Q4 with partial offload — drops to ~12-18 tok/s on most architectures. Functional for occasional use; not for daily.
Comfortable: 7B-class at 100+ tok/s, embedding models, lightweight RAG pipelines, coding agents on smaller-tier models.
No AMD penalty in the GUI layer: Open WebUI and AnythingLLM work just as they do on NVIDIA cards, because the GUI tools layer is platform-agnostic.
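
The throughput figures above can be sanity-checked with a common rule of thumb: single-stream decode is roughly memory-bandwidth-bound, because each generated token reads the full set of weights once. The sketch below computes that theoretical ceiling. The function name is hypothetical, the ~640 GB/s figure comes from this review, and the 4.5 bits per weight is an assumed Q4-class average; real throughput will sit below the ceiling.

```python
def decode_ceiling_tok_s(params_b, bandwidth_gb_s=640.0, bits_per_weight=4.5):
    """Upper bound on decode tokens/s if every token reads all weights once.

    Ignores KV-cache reads, kernel overhead, and compute limits, so measured
    numbers should come in lower.
    """
    weights_gb = params_b * bits_per_weight / 8  # GB read per generated token
    return bandwidth_gb_s / weights_gb


for name, size_b in [("7B", 7), ("14B", 14), ("32B", 32)]:
    print(f"{name}: ceiling ~{decode_ceiling_tok_s(size_b):.0f} tok/s")
```

The ceilings (roughly 160 tok/s for 7B and 80 tok/s for 14B under these assumptions) are consistent with the 100+ and ~50-70 tok/s ranges quoted above. The 32B figure only applies when the model fits entirely in VRAM, which it does not on 16 GB.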

Bad use cases

Production CUDA stacks. The vLLM tensor-parallel + Hopper FP8 + TensorRT-LLM ecosystem has no AMD answer at parity. Pick NVIDIA if your team's deployment target lives there.
70B daily-driver workloads. 16 GB doesn't fit; partial-offload is single-digit tok/s. Pick 7900 XTX (24 GB) or NVIDIA 24-32 GB tier.
Maximum tok/s on small models. An RTX 4070 Super at ~$600 used wins on throughput per dollar for sub-13B workloads in the CUDA ecosystem.
Anyone Windows-first who doesn't want WSL2. Linux + ROCm is the production path. Windows works but feels like a port.
Anyone who values predictable resale. AMD GPU resale is historically softer than NVIDIA. Don't buy this expecting 70%+ resale value at year 2.

Verdict

Buy this if you're Linux + ROCm-comfortable, your workload is 13B-class daily, you're price-sensitive enough that the $200-400 savings vs the 5080 matter, and you're not stuck in a CUDA-only stack. AMD's 2025 silicon + matured ROCm make the 9070 XT the strongest "skip CUDA" pitch since the 7900 XTX, for the right operator.

Skip this if you need CUDA-ecosystem stacks (vLLM TP, TensorRT-LLM, EXL2, SGLang), if you're Windows-first, if your daily target is 32B-class or 70B (wrong VRAM tier), or if you value predictable software-update cadence over savings (NVIDIA's day-zero support is a real operator-time savings).

How it compares

vs RTX 5080 (16 GB) → similar VRAM tier, NVIDIA wins on bandwidth + CUDA ecosystem maturity, AMD wins on price ($700-900 vs $1,100-1,300). For the 13B-class operator on Linux who's done with the CUDA tax, this is the operative comparison.
vs RX 7900 XTX (24 GB) → the 7900 XTX has 24 GB at ~$700-900: same price tier, larger VRAM. Pick the 7900 XTX for 32B-class headroom; pick the 9070 XT for newer silicon (RDNA 4, better day-zero support, faster-maturing ROCm).
vs RTX 4070 Ti Super (16 GB) → similar VRAM, different ecosystem. NVIDIA wins on CUDA + day-zero support; AMD wins on price relative to its tier if you're willing to take ROCm. The 9070 XT is the "stay AMD on new gen" play.
vs Used RTX 3090 → 3090 used at $700-1000 with 24 GB VRAM beats the 9070 XT on $/VRAM and CUDA ecosystem. Pick 9070 XT only if you're committed AMD or want new-card warranty.
vs RX 9070 (non-XT) → non-XT is ~$550-650 at similar 16 GB VRAM. The XT premium pays for ~15-25% more compute + clock headroom. Pick non-XT for value, XT if you want max RDNA 4 silicon.
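
The price comparisons above reduce to simple arithmetic. As a rough sketch, the snippet below ranks the cards by dollars per GB of VRAM using the midpoint of each price range quoted in this section. The midpoint choice and the names are assumptions for illustration; the RTX 4070 Ti Super is left out because no price is given for it here.

```python
# (low price, high price, VRAM in GB), taken from the comparisons above.
cards = {
    "RX 9070 XT": (700, 900, 16),
    "RTX 5080": (1100, 1300, 16),
    "RX 7900 XTX": (700, 900, 24),
    "Used RTX 3090": (700, 1000, 24),
    "RX 9070": (550, 650, 16),
}


def dollars_per_gb(low, high, vram_gb):
    """Midpoint of the quoted price range divided by VRAM capacity."""
    return (low + high) / 2 / vram_gb


# Cheapest VRAM first.
ranked = sorted(cards, key=lambda name: dollars_per_gb(*cards[name]))
for name in ranked:
    print(f"{name}: ${dollars_per_gb(*cards[name]):.2f} per GB")
```

On this metric the two 24 GB cards lead and the RTX 5080 trails, which matches the $/VRAM argument made for the used 3090. It says nothing about software ecosystem or warranty, which the comparisons weigh separately.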

Frequently asked

What models can AMD Radeon RX 9070 XT run?

With 16GB VRAM, the AMD Radeon RX 9070 XT runs models up to 14B in 4-bit, or 7B at higher quantizations. See the model list below for tested combinations.

Does AMD Radeon RX 9070 XT support CUDA?

No — AMD Radeon RX 9070 XT is an AMD card. Use ROCm (Linux) or the Vulkan backend in llama.cpp instead. CUDA-only tools won't work.
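
As a rough illustration of choosing between those two backends, the sketch below checks whether the `rocminfo` or `vulkaninfo` command-line tools are on the PATH and prefers ROCm when both are present. The helper name is hypothetical, and finding a tool on the PATH only suggests the runtime is installed; it does not prove the GPU is usable, and ROCm installs may keep `rocminfo` in a directory that is not on the PATH.

```python
import shutil


def pick_backend(which=shutil.which):
    """Return 'rocm', 'vulkan', or None based on which CLI tools are found.

    `which` is injectable so the logic can be exercised without the tools
    actually being installed.
    """
    if which("rocminfo"):
        return "rocm"
    if which("vulkaninfo"):
        return "vulkan"
    return None


print(pick_backend())
```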

How much does AMD Radeon RX 9070 XT cost?

Current street price for AMD Radeon RX 9070 XT is around $649 (MSRP $599). Prices vary by region and supply.

Specs

VRAM	16 GB
Power draw (peak)	304 W
Released	2025
MSRP	$599
Backends	ROCm, Vulkan