RUNLOCALAIv38
UNIT · AMD · GPU
192 GB VRAM · workstation · Reviewed June 2026

AMD Instinct MI300X

[Figure: AMD MI300X spec card. 192 GB HBM3 VRAM, 5.3 TB/s bandwidth, 750 W; 405B Q3 single-GPU or 70B FP16]
Credit: RunLocalAI · License: CC-BY-4.0 (original illustration)

192 GB HBM3 datacenter card. Used in Microsoft, Oracle, and Meta cloud deployments.

Released 2023 · 5325 GB/s memory bandwidth

Affiliate disclosure: as an Amazon Associate and partner of other retailers, we earn from qualifying purchases. The verdict on this page is our editorial opinion; affiliate links never influence what we recommend.

RUNLOCALAI SCORE
621 / 1000 · BB-tier · Estimated

Throughput: 500 / 500
VRAM-fit: 200 / 200
Ecosystem: 130 / 200
Efficiency: 57 / 100

Sub-scores sum to 887 / 1000. Headline = 887 × 0.70 (Estimated-confidence discount) = 621 after rounding. This is an algorithmic performance-tier score, distinct from (and often lower than) the editorial “Our verdict” below, which weighs value and real-world fit, especially for hardware we haven’t measured yet.
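The headline arithmetic can be reproduced in a few lines. This is a minimal sketch using only the numbers printed on this page; the function name and the round-to-nearest step are assumptions, since the page does not state how 620.9 becomes 621.

```python
# Sub-scores as listed on this page: (points earned, points available).
SUB_SCORES = {
    "Throughput": (500, 500),
    "VRAM-fit": (200, 200),
    "Ecosystem": (130, 200),
    "Efficiency": (57, 100),
}

# Discount the page applies to scores that are estimated rather than measured.
ESTIMATED_CONFIDENCE = 0.70


def headline_score(sub_scores, confidence):
    """Sum the earned points, then apply the confidence discount.

    Rounding to the nearest integer is an assumption that matches the
    published 621.
    """
    raw = sum(earned for earned, _available in sub_scores.values())
    return raw, round(raw * confidence)


raw, headline = headline_score(SUB_SCORES, ESTIMATED_CONFIDENCE)
print(f"raw {raw} / 1000, headline {headline} / 1000")
```

With a measured benchmark the discount would presumably change, which is why the raw 887 and the headline 621 are worth reading separately.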

Extrapolated from 5325 GB/s bandwidth — 532.5 tok/s estimated. No measured benchmarks yet.
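The two figures above are consistent with a simple bandwidth-bound decode model: tokens per second is roughly memory bandwidth divided by the bytes read per generated token. The sketch below illustrates that model; the 10 GB working set is inferred from the ratio 5325 / 532.5 and is an assumption, not the site's documented formula. The 40 GB and 140 GB rows are illustrative sizes for a 70B model at roughly 4-bit and at FP16.

```python
MI300X_BANDWIDTH_GB_S = 5325  # from the spec line on this page


def decode_tokens_per_second(bandwidth_gb_s, gb_read_per_token):
    """Upper-bound decode rate when every token must stream the weights once.

    Ignores compute limits, KV-cache reads, kernel efficiency and batching,
    so real single-stream throughput will be lower.
    """
    return bandwidth_gb_s / gb_read_per_token


# Hypothetical working-set sizes in GB (weights streamed per token).
for label, size_gb in [("10 GB (implied by the estimate)", 10),
                       ("40 GB (70B at ~4-bit)", 40),
                       ("140 GB (70B at FP16)", 140)]:
    rate = decode_tokens_per_second(MI300X_BANDWIDTH_GB_S, size_gb)
    print(f"{label}: {rate:.1f} tok/s ceiling")
```

Under this model a 70B FP16 model tops out near 38 tok/s per stream, which is why the 532.5 tok/s figure should be read as a small-model ceiling rather than a general expectation.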

WORKLOAD FIT

Plain-English: Runs 70B comfortably, snappy enough for a coding agent.

7B chat: ✓ Comfortable
14B chat: ✓ Comfortable
32B chat: ✓ Comfortable
70B chat: ✓ Comfortable
Coding agent: ✓ Comfortable
Vision (≤8B VLM): ~ Tight
Long context (32K): ✓ Comfortable

Legend:
✓ Comfortable: fits with headroom
~ Tight: works, no slack
△ Marginal: needs aggressive quant
✗ Doesn't fit usefully

Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.

BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
10.0/10

What it does well

The MI300X is the closest the AMD ecosystem gets to a true H100 alternative for LLM inference. 192 GB HBM3 at 5.3 TB/s gives it 2.4× the memory of an H100 SXM and ~58% more bandwidth, at an MSRP roughly equal to an H100 SXM ($15,000–$20,000 list, often discounted on enterprise quotes).

For LLMs, the math is genuinely compelling: a single MI300X fits Llama 3.1 405B at Q3 with modest context headroom, DeepSeek V3 671B at Q2 with paged offload, or Qwen 3 235B at Q4 with full operational context (at FP8, the 235B weights alone are about 235 GB and exceed the card's 192 GB).

ROCm 6.2+ has reached genuine parity for mainstream inference: vLLM upstream supports MI300X first-class as of 2025, SGLang added MI300X-tuned kernels, and Hugging Face Transformers / PyTorch 2.5+ run on AMD without manual workarounds for most modern architectures. AMD's Infinity Fabric interconnect is competitive with NVIDIA NVLink for 8× clusters in the MI300X platform. Cloud rental at $2.50–$4.50/hr on TensorWave / Hot Aisle / RunPod is usually 20–40% cheaper than equivalent H100 rental.
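As a rough check on what fits in 192 GB, weight memory can be approximated as parameters × bits per weight ÷ 8. The sketch below applies that to a few model sizes; the helper names are hypothetical, and the bit widths are nominal, so real quantization formats (which carry scales and mixed-precision layers) will come out somewhat larger. KV cache and activations need additional room on top.

```python
VRAM_GB = 192  # MI300X capacity from this page


def weight_gb(params_billions, bits_per_weight):
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * bits_per_weight / 8


def fits(params_billions, bits_per_weight, vram_gb=VRAM_GB):
    """True if the weights alone fit; says nothing about context headroom."""
    return weight_gb(params_billions, bits_per_weight) <= vram_gb


# (label, parameters in billions, nominal bits per weight)
CASES = [
    ("70B at FP16", 70, 16),
    ("235B at FP8", 235, 8),
    ("235B at 4-bit", 235, 4),
    ("405B at 4-bit", 405, 4),
    ("405B at 3-bit", 405, 3),
    ("671B at 2-bit", 671, 2),
]

for label, params, bits in CASES:
    size = weight_gb(params, bits)
    verdict = "fits" if fits(params, bits) else "does not fit"
    print(f"{label}: {size:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

The useful takeaway is the margin rather than the yes/no: a 405B model at a nominal 3 bits leaves roughly 40 GB for everything else, while 70B at FP16 leaves about 52 GB.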

Where it breaks

  • Software stack is still maturing. ROCm has improved dramatically, but the long tail (fine-tuning libraries, niche frameworks, day-zero support for new model architectures) still lags CUDA by weeks to months. If you're integrating with a stack that targets CUDA only (TensorRT-LLM, certain quantization libraries, specific training frameworks), it won't run on AMD.
  • No native FP4, and FP8 without a Transformer Engine equivalent. MI300X supports FP8, but the architecture doesn't include NVIDIA's Transformer Engine optimization patterns. For workloads aggressively exploiting FP8, H200 and B200 win on architecture-specific throughput; for FP4, B200 is the only one of these with native support.
  • Driver and kernel module installation is non-trivial. Production-grade ROCm setup (kernel module + dkms + matching userspace) is more delicate than NVIDIA's mature driver story. First-time AMD-on-Linux is rougher than first-time NVIDIA-on-Linux.
  • Limited consumer software paths. Ollama, LM Studio, and the ROCm build of llama.cpp all work, but AMD ergonomics remain second-class in consumer tooling. If you want to compare backends across every framework that exists, expect more friction on AMD.
  • Resale and used-market liquidity is thin. Used MI300X pricing is hard to find (low transaction volume), unlike used H100 / A100. Cap-ex risk is higher because exit is less certain.

Ideal model range

  • Sweet spot: 70B–235B production inference at FP8 / Q4 (the 235B end needs Q4 to fit on one card). The 192 GB memory ceiling is the headline feature: single-card 235B serving is real on MI300X and not on any single-card NVIDIA SKU below B200.
  • Sweet spot: Long-context inference (64K–256K) at the 70B–200B tier. 5.3 TB/s bandwidth keeps decode fed.
  • Sweet spot: 405B-class inference across 2× MI300X linked by Infinity Fabric (AMD's NVLink equivalent), the cheapest production 405B path that doesn't require 8× NVIDIA SXM.
  • Sweet spot: 671B serving across 4× MI300X (768 GB combined) — competitive with 8× H100 SXM5 on memory and often cheaper on rental.
  • Stretch: Frontier-model fine-tuning at 70B QLoRA or 32B FP16 full fine-tune on a single MI300X.
  • Comfortable: Anything that runs on ROCm — embedding models, classifiers, smaller LMs at high concurrency.
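For the long-context sweet spot above, the KV cache is what eats the memory left after weights. A minimal sketch of the standard sizing formula follows; the architecture numbers (80 layers, 8 KV heads, head dimension 128) are an assumption taken from the publicly released Llama 3 70B configuration, not from this page, and the cache is assumed to be stored in 16-bit values.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """KV-cache size for one sequence.

    The leading factor of 2 accounts for storing both a key and a value
    vector per layer, per KV head, per token.
    """
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


# Assumed Llama-3-70B-style architecture with grouped-query attention.
LLAMA_70B = {"layers": 80, "kv_heads": 8, "head_dim": 128}

for context in (32_768, 65_536, 131_072, 262_144):
    size_gb = kv_cache_bytes(context_tokens=context, **LLAMA_70B) / 1e9
    print(f"{context:>7} tokens: {size_gb:.1f} GB of KV cache per sequence")
```

At 128K tokens this works out to roughly 43 GB per sequence, so a 70B model at FP16 (about 140 GB of weights) has room for about one such sequence on a single card, while an 8-bit or 4-bit model leaves space for several concurrent long sequences.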

Bad use cases

  • Hobbyist / single-developer workloads. ROCm is a learning curve. Pick RTX 4090 or RTX 5090 for hassle-free CUDA. Save AMD for production scale.
  • CUDA-locked stacks. Don't pick AMD if your team's existing tooling is CUDA-only and you can't afford the integration tax.
  • Cap-ex purchases without sustained utilization. Pricing is similar to H100, so cap-ex breakeven against rental likewise requires sustained utilization.
  • Frontier training where FP4 / Transformer Engine 2 dominate. B200 is the right tier.
  • Anywhere that needs a mature driver story. NVIDIA's driver is more polished. If you can't budget for ROCm setup time, don't pick AMD.
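The cap-ex point above can be made concrete with a back-of-envelope breakeven calculation using the list-price range ($15,000–$20,000) and rental range ($2.50–$4.50/hr) quoted on this page. This is a sketch under simplifying assumptions: it ignores power, hosting, cooling, staff time and resale value, and the 730 hours per month figure is an average for round-the-clock use.

```python
HOURS_PER_MONTH = 730  # average month at 24x7


def breakeven_months(purchase_usd, rental_usd_per_hour, utilization=1.0):
    """Months until buying costs less than renting the same GPU-hours.

    utilization is the fraction of each month the card is actually busy;
    idle hours would not have been rented, so they do not count.
    """
    rented_hours_needed = purchase_usd / rental_usd_per_hour
    return rented_hours_needed / (HOURS_PER_MONTH * utilization)


for price in (15_000, 20_000):
    for rate in (2.50, 4.50):
        full = breakeven_months(price, rate)
        half = breakeven_months(price, rate, utilization=0.5)
        print(f"${price:,} vs ${rate:.2f}/hr: "
              f"{full:.1f} months at 24x7, {half:.1f} months at 50% utilization")
```

The spread is wide: the same card pays for itself in well under a year against the higher rental rate at full load, and takes about twice as long at half utilization, which is the case where renting tends to win.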

Verdict

Buy this if you're standing up production inference at 70B–200B+ scale, you have ROCm engineering capacity (or a vendor that does), the 192 GB single-card memory is genuinely useful for your model mix, and you've validated that your serving framework (vLLM / SGLang / Hugging Face Transformers) targets MI300X first-class. The MI300X is the right pick for memory-bound production inference at scale where 192 GB on one card unlocks workloads NVIDIA equivalents can't fit cheaply.

Skip this if your stack is CUDA-only and the integration tax exceeds the price savings, your workloads fit 80 GB (H100 PCIe or even L40S wins on integration ease and ecosystem), you're frontier-training where FP4 / Transformer Engine 2 matters (B200), or you're a hobbyist (consumer NVIDIA wins by a wide margin).

How it compares

  • vs H100 SXM (80 GB) → MI300X has 2.4× the memory + 58% more bandwidth + similar FP8 throughput at similar enterprise pricing. H100 SXM has a more mature ecosystem, the FP8-native Transformer Engine, and NVLink. Pick MI300X when memory ceiling and bandwidth genuinely help; pick H100 SXM when ecosystem maturity matters more than memory headroom. See /compare/amd-mi300x-vs-nvidia-h100-sxm.
  • vs H200 (141 GB) → MI300X has 36% more memory + 10% more bandwidth at often lower price. H200 has the entire NVIDIA ecosystem advantage. Pick MI300X for cost-sensitive memory-bound deployments; H200 when you want NVIDIA software guarantees. See /compare/amd-mi300x-vs-nvidia-h200.
  • vs B200 (192 GB) → Same memory tier (192 GB). B200 has 50% more bandwidth (8 vs 5.3 TB/s) + native FP4 + Transformer Engine 2 + NVIDIA ecosystem at substantially higher price (~$40,000 vs $15-20k). Pick B200 for frontier production where FP4 throughput pays; pick MI300X for cost-sensitive memory-bound serving.
  • vs MI325X (256 GB) / MI355X (288 GB) → MI325X / MI355X are AMD's straight-line successors with more memory + faster HBM3e. Pick MI325X / MI355X for new builds when available; MI300X is the value-conscious or earlier-availability pick.
  • vs renting MI300X on RunPod / TensorWave / Hot Aisle → Cloud rental at $2.50–$4.50/hr is usually 20–40% cheaper than equivalent H100 rental. Cap-ex breakeven is similar to H100 (~9 months at 24×7 utilization). For experimentation and intermittent workloads, rent MI300X first to validate ROCm fit before buying.
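The memory and bandwidth percentages in the comparisons above follow directly from the spec figures. The sketch below recomputes them; the MI300X and B200 numbers appear on this page, while the H100 SXM (3.35 TB/s) and H200 (4.8 TB/s) bandwidths are assumptions taken from NVIDIA's public spec sheets.

```python
# (memory in GB, bandwidth in TB/s)
SPECS = {
    "MI300X": (192, 5.3),
    "H100 SXM": (80, 3.35),   # bandwidth assumed from public specs
    "H200": (141, 4.8),       # bandwidth assumed from public specs
    "B200": (192, 8.0),
}


def pct_more(a, b):
    """How much larger a is than b, in percent (negative if smaller)."""
    return (a / b - 1) * 100


mi_mem, mi_bw = SPECS["MI300X"]
for name, (mem, bw) in SPECS.items():
    if name == "MI300X":
        continue
    print(f"MI300X vs {name}: "
          f"{mi_mem / mem:.2f}x memory, "
          f"{pct_more(mi_bw, bw):+.0f}% bandwidth")
```

Note the direction of each comparison: the page states B200's advantage as "50% more bandwidth" than MI300X, which is the same gap as MI300X having about a third less bandwidth than B200.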

BLK · SPECS

Specs

VRAM: 192 GB
Power draw (peak): 750 W
Released: 2023
MSRP: $15,000
Backends: ROCm

Models that fit

Open-weight models small enough to run on AMD Instinct MI300X with usable context.

  • all-MiniLM-L6-v2 (0.022B · other)
  • FLUX.1 [dev] (12B · other)
  • Qwen 3 0.6B (0.6B · qwen)
  • BGE Large EN v1.5 (0.335B · other)
  • Llama 4 Scout (109B · llama)
  • Nomic Embed Text v1.5 (0.137B · other)
  • Kokoro 82M (0.082B · other)
  • Llama 3.1 8B Instruct (8B · llama)

Frequently asked

What models can AMD Instinct MI300X run?

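With 192 GB of VRAM, the AMD Instinct MI300X runs 70B models at FP16 and, with quantization, models up to roughly the 235B class on a single card, plus everything smaller. See the "Models that fit" list above; these fits are extrapolated from specs, not measured.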

Does AMD Instinct MI300X support CUDA?

No — AMD Instinct MI300X is an AMD card. Use ROCm (Linux) or the Vulkan backend in llama.cpp instead. CUDA-only tools won't work.
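One quick way to confirm which backend a Python environment is actually using is to ask PyTorch. The sketch below relies on the fact that ROCm builds of PyTorch expose the GPU through the `torch.cuda` namespace and set `torch.version.hip`; the function name is hypothetical, and the exact device string reported on a given system is not something this page has verified.

```python
def describe_backend():
    """Report whether the installed PyTorch is a ROCm (HIP) build and sees a GPU."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    hip_version = getattr(torch.version, "hip", None)
    if hip_version is None:
        return "PyTorch build without ROCm/HIP support"

    # On ROCm builds the AMD GPU is still queried via the torch.cuda API.
    if not torch.cuda.is_available():
        return f"ROCm build (HIP {hip_version}) but no GPU is visible"

    return f"ROCm build (HIP {hip_version}), device 0: {torch.cuda.get_device_name(0)}"


print(describe_backend())
```

If this reports a build without HIP support on an AMD machine, the usual cause is a CUDA or CPU wheel installed in place of the ROCm one.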


Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.

© 2026 runlocalai.co · Independently operated
Compare alternatives

Hardware worth comparing

The closest alternatives by price, memory bandwidth, and form factor, plus a step up and down — so you can frame the buying decision against real options.

Closest matches (similar price, bandwidth & form factor)
  • Intel Gaudi 3: intel · 128 GB VRAM · 8.2/10
  • NVIDIA H200: nvidia · 141 GB VRAM · 10.0/10
  • AMD Instinct MI325X: amd · 256 GB VRAM · 10.0/10
  • NVIDIA B200: nvidia · 192 GB VRAM · 10.0/10
  • NVIDIA H100 NVL: nvidia · 188 GB VRAM · 10.0/10
  • AMD Instinct MI250X: amd · 128 GB VRAM · 9.7/10

Step up (more capable: more memory or a higher tier)
  • NVIDIA H200: nvidia · 141 GB VRAM · 10.0/10
  • AMD Instinct MI325X: amd · 256 GB VRAM · 10.0/10
  • NVIDIA B200: nvidia · 192 GB VRAM · 10.0/10

Step down (lighter: cheaper or more constrained)
  • Intel Gaudi 3: intel · 128 GB VRAM · 8.2/10
  • NVIDIA H200: nvidia · 141 GB VRAM · 10.0/10
  • AMD Instinct MI250X: amd · 128 GB VRAM · 9.7/10