RUNLOCALAI · v38
UNIT · AMD · APU
128 GB VRAM · workstation · Reviewed June 2026

AMD Instinct MI300A (APU)

Combined CPU + GPU APU with 128GB unified HBM3. Powers the El Capitan supercomputer.

Released 2023 · 5300 GB/s memory bandwidth
RUNLOCALAI SCORE
620 / 1000 · BB-tier · Estimated
  • Throughput: 500 / 500
  • VRAM-fit: 200 / 200
  • Ecosystem: 130 / 200
  • Efficiency: 56 / 100

Sub-scores sum to 886 / 1000. Headline = 886 × 0.70 (Estimated-confidence discount) ≈ 620. This is an algorithmic performance-tier score. It is distinct from, and often lower than, the editorial “Our verdict” below, which weighs value and real-world fit (especially for hardware we haven’t measured yet).
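
The arithmetic above can be reproduced in a few lines. This is only a sketch of the sum-then-discount step described on this page; the function name and the rounding to the nearest integer are our assumptions, not the catalog's actual scoring code.

```python
# Sub-scores as listed on this page: (points earned, points available).
SUB_SCORES = {
    "Throughput": (500, 500),
    "VRAM-fit": (200, 200),
    "Ecosystem": (130, 200),
    "Efficiency": (56, 100),
}

# Discount applied to scores with "Estimated" confidence (from the page).
ESTIMATED_DISCOUNT = 0.70


def headline_score(sub_scores, discount):
    """Return (raw sum, discounted headline) for a set of sub-scores.

    Rounding to the nearest integer is an assumption made for illustration.
    """
    raw = sum(points for points, _maximum in sub_scores.values())
    return raw, round(raw * discount)


raw, headline = headline_score(SUB_SCORES, ESTIMATED_DISCOUNT)
print(f"raw={raw}, headline={headline}")
```

The discount dominates the result: the same sub-scores with measured (undiscounted) confidence would presumably land much higher.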

Extrapolated from 5300 GB/s bandwidth — 530.0 tok/s estimated. No measured benchmarks yet.
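
For context, a common back-of-envelope model for memory-bound decoding divides memory bandwidth by the bytes read per generated token. The sketch below assumes that model; the 10 GB-per-token figure is inferred from the two numbers on this page (5300 / 530), not taken from the catalog's documented method, and the function name is hypothetical.

```python
def estimate_tokens_per_second(bandwidth_gb_s, gb_read_per_token):
    """Upper-bound decode speed if every token requires streaming
    `gb_read_per_token` gigabytes from memory (a bandwidth-bound model)."""
    if gb_read_per_token <= 0:
        raise ValueError("gb_read_per_token must be positive")
    return bandwidth_gb_s / gb_read_per_token


# ASSUMPTION: 10 GB read per token reproduces the page's 530 tok/s figure.
print(estimate_tokens_per_second(5300, 10))

# A 70B model at 4-bit holds about 35 GB of weights, so the same model
# gives a much lower ceiling for that workload.
print(round(estimate_tokens_per_second(5300, 35), 1))
```

Real throughput depends on kernels, batch size, and KV-cache traffic, so treat both numbers as ceilings rather than predictions.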

WORKLOAD FIT

Plain-English: Runs 70B comfortably, snappy enough for a coding agent.

  • 7B chat: ✓ Comfortable
  • 14B chat: ✓ Comfortable
  • 32B chat: ✓ Comfortable
  • 70B chat: ✓ Comfortable
  • Coding agent: ✓ Comfortable
  • Vision (≤8B VLM): ~ Tight
  • Long context (32K): ✓ Comfortable

Legend
  • ✓ Comfortable: fits with headroom
  • ~ Tight: works, no slack
  • △ Marginal: needs aggressive quant
  • ✗ Doesn't fit usefully

Verdicts extrapolated from catalog VRAM + bandwidth + ecosystem flags. Want measured numbers? Submit your own run with runlocalai-bench --submit.

BLK · VERDICT

Our verdict

OP · Fredoline Eruo | VERIFIED JUN 18, 2026
10.0/10

What it does well

The MI300A is AMD's CPU+GPU APU: 24 Zen 4 cores, 228 CDNA 3 compute units, and 128 GB of unified HBM3 at 5.3 TB/s, all on a single package. The architecture eliminates the CPU↔GPU memory transfer overhead that bottlenecks traditional discrete-GPU systems, because the Zen 4 cores and the CDNA 3 GPU share the same physical HBM3 pool with full coherent access. This is the chip in the El Capitan supercomputer (LLNL), which debuted at the top of the TOP500 list in November 2024.

For LLM workloads, the unified-memory architecture is genuinely useful. Each socket carries 128 GB of on-package memory, and Infinity Fabric links sockets together, so an 8× MI300A node would pool 1 TB (8 × 128 GB) of HBM3. Access to a remote socket's memory runs over Infinity Fabric rather than at local HBM3 speed, but the pooled capacity is still a meaningful advantage over traditional CPU↔GPU systems for memory-bound inference and training workflows. ROCm 6+ supports MI300A first-class.

Procurement is OEM/integrator-only, typically $20,000-$30,000 per APU socket.

Where it breaks

  • OEM/integrator-only procurement. MI300A doesn't ship to typical enterprises — you buy it as part of an HPE Cray supercomputer or HPE ProLiant XD685+ APU server. Lead times measured in months, MOQs measured in racks.
  • No CUDA — full stop. AMD ROCm ecosystem only. Same long-tail framework compatibility constraints as MI300X and other Instinct cards.
  • Architecture is APU, not pure-GPU. The 24 Zen 4 cores are useful for orchestration but the GPU compute density is lower than MI300X (228 CUs vs 304 CUs) due to silicon die budget shared with the CPU. For pure GPU workloads, MI300X wins.
  • Software stack tuned for HPC, not pure LLM. El Capitan's workload mix is HPC scientific simulation, weather modeling, etc. — LLM-specific optimization on MI300A is less mature than MI300X.
  • Resale and used-market liquidity is essentially zero. Decommissioned El Capitan racks may eventually surface, but transaction volume will be tiny.
  • Power and cooling infrastructure is HPC-tier. Up to 760 W peak power per APU socket, with liquid cooling required for sustained workloads.

Ideal model range

  • Sweet spot: HPC + LLM hybrid workloads where CPU↔GPU coherence advantage genuinely matters (specific scientific computing + AI fusion workflows).
  • Sweet spot: Multi-tenant production inference at supercomputer scale where 8× APU node = 1 TB combined HBM is genuinely useful.
  • Sweet spot: Trillion-parameter foundation model training where unified memory architecture reduces transfer overhead vs traditional discrete GPU.
  • Sweet spot: National lab / sovereign AI deployments where MI300A's specific El Capitan provenance is the procurement vehicle.
  • Bad fit: Pure LLM production inference (MI300X is better), single-card workloads (wrong tier), enterprise procurement (wrong channel).

Bad use cases

  • Standard enterprise procurement. Pick MI300X or NVIDIA equivalents.
  • Pure LLM serving. MI300X has more GPU CUs at similar memory tier.
  • CUDA-locked stacks. Don't pick AMD if your toolchain requires CUDA.
  • Anyone reading this to make a buying decision. Treat this page as reference info on the AMD APU architecture that powers El Capitan, not as buying advice.
  • Cost-conscious anything. Wrong tier entirely.
  • Workstation deployment. Rack/HPC-only.

Verdict

Buy this if you're spec'ing HPC infrastructure (national lab, defense, large pharma) where MI300A's specific HPC + AI fusion capability matters, you have OEM relationships with HPE Cray for supercomputer-scale procurement, and your workload genuinely benefits from CPU+GPU coherent unified memory at the rack scale. MI300A is the right pick for the narrow HPC + LLM hybrid use case.

Skip this if you're a typical enterprise (pick MI300X or MI325X for AMD; H200 or B200 for NVIDIA), you're pure-LLM serving (MI300X has more GPU compute), CUDA-locked, or you can't budget OEM/integrator-only procurement. For most readers, this verdict is informational reference, not a buying decision.

How it compares

  • vs MI300X (192 GB) → MI300X has 50% more memory (192 GB vs 128 GB) and 33% more CUs (304 vs 228), and it ships as an OAM module in standard 8-GPU OEM servers (not as a PCIe card) at roughly $20k cap-ex. MI300A has CPU+GPU coherent unified memory + APU integration at $25-30k OEM. Pick MI300X for typical enterprise; MI300A for HPC + AI fusion specific use cases. See /compare/amd-mi300a-vs-amd-mi300x.
  • vs GB200 NVL72 → GB200 NVL72 is the equivalent NVIDIA platform for trillion-parameter scale at $3M+ rack. MI300A in HPE Cray rack form is similar tier on AMD ecosystem. Pick by ecosystem alignment + scale.
  • vs Grace Hopper Superchip → NVIDIA's equivalent CPU+GPU integrated platform on the Hopper generation. Different ecosystem, similar architectural concept.
  • vs DGX H200 → DGX H200 is 8× discrete H200 SXM5 in 8U at ~$300k. MI300A in HPE Cray APU server form is supercomputer-tier procurement. Wrong comparison — different scales.
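
The MI300X percentages quoted in the first comparison follow directly from the spec numbers on this page. A minimal sketch, using only the memory and compute-unit figures stated above (the dictionary layout and helper name are ours, for illustration):

```python
# Figures taken from this page's verdict text.
MI300A = {"memory_gb": 128, "compute_units": 228}
MI300X = {"memory_gb": 192, "compute_units": 304}


def percent_more(larger, smaller):
    """How much bigger `larger` is than `smaller`, in percent."""
    return (larger / smaller - 1) * 100


for key in ("memory_gb", "compute_units"):
    delta = percent_more(MI300X[key], MI300A[key])
    print(f"MI300X {key}: {delta:.0f}% more than MI300A")
```

Neither ratio captures the MI300A's main differentiator, the CPU cores sharing the same HBM3 pool, which is why the comparison above is framed by use case rather than by raw numbers.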

BLK · SPECS

Specs

VRAM: 128 GB
Power draw (peak): 760 W
Released: 2023
Backends: ROCm

Models that fit

Open-weight models small enough to run on AMD Instinct MI300A (APU) with usable context.

  • all-MiniLM-L6-v2 · 0.022B · other
  • FLUX.1 [dev] · 12B · other
  • Qwen 3 0.6B · 0.6B · qwen
  • BGE Large EN v1.5 · 0.335B · other
  • Nomic Embed Text v1.5 · 0.137B · other
  • Kokoro 82M · 0.082B · other
  • Llama 3.1 8B Instruct · 8B · llama
  • Qwen 3 30B-A3B · 30B · qwen

Frequently asked

What models can AMD Instinct MI300A (APU) run?

With 128 GB of unified HBM3, the AMD Instinct MI300A (APU) runs 70B models in 4-bit quantization, plus everything smaller. See the "Models that fit" list above for catalog entries that fit.
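
A quick way to sanity-check claims like this is to estimate the weight footprint as parameters × bits per weight ÷ 8. The sketch below does that; the 20% overhead allowance for KV cache and runtime buffers is a rough assumption of ours, not a catalog rule, and the function names are hypothetical.

```python
def weight_footprint_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits(params_billion, bits_per_weight, memory_gb, overhead_fraction=0.2):
    """True if weights plus an ASSUMED overhead allowance fit in memory."""
    needed = weight_footprint_gb(params_billion, bits_per_weight)
    return needed * (1 + overhead_fraction) <= memory_gb


MI300A_MEMORY_GB = 128  # from the spec table on this page

for bits in (4, 8, 16):
    size = weight_footprint_gb(70, bits)
    verdict = "fits" if fits(70, bits, MI300A_MEMORY_GB) else "does not fit"
    print(f"70B @ {bits}-bit: ~{size:.0f} GB of weights, {verdict}")
```

Under these assumptions a 70B model fits at 4-bit and 8-bit on a single socket but not at 16-bit, which is consistent with the 4-bit answer above.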

Does AMD Instinct MI300A (APU) support CUDA?

No. The AMD Instinct MI300A (APU) is an AMD part, so use ROCm (Linux) or the Vulkan backend in llama.cpp instead. CUDA-only tools won't work.
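
If you use PyTorch, you can check which backend your install was built for. ROCm builds of PyTorch reuse the torch.cuda API and set torch.version.hip to a version string, while CUDA builds leave it as None. The helper below is a sketch with a hypothetical name and return labels; it returns a label instead of raising if PyTorch is missing.

```python
def detect_gpu_backend() -> str:
    """Report which GPU backend the local PyTorch install can use.

    Returns one of: "torch-not-installed", "cpu-only", "rocm", "cuda".
    """
    try:
        import torch
    except ImportError:
        return "torch-not-installed"

    # On ROCm builds, torch.cuda.is_available() is True when an AMD GPU
    # is visible, because the HIP backend is exposed through torch.cuda.
    if not torch.cuda.is_available():
        return "cpu-only"

    # torch.version.hip is a version string on ROCm builds, None otherwise.
    if getattr(torch.version, "hip", None):
        return "rocm"
    return "cuda"


print(detect_gpu_backend())
```

On an MI300A system with a working ROCm install you would expect "rocm"; "cpu-only" on such a system usually points to a driver or install problem rather than a hardware limitation.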

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify hardware specifications.

Compare alternatives

Hardware worth comparing

The closest alternatives by price, memory bandwidth, and form factor, plus a step up and down — so you can frame the buying decision against real options.

Closest matches
Similar price, bandwidth & form factor
  • NVIDIA H200 · nvidia · 141 GB VRAM · 10.0/10
  • Intel Gaudi 3 · intel · 128 GB VRAM · 8.2/10
  • NVIDIA H20 (96GB) · nvidia · 96 GB VRAM · 7.4/10
  • NVIDIA H100 NVL · nvidia · 188 GB VRAM · 10.0/10
  • NVIDIA B200 · nvidia · 192 GB VRAM · 10.0/10
  • AMD Instinct MI300X · amd · 192 GB VRAM · 10.0/10
Step up
More capable: more memory or a higher tier
  • NVIDIA H100 NVL · nvidia · 188 GB VRAM · 10.0/10
  • NVIDIA B200 · nvidia · 192 GB VRAM · 10.0/10
  • AMD Instinct MI300X · amd · 192 GB VRAM · 10.0/10
Step down
Lighter: cheaper or more constrained
  • NVIDIA H20 (96GB) · nvidia · 96 GB VRAM · 7.4/10
  • NVIDIA H100 SXM · nvidia · 80 GB VRAM · 10.0/10
  • Intel Gaudi 2 · intel · 96 GB VRAM · 7.9/10