RUNLOCALAI v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts: there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.

© 2026 runlocalai.co · Independently operated
fatal · ✓ Editorial · Reviewed May 2026

ROCm HSA status error — recover an AMD GPU mid-inference

HSA / HIP errors mid-inference on AMD GPUs usually trace to thermal limits, kernel-driver mismatch, or known-bad memory modes on consumer cards. Here's the diagnostic order.

ROCm · PyTorch ROCm · llama.cpp HIP backend · AMD RDNA 2/3
By Fredoline Eruo · Last verified 2026-05-08

Diagnostic order — most likely first

#1

Card thermal throttle hitting an unstable clock state

Diagnose

Crash correlates with sustained load. `rocm-smi` shows GPU temp > 95°C right before crash. Reproducible.
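
To catch the temperature right before a crash, it can help to log readings once a second while the workload runs. The sketch below is one way to do that, assuming the amdgpu hwmon interface (files named `temp*_input`, values in millidegrees Celsius) and assuming the GPU is `card0`; the function names `over_temp` and `watch_temps` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: flag amdgpu temperature readings above a threshold.
# Assumptions: the GPU is card0 and exposes hwmon temp*_input files
# (millidegrees Celsius). The 95 C default mirrors the threshold above.

# over_temp MILLIDEG [LIMIT_C]: succeeds if the reading exceeds the limit.
over_temp() {
  local millideg="$1" limit_c="${2:-95}"
  [ "$millideg" -gt "$(( limit_c * 1000 ))" ]
}

# watch_temps [LIMIT_C]: print a timestamped reading per sensor every second.
# Stop it with Ctrl-C; the last lines printed are the pre-crash readings.
watch_temps() {
  local limit_c="${1:-95}" f v flag
  while sleep 1; do
    for f in /sys/class/drm/card0/device/hwmon/hwmon*/temp*_input; do
      [ -r "$f" ] || continue
      v=$(cat "$f")
      flag=""
      if over_temp "$v" "$limit_c"; then flag=" OVER"; fi
      printf '%s %s %d C%s\n' "$(date +%T)" "${f##*/}" "$(( v / 1000 ))" "$flag"
    done
  done
}
```

Run `watch_temps | tee gpu-temps.log` in a second terminal while reproducing the crash, then check the tail of the log.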

Fix

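Improve case airflow. Underclock VRAM by pinning a lower memory clock state: `rocm-smi --setmclk 7` (the argument is a clock-level index, not a MHz value, and the valid levels vary by card; list them with `rocm-smi --showclkfrq`). Use an undervolt profile in MorePowerTool (RDNA 3) to lower thermals.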

#2

Kernel + ROCm version mismatch after distro update

Diagnose

Was working last week. After `apt upgrade` it broke. `dkms status` shows the AMDGPU module as 'failed', or built for a kernel version other than the running one (`uname -r`).
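
The comparison can be scripted. This is a minimal sketch, assuming `dkms status` prints one line per module and kernel that starts with `amdgpu` and ends in `installed` when the build succeeded; the helper name `amdgpu_dkms_state` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: check whether an amdgpu DKMS module is installed for a given kernel.
# Assumption: `dkms status` lines look roughly like
#   amdgpu/<version>, <kernel release>, x86_64: installed

# amdgpu_dkms_state KERNEL_RELEASE: reads `dkms status` text on stdin,
# prints "ok" if an installed amdgpu entry names that kernel, else "mismatch".
amdgpu_dkms_state() {
  local kernel="$1"
  if grep '^amdgpu' | grep -F -- "$kernel" | grep -q 'installed'; then
    echo ok
  else
    echo mismatch
  fi
}
```

Typical use: `dkms status | amdgpu_dkms_state "$(uname -r)"`. A `mismatch` result means the module was not built for the kernel you booted.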

Fix

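Reinstall with matching versions so the kernel module is rebuilt for the running kernel: `sudo amdgpu-install --usecase=rocm,dkms`, then reboot. (Leave off `--no-dkms` here; it skips the kernel module, which is the part that needs rebuilding.) For consumer cards on rolling distros (Arch), pin the kernel version against the ROCm release.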

#3

RDNA 3 memory bandwidth bug at high context

Diagnose

7900 XTX specifically. Crashes only at long context (>16K) on certain models. Known issue tracked in the llama.cpp and ROCm GitHub issue trackers.

Fix

Use a Q5_K_M quant instead of Q4 (slightly different memory access pattern). Or cap context at 8K for affected models. Or build llama.cpp HEAD with the latest ROCm patches.

#4

Missing HSA gfx-version override for the card

Diagnose

`rocminfo` reports your card's gfx target (for example gfx1101 or gfx1031), but the ROCm libraries reject it because the binary distribution doesn't ship kernels for that target.

Fix

Set `HSA_OVERRIDE_GFX_VERSION=11.0.0` (RDNA 3) or `10.3.0` (RDNA 2) in your shell or systemd service, so the runtime treats the card as gfx1100 or gfx1030 respectively. Many bundled ROCm builds need this for consumer cards.
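
As a rough sketch, the override value can be derived from the target that `rocminfo` reports. The mapping below simply follows the RDNA 3 / RDNA 2 split above (any `gfx110x` maps to 11.0.0, any `gfx103x` to 10.3.0), which is an assumption rather than an official table; `override_for` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: map a gfx target to an HSA_OVERRIDE_GFX_VERSION value.
# Assumption: gfx110x cards take 11.0.0 and gfx103x cards take 10.3.0,
# per the RDNA 3 / RDNA 2 guidance in this guide.

# override_for GFX_TARGET: prints the override, or fails for unknown targets.
override_for() {
  case "$1" in
    gfx110*) echo 11.0.0 ;;
    gfx103*) echo 10.3.0 ;;
    *) return 1 ;;
  esac
}

# Example use in a shell session (not run automatically):
#   gfx=$(rocminfo | grep -o -m1 'gfx[0-9a-f]*')
#   export HSA_OVERRIDE_GFX_VERSION="$(override_for "$gfx")"
#
# For a systemd service, the equivalent is a line in the [Service] section:
#   Environment=HSA_OVERRIDE_GFX_VERSION=11.0.0
```

The function fails on targets outside those two families on purpose, so an older or unrelated card does not silently get a wrong override.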

#5

Insufficient PCIe bandwidth (laptop or x4 slot)

Diagnose

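Crashes during model load or first inference. `sudo lspci -vv | grep LnkSta` shows the GPU link at PCIe 3.0 x4 (`Speed 8GT/s, Width x4`) instead of 4.0 x16 (`Speed 16GT/s, Width x16`). Without root, `lspci` hides the link capabilities.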
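
The `LnkSta` line reports transfer rate rather than a PCIe generation, so a small filter can make it easier to read. This is a sketch, assuming `lspci -vv` prints the status as `LnkSta: Speed <rate>GT/s ..., Width x<lanes> ...`; the helper names `link_summary` and `pcie_gen` are made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: summarise PCIe link status lines from `lspci -vv` output.

# link_summary: reads lspci -vv text on stdin, prints "<rate> <width>"
# for every LnkSta line (LnkSta2 lines are ignored).
link_summary() {
  sed -n 's/.*LnkSta:[[:space:]]*Speed \([0-9.]*GT\/s\).*Width \(x[0-9]*\).*/\1 \2/p'
}

# pcie_gen RATE: translate a transfer rate into a PCIe generation label.
pcie_gen() {
  case "$1" in
    2.5GT/s) echo "PCIe 1.0" ;;
    5GT/s)   echo "PCIe 2.0" ;;
    8GT/s)   echo "PCIe 3.0" ;;
    16GT/s)  echo "PCIe 4.0" ;;
    32GT/s)  echo "PCIe 5.0" ;;
    *)       echo "unknown" ;;
  esac
}
```

For a single device, pass its bus address (shown in the first column of plain `lspci`): `sudo lspci -vv -s <address> | link_summary`.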

Fix

Move card to a full-length x16 slot if available. For laptops with M.2-eGPU adapters, this is a hardware limitation — only short-context inference is reliable.

Frequently asked questions

Is ROCm production-ready on consumer AMD GPUs in 2026?

On RDNA 3 (7900 XTX, 7900 XT) with the gfx-version override and ROCm 6.x — yes for inference, with caveats. On RDNA 2 — workable but more friction. On older cards — use Vulkan via llama.cpp instead.

Should I switch to Vulkan if ROCm keeps crashing?

Yes. llama.cpp's Vulkan backend (`-DGGML_VULKAN=ON`) achieves 70-90% of ROCm performance for inference and is dramatically more stable on consumer AMD cards. The trade-off: no PyTorch / Transformers support.
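
A minimal build sketch for the Vulkan backend follows. It assumes a llama.cpp checkout, CMake, and the Vulkan development packages are already installed; the `run` and `build_llama_vulkan` helpers and the `DRY_RUN` variable are inventions of this example, not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Sketch: configure and build llama.cpp with the Vulkan backend.
# Set DRY_RUN=1 to print the commands instead of running them.

# run CMD...: execute the command, or just echo it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# build_llama_vulkan [SRC_DIR]: SRC_DIR defaults to ./llama.cpp
build_llama_vulkan() {
  local src="${1:-llama.cpp}"
  run cmake -S "$src" -B "$src/build" -DGGML_VULKAN=ON &&
  run cmake --build "$src/build" --config Release
}
```

After a successful build the binaries land under `build/bin/` in the checkout; offload layers to the GPU with `-ngl` as you would on the ROCm build.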

Can I run ROCm and CUDA on the same machine (multi-vendor GPU)?

Technically yes, but driver coexistence is fragile. Most operators dedicate a machine to one vendor. If you must, install in isolated containers.

Related troubleshooting

ROCm not detected / AMD GPU not found

ROCm is finicky on consumer AMD GPUs in 2026. Here's the install order, the gfx-version override that fixes 80% of detection failures, and when to give up and use Vulkan.

Model keeps crashing / segfault during inference

Mid-inference crashes (segfault, illegal memory access, kernel panic) usually mean VRAM ECC, thermal throttling, PSU instability, or a bad model file. Here's the diagnostic order.

CUDA out of memory

Why CUDA OOM happens during local LLM inference and image gen, how to confirm the real cause, and the four real fixes (smaller quant, shorter context, gradient checkpointing, or more VRAM).

When the fix is hardware

A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time:

  • Best GPU for local AI
  • Best laptop for local AI
  • Best Mac for local AI
