ROCm HSA status error — recover an AMD GPU mid-inference
HSA / HIP errors mid-inference on AMD GPUs usually trace to thermal limits, kernel-driver mismatch, or known-bad memory modes on consumer cards. Here's the diagnostic order.
Diagnostic order — most likely first
Card thermal throttle hitting an unstable clock state
Crash correlates with sustained load. `rocm-smi` shows GPU temp > 95°C right before crash. Reproducible.
Improve case airflow. Step the memory clock down a performance level: `rocm-smi --setmclk <level>` takes a DPM level index, not a MHz offset, and the available levels vary by card. On RDNA 3, an undervolt profile via MorePowerTool lowers thermals further.
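Before changing clocks, confirm the thermal correlation: log temperatures during a repro run, then scan the log for samples near the throttle point. A minimal sketch — the sample lines below mimic `rocm-smi --showtemp` output, which is an assumption; formats vary across ROCm versions, so adjust the match to what your card actually prints.

```shell
# During a repro run, log temps alongside inference, e.g.:
#   while sleep 1; do rocm-smi --showtemp >> gpu_temp.log; done
# Here we fake two samples so the scan below is self-contained.
printf '%s\n' \
  'GPU[0] : Temperature (Sensor edge) (C): 82.0' \
  'GPU[0] : Temperature (Sensor edge) (C): 97.5' > gpu_temp.log

# Flag any sample above the 95 C throttle threshold.
awk -F': ' '/Temperature/ && $NF+0 > 95 { print "THROTTLE RISK:", $0 }' gpu_temp.log
```

If the flagged samples cluster just before each crash, treat it as thermal before suspecting drivers.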
Kernel + ROCm version mismatch after distro update
It was working last week; it broke after `apt upgrade`. `dkms status` shows the amdgpu module as 'failed', or built against a different kernel than the one running.
Reinstall with matching versions: `sudo amdgpu-install --usecase=rocm,dkms` then reboot. For consumer cards on rolling distros (Arch), pin the kernel version against the ROCm release.
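A quick way to check for the skew before reinstalling — a sketch, assuming `dkms status` output names the kernel the module was built for (the exact format varies across DKMS versions):

```shell
# Compare the running kernel against the kernels the amdgpu module is built for.
kernel=$(uname -r)
if dkms status amdgpu 2>/dev/null | grep -q "$kernel.*installed"; then
  echo "OK: amdgpu DKMS module built for $kernel"
else
  echo "MISMATCH: rebuild with 'sudo amdgpu-install --usecase=rocm,dkms' and reboot"
fi
```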
RDNA 3 memory bandwidth bug at high context
7900 XTX specifically. Crashes only at long context (>16K) on certain models. A known issue, tracked in the llama.cpp and ROCm GitHub issue trackers.
Use a Q5_K_M quant instead of Q4 (the slightly different memory-access pattern sidesteps the bug). Or cap context at 8K for affected models. Or build llama.cpp from HEAD to pick up the latest ROCm patches.
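Capping context is the least invasive workaround. A sketch using llama.cpp's server binary — the model path and layer count here are illustrative, not prescriptive:

```shell
# -c caps the context window at 8192 tokens; -ngl offloads layers to the GPU.
# Model path is an example; substitute your affected Q5_K_M quant.
./llama-server -m ./models/model-Q5_K_M.gguf -c 8192 -ngl 99
```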
Missing HSA gfx-version override for the card
`rocminfo` shows your card's gfx version (gfx1100, gfx1030, etc.) but ROCm libraries reject it because the binary distribution doesn't ship that target.
Set `HSA_OVERRIDE_GFX_VERSION=11.0.0` (RDNA 3) or `10.3.0` (RDNA 2) in your shell or systemd service. Many bundled ROCm builds need this for consumer cards.
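The override value follows mechanically from the gfx target `rocminfo` reports. A helper sketch — `gfx_to_override` is a hypothetical function, and the mappings shown cover only the common consumer targets:

```shell
# Map a gfx target (as reported by rocminfo) to the HSA override value.
gfx_to_override() {
  case "$1" in
    gfx110*) echo "11.0.0" ;;      # RDNA 3: 7900 XTX/XT, 7800 XT, ...
    gfx103*) echo "10.3.0" ;;      # RDNA 2: 6900 XT, 6800, ...
    *)       echo "unsupported" ;; # older targets: consider Vulkan instead
  esac
}

export HSA_OVERRIDE_GFX_VERSION="$(gfx_to_override gfx1100)"
echo "$HSA_OVERRIDE_GFX_VERSION"   # 11.0.0 for a 7900 XTX
```

In a systemd service the equivalent is `Environment=HSA_OVERRIDE_GFX_VERSION=11.0.0` under `[Service]`.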
Insufficient PCIe bandwidth (laptop or x4 slot)
Crashes during model load or first inference. `sudo lspci -vv | grep LnkSta` (root is needed for full link status on many systems) shows the GPU at PCIe 3.0 x4 instead of 4.0 x16.
Move card to a full-length x16 slot if available. For laptops with M.2-eGPU adapters, this is a hardware limitation — only short-context inference is reliable.
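The link check can be scripted. A sketch against a captured `LnkSta` line — the sample mimics `lspci -vv` output for a card stuck at Gen3 x4; in practice, pipe real `sudo lspci -vv` output through the same filter:

```shell
# lspci marks any link trained below its capability with "(downgraded)".
line='LnkSta: Speed 8GT/s (downgraded), Width x4 (downgraded)'
case "$line" in
  *downgraded*) echo "PCIe link downgraded: $line" ;;
  *)            echo "PCIe link at full capability" ;;
esac
```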
Frequently asked questions
Is ROCm production-ready on consumer AMD GPUs in 2026?
On RDNA 3 (7900 XTX, 7900 XT) with the gfx-version override and ROCm 6.x — yes for inference, with caveats. On RDNA 2 — workable but more friction. On older cards — use Vulkan via llama.cpp instead.
Should I switch to Vulkan if ROCm keeps crashing?
Yes. llama.cpp's Vulkan backend (`-DGGML_VULKAN=ON`) achieves 70-90% of ROCm performance for inference and is dramatically more stable on consumer AMD cards. The trade-off: no PyTorch / Transformers support.
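A build recipe sketch for the Vulkan backend, assuming the Vulkan SDK and shader compiler are installed (`libvulkan-dev` and `glslc` on Debian/Ubuntu; package names vary by distro):

```shell
# Configure and build llama.cpp with the Vulkan backend instead of ROCm/HIP.
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build --config Release -j
```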
Can I run ROCm and CUDA on the same machine (multi-vendor GPU)?
Technically yes, but driver coexistence is fragile. Most operators dedicate a machine to one vendor. If you must, install in isolated containers.
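Container isolation for the ROCm side can look like the following — the image name is illustrative, while `/dev/kfd` and `/dev/dri` are the standard ROCm device passthrough:

```shell
# Run a ROCm workload in a container so the host's driver stacks stay separate.
docker run -it --device=/dev/kfd --device=/dev/dri \
  --group-add video rocm/rocm-terminal
```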
Related troubleshooting
ROCm is finicky on consumer AMD GPUs in 2026. Here's the install order, the gfx-version override that fixes 80% of detection failures, and when to give up and use Vulkan.
Mid-inference crashes (segfault, illegal memory access, kernel panic) usually mean VRAM ECC, thermal throttling, PSU instability, or a bad model file. Here's the diagnostic order.
Why CUDA OOM happens during local LLM inference and image gen, how to confirm the real cause, and the four real fixes (smaller quant, shorter context, gradient checkpointing, or more VRAM).
When the fix is hardware
A surprising fraction of troubleshooting tickets resolve to: this card doesn't have enough VRAM for what you're asking it to do. If you're hitting OOM after every reasonable fix, or your GPU genuinely can't fit the model you need, it's upgrade time: