Benchmark regression candidates
Rule-based detection of possible regressions and improvements across the runtime, driver, and CUDA / ROCm / Metal version axes. None of these candidates is confirmed. Each row carries the rationale (derivedFrom), the evidence that is still missing, and a suggested reproduction path.
We never publish "X regressed by Y%" as fact. The measurements that produce these signals are valid; the interpretation requires reproduction by an independent operator. Read the regression methodology before quoting any of these numbers.
No regression candidates detected in the current corpus. New candidates appear when version-aware paired measurements diverge by more than 15%; see the regression methodology for the exact rules.
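As a rough illustration of the 15% divergence threshold, the check on one pair of measurements might look like the sketch below. It is not the detector's actual code: the `PairedMeasurement` type, its field names, and the use of a symmetric relative change against the baseline are all assumptions made for this example, and the regression methodology remains the authority on the exact rules.

```python
from dataclasses import dataclass

# The 15% figure comes from the text above; everything else here is hypothetical.
DIVERGENCE_THRESHOLD = 0.15


@dataclass(frozen=True)
class PairedMeasurement:
    """One metric measured on two versions of the same axis (hypothetical shape)."""
    baseline: float   # value measured on the older version
    candidate: float  # value measured on the newer version


def relative_change(pair: PairedMeasurement) -> float:
    """Signed change of the newer value relative to the baseline."""
    if pair.baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative change")
    return (pair.candidate - pair.baseline) / pair.baseline


def is_candidate(pair: PairedMeasurement,
                 threshold: float = DIVERGENCE_THRESHOLD) -> bool:
    """True when the pair diverges by more than the threshold in either direction."""
    return abs(relative_change(pair)) > threshold


if __name__ == "__main__":
    pair = PairedMeasurement(baseline=100.0, candidate=80.0)
    print(f"change={relative_change(pair):+.1%} candidate={is_candidate(pair)}")
```

The check is deliberately direction-agnostic: whether a flagged pair is a regression or an improvement depends on whether higher is better for the metric, and either way the result is only a candidate until it is reproduced.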
See a candidate that looks wrong?
The detector is rule-based, not perfect. A "candidate" can be a real regression, a real improvement, or noise from cohort drift (different operators, different hardware lots, different room temperatures). The fix is reproduction.