Benchmark regression candidates

Rule-based detection of possible regressions and improvements across the runtime, driver, and CUDA/ROCm/Metal version axes. None of these candidates is confirmed. Each row carries its rationale (derivedFrom), the evidence that is still missing, and a suggested reproduction path.

We never publish "X regressed by Y%" as fact. The measurements that produce these signals are valid; the interpretation requires reproduction by an independent operator. Read the regression methodology before quoting any of these numbers.
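To make the shape of a row concrete, here is a minimal sketch of one candidate as a Python dataclass. It is not the API's schema: apart from `derived_from`, which mirrors the derivedFrom rationale named above, every field name is hypothetical. The three labels match the categories this page reports.

```python
from dataclasses import dataclass, field

# The categories this page uses for candidates.
KINDS = {"possible regression", "possible improvement", "insufficient"}


@dataclass
class RegressionCandidate:
    """One candidate row. Field names other than derived_from are hypothetical."""

    benchmark: str
    axis: str  # which version axis changed, e.g. "runtime", "driver", "cuda", "rocm", "metal"
    from_version: str
    to_version: str
    kind: str  # one of KINDS
    derived_from: list[str]  # rationale: the rules and measurements behind the signal (derivedFrom)
    missing_evidence: list[str] = field(default_factory=list)
    reproduction_path: str = ""
    confirmed: bool = False  # stays False until an independent operator reproduces it

    def __post_init__(self) -> None:
        # Reject labels outside the page's categories; there is no "confirmed" kind.
        if self.kind not in KINDS:
            raise ValueError(f"unknown candidate kind: {self.kind!r}")
```

Keeping `confirmed` as an explicit field that defaults to False mirrors the rule that nothing is published as fact before independent reproduction.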

Candidates are tallied as a total and in three categories: possible regressions, possible improvements, and "insufficient".

No regression candidates have been detected in the current corpus. A new candidate appears when version-aware paired measurements diverge by more than 15%; see the regression methodology for the exact rules.
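A minimal sketch of the 15% divergence rule, assuming each pair is the same benchmark on the same hardware measured under two versions of one axis. `Pair`, `classify`, the median aggregation, and the `min_pairs` cutoff used for the "insufficient" label are illustrative assumptions, not the detector's actual code; the methodology holds the exact rules.

```python
from dataclasses import dataclass
from statistics import median

THRESHOLD = 0.15  # "diverge by more than 15%"


@dataclass
class Pair:
    """One version-aware paired measurement (hypothetical shape).

    baseline and candidate come from the same benchmark on the same hardware,
    differing only in one version axis (runtime, driver, CUDA/ROCm/Metal).
    """

    baseline: float
    candidate: float


def classify(pairs: list[Pair], higher_is_better: bool = True, min_pairs: int = 3) -> str:
    # Assumed guard: too few pairs means there is not enough evidence to classify.
    if len(pairs) < min_pairs:
        return "insufficient"
    # Median relative change across pairs, so one noisy run cannot decide alone.
    change = median((p.candidate - p.baseline) / p.baseline for p in pairs)
    if abs(change) <= THRESHOLD:
        return "none"
    # A drop in a higher-is-better metric (e.g. throughput), or a rise in a
    # lower-is-better one (e.g. latency), is a possible regression.
    worse = change < 0 if higher_is_better else change > 0
    return "possible regression" if worse else "possible improvement"
```

Taking the median of per-pair changes keeps a single outlier from producing a candidate on its own. It does nothing about cohort drift, which is why every candidate still needs reproduction.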

See a candidate that looks wrong?

The detector is rule-based and imperfect. A "candidate" can be a real regression, a real improvement, or noise from cohort drift (different operators, different hardware lots, different room temperatures). The fix is reproduction.

Open the reproduction queue →
Flag a wrong candidate →
Methodology →

Next recommended step

Reproduction queue

Or: Regression methodology

API: /api/v2/regression-candidates