Methodology hub

Every methodology surface on RunLocalAI lives in one place: the rules behind every benchmark badge, every confidence tier, every regression candidate, and every reproduction request. We document the rules so you can disagree with them specifically, not vaguely.

Anything missing? Tell us at /submit/feedback; we publish methodology corrections.

Trust + scoring

How tok/s + VRAM scores combine with confidence tiers into the editorial verdict. The four-state verification ladder for community submissions.

Why confidence is a 4-tier vocabulary (low / moderate / high / very-high) and never a percentage, plus the derivedFrom rationale recorded on every row.
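
For illustration only, here is how that vocabulary sits on a row. derivedFrom is the documented field name; every other field is invented for the sketch.

```python
# Illustrative row shape. "derivedFrom" is the field named above;
# the other fields are invented for this sketch, not our actual schema.
row = {
    "tok_per_s": 42.3,         # invented value
    "confidence": "moderate",  # always one of the four tier words, never a percentage
    "derivedFrom": "owner run, not yet independently reproduced",
}
```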

What editorial does before a community submission goes public. What gets rejected. Why some submissions sit pending for days.

What the benchmark badges mean, how operator-grade scoring differs from synthetic vendor scoring, and where we admit uncertainty.

Benchmarks

How to reproduce an existing benchmark step-by-step so your run lifts the row's confidence tier.

Operator-grade capture protocol used on the 6 owned rigs: cold-start prep, standard prompt, 3-run median, sha256 reproducibility hash, mandatory environment metadata (driver + runtime + OS). The exact recipe behind every source=owner row.
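
A minimal sketch of that recipe in Python, assuming a placeholder run_inference() plus invented driver/runtime strings and field names, not our actual tooling or schema:

```python
# Minimal sketch of the owner-rig capture recipe. run_inference(),
# the driver/runtime strings, and the field names are placeholders.
import hashlib
import json
import platform
import random
import statistics

STANDARD_PROMPT = "The standard prompt text goes here."  # fixed across all owner runs

def run_inference(prompt: str) -> float:
    # Placeholder: swap in a real cold-start run against your runtime
    # and return the measured tok/s.
    return 40.0 + random.random()

def capture_row() -> dict:
    # Three cold-start runs; the median damps one-off outliers.
    samples = [run_inference(STANDARD_PROMPT) for _ in range(3)]
    row = {
        "source": "owner",
        "tok_per_s": statistics.median(samples),
        "samples": samples,
        "env": {  # mandatory metadata: capture real values, not labels
            "os": platform.platform(),
            "driver": "NVIDIA 560.35.03",        # e.g. read from nvidia-smi
            "runtime": "llama.cpp build b1234",  # exact build, not just the name
        },
    }
    # Fingerprint of exactly what was measured and under which environment,
    # so a reviewer can check the row was not edited after capture.
    canonical = json.dumps(row, sort_keys=True).encode()
    row["sha256"] = hashlib.sha256(canonical).hexdigest()
    return row

print(capture_row())
```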

10-item printable checklist: what to measure, what NOT to claim, sampler pinning, version capture, and reproducibility hygiene.

How /benchmarks/regressions reaches 'possible regression' / 'possible improvement' / 'insufficient'. Threshold rules + known noise sources.
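
The verdict logic has roughly this shape; the 5% band and 3-sample minimum below are invented for the sketch, not the documented thresholds:

```python
# Illustrative shape of a threshold rule. The 5% band and the 3-sample
# minimum are invented here; the linked page documents the real
# thresholds and the known noise sources they account for.
import statistics

def classify_change(baseline: list[float], candidate: list[float],
                    threshold: float = 0.05, min_samples: int = 3) -> str:
    # Too few samples on either side: refuse to call it either way.
    if len(baseline) < min_samples or len(candidate) < min_samples:
        return "insufficient"
    base = statistics.median(baseline)
    delta = (statistics.median(candidate) - base) / base
    if delta <= -threshold:   # tok/s dropped past the noise band
        return "possible regression"
    if delta >= threshold:    # tok/s rose past the noise band
        return "possible improvement"
    # Inside the noise band: not enough evidence for either verdict.
    return "insufficient"

print(classify_change([41.0, 40.2, 40.8], [37.1, 36.8, 37.4]))  # possible regression
```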

8-stage flow: request submitted → accepted → claimed → submitted → moderated → approved-public → reproduced → independently-reproduced.
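
Modeled as a state machine, the flow looks roughly like this; the one-step-forward transition rule is an illustrative guess, not the site's actual moderation machinery:

```python
# The 8-stage flow as an ordered list. The one-step-forward rule is an
# illustrative guess at the transition logic.
STAGES = [
    "request-submitted",
    "accepted",
    "claimed",
    "submitted",
    "moderated",
    "approved-public",
    "reproduced",
    "independently-reproduced",
]

def can_advance(current: str, target: str) -> bool:
    # A request moves exactly one stage forward at a time.
    return STAGES.index(target) == STAGES.index(current) + 1

assert can_advance("claimed", "submitted")
assert not can_advance("accepted", "moderated")
```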

How distinct-operator + distinct-hardware reproductions lift confidence from moderate → high → very-high.
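
A sketch of that promotion rule, assuming each qualifying reproduction lifts a row exactly one tier; the linked page documents the real counting rules:

```python
# Assumes each qualifying reproduction (distinct operator AND distinct
# hardware) lifts a row exactly one tier, capped at the top tier.
TIERS = ["low", "moderate", "high", "very-high"]

def promoted_tier(current: str, qualifying_reproductions: int) -> str:
    idx = min(TIERS.index(current) + qualifying_reproductions, len(TIERS) - 1)
    return TIERS[idx]

assert promoted_tier("moderate", 1) == "high"
assert promoted_tier("moderate", 2) == "very-high"
```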

Editorial policy

What we accept on /benchmarks/request, what we reject, what counts as an editorial-priority signal.

Single-table view of every major runtime across 13 operational dimensions (maintenance burden, reproducibility, lock-in, OS support, etc.).

Who writes here, how named-author bylines work, and what 'editorial' versus 'community' provenance means on a benchmark row.

How operator profiles get the 'verified-owner' signal. Why we don't accept self-attested verification.

Why methodology lives in one place

Editorial credibility lives or dies on whether you can defend every score with a specific rule. We publish the rules so a knowledgeable operator can disagree with this rule on this page rather than handwaving "your scoring seems off."

None of these methodologies are settled forever. We update them when contributors point out flaws. Every change ships in the changelog.