RUNLOCALAI · v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated
Methodology hub
✓Editorial

Methodology

Every methodology surface on RunLocalAI in one place — the rules behind every benchmark badge, every confidence tier, every regression candidate, every reproduction request. We document the rules so you can disagree with them specifically, not vaguely.

Is anything missing? Tell us at /submit/feedback. We publish methodology corrections.

Trust + scoring

Will-It-Run Framework →

The citable RunLocalAI method for local-AI hardware fit: effective VRAM, model working set, context pressure, runtime constraints, fit tiers, and measured-vs-estimated evidence labels.
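As a rough illustration of how model working set and context pressure interact with effective VRAM, here is a minimal sketch. It is not the RunLocalAI framework itself: the function names, the idea that effective VRAM is total VRAM minus a reserved amount, and the example model shape are all assumptions made for illustration. The KV-cache size uses the standard transformer estimate (keys plus values, per layer, per KV head, per token).

```python
GIB = 2**30

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bytes_per_elem=2):
    """Standard KV-cache estimate; the factor 2 covers keys and values."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

def fit_estimate(weights_gib, total_vram_gib, reserved_gib, **kv_shape):
    """Hypothetical fit check: working set (weights + KV cache) vs effective VRAM."""
    # Assumption: effective VRAM = total VRAM minus memory reserved by the OS/runtime.
    effective = total_vram_gib - reserved_gib
    working = weights_gib + kv_cache_bytes(**kv_shape) / GIB
    return {
        "effective_vram_gib": effective,
        "working_set_gib": working,
        "fits": working <= effective,
    }

# Hypothetical model shape: 32 layers, 8 KV heads, head dimension 128, fp16 cache.
shape = dict(n_layers=32, n_kv_heads=8, head_dim=128)
short_ctx = fit_estimate(4.5, 8.0, 1.0, context_len=8192, **shape)
long_ctx = fit_estimate(4.5, 8.0, 1.0, context_len=32768, **shape)
```

In this sketch the same weights fit at an 8,192-token context but not at 32,768 tokens, which is the kind of context pressure the framework description refers to.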

Scoring methodology →

How tok/s + VRAM scores combine with confidence tiers into the editorial verdict. The four-state verification ladder for community submissions.

Confidence methodology →

Why confidence is a 4-tier vocabulary (low / moderate / high / very-high) and never a percentage. Each row records its derivedFrom rationale.

Verification policy →

What editorial does before a community submission goes public. What gets rejected. Why some submissions sit pending for days.

Trust — benchmarks →

What the benchmark badges mean, how operator-grade scoring differs from synthetic vendor scoring, and where we admit uncertainty.

Benchmarks

Reproduction guide →

How to reproduce an existing benchmark step-by-step so your run lifts the row's confidence tier.

Benchmark protocol (V36.51) →

Operator-grade capture protocol used on the 7 owned rigs: cold-start prep, standard prompt, 3-run median, sha256 reproducibility hash, mandatory environment metadata (driver + runtime + OS). The exact recipe behind every source=owner row.
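A minimal sketch of the 3-run median and sha256 step, assuming a record layout of our own invention. The function name `capture_row`, the field names, and the choice to hash canonical JSON are hypothetical; the actual protocol page defines what gets hashed.

```python
import hashlib
import json
import statistics

REQUIRED_ENV = {"driver", "runtime", "os"}

def capture_row(runs_tok_s, environment):
    """Summarize three runs and fingerprint the record (illustrative only)."""
    if len(runs_tok_s) != 3:
        raise ValueError("this sketch expects exactly three runs")
    missing = REQUIRED_ENV - environment.keys()
    if missing:
        raise ValueError(f"missing environment metadata: {sorted(missing)}")
    record = {
        "median_tok_s": statistics.median(runs_tok_s),
        "runs_tok_s": list(runs_tok_s),
        "environment": dict(environment),
    }
    # Assumption: hash a canonical JSON form so key order cannot change the digest.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record, digest

env = {"driver": "example-driver", "runtime": "example-runtime", "os": "example-os"}
record, digest = capture_row([41.2, 39.8, 40.5], env)
```

Sorting keys before hashing means two operators who record the same values in a different order still produce the same digest.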

Benchmark methodology checklist →

10-item printable checklist: what to measure, what NOT to claim, sampler-pinning, version capture, reproducibility hygiene.

Regression methodology →

How /benchmarks/regressions reaches 'possible regression' / 'possible improvement' / 'insufficient'. Threshold rules + known noise sources.
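To make the three outcomes concrete, here is a hedged sketch of a threshold rule. The 10% threshold, the 3-run minimum, and the decision to report changes inside the threshold as 'insufficient' are assumptions for illustration, not the published rules.

```python
import statistics

def classify_change(baseline_tok_s, candidate_tok_s, threshold=0.10, min_runs=3):
    """Hypothetical classifier comparing median throughput between two versions."""
    if len(baseline_tok_s) < min_runs or len(candidate_tok_s) < min_runs:
        return "insufficient"
    base = statistics.median(baseline_tok_s)
    cand = statistics.median(candidate_tok_s)
    if base <= 0:
        return "insufficient"
    change = (cand - base) / base
    if change <= -threshold:
        return "possible regression"
    if change >= threshold:
        return "possible improvement"
    # Assumption: a change smaller than the threshold is treated as noise.
    return "insufficient"
```

The labels stay hedged ("possible") because known noise sources can produce a swing of this size without any real change in the runtime.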

Benchmark lifecycle diagram →

8-stage flow: request submitted → accepted → claimed → submitted → moderated → approved-public → reproduced → independently-reproduced.
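The eight stages can be sketched as an ordered list with a single forward transition. This assumes a strictly linear flow; the real lifecycle presumably has rejection or timeout branches that this sketch does not model, and `next_stage` is a hypothetical helper name.

```python
STAGES = [
    "request submitted",
    "accepted",
    "claimed",
    "submitted",
    "moderated",
    "approved-public",
    "reproduced",
    "independently-reproduced",
]

def next_stage(current):
    """Return the stage after `current`, or None at the end of the flow."""
    index = STAGES.index(current)  # raises ValueError for an unknown stage
    return STAGES[index + 1] if index + 1 < len(STAGES) else None
```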

Reproduction network v2 →

How distinct-operator + distinct-hardware reproductions lift confidence from moderate → high → very-high.
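One way to picture the lift is a counter that only advances when a reproduction comes from both a new operator and new hardware. The rule of one tier per qualifying reproduction is an assumption for illustration; the linked page defines the real thresholds.

```python
TIERS = ["low", "moderate", "high", "very-high"]

def lifted_tier(base_tier, original, reproductions):
    """Hypothetical lift rule. `original` and each reproduction are (operator, hardware) pairs."""
    seen_operators = {original[0]}
    seen_hardware = {original[1]}
    level = TIERS.index(base_tier)
    for operator, hardware in reproductions:
        # Only a reproduction that is distinct on both axes counts in this sketch.
        if operator not in seen_operators and hardware not in seen_hardware:
            seen_operators.add(operator)
            seen_hardware.add(hardware)
            level = min(level + 1, len(TIERS) - 1)
    return TIERS[level]
```

Under these assumptions, a rerun by the same operator on a second rig leaves the tier unchanged, while a different operator on different hardware moves moderate to high.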

Editorial policy

Benchmark request policy →

What we accept on /benchmarks/request, what we reject, what counts as an editorial-priority signal.

Engine choice matrix →

Single-table view of every major runtime across 13 operational dimensions (maintenance burden, reproducibility, lock-in, OS support, etc.).

Trust — editorial →

Who writes here, how named-author bylines work, and what 'editorial' versus 'community' provenance means on a benchmark row.

Trust — operators →

How operator profiles get the 'verified-owner' signal. Why we don't accept self-attested verification.

Why methodology lives in one place

Editorial credibility lives or dies on whether every score can be defended with a specific rule. We publish the rules so a knowledgeable operator can dispute a specific rule on a specific page rather than handwaving "your scoring seems off."

None of these methodologies are settled forever. We update them when contributors point out flaws. Every change ships in the changelog.
