RunLocalAI Model Intelligence Pipeline
The Model Intelligence Pipeline imports public benchmark signals and labels them as external priors. They help answer which model is worth trying, but they never replace RunLocalAI's local fit math, measured tok/s rows, quantization, context length, runtime, or confidence evidence.
Why this exists
RunLocalAI should not become a generic model leaderboard. LMSYS, LMArena, LiveBench, Artificial Analysis, and similar projects already specialize in model quality, human preference, provider speed, and contamination-resistant evaluation. RunLocalAI's job is to join that intelligence to the local reality: hardware fit, runtime choice, quantization, context pressure, speed, cost, and evidence.
The pipeline therefore produces model intelligence priors. A prior can say that one model appears stronger than another in external evaluations. It cannot say that the model will fit in your VRAM, run fast at Q4, or beat the cloud for your workload. Those claims come from the Will-It-Run Framework and RunLocalAI benchmark evidence.
Sources
Pipeline contract
weekly GitHub Action
-> fetch upstream dataset-server rows
-> normalize model aliases
-> attach source/confidence/license notes
-> aggregate LiveBench judgment rows from a stated sample
-> compute transparent model-intelligence priors
-> publish JSON snapshot + API + dated archive

Every score in the snapshot keeps its source dataset, config, split, source URL, row count, capture time, scale, and evidence note. The public JSON includes not_local_measurement: true at the snapshot level because this layer is intentionally separate from local execution evidence.
OpenEvals is small enough to pull fully. LMArena and LiveBench are large enough to hit public rows-API rate limits, so the default job publishes fetched_rows, total_rows_reported, and sampling_note in the source metadata. Deeper refreshes can raise those caps without changing the public schema.
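The sampling metadata described above can be sketched as a small record. This is a minimal illustration, not the pipeline's actual code: the field names fetched_rows, total_rows_reported, and sampling_note come from the text, while the SourceMetadata class, its helper properties, and the row counts in the usage example are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMetadata:
    """Hypothetical container for one source's sampling metadata."""

    source: str
    fetched_rows: int          # rows actually pulled by the job
    total_rows_reported: int   # rows the upstream API says exist
    sampling_note: str         # human-readable note on how rows were sampled

    @property
    def is_sampled(self) -> bool:
        # True when the job fetched fewer rows than upstream reports.
        return self.fetched_rows < self.total_rows_reported

    @property
    def coverage(self) -> float:
        # Fraction of upstream rows that were fetched, capped at 1.0.
        if self.total_rows_reported <= 0:
            return 0.0
        return min(1.0, self.fetched_rows / self.total_rows_reported)


# Illustrative numbers only; they are not real upstream row counts.
meta = SourceMetadata(
    source="LiveBench",
    fetched_rows=500,
    total_rows_reported=2000,
    sampling_note="first 500 rows of the judgment split",
)
print(meta.is_sampled, meta.coverage)
```

Raising the fetch cap changes only the values in such a record, which is consistent with the claim that deeper refreshes leave the public schema unchanged.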
Composite discipline
The composite score is a convenience sort key, not the truth. It normalizes the available OpenEvals aggregate score, LMArena text rating, and LiveBench aggregate judgment into a single 0-100 prior. Missing components are not backfilled. The confidence tier is based on source coverage: high when all three source families are present, moderate when two are present, and low when only one is present.
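One way to picture the composite is the sketch below. The tier rule (three, two, or one source family present) and the no-backfill rule come from the text. Everything else is an assumption for illustration: the scale bounds in SCALES, the min-max normalization, the unweighted mean, and the function name composite_prior. The real pipeline may weight or scale components differently.

```python
from typing import Dict, Optional, Tuple

# ASSUMPTION: illustrative raw-score ranges per source family, not the
# pipeline's real scales.
SCALES: Dict[str, Tuple[float, float]] = {
    "openevals": (0.0, 100.0),
    "lmarena": (800.0, 1600.0),
    "livebench": (0.0, 1.0),
}

# Confidence tier by number of source families present (from the text).
TIERS = {3: "high", 2: "moderate", 1: "low"}


def composite_prior(
    components: Dict[str, Optional[float]],
) -> Tuple[Optional[float], Optional[str]]:
    """Return (0-100 prior, confidence tier), or (None, None) with no data."""
    normalized = []
    for family, (lo, hi) in SCALES.items():
        raw = components.get(family)
        if raw is None:
            continue  # missing components are skipped, never backfilled
        clipped = min(max(raw, lo), hi)
        normalized.append(100.0 * (clipped - lo) / (hi - lo))
    if not normalized:
        return None, None
    # ASSUMPTION: unweighted mean of whichever components are present.
    return sum(normalized) / len(normalized), TIERS[len(normalized)]


print(composite_prior({"openevals": 75.0, "lmarena": None, "livebench": 0.25}))
```

Because absent components are skipped rather than imputed, a model with one strong source can outrank a model with three average ones, which is why the tier should be read alongside the score.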
The composite is never shown as a local verdict by itself. A useful RunLocalAI answer must join it with the local tuple:
model intelligence prior
+ hardware effective VRAM
+ quantization and context
+ runtime support
+ measured or estimated tok/s evidence
+ cost-vs-cloud economics
= Will-It-Run verdict

Public files
- JSON snapshot: /model-intelligence/latest.json
- API endpoint: /api/v1/model-intelligence
- JSON schema: /schemas/runlocalai-model-intelligence-snapshot-v1.json
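A consumer of the snapshot might guard against treating it as local evidence along these lines. This is a hedged sketch: only the snapshot-level not_local_measurement flag is taken from the text. The load_snapshot function and the "models" key in the sample payload are hypothetical, and the sample JSON is inline rather than fetched, since the host for the paths above is not stated here.

```python
import json


def load_snapshot(text: str) -> dict:
    """Parse a snapshot and refuse it unless it is flagged as an external prior."""
    snapshot = json.loads(text)
    # The flag is documented at the snapshot level; require it to be exactly true.
    if snapshot.get("not_local_measurement") is not True:
        raise ValueError("snapshot is not flagged as external-prior data")
    return snapshot


# Hypothetical payload; "models" is an assumed key used only for illustration.
sample = '{"not_local_measurement": true, "models": []}'
snapshot = load_snapshot(sample)
print(snapshot["not_local_measurement"])
```

Failing loudly when the flag is absent keeps downstream code from mixing external priors with measured tok/s rows by accident.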
Limits
External benchmarks are useful but incomplete. They may use different prompts, providers, endpoints, tools, release dates, sampling settings, or model variants. They may not reflect open-weight quantized quality. They do not answer whether a model is legal for your use case, safe for your domain, cheap enough for your workload, or runnable on your exact machine.
That is why this pipeline exists as a supporting layer. RunLocalAI owns the final local decision, not the generic leaderboard.
Use the priors correctly
Pair every prior with the fit methodology that turns priors into local decisions.