Editorial (Live coverage report)

Benchmark cohort coverage

The intelligence graph compares your benchmark to its cohort: measurements with the same model, the same hardware, the same quant bucket, and the same context bucket. A cohort with fewer than 5 measurements can't produce confident outlier flags. This page shows which cohorts have enough signal and which are underpowered.
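As a rough illustration of the cohort definition above, the sketch below groups measurements by the four cohort dimensions and checks each group against the 5-measurement threshold. The field names (`model`, `hardware`, `quant`, `context_tokens`), the helper names, the reading of "4K" as 4096 tokens, and the `>8K` bucket label are all assumptions made for the example; only the `≤4K` and `4-8K` buckets appear on this page.

```python
from collections import defaultdict
from typing import NamedTuple

OUTLIER_THRESHOLD = 5  # minimum cohort size stated on this page


class CohortKey(NamedTuple):
    model: str
    hardware: str
    quant_bucket: str
    context_bucket: str


def context_bucket(context_tokens: int) -> str:
    # Assumption: "4K" means 4096 tokens. The ">8K" label is hypothetical;
    # only "≤4K" and "4-8K" are visible in the report.
    if context_tokens <= 4096:
        return "≤4K"
    if context_tokens <= 8192:
        return "4-8K"
    return ">8K"


def group_into_cohorts(measurements):
    """Group measurement dicts by (model, hardware, quant, context bucket)."""
    cohorts = defaultdict(list)
    for m in measurements:
        key = CohortKey(
            m["model"], m["hardware"], m["quant"],
            context_bucket(m["context_tokens"]),
        )
        cohorts[key].append(m)
    return dict(cohorts)


def is_underpowered(rows) -> bool:
    """True when the cohort is too small for confident outlier flags."""
    return len(rows) < OUTLIER_THRESHOLD
```

Two measurements of the same model on the same hardware land in different cohorts if their context lengths fall in different buckets, which is why a single model can appear on several rows of the table.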

The cohorts ranked first below are the ones where one or two more measurements would do the most to bring outlier detection within reach. If you have the rig, the “reproduce” CTA on each row prefills the submission form.

Total cohorts

Very-high tier

Underpowered

Single-runtime only

Cohorts where one more measurement matters

Ranked: low / moderate confidence first, then by proximity to the 5-row outlier-detection threshold, then by recency. For a cohort sitting just below the threshold, one more measurement tips it across the line.
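The ranking rule can be read as a three-part sort key. The sketch below is one possible interpretation, not the site's actual implementation: it assumes low and moderate share the same priority, that "proximity" means the fewest rows still missing before the threshold, and that "recency" means newest first. The dictionary fields and function names are hypothetical.

```python
from datetime import date

OUTLIER_THRESHOLD = 5  # rows needed for outlier detection, per this page


def rank_key(cohort):
    # 1) low / moderate cohorts sort ahead of high / very-high ones
    needs_data = 0 if cohort["confidence"] in ("low", "moderate") else 1
    # 2) fewer rows missing before the threshold sorts first
    gap = max(OUTLIER_THRESHOLD - cohort["rows"], 0)
    # 3) more recent "latest" date sorts first (negated ordinal)
    recency = -cohort["latest"].toordinal()
    return (needs_data, gap, recency)


def rank_cohorts(cohorts):
    """Return cohorts ordered by where one more measurement matters most."""
    return sorted(cohorts, key=rank_key)
```

Under this reading, the table above falls back to recency alone, because every listed cohort is Low with 1 row: the 2026-06-02 rows come first, then 2026-05-28, then 2026-05-27.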

Cohort	Confidence	Rows	Latest	Action
llama-3.2-1b-instruct on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
kumru-2b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
trendyol-llm-asure-12b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
ytu-turkish-gemma-9b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
brooqs-mistral-turkish-v2-latest on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
codegemma-7b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
deepseek-coder-v2-lite on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
deepseek-r1-distill-qwen-7b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
gemma-2-9b-it on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
gemma-3-12b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
gemma-3-1b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
gemma-3-4b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
gemma-4-e2b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
gemma-4-e4b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
hermes-3-llama-3.1-8b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
mistral-7b-turkish on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
llama-3.2-11b-vision-instruct on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
mistral-7b-instruct-v0.3 on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
mistral-nemo-12b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
phi-3.5-mini-instruct on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
phi-4-reasoning-14b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
qwen-2.5-7b-instruct on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
qwen-3-14b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
qwen-3-4b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
rn-tr-r1 on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
rn-tr-r2 on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
turkcell-llm-7b on rtx-3080-16gb-mobile 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-06-02	Reproduce →
trendyol-llm-asure-12b on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
llama-3.1-8b-instruct on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
qwen-2.5-coder-14b-instruct on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
kumru-2b on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
brooqs-mistral-turkish-v2-latest on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
turkcell-llm-7b on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
ytu-turkish-gemma-9b on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
trendyol-llm-asure-12b on rtx-5080 unknown · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
rn-tr-r1 on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
rn-tr-r2 on rtx-5080 4-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
mistral-7b-turkish on rtx-5080 5-bit · ≤4K · Single-source cohort; nothing to compare against.	Low	1	2026-05-28	Reproduce →
trendyol-llm-asure-12b on rtx-5080 4-bit · 4-8K · Single-source cohort; nothing to compare against.	Low	1	2026-05-27	Reproduce →

How cohort confidence is derived

Cohort labels mirror the per-benchmark confidence engine: low / moderate / high / very-high. They are never expressed as percentages.

Very-high: ≥5 measurements + ≥2 reproductions.
High: ≥5 measurements, fewer than 2 reproductions.
Moderate: 3-4 measurements, below the outlier-detection threshold.
Low: 1-2 measurements, single-source. The intelligence graph cannot draw conclusions.
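The four tiers above reduce to a small decision function. This is a minimal sketch of those thresholds; the function name and signature are hypothetical, "reproduction count low" is read as fewer than 2 reproductions, and rejecting a cohort with zero measurements is an assumption about how an empty input should be handled.

```python
def base_tier(measurements: int, reproductions: int) -> str:
    """Map measurement and reproduction counts to a confidence label."""
    if measurements < 1:
        # Assumption: a cohort only exists once it has at least one row.
        raise ValueError("a cohort needs at least one measurement")
    if measurements >= 5:
        # At or above the outlier-detection threshold; reproductions
        # decide between the top two tiers.
        return "very-high" if reproductions >= 2 else "high"
    if measurements >= 3:
        return "moderate"  # 3-4 rows, below the threshold
    return "low"  # 1-2 rows
```

For example, `base_tier(1, 0)` returns `"low"`, matching every row in the table above, and `base_tier(5, 2)` returns `"very-high"`.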

A cohort whose latest measurement is more than 18 months old is demoted one tier, because runtimes and drivers drift over that time. A cohort with only one runtime represented is flagged; there is no runtime-drift signal until a second runtime lands.
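The staleness demotion and the single-runtime callout could be applied as post-processing steps on a tier label. The sketch below is illustrative only: the function names are hypothetical, "18 months" is counted in calendar months from the latest measurement date, and a stale low-tier cohort is assumed to stay at low because there is no tier beneath it.

```python
from datetime import date

TIERS = ["low", "moderate", "high", "very-high"]  # ascending confidence
STALE_AFTER_MONTHS = 18


def is_stale(latest: date, today: date) -> bool:
    """True when `latest` is strictly more than 18 calendar months before `today`."""
    month_diff = (today.year - latest.year) * 12 + (today.month - latest.month)
    # Tuple comparison: months first, then day-of-month as the tiebreaker.
    return (month_diff, today.day - latest.day) > (STALE_AFTER_MONTHS, 0)


def apply_staleness(tier: str, latest: date, today: date) -> str:
    """Demote a stale cohort by one tier; low stays low (assumption)."""
    if not is_stale(latest, today):
        return tier
    return TIERS[max(TIERS.index(tier) - 1, 0)]


def is_single_runtime(runtimes) -> bool:
    """True when every measurement in the cohort used the same runtime."""
    return len(set(runtimes)) <= 1
```

The runtime names passed to `is_single_runtime` are whatever identifiers the submissions carry; no particular runtime list is implied here.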

Next recommended step

Editorial-curated benchmark opportunities ranked by impact.

See the public benchmark roadmap

Or: Submit a benchmark · Browse benchmarks