Community submitted. All submissions reviewed before publication.

Reproduce a benchmark

You're filing a reproduction. Same model, same hardware. Tell us what you measured. If your numbers match the original within ±15%, the original benchmark gets a reproduction count and a confidence lift.
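As a rough illustration of the ±15% rule, the sketch below checks whether a reproduced number lands within 15% of the original. It assumes the tolerance is relative to the original value and is applied to one metric at a time. The function name `within_tolerance` is hypothetical and this is not the site's actual review logic.

```python
def within_tolerance(original: float, reproduced: float, tolerance: float = 0.15) -> bool:
    """Return True if `reproduced` is within +/- `tolerance` of `original`.

    Assumption: the tolerance is a fraction of the ORIGINAL measurement.
    """
    if original <= 0:
        raise ValueError("original measurement must be positive")
    relative_gap = abs(reproduced - original) / original
    return relative_gap <= tolerance
```

Under that assumption, an original of 40 tok/s would be matched by anything from 34 to 46 tok/s.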

Reproducing

You are reproducing an existing benchmark

Original: Qwen 2.5 Coder 7B Instruct on NVIDIA GeForce RTX 3080 16GB (Mobile) (editorial benchmark #337)

The form below is pre-filled with the original benchmark's model, hardware, quant format, and context size. Update any field that's different in your run (e.g. runtime version, quant format, OS). Reproductions still go through editorial review; we don't auto-publish, but a successful reproduction lifts the original's confidence tier and shows up on the original benchmark's reproduction tree.

How this works

1. Submit. Fill in the form below. Required: model + hardware + at least one measurement. Everything else is optional but helps reviewers triage faster.

2. Review. A named RunLocalAI editor reviews your submission within 1 to 7 days. We check that the numbers are plausible, the model + hardware combo makes sense, and there's nothing dishonest going on. Suspicious submissions are rejected; we don't publish what we can't trust.

3. Publish. Approved submissions render with a clear Community submitted badge. They never get mixed up with our editorial measurements. If we later reproduce them on similar hardware, the badge upgrades to Reproduced.
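The required-field rule from step 1 (model + hardware + at least one measurement) could be checked before you submit with something like the sketch below. The dictionary keys are hypothetical names chosen to mirror the form labels; the real form may use different field names and may validate differently.

```python
# Hypothetical keys mirroring the measurement fields on the form.
MEASUREMENT_FIELDS = (
    "tokens_per_sec",
    "ttft_ms",
    "vram_gb",
    "ram_gb",
    "power_w",
    "concurrent_users",
)


def missing_requirements(submission: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the minimum is met."""
    problems = []
    # Model and hardware are the two starred (required) text fields.
    for field in ("model", "hardware"):
        if not str(submission.get(field) or "").strip():
            problems.append(f"{field} is required")
    # Any single measurement satisfies the "at least one" rule.
    if not any(submission.get(name) is not None for name in MEASUREMENT_FIELDS):
        problems.append("at least one measurement is required")
    return problems
```

An empty list means the submission meets the minimum; everything else on the form stays optional.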

Discipline rules

Why we moderate and what we won't accept.

Don't fabricate numbers. We reject submissions whose throughput doesn't fit plausible memory-bandwidth math for the hardware.
Be specific. “~30 tok/s on 4090” is fine; “fast” is not.
Note context length. A 4K-context number is very different from a 32K-context number for the same model.
Note runtime version. vLLM 0.7 and vLLM 0.8 differ enough to matter.
Note OS + driver. The same AMD GPU can perform differently on Linux and on Windows.
Stability notes are valuable too. “Worked but OOM'd at 32K context after 3 hours” is the kind of detail that's actually useful.
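For a sense of what "plausible bandwidth math" can mean, here is a minimal sketch of a common rule of thumb: in single-stream decoding, every generated token has to read roughly all of the model weights from memory once, so tokens per second is bounded by memory bandwidth divided by weight size. This is an assumption about the kind of check a reviewer might run, not the editors' documented method. It ignores KV-cache traffic, batching, speculative decoding, and mixture-of-experts models, all of which shift the bound. The function names and the numbers in the comments are illustrative only.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream decode speed.

    Assumes each token requires one full read of the weights from memory.
    """
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_gb


def looks_plausible(measured_tok_s: float, bandwidth_gb_s: float, weights_gb: float) -> bool:
    """True if the measured decode speed does not exceed the rough ceiling."""
    return measured_tok_s <= decode_ceiling_tok_s(bandwidth_gb_s, weights_gb)


# Illustrative values: 400 GB/s of memory bandwidth and 5 GB of weights
# give a ceiling of 80 tok/s, so a claimed 200 tok/s would warrant questions.
```

A number under the ceiling is not automatically correct; it only passes the first sanity check.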

Model *

Hardware *

Runtime

Quant format

Context (tokens)

Measurements (at least one required)

Tokens / sec (decode)

TTFT (ms)

VRAM used (GB)

RAM used (GB)

Power draw (W)

Concurrent users

Runtime version

Driver / CUDA / ROCm

Adding version metadata makes your benchmark more useful for regression detection.

Advanced version metadata (all optional)

CUDA version

ROCm version

Metal version

MLX version

Ollama version

llama.cpp commit

vLLM version

SGLang version

Container runtime

OS version (precise)

Kernel version

Docker image

Notes on version drift

Notes (operator detail)

Stability notes

Your name (optional)

Email (optional)

Privacy

Name and email are optional. If you provide them, we'll only use them to credit you on the published benchmark and to ask follow-up questions. We never sell or share submitter data.

We do hash your IP for rate-limiting and duplicate detection (max 3 submissions per hour). We never store the raw IP. The hash includes a daily salt so it can't be used to track you across days.
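To make the idea concrete, here is a minimal sketch of one way a daily-salted IP hash and a 3-per-hour limit can work. It is an illustration under stated assumptions, not our production code: the HMAC-SHA256 construction, the server-side `secret`, the in-memory store, and the names `daily_ip_hash` and `HourlyLimiter` are all hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict, deque
from datetime import date


def daily_ip_hash(ip: str, secret: bytes, day: date) -> str:
    """Hash an IP with a salt that changes every day; the raw IP is never stored."""
    # Derive the day's salt from a server-side secret and the calendar date.
    daily_salt = hmac.new(secret, day.isoformat().encode(), hashlib.sha256).digest()
    return hmac.new(daily_salt, ip.encode(), hashlib.sha256).hexdigest()


class HourlyLimiter:
    """Sliding-window limiter keyed by the hashed IP (in-memory, for illustration)."""

    def __init__(self, max_per_hour: int = 3):
        self.max_per_hour = max_per_hour
        self.seen = defaultdict(deque)  # hash -> timestamps of accepted submissions

    def allow(self, key: str, now: float) -> bool:
        window = self.seen[key]
        # Drop timestamps that are an hour old or older.
        while window and now - window[0] >= 3600:
            window.popleft()
        if len(window) >= self.max_per_hour:
            return False
        window.append(now)
        return True
```

Because the salt depends on the date, the same IP produces a different hash the next day, which is what prevents cross-day tracking in this sketch.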

Got a correction or a workflow report instead? Send feedback →