Community submitted · All submissions reviewed before publication

Submit a benchmark

Share what you actually measured running a model on your hardware. Your numbers help every operator after you make a better decision. We never auto-publish: every submission lands in the moderation queue, and a named editor reviews it before it appears on the site.

Faster path: if you ran runlocalai-bench and have the terminal output, paste it instead. Most fields auto-fill. 30-sec paste mode →

How this works

1. Submit. Fill in the form below. Required: model + hardware + at least one measurement. Everything else is optional but helps reviewers triage faster.

2. Review. A named RunLocalAI editor reviews your submission within 1-7 days. We check that the numbers are plausible, the model + hardware combo makes sense, and there's nothing dishonest going on. Suspicious submissions are rejected; we don't publish what we can't trust.

3. Publish. Approved submissions render with a clear Community submitted badge. They never get mixed up with our editorial measurements. If we later reproduce them on similar hardware, the badge upgrades to Reproduced.
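
The required-field rule from step 1 (model, hardware, and at least one measurement) can be sketched as a small check. This is an illustration only: the `Submission` class, the measurement key names, and the positive-value rule are assumptions made for the example, not the site's actual schema or validation logic.

```python
from dataclasses import dataclass, field

# Hypothetical keys, named after the measurement fields on this form.
MEASUREMENT_KEYS = {
    "tokens_per_sec", "ttft_ms", "vram_gb",
    "ram_gb", "power_w", "concurrent_users",
}


@dataclass
class Submission:
    model: str = ""
    hardware: str = ""
    measurements: dict = field(default_factory=dict)


def validate(sub: Submission) -> list[str]:
    """Return a list of problems; an empty list means the minimum is met."""
    errors = []
    if not sub.model.strip():
        errors.append("model is required")
    if not sub.hardware.strip():
        errors.append("hardware is required")

    # Keep only recognised measurements that were actually filled in.
    filled = {
        name: value
        for name, value in sub.measurements.items()
        if name in MEASUREMENT_KEYS and value is not None
    }
    if not filled:
        errors.append("at least one measurement is required")

    # Assumption: every measurement on this form is a positive quantity.
    for name, value in filled.items():
        if value <= 0:
            errors.append(f"{name} must be positive")
    return errors
```

A submission with a model, a hardware entry, and a single decode-speed figure passes this check; leaving the measurements empty does not.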

Discipline rules

Why we moderate and what we won't accept.

Don't fabricate numbers. We will reject submissions that don't match plausible bandwidth math.
Be specific. “~30 tok/s on 4090” is fine; “fast” is not.
Note context length. A 4K-context number is very different from a 32K-context number for the same model.
Note runtime version. vLLM 0.7 and vLLM 0.8 differ enough to matter.
Note OS + driver. The same AMD GPU on Linux and on Windows can give noticeably different results.
Stability notes are valuable too. “Worked but OOM'd at 32K context after 3 hours” is the kind of detail that's actually useful.
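
For submitters who want to sanity-check a number before sending it, here is a minimal sketch of the kind of bandwidth math the first rule refers to. It uses the common rule of thumb that single-stream decode on a dense model reads every weight once per token, so tokens per second cannot much exceed memory bandwidth divided by the size of the weights in memory. The function names and the `headroom` parameter are hypothetical, and the reviewers' actual check is not described on this page.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Rough upper bound on single-stream decode speed.

    Assumes a dense model whose weights are all read once per token,
    so the bound is memory bandwidth / weight size.
    """
    if bandwidth_gb_s <= 0 or weights_gb <= 0:
        raise ValueError("bandwidth and weight size must be positive")
    return bandwidth_gb_s / weights_gb


def looks_plausible(
    reported_tok_s: float,
    bandwidth_gb_s: float,
    weights_gb: float,
    headroom: float = 1.1,  # assumed tolerance for rounding in spec sheets
) -> bool:
    """True if the reported decode speed sits under the bandwidth ceiling."""
    ceiling = decode_ceiling_tok_s(bandwidth_gb_s, weights_gb)
    return 0 < reported_tok_s <= headroom * ceiling


# Illustrative figures only: 1000 GB/s of bandwidth and 20 GB of weights
# give a ceiling of 50 tok/s for one decode stream.
ceiling = decode_ceiling_tok_s(1000.0, 20.0)
```

The bound only applies to one decode stream. Batched throughput across concurrent users, mixture-of-experts models, and speculative decoding can legitimately exceed it, which is one more reason to note those details in the submission.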

Model *

Hardware *

Runtime

Quant format

Context (tokens)

Measurements (at least one required)

Tokens / sec (decode)

TTFT (ms)

VRAM used (GB)

RAM used (GB)

Power draw (W)

Concurrent users

Runtime version

Driver / CUDA / ROCm

Adding version metadata makes your benchmark more useful for regression detection.

Advanced version metadata (all optional)

CUDA version

ROCm version

Metal version

MLX version

Ollama version

llama.cpp commit

vLLM version

SGLang version

Container runtime

OS version (precise)

Kernel version

Docker image

Notes on version drift

Notes (operator detail)

Stability notes

Your name (optional)

Email (optional)

Privacy

Name and email are optional. If you provide them, we'll only use them to credit you on the published benchmark and to ask follow-up questions. We never sell or share submitter data.

We do hash your IP for rate-limiting and duplicate detection (max 3 submissions per hour). We never store the raw IP. The hash includes a daily salt so it can't be used to track you across days.
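
One way a scheme like this can work is sketched below, assuming a server-side secret from which a per-day salt is derived, and a sliding one-hour window for the limit of 3. The secret, the function and class names, and the choice of HMAC-SHA256 are assumptions made for illustration; the page does not say how the site implements it.

```python
import hashlib
import hmac
from collections import defaultdict, deque
from datetime import date


def hash_ip(ip: str, secret: bytes, day: date) -> str:
    """Hash an IP with a salt that changes every day.

    The salt is derived from a server-side secret and the date, so the
    same IP maps to unrelated hashes on different days.
    """
    daily_salt = hmac.new(secret, day.isoformat().encode(), hashlib.sha256).digest()
    return hmac.new(daily_salt, ip.encode(), hashlib.sha256).hexdigest()


class RateLimiter:
    """Sliding-window limiter keyed by the hashed IP, never the raw one."""

    def __init__(self, limit: int = 3, window_s: int = 3600):
        self.limit = limit
        self.window_s = window_s
        self._seen = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        stamps = self._seen[key]
        # Forget submissions that have fallen out of the window.
        while stamps and now - stamps[0] >= self.window_s:
            stamps.popleft()
        if len(stamps) >= self.limit:
            return False
        stamps.append(now)
        return True


# Example with a documentation-range address and a placeholder secret.
key = hash_ip("203.0.113.7", b"example-secret", date(2025, 1, 1))
limiter = RateLimiter()
decisions = [limiter.allow(key, now=float(t)) for t in (0, 10, 20, 30)]
```

In this sketch the fourth submission inside the hour is refused, and because only `key` is stored, the limiter never holds the raw address.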

Got a correction or a workflow report instead? Send feedback →