RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

Operator: Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend.


Help improve the benchmark corpus

Reproduction is the moat. Every benchmark below would measurably improve the dataset if you reproduced it on your rig. Prefill links open the submission form with the model, hardware, quant, and context already populated.

Total public reproductions on file: 0. Total queue items below: 8.
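The prefill mechanism above can be sketched as a URL builder. This is a hypothetical illustration: the field names (`model`, `hardware`, `quant`, `ctx`) and the form path are assumptions, not the site's actual submission-form parameters.

```python
from urllib.parse import urlencode

# Hypothetical prefill link builder. The parameter names and the
# /submit-benchmark path are illustrative assumptions, not the
# site's real form API.
def prefill_link(base, model, hardware, quant, ctx):
    """Build a submission-form URL with benchmark fields pre-populated."""
    query = urlencode({
        "model": model,
        "hardware": hardware,
        "quant": quant,
        "ctx": ctx,
    })
    return f"{base}?{query}"

url = prefill_link(
    "https://example.com/submit-benchmark",
    model="llama-3.3-70b-instruct",
    hardware="dual-rtx-3090",
    quant="Q4_K_M",
    ctx=8192,
)
# url carries all four fields as query parameters
```

Clicking a prefill link then only leaves the measured numbers for the reproducer to fill in, which is what keeps submission friction low.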

Editorial-priority opportunities

High/critical-priority benchmark gaps from the editorial roadmap.

  • moderate
    Phi-3.5 Mini Instruct on Qualcomm Snapdragon 8 Elite

    Snapdragon 8 Elite is the mid-2025 flagship for Android on-device LLM inference. Establishing the NPU-vs-GPU-fallback tradeoff numbers is critical for the Android-on-device guidance.

    Unlocks: /hardware/snapdragon-8-elite, /tools/qualcomm-ai-hub, /stacks/android-on-device-ai

Reproduce → · View original
  • moderate
    Llama 3.2 3B Instruct on Apple A18 Pro

    Mobile on-device LLM viability is the most-asked question in the iPhone-developer ecosystem in 2026. A measured tok/s + battery drain + thermal throttling curve answers 'can I ship this in my app?'

    Unlocks: /hardware/apple-a18-pro, /systems/mobile-local-ai, /stacks/iphone-on-device-ai

Reproduce → · View original
  • hard
    Qwen 3 Coder 32B on NVIDIA GeForce RTX 5090

    The single-5090 baseline is the comparison anchor for every multi-GPU recommendation on this site. Without it, the 'should I just buy one bigger card?' question can't be answered with confidence.

    Unlocks: /hardware/rtx-5090, /guides/choosing-a-gpu-for-local-ai-2026, /guides/running-local-ai-on-multiple-gpus-2026

Reproduce → · View original
  • moderate
    DeepSeek V4 Flash (284B MoE) on (any)

    DeepSeek V4 Flash with the MTP head is claimed to be the throughput leader. Verifying the MTP advantage on production hardware is high-value for V4-Pro-vs-V4-Flash decision-making.

    Unlocks: /hardware-combos/vllm-tensor-parallel-h100-workstation, /stacks/h100-tensor-parallel-workstation, /models/deepseek-v4-flash

Reproduce → · View original
  • moderate
    Qwen 3.5 235B-A17B (MoE) on (any)

    The Apple-vs-NVIDIA comparison at the frontier-MoE tier is the most-asked question for Mac Studio buyers. Editorial estimate is 25-30% of NVIDIA throughput; measured value would close the loop.

    Unlocks: /hardware-combos/mac-studio-m3-ultra-192gb, /stacks/apple-silicon-ai, /will-it-run/combo/mac-studio-m3-ultra-192gb

Reproduce → · View original
  • moderate
    Qwen 3.5 235B-A17B (MoE) on (any)

    The frontier-MoE production reference. Organizations weighing $200k+ DGX-class purchases vs cloud rental need measured throughput to model cost-per-million-tokens accurately.

    Unlocks: /hardware-combos/vllm-tensor-parallel-h100-workstation, /stacks/h100-tensor-parallel-workstation, /will-it-run/combo/vllm-tensor-parallel-h100-workstation

Reproduce → · View original
  • moderate
    Llama 3.3 70B Instruct on (any)

    Pairs with the dual-3090 measurement to quantify the NVLink-vs-PCIe penalty. The 4090 NVLink absence is the single most-misunderstood spec gap; a measured comparison ends the speculation.

    Unlocks: /hardware-combos/dual-rtx-4090, /stacks/dual-4090-workstation, /will-it-run/combo/dual-rtx-4090

Reproduce → · View original
  • moderate
    Llama 3.3 70B Instruct on (any)

    The dual-3090 NVLink build is the most-recommended prosumer multi-GPU configuration on this site. Without a measured benchmark, the 25-32 tok/s estimate carries editorial-only confidence — operators making $1,500+ buying decisions deserve real numbers.

    Unlocks: /hardware-combos/dual-rtx-3090, /stacks/dual-3090-workstation, /will-it-run/combo/dual-rtx-3090

Reproduce → · View original
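Several gaps above hinge on modeling cost-per-million-tokens from measured throughput (the DGX-vs-cloud-rental question in particular). The arithmetic is simple; a minimal sketch, with purely illustrative numbers:

```python
def cost_per_million_tokens(throughput_tok_s, hourly_cost_usd):
    """USD per 1M generated tokens at a given sustained throughput."""
    tokens_per_hour = throughput_tok_s * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000

# Illustrative numbers only: 400 tok/s sustained, hardware amortized
# (or rented) at $12/hour.
print(round(cost_per_million_tokens(400, 12.0), 2))  # → 8.33
```

The point of measured benchmarks is that `throughput_tok_s` is the only term in this model that can't be read off an invoice, so an editorial-only estimate propagates directly into the dollar figure.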

How reproduction lifts confidence

The four-tier confidence ladder is editorial-driven, never automatic.

±15% match. If your measurement lands within 15% of the original on tok/s, with matching quant and matching context bucket, it triggers a confidence lift on the original.

Two independent reproducers. When two distinct operators reproduce the same benchmark, its badge upgrades to independently-reproduced.

Editorial review. Submissions are never auto-published. The editorial team reviews each within 1-7 days. Read the trust standards for the full process.
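The ±15% rule above can be sketched as a predicate. The field names (`tok_s`, `quant`, `ctx_bucket`) are illustrative assumptions about the submission schema, not the site's actual data model.

```python
def triggers_confidence_lift(original, reproduction, tolerance=0.15):
    """True when a reproduction agrees with the original within the
    tok/s tolerance AND both quant and context bucket match.
    Field names here are hypothetical, not the site's real schema."""
    same_quant = original["quant"] == reproduction["quant"]
    same_ctx = original["ctx_bucket"] == reproduction["ctx_bucket"]
    ratio = abs(reproduction["tok_s"] - original["tok_s"]) / original["tok_s"]
    return same_quant and same_ctx and ratio <= tolerance

orig = {"tok_s": 28.0, "quant": "Q4_K_M", "ctx_bucket": "8k"}
repro = {"tok_s": 25.5, "quant": "Q4_K_M", "ctx_bucket": "8k"}
# |25.5 - 28.0| / 28.0 ≈ 0.089, within ±15%, so the lift applies
print(triggers_confidence_lift(orig, repro))  # → True
```

Note the check is a gate, not the publish step: per the editorial-review tier, a passing match still waits on human review before the original's confidence badge changes.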

Next recommended step

See the public benchmark roadmap
Or: See cohort coverage · Read the reproduction guide