RUNLOCALAI v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated
Community submitted (Help-improve queue)

Help improve the benchmark corpus

Reproduction is the moat. Every benchmark below would measurably improve the dataset if you reproduced it on your rig. Prefill links open the submission form with model, hardware, quant, and context already populated.

Total public reproductions on file: 0. Total queue items below: 8.

Editorial-priority opportunities

High/critical-priority benchmark gaps from the editorial roadmap.

  • moderate
    Phi-3.5 Mini Instruct on Qualcomm Snapdragon 8 Elite

    Snapdragon 8 Elite is the mid-2025 flagship for Android on-device LLM inference. Establishing the NPU-vs-GPU-fallback tradeoff numbers is critical for the Android-on-device guidance.

    Unlocks: /hardware/snapdragon-8-elite, /tools/qualcomm-ai-hub, /stacks/android-on-device-ai

    Reproduce → · View original
  • moderate
    Llama 3.2 3B Instruct on Apple A18 Pro

    Mobile on-device LLM viability is the most-asked question in the iPhone-developer ecosystem in 2026. A measured tok/s + battery drain + thermal throttling curve answers 'can I ship this in my app?'

    Unlocks: /hardware/apple-a18-pro, /systems/mobile-local-ai, /stacks/iphone-on-device-ai

    Reproduce → · View original
  • hard
    Qwen 3 Coder 32B on NVIDIA GeForce RTX 5090

    The single-5090 baseline is the comparison anchor for every multi-GPU recommendation on this site. Without it, the 'should I just buy one bigger card?' question can't be answered with confidence.

    Unlocks: /hardware/rtx-5090, /guides/choosing-a-gpu-for-local-ai-2026, /guides/running-local-ai-on-multiple-gpus-2026

    Reproduce → · View original
  • moderate
    DeepSeek V4 Flash (284B MoE) on (any)

    DeepSeek V4 Flash with the MTP head is claimed to be the throughput leader. Verifying the MTP advantage on production hardware is high-value for V4-Pro-vs-V4-Flash decision-making.

    Unlocks: /hardware-combos/vllm-tensor-parallel-h100-workstation, /stacks/h100-tensor-parallel-workstation, /models/deepseek-v4-flash

    Reproduce → · View original
  • moderate
    Qwen 3.5 235B-A17B (MoE) on (any)

    The Apple-vs-NVIDIA comparison at the frontier-MoE tier is the most-asked question for Mac Studio buyers. Editorial estimate is 25-30% of NVIDIA throughput; measured value would close the loop.

    Unlocks: /hardware-combos/mac-studio-m3-ultra-192gb, /stacks/apple-silicon-ai, /will-it-run/combo/mac-studio-m3-ultra-192gb

    Reproduce → · View original
  • moderate
    Qwen 3.5 235B-A17B (MoE) on (any)

    The frontier-MoE production reference. Organizations weighing $200k+ DGX-class purchases vs cloud rental need measured throughput to model cost-per-million-tokens accurately.

    Unlocks: /hardware-combos/vllm-tensor-parallel-h100-workstation, /stacks/h100-tensor-parallel-workstation, /will-it-run/combo/vllm-tensor-parallel-h100-workstation

    Reproduce → · View original
  • moderate
    Llama 3.3 70B Instruct on (any)

    Pairs with the dual-3090 measurement to quantify the NVLink-vs-PCIe penalty. The RTX 4090's lack of NVLink is the single most-misunderstood spec gap; a measured comparison ends the speculation.

    Unlocks: /hardware-combos/dual-rtx-4090, /stacks/dual-4090-workstation, /will-it-run/combo/dual-rtx-4090

    Reproduce → · View original
  • moderate
    Llama 3.3 70B Instruct on (any)

    The dual-3090 NVLink build is the most-recommended prosumer multi-GPU configuration on this site. Without a measured benchmark, the 25-32 tok/s estimate carries editorial-only confidence — operators making $1,500+ buying decisions deserve real numbers.

    Unlocks: /hardware-combos/dual-rtx-3090, /stacks/dual-3090-workstation, /will-it-run/combo/dual-rtx-3090

    Reproduce → · View original
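
Several queue items above tie measured throughput to a cost-per-million-tokens model. A minimal sketch of that arithmetic follows. The function names, the amortization formula, and every input value are illustrative assumptions, not figures from this site or from any measured benchmark.

```python
def cost_per_million_tokens(hourly_cost_usd: float, tok_per_s: float) -> float:
    """USD per one million generated tokens at a sustained throughput."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    tokens_per_hour = tok_per_s * 3600
    return hourly_cost_usd / tokens_per_hour * 1_000_000


def owned_hourly_cost(purchase_usd: float, lifetime_years: float,
                      utilization: float, power_kw: float,
                      usd_per_kwh: float) -> float:
    """Rough hourly cost of owned hardware (assumed model, not the site's).

    Straight-line amortization over the hours the machine is actually busy,
    plus electricity while running. Ignores cooling, staff, and resale value.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    busy_hours = lifetime_years * 8760 * utilization
    return purchase_usd / busy_hours + power_kw * usd_per_kwh


if __name__ == "__main__":
    # Placeholder inputs only; substitute a measured tok/s from a reproduction.
    owned = owned_hourly_cost(purchase_usd=200_000, lifetime_years=4,
                              utilization=0.5, power_kw=10.0, usd_per_kwh=0.15)
    measured_tok_per_s = 1000.0  # hypothetical aggregate throughput
    print(f"owned:  ${cost_per_million_tokens(owned, measured_tok_per_s):.2f} per 1M tokens")
    print(f"rental: ${cost_per_million_tokens(25.0, measured_tok_per_s):.2f} per 1M tokens")
```

The result scales inversely with tok/s, so an estimate that is off by 30% moves the cost figure by a similar margin. That sensitivity is why a measured throughput matters more than any other input here.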

How reproduction lifts confidence

The four-tier confidence ladder is editorially driven, never automatic.

±15% match. A submission whose tok/s is within 15% of the original, with the same quant and the same context bucket, triggers a confidence lift on the original.

Two independent reproducers. When two distinct operators reproduce the same benchmark, the badge upgrades to independently-reproduced.

Editorial review. Submissions are never auto-published. Editorial reviews each within 1-7 days. Read the trust standards for the full process.
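
The match and reproducer rules above can be sketched as a pre-check you might run before submitting. This is a minimal illustration, not the site's implementation: the `Run` fields and function names are hypothetical, and it assumes the 15% band is measured relative to the original's tok/s, which the page does not specify. Because promotion is editorial, the last function only reports eligibility.

```python
from dataclasses import dataclass

TOLERANCE = 0.15  # the ±15% band described above


@dataclass(frozen=True)
class Run:
    # Hypothetical record shape; the real submission form may differ.
    operator: str
    tok_per_s: float
    quant: str
    context_bucket: str


def matches(original: Run, repro: Run, tol: float = TOLERANCE) -> bool:
    """True when repro has the same quant and context bucket and its tok/s
    is within tol of the original (relative to the original: an assumption)."""
    if original.tok_per_s <= 0:
        return False
    same_setup = (repro.quant == original.quant
                  and repro.context_bucket == original.context_bucket)
    rel_diff = abs(repro.tok_per_s - original.tok_per_s) / original.tok_per_s
    return same_setup and rel_diff <= tol


def independent_reproducers(original: Run, repros: list[Run]) -> set[str]:
    """Distinct operators, other than the original submitter, with a match."""
    return {r.operator for r in repros
            if r.operator != original.operator and matches(original, r)}


def eligible_for_upgrade(original: Run, repros: list[Run]) -> bool:
    """Two or more independent matching operators. Eligibility only:
    editorial review still decides whether the badge changes."""
    return len(independent_reproducers(original, repros)) >= 2
```

Counting operators as a set means repeated submissions from one person never count twice, which is one plausible reading of "two distinct operators".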

Next recommended step

See the public benchmark roadmap
Or: See cohort coverage · Read the reproduction guide