Phi-4 Reasoning 14B on NVIDIA GeForce RTX 3080 16GB (Mobile)
Measured this month.
Measurement
- tok/s: 40.4
- TTFT: 226 ms
- VRAM used: not reported
- RAM used: not reported
- Power: not reported
- Quant: Q4_K_M
- Context: 4K
- Run date: 2026-06-02
- Source: owner
- Type: measured
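The runtime for this row is ollama. As a minimal sketch, assuming the throughput figure is derived from the timing fields that ollama returns with a non-streaming `/api/generate` response (`eval_count`, and `eval_duration` in nanoseconds), decode tok/s could be computed as below. This page does not state how the published number was actually derived, and the sample response values are hypothetical, chosen only to illustrate the arithmetic.

```python
NANOSECONDS_PER_SECOND = 1_000_000_000


def decode_tok_per_s(response: dict) -> float:
    """Decode throughput from an ollama-style response dict.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / NANOSECONDS_PER_SECOND)


# Hypothetical values, not taken from this benchmark's raw logs.
sample_response = {"eval_count": 404, "eval_duration": 10_000_000_000}
tok_per_s = decode_tok_per_s(sample_response)
print(f"{tok_per_s:.1f} tok/s")
```

Prompt processing and model load time are excluded from this figure, which is why a cold start can show a much larger TTFT while decode speed stays similar.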
V36.52 rigor detail
Protocol →
- Cold-start decode: 41.02 tok/s, TTFT 5282 ms
- Steady-state median: 40.44 tok/s (P5 39.7 · P95 40.5)
- Runs captured: 5
- Scenario: Single-stream
5-run capture · variance 2.5% · scenario single-stream · runtime ollama
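The median, P5/P95, and variance figures above summarize a small set of runs. A sketch of how such a summary might be computed follows, assuming linear-interpolation percentiles and assuming "variance" means the max-to-min spread as a percentage of the median; the page does not define either, so both are assumptions. The samples are hypothetical and are not the values in this benchmark's raw logs.

```python
from statistics import median


def percentile(values, p):
    """Linear-interpolation percentile for p in [0, 100]."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no samples")
    rank = (len(ordered) - 1) * p / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def summarize(samples):
    """Median, P5, P95, and spread (assumed definition) for tok/s samples."""
    med = median(samples)
    return {
        "median": med,
        "p5": percentile(samples, 5),
        "p95": percentile(samples, 95),
        # Assumed definition: (max - min) as a percentage of the median.
        "spread_pct": (max(samples) - min(samples)) / med * 100,
    }


# Hypothetical steady-state tok/s samples from five runs.
samples = [40.0, 40.2, 40.4, 40.5, 40.6]
summary = summarize(samples)
print(summary)
```

With only five runs, P5 and P95 sit close to the minimum and maximum, so they are best read as a range indicator rather than a precise tail estimate.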
Evidence
What this row provides for independent review. Missing fields lower confidence; they are shown explicitly instead of hidden.
- Source link: available
- Evidence manifest: available
- Command: available
- Raw logs: 5 files
- Raw results: 5 files
- Log hash: 9d3324def3df...
- Operator: fred-oline
- Runtime: ollama version is 0.24.0
- Driver: 571.96
- CUDA / ROCm / Metal: 12.8
- OS: Microsoft Windows [Version 10.0.26200.8457]
- Run count: 5
- Raw stats: available
- Environment notes: not provided
Why this confidence tier?
Confidence is rule-based. Every factor below contributed to the tier. We never expose a single numeric score; the tier label is auditable through this explanation alone.
- + Measured by RunLocalAI editorial
- Reproduce this benchmark → An independent reproduction with matching numbers lifts the tier and reduces single-source risk.
- Read the confidence methodology → Full editorial standards for tiering.
- Why we don't use percentages → Tier labels are auditable, with no opaque score.
Cohort intelligence
How this measurement compares to the rest of the corpus. Only comparable rows (same model + hardware first, with relaxations labelled) are used. We never average across runtimes or quant formats unless explicitly told to.
Same hardware, different model
12 matching rows. What else this rig can run at the same quant bucket.
- 38.3 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 43.3 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 43.4 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 65.7 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 66.0 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 7 more rows not shown
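The rule of never averaging across runtimes or quant formats can be sketched as a grouping step: rows are bucketed by (runtime, quant) before any statistic is taken. The row layout and the `summarize_cohort` helper below are hypothetical, not the site's actual schema or code; the five rows are the ones listed above (the remaining seven are not shown on this page, so they are omitted).

```python
from collections import defaultdict
from statistics import median

# The five visible rows from the list above; field names are hypothetical.
rows = [
    {"tok_s": 38.3, "hardware": "rtx-3080-16gb-mobile", "runtime": "ollama 0.24.0", "quant": "Q4_K_M"},
    {"tok_s": 43.3, "hardware": "rtx-3080-16gb-mobile", "runtime": "ollama 0.24.0", "quant": "Q4_K_M"},
    {"tok_s": 43.4, "hardware": "rtx-3080-16gb-mobile", "runtime": "ollama 0.24.0", "quant": "Q4_K_M"},
    {"tok_s": 65.7, "hardware": "rtx-3080-16gb-mobile", "runtime": "ollama 0.24.0", "quant": "Q4_K_M"},
    {"tok_s": 66.0, "hardware": "rtx-3080-16gb-mobile", "runtime": "ollama 0.24.0", "quant": "Q4_K_M"},
]


def summarize_cohort(rows, hardware):
    """Median tok/s per (runtime, quant) bucket for one hardware id.

    Rows from different runtimes or quant formats never share a bucket,
    so no statistic is computed across them.
    """
    buckets = defaultdict(list)
    for row in rows:
        if row["hardware"] != hardware:
            continue
        buckets[(row["runtime"], row["quant"])].append(row["tok_s"])
    return {key: median(values) for key, values in buckets.items()}


cohort = summarize_cohort(rows, "rtx-3080-16gb-mobile")
print(cohort)
```

Because these rows cover different models on the same hardware, the per-bucket median describes the rig rather than any single model.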
Reproduce this benchmark
Got the same model + hardware combo? Run the same measurement and submit your numbers. We'll pre-fill model, hardware, quant, and context — you just add your tok/s, VRAM, runtime version. If your numbers match within ±15%, this benchmark gets a confidence lift and a reproduction badge.
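As a rough illustration of the ±15% match rule, the check below treats the tolerance as relative to the published value, which is an assumption; the page does not say which number is the baseline. The function name is hypothetical.

```python
def within_tolerance(reference: float, reproduced: float, tolerance: float = 0.15) -> bool:
    """True if `reproduced` is within ±tolerance of `reference` (relative)."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return abs(reproduced - reference) / reference <= tolerance


published = 40.4  # tok/s from this benchmark

# Under this assumption the accepted band is about 34.3 to 46.5 tok/s.
for candidate in (35.0, 46.0, 34.0, 47.0):
    print(candidate, within_tolerance(published, candidate))
```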
Cite or export
Reference this benchmark in your work or paste it into a README. Multiple formats are offered; the license is CC-BY-4.0 (attribution to RunLocalAI required).
<a href="https://www.runlocalai.co/benchmarks/384" rel="noopener">RunLocalAI: Phi-4 Reasoning 14B on NVIDIA GeForce RTX 3080 16GB (Mobile) — 40.4 tok/s</a>