YTU Turkish Gemma 9B v0.1 on NVIDIA GeForce RTX 3080 16GB (Mobile)
Measured on 2026-06-02.
Measurement
- tok/s: 66.0
- TTFT: 369 ms
- VRAM used: not reported
- RAM used: not reported
- Power: not reported
- Quant: Q4_K_M
- Context: 4K
- Run date: 2026-06-02
- Source: owner
- Type: measured
V36.52 rigor detail
Protocol
- Cold-start decode: 66.61 tok/s · TTFT 4818 ms
- Steady-state median: 66.00 tok/s · P5 65.7 · P95 66.8
- Runs captured: 5
- Scenario: Single-stream
5-run capture · variance 1.9% · scenario single-stream · runtime ollama
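As a rough illustration of how a median, P5, P95 and a spread figure can be derived from a small set of runs, here is a minimal Python sketch. The input values are hypothetical and are not the five logged runs. The percentile interpolation method and the spread definition ((max - min) / median) are also assumptions; the page does not state how its 1.9% variance figure is computed.

```python
import statistics


def summarize_runs(tok_per_s):
    """Summarise per-run decode throughput (tokens per second)."""
    ordered = sorted(tok_per_s)
    median = statistics.median(ordered)
    # quantiles(n=20) returns 19 cut points: index 0 is P5, index 18 is P95.
    cuts = statistics.quantiles(ordered, n=20, method="inclusive")
    # Assumed spread definition: full range relative to the median, in percent.
    spread_pct = (ordered[-1] - ordered[0]) / median * 100
    return {
        "median": median,
        "p5": cuts[0],
        "p95": cuts[18],
        "spread_pct": spread_pct,
    }


# Hypothetical values for illustration only.
example = summarize_runs([64.0, 65.0, 66.0, 67.0, 68.0])
print(example)
```

With only five runs, P5 and P95 are interpolated between the lowest and highest pairs of samples, so they are best read as a description of the range and not as tail estimates.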
Evidence
What this row provides for independent review. Missing fields lower confidence; they are shown explicitly instead of hidden.
- Source link: Open source
- Evidence manifest: Open manifest
- Command: Available
- Raw logs: 5 files
- Raw results: 5 files
- Log hash: 8ac298362c2b...
- Operator: fred-oline
- Runtime: ollama version is 0.24.0
- Driver: 571.96
- CUDA / ROCm / Metal: 12.8
- OS: Microsoft Windows [Version 10.0.26200.8457]
- Run count: 5
- Raw stats: Available
- Environment notes: Not provided
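Since the runtime is ollama, one plausible way to obtain figures like tok/s and TTFT is from the timing fields that Ollama's generate API returns (`eval_count`, `eval_duration`, `load_duration`, `prompt_eval_duration`, with durations in nanoseconds). The sketch below is not the operator's recorded command, which is not reproduced on this page. Treating TTFT as load time plus prompt evaluation time is an assumption, and the sample values are hypothetical.

```python
NS_PER_S = 1_000_000_000
NS_PER_MS = 1_000_000


def decode_metrics(response: dict) -> dict:
    """Derive decode throughput and an approximate TTFT from Ollama timing fields."""
    # Decode speed: generated tokens divided by generation time in seconds.
    tok_per_s = response["eval_count"] / (response["eval_duration"] / NS_PER_S)
    # Assumption: time to first token ~ model load + prompt evaluation.
    ttft_ns = response.get("load_duration", 0) + response.get("prompt_eval_duration", 0)
    return {"tok_per_s": tok_per_s, "ttft_ms": ttft_ns / NS_PER_MS}


# Hypothetical response fragment, not taken from the raw logs.
sample = {
    "eval_count": 660,
    "eval_duration": 10 * NS_PER_S,
    "load_duration": 50 * NS_PER_MS,
    "prompt_eval_duration": 300 * NS_PER_MS,
}
metrics = decode_metrics(sample)
print(metrics)
```

Under this reading, a cold start would show a much larger TTFT than a warm run because `load_duration` includes loading the model into memory, which would be consistent with the gap between 4818 ms and 369 ms reported above.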
Why this confidence tier?
Confidence is rule-based. Every factor below contributed to the tier. We never expose a single numeric score; the tier label is auditable through this explanation alone.
- + Measured by RunLocalAI editorial
- Reproduce this benchmark: An independent reproduction with matching numbers lifts the tier and reduces single-source risk.
- Read the confidence methodology: Full editorial standards for tiering.
- Why we don't use percentages: Tier labels are auditable; there is no opaque score.
Cohort intelligence
How this measurement compares to the rest of the corpus. Only comparable rows (same model + hardware first, with relaxations labelled) are used. We never average across runtimes or quant formats unless explicitly told to.
Same model, different hardware
1 matching row. What this model looks like on adjacent hardware. Drives the 'should I upgrade?' question.
- 101.1 tok/s · rtx-5080 · ollama-local-api · Q4_K_M · Editorial
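To put the upgrade question in numbers, the two published Q4_K_M figures can be compared directly. This is a small sketch. The function name is illustrative, and the rtx-5080 row lists a different runtime label (ollama-local-api), so the ratio is only a rough guide.

```python
def speedup(candidate_tok_s: float, baseline_tok_s: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return candidate_tok_s / baseline_tok_s


# Figures from this page: rtx-5080 row vs. this RTX 3080 16GB (Mobile) row.
ratio = speedup(101.1, 66.0)
print(f"{ratio:.2f}x")  # about 1.53x
```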
Same hardware, different model
12 matching rows. What else this rig can run at the same quant bucket.
- 65.7 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 67.0 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 68.2 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 78.1 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- 79.3 tok/s · rtx-3080-16gb-mobile · ollama version is 0.24.0 · Q4_K_M · Editorial
- +7 more
Reproduce this benchmark
Got the same model + hardware combo? Run the same measurement and submit your numbers. We'll pre-fill model, hardware, quant, and context — you just add your tok/s, VRAM, runtime version. If your numbers match within ±15%, this benchmark gets a confidence lift and a reproduction badge.
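Before submitting, the ±15% window can be checked locally. The sketch below assumes the tolerance is relative to the published 66.0 tok/s figure and applies to tok/s only; the page does not spell out either detail, and the function name is illustrative.

```python
REFERENCE_TOK_S = 66.0  # published steady-state figure for this row
TOLERANCE = 0.15        # the ±15% match window quoted above


def within_tolerance(submitted: float,
                     reference: float = REFERENCE_TOK_S,
                     tolerance: float = TOLERANCE) -> bool:
    """True if the submitted tok/s is within the relative tolerance of the reference."""
    return abs(submitted - reference) <= tolerance * reference


low = REFERENCE_TOK_S * (1 - TOLERANCE)
high = REFERENCE_TOK_S * (1 + TOLERANCE)
print(f"accepted range: {low:.1f} to {high:.1f} tok/s")
print(within_tolerance(60.0), within_tolerance(80.0))
```

Under these assumptions, a reproduction between roughly 56.1 and 75.9 tok/s would count as a match.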
Cite or export
Cite this benchmark in your work or paste it into a README. The license is CC-BY-4.0 (attribution to RunLocalAI required).
<a href="https://www.runlocalai.co/benchmarks/367" rel="noopener">RunLocalAI: YTU Turkish Gemma 9B v0.1 on NVIDIA GeForce RTX 3080 16GB (Mobile) — 66.0 tok/s</a>