RUNLOCALAI v38
Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated
Original benchmark dataset

Local LLM benchmarks

Tokens-per-second rows collected on owner hardware or from cited, reproduced sources. Every public row ships with a confidence badge so you know which numbers are owner-measured, independently reproduced, or source-backed.

Badges: M = Measured · M~ = Measured-near · C = Source-backed · ~ = Extrapolated · E = Estimated. How we measure →
Browse all results → · Submit a result → · Read methodology → · See benchmark roadmap → · Mobile + edge gap report →
  • Confidence engine: 39 strong
  • Coverage map: 0 wanted
  • Regression watch: version-aware
  • Reproduce: protocol + queue
See something off? Submit a benchmark. We read every submission. Editorial review takes 1-7 days.
Selected coverage slice

22 measured or reviewed / 0 estimated / 0 wanted / 6 unstarted

78.6% of 28 selected cells measured or reviewed (22 of 28).
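The coverage percentage follows directly from the cell counts. A minimal sketch of that arithmetic, assuming each cell carries a single state label (the `cells` list and `coverage_summary` helper are illustrative names, not part of the site's tooling):

```python
from collections import Counter

# Cell states for the selected slice, taken from the summary line above:
# 22 measured or reviewed, 0 estimated, 0 wanted, 6 unstarted.
cells = ["measured"] * 22 + ["unstarted"] * 6


def coverage_summary(states):
    """Return per-state counts and the share of measured-or-reviewed cells."""
    counts = Counter(states)
    done = counts["measured"] + counts["reviewed"]
    pct = round(100 * done / len(states), 1)
    return counts, pct


counts, pct = coverage_summary(cells)
print(f"{pct}% of {len(cells)} selected cells measured or reviewed")
```

Estimated and wanted cells would count toward the denominator but not the numerator, which is why the slice reads 22 of 28 rather than 22 of 22.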

Selected benchmark coverage by hardware and model. Measured, reviewed, estimated, and wanted cells are visually distinct.
Model | NVIDIA GeForce RTX 3080 16GB (Mobile) | NVIDIA GeForce RTX 5080
Trendyol LLM Asure 12B | 43.4 tok/s | 82.0 tok/s
Kumru 2B | 174.2 tok/s | 443.7 tok/s
Mistral Turkish v2 (brooqs) | 106.8 tok/s | 161.1 tok/s
Turkcell LLM 7B v1 | 85.8 tok/s | 145.1 tok/s
YTU Turkish Gemma 9B v0.1 | 66.0 tok/s | 101.1 tok/s
RefinedNeuro RN TR R1 | 79.9 tok/s | 133.6 tok/s
RefinedNeuro RN TR R2 | 79.3 tok/s | 133.4 tok/s
Malhajar Mistral 7B Turkish | 87.3 tok/s | 130.4 tok/s
Llama 3.1 8B Instruct | not started | 135.6 tok/s
Qwen 2.5 Coder 14B Instruct | not started | 79.0 tok/s
Llama 3.2 1B Instruct | 189.5 tok/s | not started
CodeGemma 7B | 80.6 tok/s | not started
DeepSeek Coder V2 Lite (16B) | 152.0 tok/s | not started
DeepSeek R1 Distill Qwen 7B | 80.3 tok/s | not started

All filled cells are RunLocalAI measured, high confidence.
Cell states: RunLocalAI measured · Reproduced · Vendor-published · Community-reviewed · Low/unverified measured · Estimated · Critical wanted · Wanted · Not started
Fast measured/reviewed cells
  • Kumru 2B on NVIDIA GeForce RTX 5080: 443.7 tok/s
  • Llama 3.2 1B Instruct on NVIDIA GeForce RTX 3080 16GB (Mobile): 189.5 tok/s
  • Kumru 2B on NVIDIA GeForce RTX 3080 16GB (Mobile): 174.2 tok/s
  • Mistral Turkish v2 (brooqs) on NVIDIA GeForce RTX 5080: 161.1 tok/s
Dataset provenance split

Public evidence split

39 rows
Stacked bar showing the share of editorial, reproduced, community-reproduced, and excluded estimated rows out of 39 total rows.
  1. Editorial (measured): 39 (100%). Measured by RunLocalAI on owner hardware.
  2. Reproduced: 0 (0%). Independently reproduced by an operator on similar hardware.
  3. Community reproduced: 0 (0%). Submitted by an operator and reproduced before public evidence use.
  4. Excluded estimates: 0 (0%). Formula-derived rows are excluded from public benchmark evidence. How we estimate →
Source: RunLocalAI benchmark + community-benchmark tables. Editorial rows are owner-measured; reproduced rows are owner-measured numbers confirmed by an independent operator; community rows are operator submissions that passed reproduction; estimated rows are excluded from public evidence surfaces. See scoring methodology.
Dataset provenance heuristic

How strong is the visible provenance?

39 rows
Horizontal stacked bar showing how the 39 rows in the benchmark dataset divide across very-high, high, moderate, and low confidence tiers. Buckets are a source/reproduction heuristic, not a promise that every row has complete forensic evidence.
  1. Very high: 0 (0%). Owner-measured AND reproduced by an independent operator.
  2. High: 39 (100%). Owner-measured single-source OR community independently-reproduced.
  3. Moderate: 0 (0%). Community-submitted: editorially approved, reproduced once.
  4. Low: 0 (0%). Estimated / unverified / single anecdote; directional only.
This chart is a provenance heuristic derived from source + reproduction status. Row-level pages show the stricter evidence and confidence details on /benchmarks/[id] detail pages. See Confidence methodology.
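The tier definitions above can be read as a small decision rule over source and reproduction status. The sketch below is one possible reading, not the site's actual scoring code: the `confidence_tier` function, its argument names, and the threshold that separates "high" from "moderate" for community rows are all assumptions made for illustration.

```python
from collections import Counter


def confidence_tier(source: str, independent_reproductions: int = 0) -> str:
    """Map a row's source and reproduction count to a provenance tier."""
    if source == "owner":
        # Owner-measured and independently reproduced -> very high;
        # owner-measured single-source -> high.
        return "very high" if independent_reproductions >= 1 else "high"
    if source == "community":
        # ASSUMPTION: the page does not say what separates a "high"
        # community row from a "moderate" one. Here two or more
        # independent reproductions count as high, exactly one as moderate.
        if independent_reproductions >= 2:
            return "high"
        if independent_reproductions == 1:
            return "moderate"
    # Estimated, unverified, or single-anecdote rows.
    return "low"


# All 39 public rows are owner-measured with no independent reproduction yet.
tiers = Counter(confidence_tier("owner") for _ in range(39))
print(tiers)  # Counter({'high': 39})
```

Under this reading, a single independent reproduction of any current row would move it from "high" to "very high".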
Benchmark aging heatmap

How fresh is the dataset, month by month?

39 rows · last 18 + 3 mo
Horizontal strip of 21 cells, one per month, from Oct '24 to Jun '26. Each cell's color encodes the count of public benchmarks whose run or publication date falls in that month: empty for zero, light emerald for one to two, medium emerald for three to five, deep emerald for six or more. The leftmost 3 cells fade toward amber and rose to mark months past the 18-month staleness boundary. Labeled cells: May '26 (12 rows), Jun '26 (27 rows).
  • empty
  • 1-2 rows
  • 3-5 rows
  • ≥6 rows
  • stale (18m+)
How fresh is the dataset? Each cell is one calendar month. Older cells fade. See confidence methodology.
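The bucketing described for the heatmap can be sketched as follows. This is an illustrative reconstruction under stated assumptions: the function names are hypothetical, the strip is anchored on the month of the latest run, and a cell is treated as stale when it is 18 or more months older than that anchor (which reproduces the 18 + 3 layout). The run dates come from the "Latest 39 runs" table.

```python
from collections import Counter
from datetime import date


def month_index(year: int, month: int) -> int:
    """Number months consecutively so they can be subtracted."""
    return year * 12 + (month - 1)


def heat_bucket(count: int) -> str:
    """Map a monthly row count to the legend bucket."""
    if count == 0:
        return "empty"
    if count <= 2:
        return "1-2 rows"
    if count <= 5:
        return "3-5 rows"
    return ">=6 rows"


def build_strip(run_dates, latest, fresh_months=18, stale_months=3):
    """Build one cell per month, oldest first, ending at the latest month."""
    counts = Counter(month_index(d.year, d.month) for d in run_dates)
    last = month_index(latest.year, latest.month)
    strip = []
    for idx in range(last - (fresh_months + stale_months) + 1, last + 1):
        year, month0 = divmod(idx, 12)
        n = counts.get(idx, 0)
        strip.append({
            "month": f"{year}-{month0 + 1:02d}",
            "count": n,
            "bucket": heat_bucket(n),
            "stale": last - idx >= fresh_months,  # assumption: 18+ months back
        })
    return strip


# Run dates from the table: 1 on May 27, 11 on May 28, 27 on Jun 2 (2026).
run_dates = [date(2026, 5, 27)] + [date(2026, 5, 28)] * 11 + [date(2026, 6, 2)] * 27
strip = build_strip(run_dates, latest=date(2026, 6, 2))
for cell in strip[-3:]:
    print(cell["month"], cell["count"], cell["bucket"])
```

With the current data every row lands in the two most recent cells, so the remaining 19 cells render as empty.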
Top hardware by median tokens / sec

Throughput leaderboard

Methodology →
Horizontal bar chart of the top 2 hardware ranked by median tokens per second across measured benchmark runs. Bar color encodes sample size: emerald for five or more runs, amber outline for two to four runs, slate for a single run (undersampled).
  • NVIDIA GeForce RTX 5080: 132 tok/s · n=12 · ✓ Editorial
  • NVIDIA GeForce RTX 3080 16GB (Mobile): 81 tok/s · n=27 · ✓ Editorial
Median tokens-per-second per hardware across 39 measured runs in our dataset. Bar color reflects sample size — bars built from one run are deliberately rendered light to signal undersampling. See methodology.
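The leaderboard figures can be checked against the run table with a few lines of Python. This is a minimal sketch, assuming the median is taken over every listed run for a given hardware regardless of quant or context size; the `leaderboard` and `sample_size_style` helpers are illustrative names, and the tok/s values are copied from the "Latest 39 runs" table.

```python
from statistics import median

# Tokens-per-second values per hardware, copied from the run table.
runs = {
    "NVIDIA GeForce RTX 5080": [
        82.0, 135.6, 79.0, 443.7, 161.1, 145.1,
        101.1, 79.1, 133.6, 133.4, 130.4, 61.5,
    ],
    "NVIDIA GeForce RTX 3080 16GB (Mobile)": [
        85.8, 81.5, 87.3, 67.0, 89.6, 65.7, 155.4, 40.4, 80.4,
        38.3, 103.7, 79.9, 79.3, 189.5, 174.2, 43.4, 66.0, 106.8,
        80.6, 152.0, 80.3, 68.2, 43.3, 160.4, 97.7, 99.1, 78.1,
    ],
}


def sample_size_style(n: int) -> str:
    """Bar style by sample size, following the chart legend."""
    if n >= 5:
        return "emerald"
    if n >= 2:
        return "amber outline"
    return "slate"  # single run, undersampled


def leaderboard(runs_by_hardware):
    """Rank hardware by median tok/s, highest first."""
    rows = [
        (hw, median(values), len(values), sample_size_style(len(values)))
        for hw, values in runs_by_hardware.items()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)


board = leaderboard(runs)
for hw, med, n, style in board:
    print(f"{hw}: {med:.0f} tok/s, n={n}, {style}")
```

The median is less sensitive than the mean to outliers such as the 443.7 tok/s Kumru 2B run, which is one reason it suits a mixed-model sample like this.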

Latest 39 runs

Sorted by date. Click a model or hardware name to drill into the full record.

Model | Hardware | Provenance | Quant | Ctx | Tokens / sec | TTFT | Date
Turkcell LLM 7B v1 | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 85.8 tok/s | 96 ms | Jun 2, 26
Hermes 3 Llama 3.1 8B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 81.5 tok/s | 357 ms | Jun 2, 26
Malhajar Mistral 7B Turkish | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 87.3 tok/s | 74 ms | Jun 2, 26
Llama 3.2 11B Vision Instruct | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 67.0 tok/s | 411 ms | Jun 2, 26
Mistral 7B Instruct v0.3 | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 89.6 tok/s | 80 ms | Jun 2, 26
Mistral Nemo 12B Instruct | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 65.7 tok/s | 367 ms | Jun 2, 26
Phi-3.5 Mini Instruct | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 155.4 tok/s | 66 ms | Jun 2, 26
Phi-4 Reasoning 14B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 40.4 tok/s | 226 ms | Jun 2, 26
Qwen 2.5 7B Instruct | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 80.4 tok/s | 335 ms | Jun 2, 26
Qwen 3 14B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 38.3 tok/s | 310 ms | Jun 2, 26
Qwen 3 4B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 103.7 tok/s | 303 ms | Jun 2, 26
RefinedNeuro RN TR R1 | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 79.9 tok/s | 361 ms | Jun 2, 26
RefinedNeuro RN TR R2 | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 79.3 tok/s | 366 ms | Jun 2, 26
Llama 3.2 1B Instruct | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 189.5 tok/s | 359 ms | Jun 2, 26
Kumru 2B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 174.2 tok/s | 129 ms | Jun 2, 26
Trendyol LLM Asure 12B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 43.4 tok/s | 391 ms | Jun 2, 26
YTU Turkish Gemma 9B v0.1 | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 66.0 tok/s | 369 ms | Jun 2, 26
Mistral Turkish v2 (brooqs) | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 106.8 tok/s | 73 ms | Jun 2, 26
CodeGemma 7B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 80.6 tok/s | 383 ms | Jun 2, 26
DeepSeek Coder V2 Lite (16B) | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 152.0 tok/s | 211 ms | Jun 2, 26
DeepSeek R1 Distill Qwen 7B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 80.3 tok/s | 300 ms | Jun 2, 26
Gemma 2 9B Instruct | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 68.2 tok/s | 358 ms | Jun 2, 26
Gemma 3 12B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 43.3 tok/s | 767 ms | Jun 2, 26
Gemma 3 1B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 160.4 tok/s | 790 ms | Jun 2, 26
Gemma 3 4B | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 97.7 tok/s | 743 ms | Jun 2, 26
Gemma 4 E2B (Effective 2B) | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 99.1 tok/s | 792 ms | Jun 2, 26
Gemma 4 E4B (Effective 4B) | NVIDIA GeForce RTX 3080 16GB (Mobile) | ✓ Editorial (M) | Q4_K_M | 4K | 78.1 tok/s | 790 ms | Jun 2, 26
Trendyol LLM Asure 12B | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 4K | 82.0 tok/s | 136 ms | May 28, 26
Llama 3.1 8B Instruct | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 4K | 135.6 tok/s | 130 ms | May 28, 26
Qwen 2.5 Coder 14B Instruct | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 4K | 79.0 tok/s | 117 ms | May 28, 26
Kumru 2B | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 2K | 443.7 tok/s | n/a | May 28, 26
Mistral Turkish v2 (brooqs) | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_0 | 2K | 161.1 tok/s | n/a | May 28, 26
Turkcell LLM 7B v1 | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 2K | 145.1 tok/s | n/a | May 28, 26
YTU Turkish Gemma 9B v0.1 | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 2K | 101.1 tok/s | n/a | May 28, 26
Trendyol LLM Asure 12B | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | unknown | 2K | 79.1 tok/s | n/a | May 28, 26
RefinedNeuro RN TR R1 | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 2K | 133.6 tok/s | n/a | May 28, 26
RefinedNeuro RN TR R2 | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 2K | 133.4 tok/s | n/a | May 28, 26
Malhajar Mistral 7B Turkish | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q5_K_M | 2K | 130.4 tok/s | n/a | May 28, 26
Trendyol LLM Asure 12B | NVIDIA GeForce RTX 5080 | ✓ Editorial (M) | Q4_K_M | 8K | 61.5 tok/s | 323 ms | May 27, 26
§ How we measure
Every measured benchmark records the runner version, driver version, prompt, and date. These are shown in full on each benchmark's own detail page; the summary tables list only the headline metrics. Predictions are graded with confidence badges (M / C / ~ / E) so you know which numbers to trust for purchasing decisions. Read the methodology →
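For readers who want to keep their own results in a comparable shape, the fields named above can be captured in a small record type. This is a hypothetical sketch, not RunLocalAI's schema: the `BenchmarkRecord` class, its field names, and the placeholder version strings are assumptions; only the headline numbers in the example come from the run table.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class BenchmarkRecord:
    """One measured run: headline metrics plus reproduction metadata."""
    model: str
    hardware: str
    quant: str
    context: str
    tokens_per_sec: float
    ttft_ms: Optional[float]  # None when TTFT was not recorded
    run_date: date
    runner_version: str
    driver_version: str
    prompt: str

    def missing_metadata(self) -> List[str]:
        """List reproduction fields that were left blank."""
        required = ("runner_version", "driver_version", "prompt")
        return [name for name in required if not getattr(self, name).strip()]


# Headline values are from the table; the version strings and prompt are
# placeholders, since the summary page does not show them.
record = BenchmarkRecord(
    model="Turkcell LLM 7B v1",
    hardware="NVIDIA GeForce RTX 3080 16GB (Mobile)",
    quant="Q4_K_M",
    context="4K",
    tokens_per_sec=85.8,
    ttft_ms=96.0,
    run_date=date(2026, 6, 2),
    runner_version="<runner version>",
    driver_version="",
    prompt="<benchmark prompt>",
)
print(record.missing_metadata())  # ['driver_version']
```

Keeping `ttft_ms` optional mirrors the table, where several RTX 5080 rows list no TTFT value.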