Turkish Llama 8B Instruct v0.1
Overview
Llama 3 8B with continued pre-training on Turkish corpora, then instruction-tuned for Turkish chat. It is the YTU CE COSMOS group's most-downloaded Llama variant. GGUF builds are available and load directly in Ollama.
Strengths
- Llama 3 base — proven architecture with broad tooling support
- GGUF builds shipped alongside transformers checkpoint
- Solid Turkish output quality at the 8B size class
Weaknesses
- Continued pre-training on Turkish can degrade English capability; choose a different model if you need bilingual output
- 8K context inherited from Llama 3 base
- v0.1 — newer YTU releases may have superseded this
Reviewed quality benchmarks
First-party rows were run by RunLocalAI; reviewed community rows are labeled in the data. Every row links to the raw test-run log.
| Benchmark | Quant | Runtime / Hardware | Score | Raw log |
|---|---|---|---|---|
| TurkishMMLU (Generative), tested 2026-05-26 | Q4_K_M | ollama-0.24 / rtx-3080 | 11.0/100 | Gist |
| TurkishMMLU (Generative), tested 2026-05-28 | Q4_K_M | ollama-0.24 / rtx-3080-16gb-mobile | 11.0/100 | Gist |
Q4_K_M note (2026-05-26 run): Baseline run on Ollama 0.24 with the default 2048-token context window. The score is below the 20% random-guess baseline, a strong indicator that 5-shot Turkish prompts (which average ~2000 tokens due to morphology) were silently truncated by Ollama. A re-run with --num-ctx 8192 is expected to land at 30-45%. Published as-is so the methodology improvement is measurable; this row is intentionally NOT promoted to 'verified'.
Q4_K_M note (2026-05-28 run): Re-run on an RTX 3080 Laptop (16 GB) with `num_ctx=8192` to test the earlier hypothesis that the prior 11% score was caused by Ollama's default 2048-token context window truncating 5-shot Turkish prompts. The re-run **landed at the same 11.00%**, ruling out the truncation hypothesis. The honest reading: Turkish-Llama-8B-Instruct-v0.1 was trained as a **Turkish conversational** model, not a multiple-choice reasoning model. It speaks Turkish fluently but scores below even the 20% random-guess baseline on TurkishMMLU's scientific, historical, and literary subjects. Per-subject results: Biology 13%, Chemistry 6%, Geography 15% est., History 11%, Mathematics 12-15% est., Philosophy 15%, Physics 11%, Religious Culture & Ethics 10%, Turkish Language & Literature 12%. Use this model for Turkish chat and customer service, not for structured Q&A. Higher-knowledge Turkish models (Trendyol Asure 12B at 58.89%) are the right anchor for general-knowledge use cases.
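For reference, the context window discussed in the notes above can be set per request through the `num_ctx` option of Ollama's HTTP API. The sketch below is a minimal illustration, not the benchmark runner used for these rows: the model tag `turkish-llama-8b-instruct` is a hypothetical placeholder, and the endpoint assumes a default local Ollama install.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)
MODEL_TAG = "turkish-llama-8b-instruct"  # hypothetical tag; use the name of your local build


def build_request(prompt, num_ctx=8192):
    """Build a non-streaming generate payload with an explicit context window."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,
        # Without num_ctx, Ollama falls back to its default context size.
        "options": {"num_ctx": num_ctx, "temperature": 0},
    }


def generate(prompt, num_ctx=8192):
    """Send the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_request(prompt, num_ctx)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model pulled.
    print(generate("Türkiye'nin başkenti neresidir?"))
```

Setting the option explicitly on every request keeps a benchmark run independent of whatever default the installed Ollama version applies.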
Want to verify? Every row links to its Gist with full stdout and stderr of the run. The runner script is in the public repo (scripts/run-humaneval-plus.ts) — reproducible end-to-end. Browse all coding scores at /benchmarks/coding.
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 4.4 GB | 6 GB |
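As a rough cross-check on the VRAM column, the sketch below adds an fp16 KV cache to the listed file size. The layer count, KV-head count, and head dimension are the published Llama 3 8B configuration; the 0.5 GiB runtime overhead is an assumption rather than a measured value, and the 4.4 GB file size is treated as GiB for simplicity.

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_ctx, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Bytes needed to cache keys and values for n_ctx tokens (fp16 by default)."""
    # Factor 2 covers the separate key and value tensors in every layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_ctx * per_token


def estimate_vram_gib(weights_gib, n_ctx, overhead_gib=0.5):
    """Weights + KV cache + an assumed fixed allowance for runtime buffers."""
    return weights_gib + kv_cache_bytes(n_ctx) / GIB + overhead_gib


if __name__ == "__main__":
    for n_ctx in (2048, 8192):
        print(n_ctx, round(estimate_vram_gib(4.4, n_ctx), 2))
```

Under these assumptions the KV cache costs 128 KiB per token, so the full 8K context adds 1 GiB and the total comes to about 5.9 GiB, consistent with the 6 GB listed above.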
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Frequently asked
What's the minimum VRAM to run Turkish Llama 8B Instruct v0.1?
Can I use Turkish Llama 8B Instruct v0.1 commercially?
What's the context length of Turkish Llama 8B Instruct v0.1?
Source: huggingface.co/ytu-ce-cosmos/Turkish-Llama-8b-Instruct-v0.1
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.