Kanarya 2B
Kanarya 2B is a Turkish language model trained from scratch by Ali Safaya, a researcher at Koç University. It is named after the kanarya (Turkish for 'canary') and was trained on more than 250 GB of Turkish text, including Wikipedia, news, and books.
Strengths
- Trained from scratch on Turkish rather than fine-tuned from another model; the tokenizer is purpose-built for Turkish morphology
- Apache-2.0 license; permits both academic and commercial use, subject to the license's attribution and notice terms
- Backed by a published paper and reproducible training recipe
Weaknesses
- Base model without instruction tuning; it needs completion-style prompts, not chat-style ones
- 2K-token context window, which is very short and suits short-form generation only
- Older release; lacks modern post-training (no RLHF/DPO)
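Because this is a base model with a short context window, a prompt usually works best as text for the model to continue, and the prompt length limits how much can be generated. The sketch below illustrates both ideas. It assumes "2K" means 2,048 tokens; the function names and the "Soru"/"Cevap" (question/answer) labels are hypothetical choices for illustration, not part of the model's documentation.

```python
def build_completion_prompt(examples, query, input_label="Soru", output_label="Cevap"):
    """Format few-shot examples as plain text that a base model can continue."""
    blocks = [f"{input_label}: {q}\n{output_label}: {a}" for q, a in examples]
    # Leave the last answer slot open so the model continues from this point.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


def max_new_tokens(prompt_tokens, context_window=2048):
    """Tokens left for generation once the prompt is counted against the window.

    context_window=2048 is an assumption based on the "2K" figure above.
    """
    return max(context_window - prompt_tokens, 0)


prompt = build_completion_prompt(
    [("Türkiye'nin başkenti neresidir?", "Ankara")],
    "Fransa'nın başkenti neresidir?",
)
print(prompt)
print(max_new_tokens(prompt_tokens=1800))
```

The prompt string can then be passed to whatever text-generation runtime you use. The token count of the prompt has to come from the model's own tokenizer, since a character count is not a reliable substitute.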
Quantization variants
Each quantization level trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 1.1 GB | 2 GB |
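As a rough sanity check on figures like these, file size can be approximated as parameter count times average bits per weight. The sketch below is a back-of-the-envelope estimate only. The exact parameter count, the 4.5 bits-per-weight average, and the runtime overhead value are all illustrative assumptions, not numbers taken from the model card.

```python
def estimate_file_size_gb(n_params, bits_per_weight):
    """Approximate weight file size in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def estimate_vram_gb(file_size_gb, overhead_gb):
    """Weights plus a flat allowance for KV cache and runtime buffers."""
    return file_size_gb + overhead_gb


# Assumed inputs: exactly 2 billion parameters, about 4.5 bits per weight
# on average for a 4-bit quantization, and 0.9 GB of runtime overhead.
size_gb = estimate_file_size_gb(2e9, 4.5)
print(round(size_gb, 3))
print(round(estimate_vram_gb(size_gb, overhead_gb=0.9), 3))
```

Real files differ somewhat because quantization formats mix precisions across tensors and add metadata, so treat the result as an order-of-magnitude check, not a replacement for the listed sizes.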
Get the model
Hugging Face: original weights
The source repository holds the unquantized weights, so you need to quantize them yourself.
Frequently asked
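What's the minimum VRAM to run Kanarya 2B?
About 2 GB, using the Q4_K_M quantization (1.1 GB file).
Can I use Kanarya 2B commercially?
Yes. The model is released under the Apache-2.0 license, which permits commercial use.
What's the context length of Kanarya 2B?
2K tokens.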
Source: huggingface.co/asafaya/kanarya-2b
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.