Kanarya 750M
Smaller Kanarya variant — 750M parameters. Runs on CPU or 4GB GPU comfortably. Useful for low-resource Turkish text classification, embeddings, or completion tasks where latency matters more than quality.
Overview
Smaller Kanarya variant — 750M parameters. Runs on CPU or 4GB GPU comfortably. Useful for low-resource Turkish text classification, embeddings, or completion tasks where latency matters more than quality.
Strengths
- Runs on CPU at usable speeds for short completions
- Same Turkish-native tokenizer as Kanarya 2B
- Fits in 2GB VRAM — phone-class hardware viable
Weaknesses
- Quality cap is materially below 2B sibling; only pick this if you need the size
- 2K context limit
- Base completion model only — no chat template
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 0.4 GB | 1 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Kanarya 750M.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Kanarya 750M?
Can I use Kanarya 750M commercially?
What's the context length of Kanarya 750M?
Source: huggingface.co/asafaya/kanarya-750m
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Related — keep moving
Verify Kanarya 750M runs on your specific hardware before committing money.