mistral
2.4B parameters
Commercial OK
Reviewed May 2026

Kumru 2B

Kumru 2B is a compact Turkish text-generation model from VNGRS. The Hugging Face config reports a Mistral-family architecture with an 8K context window, and the public Ollama build makes it a practical edge-speed Turkish model.

License: Apache-2.0·Context: 8,192 tokens

Overview

Kumru 2B is a compact Turkish text-generation model from VNGRS. The Hugging Face config reports a Mistral-family architecture with an 8K context window, and the public Ollama build makes it a practical edge-speed Turkish model.

Strengths

  • Very high local throughput on consumer GPUs
  • Apache-2.0 license with commercial use allowed
  • Small enough for low-VRAM Turkish assistants

Weaknesses

  • 2B-class quality ceiling on complex reasoning
  • Shorter context than current 32K Turkish Mistral derivatives
  • Best suited to short Turkish interactions, not deep research

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M1.5 GB2 GB

Get the model

Ollama

One-line install

ollama run alibayram/kumru:latestRead our Ollama review →

HuggingFace

Original weights

huggingface.co/vngrs-ai/Kumru-2B

Source repository — direct quantization required.

Benchmarks

Real measurements on real hardware. Numbers ship with the runner version, quant, and date.

2 runs on record
HardwareProvenanceQuantCtxTokens / secVRAMTTFTDate
NVIDIA GeForce RTX 5080
EditorialM
Q4_K_M2K
443.7tok/s
May 28, 26
NVIDIA GeForce RTX 3080 16GB (Mobile)
EditorialM
Q4_K_M4K
174.2tok/s
129 msJun 2, 26

What to do next

Got this model running on real hardware? Share what you measured — the form arrives with the model pre-selected.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Kumru 2B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Kumru 2B?

2GB of VRAM is enough to run Kumru 2B at the Q4_K_M quantization (file size 1.5 GB). Higher-quality quantizations need more.

Can I use Kumru 2B commercially?

Yes — Kumru 2B ships under the Apache-2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of Kumru 2B?

Kumru 2B supports a context window of 8,192 tokens (about 8K).

How do I install Kumru 2B with Ollama?

Run `ollama pull alibayram/kumru:latest` to download, then `ollama run alibayram/kumru:latest` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/vngrs-ai/Kumru-2B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify Kumru 2B runs on your specific hardware before committing money.