mGPT 13B
mGPT-13B is a 13B-parameter GPT-3-style model pretrained on 600 GB of deduplicated text spanning 61 languages across 25 language families, sourced from mC4 and Wikipedia. It is a base model — no instruction tuning, no RLHF. MIT-licensed and commercially usable.
If you need a commercially clean base model with real Russian and broader post-Soviet language coverage, mGPT-13B is one of the few honest options at this size. Do not deploy it raw expecting chat or instruction-following behavior; it will disappoint. The 2048-token context is a genuine operational constraint worth planning around. The model is worth the VRAM only if you intend to fine-tune it or have a clear completion-style use case.
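Because mGPT-13B is a base model, any task behavior has to come from the prompt itself. One common pattern is a few-shot template that the model simply continues. The sketch below is a minimal, hypothetical helper: `build_few_shot_prompt` and `cut_at_stop` are illustrative names, not part of any mGPT tooling. It only builds and trims strings, so the actual generation call, whichever runtime you use, is left out. Using a blank line as the stop sequence is an assumption tied to how the examples are formatted here.

```python
def build_few_shot_prompt(examples, query, input_label="Text", output_label="Translation"):
    """Format (input, output) pairs as a pattern a base model can continue."""
    blocks = [f"{input_label}: {src}\n{output_label}: {tgt}" for src, tgt in examples]
    # Leave the final output slot open so the model's continuation fills it.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    # Blank lines separate examples; the same separator doubles as a stop sequence.
    return "\n\n".join(blocks)


def cut_at_stop(completion, stop="\n\n"):
    """Keep only the text before the first stop sequence, trimmed of whitespace."""
    idx = completion.find(stop)
    if idx != -1:
        completion = completion[:idx]
    return completion.strip()


# Hypothetical Russian-to-English examples, for illustration only.
examples = [("Привет", "Hello"), ("Спасибо", "Thank you")]
prompt = build_few_shot_prompt(examples, "До свидания")
print(prompt)
```

You would pass `prompt` to the model and then run `cut_at_stop` on the raw output. Otherwise a base model tends to keep going and invent further example pairs after the first answer.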
Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.15/10. The license is explicitly MIT on the HF card and correctly flagged as commercial-OK. Parameter count, vendor, family (GPT-2/GPT-3-style), and multilingual scope all check out against the card. A 2048-token context matches GPT-3 (GPT-2 used 1024) and is a reasonable default, though the excerpt does not state it explicitly; this is a minor hedge but defensible. The description is honest, concrete, and operator-voiced, and the weaknesses correctly flag the tight context, lack of instruction tuning, thin community, and unverified low-resource quality. The best use case is sharp (a Russian/Turkic/Slavic base for fine-tuning) rather than generic. Brand fit is solid but slightly narrow: this is a fine-tuning substrate, not something a typical local-AI operator runs raw, which the verdict honestly acknowledges.
Flags:
- contextLength 2048 not explicitly confirmed in the README excerpt; inferred from GPT-2/3 architecture lineage and should be verified against config.json
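You can check this flag mechanically once `config.json` has been downloaded from the HF repository. The sketch assumes the file follows Hugging Face's GPT-2-style conventions, where the maximum sequence length is usually stored as `n_positions` (older files also carry `n_ctx`). It also looks for `max_position_embeddings` because many other architectures use that key. The sample string is a made-up fragment for testing, not the real mGPT-13B config.

```python
import json

# Keys commonly used for maximum sequence length; which one mGPT-13B's
# config.json actually uses is an assumption to verify.
CONTEXT_KEYS = ("n_positions", "n_ctx", "max_position_embeddings")


def context_length_from_config(config_text):
    """Return (key, value) for the first context-length key found, or None."""
    config = json.loads(config_text)
    for key in CONTEXT_KEYS:
        if key in config:
            return key, config[key]
    return None


# Hypothetical config fragment for illustration only; not copied from the repo.
sample = '{"model_type": "gpt2", "n_positions": 2048}'
print(context_length_from_config(sample))
```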
Strengths
- Genuine multilingual coverage: 61 languages, 25 families, including Slavic, Turkic, and Dravidian groups
- Trained on 600 GB of deduplicated data — not a small or hastily assembled corpus
- MIT license: no commercial restrictions
- One of the few open 13B base models with serious Russian-language pretraining
Weaknesses
- 2048-token context window is tight by current standards — expect hard cutoffs on longer documents
- No instruction tuning: raw completions only, prompt engineering required for any task-shaped output
- Only 1,624 HF downloads suggest thin community support; debugging is largely on you
- English and high-resource languages likely dominate the corpus; low-resource language quality is unverified beyond perplexity numbers
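One way to plan around the 2048-token window is to split long inputs into overlapping windows and reserve room for generated tokens. Here is a minimal sketch. It assumes you already have token IDs from the model's own tokenizer, and plain integers stand in for them here. The `reserve` and `overlap` defaults are arbitrary illustrative values, and 2048 is this page's unconfirmed context figure.

```python
def chunk_token_ids(token_ids, context_len=2048, reserve=256, overlap=128):
    """Split a token sequence into windows that fit the context window.

    Each window holds at most context_len - reserve tokens, leaving `reserve`
    tokens free for generated output. Consecutive windows share `overlap`
    tokens so text cut at a boundary keeps some preceding context.
    """
    window = context_len - reserve
    if window <= 0:
        raise ValueError("reserve must be smaller than context_len")
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    step = window - overlap
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
        start += step
    return chunks


ids = list(range(5000))  # stand-in for real token IDs
chunks = chunk_token_ids(ids)
print([len(c) for c in chunks])
```

With the defaults, a 5,000-token input becomes three windows of at most 1,792 tokens each. Overlap costs extra compute, but it reduces the chance that a sentence split at a boundary loses all of its context.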
Quantization variants
Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 7.2 GB | 10 GB |
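The table can be turned into a quick fit check. This sketch copies the single row above into a dictionary and takes the GB figures as given, without converting between GB and GiB. It also assumes that the gap between VRAM required and file size covers the KV cache and runtime buffers. Both the helper names and that interpretation are illustrative.

```python
# Figures copied from the quantization table; VRAM values are the page's
# estimates, not measurements.
QUANTS = {
    "Q4_K_M": {"file_gb": 7.2, "vram_gb": 10.0},
}


def quants_that_fit(card_vram_gb, quants=QUANTS):
    """Return quantization names whose listed VRAM requirement fits the card."""
    return [name for name, spec in quants.items() if spec["vram_gb"] <= card_vram_gb]


def headroom_gb(name, quants=QUANTS):
    """VRAM budgeted beyond the weights file (assumed: KV cache and runtime buffers)."""
    spec = quants[name]
    return round(spec["vram_gb"] - spec["file_gb"], 2)


print(quants_that_fit(12))  # a hypothetical 12 GB card
print(headroom_gb("Q4_K_M"))
```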
Get the model
HuggingFace: original weights (source repository). No pre-quantized files are provided, so you will need to quantize the weights yourself.
Frequently asked
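What's the minimum VRAM to run mGPT 13B?
About 10 GB, for Q4_K_M, the only quantization listed on this page.
Can I use mGPT 13B commercially?
Yes. It is released under the MIT license, which places no restrictions on commercial use.
What's the context length of mGPT 13B?
2048 tokens. This figure is inferred from the GPT-2/3 architecture lineage rather than stated in the README, so confirm it against config.json.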
Source: huggingface.co/ai-forever/mGPT-13B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.