
13B parameters

Commercial OK

Reviewed May 2026

mGPT 13B

mGPT-13B is a 13B-parameter GPT-3-style model pretrained on 600 GB of deduplicated text spanning 61 languages across 25 language families, sourced from mC4 and Wikipedia. It is a base model — no instruction tuning, no RLHF. MIT-licensed and commercially usable.

License: MIT · Context: 2,048 tokens

Our verdict

Fredoline Eruo | Verified May 28, 2026

9.2/10

If you need a commercially clean base model with real Russian and broader post-Soviet language coverage, mGPT-13B is one of the few honest options at this size. Do not deploy it raw expecting chat or instruction-following behavior; it will disappoint. The 2,048-token context is a genuine operational constraint worth planning around. It is worth the VRAM only if you intend to fine-tune or have a clear completion-style use case.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.15/10. License is explicitly MIT on the HF card and correctly flagged commercial-ok. Parameter count, vendor, family (gpt2/gpt3-style), and multilingual scope all check out against the card. Context length of 2048 is standard for this GPT-2/3 architecture lineage and is a reasonable default though not explicitly stated in the excerpt — a minor hedge but defensible. The description is honest, concrete, and operator-voiced; weaknesses correctly flag the tight context, lack of instruction tuning, thin community, and unverified low-resource quality. Best use case is sharp (Russian/Turkic/Slavic base for fine-tuning) rather than generic. Brand fit is solid but slightly narrow — this is a fine-tuning substrate, not something a typical local-AI operator runs raw, which the verdict honestly acknowledges.

Flags:
- contextLength 2048 is not explicitly confirmed in the README excerpt. It is inferred from the GPT-2/3 architecture lineage and should be verified against config.json.
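The config.json check flagged above can be sketched in a few lines. This is a minimal sketch, not the reviewers' actual procedure: the helper name `context_length_from_config` is hypothetical, the sample config string is illustrative rather than copied from the repository, and which key mGPT-13B actually uses is an assumption (GPT-2-style configs typically use `n_positions`, many other architectures use `max_position_embeddings`).

```python
import json
from typing import Optional

# Keys that commonly hold the maximum sequence length in Hugging Face configs.
# Which of these mGPT-13B uses is an assumption; check the real file.
CONTEXT_KEYS = ("n_positions", "max_position_embeddings", "n_ctx")


def context_length_from_config(config_text: str) -> Optional[int]:
    """Return the first context-length field found in a config.json string."""
    config = json.loads(config_text)
    for key in CONTEXT_KEYS:
        value = config.get(key)
        if isinstance(value, int):
            return value
    return None


# Illustrative config fragment, NOT the contents of the real repository file.
sample_config = '{"model_type": "gpt2", "n_positions": 2048}'
listed_context = 2048  # value shown on this page

found = context_length_from_config(sample_config)
if found == listed_context:
    print("listed context length matches config")
else:
    print(f"mismatch: config reports {found}")
```

To run the check for real, replace `sample_config` with the text of the config.json downloaded from the source repository.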


Strengths

Genuine multilingual coverage: 61 languages, 25 families, including Slavic, Turkic, and Dravidian groups
Trained on 600 GB of deduplicated data — not a small or hastily assembled corpus
MIT license: no commercial restrictions
One of the few open 13B base models with serious Russian-language pretraining

Weaknesses

2048-token context window is tight by current standards — expect hard cutoffs on longer documents
No instruction tuning: raw completions only, prompt engineering required for any task-shaped output
1,624 HF downloads suggest thin community support, so debugging is largely on you
English and high-resource languages likely dominate the corpus; low-resource language quality is unverified beyond perplexity numbers
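Because the model only continues text, task-shaped output usually comes from a few-shot prompt that ends where the answer should begin. The sketch below shows one common way to lay that out. The function name `build_few_shot_prompt`, the labels, and the example pairs are all illustrative assumptions, and nothing here says how well mGPT-13B will complete such a prompt.

```python
def build_few_shot_prompt(examples, query, input_label="Russian", output_label="English"):
    """Format labelled pairs as a completion-style prompt for a base model."""
    blocks = [
        f"{input_label}: {source}\n{output_label}: {target}"
        for source, target in examples
    ]
    # Leave the final output slot empty so the model continues from it.
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


# Hypothetical demonstration pairs for a translation-shaped task.
examples = [
    ("Доброе утро", "Good morning"),
    ("Спасибо", "Thank you"),
]
prompt = build_few_shot_prompt(examples, "Добрый вечер")
print(prompt)
```

Keep the demonstrations short: every example counts against the 2,048-token window, and the completion needs room too.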

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	7.2 GB	10 GB
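As a rough sanity check on the table, the file size and parameter count imply an average number of bits stored per weight. The arithmetic below is a sketch under two assumptions: that "13B" means exactly 13 billion parameters, and that 1 GB means 10^9 bytes. Neither is confirmed by the listing.

```python
PARAMS = 13e9        # parameter count from the listing (assumed exact)
FILE_SIZE_GB = 7.2   # Q4_K_M file size from the table
VRAM_GB = 10.0       # VRAM requirement from the table

# Assumption: 1 GB = 10**9 bytes. A binary (GiB) reading would shift the result.
bits_per_weight = FILE_SIZE_GB * 1e9 * 8 / PARAMS

# Whatever VRAM is left after the weights has to cover the KV cache,
# activations, and runtime overhead.
headroom_gb = VRAM_GB - FILE_SIZE_GB

print(f"{bits_per_weight:.2f} bits per weight")
print(f"{headroom_gb:.1f} GB of VRAM beyond the weights")
```

Under those assumptions the table works out to roughly 4.4 bits per weight, with about 2.8 GB of VRAM left over beyond the weights themselves.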

Get the model

HuggingFace

Original weights

huggingface.co/ai-forever/mGPT-13B

Source repository. No pre-quantized files are linked here, so you will need to quantize the original weights yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of mGPT 13B.

NVIDIA GB200 NVL72

13824GB · nvidia

AMD Instinct MI350X

NVIDIA B300 (Blackwell Ultra)

288GB · nvidia

AMD Instinct MI355X

AMD Instinct MI325X

AMD Instinct MI300X

192GB · nvidia

NVIDIA H100 NVL

188GB · nvidia

Frequently asked

What's the minimum VRAM to run mGPT 13B?

10 GB of VRAM is enough to run mGPT 13B at the Q4_K_M quantization (file size 7.2 GB). Higher-quality quantizations need more.

Can I use mGPT 13B commercially?

Yes. mGPT 13B ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of mGPT 13B?

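mGPT 13B is listed with a context window of 2,048 tokens. That figure is inferred from the model's GPT-2/3 architecture lineage, so confirm it against the repository's config.json before relying on it.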
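For inputs longer than the window, one common workaround is to split the tokenized text into overlapping windows and process each one separately. The sketch below works on a plain list of token IDs. The function name `split_into_windows` and the 128-token overlap are illustrative choices, not settings recommended by the model authors.

```python
def split_into_windows(token_ids, window=2048, overlap=128):
    """Split a token-id list into windows of at most `window` tokens.

    Consecutive windows share `overlap` tokens so that context is not
    cut cleanly at every boundary.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must be non-negative and smaller than window")
    step = window - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break  # this window already reaches the end of the input
    return windows


# Stand-in for real tokenizer output: 5,000 dummy token ids.
token_ids = list(range(5000))
windows = split_into_windows(token_ids)
print(len(windows), [len(w) for w in windows])
```

If the model also has to generate text, pass a smaller `window` so the prompt plus the completion still fits within the limit.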

Source: huggingface.co/ai-forever/mGPT-13B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
