olmo
13B parameters
Commercial OK
Reviewed June 2026

OLMo 2 13B

AI2's fully-open 13B. Apache 2.0; full training data + checkpoints + recipes published. The reproducibility-first model in the 13B class.

License: Apache 2.0 · Released Nov 26, 2024 · Context: 4,096 tokens

Our verdict

Fredoline Eruo · Verified Jun 12, 2026
unrated

Positioning

OLMo 2 13B is AI2's fully-open dense 13B model, released under Apache 2.0 with a 4,096-token context window. Its defining characteristic is radical transparency: the vendor publishes the full training data, all intermediate checkpoints, and the complete training recipe. This makes it the reproducibility-first entry in the 13B weight class — a model designed not just for inference but for understanding how language models are built.

Strengths

  • Fully open. The Apache 2.0 license permits commercial use, modification, and redistribution, subject only to its attribution and notice requirements. The complete training pipeline is public, enabling researchers to audit or replicate results.
  • Reproducibility as a feature. Unlike most open-weight models that only release final weights, AI2 provides the entire training data and recipe. This is invaluable for academic research, ablation studies, and educational use.
  • Dense architecture simplifies deployment. All 13B parameters are active for every token, so inference cost scales predictably. There is no MoE routing overhead or expert-balancing concern, which keeps serving on consumer hardware straightforward.
  • Small context window keeps memory predictable. At 4,096 tokens, the KV cache has a fixed upper bound that is known in advance. At Q4_K_M (roughly 7.3 to 7.8 GB on disk), the model fits on a 12 GB GPU with room for context.
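
To put numbers on the KV cache, here is a minimal sketch of the standard size formula: 2 (keys and values) × layers × KV heads × head dimension × tokens × bytes per element. The architecture values are assumptions for illustration, not figures from this review (40 layers, 40 KV heads with no grouped-query attention, head dimension 128); check them against the model's config.json.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   n_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to hold cached keys and values for n_tokens.

    The leading 2 accounts for one K and one V tensor per layer;
    bytes_per_elem=2 corresponds to an FP16/BF16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem


# ASSUMED OLMo 2 13B shape; verify against config.json before relying on it.
N_LAYERS = 40
N_KV_HEADS = 40  # assumes full multi-head attention (no GQA)
HEAD_DIM = 128   # assumes hidden size 5120 split across 40 heads

full_context = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, n_tokens=4096)
print(f"FP16 KV cache at 4,096 tokens: {full_context / 2**30:.2f} GiB")
```

Under these assumptions a full 4,096-token cache is 3,355,443,200 bytes (about 3.1 GiB), a little over 3 GB on top of the weights. Shorter prompts scale the figure down linearly, and if the real config uses fewer KV heads, the cache shrinks proportionally.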

Limitations

  • 4,096-token context is short by modern standards. Many contemporary models offer 32K, 128K, or longer. This limits use cases like long-document analysis or multi-turn conversations with extensive history.
  • We do not have community-verified benchmark scores for this model. Operators evaluating it should treat any published vendor metrics as best-case and conduct their own testing on representative tasks.
  • No efficiency-oriented architecture. As a dense model, it runs all 13B parameters for every token, so it needs more compute per token than an MoE model of similar total size with fewer active parameters. The trade-off is simpler deployment.
  • Smaller ecosystem than popular families. Fewer community quantizations, fine-tuned variants, and tool integrations exist compared to Llama or Mistral families. Operators may need to build custom tooling.
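
Living with the 4,096-token window mostly comes down to budgeting: prompt tokens plus generated tokens must fit. A minimal sketch of one common approach, dropping the oldest conversation turns first, follows. It assumes you have already counted each turn's tokens with the model's tokenizer (chat-template tokens included); `trim_history` and its parameters are hypothetical names, not part of any library.

```python
from typing import List

CONTEXT_LIMIT = 4096  # OLMo 2 13B context window, per this review


def trim_history(turn_token_counts: List[int], max_new_tokens: int,
                 limit: int = CONTEXT_LIMIT) -> int:
    """Return the index of the oldest turn to keep.

    turn_token_counts is ordered oldest to newest. Turns from the returned
    index onward, plus max_new_tokens of generation, fit within `limit`.
    A return value equal to len(turn_token_counts) means even the newest
    turn is too long and must be truncated some other way.
    """
    budget = limit - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens leaves no room for the prompt")
    used = 0
    # Walk from the newest turn backwards, keeping turns while they fit.
    for i in range(len(turn_token_counts) - 1, -1, -1):
        if used + turn_token_counts[i] > budget:
            return i + 1
        used += turn_token_counts[i]
    return 0  # the whole history fits


# Example: five turns, reserving 512 tokens for the reply.
turns = [1200, 900, 1500, 700, 400]
start = trim_history(turns, max_new_tokens=512)
print(f"keep turns {start}..{len(turns) - 1} ({sum(turns[start:])} tokens)")
```

Keeping the newest turns preserves the most recent context, which usually matters most in chat; summarizing the dropped turns into a short preamble is a common refinement.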

What it takes to run this locally

OLMo 2 13B fits into the consumer deployment class. File sizes range from 26 GB at FP16 (unquantized) down to ~4.2 GB at Q2_K. For a practical balance of quality and memory, Q4_K_M (~7.3 GB) or Q5_K_M (~9.3 GB) is recommended. Add ~30–50% on top of the file size for KV cache and framework memory at the 4,096-token context length. A single 12 GB GPU (e.g., RTX 3060 12 GB) can run Q4_K_M comfortably; Q5_K_M is a better match for a 16–24 GB card such as the RTX 3090 24 GB. CPU inference is possible with sufficient system RAM (16 GB+ for quantized versions).
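
The rule of thumb above is simple enough to apply mechanically. This sketch uses the file sizes quoted in that paragraph and the 30–50% overhead band; the helper names are illustrative, and real usage varies by runtime and by how much of the context you actually fill.

```python
from typing import Tuple

# File sizes quoted in this review, in GB (FP16 is unquantized).
FILE_SIZES_GB = {"FP16": 26.0, "Q5_K_M": 9.3, "Q4_K_M": 7.3, "Q2_K": 4.2}

# Overhead band for KV cache and framework memory at 4,096 tokens.
OVERHEAD_LOW, OVERHEAD_HIGH = 0.30, 0.50


def vram_estimate_gb(file_gb: float) -> Tuple[float, float]:
    """Return a (low, high) VRAM estimate for a given model file size."""
    return file_gb * (1 + OVERHEAD_LOW), file_gb * (1 + OVERHEAD_HIGH)


def fits(file_gb: float, gpu_gb: float) -> bool:
    """Conservative check: the high end of the estimate must fit on the card."""
    return vram_estimate_gb(file_gb)[1] <= gpu_gb


for quant, size in FILE_SIZES_GB.items():
    low, high = vram_estimate_gb(size)
    verdict = "fits" if fits(size, 12) else "does not fit"
    print(f"{quant}: ~{low:.1f}-{high:.1f} GB, {verdict} a 12 GB card")
```

By this estimate Q4_K_M (about 9.5 to 11 GB) fits a 12 GB card, while Q5_K_M (about 12.1 to 14 GB) needs a 16 GB or larger card.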

Should you run this locally?

Yes if you prioritize reproducibility, need a permissive Apache 2.0 license for commercial deployment, or want to study the full training pipeline. The dense architecture and small context make it easy to get started on consumer hardware.

No if you need long-context capabilities (beyond 4K tokens), require a model with extensive community tooling and fine-tuned variants, or are looking for the highest raw performance per parameter in the 13B class — without benchmark data, that comparison cannot be made.

Catalog cross-links

  • OLMo 1B: smaller sibling in the same fully-open family.
  • Llama 3.2 3B: an open-weight model under Meta's Llama 3.2 Community License rather than Apache 2.0, with a different transparency philosophy (weights released, training data not).
  • Consumer GPU guide: hardware recommendations for running 13B-class models.

Strengths

  • Fully-open training data
  • Apache 2.0
  • AI2 research backing

Weaknesses

  • Reportedly trails Qwen 2.5 / Llama 3.x on benchmarks (not community-verified)

Quantization variants

Each quantization level trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M | 7.8 GB | 10 GB

Get the model

HuggingFace

Original weights

huggingface.co/allenai/OLMo-2-1124-13B-Instruct

Source repository with the original, unquantized weights; you will need to quantize them yourself.
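
Because the repository ships unquantized weights, getting a Q4_K_M file means converting and quantizing it yourself. One common route is llama.cpp. The sketch below only builds and prints the commands; it assumes a llama.cpp checkout (for `convert_hf_to_gguf.py`), its `llama-quantize` binary on PATH, the `huggingface-cli` tool from `huggingface_hub`, and a llama.cpp version that supports OLMo 2. Tool names and flags change between releases, so check them against your installed versions.

```python
import shlex
from pathlib import Path
from typing import List

REPO_ID = "allenai/OLMo-2-1124-13B-Instruct"


def quantize_commands(model_dir: Path, quant: str = "Q4_K_M") -> List[List[str]]:
    """Build (but do not run) download, convert, and quantize commands."""
    f16_path = model_dir.with_name(f"{model_dir.name}-f16.gguf")
    out_path = model_dir.with_name(f"{model_dir.name}-{quant}.gguf")
    return [
        # Fetch the original weights from Hugging Face.
        ["huggingface-cli", "download", REPO_ID, "--local-dir", str(model_dir)],
        # Convert the Hugging Face checkpoint to an FP16 GGUF file.
        ["python", "convert_hf_to_gguf.py", str(model_dir),
         "--outfile", str(f16_path), "--outtype", "f16"],
        # Quantize the FP16 GGUF down to the requested type.
        ["llama-quantize", str(f16_path), str(out_path), quant],
    ]


for cmd in quantize_commands(Path("OLMo-2-1124-13B-Instruct")):
    print(shlex.join(cmd))
```

Passing `quant="Q5_K_M"` targets the higher-quality variant instead. The intermediate FP16 file alone needs about 26 GB of free disk space.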

Frequently asked

What's the minimum VRAM to run OLMo 2 13B?

10 GB of VRAM is enough to run OLMo 2 13B at the Q4_K_M quantization (file size 7.8 GB). Higher-quality quantizations need more.

Can I use OLMo 2 13B commercially?

Yes. OLMo 2 13B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of OLMo 2 13B?

OLMo 2 13B supports a context window of 4,096 tokens (about 4K).

Source: huggingface.co/allenai/OLMo-2-1124-13B-Instruct

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
