OLMo 2 32B

The fully open 32B member of the OLMo 2 family. AI2 publishes the full training data, code, and weights, which makes it one of the most reproducible models at this size.

License: Apache 2.0 · Released Apr 12, 2026 · Context: 32,768 tokens

Strengths

Fully open (data + code + weights)
Apache 2.0
Reproducible

Weaknesses

Trails closed-data peers on some benchmarks

Quantization variants

Each quantization level gives up some model quality in exchange for a smaller file and a lower VRAM requirement. Q4_K_M is a common starting point.

Quantization	File size	VRAM required
Q4_K_M	19.0 GB	24 GB
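To see roughly where a file size like 19.0 GB comes from, you can multiply the parameter count by the average bits stored per weight. The sketch below is a back-of-the-envelope estimate, not a measurement. It assumes about 32 billion parameters and approximate average bits-per-weight figures for each llama.cpp quantization type (Q4_K_M mixes block types, so its average is only approximate). The Q8_0 and F16 rows are illustrative estimates and are not part of the table above.

```python
# Rough weight-file size estimate: parameters x bits per weight / 8.
# ASSUMPTIONS: the parameter count and the bits-per-weight averages below are
# approximations, not values published for this model.

def estimate_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate file size in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 1e9

ASSUMED_PARAMS = 32e9  # approximate parameter count for a "32B" model
ASSUMED_BPW = {
    "Q4_K_M": 4.8,  # approximate average; mixes 4-bit and 6-bit blocks plus scales
    "Q8_0": 8.5,    # 32 int8 weights + one fp16 scale per block = 34 bytes / 32 weights
    "F16": 16.0,    # unquantized half precision
}

if __name__ == "__main__":
    for name, bpw in ASSUMED_BPW.items():
        print(f"{name}: ~{estimate_file_size_gb(ASSUMED_PARAMS, bpw):.1f} GB")
```

The estimate covers only the weights. The VRAM column is higher because the runtime also needs memory for the KV cache and working buffers.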

Get the model

Hugging Face

Original weights

huggingface.co/allenai/OLMo-2-32B-Instruct

Source repository with the original, unquantized weights. You need to quantize them yourself before running a build such as Q4_K_M locally.
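One way to handle the "quantize it yourself" step is the llama.cpp toolchain: convert the Hugging Face checkpoint to an F16 GGUF file, then quantize that file to Q4_K_M. The sketch below is a minimal, hedged outline, not an official recipe. It assumes `huggingface_hub` is installed, that llama.cpp is cloned and built with CMake (so `llama-quantize` sits under `build/bin/`), and that llama.cpp's converter supports this checkpoint. The repository ID is copied from this page; confirm it on Hugging Face before downloading. The helper names `download_weights`, `build_quantize_commands`, and `run` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path

# Copied from this page; ASSUMPTION: verify the exact repo ID on Hugging Face.
REPO_ID = "allenai/OLMo-2-32B-Instruct"

def download_weights(local_dir: str) -> str:
    """Download the original weights (requires `pip install huggingface_hub`)."""
    from huggingface_hub import snapshot_download  # lazy import: only needed here
    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)

def build_quantize_commands(model_dir: str, llama_cpp_dir: str,
                            quant: str = "Q4_K_M") -> list[list[str]]:
    """Return the two llama.cpp steps: HF checkpoint -> F16 GGUF -> quantized GGUF."""
    model = Path(model_dir)
    llama = Path(llama_cpp_dir)
    f16_path = model / "model-f16.gguf"
    out_path = model / f"model-{quant}.gguf"
    convert = [sys.executable, str(llama / "convert_hf_to_gguf.py"), str(model),
               "--outfile", str(f16_path), "--outtype", "f16"]
    # ASSUMPTION: binary location for a default CMake build of llama.cpp.
    quantize = [str(llama / "build" / "bin" / "llama-quantize"),
                str(f16_path), str(out_path), quant]
    return [convert, quantize]

def run(commands: list[list[str]]) -> None:
    """Execute each step in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    weights_dir = download_weights("OLMo-2-32B-Instruct")
    run(build_quantize_commands(weights_dir, "llama.cpp"))
```

Building the command lists separately from running them makes it easy to print and check the steps first. Plan for the disk space of the intermediate F16 file, which at two bytes per weight is roughly 64 GB for a 32B model.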

Hardware that runs this

Any card with at least 24 GB of VRAM can run OLMo 2 32B at Q4_K_M, the only quantization listed above.
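The check is a simple comparison of the card's memory against each VRAM figure in the quantization table. The sketch below encodes that table (currently only Q4_K_M at 24 GB) and returns the quantizations that fit. The optional `detected_vram_gb` helper assumes PyTorch with CUDA is installed. Both function names are hypothetical.

```python
# VRAM requirements taken from the quantization table on this page, in GB.
# Only Q4_K_M is listed; add rows only for figures you have verified.
VRAM_REQUIRED_GB = {"Q4_K_M": 24}

def quants_that_fit(card_vram_gb: float,
                    table: dict[str, float] = VRAM_REQUIRED_GB) -> list[str]:
    """Return the quantizations whose VRAM requirement fits, most demanding first."""
    fitting = [name for name, need in table.items() if need <= card_vram_gb]
    return sorted(fitting, key=lambda name: table[name], reverse=True)

def detected_vram_gb(device: int = 0) -> float:
    """Total memory of a CUDA device in decimal GB (ASSUMPTION: PyTorch is installed)."""
    import torch  # lazy import so the table check works without PyTorch
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.get_device_properties(device).total_memory / 1e9

if __name__ == "__main__":
    print(quants_that_fit(detected_vram_gb()))
```

Sorting by requirement puts the most demanding quantization that still fits first, which is usually the highest-quality option for that card.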

Frequently asked

What's the minimum VRAM to run OLMo 2 32B?

24 GB of VRAM is enough to run OLMo 2 32B at the Q4_K_M quantization (file size 19.0 GB). Higher-quality quantizations need more.
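The gap between the 19.0 GB file and the 24 GB requirement goes mostly to the KV cache and runtime buffers. The KV cache stores one key tensor and one value tensor per layer for every token in the context. The sketch below implements that standard formula. The architecture numbers in the example are placeholders, not confirmed values for this model. Read `num_hidden_layers`, `num_key_value_heads`, and `hidden_size / num_attention_heads` from the model's `config.json` instead.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence.

    Factor 2 = one key tensor plus one value tensor per layer.
    bytes_per_elem = 2 assumes an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_elem

if __name__ == "__main__":
    # PLACEHOLDER architecture values for illustration only; check config.json.
    size = kv_cache_bytes(n_layers=64, n_kv_heads=8, head_dim=128, context_tokens=4096)
    print(f"KV cache: {size / 2**30:.2f} GiB")
```

The cache grows linearly with `context_tokens`, so long prompts need more headroom on top of the weights than short ones.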

Can I use OLMo 2 32B commercially?

Yes. OLMo 2 32B ships under the Apache 2.0 license, which permits commercial use. Read the full license text before deployment.

What's the context length of OLMo 2 32B?

OLMo 2 32B supports a context window of 32,768 tokens (32K).

Source: huggingface.co/allenai/OLMo-2-32B-Instruct

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.