Molmo 72B
Molmo flagship. Apache 2.0 VLM rivaling proprietary models on UI pointing and visual reasoning.
Overview
Molmo 72B is Ai2's flagship vision-language model, released under Apache 2.0, that rivals proprietary models on UI pointing and visual reasoning.
How to run it
Molmo 72B is Ai2's vision-language model: a 72B dense backbone paired with a custom vision encoder, designed for strong visual understanding with a focus on pointing and grounding (it can reference specific image regions by coordinate).

Run it at Q4_K_M via llama.cpp with its multimodal (llava-style) serving path. Expect a ~41 GB file for the text weights at Q4_K_M, plus 3-5 GB for the vision encoder. Minimum VRAM is 48 GB (RTX A6000 at Q3_K_M with vision); the recommended setup is an A100 80GB at AWQ-INT4. Throughput is roughly 12-20 tok/s on an A6000 at Q4_K_M for text-only generation, with vision adding 1-3 s of image-encoding latency per request.

Molmo's signature feature is pixel-precise pointing: it can identify regions in images by coordinate, which is useful for UI automation, visual QA with grounding, and robotics. The license is permissive (Apache 2.0), but ecosystem support is narrower than for Llama or Qwen vision models, so verify llama.cpp's Molmo support before committing. Ollama may not ship Molmo; fall back to raw llama.cpp. For serving, use vLLM only if Molmo is registered as a supported architecture in your vLLM version.
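As a sketch of the llama.cpp path described above, assuming a Molmo GGUF and its matching multimodal projector have already been obtained (the file names here are hypothetical, and Molmo support in llama.cpp must be verified first), a multimodal server launch looks like:

```shell
# Hypothetical file names; confirm llama.cpp actually supports the Molmo
# architecture before provisioning hardware around this command.
./llama-server \
  -m models/molmo-72b-q4_k_m.gguf \
  --mmproj models/molmo-72b-mmproj-f16.gguf \
  -c 8192 \
  -ngl 99 \
  --port 8080
```

The `-ngl 99` flag offloads all layers to the GPU; reduce it to spill layers to system RAM on cards below the 48 GB minimum, at a significant throughput cost.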
Hardware guidance
Minimum: RTX A6000 48GB at Q3_K_M plus vision (4K context). Recommended: A100 80GB at AWQ-INT4.

VRAM math: 72B dense weights at Q4_K_M ≈ 41 GB; Molmo's vision encoder adds 3-5 GB; KV cache at 8K context adds ~10 GB, for a total of ~54-56 GB.

- RTX A6000 48GB: Q3_K_M (~31 GB) plus vision at 4K context.
- A100 80GB: comfortable for Q4_K_M plus vision at 8K context; AWQ-INT4 enables 16K+ context.
- Dual RTX 4090: row-split the text weights, with vision VRAM split across cards.
- Mac Studio M4 Ultra 128GB: Q4_K_M plus vision at 2-5 tok/s (Molmo support on Apple Silicon is uncertain).
- Cloud: A100 at $5-10/hr.
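The VRAM arithmetic above can be written as a quick estimator. The ~0.57 GB-per-billion-parameters factor for Q4_K_M (i.e. 41 GB / 72B) and the vision and KV-cache figures are rough assumptions taken from this page, not measured values:

```python
def estimate_vram_gb(params_b: float,
                     gb_per_b_params: float = 0.57,  # ~Q4_K_M density
                     vision_gb: float = 4.0,         # mid of the 3-5 GB range
                     kv_cache_gb: float = 10.0) -> float:
    """Rough VRAM estimate: quantized weights + vision encoder + KV cache."""
    return params_b * gb_per_b_params + vision_gb + kv_cache_gb

# 72B at Q4_K_M: ~41 GB weights + 4 GB vision + 10 GB KV at 8K context
print(round(estimate_vram_gb(72), 1))  # → 55.0
```

The result lands inside the ~54-56 GB range quoted above, which is why 48 GB cards are forced down to Q3_K_M and shorter context.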
What breaks first
1. Molmo GGUF availability. Pre-converted Molmo GGUFs are rare; you may need to convert from the Hugging Face weights using Ai2's conversion script. Verify GGUF or AWQ availability before provisioning hardware.
2. Pointing/grounding in local inference. Molmo's coordinate outputs rely on specific output-formatting tokens, and llama.cpp may not parse them correctly. Verify that coordinate outputs are well-formed before trusting results.
3. Vision encoder compatibility. Molmo uses a custom vision encoder (not CLIP, not InternViT), so llama.cpp's standard llava implementation may not support it without model-specific patches.
4. Apache 2.0, but verify. While Molmo is Apache 2.0 licensed, the vision encoder or training data may carry additional restrictions. Check the full license on huggingface.co/allenai/Molmo-72B.
Runtime recommendation
Common beginner mistakes
- Mistake: expecting Molmo to work with standard Ollama vision commands. Fix: Molmo requires custom model registration in llama.cpp; test with raw llama.cpp and verify the multimodal GGUF.
- Mistake: ignoring the pointing/grounding output format. Fix: Molmo outputs coordinates in a specific format; parse them explicitly rather than treating them as regular text.
- Mistake: using a Llama 3.2 Vision mmproj with Molmo. Fix: vision projectors are architecture-specific; download or convert the Molmo-specific projector.
- Mistake: assuming Molmo's text quality matches Qwen 3 72B. Fix: Molmo is optimized for vision grounding, so general text quality may trail same-sized general-purpose models; test text-only tasks before deploying.
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent/child edges record direct distillation or fine-tune relationships.
Strengths
- Apache 2.0
- Frontier UI grounding
Weaknesses
- 48GB+ VRAM tier
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 41.0 GB | 48 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Molmo 72B.
Frequently asked
What's the minimum VRAM to run Molmo 72B?
48 GB, e.g. an RTX A6000 running Q3_K_M with the vision encoder at 4K context. For Q4_K_M plus vision at 8K context, plan for ~54-56 GB; an A100 80GB is the recommended card.
Can I use Molmo 72B commercially?
Yes. Molmo 72B is Apache 2.0 licensed, which permits commercial use, though you should check the full license on the Hugging Face repository in case the vision encoder or training data carry additional restrictions.
What's the context length of Molmo 72B?
Does Molmo 72B support images?
Yes. Molmo 72B is a vision-language model with a custom vision encoder, and its standout capability is pixel-precise pointing at image regions.
Source: huggingface.co/allenai/Molmo-72B-0924
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Verify Molmo 72B runs on your specific hardware before committing money.