8B parameters
Commercial OK
Multimodal
Reviewed June 2026

Molmo 7B-D

AI2's fully open VLM, trained on the PixMo dataset, with a pointing capability for UI grounding.

License: Apache 2.0 · Released Sep 25, 2024 · Context: 4,096 tokens

Our verdict

OP · Fredoline Eruo | Verified Jun 12, 2026
unrated

Positioning

Molmo 7B-D is a dense vision-language model (VLM) with about 8 billion parameters, released by the Allen Institute for AI (AI2) under the permissive Apache 2.0 license. The "7B" in the name refers to its Qwen2 7B language backbone; the vision encoder brings the total to roughly 8B. It is trained on the PixMo dataset and emphasizes UI grounding and pointing, which makes it a strong candidate for open research that needs visual understanding of, and interaction with, graphical interfaces. As a fully open-weight model with a 4,096-token context window, it occupies a distinct niche: VLMs that prioritize transparency and reproducibility over proprietary training data.

Strengths

  • Fully open license: Apache 2.0 permits commercial use, modification, and redistribution, subject to its attribution and notice conditions, which suits both research and product integration.
  • UI grounding focus: Trained on the PixMo dataset, which includes pointing annotations; point-based interface grounding is rare among open VLMs.
  • Dense architecture: Unlike mixture-of-experts models, the dense 8B-parameter design simplifies deployment and memory planning.
  • Consumer-friendly size: At Q4_K_M quantization (~4.5 GB on disk), the model fits comfortably on a single consumer GPU with 8–12 GB VRAM after accounting for KV cache overhead.
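For UI grounding work, the pointing capability listed above has to be turned into pixel coordinates before anything can click or highlight a target. The sketch below is a minimal parser, assuming the model emits tags shaped like `<point x="…" y="…" alt="…">label</point>` (or `<points x1="…" y1="…" x2="…" y2="…" …>` for several targets) with coordinates given as percentages of image width and height. That output shape is an assumption to check against the model card, and `parse_points` is a hypothetical helper name.

```python
import re

# ASSUMPTION: pointing output looks like
#   <point x="50.0" y="25.0" alt="Save">Save</point>
#   <points x1="10.0" y1="20.0" x2="30.0" y2="40.0" alt="icons">icons</points>
# with x/y expressed as percentages (0-100) of the image width/height.
_TAG = re.compile(r"<points?\s+([^>]*)>")
_COORD = re.compile(r'\b(x|y)(\d*)="([0-9]+(?:\.[0-9]+)?)"')


def parse_points(text, width, height):
    """Return a list of (x_px, y_px) tuples for every point found in text."""
    pixels = []
    for tag in _TAG.finditer(text):
        xs, ys = {}, {}
        for axis, index, value in _COORD.findall(tag.group(1)):
            (xs if axis == "x" else ys)[index] = float(value)
        # Pair x and y values that share the same index suffix ("", "1", "2", ...).
        for index in sorted(xs, key=lambda s: int(s or 0)):
            if index in ys:
                x_px = round(xs[index] / 100 * width)
                y_px = round(ys[index] / 100 * height)
                pixels.append((x_px, y_px))
    return pixels
```

Calling `parse_points(model_output, screenshot_width, screenshot_height)` would give coordinates that a UI automation layer can consume; text without any point tags yields an empty list.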

Limitations

  • Short context window: 4,096 tokens limits the model's ability to process long documents or multi-turn conversations with large image contexts.
  • No community benchmarks yet: We do not have independent measurements of performance on standard VLM tasks; published vendor metrics should be treated as best-case.
  • Training focus: This entry highlights PixMo's pointing and UI grounding data; how well the model performs in other visual domains (e.g., natural scenes, medical imaging) is unverified.
  • Quantization loss: At lower bit widths (Q3_K_M, Q2_K), quality degradation is possible, and we lack data on how quantization affects pointing accuracy.

What it takes to run this locally

Molmo 7B-D's weights range from about 16 GB at FP16 down to ~2.6 GB at Q2_K. For practical deployment, add 30–50% on top of the file size for KV cache and framework overhead. A Q4_K_M (~4.5 GB) or Q5_K_M (~5.7 GB) quant fits on a single consumer GPU with 8–12 GB VRAM (e.g., RTX 3060/4060). For FP16 inference, a 24 GB GPU (e.g., RTX 3090/4090) is recommended. We have no tokens-per-second measurements for this model.
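The sizing rule in this section (file size plus 30–50% overhead) is easy to turn into a quick check. The following is a rough sketch using only the figures quoted above; the function names are illustrative, and the overhead range is this page's rule of thumb rather than a measured value.

```python
def vram_estimate_gb(weights_gb, overhead=(0.30, 0.50)):
    """Return (low, high) VRAM estimates: weights plus 30-50% overhead."""
    low, high = overhead
    return round(weights_gb * (1 + low), 2), round(weights_gb * (1 + high), 2)


def fits(weights_gb, card_vram_gb):
    """True if even the high-end estimate fits in the card's VRAM."""
    return vram_estimate_gb(weights_gb)[1] <= card_vram_gb


# File sizes quoted in this section (GB).
for name, size in [("Q4_K_M", 4.5), ("Q5_K_M", 5.7), ("FP16", 16.0)]:
    print(name, vram_estimate_gb(size))
```

Under these assumptions the Q4_K_M estimate stays below 8 GB, while the high-end estimate for Q5_K_M lands slightly above 8 GB, so a 12 GB card leaves more headroom for that quant.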

Should you run this locally?

Yes if you need an open VLM for UI grounding research, want a permissive license for commercial deployment, or have a single consumer GPU with at least 8 GB VRAM and can tolerate a 4K context window.

No if your use case requires long-context understanding, high accuracy on general visual tasks, or you rely on community-verified benchmark scores to evaluate model quality.

Catalog cross-links

  • AI2 OLMo 7B – AI2's dense language model companion.
  • Qwen2-VL 7B – Another open VLM with longer context.
  • Consumer GPU Guide – Hardware recommendations for 8–12 GB VRAM cards.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (molmo)

  • Molmo 7B-D (8B): you are here
  • Molmo 72B (72B): datacenter

Strengths

  • Fully-open data + weights
  • UI pointing / grounding

Weaknesses

  • Smaller community than LLaVA / Qwen-VL

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          5.2 GB       8 GB

Get the model

HuggingFace

Original weights

huggingface.co/allenai/Molmo-7B-D-0924

Source repository with the original weights. No pre-quantized files are listed here, so you need to quantize the weights yourself.
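As a starting point, the original weights can be fetched with the `huggingface_hub` client. This is a minimal sketch: the repository ID comes from this entry, while `download_weights` and the `molmo-7b-d` target directory are hypothetical names chosen for illustration. It only downloads the files and does not perform any quantization.

```python
REPO_ID = "allenai/Molmo-7B-D-0924"  # repository named in this entry


def download_weights(local_dir="molmo-7b-d"):
    """Download the original weights and return the local path.

    Requires `pip install huggingface_hub`; local_dir is a hypothetical
    target directory, so change it to suit your setup.
    """
    # Imported lazily so this module loads even without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)
```

Expect the download to be roughly the FP16 size, so check free disk space before calling `download_weights()`.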

Frequently asked

What's the minimum VRAM to run Molmo 7B-D?

8 GB of VRAM is enough to run Molmo 7B-D at the Q4_K_M quantization (file size 5.2 GB). Higher-quality quantizations need more.

Can I use Molmo 7B-D commercially?

Yes. Molmo 7B-D ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Molmo 7B-D?

Molmo 7B-D supports a context window of 4,096 tokens (about 4K).
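Because image inputs are encoded into the same sequence as the text, the 4,096-token window has to be shared between the image, the prompt, and the reply. The helper below is a simple budgeting sketch; the image and prompt token counts are caller-supplied and hypothetical, since the real counts depend on the model's processor and should be measured there.

```python
CONTEXT_WINDOW = 4096  # context length stated in this entry


def remaining_tokens(image_tokens, prompt_tokens, reserve_for_output=256):
    """Tokens left after the image, the prompt, and a reserved reply budget.

    A negative result means the request does not fit in the window.
    """
    used = image_tokens + prompt_tokens + reserve_for_output
    return CONTEXT_WINDOW - used


# Hypothetical example: 1,500 image tokens and a 200-token prompt.
print(remaining_tokens(1500, 200))
```

A negative return value is a signal to shrink the image, shorten the prompt, or trim conversation history before sending the request.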

Does Molmo 7B-D support images?

Yes — Molmo 7B-D is multimodal and accepts text + vision inputs. Vision support requires a runner that handles its image-conditioning architecture.

Source: huggingface.co/allenai/Molmo-7B-D-0924

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
