Llama 3.2 90B Vision Instruct
The 90B vision Llama. The best-in-class first-party multimodal open-weight model at the time of release. Requires workstation-class hardware or better.
Positioning
Llama 3.2 90B Vision Instruct is Meta's first-party multimodal extension of the Llama 3.2 family, adding native vision understanding to the dense 90B-parameter architecture. Released under the Llama 3.2 Community License, it targets operators who need a permissive, open-weight vision-language model at the 90B scale. With a 131,072-token context window, it is designed for long-context multimodal tasks such as document analysis, video understanding, and complex visual reasoning. At 90B dense parameters, full-precision inference is datacenter-class: the FP16 checkpoint alone is ~180 GB, and even aggressively quantized variants require workstation-grade hardware.
Strengths
- First-party multimodal Llama: As a Meta release, this model benefits from the same training data, safety mitigations, and ecosystem support as the text-only Llama 3.2 models, making it a natural choice for operators already invested in the Llama stack.
- Massive context window: 131,072 tokens of context enables processing of long documents, high-resolution image sequences, or extended video clips without truncation — a significant advantage over many open-weight VLMs with shorter contexts.
- Permissive commercial license: The Llama 3.2 Community License allows commercial use, including fine-tuning and deployment. Restrictions apply to very large-scale applications (monthly-active-user thresholds), and the license terms for the multimodal models exclude individuals and companies based in the EU.
- Quantization flexibility: With quantized sizes ranging from Q8_0 (96 GB) down to Q2_K (29.3 GB), operators can trade off precision for hardware fit. The Q4_K_M variant (~50.6 GB) offers a practical balance for dual-GPU workstation setups.
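To make the precision trade-off concrete, the file sizes quoted above can be converted into an approximate number of bits stored per weight. This is a rough sketch, assuming a nominal 90 billion parameters and decimal gigabytes; the names `PARAMS`, `QUANT_FILE_SIZES_GB`, and `bits_per_weight` are illustrative, not part of any tool.

```python
# Rough effective bits per weight implied by the file sizes quoted above.
# Assumptions: a nominal 90e9 parameters and decimal gigabytes (1 GB = 1e9 bytes).
PARAMS = 90e9

QUANT_FILE_SIZES_GB = {
    "FP16": 180.0,
    "Q8_0": 96.0,
    "Q4_K_M": 50.6,
    "Q2_K": 29.3,
}


def bits_per_weight(file_size_gb: float, params: float = PARAMS) -> float:
    """Return the average number of bits stored per parameter."""
    return file_size_gb * 1e9 * 8 / params


# Rounded to two decimals for display.
BITS = {name: round(bits_per_weight(size), 2)
        for name, size in QUANT_FILE_SIZES_GB.items()}

for name, bits in BITS.items():
    print(f"{name:7s} {bits:5.2f} bits/weight")
```

Under these assumptions the quantized files work out to roughly 8.5, 4.5, and 2.6 bits per weight, slightly above the nominal 8, 4, and 2 bits suggested by the format names.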
Limitations
- Datacenter-only at full precision: The FP16 checkpoint requires ~180 GB of GPU memory for weights, plus substantial overhead for KV cache (add ~30-50% at typical context lengths). This effectively limits full-precision inference to multi-GPU datacenter nodes (e.g., 4× A100 80GB or 4× H100 80GB); a 2× H100 80GB node offers only 160 GB, less than the weights alone.
- No community benchmarks yet: As a recent release, we lack independent, community-verified performance numbers for this model. Operators should treat vendor-published metrics as best-case and plan for their own evaluation.
- Dense architecture at 90B: Unlike Mixture-of-Experts models that activate only a fraction of parameters per token, Llama 3.2 90B is dense — every forward pass uses all 90B parameters. This means inference cost scales linearly with parameter count, making it more expensive per token than an MoE model of similar total size.
- Vision modality adds complexity: Running vision-language models requires additional preprocessing (image encoding) and often larger batch sizes for throughput. The vision encoder itself consumes memory and compute, further increasing hardware demands beyond the language model alone.
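The dense-versus-MoE cost point can be illustrated with a common rule of thumb: a forward pass costs roughly 2 FLOPs per active parameter per token. The sketch below applies that rule; the MoE model with 15B active parameters is a hypothetical comparison point, not a real model, and the estimate ignores attention and vision-encoder costs.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb estimate: ~2 FLOPs (one multiply, one add) per active weight."""
    return 2.0 * active_params


DENSE_PARAMS = 90e9        # dense: every weight participates in each forward pass
MOE_ACTIVE_PARAMS = 15e9   # hypothetical MoE: 90B total, only 15B active per token

dense_flops = forward_flops_per_token(DENSE_PARAMS)
moe_flops = forward_flops_per_token(MOE_ACTIVE_PARAMS)
ratio = dense_flops / moe_flops

print(f"dense: {dense_flops / 1e9:.0f} GFLOPs/token")
print(f"MoE:   {moe_flops / 1e9:.0f} GFLOPs/token")
print(f"dense costs {ratio:.1f}x more compute per token")
```

Both models would need similar memory to hold their weights; the difference shows up in compute per generated token.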
What it takes to run this locally
At FP16, the model requires ~180 GB of GPU memory just for weights. Adding KV cache and framework overhead (typically 30-50% at 131K context) pushes total memory to roughly 234-270 GB. This places full-precision inference firmly in the datacenter class: 4× A100 80GB or 4× H100 80GB (320 GB total) is the minimum viable configuration.
Quantization reduces the memory footprint significantly:
- Q8_0: ~96 GB weights → ~125-145 GB total → still requires 2× A100 80GB or 4× A6000 48GB.
- Q4_K_M: ~50.6 GB weights → ~66-76 GB total → fits on a single A100 80GB or 2× RTX 6000 Ada 48GB (with careful context management).
- Q2_K: ~29.3 GB weights → ~38-44 GB total → possible on a single 48GB workstation GPU (e.g., RTX A6000) but with significant quality loss.
For practical deployment, a workstation with 2× 48GB GPUs (e.g., RTX 6000 Ada) running Q4_K_M is the most accessible path, while consumer hardware (single 24GB GPU) is not viable even at Q2_K due to memory constraints.
Should you run this locally?
Yes if you need a permissively licensed, first-party multimodal Llama model for commercial deployment and have access to datacenter or high-end workstation GPUs (2× 48GB or better). The 131K context window is a strong differentiator for long-document or video analysis tasks.
No if you are limited to consumer hardware (single 24GB GPU) or need fast, low-cost inference. The dense 90B architecture is expensive to run, and smaller VLMs (e.g., 7B-13B class) may be more practical. Also, if you require community-verified benchmarks before committing, wait for independent evaluations.
Catalog cross-links
- Llama 3.2 11B Vision Instruct — smaller sibling for consumer-grade vision tasks
- Llama 3.1 70B Instruct — text-only dense 70B for comparison
- A100 80GB — recommended datacenter GPU for this model
- RTX 6000 Ada — workstation GPU capable of Q4_K_M deployment
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Top-tier open-weight vision quality
- 128K context
Weaknesses
- Needs 60GB+ VRAM
- EU restricted
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 51.0 GB | 60 GB |
Get the model
Ollama
One-line install
ollama run llama3.2-vision:90b
HuggingFace
Original weights
Source repository — direct quantization required.
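Once the Ollama command above has pulled the model, an image can be sent to it through Ollama's local REST API. The following is a sketch, assuming Ollama is running on its default port 11434; `build_payload`, `describe_image`, and the file name `invoice.png` are hypothetical names chosen for illustration.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "llama3.2-vision:90b"


def build_payload(prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": MODEL_TAG,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str, prompt: str = "Describe this image.") -> str:
    """Send an image file to the local Ollama server and return the model's reply."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Example call (needs a running Ollama server and a local image file):
# print(describe_image("invoice.png", "Summarize the line items in this invoice."))
```

The first request may be slow while the model loads into GPU memory, so expect later calls to respond faster.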
Hardware that runs this
Cards with enough VRAM for at least one quantization of Llama 3.2 90B Vision Instruct.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
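What's the minimum VRAM to run Llama 3.2 90B Vision Instruct?
About 60 GB for the Q4_K_M quantization listed above. Q2_K can fit on a single 48 GB card, with significant quality loss.
Can I use Llama 3.2 90B Vision Instruct commercially?
Yes, under the Llama 3.2 Community License, subject to its monthly-active-user threshold and the EU restriction noted above.
What's the context length of Llama 3.2 90B Vision Instruct?
131,072 tokens (128K).
How do I install Llama 3.2 90B Vision Instruct with Ollama?
Use the one-line command: ollama run llama3.2-vision:90b
Does Llama 3.2 90B Vision Instruct support images?
Yes. It is a vision-language model that accepts images alongside text.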
Source: huggingface.co/meta-llama/Llama-3.2-90B-Vision-Instruct
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.