Llama 3.2 11B Vision Instruct
Overview
Meta's first-party multimodal Llama. It accepts images alongside text for visual question answering (VQA), document understanding, and chart reading, and runs on 12 GB+ of VRAM.
Strengths
- Strong vision-language baseline
- Document and chart understanding
Weaknesses
- License restricts use of the multimodal models in the EU
- Needs more VRAM than a comparable text-only 8B model
Quantization variants
Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 7.9 GB | 11 GB |
| Q8_0 | 12.5 GB | 16 GB |
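To make the trade-off concrete, here is a minimal sketch in plain Python. The GB figures are copied from the table above; `pick_quant` is a hypothetical helper, not part of Ollama or llama.cpp.

```python
# Minimal sketch: pick the best-quality quantization that fits a VRAM
# budget. Figures are copied from the table above; pick_quant is a
# hypothetical helper, not part of Ollama or llama.cpp.
QUANTS = {
    "Q4_K_M": {"file_gb": 7.9, "vram_gb": 11},
    "Q8_0":   {"file_gb": 12.5, "vram_gb": 16},
}

def pick_quant(vram_budget_gb: float) -> str | None:
    """Return the highest-quality quantization that fits, else None."""
    fitting = [(spec["vram_gb"], name) for name, spec in QUANTS.items()
               if spec["vram_gb"] <= vram_budget_gb]
    return max(fitting)[1] if fitting else None

print(pick_quant(12))  # "Q4_K_M" on a 12 GB card
print(pick_quant(24))  # "Q8_0" when there is headroom
print(pick_quant(8))   # None: below the 11 GB floor
```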
Get the model
Ollama
One-line install:

```bash
ollama run llama3.2-vision:11b
```

HuggingFace
Original weights
Source repository hosting the original, unquantized weights; you will need to quantize them yourself.
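Once the Ollama pull finishes, you can query the model with an image from Python. A minimal sketch using the official ollama client library (`pip install ollama`); the model tag matches the command above, and `chart.png` is a placeholder path.

```python
# Minimal sketch: ask the model about a local image through the
# official Ollama Python client (pip install ollama). Assumes the
# Ollama server is running and the model was pulled as shown above;
# chart.png is a placeholder path.
import ollama

response = ollama.chat(
    model="llama3.2-vision:11b",
    messages=[{
        "role": "user",
        "content": "Summarize the chart in this image.",
        "images": ["chart.png"],
    }],
)
print(response["message"]["content"])
```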
Hardware that runs this
Cards with enough VRAM for at least one quantization of Llama 3.2 11B Vision Instruct.
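If you're unsure what your card reports, here is a quick check, assuming an NVIDIA GPU visible to PyTorch; the 11 GB and 16 GB thresholds come from the quantization table above.

```python
# Minimal sketch: compare local GPU VRAM against the quantization
# table above. Assumes an NVIDIA GPU visible to PyTorch; Apple
# Silicon and ROCm report memory differently.
import torch

if torch.cuda.is_available():
    total_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    print(f"GPU 0: {total_gb:.1f} GB VRAM")
    if total_gb >= 16:
        print("Fits Q8_0 and Q4_K_M")
    elif total_gb >= 11:
        print("Fits Q4_K_M only")
    else:
        print("Below the 11 GB floor for this model")
else:
    print("No CUDA GPU detected")
```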
Models worth comparing
Models in the same parameter band, plus one tier above and below, so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Llama 3.2 11B Vision Instruct?
About 11 GB, which fits the Q4_K_M quantization; Q8_0 needs roughly 16 GB.
Can I use Llama 3.2 11B Vision Instruct commercially?
Yes, under the terms of the Llama 3.2 Community License, though the license restricts use of the multimodal models in the EU.
What's the context length of Llama 3.2 11B Vision Instruct?
128K tokens.
How do I install Llama 3.2 11B Vision Instruct with Ollama?
Run `ollama run llama3.2-vision:11b`; Ollama downloads the default quantization automatically.
Does Llama 3.2 11B Vision Instruct support images?
Yes. It accepts images alongside text for VQA, document understanding, and chart reading.
Source: huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.