Pixtral 12B
Mistral's multimodal entry. 12B parameters, vision + text, Apache 2.0. Good document and chart understanding.
Positioning
Pixtral 12B is Mistral AI's first multimodal model, combining vision and language capabilities in a dense 12-billion-parameter network. It is released under the permissive Apache 2.0 license and offers a 131,072-token context window, which suits long-document work, while its vision input targets document and chart understanding. As a consumer-tier model, it targets operators who need a locally runnable vision-language model without restrictive licensing.
Strengths
- Apache 2.0 license: Open for commercial use, modification, and redistribution, subject to the license's attribution and notice requirements.
- Long context window: The 131K-token window can hold lengthy documents, multi-page PDFs, or extended conversations without truncation.
- Multimodal in a dense 12B package: Vision and text in a single model that fits on consumer hardware at common quantizations.
- Proven vendor lineage: Built by Mistral AI, known for efficient architectures and strong community adoption.
Limitations
- No community benchmarks available: Published vendor metrics should be treated as best-case; real-world performance on local hardware is unverified.
- 12B dense model: At FP16 the weights take ~24 GB on disk (12B parameters at 2 bytes each) and at least that much VRAM to load; quantization is necessary for most consumer GPUs.
- Vision capabilities unquantified: Document and chart understanding claims lack independent validation; actual accuracy on specific tasks is unknown.
- KV cache overhead: The KV cache grows linearly with context length, and near the full 131K-token window it can exceed the size of the quantized weights, so memory must be planned for the context length you actually use.
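The KV cache point can be made concrete with a rough estimate. The sketch below is not a measurement: the layer count, KV-head count, and head dimension are assumed values for the text decoder, and the cache is assumed to be stored in FP16. Check the model's published configuration before relying on the numbers.

```python
# Rough KV cache size estimate. All architecture values are ASSUMPTIONS
# for illustration, not figures taken from this page.
GIB = 1024 ** 3

def kv_cache_bytes(n_tokens, n_layers=40, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for n_tokens tokens."""
    # Factor 2 covers the separate key and value tensors in every layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return n_tokens * per_token

for ctx in (8_192, 32_768, 131_072):
    print(f"{ctx:>7} tokens: {kv_cache_bytes(ctx) / GIB:.2f} GiB")
```

Under these assumptions the full 131,072-token window costs about 20 GiB of cache, several times the size of a Q4_K_M file, while an 8K context costs a little over 1 GiB. Runtimes that quantize the KV cache would reduce these figures.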
What it takes to run this locally
Model files range from 24 GB at FP16 (unquantized) down to ~3.9 GB at Q2_K. For typical use, budget an extra 30–50% on top of the file size for KV cache and framework overhead. A Q4_K_M (~6.8 GB) or Q5_K_M (~8.6 GB) quant fits on a single 12–24 GB consumer GPU, which makes local deployment viable on RTX 3090/4090-class hardware. No token-throughput measurements are available.
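The sizing rule above is simple arithmetic, sketched here using the file sizes quoted in this section. The function names are hypothetical helpers, and the 30–50% overhead is the rule of thumb stated above, not a measured value.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate weight size in GB for a dense model."""
    return params_billion * bits_per_weight / 8

def vram_budget_gb(file_gb, overhead=(0.30, 0.50)):
    """Low and high VRAM estimates: file size plus 30-50% overhead."""
    low, high = overhead
    return file_gb * (1 + low), file_gb * (1 + high)

print(f"FP16 weights: {weights_gb(12, 16):.0f} GB")
for name, file_gb in (("Q4_K_M", 6.8), ("Q5_K_M", 8.6)):
    low, high = vram_budget_gb(file_gb)
    print(f"{name}: {low:.1f} to {high:.1f} GB")
```

By this estimate Q4_K_M lands at roughly 8.8 to 10.2 GB and Q5_K_M at roughly 11.2 to 12.9 GB, so Q5_K_M may be tight on a 12 GB card once the upper end of the overhead range applies.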
Should you run this locally?
Yes if you need a permissively licensed vision-language model for local inference on consumer hardware, and you are comfortable with quantized deployment. No if you require verified benchmark scores, need to process very long contexts without memory planning, or prefer a model with extensive community performance data.
Catalog cross-links
- Mistral 7B
- Mistral Large
- Llama 3.2 Vision
Strengths
- Apache 2.0
- Multimodal
- Fits in 16 GB of VRAM at Q8_0 or smaller
Weaknesses
- Vision quality below Qwen-VL family
Quantization variants
Each quantization level trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 7.0 GB | 10 GB |
| Q8_0 | 13.0 GB | 16 GB |
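As a small illustration of reading the table, the sketch below picks the largest listed quantization that fits a given VRAM budget. The figures are copied from the table; `best_quant` is a hypothetical helper, and it ignores extra memory needed for long contexts.

```python
# (name, file size in GB, VRAM required in GB), copied from the table above.
QUANTS = [
    ("Q4_K_M", 7.0, 10),
    ("Q8_0", 13.0, 16),
]

def best_quant(vram_gb):
    """Return the largest listed quantization that fits, or None."""
    fitting = [q for q in QUANTS if q[2] <= vram_gb]
    if not fitting:
        return None
    # A larger file means less aggressive quantization, so prefer it.
    return max(fitting, key=lambda q: q[1])[0]

for vram in (8, 12, 16, 24):
    print(f"{vram} GB card: {best_quant(vram)}")
```

A card below 10 GB matches nothing in this table; smaller quantizations would be needed there.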
Get the model
Ollama
One-line install
ollama run pixtral:12b
HuggingFace
Original weights
Source repository. The original weights are unquantized, so you must quantize them yourself.
Frequently asked
What's the minimum VRAM to run Pixtral 12B?
Can I use Pixtral 12B commercially?
What's the context length of Pixtral 12B?
How do I install Pixtral 12B with Ollama?
Does Pixtral 12B support images?
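For readers using the Ollama route, sending an image alongside a prompt might look like the sketch below. It assumes a local Ollama server on its default port, that the `pixtral:12b` tag from the install line is available there, and that the server accepts images for this model. The helper names and the file name in the usage note are hypothetical.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint

def build_payload(prompt, image_bytes, model="pixtral:12b"):
    """Build a non-streaming generate request with one base64 image."""
    return {
        "model": model,  # tag taken from the install line; assumed to exist
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }

def ask(prompt, image_path, url=OLLAMA_URL):
    """Send the image and prompt to the server and return the reply text."""
    with open(image_path, "rb") as f:
        payload = build_payload(prompt, f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, a call such as `ask("Summarize this chart.", "chart.png")` would return the model's text reply. Nothing is sent until `ask` is called.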
Source: huggingface.co/mistralai/Pixtral-12B-2409
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.