mistral
12B parameters
Commercial OK
Multimodal
Reviewed June 2026

Pixtral 12B

Mistral's multimodal entry. 12B parameters, vision + text, Apache 2.0. Good document and chart understanding.

License: Apache 2.0 · Released Sep 17, 2024 · Context: 131,072 tokens
Our verdict

Fredoline Eruo · Verified Jun 12, 2026
Rating: unrated

Positioning

Pixtral 12B is Mistral AI's first multimodal entry, combining vision and language capabilities in a dense 12-billion-parameter model. Released under the permissive Apache 2.0 license, it offers a 131,072-token context window, making it suitable for long-document and chart understanding tasks. As a consumer-tier model, it targets operators who need a locally runnable vision-language model without restrictive licensing.

Strengths

  • Apache 2.0 license: Open for commercial use, modification, and redistribution, provided the license text and attribution notices are retained.
  • Long context window: 131K tokens enables processing of lengthy documents, multi-page PDFs, or extended conversations without truncation.
  • Multimodal in a dense 12B package: Vision and text in a single model that fits on consumer hardware at common quantizations.
  • Proven vendor lineage: Built by Mistral AI, known for efficient architectures and strong community adoption.

Limitations

  • No community benchmarks available: Published vendor metrics should be treated as best-case; real-world performance on local hardware is unverified.
  • 12B dense model: At FP16 the weights take ~24 GB on disk and at least as much VRAM to load; quantization is necessary for most consumer GPUs.
  • Vision capabilities unquantified: Document and chart understanding claims lack independent validation; actual accuracy on specific tasks is unknown.
  • KV cache overhead: When the 131K context is filled, the KV cache can exceed the size of the model weights, so memory must be planned carefully.
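
The KV cache point can be made concrete with a rough estimate. The sketch below is illustrative only: the layer count, KV-head count, and head dimension are assumed defaults (not taken from this page), so check them against the model's published config before relying on the numbers. It also assumes a 16-bit cache with no cache quantization.

```python
def kv_cache_bytes(tokens, n_layers=40, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Estimate KV cache size in bytes for `tokens` cached positions.

    The architecture defaults are ASSUMPTIONS for illustration; replace them
    with the values from the model's config file.
    """
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * tokens


GIB = 1024 ** 3

if __name__ == "__main__":
    for tokens in (8_192, 32_768, 131_072):
        print(f"{tokens:>7} tokens: {kv_cache_bytes(tokens) / GIB:.2f} GiB")
```

Under these assumed dimensions the cache costs 160 KiB per token, or about 20 GiB at the full 131,072-token window. That is larger than the Q4_K_M and Q8_0 weight files listed on this page, which is why long-context use needs its own memory budget.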

What it takes to run this locally

File sizes range from ~24 GB at FP16 down to ~3.9 GB at Q2_K. For typical use, add 30–50% on top of the file size for KV cache and framework overhead. A Q4_K_M (~7 GB) or Q5_K_M (~8.6 GB) quant fits on a single 12–24 GB consumer GPU, which makes local deployment viable on RTX 3090/4090-class hardware. No token-throughput measurements are available.
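
The 30–50% rule of thumb above is simple arithmetic, sketched here as a helper. The function names are hypothetical, and the overhead range is the one quoted in this section rather than a measured value.

```python
def vram_estimate_gb(file_size_gb, overhead=(0.30, 0.50)):
    """Return (low, high) VRAM estimates: file size plus 30-50% overhead."""
    low, high = overhead
    return file_size_gb * (1 + low), file_size_gb * (1 + high)


def fits(file_size_gb, gpu_vram_gb):
    """Conservative check: the high estimate must fit in the GPU's VRAM."""
    return vram_estimate_gb(file_size_gb)[1] <= gpu_vram_gb


if __name__ == "__main__":
    # File sizes as quoted in this section (GB).
    for name, size in (("Q2_K", 3.9), ("Q5_K_M", 8.6), ("FP16", 24.0)):
        low, high = vram_estimate_gb(size)
        print(f"{name}: {low:.1f}-{high:.1f} GB, fits 16 GB card: {fits(size, 16)}")
```

For Q5_K_M this gives roughly 11.2 to 12.9 GB, so a 12 GB card is borderline under this rule while 16 GB and 24 GB cards have headroom.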

Should you run this locally?

Yes if you need a permissively licensed vision-language model for local inference on consumer hardware, and you are comfortable with quantized deployment. No if you require verified benchmark scores, need to process very long contexts without memory planning, or prefer a model with extensive community performance data.

Catalog cross-links

  • Mistral 7B
  • Mistral Large
  • Llama 3.2 Vision

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Strengths

  • Apache 2.0
  • Multimodal
  • 12B fits in 16 GB of VRAM when quantized

Weaknesses

  • Vision quality below Qwen-VL family

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization   File size   VRAM required
Q4_K_M         7.0 GB      10 GB
Q8_0           13.0 GB     16 GB
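
As a small illustration of how to read the table, the sketch below picks the largest listed quantization whose VRAM requirement fits a given card. The `QUANTS` list and `best_quant` helper are hypothetical names; the numbers are copied from the table above.

```python
# (name, file size in GB, VRAM required in GB), copied from the table above.
QUANTS = [
    ("Q4_K_M", 7.0, 10.0),
    ("Q8_0", 13.0, 16.0),
]


def best_quant(gpu_vram_gb):
    """Return the name of the largest listed quant that fits, or None."""
    candidates = [q for q in QUANTS if q[2] <= gpu_vram_gb]
    if not candidates:
        return None
    # Larger VRAM requirement means a less aggressively compressed quant.
    return max(candidates, key=lambda q: q[2])[0]


if __name__ == "__main__":
    for vram in (8, 12, 16, 24):
        print(f"{vram} GB card -> {best_quant(vram)}")
```

The table's VRAM figures leave little room for long contexts, so treat the result as a starting point rather than a guarantee.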

Get the model

Ollama

One-line install

ollama run pixtral:12b

HuggingFace

Original weights

huggingface.co/mistralai/Pixtral-12B-2409

Source repository. These are the original weights, so you must quantize them yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Pixtral 12B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Pixtral 12B?

10 GB of VRAM is enough to run Pixtral 12B at the Q4_K_M quantization (file size 7.0 GB). Higher-quality quantizations need more; Q8_0 (13.0 GB) needs 16 GB.

Can I use Pixtral 12B commercially?

Yes. Pixtral 12B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Pixtral 12B?

Pixtral 12B supports a context window of 131,072 tokens (about 131K, or 128K in binary units).

How do I install Pixtral 12B with Ollama?

Run `ollama pull pixtral:12b` to download, then `ollama run pixtral:12b` to start a chat session. The default quantization is Q4_K_M.

Does Pixtral 12B support images?

Yes. Pixtral 12B is multimodal and accepts text and image inputs. Vision support requires a runner that handles its image-conditioning architecture.
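
For readers who want to send an image programmatically, here is a minimal sketch against Ollama's local HTTP API. It assumes an Ollama server is running on the default port, that the `pixtral:12b` tag named on this page is installed, and that your Ollama build supports this model's vision input; none of that is verified here. The helper names `build_payload` and `describe_image` are hypothetical.

```python
import base64
import json
import urllib.request

# Default local endpoint for Ollama's generate API (assumes a local server).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(prompt, image_bytes, model="pixtral:12b"):
    """Build a non-streaming generate request with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path, prompt="Describe this chart."):
    """Send the image at `path` to the local server and return the reply text."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `describe_image("chart.png")` with a local image file returns the model's text answer if the assumptions above hold; otherwise the request fails with an HTTP or connection error.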

Source: huggingface.co/mistralai/Pixtral-12B-2409

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Before you buy

Verify Pixtral 12B runs on your specific hardware before committing money.