Pixtral 12B

Positioning

Pixtral 12B is Mistral AI's first multimodal model, combining vision and language capabilities in a dense 12-billion-parameter package. Released under the permissive Apache 2.0 license, it offers a 131,072-token context window that leaves room for long documents, and it is pitched at document and chart understanding tasks. As a consumer-tier model, it targets operators who need a locally runnable vision-language model without restrictive licensing.

Strengths

Apache 2.0 license: Fully open for commercial use, modification, and redistribution; the conditions are limited to attribution-style requirements such as keeping the license text and notices and marking changed files.
Long context window: 131K tokens enables processing of lengthy documents, multi-page PDFs, or extended conversations without truncation.
Multimodal in a dense 12B package: Vision and text in a single model that fits on consumer hardware at common quantizations.
Proven vendor lineage: Built by Mistral AI, known for efficient architectures and strong community adoption.

Limitations

No community benchmarks available: Published vendor metrics should be treated as best-case; real-world performance on local hardware is unverified.
12B dense model: At FP16, requires ~24 GB of disk and significant VRAM; quantization is necessary for most consumer GPUs.
Vision capabilities unquantified: Document and chart understanding claims lack independent validation; actual accuracy on specific tasks is unknown.
KV cache overhead: With 131K context, KV cache can exceed model weights at high context lengths, demanding careful memory planning.
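The arithmetic behind the FP16 size and the KV cache warning can be sketched as below. This is a rough estimate, not a measurement: the decoder dimensions (40 layers, 8 KV heads, head dimension 128) are assumed values, not figures from this page, so check them against the published model config. Real runners also add their own buffers on top.

```python
# Back-of-the-envelope memory math for Pixtral 12B.
# ASSUMPTION: decoder shape of 40 layers, 8 KV heads, head_dim 128.
# Check these against the published model config before trusting the numbers.
N_LAYERS = 40
N_KV_HEADS = 8
HEAD_DIM = 128


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size: parameter count x bits per weight / 8."""
    return n_params * bits_per_weight / 8


def kv_cache_bytes(n_tokens: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: a K and a V vector per layer, per KV head, per token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * bytes_per_elem
    return per_token * n_tokens


if __name__ == "__main__":
    GB = 1e9
    print(f"FP16 weights, 12B params: {weight_bytes(12e9, 16) / GB:.1f} GB")
    for ctx in (8_192, 32_768, 131_072):
        print(f"FP16 KV cache, {ctx:>7} tokens: {kv_cache_bytes(ctx) / GB:.1f} GB")
```

Under these assumptions the FP16 KV cache costs 160 KiB per token, or roughly 21.5 GB at the full 131,072 tokens, about three times the 7.0 GB Q4_K_M file. Passing `bytes_per_elem=1` approximates an 8-bit KV cache, for runners that offer one.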

What it takes to run this locally

Model files range from about 24 GB at FP16 down to ~3.9 GB at Q2_K. For typical use, budget an extra 30–50% on top of the file size for KV cache and framework overhead. A Q4_K_M quant (roughly 7 GB) fits on a single 12 GB consumer GPU; Q5_K_M (~8.6 GB) also loads on 12 GB but leaves little headroom for KV cache, and both run comfortably on 24 GB RTX 3090/4090-class cards. No specific token throughput measurements are available.
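The 30–50% overhead rule of thumb can be applied mechanically. The sketch below is a hypothetical helper (`vram_budget_gb` and `fits` are illustrative names, not part of any tool); it simply multiplies a file size by the overhead range and, to stay conservative, compares the upper end against the GPU's VRAM.

```python
def vram_budget_gb(file_size_gb: float, low: float = 0.30, high: float = 0.50) -> tuple:
    """Apply the 30-50% overhead rule of thumb to a model file size (GB)."""
    return round(file_size_gb * (1 + low), 1), round(file_size_gb * (1 + high), 1)


def fits(file_size_gb: float, gpu_vram_gb: float) -> bool:
    """Conservative check: require the upper end of the budget to fit."""
    return vram_budget_gb(file_size_gb)[1] <= gpu_vram_gb


if __name__ == "__main__":
    for name, size in (("Q4_K_M", 7.0), ("Q5_K_M", 8.6)):
        lo, hi = vram_budget_gb(size)
        print(f"{name}: {lo}-{hi} GB, fits 12 GB: {fits(size, 12)}, fits 24 GB: {fits(size, 24)}")
```

With the conservative upper bound, Q5_K_M comes out at about 12.9 GB, just over a 12 GB card, so on such cards the smaller Q4_K_M quant is the safer choice for longer contexts.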

Should you run this locally?

Yes if you need a permissively licensed vision-language model for local inference on consumer hardware, and you are comfortable with quantized deployment. No if you require verified benchmark scores, need to process very long contexts without memory planning, or prefer a model with extensive community performance data.

Catalog cross-links

Mistral 7B
Mistral Large
Llama 3.2 Vision

Quantization variants

Quantization	File size	VRAM required
Q4_K_M	7.0 GB	10 GB
Q8_0	13.0 GB	16 GB
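The table can be turned into a simple lookup for choosing a quant. This is a hypothetical helper (`pick_quant` is an illustrative name); it assumes the quant with the larger VRAM requirement is the higher-quality one, which holds for these two rows, and it uses the table's VRAM column as-is without adding extra KV cache for long contexts.

```python
from typing import Optional

# Rows copied from the quantization table: quant -> (file size GB, VRAM required GB).
QUANTS = {
    "Q4_K_M": (7.0, 10.0),
    "Q8_0": (13.0, 16.0),
}


def pick_quant(gpu_vram_gb: float) -> Optional[str]:
    """Return the listed quant with the largest VRAM requirement that still fits, else None."""
    candidates = [(vram, name) for name, (_, vram) in QUANTS.items() if vram <= gpu_vram_gb]
    return max(candidates)[1] if candidates else None


if __name__ == "__main__":
    for vram in (8, 12, 24):
        print(vram, "GB ->", pick_quant(vram))
```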

Frequently asked

What's the minimum VRAM to run Pixtral 12B?

10 GB of VRAM is enough to run Pixtral 12B at the Q4_K_M quantization (file size 7.0 GB). Higher-quality quantizations need more; Q8_0 (file size 13.0 GB) needs about 16 GB.

Can I use Pixtral 12B commercially?

Yes. Pixtral 12B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Pixtral 12B?

Pixtral 12B supports a context window of 131,072 tokens (about 131K).

How do I install Pixtral 12B with Ollama?

Run `ollama pull pixtral:12b` to download, then `ollama run pixtral:12b` to start a chat session. The default quantization is Q4_K_M.

Does Pixtral 12B support images?

Yes. Pixtral 12B is multimodal and accepts both text and image inputs. Image input only works in a runner that implements Pixtral's vision encoder, so check that your runtime supports vision for this model.
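For scripted image input, a minimal sketch against Ollama's local REST API might look like the following. It assumes an Ollama server on its default port and that the `pixtral:12b` tag from the install answer is available and vision-enabled in your Ollama build; `build_payload` and `describe_image` are hypothetical helper names. The request uses the `/api/generate` endpoint, which takes images as a list of base64 strings.

```python
import base64
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "pixtral:12b"  # ASSUMPTION: tag from the install answer is available locally


def build_payload(prompt: str, image_bytes: bytes, model: str = MODEL) -> dict:
    """Build a /api/generate request body with one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def describe_image(path: str, prompt: str = "Describe this chart.") -> str:
    """Send one image plus a prompt and return the model's text reply."""
    with open(path, "rb") as f:
        payload = build_payload(prompt, f.read())
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Usage would be something like `print(describe_image("chart.png"))`, where `chart.png` is a placeholder path. If the runner lacks vision support for this model, expect the request to fail or the image to have no effect.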
