Training & optimization
Q3_K_M Quantization
Q3_K_M is a 3-bit GGUF K-quant that averages ~3.9 bits per weight once block scales and the tensors kept at higher precision are counted. It is the smallest format that still produces usable output for most models.
Quality drops noticeably: perplexity typically sits 0.5–1.0 points above FP16, and complex tasks (multi-step reasoning, code generation) show measurable degradation. For 7B–13B models, Q3_K_M is rarely worth it; a smaller model at Q4_K_M usually gives better output in the same memory. For 70B+ models on consumer hardware, Q3_K_M is often the only quant that fits in 36 GB of memory or less.
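The size claim is simple arithmetic: parameters times average bits per weight, divided by 8. A minimal sketch, assuming the commonly cited llama.cpp averages (~3.91 bpw for Q3_K_M, ~4.85 bpw for Q4_K_M); exact file sizes vary by architecture, and `quant_size_gb` is a hypothetical helper, not a library function:

```python
def quant_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough GGUF file size: parameters * average bits per weight / 8."""
    return n_params * bits_per_weight / 8 / 1e9

print(f"70B @ Q3_K_M: {quant_size_gb(70e9, 3.91):.1f} GB")  # ~34 GB, fits a 36 GB budget
print(f"70B @ Q4_K_M: {quant_size_gb(70e9, 4.85):.1f} GB")  # ~42 GB, does not
print(f"13B @ Q4_K_M: {quant_size_gb(13e9, 4.85):.1f} GB")  # ~7.9 GB
```

Runtime memory is somewhat higher than the file size once the KV cache and context buffers are added.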
If the model starts producing word salad, it is past its quant cliff; try Q4_K_S or Q4_K_M instead.
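A quick way to spot the cliff is to run the same prompt at two quant levels and compare. A sketch assuming the llama-cpp-python bindings and local GGUF files at the paths shown (the file names are hypothetical):

```python
from llama_cpp import Llama

PROMPT = "Explain binary search in two sentences."

for path in ["model-Q3_K_M.gguf", "model-Q4_K_M.gguf"]:
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    out = llm(PROMPT, max_tokens=64, temperature=0.0)
    # Incoherent text from the Q3_K_M file but not the Q4_K_M one
    # suggests this model degrades badly at 3 bits.
    print(path, "->", out["choices"][0]["text"].strip())
```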