Quant Advisor

Q4_K_M vs Q5_K_M vs Q8_0 vs FP16 — settled with math, not forum folklore. Pick a model + hardware + use case + context length; we compute the memory budget, score every quant against your quality tolerance, and show you the curve.
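
A minimal sketch of the memory-budget arithmetic. It assumes the budget is only the quantized weights plus an FP16 KV cache, and it ignores compute buffers and runtime overhead. The bits-per-weight values in `APPROX_BPW` are rough assumptions, not figures taken from this tool. Q8_0 follows from its block layout (32 int8 weights plus one FP16 scale). The K-quant `_M` mixes are approximate averages, because they keep some tensors at higher precision. All function names are hypothetical.

```python
# Rough memory-budget sketch: quantized weights + FP16 KV cache only.
# ASSUMPTION: bits-per-weight values are approximate effective averages,
# not numbers taken from the Quant Advisor itself.
APPROX_BPW = {
    "Q4_K_M": 4.9,   # approx.; mixes Q4_K with some higher-precision tensors
    "Q5_K_M": 5.7,   # approx.; mixes Q5_K with some higher-precision tensors
    "Q8_0": 8.5,     # 32 int8 weights + one FP16 scale = 34 bytes per 32 weights
    "FP16": 16.0,
}

GIB = 1024 ** 3


def weight_bytes(n_params: float, quant: str) -> float:
    """Bytes needed to hold the model weights at the given quant type."""
    return n_params * APPROX_BPW[quant] / 8


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    """K and V tensors, for every layer and every token in the context."""
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem


def total_gib(n_params: float, quant: str, n_layers: int, n_kv_heads: int,
              head_dim: int, context_len: int) -> float:
    """Weights plus KV cache, in GiB (excludes compute buffers/overhead)."""
    total = weight_bytes(n_params, quant) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_len)
    return total / GIB
```

Take a Llama-3-8B-shaped model (32 layers, 8 KV heads, head dimension 128). An 8,192-token FP16 KV cache comes to exactly 1 GiB on top of the weights. This is why context length belongs in the budget alongside the quant choice.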

Quality numbers are community-reported perplexity (PPL) deltas relative to FP16, measured on WikiText-2 across the Llama 3, Qwen 3, and Mistral families. Treat them as approximations, not guarantees; see the methodology section for details.
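
Here is one way to turn PPL deltas into a recommendation, sketched under stated assumptions. The delta is taken as the relative perplexity increase over FP16. The hypothetical `pick_quant` rule returns the smallest option that both fits the memory budget and stays within the user's tolerance. The tool's actual scoring may differ, and no real PPL measurements are included here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class QuantOption:
    name: str
    size_gib: float   # total footprint, e.g. from a memory-budget estimate
    ppl: float        # perplexity of the quantized model
    ppl_fp16: float   # FP16 baseline perplexity on the same eval set


def ppl_delta_pct(opt: QuantOption) -> float:
    """Relative perplexity increase over FP16, in percent."""
    return 100.0 * (opt.ppl / opt.ppl_fp16 - 1.0)


def pick_quant(options: Iterable[QuantOption], budget_gib: float,
               tolerance_pct: float) -> Optional[QuantOption]:
    """HYPOTHETICAL rule: smallest option that fits and is within tolerance.

    Returns None when nothing satisfies both constraints.
    """
    passing = [o for o in options
               if o.size_gib <= budget_gib and ppl_delta_pct(o) <= tolerance_pct]
    return min(passing, key=lambda o: o.size_gib, default=None)
```

Choosing the smallest passing quant leaves the most memory free for context. Preferring the lowest delta that still fits would be an equally reasonable rule.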

We have 183 models and 103 hardware entries in the catalog.