Kimi K1.5
Moonshot AI's reasoning model. Before answering, it emits long thinking blocks, sometimes 5,000+ reasoning tokens per query. Strong on math; released under a restricted commercial license.
Strengths
- Deep reasoning at frontier scale
- Strong math benchmarks
Weaknesses
- Long reasoning blocks add latency: thousands of thinking tokens are decoded before the answer appears
- Restricted commercial license
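To see why long thinking blocks hurt wall-clock time, here is a back-of-the-envelope sketch. It assumes decode time scales linearly with generated tokens and ignores prompt processing (prefill). The 20 tokens/s decode rate and the 300-token answer are hypothetical figures, not measurements of Kimi K1.5.

```python
def reasoning_latency_s(reasoning_tokens: int, answer_tokens: int, decode_tok_per_s: float) -> float:
    """Seconds spent decoding thinking + answer tokens, ignoring prefill time."""
    if decode_tok_per_s <= 0:
        raise ValueError("decode rate must be positive")
    return (reasoning_tokens + answer_tokens) / decode_tok_per_s

# Hypothetical numbers: a 5,000-token thinking block, a 300-token answer,
# and an assumed local decode speed of 20 tokens/s.
print(f"{reasoning_latency_s(5000, 300, 20.0):.0f} s")  # 265 s
```

Under these assumptions the thinking block accounts for over 94% of the wait, which is why per-query latency matters more here than for a non-reasoning model of similar size.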
Quantization variants
Each quantization trades model quality for a smaller file size and VRAM footprint. Q4_K_M is a popular starting point for GGUF models, but the only build listed for Kimi K1.5 is AWQ-INT4.
| Quantization | File size | VRAM required |
|---|---|---|
| AWQ-INT4 | 115.0 GB | 140 GB |
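A minimal sketch of choosing a quantization for a given VRAM budget, using the row from the table above. It assumes, as a rough heuristic rather than a rule, that the larger file among the builds that fit is the higher-quality one. The `Quant` type and `pick_quant` helper are hypothetical names for illustration.

```python
from typing import List, NamedTuple, Optional


class Quant(NamedTuple):
    name: str
    file_gb: float
    vram_gb: float


# Row copied from the quantization table; extend the list if more builds appear.
QUANTS: List[Quant] = [Quant("AWQ-INT4", 115.0, 140.0)]


def pick_quant(vram_budget_gb: float, quants: List[Quant] = QUANTS) -> Optional[Quant]:
    """Return the largest build whose VRAM requirement fits the budget, or None."""
    fitting = [q for q in quants if q.vram_gb <= vram_budget_gb]
    # Heuristic: a bigger file usually means more bits per weight, so higher quality.
    return max(fitting, key=lambda q: q.file_gb, default=None)


print(pick_quant(160))  # Quant(name='AWQ-INT4', file_gb=115.0, vram_gb=140.0)
print(pick_quant(96))   # None
```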
Get the model
Hugging Face
Original weights
Source repository; no pre-quantized build is provided, so you need to quantize the weights yourself.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Kimi K1.5.
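For a model this size, "enough VRAM" usually means several cards. This sketch estimates the minimum number of identical GPUs for the 140 GB AWQ-INT4 requirement. It assumes the weights can be split across cards (tensor or pipeline parallelism) with no extra per-card overhead, so treat the result as a lower bound. The 24/48/80 GB card sizes are illustrative.

```python
import math


def cards_needed(vram_required_gb: float, vram_per_card_gb: float) -> int:
    """Minimum number of identical GPUs whose combined VRAM covers the requirement.

    Assumes perfect splitting across cards; real runtimes need some headroom.
    """
    if vram_per_card_gb <= 0:
        raise ValueError("per-card VRAM must be positive")
    return math.ceil(vram_required_gb / vram_per_card_gb)


# 140 GB is the AWQ-INT4 requirement from the quantization table.
for card_gb in (24, 48, 80):
    print(f"{card_gb} GB cards: {cards_needed(140, card_gb)}")
# 24 GB cards: 6
# 48 GB cards: 3
# 80 GB cards: 2
```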
Models worth comparing
Models in the same parameter band, plus one tier above and one below, to help you decide what fits your hardware.
Frequently asked
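What's the minimum VRAM to run Kimi K1.5?
About 140 GB, for the AWQ-INT4 build in the quantization table; no smaller quantization is listed.
Can I use Kimi K1.5 commercially?
Only within the terms of its restricted commercial license.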
What's the context length of Kimi K1.5?
Source: huggingface.co/moonshotai/Kimi-K1.5
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.