Kimi K2.7-Code
Kimi K2.7-Code is a coding-specialized open-weight Mixture-of-Experts model from Moonshot AI (Hugging Face `moonshotai/Kimi-K2.7-Code`, 2026-06), built on K2.6 with ~1T total / ~32B active parameters and a ~256K-token context. It uses a forced-thinking mode and is reported to use ~30% fewer reasoning tokens than K2.6. It is released under a Modified MIT License: commercial use is permitted, and attribution is required only above 100M MAU / $20M monthly revenue. This continues K2.6's terms; it is not a switch from Apache 2.0. Self-hosting is steep (~577 GB VRAM at INT4). All benchmark figures are Moonshot in-house and were not independently verified at release. (Core specs are corroborated across multiple secondary sources; the HF card could not be fetched directly during verification.)
Positioning
Kimi K2.7-Code is Moonshot AI's June 2026 coding specialist: a ~1T-parameter / ~32B-active MoE built on Kimi K2.6, tuned for long-horizon software engineering with a ~256K-token context and a forced-thinking mode.
What stands out
The efficiency angle is the pitch: Moonshot reports that the model reaches strong agentic-coding scores while using ~30% fewer reasoning tokens than K2.6, which matters if you pay per token or run it locally. The Modified MIT license is genuinely permissive (attribution required only above 100M MAU / $20M monthly revenue), and it continues K2.6's terms rather than switching from Apache 2.0.
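A rough sketch of what the reported ~30% reduction could mean for a per-token bill. The token counts below are hypothetical, and the helper `reasoning_savings` is ours, not part of any Moonshot tooling. It assumes the reduction applies only to reasoning tokens, so prompt and final-answer tokens are unchanged.

```python
def reasoning_savings(prompt_tokens, reasoning_tokens, answer_tokens, reduction=0.30):
    """Return (tokens_before, tokens_after, fraction_saved) for one request.

    Assumption: only the reasoning tokens shrink by `reduction`;
    prompt and answer tokens stay the same.
    """
    before = prompt_tokens + reasoning_tokens + answer_tokens
    after = prompt_tokens + reasoning_tokens * (1 - reduction) + answer_tokens
    return before, after, (before - after) / before


# Hypothetical agentic-coding request: 4,000 prompt, 10,000 reasoning, 1,000 answer tokens.
before, after, saved = reasoning_savings(4000, 10000, 1000)
print(f"{before} -> {after:.0f} tokens ({saved:.0%} fewer overall)")
```

In this made-up request the total drops from 15,000 to 12,000 tokens, a 20% saving overall rather than 30%. The saving approaches 30% only when reasoning dominates the request. Providers often price input and output tokens differently, which this sketch ignores.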
Honest caveats
All numbers are Moonshot in-house; no independent SWE-bench results existed at release, and we have not reproduced them. Self-hosting is steep: roughly 577 GB VRAM at INT4, which means a multi-GPU, server-class machine. Most users will reach it through a hosted API or by running vLLM on a cluster. (Core specs here are corroborated across multiple sources; we could not fetch the HF card directly during verification.)
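A back-of-envelope check on the INT4 figure, as a sketch only. It assumes exactly 1e12 parameters stored at 4 bits each. The real parameter count is approximate, and this review does not break down what makes up the rest of the quoted 577 GB. Because an MoE model must keep every expert loaded, memory scales with the ~1T total parameters, not the ~32B active ones.

```python
def weight_memory_gb(total_params, bits_per_weight):
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 1e12      # ~1T total parameters (approximate, from the specs above)
REPORTED_INT4_GB = 577   # footprint quoted in this review

int4_weights = weight_memory_gb(TOTAL_PARAMS, 4)
gap = REPORTED_INT4_GB - int4_weights

print(f"INT4 weights only: ~{int4_weights:.0f} GB")
print(f"Gap to the quoted figure: ~{gap:.0f} GB")
```

Under these assumptions the weights alone come to about 500 GB, so the quoted 577 GB is in the range you would expect once anything beyond the raw 4-bit weights is counted.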
Verdict
Run it if you want a permissively licensed 1T-class coding model for agentic software work and you have server-class hardware or use a hosted provider. Skip local self-hosting on anything smaller; the footprint is enormous. For coding specifically it is one of the strongest open options of the month, with the usual caveat that the benchmarks are vendor-only.
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
Get the model
Hugging Face
Original weights
Source repository; you will need to quantize the weights yourself.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Kimi K2.7-Code.
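No single card holds the ~577 GB quoted for INT4, so the practical question is how many cards it takes. The sketch below gives a lower bound only. The card list is our own illustration using vendor-published memory capacities, not this page's hardware list, and the count ignores KV cache, activations, and any constraint your serving stack places on GPU counts.

```python
import math

REQUIRED_GB = 577  # INT4 footprint quoted in this review

# Per-card memory in GB for a few illustrative accelerators.
CARDS = {
    "RTX 4090": 24,
    "A100 80GB": 80,
    "H100 80GB": 80,
    "H200": 141,
    "MI300X": 192,
}


def cards_needed(required_gb, card_gb):
    """Minimum number of cards whose combined memory covers required_gb."""
    return math.ceil(required_gb / card_gb)


for name, gb in CARDS.items():
    print(f"{name} ({gb} GB): at least {cards_needed(REQUIRED_GB, gb)} cards")
```

By this count, 80 GB cards need at least eight units, which matches the "multi-GPU server-class" description; consumer 24 GB cards would need at least 25, which is not a realistic setup.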
Models worth comparing
Models in the same parameter band, plus one tier above and one below, so you can decide what actually fits your hardware.
Frequently asked
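Can I use Kimi K2.7-Code commercially?
Yes. The Modified MIT License permits commercial use; attribution is required only above 100M MAU / $20M monthly revenue.
What's the context length of Kimi K2.7-Code?
About 256K tokens.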
Source: huggingface.co/moonshotai/Kimi-K2.7-Code
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.