Coder-specialised open-weight models that run on your own hardware — the Qwen2.5-Coder family, DeepSeek-Coder, StarCoder2, Yi-Coder, CodeLlama. Filtered by HumanEval+ and MBPP+ scores where we've benchmarked.
Coding-tuned LLMs aren't just smaller versions of general-purpose chat models: they're trained on much more code, often with fill-in-the-middle objectives, and they are post-trained with execution feedback. The result is markedly higher coding-benchmark scores per parameter than general models of the same size.
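To make the fill-in-the-middle idea concrete, here is a minimal sketch of how an editor plugin might assemble an FIM prompt. The sentinel tokens are the ones published for Qwen2.5-Coder and StarCoder2; the `build_fim_prompt` helper and the `FIM_TOKENS` table are illustrative names, not part of any library, and other model families may use different tokens or orderings.

```python
# Sentinel tokens differ per model family. These two sets are the ones
# published for Qwen2.5-Coder and StarCoder2; check the tokenizer config of
# the exact checkpoint you run before relying on them.
FIM_TOKENS = {
    "qwen2.5-coder": ("<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"),
    "starcoder2": ("<fim_prefix>", "<fim_suffix>", "<fim_middle>"),
}


def build_fim_prompt(prefix: str, suffix: str, family: str = "qwen2.5-coder") -> str:
    """Arrange the code before and after the cursor in prefix-suffix-middle order."""
    pre, suf, mid = FIM_TOKENS[family]
    return f"{pre}{prefix}{suf}{suffix}{mid}"


# The model is expected to generate the missing middle (here, something like "a + b").
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(2, 3))\n",
)
print(prompt)
```

The prompt goes to a raw completion endpoint rather than a chat template, and generation stops at the model's end-of-text token. FIM usually works best with base (non-instruct) checkpoints.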
Headline benchmark on this laptop (RTX 3080, 16GB): qwen-2.5-coder-7b-instruct scored 81.1 HumanEval+ pass@1 and 66.9 MBPP+ pass@1, comparable to commercial models 5-10x its size. The pattern repeats across the coder family: small coder models score much higher on coding benchmarks than their general-chat siblings.
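For readers unfamiliar with the metric: pass@1 is the share of benchmark problems whose generated solution passes every test. The sketch below uses the standard unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), which reduces to a plain pass rate when one sample is drawn per problem. How the scores above were sampled (greedy or otherwise) is not stated here, so treat the setup and the toy data as assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem: 1 - C(n-c, k) / C(n, k).

    n = samples generated, c = samples that passed all tests, k = sample budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int = 1) -> float:
    """Average per-problem pass@k over the benchmark, as a percentage."""
    return 100.0 * sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical results: (samples generated, samples that passed) per problem.
toy_results = [(1, 1), (1, 0), (1, 1), (1, 1)]
print(benchmark_score(toy_results, k=1))
```

With one sample per problem, three passes out of four problems gives 75.0, the same kind of number as the 81.1 and 66.9 quoted above.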
Each row links to the model's full operator notes including the actual prompting kit, recommended quantization, and benchmark scores (HumanEval+, MBPP+) we've run. Filter by 'commercial OK' if the license matters.
DeepSeek's April 2026 frontier flagship. 1.6T total / 49B active MoE with hybrid Compressed Sparse Attention + Heavily Compressed Attention. 1M context window. Closes most of the gap with Claude Opus 4.6 on coding while
The cost-efficient sibling of V4-Pro. 284B total / 13B active MoE, same hybrid CSA+HCA attention, same 1M context. The MoE active-param ratio (about 4.6%) makes it surprisingly fast for its nameplate size, practical on dual A
DeepSeek's spring 2026 frontier MoE. 745B total / 38B active. The current open-weight benchmark leader on coding + math; closes the gap with closed-source flagships on reasoning.
DeepSeek V2.5 — merged V2 chat + Coder. Pre-V3 baseline; 21B active MoE.
Newer R1 distill on a Qwen 3 base. Combines R1 reasoning with Qwen 3's reasoning-toggle architecture. Apache 2.0.
R1 reasoning distilled into a Llama 3 8B base. Smaller R1 distill; useful when 32B is too heavy. Reasoning quality is meaningfully below the 32B distill but still beats non-reasoning Llama 8B on math/code.
Full DeepSeek Coder V2. 236B total / 21B active MoE coder.
DeepSeek's coder line successor. Dense 33B; competitive with Qwen 2.5 Coder 32B on SWE-Bench.
Distillation of DeepSeek V3 to a smaller MoE. 16B total / 2.4B active. Captures most of V3's reasoning at consumer-card-friendly memory.
Qwen 3.6 35B-A3B with Multi-Token Prediction (MTP). The "A3B" suffix means ~3B activated parameters per token via Mixture-of-Experts — inference cost stays mid-tier while total parameter count climbs to 35B. MTP enables
Qwen 3.6 27B dense (not MoE) with Multi-Token Prediction. Sits between the 14B and 35B-A3B as a "single dense model with MTP throughput acceleration." Targets workloads where the MoE activated-param dance isn't ideal but
Coding-specialized Qwen 2.5 at 14B. The 16GB-VRAM tier coding model — fits comfortably with 8K context.
CodeQwen 1.5 — Qwen Coder predecessor. Superseded by Qwen 2.5 Coder for new deployments.
Coding-specialized fine-tune of Qwen 3 32B. Curated coding corpus; outperforms Qwen 2.5 Coder 32B on SWE-Bench by ~6 points. Apache 2.0.
Coding-specialized Qwen 2.5 at 7B. The 8-12GB-VRAM coding model — entry-tier autocomplete + IDE assistant. Smaller sibling of the 14B / 32B Coder line.
Smallest Qwen 2.5 Coder. Targets edge / autocomplete on integrated GPUs and Apple Silicon laptops.
Compact Qwen 2.5 Coder. Sweet spot for laptop autocomplete and small refactor agents.
InclusionAI's Ring-2.6-1T is a 1 trillion parameter Mixture-of-Experts model with ~32B activated parameters per token, released by Ant Group's AI research arm. Targets frontier reasoning and code at MoE serving cost. Apa
Sarvam-105B is a Mixture-of-Experts model with 105B total parameters but only 10.3B active at inference time. It targets reasoning, coding, and agentic tasks with coverage across 22 Indian languages. Apache 2.0 licensed
BigCode's StarCoder 2 at 3B. Trained on a 17-language subset of The Stack v2 (the full dataset covers 600+ programming languages).
StarCoder 2 flagship. The largest BigCode coder; 16K context with strong fill-in-the-middle support.
Mid-size StarCoder 2. The 8GB-VRAM autocomplete pick.
Salamandra 2B is a base-only transformer trained from scratch by Barcelona Supercomputing Center on 12.875 trillion tokens across 35 European languages and code. At 2.25B parameters and an 8192-token context window, it i
Salamandra 7B is a base language model from Barcelona Supercomputing Center, pretrained on 12.875 trillion tokens across 35 European languages and code. It is not instruction-tuned — this is a raw foundation model. Apach
Llama 4 dense at 70B. Drop-in successor to Llama 3.3 70B; same hardware envelope, better on reasoning benchmarks.
Phind's CodeLlama-derived coder at 34B. Older release; retained for historical / continuity value. The newer Qwen Coder lineage has surpassed it.
Mistral's coding-specialized Mistral Small 2 successor. Apache 2.0 — the rare commercial-OK Mistral coder.
Mistral's Mamba (state-space) architecture coding model. Inference cost scales linearly with sequence length, which makes it the architectural alternative to attention-based coding models. Apache 2.0.
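The "total / active" figures in the MoE rows above govern two different things: active parameters drive per-token compute, while total parameters drive how much memory the weights occupy. The sketch below works through that arithmetic for the rows that quote both numbers. The row labels are descriptive placeholders, and the 4.5 bits-per-weight figure is an assumed average for a typical 4-bit quantization, not a measured value; KV cache and runtime overhead are excluded.

```python
# (label, total params in billions, active params in billions), copied from
# the MoE rows above. Labels are descriptive placeholders, not model IDs.
MOE_ROWS = [
    ("1.6T / 49B row", 1600.0, 49.0),
    ("284B / 13B row", 284.0, 13.0),
    ("745B / 38B row", 745.0, 38.0),
    ("236B / 21B row", 236.0, 21.0),
    ("16B / 2.4B row", 16.0, 2.4),
]


def active_ratio(total_b: float, active_b: float) -> float:
    """Fraction of all parameters that are used for each token."""
    return active_b / total_b


def weight_gb(total_b: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1e9 bytes); excludes KV cache."""
    return total_b * bits_per_weight / 8


for label, total_b, active_b in MOE_ROWS:
    # 4.5 bits/weight is an assumption standing in for a generic 4-bit quant.
    print(f"{label}: {active_ratio(total_b, active_b):.1%} active, "
          f"~{weight_gb(total_b, 4.5):.0f} GB of weights at 4.5 bits/weight")
```

The same `weight_gb` arithmetic applies to the dense rows: a 14B model at 4 bits per weight needs roughly 7 GB for weights alone, which leaves room for context on a 16GB card.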
Cross-reference the model rows here with the runtime guidance at /apps for Cline, Aider, Continue, OpenInterpreter, and Claude Code adapters. The 'best coding agent for local models' Q&A digs into the actual workflows. Read the coding-agent comparison.