RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
BLK · CODING MODELScoder-specialised · HumanEval+ · MBPP+

Local coding models

Coder-specialised open-weight models that run on your own hardware — the Qwen2.5-Coder family, DeepSeek-Coder, StarCoder2, Yi-Coder, CodeLlama. Filtered by HumanEval+ and MBPP+ scores where we've benchmarked.

Models curated
30
Vendors
13
Commercial OK
30/30
Benchmarked
0/30

Coding-tuned LLMs aren't just smaller versions of general-purpose chat models — they're trained on much more code, often with fill-in-the-middle objectives, and they post-train with execution feedback. The result is dramatic per-parameter strength on coding benchmarks compared to general models at the same size.

Headline benchmark on this laptop (RTX 3080, 16GB): qwen-2.5-coder-7b-instruct scored 81.1 HumanEval+ pass@1 + 66.9 MBPP+ pass@1 — comparable to commercial models 5-10x its size. The pattern repeats across the coder family: smaller models hit much higher coding scores than their general-chat siblings.

Each row links to the model's full operator notes including the actual prompting kit, recommended quantization, and benchmark scores (HumanEval+, MBPP+) we've run. Filter by 'commercial OK' if the license matters.

FAM · DEEPSEEK

DeepSeek-based

9 models
DeepSeek V4 Pro (1.6T MoE)
1600B params · DeepSeek
▸ frontier-tier coding + reasoning serving — currently the open-weight ceiling

DeepSeek's April 2026 frontier flagship. 1.6T total / 49B active MoE with hybrid Compressed Sparse Attention + Heavily Compressed Attention. 1M context window. Closes most of the gap with Claude Opus 4.6 on coding while

License
MIT · OK
Context
1024K
DeepSeek V4 Flash (284B MoE)
284B params · DeepSeek
▸ datacenter MoE — V4 efficiency variant

The cost-efficient sibling of V4-Pro. 284B total / 13B active MoE, same hybrid CSA+HCA attention, same 1M context. The MoE active-param ratio (4.5%) makes it surprisingly fast for its nameplate size — practical on dual A

License
MIT · OK
Context
1024K
DeepSeek V4
745B params · DeepSeek AI
▸ frontier-tier reasoning on multi-machine clusters

DeepSeek's spring 2026 frontier MoE. 745B total / 38B active. The current open-weight benchmark leader on coding + math; closes the gap with closed-source flagships on reasoning.

License
DeepSeek License · OK
Context
128K
DeepSeek V2.5 236B
236B params · DeepSeek
▸ DeepSeek lineage reference — pre-V3

DeepSeek V2.5 — merged V2 chat + Coder. Pre-V3 baseline; 21B active MoE.

License
DeepSeek License · OK
Context
128K
DeepSeek R1 Distill Qwen 3 32B
32B params · DeepSeek AI
▸ workstation reasoning with Qwen 3 base improvements

Newer R1 distill on a Qwen 3 base. Combines R1 reasoning with Qwen 3's reasoning-toggle architecture. Apache 2.0.

License
Apache 2.0 · OK
Context
128K
DeepSeek R1 Distill Llama 8B
8B params · DeepSeek AI
▸ consumer-tier reasoning on 8GB+ GPUs

R1 reasoning distilled into a Llama 3 8B base. Smaller R1 distill; useful when 32B is too heavy. Reasoning quality is meaningfully below the 32B distill but still beats non-reasoning Llama 8B on math/code.

License
Apache 2.0 · OK
Context
128K
DeepSeek Coder V2 236B
236B params · DeepSeek
▸ datacenter-tier MoE coding

Full DeepSeek Coder V2. 236B total / 21B active MoE coder.

License
DeepSeek License · OK
Context
128K
DeepSeek Coder V3
33B params · DeepSeek AI
▸ workstation coding alternative to Qwen 2.5 Coder

DeepSeek's coder line successor. Dense 33B; competitive with Qwen 2.5 Coder 32B on SWE-Bench.

License
DeepSeek License · OK
Context
128K
DeepSeek V3 Lite (16B MoE)
16B params · DeepSeek AI
▸ consumer-tier MoE inference

Distillation of DeepSeek V3 to a smaller MoE. 16B total / 2.4B active. Captures most of V3's reasoning at consumer-card-friendly memory.

License
DeepSeek License · OK
Context
128K
FAM · QWEN

Qwen-based

8 models
Qwen 3.6 35B-A3B (MTP)
35B params · Alibaba / Qwen team
judged 8.0/10
▸ high-throughput MoE inference at workstation tier

Qwen 3.6 35B-A3B with Multi-Token Prediction (MTP). The "A3B" suffix means ~3B activated parameters per token via Mixture-of-Experts — inference cost stays mid-tier while total parameter count climbs to 35B. MTP enables

License
Apache-2.0 · OK
Context
256K
Qwen 3.6 27B (MTP)
27B params · Alibaba / Qwen team
judged 8.0/10
▸ dense workstation model with throughput-acceleration

Qwen 3.6 27B dense (not MoE) with Multi-Token Prediction. Sits between the 14B and 35B-A3B as a "single dense model with MTP throughput acceleration." Targets workloads where the MoE activated-param dance isn't ideal but

License
Apache-2.0 · OK
Context
128K
Qwen 2.5 Coder 14B Instruct
14B params · Alibaba
▸ 16GB-VRAM coding

Coding-specialized Qwen 2.5 at 14B. The 16GB-VRAM tier coding model — fits comfortably with 8K context.

License
Apache 2.0 · OK
Context
128K
CodeQwen 1.5 7B
7B params · Alibaba
▸ historical reference — Qwen 2.5 Coder 7B is the modern pick

CodeQwen 1.5 — Qwen Coder predecessor. Superseded by Qwen 2.5 Coder for new deployments.

License
Tongyi Qianwen L · OK
Context
64K
Qwen 3 Coder 32B
32B params · Alibaba
▸ coding-specialized agent workloads

Coding-specialized fine-tune of Qwen 3 32B. Curated coding corpus; outperforms Qwen 2.5 Coder 32B on SWE-Bench by ~6 points. Apache 2.0.

License
Apache 2.0 · OK
Context
128K
Qwen 2.5 Coder 7B Instruct
7B params · Alibaba
▸ consumer-tier coding at 8GB VRAM

Coding-specialized Qwen 2.5 at 7B. The 8-12GB-VRAM coding model — entry-tier autocomplete + IDE assistant. Smaller sibling of the 14B / 32B Coder line.

License
Apache 2.0 · OK
Context
128K
Qwen 2.5 Coder 1.5B
1.5B params · Alibaba
▸ IDE autocomplete on integrated GPUs

Smallest Qwen 2.5 Coder. Targets edge / autocomplete on integrated GPUs and Apple Silicon laptops.

License
Apache 2.0 · OK
Context
32K
Qwen 2.5 Coder 3B
3B params · Alibaba
▸ Apple Silicon laptop coding autocomplete

Compact Qwen 2.5 Coder. Sweet spot for laptop autocomplete and small refactor agents.

License
Apache 2.0 · OK
Context
32K
FAM · OTHER

Other / from-scratch

5 models
Ring-2.6-1T
1000B params · InclusionAI / Ant Group
judged 8.0/10
▸ frontier reasoning at MoE serving cost

InclusionAI's Ring-2.6-1T is a 1 trillion parameter Mixture-of-Experts model with ~32B activated parameters per token, released by Ant Group's AI research arm. Targets frontier reasoning and code at MoE serving cost. Apa

License
Apache-2.0 · OK
Context
125K
Sarvam 105B
105B params · sarvamai
judged 9.3/10
▸ Hindi and Indian-language reasoning or agentic workflows

Sarvam-105B is a Mixture-of-Experts model with 105B total parameters but only 10.3B active at inference time. It targets reasoning, coding, and agentic tasks with coverage across 22 Indian languages. Apache 2.0 licensed

License
apache-2.0 · OK
Context
125K
StarCoder 2 3B
3B params · BigCode
▸ edge-tier code completion

BigCode's StarCoder 2 at 3B. Trained on The Stack v2 with 600+ programming languages.

License
BigCode OpenRAIL · OK
Context
16K
StarCoder 2 15B
15B params · BigCode
▸ permissively-licensed coding at 16GB-VRAM

StarCoder 2 flagship. The largest BigCode coder; 16k context with strong fill-in-middle.

License
BigCode OpenRAIL · OK
Context
16K
StarCoder 2 7B
7B params · BigCode
▸ consumer-tier code completion at 8GB

Mid-size StarCoder 2. The 8GB-VRAM autocomplete pick.

License
BigCode OpenRAIL · OK
Context
16K
FAM · LLAMA

Llama-based

4 models
Salamandra 2B
2.25B params · BSC-LT
judged 9.4/10
▸ Fine-tuning base for Spanish or Catalan/Galician/Basque NLP tasks

Salamandra 2B is a base-only transformer trained from scratch by Barcelona Supercomputing Center on 12.875 trillion tokens across 35 European languages and code. At 2.25B parameters and an 8192-token context window, it i

License
apache-2.0 · OK
Context
8K
Salamandra 7B
7B params · BSC-LT
judged 9.4/10
▸ Spanish/Catalan fine-tuning base for custom NLP pipelines

Salamandra 7B is a base language model from Barcelona Supercomputing Center, pretrained on 12.875 trillion tokens across 35 European languages and code. It is not instruction-tuned — this is a raw foundation model. Apach

License
apache-2.0 · OK
Context
8K
Llama 4 70B
70B params · Meta
▸ production self-hosted serving on 2x A100 / H100

Llama 4 dense at 70B. Drop-in successor to Llama 3.3 70B; same hardware envelope, better on reasoning benchmarks.

License
Llama 4 Communit · OK
Context
128K
Phind CodeLlama 34B v2
34B params · Phind
▸ historical reference for Llama 2 coder lineage

Phind's CodeLlama-derived coder at 34B. Older release; retained for historical / continuity value. Newer Qwen Coder lineage has surpassed it.

License
Llama 2 Communit · OK
Context
16K
FAM · MISTRAL

Mistral-based

2 models
Devstral Small 2 24B
24B params · Mistral AI
▸ Apache 2.0 coding alternative to Qwen 2.5 Coder

Mistral's coding-specialized Mistral Small 2 successor. Apache 2.0 — the rare commercial-OK Mistral coder.

License
Apache 2.0 · OK
Context
128K
Codestral Mamba 7B
7B params · Mistral AI
▸ long-context coding workloads where memory matters

Mistral's Mamba (state-space) architecture coding model. Linear inference cost — the architectural alternative to attention-based coding models. Apache 2.0.

License
Apache 2.0 · OK
Context
250K
FAM · OPENCODER

opencoder

1 model
OpenCoder 8B
8B params · INFLY AI
▸ academic / reproducibility-sensitive coding research

Fully-open coding model — training data + recipes published. Apache 2.0 with verifiable open-data lineage. The right pick for academic / reproducibility-sensitive work.

License
Apache 2.0 · OK
Context
32K
FAM · YI

Yi-based

1 model
Yi Coder 9B
9B params · 01.AI
▸ 8GB-VRAM coding

01.AI's coding specialization at 9B. Apache 2.0; positioned as a lighter alternative to Qwen 2.5 Coder for the 16GB tier.

License
Apache 2.0 · OK
Context
128K
COVERAGE

Coding agent setup?

Cross-reference the model rows here with the runtime guidance at /apps for Cline, Aider, Continue, OpenInterpreter, and Claude Code adapters. The 'best coding agent for local models' Q&A digs into the actual workflows. Read the coding-agent comparison.