Local coding models

Coding-tuned LLMs aren't just smaller versions of general-purpose chat models: they're trained on far more code, often with fill-in-the-middle objectives, and they are post-trained with execution feedback. The result is markedly higher coding-benchmark scores per parameter than general models of the same size.
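To make the fill-in-the-middle idea concrete, here is a minimal sketch of how such a prompt can be assembled. It assumes the sentinel tokens documented for Qwen2.5-Coder (`<|fim_prefix|>`, `<|fim_suffix|>`, `<|fim_middle|>`); other model families use different tokens and sometimes a different ordering, so check the model card before reusing it. The helper name `build_fim_prompt` is hypothetical.

```python
# Sentinel tokens as documented for Qwen2.5-Coder (assumption: other
# families use different tokens; check the model card).
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Arrange the code before and after the gap in prefix-suffix-middle order.

    The model is expected to generate the missing middle after FIM_MIDDLE.
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Ask the model to fill in the body of the return statement.
prompt = build_fim_prompt(
    prefix="def add(a, b):\n    return ",
    suffix="\n\nprint(add(1, 2))\n",
)
```

The resulting string is sent as a raw completion prompt, not through a chat template, since the chat wrapper would otherwise surround the sentinels with role markers.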

Headline benchmark on this laptop (RTX 3080, 16GB): qwen-2.5-coder-7b-instruct scored 81.1 on HumanEval+ pass@1 and 66.9 on MBPP+ pass@1, comparable to commercial models 5-10x its size. The pattern repeats across the coder family: small coder models score much higher on coding benchmarks than their general-chat siblings.
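For readers unfamiliar with the metric, pass@1 is the share of problems solved by a single sampled completion. A minimal sketch of the standard unbiased pass@k estimator, introduced with the original HumanEval benchmark, is shown here. The per-problem counts in the example are made up for illustration and are not our benchmark data, and this is not necessarily how our harness computes the score.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled, c: completions that pass all tests, k: budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical (n, c) counts for three problems, illustration only.
results = [(10, 10), (10, 3), (10, 0)]
score = sum(pass_at_k(n, c, k=1) for n, c in results) / len(results)
```

With k=1 the estimator reduces to c/n per problem, so the benchmark score is the mean pass rate across problems, usually reported as a percentage.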

Each row links to the model's full operator notes, including the prompting kit, recommended quantization, and the benchmark scores (HumanEval+, MBPP+) we've measured. Filter by 'commercial OK' if the license matters.

DeepSeek-based

Qwen-based

Other / from-scratch

Llama-based

Mistral-based

OpenCoder

Yi-based

Coding agent setup?