What's the best coding agent for local models (Ollama / llama.cpp)?
The answer
One paragraph. No hedging beyond what the data actually warrants.
Five real options, ranked by how much I actually use them:
1. Aider — terminal-native, git-aware, surgical. Reads your code, proposes changes as git diffs, and uses an edit format that applies cleanly even on mid-tier local models. The killer app on the CLI side. Sweet spot: Qwen 2.5 Coder 32B on a 24GB GPU.
2. Cline — VS Code extension that runs a full agent loop locally. Plan → read → propose → ask permission → write → run → verify. Excellent permission UX. First-class Ollama support. Heavier on tokens than Aider — local models with weak context handling can struggle.
3. Continue — autocomplete + chat for VS Code and JetBrains. Open-source rival to Cursor / Copilot. Configurable to use Ollama / llama.cpp / vLLM. Default config nudges you toward local. JetBrains support is on par with VS Code — rare in this space.
4. Tabby — self-hosted coding-agent server with SSO, audit logs, team dashboards. Enterprise-leaning. Pick this when you need to deploy local AI to a team of 20+ and prove what was generated by whom.
5. Twinny — minimal-surface VS Code extension purpose-built for Ollama. Tighter integration than Continue for the autocomplete-only case, lower latency. Good "just give me Copilot but local" pick.
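For the terminal-side pick above, the whole setup reduces to an env var and one command once Ollama is serving. A minimal launch sketch — the `ollama_chat/` prefix and `OLLAMA_API_BASE` variable match Aider's Ollama docs as I recall, and the model tag is an assumption; substitute whatever `ollama list` shows on your machine:

```shell
# Point Aider at a local Ollama server (default port assumed).
export OLLAMA_API_BASE="http://127.0.0.1:11434"

# The command you'd run from inside a git repo once the server is up.
# Model tag is an assumption -- swap in your own pulled model.
AIDER_CMD="aider --model ollama_chat/qwen2.5-coder:32b"
echo "$AIDER_CMD"
```

Cline and Continue need the same base URL in their settings; only the config surface differs.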
Model pairing matters more than the agent:
- 24GB GPU + Qwen 2.5 Coder 32B Q4_K_M → Aider / Cline / Continue all work well.
- 16GB GPU + DeepSeek Coder 6.7B Q4_K_M (FIM) → Twinny or Continue's autocomplete shines; agent mode struggles.
- 12GB GPU + Qwen 2.5 Coder 7B Q4_K_M → autocomplete only; agent loops will fight you.
The misconception: "I'll use Cursor with my local backend." Cursor's local-backend support has historically been fragile, and requests have still depended on its cloud even when routed to a local model. Pick a native-local agent instead.
Where we got the numbers
All five agents have full editorial pages in /apps with hands-on verdicts. Model-pairing thresholds come from community runlocalai-bench submissions plus my own runs (May 2026).
Also see
- Setup, sweet-spot model pairing, beginner mistakes.
- Permission UX, plan mode, when it shines vs. when it struggles.
- The quantization decision specifically for multi-step agent loops.
- Hardware + runtime + model + agent pairing that ships.
Other questions in this thread
Other /q/ landings on the same topic — same editorial discipline.
Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.