What's the best coding agent for local models (Ollama / llama.cpp)?
The answer
One paragraph. No hedging beyond what the data actually warrants.
Five real options, ranked by how much I actually use them:
1. Aider — terminal-native, git-aware, surgical. Reads your code, proposes changes as git diffs, and uses an edit format that applies cleanly even on mid-tier local models. The killer app on the CLI side. Sweet spot: Qwen 2.5 Coder 32B on a 24GB GPU.
2. Cline — VS Code extension that runs a full agent loop locally. Plan → read → propose → ask permission → write → run → verify. Excellent permission UX. First-class Ollama support. Heavier on tokens than Aider — local models with weak context handling can struggle.
3. Continue — autocomplete + chat for VS Code and JetBrains. Open-source rival to Cursor / Copilot. Configurable to use Ollama / llama.cpp / vLLM. Default config nudges you toward local. JetBrains support is on par with VS Code — rare in this space.
4. Tabby — self-hosted coding-agent server with SSO, audit logs, team dashboards. Enterprise-leaning. Pick this when you need to deploy local AI to a team of 20+ and prove what was generated by whom.
5. Twinny — minimal-surface VS Code extension purpose-built for Ollama. Tighter integration than Continue for the autocomplete-only case, lower latency. Good "just give me Copilot but local" pick.
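For the terminal-side pick above, the whole setup reduces to an env var and one command once Ollama is serving. A minimal launch sketch — the `ollama_chat/` prefix and `OLLAMA_API_BASE` variable match Aider's Ollama docs as I recall, and the model tag is an assumption; substitute whatever `ollama list` shows on your machine:

```shell
# Point Aider at a local Ollama server (default port assumed).
export OLLAMA_API_BASE="http://127.0.0.1:11434"

# The command you'd run from inside a git repo once the server is up.
# Model tag is an assumption -- swap in your own pulled model.
AIDER_CMD="aider --model ollama_chat/qwen2.5-coder:32b"
echo "$AIDER_CMD"
```

Cline and Continue need the same base URL in their settings; only the config surface differs.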
Model pairing matters more than the agent:
- 24GB GPU + Qwen 2.5 Coder 32B Q4_K_M → Aider / Cline / Continue all work well.
- 16GB GPU + DeepSeek Coder 6.7B Q4_K_M (FIM) → Twinny or Continue's autocomplete shines; agent mode struggles.
- 12GB GPU + Qwen 2.5 Coder 7B Q4_K_M → autocomplete only; agent loops will fight you.
The misconception: "I'll use Cursor with my local backend." Cursor's local-backend support has historically been fragile, and requests have still depended on its cloud even when routed to a local model. Pick a native-local agent instead.
Where we got the numbers
All five agents have full editorial pages in /apps with hands-on verdicts. Model-pairing thresholds come from community runlocalai-bench submissions plus my own runs (May 2026).
Also see
- Setup, sweet-spot model pairing, beginner mistakes.
- Permission UX, plan mode, when it shines vs. when it struggles.
- The quantization decision specifically for multi-step agent loops.
- Hardware + runtime + model + agent pairing that ships.
Other questions in this thread
Other /q/ landings on the same topic — same editorial discipline.
Found this via a forum search? Bookmark the URL — we update these pages as new data lands. Have a question that should live here? Open a GitHub issue.