Local AI tools
46 tools reviewed. Runners, GUIs, and servers for every workflow.
Stable Diffusion WebUI (AUTOMATIC1111)
The original Stable Diffusion frontend. Less actively developed in 2026 than ComfyUI, but still the cleanest UX for simple generation.
Ollama
The default first-pull tool for local AI. One-line model installs (`ollama run llama3.1`), an OpenAI-compatible HTTP API, good defaults out of the box. Built on llama.cpp.
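Because the API is OpenAI-compatible, you can talk to a local model with nothing but the standard library. A minimal sketch, assuming Ollama's default port 11434 and an already-pulled `llama3.1`:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # Ollama's default port

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload, which Ollama accepts as-is."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(model: str, prompt: str) -> str:
    """POST to the local Ollama server and return the reply text."""
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("llama3.1", "Say hello in one word.")  # requires `ollama serve` running
```

Point `OLLAMA_URL` at any other OpenAI-compatible endpoint (LM Studio, vLLM) and the client code is unchanged.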
LangChain
Python/JS framework for chains, agents, and RAG. Batteries-included but heavyweight; many graduate to LangGraph or DIY.
llama.cpp
The bedrock of local LLM inference. Most other tools wrap or embed it. Maximum control, maximum platform support, sharpest learning curve.
Open WebUI
Self-hosted ChatGPT-style web frontend. Pairs with Ollama or any OpenAI-compatible backend. Multi-user, RAG built in, fast.
GPT4All
One of the original local-LLM apps from Nomic. Privacy-focused, runs on CPU, decent model library. Pace of development has slowed compared to Jan/Msty.
ComfyUI
Node-graph image-generation UI. Standard for Stable Diffusion and Flux workflows. Endlessly customizable.
Zed (with AI)
High-performance native editor from the Atom team, with built-in AI panel and inline assistant. BYO API key for any provider.
Open Interpreter
Lets LLMs execute code locally — Python, shell, AppleScript. The original 'Code Interpreter on your machine'. Useful for automation tasks.
Cline
VS Code extension agent — ~4M installs in 2026. Plan/Act mode, autonomous file edits with diff approval, terminal access. The leading open-source IDE agent.
vLLM
High-throughput serving engine. PagedAttention, continuous batching, prefix caching. Production default for self-hosted LLM APIs at scale.
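PagedAttention matters because at scale the KV cache, not the weights, dominates memory. A back-of-envelope sketch, assuming a Llama-3-8B-like shape (32 layers, 8 GQA KV heads, head dim 128, fp16):

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Per-token KV-cache bytes: K and V (the leading 2) per layer per KV head."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

per_token = kv_bytes_per_token(32, 8, 128)   # 131072 bytes, i.e. 128 KiB per token
seq_8k_gib = per_token * 8192 / 2**30        # exactly 1.0 GiB for one 8k sequence
```

At that rate a 24 GiB card holds only a handful of full-length sequences if each cache must be contiguous; paging the cache in small blocks is what lets vLLM batch many more requests.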
Text Generation WebUI (oobabooga)
The 'AUTOMATIC1111 of LLMs'. Kitchen-sink Gradio UI with multi-backend support and a big extension ecosystem.
LlamaIndex
Python/JS framework focused on RAG and document indexing. Cleaner than LangChain for retrieval-heavy use cases.
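The retrieve-then-synthesize loop LlamaIndex abstracts is worth seeing bare. A stdlib sketch of the pattern, not LlamaIndex's API, with a toy bag-of-words stand-in for a real embedding model:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding', a stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Rank stored chunks by similarity to the query: the 'R' in RAG."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

docs = [
    "Ollama exposes an OpenAI-compatible API on port 11434.",
    "ComfyUI is a node-graph UI for image generation.",
    "vLLM uses PagedAttention for high-throughput serving.",
]
context = retrieve("which tool exposes an OpenAI-compatible API", docs, k=1)
# a real pipeline would now build an LLM prompt from context[0]
```

LlamaIndex replaces each toy piece (chunking, embeddings, vector store, synthesis prompt) with production components behind the same shape of loop.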
Unsloth
2x faster QLoRA fine-tuning with hand-tuned Triton kernels. Free OSS for single-GPU; commercial Pro for multi-GPU.
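Why QLoRA fits on one GPU is arithmetic, not magic: only the low-rank adapters train. For a single 4096x4096 projection at rank 16 (illustrative numbers):

```python
def lora_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable params for one LoRA-adapted weight: B (d_out x r) plus A (r x d_in)."""
    return d_out * r + r * d_in

full = 4096 * 4096                      # 16,777,216 weights, all frozen under LoRA
lora = lora_params(4096, 4096, r=16)    # 131,072 trainable params
ratio = lora / full                     # under 1% of the matrix trains
```

Quantizing the frozen base to 4-bit (the "Q" in QLoRA) shrinks the rest; Unsloth's kernels then speed up the forward/backward through those quantized weights.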
AnythingLLM
Document-oriented LLM frontend with workspaces. Connects to Ollama, LM Studio, OpenAI, Anthropic, etc. Strong document RAG.
Claude Code
Anthropic's terminal-native coding agent. Tops SWE-bench Verified at 87.6% and SWE-bench Pro at 64.3% in 2026. Deep MCP integration, agentic file editing, and a $20/mo Pro tier are the standout signals.
Jan
Open-source desktop ChatGPT alternative. Privacy-first, runs offline, supports Hugging Face import.
Aider
Terminal-based AI pair programmer. Run in your project directory, describe a change, it edits files and creates meaningful git commits. Works with any LLM — local Ollama, Anthropic, OpenAI, etc.
Continue
Open-source VS Code and JetBrains assistant. Configurable autocomplete + chat + agent modes. Strong with local Ollama backends.
Llamafile
Mozilla's single-binary llama.cpp distribution. Download one file, run on any OS without dependencies.
Roo Code
Cline fork that ships features faster — diff-based editing reduces per-task token cost ~30%. Multiple specialized modes (Architect, Code, Debug).
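The token-cost claim follows from the format: a diff carries only changed hunks, so output scales with the edit rather than the file. Python's `difflib` shows the shape (illustrative only, not Roo Code's actual edit format):

```python
import difflib

before = 'def greet(name):\n    print("Hello, " + name)\n'
after = 'def greet(name: str) -> None:\n    print(f"Hello, {name}")\n'

# A unified diff emits context plus changed lines only; for a large file
# with a two-line edit, that is a fraction of rewriting the whole file.
diff = "".join(difflib.unified_diff(
    before.splitlines(keepends=True),
    after.splitlines(keepends=True),
    fromfile="a/greet.py",
    tofile="b/greet.py",
))
```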
Codex CLI
Open-source client for the new Codex agent: a local CLI that orchestrates cloud Codex models against your file tree.
NVIDIA TensorRT-LLM
NVIDIA's optimized inference path for Hopper, Ada, and Blackwell. Compile your model once, serve at peak hardware speed.
OpenCode
Open-source terminal coding agent built by the SST team. TUI-first, BYO LLM, MCP-compatible. A Claude-Code-style workflow without the Anthropic lock-in.
Axolotl
YAML-config fine-tuning framework. Reference toolkit for the open fine-tuning community (Hermes, Dolphin, etc. all use it).
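A run is one YAML file. An illustrative QLoRA config sketch (keys follow Axolotl's published examples; the model, dataset path, and hyperparameters are placeholders, not recommendations):

```yaml
base_model: meta-llama/Llama-3.1-8B   # placeholder model
load_in_4bit: true                    # QLoRA: quantize the frozen base
adapter: qlora
lora_r: 16
lora_alpha: 32
lora_dropout: 0.05
datasets:
  - path: ./data/train.jsonl          # placeholder dataset
    type: alpaca
sequence_len: 2048
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 0.0002
output_dir: ./outputs/llama31-qlora
```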
Text Generation Inference (TGI)
HuggingFace's production inference server. Slightly behind vLLM on raw throughput but tighter integration with the HF ecosystem.
Kilo Code
VS Code agent — 1.5M users in 2026, supports 500+ models, charges zero markup over upstream API costs. Cline lineage with Roo Code's diff approach.
Pinokio
Browser-style app launcher for AI tools. One-click installs of ComfyUI, oobabooga, RVC, and many other AI apps.
KoboldCpp
Single-file llama.cpp distribution focused on roleplay and creative writing. Bundles a web UI, image gen, and the Kobold API.
ExLlamaV2
GPU-only inference library optimized for consumer NVIDIA cards. Fastest tokens-per-second on a single 24GB card for 30B models in EXL2 quant.
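The single-24GB-card claim is easy to sanity-check: weight memory is parameter count times bits per weight. A sketch assuming 4.0 bpw, a common EXL2 target:

```python
def weight_gib(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a quantized model."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

w = weight_gib(30, 4.0)   # ~14 GiB of weights for a 30B model at 4 bpw
# which leaves roughly 10 GiB of a 24 GiB card for KV cache and activations
```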
MLX-LM
The LLM runner for MLX, Apple's Metal-native ML framework. Now competitive with llama.cpp's Metal backend on M-series silicon, with better long-context performance.
Sourcegraph Cody
Sourcegraph's AI assistant. Strong at large-codebase context retrieval thanks to the underlying Sourcegraph index.
Hugging Face Hub CLI
The CLI for the world's model hub. `hf download`, `hf upload`, model card editing.
OpenClaw Gateway
Open-source LLM gateway with multi-provider fallbacks. Sits between an agent and many LLM providers (Anthropic, OpenAI, Google, local Ollama) so you can fail over and load-balance.
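The core failover pattern is small. A stdlib sketch with hypothetical provider callables (not OpenClaw's actual API):

```python
def with_fallback(providers, prompt):
    """Try each (name, call) pair in order; return the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:          # timeout, rate limit, provider outage
            errors.append((name, repr(exc)))
    raise RuntimeError(f"all providers failed: {errors}")

def flaky_primary(prompt):
    raise TimeoutError("primary is down")  # simulate an outage

providers = [
    ("primary", flaky_primary),
    ("backup", lambda prompt: f"echo: {prompt}"),
]
name, reply = with_fallback(providers, "ping")  # falls through to the backup
```

A real gateway layers retries, per-provider rate limits, and cost-aware routing on top of this loop, but the failure-ordering idea is the same.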
Pi (Inflection AI)
Inflection AI's consumer assistant — voice-first, conversational, designed for personal use rather than coding. Powered by Inflection-2.5.
Cursor
Anysphere's AI-native IDE. Forks VS Code with Cursor Tab inline completion, agentic chat, and background agents. Best 'flow' for inline completion in 2026.
Claude Desktop
Anthropic's official desktop app for Claude. Native MCP server support means you can plug in local file access, GitHub, and custom tools. Distinct from the Claude Code CLI.
OpenAI Codex
OpenAI's 2025 coding agent (the new Codex, distinct from the deprecated 2021 model). Cloud task-runner pattern: hand it a multi-step task, it works in a sandbox and returns a PR.
LM Studio
Polished desktop GUI for local LLMs. Built-in HuggingFace search, OpenAI-compatible local server, side-by-side conversations.
Windsurf (Codeium)
Codeium's AI-native IDE, built around the Cascade agent, supercomplete, and a generous free tier.
Droid (Factory)
Factory's autonomous SWE agent. Operates over GitHub PRs, Slack, Linear. Targets the long-running multi-file change workflow.
Devin
Cognition Labs' fully autonomous SWE agent. Cloud-only, browser interface, longest task horizons. Premium pricing.
Replit Agent 3
Replit's full-stack scaffolder agent. Goes from prompt to deployed app on Replit's hosted runtime.
JetBrains AI Assistant
JetBrains' first-party AI for IntelliJ, PyCharm, WebStorm, etc. Multi-LLM backend (OpenAI, Anthropic, Gemini, local).
Msty
Cross-platform desktop client supporting local and cloud models in one window. Strong on knowledge-stack RAG.
GitHub Copilot
GitHub's incumbent AI assistant. VS Code, JetBrains, Neovim integrations. Lost some inline-completion mindshare to Cursor and agentic mindshare to Claude Code, but still the easiest enterprise rollout via GitHub.