What plugs into your local AI runtime. 39 curated apps across 12 categories — chat UIs, coding agents, RAG pipelines, voice, image, browser extensions, editor plugins, mobile + desktop, agent frameworks, productivity, SDK wrappers.
Each entry carries an honest editorial verdict — pros, cons, the runtime it works against, the minimum VRAM, and the privacy posture. Filter to your stack, jump to the detail page, ship.
URL updates as you change filters — share or bookmark a result. All filters are server-rendered, so the page works without JS.
39 of 39 apps
The default chat UI for solo Ollama users. Multi-model, built-in RAG, web search, Docker-friendly.
“Best default chat UI for solo Ollama users. Pick this first; switch only if you outgrow it.”
The default agent framework. Pipelines, retrievers, tool-calling — works against any local backend.
“The default agent framework. Heavy on abstractions, deep ecosystem — pick this if you want defaults.”
Terminal coding agent that edits files via your local model. Git-aware, surgical, fast.
“Best terminal-native coding agent for local models. Qwen 2.5 Coder 32B is its sweet spot.”
Desktop app that bundles model download + chat + OpenAI-compatible local server. Closed-source but free.
“Best 'first install' desktop app for newcomers. Closed-source but the easiest first-run experience.”
VS Code extension that runs a full agent loop locally — reads, writes, runs commands, asks first.
“Best IDE-integrated agent that fully respects 'all local' as a first-class option.”
PewDiePie's self-hosted AI workspace: chat, agents, deep research, and a hardware-aware model Cookbook in one local-first dashboard.
“The most visible on-ramp to local AI yet — its hardware-aware Cookbook makes it a genuine beginner pick, but it's young and 'janky' by its own README; treat Agent mode's shell access with caution.”
Privacy-first desktop chat with a curated model catalog. Llama / Mistral / Qwen one click from the app.
“Best one-binary desktop chat. Curated catalog removes 'which model?' decision paralysis.”
Docs-aware chat with workspaces. Drop a folder of PDFs, get a working RAG chatbot in 5 minutes.
“Best fast-RAG app. Workspace model is the right abstraction for doc-corpora chat.”
RAG-first agent framework. Better defaults than LangChain for doc-corpora work; same local-runtime story.
“Best agent framework for RAG-first workloads. Less abstraction than LangChain.”
Open-source autocomplete + chat for VS Code and JetBrains. Local-model-first.
“Best Copilot replacement that defaults to local. Configurable; pair with Qwen 2.5 Coder.”
Air-gappable RAG over your docs. The OG offline-RAG project, now mature and team-friendly.
“Best when air-gap compliance is the requirement. Less polished than AnythingLLM, more configurable.”
Drop-in OpenAI-compatible proxy across 100+ providers. Route to local Ollama or cloud, same code.
“Best universal LLM proxy. Foundational layer for multi-provider deployments.”
Open-source clone of the ChatGPT UI with multi-provider routing. Local + cloud in one interface.
“Best if you mix local + cloud models in the same workflow. Strong team features.”
Official Python SDK for Ollama. Async, streaming, typed — the right primitive for scripts.
“Foundational primitive for Python scripts against Ollama. Official, maintained, typed.”
Free, native macOS / iOS Stable Diffusion app. Runs SD3, Flux on a phone (yes, really).
“Best mobile + macOS SD app. Free, native, no Python — runs Flux on Apple Silicon impressively well.”
Nomic's free desktop AI with model catalog + chat + Python SDK. Long-standing, open-source.
“Best fully-open-source desktop AI bundler. Less polished than LM Studio, fully MIT.”
Self-hosted coding agent server with team SSO, audit logs, and dashboards. Enterprise-grade.
“Best self-hosted server for teams. SSO + audit logs make it the IT-friendly pick.”
Production-leaning multi-agent framework. Role + goal + task — opinionated and ergonomic.
“Best ergonomic multi-agent framework. Picks defaults you'd otherwise have to argue about.”
Obsidian plugin that wires Ollama / OpenAI into your notes. Inline chat, summarize, prompt-templates.
“Best Obsidian plugin for local LLM in your notes. Pair with Smart Connections for RAG.”
Self-hosted AI assistant for your notes, emails, docs. Web + mobile + desktop, all local-first.
“Best 'AI second brain' app. Self-hosted, local-first, works against Obsidian.”
Native iOS / macOS Ollama client. Beautiful SwiftUI, talks to your home Ollama server.
“Best mobile Ollama client. Native SwiftUI; works against your home Ollama server.”
Microsoft's multi-agent framework. Conversation-first orchestration of role-played agents.
“Best for multi-agent role-played workflows. Niche; not the default agent framework.”
Local semantic search across all your Obsidian notes. Embed-once, query-fast, fully offline.
“Best local semantic search for personal notes. Foundational layer for Obsidian RAG.”
Native macOS app for Whisper transcription. Drag a file in, get a transcript out.
“Best Whisper desktop app on macOS. Pay once, transcribe locally forever.”
Open-source Whisper transcription with mic + file modes. Cross-platform Qt app.
“Best open-source Whisper desktop app. Cross-platform, free, less polish than MacWhisper.”
Krita plugin that wires ComfyUI into a real digital-art workflow. Inpaint, outpaint, upscale.
“Best 'SD as digital-art tool' integration. Real Krita workflow, not a wrapper UI.”
Official Node + browser SDK for Ollama. ESM-first, typed, streaming.
“Foundational primitive for Node + browser apps against Ollama. ESM-native, typed.”
Weaviate's open-source RAG demo turned production. Strong defaults, opinionated stack.
“Best for 'don't make me choose chunking strategy' teams. Opinionated stack works.”
Browser sidebar that talks to your local Ollama. Summarize pages, chat, vision support.
“Best 'sidebar AI' browser extension that's truly local-first.”
AI note-taking app that builds connections between your notes automatically. Local, open-source.
“Best AI-first note app that's actually local. Niche but well-executed.”
Desktop app focused on side-by-side multi-model chat. Compare local vs cloud answers in one view.
“Best 'compare local vs cloud answers' workflow. Niche but well-designed.”
Free, lightweight VS Code copilot that runs entirely on Ollama. Strong on autocomplete.
“Best minimal-surface Copilot-replacement that's been Ollama-native since day one.”
One-click Stable Diffusion app for macOS. No setup, just run.
“Easiest macOS SD app — picks defaults so you don't have to.”
Drop-in OpenAI TTS-compatible server. Self-hosted, talks to local voice models.
“Best 'drop-in local TTS for OpenAI clients'. Bridge solution for existing pipelines.”
Brave's built-in AI assistant. Configurable to talk to local Ollama out of the box.
“Best built-in browser AI for Brave users. Local mode is a checkbox, not a hack.”
Android Ollama client + on-device fallback for small models. Cross-platform Flutter.
“Best cross-platform Android-friendly Ollama client. Falls back to on-device for tiny models.”
Codeium self-hosted enterprise backend lets the popular IDE plugin run fully on your hardware.
“Best 'enterprise Copilot' replacement when self-hosting is mandatory. Paid tier.”
Terminal entry into Khoj's local AI assistant. Use grep, get answers, never leave the shell.
“Best terminal companion for note-summarization workflows. Pipe-friendly.”
Cloud LLM router pitching 'unlimited' inference from $10/month — proxies your requests across a pool of upstream models.
“Cloud-only LLM router. Useful category, novel pricing. The 'unlimited' math has a known failure mode at heavy usage.”
We curate this directory editorially — same review queue as the benchmarks feed. Open an issue with the project link and your one-line pitch for why it belongs.
Editorial review applies — same standards as the rest of the site. We won't list apps that don't actually work against a local runtime, regardless of marketing claims.
The runtime layer: Ollama, vLLM, llama.cpp, MLX, LM Studio server, ComfyUI. What apps in this directory talk to.
Tell us your use case, get a full rig recipe — runtime + models + the apps from this directory that fit your stack.
Match an app's minimum-VRAM requirement to real hardware with our price/perf comparison.
Real operator submissions on the model × hardware × app combos that work. The proof behind the editorial picks.