Docs-aware chat with workspaces. Drop a folder of PDFs, get a working RAG chatbot in 5 minutes.
Editorial verdict: “Best fast-RAG app. Workspace model is the right abstraction for doc-corpora chat.”
Which runtime + OS combos this app works with. Source of truth for "will it run on my setup?"
AnythingLLM is for anyone who needs to turn a folder of PDFs, markdown files, or plain text into a working RAG chatbot in minutes. Its workspace abstraction, in which each workspace bundles a chat, a knowledge base, and a model config, is the cleanest implementation of doc-corpus chat. Point it at Ollama or LM Studio, load an 8B model such as Llama 3.1 at Q4_K_M plus a local embedder such as bge-m3, and you’re chatting against your documents with citations back to source passages. The trade-offs: the default Docker setup can consume a lot of disk space for stored embeddings, and the best retrieval quality requires running a separate embedding service. It’s hybrid by default, so you can keep everything local or mix in cloud APIs, but the fastest path to a working doc-aware chat is still all-local on macOS, Linux, or Windows with at least 8 GB of VRAM.
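To make the workspace idea concrete, here is a minimal sketch of a workspace that bundles a chat history, a knowledge base, and a model config, and returns passages with their source file as the citation. This is not AnythingLLM's code or API: the `Workspace` and `Passage` classes, the `embed` and `cosine` helpers, and the model names in the config are all illustrative assumptions, and the word-count "embedding" is a toy stand-in for a real embedder such as bge-m3.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text):
    # Toy stand-in for a real embedder: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Passage:
    source: str  # file the passage came from; doubles as the citation
    text: str


@dataclass
class Workspace:
    # Hypothetical structure: one chat, one knowledge base, one model config.
    name: str
    model_config: dict
    passages: list = field(default_factory=list)
    chat_history: list = field(default_factory=list)

    def add_document(self, source, text):
        # Split on blank lines; a real pipeline would chunk by tokens.
        for chunk in re.split(r"\n\s*\n", text):
            if chunk.strip():
                self.passages.append(Passage(source, chunk.strip()))

    def retrieve(self, query, k=2):
        # Rank every passage against the query and keep the top k matches.
        q = embed(query)
        scored = [(cosine(q, embed(p.text)), p) for p in self.passages]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [p for score, p in scored[:k] if score > 0]


# Illustrative config values; the names are assumptions, not required settings.
ws = Workspace("handbook", {"chat_model": "llama3.1:8b", "embedder": "bge-m3"})
ws.add_document(
    "vacation.md",
    "Employees accrue vacation days monthly.\n\nUnused vacation days expire in March.",
)
ws.add_document("expenses.md", "Submit expense reports within 30 days.")

hits = ws.retrieve("when do vacation days expire", k=1)
for hit in hits:
    print(f"[{hit.source}] {hit.text}")
```

In a real setup the retrieved passages would be placed into the prompt sent to the chat model, and the `source` field is what lets the answer cite back to the originating file. Keeping the model config on the workspace rather than globally is what allows one corpus to run fully local while another uses a cloud API.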
Web or desktop chat client that connects to your local runtime.
Best if you mix local + cloud models in the same workflow. Strong team features.
Best one-binary desktop chat. Curated catalog removes 'which model?' decision paralysis.
Best default chat UI for solo Ollama users. Pick this first; switch only if you outgrow it.
The most visible on-ramp to local AI yet — its hardware-aware Cookbook makes it a genuine beginner pick, but it's young and 'janky' by its own README; treat Agent mode's shell access with caution.
Pre-filled with this app's recommended use case + budget tier. Get the full rig + runtime + model picks.
The full directory — filter by category, runtime, OS, privacy posture, or VRAM.
What this app talks to: Ollama, vLLM, llama.cpp, MLX, LM Studio. The upstream layer.
Did this app work for you on a specific rig? Submit the benchmark — it powers the model + hardware pages.