RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38

Local AI tools

95 runtimes reviewed. Runners, GUIs, and servers for every workflow.

95 runtimes tracked·95 actively maintained·0 with reproduced benchmarks in last 12mo
TurboVec
orchestrator
Active
Jun 18, 2026
0 benchmarks
0 reproduced
LibreChat
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
AnythingLLM
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
SGLang
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
vLLM
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Intel OpenVINO
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
ONNX Runtime
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Open WebUI
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
TensorRT-LLM
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
LM Studio
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
ExLlamaV2
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
ROCm
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MLX-LM
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
llama.cpp
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Ollama
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
OpenClaw
orchestrator
Active
Jun 12, 2026
0 benchmarks
0 reproduced
OpenHands
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Hyperspace (P2P inference network)
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Mem0 (agent memory API)
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Letta (memory framework)
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Model Context Protocol (MCP)
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Goose
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Roo Code (sunsetting May 15, 2026)
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Claude Desktop
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Pi (Inflection AI)
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Zed (with AI)
ide
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Sourcegraph Cody
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
JetBrains AI Assistant
ide
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Replit Agent 3
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Devin
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Droid (Factory)
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Windsurf (Codeium)
ide
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Continue
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Kilo Code
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Cline
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
OpenCode
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Aider
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Codex CLI
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
OpenAI Codex
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
GitHub Copilot
ide
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Cursor
ide
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Claude Code
agent
Active
Jun 12, 2026
0 benchmarks
0 reproduced
SillyTavern
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
DirectML
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Aphrodite Engine
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
IPEX-LLM
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
llama-cpp-python
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
CTranslate2
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Qualcomm AI Hub
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MLX Swift
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
ONNX Runtime Mobile
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MLC LLM
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
ExecuTorch
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Firecrawl MCP
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MCP Sequential Thinking
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MCP Git Server
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MCP Fetch Server
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MCP Memory Server
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MCP Brave Search Server
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Playwright MCP
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MCP PostgreSQL Server
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MCP GitHub Server
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
MCP Filesystem Server
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Phoenix (Arize AI)
orchestrator
Active
Jun 12, 2026
0 benchmarks
0 reproduced
LangSmith
orchestrator
Active
Jun 12, 2026
0 benchmarks
0 reproduced
TabbyAPI
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
LocalAI
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Exo
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Petals
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Ray Serve
orchestrator
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Zep (memory platform)
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Graphiti (Zep)
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Neo4j GraphRAG
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Redis (vector search)
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Milvus
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
LanceDB
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Weaviate
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Qdrant
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Chroma
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Unsloth
finetuner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Axolotl
finetuner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Hugging Face Hub CLI
quantizer
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Stable Diffusion WebUI (AUTOMATIC1111)
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
ComfyUI
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Pinokio
orchestrator
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Open Interpreter
orchestrator
Active
Jun 12, 2026
0 benchmarks
0 reproduced
LlamaIndex
orchestrator
Active
Jun 12, 2026
0 benchmarks
0 reproduced
LangChain
orchestrator
Active
Jun 12, 2026
0 benchmarks
0 reproduced
GPT4All
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Msty
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Jan
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Text Generation WebUI (oobabooga)
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
KoboldCPP
gui
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Llamafile
runner
Active
Jun 12, 2026
0 benchmarks
0 reproduced
Text Generation Inference (TGI)
server
Active
Jun 12, 2026
0 benchmarks
0 reproduced

Catalog

TurboVec

orchestrator
OSS

TurboVec is an open-source, **local-first vector index** (Rust core + Python bindings) by Ryan Codrai, MIT-licensed, built on Google Research's **TurboQuant** quantizer (presented at ICLR 2026). Its pitch for local AI: f

Linux
macOS
Windows
Our rating: 4.4/5

LibreChat

gui
OSS

Open-source ChatGPT clone with multi-provider support (OpenAI, Anthropic, local LLMs via OpenAI-compatible APIs). The most popular self-hosted ChatGPT-shaped frontend. Strong multi-user + RAG + plugin support; pairs well

Windows
macOS
Linux
Docker

AnythingLLM

gui
OSS

Document-oriented LLM frontend with workspaces. Connects to Ollama, LM Studio, OpenAI, Anthropic, etc. Strong document RAG.

macOS
Linux
Windows
Docker
Our rating: 4.4/5

SGLang

server
OSS

Structured generation language + runtime for LLM programs. RadixAttention reuses KV cache across prompts with shared prefixes — significant throughput wins for agent workloads where many tool calls share system prompts.

Linux
Docker

vLLM

server
OSS

High-throughput inference engine with PagedAttention, continuous batching, and tensor + pipeline parallelism. The reference deployment runtime when you've outgrown llama.cpp / Ollama for production serving. Backed by Any

Linux
Our rating: 4.8/5

Intel OpenVINO

runner
OSS

Intel's inference toolkit. The first-class path for Intel Arc GPUs, Intel NPUs (Lunar Lake / Meteor Lake), and CPU-optimized inference on x86. Ships pre-quantized model variants tuned for Intel hardware via the OpenVINO

Windows
Linux
macOS

ONNX Runtime

runner
OSS

Microsoft's cross-platform inference runtime for ONNX models. The reference path when you need a single runtime that targets CUDA + DirectML + CoreML + OpenVINO + ROCm from one binary. Stronger on classical models (visio

Windows
macOS
Linux

Open WebUI

gui
OSS

Self-hosted ChatGPT-style web frontend. Pairs with Ollama or any OpenAI-compatible backend. Multi-user, RAG built in, fast.

macOS
Linux
Windows
Docker
Our rating: 4.6/5

TensorRT-LLM

server
OSS

NVIDIA's first-party inference compiler. Generates optimized engines per model + GPU pair, with the lowest latency on NVIDIA hardware. The pick when you're committed to a single SKU and need the absolute fastest tokens-p

Linux
Windows
Our rating: 4.3/5

LM Studio

gui

Polished desktop GUI for local LLMs. Built-in HuggingFace search, OpenAI-compatible local server, side-by-side conversations.

macOS
Linux
Windows
Our rating: 4.5/5

ExLlamaV2

runner
OSS

Hand-optimized inference for EXL2-quantized models. Fastest single-GPU runtime for the EXL2 quant format on Ada/Hopper hardware. Lower-level than llama.cpp; pairs with text-generation-webui + TabbyAPI as front-ends.

Linux
Windows
Our rating: 4.4/5

ROCm

runner
OSS

AMD's open-source equivalent of NVIDIA CUDA. Required for any meaningful AMD GPU inference on Linux (vLLM, llama.cpp ROCm build, ExLlamaV2). Windows ROCm is improving as of 2026 but still trails Linux. Strix Halo APU + R

Linux
Windows

MLX-LM

runner
OSS

Apple's Metal-native ML framework's LLM runner. Now competitive with llama.cpp Metal on M-series silicon, with better long-context performance.

macOS
Our rating: 4.5/5

llama.cpp

runner
OSS

The bedrock of local LLM inference. Most other tools wrap or embed it. Maximum control, maximum platform support, sharpest learning curve.

macOS
Linux
Windows
BSD
Android
Our rating: 4.6/5

Ollama

runner
OSS

The default first-pull tool for local AI. One-line model installs (`ollama run llama3.1`), an OpenAI-compatible HTTP API, good defaults out of the box. Built on llama.cpp.

macOS
Linux
Windows
Our rating: 4.7/5

OpenClaw

orchestrator
OSS

Personal AI agent with a local-first gateway architecture. Connects your local LLMs (Ollama, llama.cpp) to the messaging surfaces you already use — WhatsApp, Telegram, Slack, Discord, iMessage, and 20+ more. The runaway

macOS
Linux
Windows
Our rating: 4/5

OpenHands

agent
OSS

AI-driven development agent that completes engineering tasks end-to-end — branches, code, PRs. v1.6 added a Planning Mode that drafts a plan before executing. Local-LLM-friendly via Ollama, vLLM, and SGLang. The stronges

macOS
Linux
Windows
Our rating: 4.5/5

Hyperspace (P2P inference network)

server
OSS

Decentralized peer-to-peer AI inference network. 2.7M+ CLI downloads, 2M+ active nodes globally as of April 2026. Three-tier model routing (local registry → DHT → gossip broadcast) supports any GGUF model. The April 2026

macOS
Linux
Windows
Browser
Our rating: 3.9/5

Mem0 (agent memory API)

agent
OSS

Drop-in memory layer for LLM agents. Vector + graph memory variants (Mem0g) — the graph variant builds a directed labeled knowledge graph alongside the vector store, with conflict detection on contradictory facts. Leads

macOS
Linux
Windows
Our rating: 4.3/5

Letta (memory framework)

agent
OSS

Agent memory framework that models memory like an operating system. Main context = RAM, archival storage = disk; the agent itself decides when to page. Originally MemGPT, now Letta. Model-agnostic (Anthropic, OpenAI, Oll

macOS
Linux
Windows
Our rating: 4.1/5

Model Context Protocol (MCP)

agent
OSS

Open protocol for LLM clients to talk to external tools and data sources. The 'USB-C for AI' that became the default in 2026 — supported by Anthropic, OpenAI, and Google DeepMind, with 500+ public MCP servers covering Gi

macOS
Linux
Windows
Our rating: 4.7/5

Goose

agent
OSS

Open-source extensible AI agent now governed by the Agentic AI Foundation (AAIF) at the Linux Foundation. Started inside Block (formerly Square). 25+ provider support including Ollama, Ramalama, Docker Model Runner. Best

macOS
Linux
Windows
Our rating: 4.2/5

Roo Code (sunsetting May 15, 2026)

agent
OSS

Open-source AI dev-team extension for VS Code (1.55M installs, 23.8k GitHub stars). **Discontinued: all Roo Code products — Extension, Cloud, and Router — shut down on May 15, 2026** with refunds for unused balances. The

macOS
Linux
Windows

Claude Desktop

agent

Anthropic's official desktop app for Claude. Native MCP server support means you can plug in local file access, GitHub, and custom tools. Distinct from the Claude Code CLI.

macOS
Windows
Our rating: 4.4/5

Pi (Inflection AI)

agent

Inflection AI's consumer assistant — voice-first, conversational, designed for personal use rather than coding. Powered by Inflection-2.5.

web (browser)
iOS
Android
Our rating: 3.9/5

Zed (with AI)

ide
OSS

High-performance native editor from the Atom team, with built-in AI panel and inline assistant. BYO API key for any provider.

macOS
Linux
Windows
Our rating: 4.5/5

Sourcegraph Cody

agent
OSS

Sourcegraph's AI assistant. Strong at large-codebase context retrieval thanks to the underlying Sourcegraph index.

macOS
Linux
Windows
Our rating: 4.1/5

JetBrains AI Assistant

ide

JetBrains' first-party AI for IntelliJ, PyCharm, WebStorm, etc. Multi-LLM backend (OpenAI, Anthropic, Gemini, local).

macOS
Linux
Windows
Our rating: 4/5

Replit Agent 3

agent

Replit's full-stack scaffolder agent. Goes from prompt to deployed app on Replit's hosted runtime.

web (browser)
Our rating: 4.3/5

Devin

agent

Cognition Labs' fully autonomous SWE agent. Cloud-only, browser interface, longest task horizons. Premium pricing.

web (browser)
Our rating: 4/5

Droid (Factory)

agent

Factory's autonomous SWE agent. Operates over GitHub PRs, Slack, Linear. Targets the long-running multi-file change workflow.

macOS
Linux
Windows
Our rating: 4.2/5

Windsurf (Codeium)

ide

Codeium's AI-native IDE (formerly known as Codeium). Cascade agent, supercomplete, and a generous free tier.

macOS
Linux
Windows
Our rating: 4.3/5

Continue

agent
OSS

Open-source VS Code and JetBrains assistant. Configurable autocomplete + chat + agent modes. Strong with local Ollama backends.

macOS
Linux
Windows
Our rating: 4.4/5

Kilo Code

agent
OSS

VS Code agent — 1.5M users in 2026, supports 500+ models, charges zero markup over upstream API costs. Cline lineage with Roo Code's diff approach.

macOS
Linux
Windows

Cline

agent
OSS

VS Code extension agent — ~4M installs in 2026. Plan/Act mode, autonomous file edits with diff approval, terminal access. The leading open-source IDE agent.

macOS
Linux
Windows
Our rating: 4.7/5

OpenCode

agent
OSS

Open-source terminal coding agent built by the SST team. TUI-first, BYO LLM, MCP-compatible. A Claude-Code-style workflow without the Anthropic lock-in.

macOS
Linux
Windows
Our rating: 4.4/5

Aider

agent
OSS

Terminal-based AI pair programmer. Run in your project directory, describe a change, it edits files and creates meaningful git commits. Works with any LLM — local Ollama, Anthropic, OpenAI, etc.

macOS
Linux
Windows
Our rating: 4.6/5

Codex CLI

agent
OSS

Open-source CLI client for the new Codex agent. Local CLI that orchestrates cloud Codex models against your file tree.

macOS
Linux
Windows
Our rating: 4.3/5

OpenAI Codex

agent

OpenAI's 2025 coding agent (the new Codex, distinct from the deprecated 2021 model). Cloud task-runner pattern: hand it a multi-step task, it works in a sandbox and returns a PR.

macOS
Linux
Windows
Our rating: 4.4/5

GitHub Copilot

ide

GitHub's incumbent AI assistant. VS Code, JetBrains, Neovim integrations. Lost some inline-completion mindshare to Cursor and agentic mindshare to Claude Code, but still the easiest enterprise rollout via GitHub.

macOS
Linux
Windows
Our rating: 4.2/5

Cursor

ide

Anysphere's AI-native IDE. Forks VS Code with Cursor Tab inline completion, agentic chat, and background agents. Best 'flow' for inline completion in 2026.

macOS
Linux
Windows
Our rating: 4.6/5

Claude Code

agent

Anthropic's terminal-native coding agent. Tops SWE-bench Verified at 87.6% and SWE-bench Pro at 64.3% in 2026. Deep MCP integration, agentic file editing, and a $20/mo Pro tier are the standout signals.

macOS
Linux
Windows
Our rating: 4.8/5

SillyTavern

gui
OSS

Character-driven LLM frontend originally for role-play; widely used for any persona-driven workflow. Supports OpenAI, KoboldAI, llama.cpp, Ollama, Aphrodite, oobabooga endpoints. Rich sampling controls, character cards,

Windows
macOS
Linux
Docker
Android (Termux)

DirectML

runner

Microsoft's DirectX 12 inference backend. The Windows-native path for AMD / Intel / Qualcomm GPU + NPU acceleration without ROCm or vendor-specific SDKs. Used through ONNX Runtime as the DML execution provider.

Windows

Aphrodite Engine

runner
OSS

vLLM fork specialized for creative writing / role-play workloads. Adds samplers (smoothing factor, dynatemp, mirostat, DRY, XTC) that mainline vLLM doesn't ship. Same continuous-batching architecture; trades some through

Linux
Windows

IPEX-LLM

runner
OSS

Intel's PyTorch extension for low-bit LLM inference on Intel GPUs / CPUs / NPUs. Strongest community-supported path for running LLMs on Intel Arc A770 / B580 and on Lunar Lake NPUs. Compatible with Hugging Face Transform

Linux
Windows

llama-cpp-python

runner
OSS

Python bindings for llama.cpp with an OpenAI-compatible HTTP server. The fastest path from `pip install` to a working local-LLM endpoint. Ships pre-built wheels with optional CUDA / Metal / ROCm / Vulkan support.

Windows
macOS
Linux

CTranslate2

runner
OSS

Specialized transformer inference engine. The reference runtime for Whisper (faster-whisper), NLLB translation, and other encoder-decoder models. Out-of-the-box INT8 quantization with strong CPU performance.

Windows
macOS
Linux

Qualcomm AI Hub

runner

Qualcomm's official on-device-AI compiler + model zoo for Snapdragon NPU targets. Pre-quantized model variants for Llama, Phi, Gemma, Qwen running on Hexagon NPU. The reference path for Android NPU acceleration in 2025-2

Android
Windows

MLX Swift

runner
OSS

Apple's Swift bindings for MLX. The native iOS / iPadOS path for on-device LLM inference. Apple-published example apps demonstrate Llama 3.2, Phi-3.5, Qwen 2.5 running on iPhone 15 Pro+ at usable rates.

iOS
macOS
iPadOS

ONNX Runtime Mobile

runner
OSS

Microsoft's mobile/edge variant of ONNX Runtime. The reference path for Snapdragon X / Lunar Lake / Ryzen AI on Windows + Copilot+ PC NPU acceleration. Mobile builds drop ops not used in inference to keep binary size sma

Android
iOS
Windows

MLC LLM

runner
OSS

TVM-based LLM compilation framework. Compiles models for any GPU with a Vulkan / Metal / WebGPU / CUDA backend. The most-deployed cross-platform on-device LLM runtime — runs Llama, Phi, Gemma, Qwen on phones, browsers, a

iOS
Android
Windows
macOS
Linux

ExecuTorch

runner
OSS

PyTorch's official mobile / edge inference runtime. Compiles PyTorch models to a mobile-optimized format for Android (NNAPI / GPU / NPU) and iOS (Metal / CoreML). The successor to the deprecated PyTorch Mobile path.

iOS
Android
Linux
macOS

Firecrawl MCP

server
OSS

MCP server wrapping Firecrawl — a managed crawler that handles JavaScript rendering, anti-bot evasion, and large-site map+scrape jobs at scale. The pragmatic upgrade from mcp-server-fetch when an agent needs to crawl tho

macOS
Linux
Windows

MCP Sequential Thinking

server
OSS

Reference MCP server that gives an agent a structured scratchpad for multi-step reasoning. Each call records a numbered thought with revision and branching support — the agent can backtrack, fork, and consolidate plans w

macOS
Linux
Windows

MCP Git Server

server
OSS

Reference MCP server for local Git repository operations. Status, diff, log, blame, branch listing — read-side operations against a checked-out repo without round-tripping to GitHub. Pairs with mcp-server-filesystem to g

macOS
Linux
Windows

MCP Fetch Server

server
OSS

Reference MCP server for fetching and converting web content. Pulls a URL, runs HTML through a readability extractor, returns markdown the model can chunk and reason over. The lightweight web-reader pair to Brave Search

macOS
Linux
Windows

MCP Memory Server

server
OSS

Reference MCP server that gives an agent a persistent knowledge graph — entities, relations, observations stored to disk and surfaced back across sessions. The simplest path to making an agent remember context between co

macOS
Linux
Windows

MCP Brave Search Server

server
OSS

Reference MCP server wrapping the Brave Search API. Privacy-respecting alternative to Google/Bing endpoints — Brave does not maintain a personal-history-linked index. The default web-search MCP in the Anthropic reference

macOS
Linux
Windows

Playwright MCP

server
OSS

Microsoft's MCP server that drives a real browser via Playwright — Chromium, Firefox, and WebKit. Ships ~22 tools that operate against the page's accessibility tree rather than pixel coordinates, which is dramatically mo

macOS
Linux
Windows

MCP PostgreSQL Server

server
OSS

Reference MCP server that exposes a Postgres database as a query surface. Read-only by default — but worth flagging that early versions had a SQL-injection class issue where the read-only wrapper could be bypassed by sta

macOS
Linux
Windows

MCP GitHub Server

server
OSS

GitHub's first-party MCP server. Surfaces issues, pull requests, code search, file contents, repo metadata, Actions runs, and discussions through the protocol. Now maintained by GitHub itself rather than the original Ant

macOS
Linux
Windows

MCP Filesystem Server

server
OSS

Anthropic's reference MCP server for filesystem access. Read, write, search, move, and list files inside a configured allowlist of directories. The canonical example for understanding how MCP tool exposure works in pract

macOS
Linux
Windows

Phoenix (Arize AI)

orchestrator
OSS

Open-source LLM tracing + evaluation. OpenInference standard for traces; runs locally with one pip install. The OSS-first pick for teams that want LangSmith-shaped functionality without vendor lock-in.

macOS
Linux
Windows

LangSmith

orchestrator

LangChain's observability + evaluation platform. Trace agent runs, run evaluators against benchmark suites, version prompts. The dominant trace+eval tool for the LangChain/LangGraph ecosystem.

macOS
Linux
Windows

TabbyAPI

server
OSS

OpenAI-API frontend for ExLlamaV2. Wraps the EXL2 inference engine in a clean HTTP API, adds streaming, batching, and OAI-compatible chat templates. The default front-of-house when you've already committed to the EXL2 qu

Linux
Windows
macOS

LocalAI

server
OSS

OpenAI-API-compatible drop-in for self-hosted inference, with a multi-backend twist: the same endpoint can serve LLMs (llama.cpp / vLLM under the hood), embeddings, image gen (stable-diffusion.cpp), audio (whisper.cpp),

Linux
macOS
Windows
Docker
Kubernetes

Exo

server
OSS

Personal AI cluster software. Auto-discovers Apple Silicon devices on a LAN and shards a model across them via pipeline + tensor parallelism on top of MLX. The 2026 unlock: Thunderbolt 5 + macOS 26.2 RDMA dropped inter-d

macOS
Linux

Petals

server
OSS

BitTorrent-style decentralized LLM inference. Splits a model into transformer-block shards distributed across volunteer hosts on the public internet — one client runs the input/output layers locally and streams activatio

Linux
macOS

Ray Serve

orchestrator
OSS

Distributed model serving on top of Ray. Lets you stitch vLLM / SGLang / custom runtimes into a multi-replica, multi-model deployment with autoscaling, traffic splitting, and pipeline composition. The orchestration layer

Linux
macOS
Kubernetes

Zep (memory platform)

server
OSS

Long-term memory platform for AI agents. Sits above Graphiti as the application layer — sessions, facts, summaries, vector + graph hybrid retrieval. The 'memory backend you don't have to build' choice.

macOS
Linux
Docker

Graphiti (Zep)

server
OSS

Temporal graph memory framework. Builds a bi-temporal knowledge graph from agent conversations, tracking when each fact was learned and when it was true. Powers Zep's hosted offering.

macOS
Linux
Windows

Neo4j GraphRAG

server
OSS

Neo4j's official GraphRAG toolkit — Python library + reference patterns for building retrieval-augmented generation against a knowledge graph. The mature pick for enterprises already running Neo4j.

macOS
Linux
Windows
Docker

Redis (vector search)

server
OSS

Vector search inside the same Redis you already run. HNSW + flat indices, hybrid filtering with FT.SEARCH. The pragmatic pick when you don't want to add another service to ops.

macOS
Linux
Windows
Docker

Milvus

server
OSS

Distributed vector database designed for billion-scale workloads. Compute-storage separation, GPU-accelerated index builds, multi-tenant from the ground up. The pick when you've outgrown Qdrant single-node.

Linux
Docker
Kubernetes

LanceDB

server
OSS

Embedded vector + columnar database. Lance file format reads serverless from S3/local disk; no separate process to run. The pick for embedded apps and notebook workflows.

macOS
Linux
Windows

Weaviate

server
OSS

Vector database with built-in modules for embedding, generative search, and reranking. Schema-first design appeals to teams used to traditional databases. Generative-search module pairs with local Ollama models out of th

macOS
Linux
Windows
Docker
Kubernetes

Qdrant

server
OSS

Vector database written in Rust. Strong filtering (payload-based pre-filter), HNSW index with quantization variants, gRPC + REST APIs. The performance pick when you cross 10M vectors.

macOS
Linux
Windows
Docker

Chroma

server
OSS

Open-source embedding database for LLM applications. The default 'just install pip and start' vector store for prototypes, with first-party clients in Python and JS. SQLite-backed locally, distributed mode in cloud.

macOS
Linux
Windows

Unsloth

finetuner
OSS

2x faster QLoRA fine-tuning with hand-tuned Triton kernels. Free OSS for single-GPU; commercial Pro for multi-GPU.

Linux
Our rating: 4.6/5

Axolotl

finetuner
OSS

YAML-config fine-tuning framework. Reference toolkit for the open fine-tuning community (Hermes, Dolphin, etc. all use it).

Linux
Our rating: 4.4/5

Hugging Face Hub CLI

quantizer
OSS

The CLI for the world's model hub. `hf download`, `hf upload`, model card editing.

any
Our rating: 4.5/5

Stable Diffusion WebUI (AUTOMATIC1111)

gui
OSS

The original Stable Diffusion frontend. Less actively developed in 2026 than ComfyUI but still has the cleanest UX for simple gen.

macOS
Linux
Windows
Our rating: 4.4/5

ComfyUI

gui
OSS

Node-graph image-generation UI. Standard for Stable Diffusion and Flux workflows. Endlessly customizable.

macOS
Linux
Windows
Our rating: 4.7/5

Pinokio

orchestrator
OSS

Browser-style app launcher for AI tools. One-click installs of ComfyUI, oobabooga, RVC, and many other AI apps.

macOS
Linux
Windows
Our rating: 4.1/5

Open Interpreter

orchestrator
OSS

Lets LLMs execute code locally — Python, shell, AppleScript. The original 'Code Interpreter on your machine'. Useful for automation tasks.

macOS
Linux
Windows
Our rating: 4.3/5

LlamaIndex

orchestrator
OSS

Python/JS framework focused on RAG and document indexing. Cleaner than LangChain for retrieval-heavy use cases.

any
Our rating: 4.2/5

LangChain

orchestrator
OSS

Python/JS framework for chains, agents, and RAG. Batteries-included but heavyweight; many graduate to LangGraph or DIY.

any
Our rating: 4/5

GPT4All

gui
OSS

One of the original local-LLM apps from Nomic. Privacy-focused, runs on CPU, decent model library. Pace of development has slowed compared to Jan/Msty.

macOS
Linux
Windows
Our rating: 4/5

Msty

gui

Cross-platform desktop client supporting local and cloud models in one window. Strong on knowledge-stack RAG.

macOS
Linux
Windows
Our rating: 4.3/5

Jan

gui
OSS

Open-source desktop ChatGPT alternative. Privacy-first, runs offline, supports Hugging Face import.

macOS
Linux
Windows
Our rating: 4.4/5

Text Generation WebUI (oobabooga)

gui
OSS

The 'AUTOMATIC1111 of LLMs'. Kitchen-sink Gradio UI with multi-backend support and a big extension ecosystem.

macOS
Linux
Windows
Our rating: 4.3/5

KoboldCPP

gui
OSS

Single-file llama.cpp distribution focused on roleplay and creative writing. Bundles a web UI, image gen, and the Kobold API.

macOS
Linux
Windows
Our rating: 4.4/5

Llamafile

runner
OSS

Mozilla's single-binary llama.cpp distribution. Download one file, run on any OS without dependencies.

macOS
Linux
Windows
Our rating: 4.4/5

Text Generation Inference (TGI)

server
OSS

HuggingFace's production inference server. Slightly behind vLLM on raw throughput but tighter integration with the HF ecosystem.

Linux
Our rating: 4.2/5