TabbyAPI
Overview
OpenAI-API frontend for ExLlamaV2. Wraps the EXL2 inference engine in a clean HTTP API, adds streaming, batching, and OAI-compatible chat templates. The default front-of-house when you've already committed to the EXL2 quant format and want to expose it to clients that speak OpenAI.
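Because TabbyAPI speaks the OpenAI wire format, any OpenAI-compatible client works by pointing it at the local server. A minimal sketch of the request payload a client would POST to the server's chat-completions endpoint — the base URL and model name here are placeholder assumptions for a local deployment; the payload shape is the standard OpenAI one:

```python
# Sketch: building an OpenAI-style chat-completions payload for a local
# TabbyAPI server. BASE_URL and the model name are assumptions — adjust
# to your deployment; the JSON shape follows the OpenAI chat API.
BASE_URL = "http://localhost:5000/v1"  # assumed local TabbyAPI address

def chat_request(prompt: str, model: str = "my-exl2-model",
                 stream: bool = False) -> dict:
    """Build an OpenAI-style chat-completions request body."""
    return {
        "model": model,  # name of the EXL2 model the server has loaded
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

payload = chat_request("Hello!", stream=True)
# This dict would be POSTed to f"{BASE_URL}/chat/completions".
```

Since the format is unchanged, existing OpenAI SDKs, LangChain integrations, and the like only need their base URL swapped.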
Stack & relationships
How TabbyAPI relates to other entries in the catalog — recommended pairings, alternatives, dependencies, and edges to avoid. Each edge carries a one-line operator note from our editorial team.
Recommended stack
- Pairs with ExLlamaV2
The canonical pairing for production-ish ExLlamaV2 serving. ExLlamaV2 is the engine; TabbyAPI is the front of house.
Alternatives
- Alternative to Ollama
Both expose OpenAI-compatible APIs locally. TabbyAPI wins on raw single-card EXL2 speed for advanced users; Ollama wins on ergonomics and breadth of quant formats. Pick by quant commitment.
Depends on
- Depends on ExLlamaV2
TabbyAPI is purely a frontend — it wraps ExLlamaV2 in an OpenAI-compatible HTTP API. No TabbyAPI without ExLlamaV2 installed underneath.
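In practice the dependency shows up at configuration time: TabbyAPI points at an ExLlamaV2-loadable EXL2 model directory and exposes it over HTTP. A hypothetical config sketch — the key names and values below are illustrative assumptions, not verbatim TabbyAPI configuration; consult the project's shipped sample config for the real schema:

```yaml
# Illustrative sketch only — key names are assumptions, not the
# authoritative TabbyAPI schema.
network:
  host: 127.0.0.1   # bind locally; front with a proxy for remote clients
  port: 5000        # assumed default port
model:
  model_dir: models           # directory containing EXL2 quants
  model_name: my-exl2-model   # folder of the quant to load at startup
```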
Pros
- Cleanest production wrapper for ExLlamaV2
- Streaming + batching + tool-call support
- Minimal operational footprint (single process)
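Streaming support means clients receive the reply as a sequence of OpenAI-style chunks, each carrying a `delta` with a fragment of the text. A minimal sketch of how a client reassembles the full reply — the chunk shape follows the standard OpenAI streaming format, and the mock chunks stand in for server-sent events:

```python
# Sketch: reassembling a streamed chat completion from OpenAI-style
# chunks, as a TabbyAPI client would. Field names follow the OpenAI
# streaming spec; the mock data below is illustrative.

def collect_stream(chunks: list[dict]) -> str:
    """Concatenate delta.content fragments from streamed chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:      # first chunk may carry only the role
            parts.append(delta["content"])
    return "".join(parts)

# Mock chunks standing in for the server's event stream:
mock = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Hel"}}]},
    {"choices": [{"delta": {"content": "lo"}}]},
]
print(collect_stream(mock))  # prints "Hello"
```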
Cons
- NVIDIA only (it's bound to ExLlamaV2)
- EXL2 quant only (no GGUF, no AWQ)
- Smaller community than vLLM / llama.cpp server mode
Compatibility
| Operating systems | Linux, Windows, macOS |
| GPU backends | NVIDIA CUDA |
| License | Open source, free (AGPL-3.0) |
Frequently asked
Is TabbyAPI free?
Yes. TabbyAPI is open-source software released under the AGPL-3.0 license.
What operating systems does TabbyAPI support?
Linux, Windows, and macOS.
Which GPUs work with TabbyAPI?
NVIDIA GPUs via CUDA only, since TabbyAPI is bound to ExLlamaV2.
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.