TabbyAPI
OpenAI-API frontend for ExLlamaV2. Wraps the EXL2 inference engine in a clean HTTP API, adds streaming, batching, and OAI-compatible chat templates. The default front-of-house when you've already committed to the EXL2 quant format and want to expose it to clients that speak OpenAI.
Overview
OpenAI-API frontend for ExLlamaV2. Wraps the EXL2 inference engine in a clean HTTP API, adds streaming, batching, and OAI-compatible chat templates. The default front-of-house when you've already committed to the EXL2 quant format and want to expose it to clients that speak OpenAI.
Stack & relationships
How TabbyAPI relates to other entries in the catalog — recommended pairings, alternatives, dependencies, and edges to avoid. Each edge carries a one-line operator note from our editorial team.
Recommended stack
- Pairs withExLlamaV2
The canonical pairing for production-ish ExLlamaV2 serving. ExLlamaV2 is the engine; TabbyAPI is the front of house.
Alternatives
- Alternative toOllama
Both expose OpenAI-compatible APIs locally. TabbyAPI wins on raw single-card EXL2 speed for advanced users; Ollama wins on ergonomics and breadth of quant formats. Pick by quant commitment.
Depends on
- Depends onExLlamaV2
TabbyAPI is purely a frontend — it wraps ExLlamaV2 in an OpenAI-compatible HTTP API. No TabbyAPI without ExLlamaV2 installed underneath.
Pros
- Cleanest production wrapper for ExLlamaV2
- Streaming + batching + tool-call support
- Minimal operational footprint (single process)
Cons
- NVIDIA only (it's bound to ExLlamaV2)
- EXL2 quant only (no GGUF, no AWQ)
- Smaller community than vLLM / llama.cpp server mode
Compatibility
| Operating systems | Linux Windows macOS |
| GPU backends | NVIDIA CUDA |
| License | Open source · free (OSS, AGPL-3.0) |
Runtime health
Operator-grade signals on how actively TabbyAPI is being maintained, how fresh its measurements are, and what failure classes operators have flagged. Every label below is anchored to a real date or count — we never infer maintainer activity we can't show.
Release cadence
Derived from the most recent editorial signal on this row.
8 days since last refresh · source: lastUpdated
Benchmark freshness
How recent the editorial measurements on this runtime are.
No editorial benchmarks for this runtime yet.
Community reproduction
Submissions that match an editorial measurement on similar hardware.
No community reproductions on file yet.
Get TabbyAPI
Frequently asked
Is TabbyAPI free?
What operating systems does TabbyAPI support?
Which GPUs work with TabbyAPI?
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.
Related — keep moving
Verify TabbyAPI runs on your specific hardware before committing money.