
TabbyAPI

OpenAI-API frontend for ExLlamaV2. Wraps the EXL2 inference engine in a clean HTTP API, adds streaming, batching, and OAI-compatible chat templates. The default front-of-house when you've already committed to the EXL2 quant format and want to expose it to clients that speak OpenAI.

By Fredoline Eruo·Last verified May 6, 2026·1,500 GitHub stars

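Because TabbyAPI speaks the OpenAI wire format, any OpenAI-style client can talk to it. Here is a minimal stdlib-only sketch; the base URL assumes TabbyAPI's default local port (5000), and the API key is a placeholder for whatever you configured at startup.

```python
import json
import urllib.request

# Assumptions: TabbyAPI is running locally on its default port (5000) and
# an API key has been configured; adjust both to match your deployment.
BASE_URL = "http://127.0.0.1:5000/v1"
API_KEY = "your-tabby-api-key"  # placeholder


def build_chat_request(prompt: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
        "stream": stream,
    }


def chat(prompt: str) -> str:
    """POST to the OpenAI-compatible chat endpoint and return the reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

The same code works against any OpenAI-compatible server by changing `BASE_URL`, which is the point of committing to this API shape.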

Stack & relationships

How TabbyAPI relates to other entries in the catalog — recommended pairings, alternatives, dependencies, and edges to avoid. Each edge carries a one-line operator note from our editorial team.

TabbyAPI ↔ ecosystem

Recommended stack

  • Pairs with
    ExLlamaV2

    The canonical pairing for production-ish ExLlamaV2 serving. ExLlamaV2 is the engine; TabbyAPI is the front of house.

Alternatives

  • Alternative to
    Ollama

    Both expose OpenAI-compatible APIs locally. TabbyAPI wins on raw single-card EXL2 speed for advanced users; Ollama wins on ergonomics and breadth of quant formats. Pick by quant commitment.

Depends on

  • Depends on
    ExLlamaV2

    TabbyAPI is purely a frontend — it wraps ExLlamaV2 in an OpenAI-compatible HTTP API. No TabbyAPI without ExLlamaV2 installed underneath.
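In practice, the dependency shows up in TabbyAPI's `config.yml`, which points the server at a local EXL2 quant directory. A minimal sketch follows; key names mirror the sample config shipped with the project, but verify them against your installed version, and the model folder name here is hypothetical.

```yaml
# Minimal TabbyAPI config.yml sketch (verify key names against your version)
network:
  host: 127.0.0.1
  port: 5000
model:
  model_dir: models            # directory containing EXL2 quant folders
  model_name: Llama-3-8B-exl2  # hypothetical EXL2 model folder
```

With this in place, TabbyAPI loads the named EXL2 model through ExLlamaV2 at startup; there is no model runtime of its own to configure.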

Pros

  • Cleanest production wrapper for ExLlamaV2
  • Streaming + batching + tool-call support
  • Minimal operational footprint (single process)
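The streaming support listed above uses the server-sent-events wire format that OpenAI-compatible servers emit when `stream: true` is set: each event line carries a JSON chunk with an incremental `delta`, terminated by a `[DONE]` sentinel. A small parsing sketch, assuming that standard chunk shape:

```python
import json


def parse_sse_line(line: bytes):
    """Parse one server-sent-events line from a streaming completion.

    Returns the decoded JSON chunk, None for non-data lines (comments,
    keepalives), or the sentinel string "[DONE]" at end of stream.
    """
    line = line.strip()
    if not line.startswith(b"data:"):
        return None
    payload = line[len(b"data:"):].strip()
    if payload == b"[DONE]":
        return "[DONE]"
    return json.loads(payload)


def extract_delta(chunk: dict) -> str:
    """Pull the incremental text out of an OpenAI-style streaming chunk."""
    return chunk["choices"][0]["delta"].get("content", "")
```

Iterating over the HTTP response line by line and feeding each line through these two functions is enough to reassemble the streamed reply.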

Cons

  • NVIDIA only (it's bound to ExLlamaV2)
  • EXL2 quant only (no GGUF, no AWQ)
  • Smaller community than vLLM / llama.cpp server mode

Compatibility

Operating systems
Linux
Windows
macOS
GPU backends
NVIDIA CUDA
License: Open source · free (OSS, AGPL-3.0)


Frequently asked

Is TabbyAPI free?

No. TabbyAPI is free and open source, released under the AGPL-3.0 license. There is no paid tier.

What operating systems does TabbyAPI support?

TabbyAPI supports Linux, Windows, and macOS.

Which GPUs work with TabbyAPI?

TabbyAPI supports NVIDIA CUDA. Because inference runs through ExLlamaV2, a CUDA-capable GPU is required.

Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.