ExLlamaV2
Runner · Open source · Free · 4.4/5
Overview
A GPU-only inference library optimized for consumer NVIDIA cards. It delivers the fastest tokens-per-second on a single 24 GB card for 30B-class models in its EXL2 quantization format.
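As a rough sketch of the workflow, the library installs from PyPI, and models are converted to the EXL2 format with the `convert.py` script from the ExLlamaV2 repository. The paths and the target bitrate below are placeholder assumptions, and the steps require an NVIDIA GPU with a working CUDA toolchain:

```shell
# Install ExLlamaV2 (prebuilt wheels require a matching CUDA/PyTorch setup).
pip install exllamav2

# Quantize an HF-format FP16 model to EXL2 at roughly 4.25 bits per weight.
# -i: input model directory, -o: working/scratch directory,
# -cf: compiled output directory, -b: target bits per weight.
# All paths here are placeholders.
python convert.py \
    -i /path/to/model-fp16 \
    -o /tmp/exl2-work \
    -cf /path/to/model-exl2 \
    -b 4.25
```

Lower `-b` values shrink VRAM usage at some cost in quality, which is how 30B-class models fit on a single 24 GB card.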
Pros
- Top single-card NVIDIA speed
- Custom EXL2 quant format
- Tight memory usage
Cons
- NVIDIA only
- EXL2 ecosystem narrower than GGUF
Compatibility
| Operating systems | Linux, Windows |
| GPU backends | NVIDIA CUDA |
| License | Open source · free |
Frequently asked
Is ExLlamaV2 free?
Yes. ExLlamaV2 is free to download and use, and it is open source under a permissive license.
What operating systems does ExLlamaV2 support?
ExLlamaV2 runs on Linux and Windows.
Which GPUs work with ExLlamaV2?
ExLlamaV2 requires an NVIDIA GPU with CUDA; it does not offer a CPU-only inference path.
Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.