Server · Open source · Free · 4.3/5

NVIDIA TensorRT-LLM

NVIDIA's optimized inference path for Hopper, Ada, and Blackwell. Compile your model once, serve at peak hardware speed.

By Fredoline Eruo · Last verified May 6, 2026 · 12,000 GitHub stars

Overview

TensorRT-LLM is NVIDIA's open-source library for high-performance LLM inference on NVIDIA GPUs, targeting the Hopper, Ada, and Blackwell architectures. Models are compiled ahead of time into optimized engines, then served with runtime features such as in-flight batching, a paged KV cache, and low-precision quantization. The trade-off is a heavier setup than interpreter-style runtimes: you pay a one-time compilation cost to get peak hardware throughput at serve time.
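The compile-once, serve-fast workflow can be sketched with the project's CLI tools. The checkpoint and engine paths below are illustrative placeholders, not real artifacts:

```shell
# One-time, heavy step: build an optimized engine from a converted
# checkpoint. Directory names here are placeholder assumptions.
trtllm-build \
  --checkpoint_dir ./llama-3.1-8b-ckpt \
  --output_dir ./llama-3.1-8b-engine \
  --gemm_plugin auto

# Serve the compiled engine over an OpenAI-compatible HTTP endpoint.
trtllm-serve ./llama-3.1-8b-engine --host 0.0.0.0 --port 8000
```

The build step is where the "compilation step is heavy" caveat bites: engines are specific to the GPU architecture and build options, so changing hardware or precision generally means rebuilding.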

Pros

  • Peak NVIDIA hardware utilization
  • FP8 / FP4 acceleration on Blackwell
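
To make the FP8/FP4 point concrete, here is a back-of-the-envelope weight-memory calculation. The 8-billion-parameter model size is an illustrative assumption, and weights are only part of total GPU memory (KV cache and activations add more):

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Illustrative 8-billion-parameter model at three precisions.
n = 8e9
for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4)]:
    print(f"{name}: {weight_memory_gb(n, bits):.0f} GB")
# prints:
# FP16: 16 GB
# FP8: 8 GB
# FP4: 4 GB
```

Halving bits per weight halves weight memory, which is why FP8 and FP4 support on Blackwell lets larger models fit on a single GPU (bandwidth savings also raise throughput).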

Cons

  • NVIDIA only
  • Compilation step is heavy

Compatibility

Operating systems
Linux
Windows
GPU backends
NVIDIA CUDA
License
Open source · free


Frequently asked

Is NVIDIA TensorRT-LLM free?

Yes. NVIDIA TensorRT-LLM is free to download and use, and it is open source under the permissive Apache 2.0 license.

What operating systems does NVIDIA TensorRT-LLM support?

NVIDIA TensorRT-LLM supports Linux and Windows.

Which GPUs work with NVIDIA TensorRT-LLM?

NVIDIA TensorRT-LLM runs exclusively on NVIDIA GPUs via CUDA. There is no CPU-only inference path and no support for other GPU vendors; if you need those, consider a cross-platform runtime instead.

Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.