NVIDIA TensorRT-LLM

NVIDIA's optimized inference path for Hopper, Ada, and Blackwell. Compile your model once, serve at peak hardware speed.

By Fredoline Eruo · Last verified May 6, 2026 · 12,000 GitHub stars

Overview


NVIDIA TensorRT-LLM is free to download and use, and it is open source under a permissive license (Apache 2.0).

NVIDIA TensorRT-LLM supports Linux and Windows.

NVIDIA TensorRT-LLM runs on NVIDIA GPUs through CUDA. It requires an NVIDIA GPU; CPU-only inference is not a supported path.
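Since the tool targets NVIDIA CUDA hardware, it can help to confirm that a GPU is visible before installing anything. The following is a minimal preflight sketch, not part of TensorRT-LLM itself: it assumes the NVIDIA driver's nvidia-smi utility is on the PATH, and the helper name list_nvidia_gpus is hypothetical.

```python
import shutil
import subprocess


def list_nvidia_gpus(timeout_s: float = 10.0) -> list[str]:
    """Return GPU names reported by nvidia-smi, or [] if none are visible.

    Hypothetical helper for illustration; it only shells out to the
    nvidia-smi tool that ships with the NVIDIA driver.
    """
    exe = shutil.which("nvidia-smi")
    if exe is None:
        # No driver tooling on PATH, so treat this as "no usable GPU".
        return []
    try:
        result = subprocess.run(
            [exe, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = list_nvidia_gpus()
    if gpus:
        print("NVIDIA GPUs found:", ", ".join(gpus))
    else:
        print("No NVIDIA GPU visible via nvidia-smi.")
```

An empty list only means nvidia-smi found nothing; it does not tell you whether a detected GPU belongs to one of the supported architectures, so check the reported name against Hopper, Ada, or Blackwell parts yourself.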

Reviewed by RunLocalAI Editorial. See our editorial policy for how we evaluate tools.