How to install vLLM with pip
Prerequisites: Python 3.10+, a CUDA-compatible GPU, and pip.
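The prerequisites above can be checked from Python before installing anything. The following is a minimal sketch, not an official vLLM tool: the function name `check_prerequisites` and its result keys are made up for illustration, and finding `nvidia-smi` on the PATH is only a rough proxy for a usable CUDA GPU.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a dict of booleans, one per prerequisite (hypothetical helper)."""
    return {
        # Interpreter version compared against the documented minimum.
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        # pip must be importable by this interpreter.
        "pip_ok": importlib.util.find_spec("pip") is not None,
        # Assumption: nvidia-smi on PATH suggests an NVIDIA driver is installed.
        # It does not prove that the GPU is supported by vLLM.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```

A `NO` for `nvidia_smi_found` does not always mean the install will fail, but it is worth resolving before downloading the wheels.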
What this does
Installs the vLLM inference engine from pip into a Python environment, enabling high-throughput LLM serving on CUDA hardware. The result is a working vllm command-line entry point and Python API.
Steps
1. Create a clean virtual environment. Isolating vLLM prevents dependency conflicts with other packages.

       python -m venv vllm-env
       source vllm-env/bin/activate

   Expected output: the prompt shows (vllm-env).

2. Upgrade pip and setuptools. Older setuptools can cause wheels to fail during build.

       pip install --upgrade pip setuptools wheel

   Expected output: Successfully installed pip-X.Y.Z setuptools-X.Y.Z wheel-X.Y.Z.

3. Install the vLLM stable release.

       pip install vllm

   Expected output: Successfully installed vllm-X.Y.Z. This step downloads pre-built CUDA wheels; expect 1-3 minutes on a fast connection.

4. Verify the installation.

       python -c "import vllm; print(vllm.__version__)"

   Expected output: the version string, e.g. 0.8.5.
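The steps above can also be scripted. The sketch below is one possible way to do it, not part of vLLM: `install_commands` and `run_steps` are hypothetical helper names. It calls the environment's own interpreter directly instead of sourcing the activate script, and it defaults to a dry run that only prints the commands.

```python
import os
import subprocess
import sys
from pathlib import Path


def install_commands(env_dir="vllm-env"):
    """Build the argv lists for the four steps without running anything."""
    # A venv keeps its interpreter in Scripts\ on Windows and in bin/ elsewhere.
    if os.name == "nt":
        env_python = str(Path(env_dir) / "Scripts" / "python.exe")
    else:
        env_python = str(Path(env_dir) / "bin" / "python")
    return [
        # Step 1: create the environment with the current interpreter.
        [sys.executable, "-m", "venv", str(env_dir)],
        # Step 2: upgrade the packaging tools inside the environment.
        [env_python, "-m", "pip", "install", "--upgrade", "pip", "setuptools", "wheel"],
        # Step 3: install the stable vLLM release.
        [env_python, "-m", "pip", "install", "vllm"],
        # Step 4: verify by importing vLLM and printing its version.
        [env_python, "-c", "import vllm; print(vllm.__version__)"],
    ]


def run_steps(env_dir="vllm-env", dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in install_commands(env_dir):
        print("$", " ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_steps(dry_run=True)
```

Calling `run_steps(dry_run=False)` performs the real installation, so the dry-run default is a deliberate safety choice.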
Verification
python -c "import vllm; print('vLLM version:', vllm.__version__)"
# Expected: vLLM version: 0.x.y
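Importing vllm also loads its heavy dependencies, so a lighter first check is to read the installed package metadata. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is an illustrative assumption. It confirms only that pip registered the package, not that the import or the CUDA libraries work.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="vllm"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print("vLLM version:", found if found else "not installed")
```

If this reports a version but the import check above fails, the problem is more likely a missing or mismatched runtime library than a failed pip install.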
Common failures
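- cublas/nccl not found: CUDA version mismatch. Ensure the driver supports the CUDA version the vLLM wheel was built for. Run nvidia-smi first (it reports the highest CUDA version the driver supports), then install vLLM specifying the correct CUDA version.
- torch version conflict: Another package pins an older PyTorch. Create a fresh virtual environment.
- Out-of-memory during extension build: This applies only when pip compiles vLLM from source instead of using a pre-built wheel. CUDA kernel compilation requires roughly 4 GB of free host RAM. Close other memory-heavy processes before installing, or reduce build parallelism (for source builds, vLLM reads the MAX_JOBS environment variable).
- Permission denied on pip writes: Use the --user flag or a virtual environment instead of a system-wide install.
- Pre-built wheel not available for this platform: Build from source, or try a pre-release wheel with pip install vllm --pre. The --pre flag only allows pip to select pre-release versions; nightly builds come from a separate package index described in the vLLM installation documentation.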
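The failures listed above can be matched against captured pip or import output. The sketch below is illustrative only: the `diagnose` helper, the hint constants, and especially the substring patterns are assumptions, since real error messages vary between pip, PyTorch, and CUDA versions.

```python
CUDA_HINT = "CUDA mismatch: run nvidia-smi, then install a vLLM build for that CUDA version."
TORCH_HINT = "PyTorch conflict: create a fresh virtual environment."
PERMISSION_HINT = "Permission problem: use --user or a virtual environment."
WHEEL_HINT = "No pre-built wheel: build from source or try pip install vllm --pre."

# Each rule pairs substrings that must ALL appear (case-insensitive) with a hint.
# The patterns are illustrative guesses, not an exhaustive or official list.
RULES = [
    (("cublas",), CUDA_HINT),
    (("nccl",), CUDA_HINT),
    (("torch", "conflict"), TORCH_HINT),
    (("permission denied",), PERMISSION_HINT),
    (("no matching distribution",), WHEEL_HINT),
]


def diagnose(output):
    """Return the hints whose patterns all occur in the given output text."""
    lowered = output.lower()
    hints = [hint for patterns, hint in RULES if all(p in lowered for p in patterns)]
    # dict.fromkeys removes duplicate hints while keeping first-seen order.
    return list(dict.fromkeys(hints))


if __name__ == "__main__":
    sample = "OSError: [Errno 13] Permission denied: '/usr/lib/python3/site-packages'"
    for hint in diagnose(sample):
        print(hint)
```

An empty result means none of the patterns matched, not that the installation succeeded.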