What this does

Installs the vLLM inference engine from PyPI with pip, into a fresh virtual environment so that its dependencies stay isolated. The package provides high-throughput LLM serving on NVIDIA CUDA hardware. The result is a working vllm command-line entry point and Python API.

Steps

Create a clean virtual environment. Isolating vLLM prevents dependency conflicts with other packages.
```
python -m venv vllm-env
source vllm-env/bin/activate
```
Expected result: the commands print nothing, and the shell prompt is now prefixed with (vllm-env).
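The prompt prefix is only a visual cue, so it can help to confirm from Python that the interpreter really belongs to the new environment. The sketch below relies on the standard library only; the helper name in_virtualenv is hypothetical and chosen for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If this reports False after activation, the python on PATH is probably not the one inside vllm-env.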
Upgrade pip, setuptools, and wheel. Outdated versions of these tools can cause wheel builds to fail.
```
pip install --upgrade pip setuptools wheel
```
Expected output: Successfully installed pip-X.Y.Z setuptools-X.Y.Z wheel-X.Y.Z. Packages that are already current report Requirement already satisfied instead.
Install the latest stable vLLM release.
```
pip install vllm
```
Expected output: a final line starting with Successfully installed that lists vllm-X.Y.Z along with its dependencies. This step downloads pre-built CUDA wheels, including PyTorch; expect 1-3 minutes on a fast connection.
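To see which versions pip actually resolved without importing the packages (importing vllm can be slow), the installed metadata can be queried directly. This is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # vllm depends on torch, so both should be present after a successful install.
    for name in ("vllm", "torch"):
        print(name, installed_version(name) or "not installed")
```

A missing torch entry, or a torch version different from the one pip reported during the install, would point towards the torch version conflict described under Common failures.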
Verify the installation.
```
python -c "import vllm; print(vllm.__version__)"
```
Expected output: the version string, e.g. 0.8.5.

Verification

```
python -c "import vllm; print('vLLM version:', vllm.__version__)"
# Expected: vLLM version: 0.x.y
```
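The import check covers the Python API; the vllm command-line entry point can be checked separately. The following is a rough sketch, assuming pip placed the vllm script next to the environment's interpreter (the usual layout for a venv). The helper names find_entry_point and belongs_to_current_env are hypothetical.

```python
import shutil
import sys
from pathlib import Path
from typing import Optional


def find_entry_point(command: str = "vllm") -> Optional[str]:
    """Return the path of the command as found on PATH, or None."""
    return shutil.which(command)


def belongs_to_current_env(path: str) -> bool:
    """Check whether a script sits in the same directory as this interpreter."""
    # Assumption: console scripts are installed beside the environment's python.
    return Path(path).absolute().parent == Path(sys.executable).absolute().parent


if __name__ == "__main__":
    location = find_entry_point()
    if location is None:
        print("vllm command not found on PATH")
    else:
        print("vllm command:", location)
        print("installed in this environment:", belongs_to_current_env(location))
```

A vllm command that resolves outside the active environment usually means an older system-wide install is shadowing the new one.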

Common failures

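cublas / nccl not found: CUDA version mismatch. Run nvidia-smi first and read the CUDA Version field, which is the highest CUDA runtime version the installed driver supports. Then install a vLLM wheel built for a CUDA version at or below that value, or update the driver.
torch version conflict: vLLM requires a specific PyTorch version, and another package in the environment pins a different one. Install into a fresh virtual environment.
Out-of-memory during extension build: This only happens when pip falls back to building from source. CUDA kernel compilation uses host RAM (roughly 4 GB per parallel compile job), not GPU memory, so closing GPU processes does not help. Close memory-heavy applications and reduce build parallelism, for example by setting the MAX_JOBS environment variable to a small number before installing.
Permission denied on pip writes: Install into a virtual environment, or pass the --user flag when installing outside one, instead of doing a system-wide install.
pre-built wheel not available for this platform: Build from source, or try a pre-release: pip install vllm --pre. Note that --pre only lets pip select pre-release versions from the configured package index; nightly wheels are served from a separate index described in the vLLM installation documentation.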
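For the CUDA mismatch case, the driver's supported CUDA version can be read programmatically rather than by eye. This is an illustrative sketch, not part of vLLM: it assumes nvidia-smi prints a header containing "CUDA Version: X.Y" (the usual format, though it may differ between driver releases), and the helper names parse_cuda_version and driver_cuda_version are hypothetical.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_cuda_version(smi_output: str) -> Optional[str]:
    """Extract the 'CUDA Version: X.Y' value from nvidia-smi output, if present."""
    match = re.search(r"CUDA Version:\s*([0-9]+\.[0-9]+)", smi_output)
    return match.group(1) if match else None


def driver_cuda_version() -> Optional[str]:
    """Run nvidia-smi and return the CUDA version it reports, or None."""
    if shutil.which("nvidia-smi") is None:
        return None  # No NVIDIA driver tools on PATH.
    result = subprocess.run(
        ["nvidia-smi"], capture_output=True, text=True, check=False
    )
    return parse_cuda_version(result.stdout)


if __name__ == "__main__":
    print("driver supports CUDA up to:", driver_cuda_version() or "unknown")
```

Once PyTorch is installed, the value can be compared with torch.version.cuda, which reports the CUDA version the PyTorch build was compiled against; a PyTorch value higher than the driver's typically indicates the mismatch described above.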
