vLLM install picks the wrong CUDA wheel — fix and explanation

Cause

Environment: vLLM installed via pip onto a host whose driver/toolkit pair doesn't match the CUDA version the default wheel was built against.

Severity: high. vLLM fails at import time.

- The vLLM wheel on PyPI is built against a single, recent CUDA version; a driver too old for that version can't load it
- A mixed conda env where another package upgraded libcudart
- A custom-built PyTorch and a pip-installed vLLM that target different CUDA major versions
- An air-gapped install where the required cu118/cu121 wheel was never downloaded
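To make the first cause concrete, here is a minimal sketch that checks whether a driver version meets the minimum for a CUDA major version. The minimums in `MIN_DRIVER` are an assumption taken from NVIDIA's minor-version compatibility table for Linux and should be checked against the current CUDA release notes; the function names are hypothetical.

```python
# Minimum Linux driver per CUDA major version (assumption: values from NVIDIA's
# minor-version compatibility table; verify against the current release notes).
MIN_DRIVER = {11: (450, 80, 2), 12: (525, 60, 13)}


def parse_version(text: str) -> tuple:
    """Turn '535.104.05' into (535, 104, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports(driver_version: str, cuda_major: int) -> bool:
    """Return True if the driver meets the minimum for the given CUDA major."""
    return parse_version(driver_version) >= MIN_DRIVER[cuda_major]


# The driver string would normally come from:
#   nvidia-smi --query-gpu=driver_version --format=csv,noheader
print(driver_supports("470.82.01", 12))   # False
print(driver_supports("535.104.05", 12))  # True
```

A `False` result for the CUDA major of the default wheel suggests either upgrading the driver or choosing a wheel built for an older CUDA version.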

Solution

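1. Use the matching CUDA wheel index for your driver. The extra index only controls which PyTorch build pip resolves; the vLLM wheel on PyPI is always the default-CUDA build, so a CUDA 11.8 host needs the +cu118 wheel from the vLLM GitHub releases:

```bash
# Driver supports CUDA 12.1: PyPI vLLM wheel plus PyTorch from the cu121 index
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu121

# CUDA 11.8 (older H100/A100 fleets): install the +cu118 release asset.
# Confirm the file exists on the releases page for your version and Python tag.
export VLLM_VERSION=0.4.3 PYTHON_VERSION=310
pip install "https://github.com/vllm-project/vllm/releases/download/v${VLLM_VERSION}/vllm-${VLLM_VERSION}+cu118-cp${PYTHON_VERSION}-cp${PYTHON_VERSION}-manylinux1_x86_64.whl" \
  --extra-index-url https://download.pytorch.org/whl/cu118
```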
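The index URLs above follow a simple pattern, which the sketch below derives from a CUDA version string. The helper name `torch_index_url` is hypothetical, and the function only builds the URL: not every CUDA version has a published PyTorch index, so the result still needs to be checked.

```python
def torch_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as '12.1' to the matching PyTorch wheel index URL."""
    major, minor = cuda_version.split(".")[:2]
    return f"https://download.pytorch.org/whl/cu{major}{minor}"


print(torch_index_url("12.1"))  # https://download.pytorch.org/whl/cu121
print(torch_index_url("11.8"))  # https://download.pytorch.org/whl/cu118
```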

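2. Or run the official vLLM Docker image, which bundles its own CUDA runtime. The host still needs the NVIDIA Container Toolkit and a driver new enough for the image's CUDA version:

```bash
# --ipc=host gives PyTorch the shared memory it needs; the token is required
# for gated models such as Llama 3.1
docker run --gpus all -p 8000:8000 --ipc=host \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  vllm/vllm-openai:latest --model meta-llama/Llama-3.1-8B-Instruct
```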

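3. Verify the CUDA target after install. torch.version.cuda is the CUDA version PyTorch was built against; it must not be newer than the "CUDA Version" shown in the nvidia-smi header, which is the highest version the driver supports:

```bash
python -c "import torch; print('torch cuda:', torch.version.cuda); \
import vllm; print('vllm version:', vllm.__version__)"
```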
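The check can be scripted. The sketch below compares the CUDA tag in a vLLM package version with the CUDA version PyTorch reports. It assumes non-default vLLM builds carry a local version tag such as `+cu118`, as the GitHub release wheel filenames do; the default PyPI build has no tag, in which case the result is `None`. All function names are hypothetical.

```python
import re
from typing import Optional


def cuda_tag(version: str) -> Optional[str]:
    """Extract 'cu118' from '0.4.3+cu118'; None when there is no CUDA tag."""
    match = re.search(r"\+(cu\d+)", version)
    return match.group(1) if match else None


def torch_cuda_tag(torch_cuda: str) -> str:
    """Turn a torch.version.cuda value ('12.1') into wheel-tag form ('cu121')."""
    return "cu" + torch_cuda.replace(".", "")


def builds_agree(vllm_version: str, torch_cuda: str) -> Optional[bool]:
    """True or False when the vLLM version carries a CUDA tag, None otherwise."""
    tag = cuda_tag(vllm_version)
    if tag is None:
        return None
    return tag == torch_cuda_tag(torch_cuda)


print(builds_agree("0.4.3+cu118", "11.8"))  # True
print(builds_agree("0.4.3+cu118", "12.1"))  # False
print(builds_agree("0.4.3", "12.1"))        # None
```

On a real host, the inputs would be `importlib.metadata.version("vllm")` and `torch.version.cuda`; the installed package metadata is used here because `vllm.__version__` may omit the local tag.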

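4. Force-reinstall PyTorch first so the CUDA build is fixed before vLLM is installed. vLLM pins an exact torch version, so choose a vLLM release whose pin matches the torch you installed; otherwise pip replaces it with whatever build it resolves:

```bash
pip install --force-reinstall torch==2.4.0 \
  --index-url https://download.pytorch.org/whl/cu121
# Keep the cu121 index visible so any torch re-resolution prefers the cu121 build
pip install vllm --extra-index-url https://download.pytorch.org/whl/cu121
```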
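Whether pip keeps the pinned PyTorch depends on the torch version vLLM itself requires. The sketch below pulls that pin out of a list of requirement strings; the sample list is hypothetical, and on a real host the strings would come from `importlib.metadata.requires("vllm")` after installation.

```python
import re
from typing import Iterable, Optional


def pinned_torch(requirements: Iterable[str]) -> Optional[str]:
    """Return the version from a 'torch==X' requirement, or None if torch is not pinned."""
    for requirement in requirements:
        # Accept both 'torch==2.4.0' and the older 'torch (==2.4.0)' metadata form,
        # while skipping packages such as torchvision.
        match = re.match(r"^torch\s*\(?\s*==\s*([^\s;,)]+)", requirement.strip())
        if match:
            return match.group(1)
    return None


# Hypothetical requirement strings for illustration only.
sample = ["numpy", "torchvision==0.19", "torch==2.4.0", "xformers==0.0.27.post2"]
print(pinned_torch(sample))  # 2.4.0
```

If the returned version differs from the torch installed in the first command, pip will swap torch out when it installs vLLM.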

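5. Build vLLM from source when no pre-built wheel matches your stack. --no-build-isolation compiles against the PyTorch already in the environment, so install the build dependencies listed in the repository first:

```bash
git clone https://github.com/vllm-project/vllm
cd vllm && pip install -e . --no-build-isolation
```

The build takes 10-30 minutes; only do this when no pre-built wheel fits.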
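A source build compiles with the local CUDA toolkit, so it is worth confirming that the toolkit release matches `torch.version.cuda` before starting. The sketch below parses the release number from `nvcc --version` output; the helper name is hypothetical and the sample line is an illustrative example of that output.

```python
import re
from typing import Optional


def nvcc_release(output: str) -> Optional[str]:
    """Extract the toolkit release ('12.1') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    return match.group(1) if match else None


# Illustrative sample of the relevant line printed by `nvcc --version`.
sample = "Cuda compilation tools, release 12.1, V12.1.105"
print(nvcc_release(sample))  # 12.1
```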
