© 2026 runlocalai.co · Independently operated
How to install vLLM with pip

intermediate·15 min·By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x | Windows 11 · Ollama 0.4.x | macOS 15 · Ollama 0.4.x
PREREQUISITES

Python 3.10+, CUDA-compatible GPU, pip
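
The prerequisites above can be checked up front with a short script. This is a minimal sketch rather than an official vLLM tool: the function name check_prerequisites is hypothetical, and finding nvidia-smi on the PATH is only a hint that an NVIDIA driver is installed, not proof that the GPU is usable.

```python
import importlib.util
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version taken from the prerequisites list


def check_prerequisites():
    """Return a dict mapping each prerequisite to True or False."""
    return {
        "python_3_10_plus": sys.version_info >= MIN_PYTHON,
        "pip_available": importlib.util.find_spec("pip") is not None,
        # Assumption: nvidia-smi ships with the NVIDIA driver, so finding it
        # suggests (but does not prove) that a CUDA-capable GPU is present.
        "nvidia_smi_on_path": shutil.which("nvidia-smi") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Run it with the same interpreter you plan to use for the virtual environment, since the Python version check applies to whichever interpreter executes the script.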

What this does

Installs the vLLM inference engine into a dedicated Python virtual environment for high-throughput LLM serving on CUDA hardware. The result is a working vllm command-line entry point and Python API.

Steps

  1. Create a clean virtual environment. Isolating vLLM prevents dependency conflicts with other packages. On Ubuntu the interpreter is invoked as python3.

    python3 -m venv vllm-env
    source vllm-env/bin/activate
    

    Expected output: the prompt shows (vllm-env).

  2. Upgrade pip, setuptools, and wheel. Outdated packaging tools can cause wheel installs or source builds to fail.

    pip install --upgrade pip setuptools wheel
    

    Expected output: Successfully installed pip-X.Y.Z setuptools-X.Y.Z wheel-X.Y.Z.

  3. Install the latest stable vLLM release.

    pip install vllm
    

    Expected output: a final line beginning with Successfully installed that includes vllm-X.Y.Z. This step downloads pre-built CUDA wheels along with PyTorch and other dependencies; expect 1-3 minutes on a fast connection.

  4. Verify the installation.

    python -c "import vllm; print(vllm.__version__)"
    

    Expected output: the version string, e.g. 0.8.5.
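
The four steps can also be scripted. The sketch below only builds and prints the commands; it does not execute anything. It assumes a POSIX virtual environment layout (bin/python inside the environment) and python3 as the system interpreter name, and install_commands is a hypothetical helper. Calling the environment's own interpreter with -m pip has the same effect as activating the environment first.

```python
import shlex
from pathlib import Path


def install_commands(env_dir="vllm-env"):
    """Build the argv lists for steps 1-4; nothing is executed here."""
    # Assumption: POSIX venv layout, where the interpreter lives in bin/.
    env_python = (Path(env_dir) / "bin" / "python").as_posix()
    return [
        ["python3", "-m", "venv", env_dir],
        [env_python, "-m", "pip", "install", "--upgrade",
         "pip", "setuptools", "wheel"],
        [env_python, "-m", "pip", "install", "vllm"],
        [env_python, "-c", "import vllm; print(vllm.__version__)"],
    ]


# Print the commands in copy-pasteable shell form.
for argv in install_commands():
    print(shlex.join(argv))
```

To actually run the steps, each list could be passed to subprocess.run(argv, check=True), which stops at the first failing command.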

Verification

python -c "import vllm; print('vLLM version:', vllm.__version__)"
# Expected: vLLM version: 0.x.y
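
A lighter check is also possible without importing vllm, which can be slow because it loads PyTorch. The sketch below reads the installed package metadata and looks for the vllm console script on the PATH. The helper name installed_version is hypothetical; importlib.metadata and shutil.which are standard library.

```python
import shutil
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("vllm")
cli_path = shutil.which("vllm")  # console script that pip places in the env
print("vLLM version:", version or "not installed")
print("vllm CLI:", cli_path or "not on PATH")
```

This confirms that pip registered the package and its entry point, but not that the CUDA libraries load correctly; the import-based command above remains the stronger test.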

Common failures

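  • cublas / nccl not found: CUDA version mismatch. The pre-built wheels target a specific CUDA version, and the NVIDIA driver must support it. Run nvidia-smi to see the highest CUDA version the driver supports, then install a vLLM build that targets that version or an older one.
  • torch version conflict: Another package pins an older PyTorch. Create a fresh virtual environment and install vLLM there.
  • Out-of-memory during extension build: This applies only when pip falls back to building from source. CUDA kernel compilation needs several GB of free host RAM (about 4 GB by this guide's estimate). It uses system memory rather than GPU memory, so close memory-heavy applications or lower the number of parallel build jobs (for example with the MAX_JOBS environment variable) before retrying.
  • Permission denied on pip writes: Install into a virtual environment, or pass the --user flag, instead of installing system-wide.
  • pre-built wheel not available for this platform: Build from source, or try a pre-release: pip install vllm --pre. Note that --pre selects pre-release versions from PyPI; nightly wheels are distributed separately, so check the vLLM installation docs for the current index.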
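
For the CUDA mismatch case, the version the driver supports can be read programmatically. The sketch below assumes the nvidia-smi banner contains the text "CUDA Version: X.Y", as current drivers print it; both function names are hypothetical. It returns None when nvidia-smi is missing or the text is not found.

```python
import re
import shutil
import subprocess


def parse_cuda_version(smi_output):
    """Extract (major, minor) from nvidia-smi text, or None if not present."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", smi_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def driver_cuda_version():
    """Run nvidia-smi and return the highest CUDA version the driver supports."""
    if shutil.which("nvidia-smi") is None:
        return None
    try:
        result = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True,
            check=False, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    return parse_cuda_version(result.stdout)


print("Driver supports CUDA up to:", driver_cuda_version())
```

The reported value is an upper bound set by the driver. A wheel built for a newer CUDA version than this is the usual cause of missing cublas or nccl libraries at import time.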

Related guides

  • How to run vLLM with a HuggingFace model
  • How to configure vLLM GPU memory allocation
  • Course: Local AI Fundamentals