02. Installation Failures

Chapter 2 of 15 · 20 min

Pip Install Failures

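The most common pip failure in local AI contexts is a missing build dependency. When no pre-built wheel matches your platform, pip falls back to building from source: llama-cpp-python compiles llama.cpp with CMake and a C/C++ compiler, and dependencies of transformers such as tokenizers need a Rust toolchain if no wheel is available. Those compilers and system libraries live outside Python, so pip cannot install them.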

# Common fix for missing build dependencies on Ubuntu/Debian
sudo apt-get update
sudo apt-get install -y build-essential cmake git python3-dev libpq-dev libopenblas-dev
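
Before starting a long source build, it can help to confirm that the basic build tools are actually on PATH. The following is a minimal sketch, not a complete dependency check: the tool list in REQUIRED_TOOLS and the helper name missing_tools are assumptions chosen for illustration, and the check covers executables only, not headers or libraries such as OpenBLAS.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# Executables a typical source build needs. This list is an assumption;
# adjust it to the package you are installing.
REQUIRED_TOOLS = ("gcc", "g++", "make", "cmake", "git")


def missing_tools(
    tools: Iterable[str] = REQUIRED_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    absent = missing_tools()
    if absent:
        print("Missing build tools:", ", ".join(absent))
    else:
        print("All listed build tools were found on PATH.")
```

The which parameter exists only so the lookup can be replaced in tests; in normal use the default shutil.which is enough.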

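If the package still fails after installing build dependencies, read the first error in the build log rather than pip's final summary; it usually names the missing header, library, or compiler. Also check whether pip is building from source at all: wheels (pre-built binaries) skip compilation and fail far less often than source builds.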

# Check whether a wheel exists for a given platform (fails if only a source distribution matches)
pip download transformers --no-deps --only-binary=:all: --platform manylinux2014_x86_64 --dest /tmp/wheel-check
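
The file that pip downloads tells you whether a compiler will be needed: a .whl file is pre-built, while a .tar.gz source distribution has to be built locally. As a rough sketch, the tags at the end of a wheel filename can be read with the standard library alone. The helper names wheel_tags and needs_build are hypothetical, and the filenames in the demo are illustrative examples rather than a statement about which releases exist.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    python: str    # e.g. "py3" or "cp311"
    abi: str       # e.g. "none" or "cp311"
    platform: str  # e.g. "any" or "manylinux2014_x86_64"


def wheel_tags(filename: str) -> Optional[WheelTags]:
    """Return the last three dash-separated tags of a wheel filename.

    Returns None when the file is not a wheel (for example a .tar.gz sdist).
    """
    if not filename.endswith(".whl"):
        return None
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:  # name, version, python tag, abi tag, platform tag
        return None
    return WheelTags(*parts[-3:])


def needs_build(filename: str) -> bool:
    """True when installing this file means building from source."""
    return wheel_tags(filename) is None


if __name__ == "__main__":
    # Illustrative filenames only.
    for name in (
        "transformers-4.40.0-py3-none-any.whl",
        "llama_cpp_python-0.2.90.tar.gz",
    ):
        print(name, "-> build required" if needs_build(name) else "-> pre-built")
```

A platform tag of any means the wheel is pure Python and installs anywhere; any other platform tag has to match the machine you are installing on.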

CUDA Version Mismatches

CUDA version mismatches tend to fail late: packages install cleanly and appear to work, then fail at import or at the first GPU call with cryptic errors.

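# Check the CUDA toolkit (compiler) version on PATH
nvcc --version
# Check the toolkit installed under /usr/local/cuda (version.json on CUDA 11.1+, version.txt on older releases)
cat /usr/local/cuda/version.json 2>/dev/null || cat /usr/local/cuda/version.txt
# Check the CUDA version PyTorch was built against
python -c "import torch; print(torch.version.cuda)"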

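These three do not all have to be identical. PyTorch wheels installed with pip bundle their own CUDA runtime, so running them only requires an NVIDIA driver that supports at least that CUDA version; the header of nvidia-smi shows the highest version the driver supports. A PyTorch build for CUDA 11.8 runs on a driver that supports 12.1, but a build for CUDA 12.1 fails on a driver that supports only 11.8. The toolkit version reported by nvcc --version matters when you compile CUDA code from source: for custom PyTorch extensions the toolkit's major version must match torch.version.cuda, and for llama-cpp-python the toolkit must be one the installed driver supports.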
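
Comparing the reported versions by eye is error-prone, so one option is to parse them into tuples and list the differences. This is a minimal sketch that works on text you paste in from the commands above; the function names are hypothetical, and the regular expressions assume the usual output shapes ("release 12.1" from nvcc, "CUDA Version: 12.2" in the nvidia-smi header).

```python
import re
from typing import List, Optional, Tuple

Version = Tuple[int, int]


def _first_version(pattern: str, text: str) -> Optional[Version]:
    """Return (major, minor) from the first match of pattern, or None."""
    match = re.search(pattern, text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_nvcc(output: str) -> Optional[Version]:
    """Parse text such as 'Cuda compilation tools, release 12.1, V12.1.105'."""
    return _first_version(r"release (\d+)\.(\d+)", output)


def parse_nvidia_smi(output: str) -> Optional[Version]:
    """Parse the 'CUDA Version: 12.2' field of the nvidia-smi header."""
    return _first_version(r"CUDA Version:\s*(\d+)\.(\d+)", output)


def parse_torch(value: Optional[str]) -> Optional[Version]:
    """Parse torch.version.cuda, which is a string or None on CPU-only builds."""
    return _first_version(r"(\d+)\.(\d+)", value or "")


def cuda_findings(
    toolkit: Optional[Version],
    driver: Optional[Version],
    torch_build: Optional[Version],
) -> List[str]:
    """List the version differences worth investigating."""
    if torch_build is None:
        return ["PyTorch reports no CUDA version (CPU-only build or not installed)."]
    findings = []
    built = f"{torch_build[0]}.{torch_build[1]}"
    if driver is not None and driver < torch_build:
        findings.append(
            f"Driver supports up to CUDA {driver[0]}.{driver[1]}, "
            f"below the PyTorch build ({built})."
        )
    if toolkit is not None and toolkit != torch_build:
        findings.append(
            f"Toolkit is CUDA {toolkit[0]}.{toolkit[1]} but PyTorch was built "
            f"for {built}; this matters when compiling CUDA code."
        )
    return findings


if __name__ == "__main__":
    sample_nvcc = "Cuda compilation tools, release 12.1, V12.1.105"
    for line in cuda_findings(parse_nvcc(sample_nvcc), (12, 2), parse_torch("11.8")):
        print(line)
```

An empty list means no difference was found among the values supplied; it does not prove the GPU stack works, only that these version numbers agree.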

Docker Installation Failures

# Verify Docker has GPU access
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi

If nvidia-smi works on the host but fails inside the container, the NVIDIA Container Toolkit is missing or Docker is not configured to use it.
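
The host-versus-container comparison can be written down as a small decision procedure. The sketch below is one way to do it, assuming the same image tag as the command above; the function names succeeds and diagnose and the returned messages are hypothetical.

```python
import subprocess
from typing import Callable, List

HOST_CMD: List[str] = ["nvidia-smi"]
CONTAINER_CMD: List[str] = [
    "docker", "run", "--rm", "--gpus", "all",
    "nvidia/cuda:12.0.0-base-ubuntu22.04", "nvidia-smi",
]


def succeeds(cmd: List[str]) -> bool:
    """Run cmd and report whether it exited with status 0."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=600,  # generous, since the first run may pull the image
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing executable (FileNotFoundError).
        return False
    return result.returncode == 0


def diagnose(run: Callable[[List[str]], bool] = succeeds) -> str:
    """Check the host first, then the container, and name the failing layer."""
    if not run(HOST_CMD):
        return "host: nvidia-smi fails outside Docker; fix the driver first"
    if not run(CONTAINER_CMD):
        return "container: host GPU works; check the NVIDIA Container Toolkit"
    return "ok: GPU is visible on the host and inside the container"
```

Calling diagnose() with no arguments runs the real commands. Passing a different run function lets you exercise the logic on a machine without a GPU.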

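# Reinstall NVIDIA Container Toolkit (the nvidia-docker repository and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker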

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.

EXERCISE

Attempt a fresh installation of llama-cpp-python with CMAKE_ARGS="-DGGML_CUDA=ON" pip install llama-cpp-python. If it fails, document the exact error and identify which layer (system dependency, CUDA mismatch, or compilation) caused the failure.
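
To make the "which layer" step of the exercise more systematic, you could match the build log against a few known messages. The sketch below is a heuristic only: the pattern list is a small, illustrative selection of messages commonly printed by CMake, gcc, nvcc, and ninja, the layer assignments are a judgment call, and the name classify_failure is hypothetical. Expect to extend the list with whatever your own log contains.

```python
from typing import List, Tuple

# (substring to look for, layer). Checked in order, most specific first.
# The selection is illustrative, not exhaustive.
PATTERNS: List[Tuple[str, str]] = [
    ("No CMAKE_CUDA_COMPILER could be found", "CUDA mismatch"),
    ("unsupported GNU version", "CUDA mismatch"),
    ("No CMAKE_CXX_COMPILER could be found", "system dependency"),
    ("Python.h: No such file", "system dependency"),
    ("ninja: build stopped", "compilation"),
]


def classify_failure(log: str) -> str:
    """Return the layer of the first pattern found in log, else 'unknown'."""
    lowered = log.lower()
    for needle, layer in PATTERNS:
        if needle.lower() in lowered:
            return layer
    return "unknown"


if __name__ == "__main__":
    sample = "fatal error: Python.h: No such file or directory"
    print(classify_failure(sample))
```

Because the patterns are checked in order, a log that contains both a specific message and the generic ninja failure is attributed to the specific cause.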