02. LLaVA Installation

Chapter 2 of 18 · 20 min

KEY INSIGHT

LLaVA requires compatible versions of CUDA, PyTorch, and the transformer libraries. Version mismatches are the most common installation failure, so verify your environment before proceeding.

LLaVA (Large Language and Vision Assistant) is a widely used open-source multi-modal model. The recommended installation relies on the `llamafactory` package, which provides a unified interface for running various multi-modal architectures. Ensure you have NVIDIA driver version 525 or later and CUDA 11.8 or 12.1 before beginning. The Python check below needs PyTorch, so in a fresh environment run it again after the install step.

```bash
# Check the driver version (shown in the header of the output)
nvidia-smi

# Verify CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
```

The installation sequence matters. Create a fresh virtual environment to avoid dependency conflicts:

```bash
python -m venv llava-env
source llava-env/bin/activate

# Install PyTorch with CUDA support (use the cu121 index for CUDA 12.1)
pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118

# Install transformer libraries
pip install transformers==4.40.0
pip install accelerate==0.28.0
pip install bitsandbytes==0.43.1

# Install LLaVA interfacing through llamafactory
pip install llamafactory
```

Model weight download uses significant disk space. The 7B LLaVA model requires approximately 15GB on disk in 16-bit precision, the format in which the weights are distributed. Loading in full 32-bit precision roughly doubles the memory needed, to about 30GB, while 4-bit quantization with `bitsandbytes` reduces it to roughly a quarter of the 16-bit figure.
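Because the weights are large, it can help to confirm free disk space before starting a download. The following is a minimal sketch using only the Python standard library; the `has_free_space` helper and the 20 GB threshold are illustrative assumptions, not part of LLaVA or `llamafactory`.

```python
import shutil
from pathlib import Path


def has_free_space(path: str, required_gb: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    target = Path(path).resolve()
    # Walk up to the nearest existing directory, since `models/` may not exist yet.
    while not target.exists():
        target = target.parent
    free_gb = shutil.disk_usage(target).free / 1e9
    print(f"{free_gb:.1f} GB free at {target}")
    return free_gb >= required_gb


# 20 GB is an assumed threshold; set it to the size of the variant you plan to download.
ok = has_free_space("models", required_gb=20.0)
print("enough space" if ok else "free up disk space before downloading")
```

Set the threshold somewhat above the expected download size, since interrupted transfers can leave partial files behind.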
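Download the weights in advance. The sanity check at the end of this chapter loads the model through `transformers`, which expects the converted checkpoint published under `llava-hf`; the original `liuhaotian/llava-v1.6-mistral-7b` repository uses the LLaVA codebase's own format and does not load with the `transformers` auto classes.

```bash
# Create model directory
mkdir -p models
huggingface-cli download --repo-type model \
  llava-hf/llava-v1.6-mistral-7b-hf --local-dir models/llava-v1.6-mistral-7b-hf
```

Common failure modes include:

- **Out of memory while loading the model**: Load in 4-bit through `bitsandbytes`, or enable CPU offloading with `device_map="auto"`. If the error appears during inference instead, reduce the batch size to 1.
- **CUDA not found**: Verify that `LD_LIBRARY_PATH` includes the CUDA lib directory and that the installed PyTorch wheel is a CUDA build (`torch.version.cuda` is not `None`).
- **Weight download timeout**: Re-run the download command, which skips files that are already complete. On older `huggingface_hub` releases, `--local-dir-use-symlinks False` writes real files instead of cache symlinks; recent releases do this by default.

After installation, run a sanity check:

```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

# Load from the directory populated by the download step
model_path = "models/llava-v1.6-mistral-7b-hf"

processor = AutoProcessor.from_pretrained(model_path)
# 16-bit weights keep host memory close to the on-disk size
model = AutoModelForVision2Seq.from_pretrained(model_path, torch_dtype=torch.float16)
print("LLaVA loaded successfully")
```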
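If the sanity check fails, a reasonable first step is to confirm that the installed packages match the versions pinned in the install commands. The sketch below does this with `importlib.metadata` from the standard library; the `PINNED` table simply repeats this chapter's pins, and the helper names are illustrative.

```python
from importlib import metadata

# Pins copied from the installation commands in this chapter.
PINNED = {
    "torch": "2.3.0",
    "torchvision": "0.18.0",
    "transformers": "4.40.0",
    "accelerate": "0.28.0",
    "bitsandbytes": "0.43.1",
}


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned: dict) -> dict:
    """Map each package whose installed version differs from its pin to (expected, found)."""
    mismatches = {}
    for package, expected in pinned.items():
        found = installed_version(package)
        # PyTorch wheels carry a local suffix such as "+cu118"; compare the public part only.
        public = found.split("+")[0] if found else None
        if public != expected:
            mismatches[package] = (expected, found)
    return mismatches


for package, (expected, found) in find_mismatches(PINNED).items():
    print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

No output means every pinned package is present at the expected version. Run it inside the activated `llava-env` environment so it inspects the right site-packages.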

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
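One way to keep that record is a small JSON file written next to the exercise. The sketch below is an assumption about what such a record might contain; the `build_record` helper and its field names are hypothetical and can be adapted freely.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def build_record(packages, data_path, observed_output, runtime_seconds=None, notes=""):
    """Collect the details the checkpoint asks for into one JSON-serializable dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "data_path": data_path,
        "runtime_seconds": runtime_seconds,
        "observed_output": observed_output,
        "notes": notes,
    }


record = build_record(
    packages=["torch", "transformers", "accelerate", "bitsandbytes"],
    data_path="models/",
    observed_output="LLaVA loaded successfully",
    notes="GPU model, backend, and available memory go here",
)
print(json.dumps(record, indent=2))
```

Redirecting the printed JSON to a file gives a record that can be compared across machines.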

EXERCISE

Run the sanity check with your hardware configuration. Record the model loading time and peak memory usage. If errors occur, diagnose and resolve before continuing.
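Loading time and peak memory can be captured with a small wrapper around the loading call. This is a sketch under stated assumptions: `measure` is a hypothetical helper, the GPU figure is reported only when PyTorch sees a CUDA device, and the host figure relies on the Unix-only `resource` module.

```python
import sys
import time


def measure(fn):
    """Run fn() once and return (result, stats) with wall time and peak memory."""
    try:
        import torch  # optional: only used for GPU memory figures
        use_cuda = torch.cuda.is_available()
    except ImportError:
        torch, use_cuda = None, False

    if use_cuda:
        torch.cuda.reset_peak_memory_stats()

    start = time.perf_counter()
    result = fn()
    stats = {"seconds": time.perf_counter() - start}

    if use_cuda:
        stats["peak_gpu_gb"] = torch.cuda.max_memory_allocated() / 1e9

    try:
        import resource  # Unix only

        peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        # ru_maxrss is reported in bytes on macOS and kilobytes on Linux.
        # It is the peak for the whole process so far, not just for fn().
        stats["peak_host_gb"] = peak / 1e9 if sys.platform == "darwin" else peak * 1024 / 1e9
    except ImportError:
        pass  # e.g. Windows: no host memory figure

    return result, stats


# Stand-in workload; replace the lambda with the model-loading call from the sanity check.
_, stats = measure(lambda: sum(range(1_000_000)))
print(stats)
```

Because the host figure covers the whole process, run the measurement in a fresh Python session so earlier work does not inflate it.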