RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Multi-Modal AI: Vision and Text

02. LLaVA Installation

Chapter 2 of 18 · 20 min
KEY INSIGHT

LLaVA requires compatible versions of CUDA, PyTorch, and the transformer libraries. Version mismatches are the most common installation failure, so verify your environment before proceeding.

LLaVA (Large Language and Vision Assistant) is a widely used open-source multi-modal model. The recommended installation relies on the `llamafactory` package, which provides a unified interface for running various multi-modal architectures. Ensure NVIDIA driver version 525+ and CUDA 11.8 or 12.1 before beginning. The check below needs PyTorch, so in a fresh environment run it after the install step; `nvidia-smi` reports the driver version beforehand.

```bash
# Verify CUDA availability
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}'); print(f'CUDA version: {torch.version.cuda}')"
```

The installation sequence matters. Create a fresh virtual environment to avoid dependency conflicts:

```bash
python -m venv llava-env
source llava-env/bin/activate

# Install PyTorch with CUDA support (swap cu118 for cu121 on CUDA 12.1)
pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118

# Install transformer libraries
pip install transformers==4.40.0
pip install accelerate==0.28.0
pip install bitsandbytes==0.43.1

# Install the LLaVA interface through llamafactory
pip install llamafactory
```

Model weights take significant disk space. The 7B LLaVA checkpoint is roughly 15 GB in 16-bit precision and about twice that in 32-bit; loaded in 4-bit through `bitsandbytes`, the weights need roughly 4 to 5 GB of memory.
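As a rough cross-check on size figures like these, weight size can be estimated as parameter count times bytes per parameter. The sketch below is a back-of-the-envelope calculation, not a measurement: `estimate_weight_gb` is a hypothetical helper, the 7e9 parameter count is the nominal "7B" figure, and the vision encoder and file-format overhead are ignored.

```python
def estimate_weight_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Assumption: nominal "7B" parameter count; real checkpoints are slightly larger.
    nominal_params = 7e9
    for label, bits in [("32-bit", 32), ("16-bit", 16), ("4-bit", 4)]:
        size_gb = estimate_weight_gb(nominal_params, bits)
        print(f"{label}: ~{size_gb:.1f} GB")
```

Leave headroom beyond the estimate: the download also includes tokenizer and processor files, and loading needs extra memory for activations.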
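Download the weights in advance. The `transformers` loader used in the sanity check expects a checkpoint in Hugging Face format, while the original `liuhaotian/llava-v1.6-mistral-7b` repository uses the LLaVA project's own format, so the commands here use the converted `llava-hf/llava-v1.6-mistral-7b-hf` repository:

```bash
# Create model directory
mkdir -p models
huggingface-cli download --repo-type model \
  llava-hf/llava-v1.6-mistral-7b-hf --local-dir models/llava-v1.6-mistral-7b-hf
```

Common failure modes include:

- **Out of memory while loading the model**: Load in 16-bit or 4-bit, enable CPU offloading, and keep the batch size at 1
- **CUDA not found**: Verify `LD_LIBRARY_PATH` includes the CUDA lib directory
- **Weight download timeout**: Use `--local-dir-use-symlinks False` so files are written directly into the target directory

After installation, run a sanity check:

```python
import torch
from transformers import AutoProcessor, AutoModelForVision2Seq

model_path = "models/llava-v1.6-mistral-7b-hf"  # directory created by the download step
processor = AutoProcessor.from_pretrained(model_path)
# Load in half precision; the float32 default roughly doubles memory use
model = AutoModelForVision2Seq.from_pretrained(model_path, torch_dtype=torch.float16)
print("LLaVA loaded successfully")
```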
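For the "CUDA not found" failure mode listed above, a quick look at `LD_LIBRARY_PATH` can be scripted. This is a minimal sketch under one assumption: that a CUDA library directory has "cuda" somewhere in its path (for example `/usr/local/cuda/lib64`). `cuda_entries` is a hypothetical helper, not part of any library.

```python
import os


def cuda_entries(ld_library_path: str) -> list:
    """Return entries of a colon-separated path string that mention CUDA."""
    entries = [p for p in ld_library_path.split(":") if p]
    # Heuristic (assumption): CUDA directories contain "cuda" in their path.
    return [p for p in entries if "cuda" in p.lower()]


if __name__ == "__main__":
    found = cuda_entries(os.environ.get("LD_LIBRARY_PATH", ""))
    if found:
        print("CUDA-looking entries:", found)
    else:
        print("No CUDA-looking entries in LD_LIBRARY_PATH")
```

An empty result is not always a problem. PyTorch wheels installed with pip ship their own CUDA runtime libraries, so treat this as one diagnostic signal alongside `torch.cuda.is_available()`.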

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
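One way to record the package versions and runtime for this checkpoint is a short script that uses only the standard library. This is a sketch: the package list mirrors the pins in this chapter, and `package_version` and `environment_report` are hypothetical helper names.

```python
import json
import platform
import sys
from importlib import metadata

# Packages pinned in this chapter; adjust the list to match your environment.
PACKAGES = ["torch", "torchvision", "transformers", "accelerate", "bitsandbytes"]


def package_version(name: str) -> str:
    """Return the installed version of a package, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=PACKAGES) -> dict:
    """Collect Python, platform, and package versions into one dictionary."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
    }


if __name__ == "__main__":
    print(json.dumps(environment_report(), indent=2))
```

Saving the JSON output next to your notes keeps the version information attached to the observed result.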

EXERCISE

Run the sanity check with your hardware configuration. Record the model loading time and peak memory usage. If errors occur, diagnose and resolve before continuing.
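To record loading time and peak memory, a small timing wrapper is enough. The sketch below assumes Linux or macOS, because the standard-library `resource` module is Unix-only. It reports the peak resident memory of the whole process so far, not only of the wrapped call. `measure` and `peak_rss_mb` are hypothetical helper names.

```python
import resource  # Unix only
import sys
import time


def peak_rss_mb() -> float:
    """Peak resident set size of this process so far, in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure(fn):
    """Run fn() and return (result, elapsed seconds, peak RSS in MB)."""
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed, peak_rss_mb()


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a function that loads the model.
    _, seconds, peak_mb = measure(lambda: sum(range(1_000_000)))
    print(f"elapsed: {seconds:.3f} s, peak RSS: {peak_mb:.0f} MB")
```

To time the sanity check, wrap the `from_pretrained` call in a function and pass it to `measure`. For GPU memory, `torch.cuda.max_memory_allocated()` reports the peak bytes PyTorch has allocated on the device and can be logged alongside the CPU figure.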
