Troubleshooting Runbook Project — Troubleshooting Local AI (Chapter 15)

## Building Your Personal Runbook

A runbook documents your specific system's configuration, its recurring problems, and their fixes. Generic documentation covers your hardware as a product; your runbook covers the one machine you actually run, with its exact driver, runtime, and model versions.
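The configuration half of a runbook can be collected rather than typed. The following is a minimal sketch, assuming a Linux host with an NVIDIA GPU; the function names `probe`, `os_name`, and `runbook_header` are illustrative and not part of any tool. Each probe prints `unknown` when its command is missing or fails, so the script is safe to run on a machine where the GPU stack is broken. That is exactly when you most need the record.

```shell
#!/bin/sh
# Sketch: print the Hardware and Installation Notes fields of a runbook.
# Assumes Linux; tool output is pasted verbatim, so tidy it by hand afterwards.

# Run a command and print its first output line, or "unknown" on failure.
probe() {
    probe_out=$("$@" 2>/dev/null | head -n 1)
    if [ -n "$probe_out" ]; then
        printf '%s\n' "$probe_out"
    else
        printf 'unknown\n'
    fi
}

# Read PRETTY_NAME from /etc/os-release without sourcing the file.
os_name() {
    sed -n 's/^PRETTY_NAME="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' /etc/os-release
}

runbook_header() {
    echo "# System: $(probe hostname)"
    echo "## Hardware"
    echo "- GPU: $(probe nvidia-smi --query-gpu=name,memory.total --format=csv,noheader)"
    echo "- RAM: $(probe awk '/^MemTotal:/ {printf "%.1f GiB\n", $2 / 1048576}' /proc/meminfo)"
    echo "- OS: $(probe os_name), kernel $(probe uname -r)"
    echo
    echo "## Installation Notes"
    echo "- Driver Version: $(probe nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
    echo "- Ollama Version: $(probe ollama --version)"
}

runbook_header
```

Redirecting the output into a file gives you a dated starting point. Rerunning it after a driver or Ollama upgrade shows what changed between a working and a broken state.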

### Runbook Template

# System: [Hostname/Description]
## Hardware
- GPU: [Model, VRAM]
- RAM: [Total]
- OS: [Distribution, Kernel version]

## Common Problems

### Problem: Ollama returns "connection refused"
**Symptoms**: curl http://localhost:11434/api/tags fails
**Cause**: Ollama not running
**Fix**: 
```bash
sudo systemctl restart ollama
sudo systemctl status ollama
```

### Problem: Model loads but inference is slow
**Symptoms**: <5 tokens/second on 7B model
**Cause**: Running on CPU instead of GPU
**Fix**:
```bash
# Verify GPU detection (prints True when CUDA is usable)
python -c "import torch; print(torch.cuda.is_available())"
# Check environment variables
echo "$CUDA_VISIBLE_DEVICES"
```

## Installation Notes
- CUDA Version: 12.1
- Driver Version: 535.154.05
- Ollama Version: 0.1.26

## Model Registry
| Model | Size | Quantization | Location |
|-------|------|--------------|----------|
| Llama-2-7B | 7B | Q4_K_M | /models/llama-2-7b-q4 |



## Completion Criteria

You have completed this course when you can:

- Run the full GPU diagnostic sequence and interpret each command's output
- Identify which system layer (hardware, driver, runtime, application) is responsible for any given error
- Fix the 10 most common local AI errors from memory rather than by searching
- Build a runbook that documents your specific system's configuration and recurring fixes
- Profile inference performance and identify the bottleneck (compute, memory bandwidth, or transfer)
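Profiling starts with one number: tokens per second. As a sketch, assuming you are measuring Ollama, the final JSON reply to a non-streaming `/api/generate` request reports `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), and the rate is their ratio. The helper name `tokens_per_second` is hypothetical.

```shell
#!/bin/sh
# Sketch: convert Ollama's eval_count and eval_duration into tokens/second.
# Usage: tokens_per_second EVAL_COUNT EVAL_DURATION_NS
tokens_per_second() {
    awk -v n="$1" -v ns="$2" 'BEGIN {
        # Guard against a zero or missing duration instead of dividing by it.
        if (ns <= 0) { print "n/a"; exit }
        printf "%.1f\n", n * 1e9 / ns
    }'
}

# Example: 120 tokens generated in 30 seconds of evaluation time.
tokens_per_second 120 30000000000   # prints 4.0
```

A result like this, under 5 tokens/second on a 7B model, matches the slow-inference symptom in the runbook template and points toward CPU fallback rather than a tuning problem.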

These skills are not about memorizing error messages. They are about developing a mental model of how the layers of a local AI system stack, so that diagnosing a new error takes minutes instead of hours.
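One way to practice the layered model is to probe the stack from the bottom up and stop at the first layer that fails. The sketch below is an assumption about a reasonable ordering for an NVIDIA plus Ollama setup, not the course's full diagnostic sequence. `check_layer`, `hardware_present`, and `diagnose` are illustrative names, and the runtime probe reuses the PyTorch check from the runbook template.

```shell
#!/bin/sh
# Sketch: report the lowest failing layer (hardware, driver, runtime, application).

# check_layer NAME COMMAND [ARGS...]: run the probe quietly and report the result.
check_layer() {
    layer=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok    $layer"
        return 0
    fi
    echo "FAIL  $layer: $*"
    return 1
}

# Hardware probe: is an NVIDIA device visible on the PCI bus?
hardware_present() { lspci | grep -qi nvidia; }

diagnose() {
    check_layer hardware    hardware_present || return 1
    check_layer driver      nvidia-smi || return 1
    check_layer runtime     python -c "import torch, sys; sys.exit(0 if torch.cuda.is_available() else 1)" || return 1
    check_layer application curl -sf http://localhost:11434/api/tags || return 1
}

diagnose || echo "Stopped at the first failing layer; fix it before looking higher."
```

Stopping early is deliberate: a failure at the driver layer makes every result above it meaningless, so the first `FAIL` line is the one to record in your runbook.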
