Ethics, safety & society

XAI (Explainable AI)

Explainable AI (XAI) refers to methods that make the decisions of machine learning models understandable to humans. For local AI operators, XAI matters because black-box models produce outputs without revealing why, and knowing why helps with debugging, trust, and compliance. Techniques like attention visualization, feature attribution (e.g., SHAP, LIME), or logit inspection help operators see which input tokens or features influenced a response. In practice, XAI is less common in local inference runtimes (llama.cpp, Ollama) but is accessible through Hugging Face Transformers, which can return attention maps directly and works with add-on libraries such as Captum for attribution methods like integrated gradients.

Deeper dive

XAI encompasses a range of techniques that vary by model type and operator need. For transformer-based LLMs, attention weights provide a window into which tokens the model focused on when generating each output token, though a high attention weight does not by itself show that a token caused the output. Feature attribution methods like SHAP or LIME approximate model behavior by perturbing inputs and measuring how the output changes. For local operators, XAI is often limited by VRAM and latency: running SHAP on a 7B model requires many forward passes per explanation, which can be slow on consumer GPUs. Some libraries (e.g., Hugging Face Transformers with output_attentions=True) expose raw attention matrices, but interpreting them requires additional tooling. XAI is more mature in computer vision (saliency maps) than in LLM text generation, where explanations remain an active research area.
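The perturb-and-measure idea behind these attribution methods can be illustrated with a much simpler technique, leave-one-out occlusion. This is not SHAP or LIME themselves, only a minimal sketch of the shared principle. The function names, the mask token, and the toy scoring function are all hypothetical; in a real setup the scoring function would wrap a model forward pass, which is why the cost grows with input length.

```python
from typing import Callable, List, Sequence


def occlusion_attribution(
    tokens: Sequence[str],
    score_fn: Callable[[List[str]], float],
    mask_token: str = "[MASK]",  # hypothetical placeholder token
) -> List[float]:
    """Leave-one-out attribution: how much the score drops when each token is masked.

    Costs len(tokens) + 1 calls to score_fn (one baseline plus one per token).
    """
    baseline = score_fn(list(tokens))
    attributions = []
    for i in range(len(tokens)):
        perturbed = list(tokens)
        perturbed[i] = mask_token
        attributions.append(baseline - score_fn(perturbed))
    return attributions


def toy_score(tokens: List[str]) -> float:
    # Hypothetical stand-in for "model confidence in the answer Paris".
    # A real score_fn would run the model and read a probability or logit.
    weights = {"capital": 0.3, "France": 0.6}
    return sum(weights.get(t, 0.0) for t in tokens)


tokens = ["The", "capital", "of", "France", "is"]
scores = occlusion_attribution(tokens, toy_score)
for token, score in zip(tokens, scores):
    print(f"{token:>8}: {score:.2f}")
```

With the toy scorer, masking France removes the most score, so it receives the largest attribution. SHAP and LIME refine this idea by sampling many combinations of masked tokens rather than one at a time, which is where the extra forward passes come from.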

Practical example

An operator running Llama 3.1 8B via Hugging Face Transformers can pass output_attentions=True (in the model config or in the forward call) to get per-layer attention weights. For a prompt like 'The capital of France is', the attention maps show which tokens (e.g., 'capital', 'France') the model weighted most when predicting 'Paris'. On an RTX 4090, extracting attention for a 512-token sequence adds roughly 10% to inference time and about 2 GB of VRAM overhead; the exact cost depends on precision and on the attention implementation in use.
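A minimal sketch of that extraction follows. It assumes a recent Transformers release with PyTorch installed, and that the model is loaded with eager attention, since fused attention kernels may not return weights. The function names are hypothetical, and the choice to read only the last layer, averaged over heads, is one simplification among many possible ones.

```python
from typing import List, Sequence, Tuple


def rank_tokens_by_attention(
    tokens: Sequence[str], weights: Sequence[float], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k (token, weight) pairs with the largest attention weight."""
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    pairs = sorted(zip(tokens, weights), key=lambda p: p[1], reverse=True)
    return pairs[:k]


def last_position_attention(model_name: str, prompt: str, k: int = 3):
    """Rank prompt tokens by how strongly the final position attends to them."""
    # Imported lazily so the ranking helper works without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Assumption: eager attention is requested so that weights are returned.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, attn_implementation="eager"
    )
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)

    # outputs.attentions holds one tensor per layer,
    # each shaped (batch, heads, seq_len, seq_len).
    last_layer = outputs.attentions[-1][0]               # (heads, seq, seq)
    weights = last_layer.mean(dim=0)[-1].float().tolist()  # row for final position
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    return rank_tokens_by_attention(tokens, weights, k)


# Example call (downloads weights; the model id is an assumption):
# last_position_attention("meta-llama/Llama-3.1-8B", "The capital of France is")
```

The ranking helper is kept separate so it can be reused on weights from any layer or head. Averaging over heads hides head-specific patterns, so per-head inspection may be more informative when debugging.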

Workflow example

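In Hugging Face Transformers, after setting model.config.output_attentions = True, a forward pass returns an output object whose attentions field is a tuple with one tensor per layer. By default model.generate() returns only the generated token IDs; passing return_dict_in_generate=True together with output_attentions=True makes it return an object whose attentions field holds one such per-layer tuple for each generated token. Operators can visualize these with matplotlib or bertviz. For SHAP, the shap Python library supports Hugging Face models: shap.Explainer(model, tokenizer) runs on CPU or GPU but may take minutes per explanation on a 7B model. In Ollama and llama.cpp, XAI features are not built in: these runtimes expose at most token-level probabilities, so operators who need attention weights or attributions must load the model in a separate Python script.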
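The generation path can be sketched as follows, assuming a decoder-only model and tokenizer already loaded with Transformers (with eager attention, as fused kernels may not return weights). The function names are hypothetical, and reading only the last layer is an illustrative choice.

```python
from typing import List, Sequence, Tuple


def average_heads(per_head_rows: Sequence[Sequence[float]]) -> List[float]:
    """Average attention rows across heads: [heads][keys] -> [keys]."""
    n_heads = len(per_head_rows)
    return [sum(column) / n_heads for column in zip(*per_head_rows)]


def attention_per_generated_token(
    model, tokenizer, prompt: str, max_new_tokens: int = 5
) -> List[Tuple[str, List[float]]]:
    """For each generated token, return its text and head-averaged attention
    over the context it saw (last layer only)."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        return_dict_in_generate=True,  # return an object, not just token IDs
        output_attentions=True,
    )
    prompt_len = inputs["input_ids"].shape[1]

    results = []
    # out.attentions has one entry per generated token; each entry is a
    # tuple over layers of tensors shaped (batch, heads, query_len, key_len).
    for step, layers in enumerate(out.attentions):
        last_layer = layers[-1]
        # Take the final query position, which produced this step's token.
        per_head = last_layer[0, :, -1, :].float().tolist()
        token_id = out.sequences[0, prompt_len + step].item()
        results.append((tokenizer.decode([token_id]), average_heads(per_head)))
    return results
```

Each returned row is one position longer than the previous one, because every new token can attend to all earlier tokens. Rows therefore need padding to a common length before they are stacked into a matplotlib heatmap.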

Reviewed by Fredoline Eruo.