10. Model Provenance
Model provenance answers three questions: where did this model come from, who trained it, and can you trust it? A malicious or tampered model can carry backdoors, exfiltrate data, or behave unpredictably. Verifying provenance before deployment reduces the risk of running a compromised model.
What to verify:
Source verification confirms that you downloaded the model from the expected source (the official repository, not an unofficial mirror). Compare the SHA256 hash of the downloaded file against the hash published by the model creator.
Signature verification uses cryptographic signatures to confirm the model hasn't been tampered with. Some model hubs sign releases.
License review identifies legal constraints on model use. Some models prohibit commercial use, require attribution, or impose restrictions on output usage.
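Build verification for self-hosted models confirms the build process hasn't been modified. Reproducible builds help here, because two independent builds from the same inputs should produce bit-identical artifacts that can be compared by hash.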
Verify model integrity:
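# Download the model file. The URL below is the library address used in this
# guide; substitute the direct download link for the model artifact.
# -O fixes the local filename that the following commands expect.
wget -O llama3.bin https://ollama.ai/library/llama3:latest
# Compute the SHA256 hash of the downloaded file
sha256sum llama3.bin
# Compare against the published hash (paste the full 64-character value)
echo "8abb54b1e2...  llama3.bin" | sha256sum --check -
# If the check reports FAILED, do not load the model
# For signed models, verify the detached signature
gpg --verify llama3.bin.sig llama3.bin
# Check that the signing key's fingerprint matches the known fingerprint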
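The two checks above can be wrapped into a gate that fails closed. The following is a minimal sketch, not a definitive implementation: the function names `verify_sha256` and `signed_by` are hypothetical, and the fingerprint check assumes the `GOODSIG` and `VALIDSIG` status lines that GnuPG writes when run with `--status-fd`.

```shell
#!/usr/bin/env bash
# Sketch: refuse to load a model unless its hash and signer both match.
# verify_sha256 and signed_by are illustrative names, not part of any tool.

# verify_sha256 FILE EXPECTED_HEX: succeeds only if FILE hashes to EXPECTED_HEX.
verify_sha256() {
  local file="$1" expected actual
  [ -f "$file" ] || { echo "no such file: $file" >&2; return 2; }
  # sha256sum prints lowercase hex, so normalize the expected value too
  expected="$(printf '%s' "$2" | tr 'A-F' 'a-f')"
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    echo "HASH MISMATCH for $file (got $actual)" >&2
    return 1
  fi
}

# signed_by PINNED_FPR: reads `gpg --status-fd 1 --verify ...` output on stdin.
# Succeeds only if a GOODSIG line is present and a VALIDSIG line reports the
# pinned fingerprint, either as the signing key (field 3) or as the last field
# (assumed here to be the primary key fingerprint).
signed_by() {
  local pinned
  pinned="$(printf '%s' "$1" | tr -d ' ' | tr 'a-f' 'A-F')"
  awk -v fpr="$pinned" '
    $2 == "GOODSIG" { good = 1 }
    $2 == "VALIDSIG" && ($3 == fpr || $NF == fpr) { pinned_ok = 1 }
    END { exit !(good && pinned_ok) }
  '
}

# Usage (EXPECTED_SHA256 and PINNED_FPR are values you obtained out of band):
#   verify_sha256 llama3.bin "$EXPECTED_SHA256" &&
#     gpg --status-fd 1 --verify llama3.bin.sig llama3.bin 2>/dev/null |
#       signed_by "$PINNED_FPR" &&
#     echo "ok to load"
```

Pinning the fingerprint matters because `gpg --verify` alone only shows that some key in your keyring produced the signature, not that it was the publisher's key.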
Document model inventory:
# model-inventory.yaml
models:
  - name: llama3:8b
    version: "2024-05-28"
    source: https://ollama.ai/library/llama3
    sha256: 8abb54b1e2...
    license: llama3 community license
    imported_at: "2024-05-28"
    responsible: [email protected]
    risk_level: medium
    notes: Default chat model for internal support
  - name: mistral:7b
    version: "2024-04-15"
    source: https://ollama.ai/library/mistral
    sha256: 5b2b28c9f1...
    license: Apache 2.0
    imported_at: "2024-04-15"
    responsible: [email protected]
    risk_level: low
    notes: Code analysis model
Dangerous patterns in model sources:
- Models from unknown GitHub accounts with few stars and only recent activity (for example, a newly created account)
- Models with pre-set system prompts that instruct bypassing safety
- Models downloaded from IPFS without content verification
- Mirrors of official models with different file sizes
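The file-size red flag in the list above is cheap to check before spending time on a full hash. A minimal sketch, assuming the official source publishes the artifact size in bytes; `size_matches` is a hypothetical helper, and a matching size is never a substitute for the hash check.

```shell
#!/usr/bin/env bash
# Sketch: quick pre-check of a mirrored file against the published byte size.

# size_matches FILE EXPECTED_BYTES: succeeds only if the sizes are equal.
size_matches() {
  local file="$1" expected="$2" actual
  [ -f "$file" ] || return 2
  # wc -c may pad its output with spaces on some systems, so strip them
  actual="$(wc -c < "$file" | tr -d ' ')"
  [ "$actual" = "$expected" ]
}

# Usage (PUBLISHED_BYTES comes from the official release page):
#   size_matches llama3.bin "$PUBLISHED_BYTES" || echo "size differs: do not load"
```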
Create a model inventory document listing every model loaded in your local AI deployment. For each model, record the source URL, the SHA256 hash (computed locally when you record the entry), the license, and the last update date. Set a monthly reminder to recompute each hash and confirm it still matches the inventory.
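The monthly check can be automated rather than left to a reminder. The sketch below assumes you keep a checksum manifest in `sha256sum` format next to the inventory; the manifest name `models.sha256`, the function name `recheck_models`, and the script path in the cron line are all hypothetical. It relies on the `--check` and `--quiet` options of GNU `sha256sum`.

```shell
#!/usr/bin/env bash
# Sketch: re-verify every model file listed in a sha256sum-format manifest.
# Create the manifest once, at import time, for example:
#   sha256sum /path/to/models/*.bin > models.sha256

# recheck_models MANIFEST: returns non-zero if any file changed or is missing.
recheck_models() {
  local manifest="$1"
  # --quiet suppresses the per-file "OK" lines; failures are still printed
  if sha256sum --check --quiet "$manifest"; then
    echo "all model hashes match $manifest"
  else
    echo "ALERT: at least one model hash changed or a file is missing" >&2
    return 1
  fi
}

# Example crontab entry (09:00 on the first day of each month), assuming this
# file is saved as a script that calls recheck_models at the end:
#   0 9 1 * * /usr/local/bin/recheck-models.sh
```

A changed hash does not always mean compromise (a model may have been legitimately updated), but it should always trigger an update of the inventory entry and a fresh provenance review.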