02. Threat Taxonomy
Effective defense requires understanding what you're defending against. This chapter establishes a taxonomy of threats to local AI systems, enabling systematic security assessment.
Threat Categories
AI threats fall into four primary categories, each with distinct characteristics and mitigations.
Model Extraction occurs when attackers query a model to reconstruct its functionality or approximate its weights. Local models face this risk through repeated API calls that map input-output relationships. The attacker builds a substitute model or gains capabilities exceeding intended access levels.
Prompt Injection embeds malicious instructions within inputs that the model executes, overriding its system prompt or intended behavior. This threat is particularly relevant for local deployments processing external inputs—emails, documents, user messages.
Jailbreaking explicitly attempts to bypass safety measures, forcing models to produce outputs their designers restricted. Local models may lack dependable safety layers, creating vulnerabilities.
Data Poisoning compromises training data to influence model behavior. For local deployments, this applies when fine-tuning on potentially adversarial inputs or processing user-generated content that enters training loops.
Attack Surface Analysis
Each local AI deployment has an attack surface determined by its access patterns. Consider a document analysis system:
# Sample attack surface for document-processing AI
# Input: User-uploaded documents
# Processing: Local model inference
# Output: Summaries, extractions, analysis
class DocumentProcessor:
def __init__(self, model_path):
self.model = load_model(model_path)
self.system_prompt = "Analyze documents professionally."
def process(self, uploaded_file):
# Path 1: Direct file content injection
content = uploaded_file.read()
# Path 2: Filename metadata embedding
filename = uploaded_file.filename
# Path 3: Metadata headers
metadata = uploaded_file.metadata
# All three paths feed into model
prompt = f"{self.system_prompt}\n\nContent: {content}"
return self.model.generate(prompt)
Attackers exploit multiple input vectors simultaneously. A poisoned filename might contain instructions; extracted metadata might bypass filters.
Severity and Likelihood Framework
Not all threats require equal investment. A severity-likelihood matrix guides prioritization:
| Threat | Severity | Likelihood | Priority |
|---|---|---|---|
| Model theft via API abuse | High | Medium | Critical |
| Malicious document injection | High | Low | High |
| Prompt injection via web scraper | Medium | High | High |
| Accidental PII leakage | Medium | Medium | Medium |
Understanding threat taxonomy enables operators to allocate defensive resources appropriately.
For a local AI assistant handling internal company queries, identify the top five threat vectors and classify each by category. Explain your prioritization reasoning.