07. Zero-Shot Classification

Chapter 7 of 18 · 15 min

Zero-shot classification lets a model assign text to categories it never saw during training. This capability distinguishes transformer-based models from traditional classifiers, which require labeled training data for every category. A common approach recasts classification as natural language inference (NLI): each candidate label is turned into a hypothesis sentence, and the model scores whether the text entails that hypothesis.

The technique works by presenting text and candidate labels to a model trained with natural language supervision. During pretraining, models learn semantic relationships between text spans and descriptive phrases. Zero-shot inference exploits these learned associations without task-specific fine-tuning.
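To make the scoring step concrete, here is a minimal sketch of one common scheme for NLI-based zero-shot classifiers: take the entailment logit the model produced for each candidate label and normalize those logits with a softmax across labels. The function names and the logit values are illustrative assumptions; no model is loaded, and a real run would obtain the logits from an NLI checkpoint.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(entailment_logits):
    """Convert per-label entailment logits into normalized, sorted scores.

    `entailment_logits` maps each candidate label to the entailment logit
    for the pair (text, hypothesis built from that label).
    """
    labels = list(entailment_logits)
    scores = softmax([entailment_logits[label] for label in labels])
    return sorted(zip(labels, scores), key=lambda pair: pair[1], reverse=True)


# Made-up logits for illustration only; a real NLI model would produce these.
example_logits = {
    "political news": 1.2,
    "technology review": 2.9,
    "sports coverage": -3.1,
}
ranking = rank_labels(example_logits)
print(ranking[0][0])
```

Because the softmax runs across labels, the scores sum to one and compete with each other, which suits single-label problems. Multi-label setups typically score each label independently instead.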

from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
    device=-1  # CPU; use 0 for CUDA
)

candidate_labels = [
    "political news",
    "sports coverage",
    "technology review",
    "entertainment gossip",
    "financial reporting"
]

text = """The Supreme Court decision carries significant implications 
for technology companies operating in the health data sector, 
according to legal analysts specializing in privacy regulations."""

result = classifier(text, candidate_labels)
print(f"Top label: {result['labels'][0]}")
print(f"Confidence: {result['scores'][0]:.3f}")

For local deployment, zero-shot classification can also be built on model families with strong instruction-following capabilities. Instruction-tuned Llama and Mistral variants can deliver competitive zero-shot performance on standard benchmarks when given well-structured classification prompts.
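As a rough sketch of what a well-structured classification prompt might look like, the snippet below builds a prompt that lists the allowed labels and then maps a free-form reply back to one of them. The helper names `build_prompt` and `parse_label`, the prompt wording, and the sample reply are all assumptions for illustration; the call to an actual locally served model is deliberately left out.

```python
def build_prompt(text, labels):
    """Assemble a single-turn classification prompt listing the allowed labels."""
    options = "\n".join(f"- {label}" for label in labels)
    return (
        "Classify the text into exactly one of the categories below.\n"
        f"Categories:\n{options}\n\n"
        f"Text: {text}\n\n"
        "Answer with the category name only."
    )


def parse_label(response, labels):
    """Map a free-form model reply back to a known label, or None if no match."""
    cleaned = response.strip().lower()
    # Check longer labels first so a longer label wins over a shorter overlap.
    for label in sorted(labels, key=len, reverse=True):
        if label.lower() in cleaned:
            return label
    return None


labels = ["political news", "sports coverage", "technology review"]
prompt = build_prompt("The new phone's battery lasts two days.", labels)

# Hypothetical reply; in practice this string would come from the local model.
reply = "Technology review."
predicted = parse_label(reply, labels)
print(predicted)
```

Returning `None` for unparseable replies keeps failures visible, so they can be counted separately from wrong predictions during evaluation.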

Template engineering significantly influences zero-shot classification performance. Labels written as natural language phrases generally outperform abstract category codes. Declarative formulations ("This is about X") sometimes outperform bare enumerations of labels ("X, politics, or news").
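The effect of a template is easiest to see by expanding it for each label. The sketch below does only that string expansion. The first template is the default `hypothesis_template` of the Hugging Face zero-shot pipeline; the other two, along with the names `TEMPLATES` and `build_hypotheses`, are illustrative assumptions.

```python
TEMPLATES = {
    "default": "This example is {}.",   # default used by the zero-shot pipeline
    "topic": "This text is about {}.",  # assumed alternative phrasing
    "bare": "{}",                       # label with no surrounding sentence
}


def build_hypotheses(labels, template):
    """Expand a template into one hypothesis sentence per candidate label."""
    return [template.format(label) for label in labels]


labels = ["political news", "technology review"]
for name, template in TEMPLATES.items():
    print(name, build_hypotheses(labels, template))
```

With the pipeline itself, a template can be supplied per call, for example `classifier(text, candidate_labels, hypothesis_template=TEMPLATES["topic"])`. Which phrasing works best depends on the model and data, so it is worth measuring rather than assuming.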

Cross-lingual zero-shot classification extends the approach to languages for which no labeled training data is available. Multilingual models fine-tuned on English task data often generalize to other languages through shared multilingual representations, enabling classification infrastructure for international applications without language-specific models.

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
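One way to capture that record is a small helper that gathers the interpreter version, platform, and installed package versions into a dictionary. This is a sketch using only the standard library; the function names, the default package list, and the record fields are assumptions to adapt to your own workspace.

```python
import json
import platform
import sys
import time
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def run_record(packages=("transformers", "torch"), data_path=None, observed_output=None):
    """Collect the details the checkpoint asks for into one dictionary."""
    return {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {name: package_version(name) for name in packages},
        "data_path": data_path,
        "observed_output": observed_output,
    }


# Fill in data_path and observed_output after running the chapter example.
record = run_record()
print(json.dumps(record, indent=2))
```

Saving the printed JSON next to the exercise notes makes it easier to explain later differences in scores or latency.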

EXERCISE

Evaluate zero-shot classification performance across three prompt template variations. Measure accuracy, latency, and consistency on an annotated dataset spanning multiple domains.
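A possible starting scaffold for the measurements is sketched below. It assumes a `classify(text)` callable that returns one label, and it defines consistency as the share of examples for which repeated calls agree; both choices are assumptions, not part of the exercise statement. The keyword-based stub and the three examples stand in for a real classifier and an annotated dataset.

```python
import time


def evaluate(classify, examples, repeats=3):
    """Measure accuracy, run-to-run consistency, and mean latency.

    classify: callable taking a text and returning a label.
    examples: list of (text, gold_label) pairs.
    """
    correct = 0
    stable = 0
    latencies = []
    for text, gold in examples:
        predictions = []
        for _ in range(repeats):
            start = time.perf_counter()
            predictions.append(classify(text))
            latencies.append(time.perf_counter() - start)
        correct += predictions[0] == gold    # accuracy uses the first run
        stable += len(set(predictions)) == 1  # all repeats agree
    total = len(examples)
    return {
        "accuracy": correct / total,
        "consistency": stable / total,
        "mean_latency_s": sum(latencies) / len(latencies),
    }


def keyword_stub(text):
    """Stand-in classifier so the harness runs without a model."""
    return "sports coverage" if "match" in text.lower() else "political news"


examples = [
    ("The match ended 2-1.", "sports coverage"),
    ("Parliament passed the bill.", "political news"),
    ("The striker signed a new contract.", "sports coverage"),
]
report = evaluate(keyword_stub, examples)
print(report)
```

To compare templates, wrap each variation in its own `classify` callable and run `evaluate` once per variation on the same examples.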