Learning paradigms

Few-Shot Learning

Few-shot learning is a technique where a model performs a task after seeing only a small number of examples (typically 2–5) in the prompt, without any fine-tuning or weight updates. In local AI, this is one of the most common ways to steer a large language model: you provide a few input-output pairs in the system or user message, and the model infers the pattern to apply to new queries. Because the pre-trained weights are used as-is, there is no training loop and no VRAM spent on gradients or optimizer state. The examples do occupy context-window tokens, however, which adds to prompt-processing time and KV-cache memory. The key operator concern is that few-shot performance depends heavily on prompt formatting and example selection, not on model size alone.
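A minimal sketch of how such a prompt can be assembled in Python. The function name build_few_shot_prompt and the label-per-line layout are illustrative assumptions, not a format any particular runtime requires. What matters is that every example uses the same layout and the final output slot is left empty for the model to complete.

```python
def build_few_shot_prompt(instruction, examples, query,
                          in_label="Input", out_label="Output"):
    """Format (input, output) pairs so the model can infer the pattern.

    The last line leaves the output slot empty; the model's completion
    is expected to fill it in.
    """
    lines = [instruction]
    for example_input, example_output in examples:
        lines.append(f"{in_label}: {example_input}")
        lines.append(f"{out_label}: {example_output}")
    # The new query uses the same layout, with its answer left blank.
    lines.append(f"{in_label}: {query}")
    lines.append(f"{out_label}:")
    return "\n".join(lines)


examples = [
    ('"Great movie"', "Positive"),
    ('"Terrible film"', "Negative"),
]
prompt = build_few_shot_prompt("Classify the sentiment of each review.",
                               examples, '"Decent watch"',
                               in_label="Review", out_label="Sentiment")
print(prompt)
```

The same string can go into a system or user message; only the surrounding chat template differs between runtimes.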

Deeper dive

Few-shot learning contrasts with zero-shot prompting (no examples) and fine-tuning (weight updates). In practice, operators use it by including 2–5 examples in the prompt. For example, to classify sentiment, you might write: 'Review: "Great movie" -> Positive; Review: "Terrible film" -> Negative; Review: "Decent watch" ->' and let the model complete the last label. The model can do this because pre-training on large text corpora exposed it to many repeated patterns of this kind, a behavior usually called in-context learning. For local models, few-shot prompting is the most common workflow because it avoids the time and VRAM cost of fine-tuning. However, the context window limits how many examples fit: with a 4K (4,096-token) context, 5 examples of 200 tokens each consume 1,000 tokens, leaving roughly 3,000 tokens for instructions, the actual query, and the model's output. Larger models (e.g., 70B) tend to follow few-shot patterns more reliably than smaller ones (e.g., 7B), but at a much higher VRAM cost.
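The context arithmetic above can be sketched as a small budget check. It assumes "4K" means 4,096 tokens and that every example has the same token count; in practice you would count tokens with the model's own tokenizer. Both function names are hypothetical.

```python
def context_budget(context_tokens, n_examples, tokens_per_example,
                   reserved_for_output=0):
    """Tokens left for instructions and the query after the examples."""
    used_by_examples = n_examples * tokens_per_example
    remaining = context_tokens - used_by_examples - reserved_for_output
    if remaining < 0:
        raise ValueError("examples alone exceed the context window")
    return remaining


def max_examples(context_tokens, tokens_per_example, query_tokens,
                 reserved_for_output):
    """Largest number of equally sized examples that still fit."""
    free = context_tokens - query_tokens - reserved_for_output
    return max(0, free // tokens_per_example)


# Figures from the text: a 4,096-token window and 5 examples of 200 tokens.
print(context_budget(4096, 5, 200))
# Assumed figures: a 300-token query and 512 tokens reserved for the reply.
print(max_examples(4096, 200, query_tokens=300, reserved_for_output=512))
```

The first call leaves 3,096 tokens; the second shows that, under those assumed query and reply sizes, up to 16 such examples would fit, although accuracy usually stops improving well before the window is full.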

Practical example

An operator wants to extract names from text. They craft a prompt: 'Extract names: Input: "John went to the store." Output: John. Input: "Alice and Bob met." Output: Alice, Bob. Input: "Dr. Smith called." Output:' and run it through llama.cpp with Llama 3.1 8B at Q4_K_M (about 5 GB of VRAM). The model completes the pattern with 'Dr. Smith'. This is few-shot learning: two examples are given and no training is needed. If the model fails, the operator might add a third example or rephrase the pattern.
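A sketch of the same name-extraction run from Python, assuming the llama-cpp-python bindings (llama_cpp.Llama) instead of the llama.cpp command-line tool, and a GGUF model file at a path you supply. Each example is placed on its own line, and the helpers parse_names and extract_with_llama_cpp are hypothetical names for illustration.

```python
NAME_PROMPT = (
    "Extract names:\n"
    'Input: "John went to the store."\n'
    "Output: John\n"
    'Input: "Alice and Bob met."\n'
    "Output: Alice, Bob\n"
    'Input: "Dr. Smith called."\n'
    "Output:"
)


def parse_names(completion):
    """Turn a completion such as ' Alice, Bob.' into ['Alice', 'Bob']."""
    text = completion.strip()
    if not text:
        return []
    first_line = text.splitlines()[0]  # ignore anything after the answer line
    names = [part.strip().rstrip(".") for part in first_line.split(",")]
    return [name for name in names if name]


def extract_with_llama_cpp(model_path, prompt=NAME_PROMPT):
    """Run the few-shot prompt through llama-cpp-python (assumed installed)."""
    # Imported lazily so parse_names works without the package installed.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=4096, n_gpu_layers=-1,
                verbose=False)
    out = llm(prompt, max_tokens=16, temperature=0.0,
              stop=["\n", "Input:"])
    return parse_names(out["choices"][0]["text"])
```

Greedy decoding (temperature 0.0) and the stop sequences keep the model from continuing the pattern with extra invented Input: lines, and n_gpu_layers=-1 asks llama.cpp to offload every layer to the GPU.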

Workflow example

In Ollama, an operator runs ollama run llama3.1:8b and pastes a few-shot prompt directly. In LM Studio, they load a model, set the system prompt to include the examples, then chat. In Hugging Face Transformers, they create pipeline('text-generation', model='meta-llama/Llama-3.1-8B') and pass a formatted string containing the examples. The operator tunes the number of examples based on context window limits and observed accuracy: too few may under-specify the task, while too many may overflow the context window or dilute the query's relevance.
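One way to make that tuning concrete is to score a small labeled test set at several example counts. The sketch below assumes a local Ollama server on its default port (11434) and calls its /api/generate endpoint with streaming disabled. The helper names, the Input/Output layout, exact-match scoring, and taking the first k examples from a fixed pool are all illustrative assumptions, not Ollama features.

```python
import json
import urllib.request


def build_prompt(examples, query):
    """Input/Output layout; each example is an (input, output) pair."""
    parts = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n".join(parts)


def accuracy_for_k(complete, pool, test_set, k):
    """Fraction of test items answered exactly right using the first k examples.

    `complete` is any function mapping a prompt string to a completion string.
    """
    correct = 0
    for query, expected in test_set:
        answer = complete(build_prompt(pool[:k], query)).strip()
        if answer == expected:
            correct += 1
    return correct / len(test_set)


def ollama_complete(prompt, model="llama3.1:8b", num_ctx=4096):
    """Send one non-streaming request to a local Ollama server."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": 0, "num_ctx": num_ctx},
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # Keep only the first line, i.e. the answer to the open Output: slot.
        return json.loads(resp.read())["response"].split("\n")[0]
```

For instance, calling accuracy_for_k(ollama_complete, pool, test_set, k) for k from 1 to 5 shows where adding examples stops helping. Any other backend, such as LM Studio or a Transformers pipeline, can be swapped in as long as it is wrapped as a function from prompt string to completion string.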

Reviewed by Fredoline Eruo.