RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
Glossary / Large language models / Few-Shot Prompting
Large language models

Few-Shot Prompting

Few-shot prompting is a technique where you include a small number of input-output examples in the prompt to guide the model's response, without updating model weights. The model uses these examples to infer the desired pattern or format. Operators encounter this when they want consistent output structure (e.g., JSON, bullet lists) or task-specific behavior without fine-tuning. The number of examples is typically 2-5; more than that may exceed context length or degrade performance.

Deeper dive

Few-shot prompting leverages in-context learning, where the model generalizes from examples provided in the prompt. It sits between zero-shot (no examples) and many-shot (many examples, sometimes hundreds). The examples act as a task specification: the model sees a pattern and continues it. For operators, few-shot is a practical way to steer outputs without retraining. However, it consumes context window space—each example adds tokens, reducing room for other instructions or user input. The quality of examples matters: they should be representative and diverse. If the model misinterprets the pattern, adjusting examples or adding explicit instructions can help. Some runtimes (e.g., vLLM, llama.cpp) support system prompts that can include few-shot examples, but the same token budget applies.

Practical example

An operator wants the model to extract names and dates from text into JSON. A few-shot prompt might include two examples: "Input: 'John was born on 1990-05-12.' Output: {"name": "John", "date": "1990-05-12"}" followed by the actual input. This costs about 30-40 tokens per example. On a 4K context window, using 5 examples leaves ~3.5K tokens for the actual task—enough for a few paragraphs.

Workflow example

In LM Studio, you can prepend few-shot examples in the user message or system prompt. For Ollama, you'd include them in the prompt string: ollama run llama3.2 'Translate English to French. Example: hello -> bonjour. Example: cat -> chat. Now translate: dog'. The model sees the pattern and outputs 'chien'. If the output is wrong, operators adjust examples or add formatting instructions.

Reviewed by Fredoline Eruo. See our editorial policy.

Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
When it doesn't work
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →