RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
Glossary / Large language models / In-Context Learning
Large language models

In-Context Learning

In-context learning (ICL) is a capability of large language models where the model adapts its behavior based solely on examples or instructions provided in the input prompt, without updating its weights. Operators encounter ICL when they include a few examples of desired input-output pairs (few-shot) or a task description (zero-shot) in the prompt. The model uses its attention mechanism to infer the pattern from the context and apply it to new queries. ICL is distinct from fine-tuning, which modifies model weights. It is a key reason operators can adapt a single model to many tasks without retraining, but it consumes context window tokens and may be less reliable than fine-tuning for complex tasks.

Deeper dive

In-context learning works because transformer models process all tokens in the context window simultaneously through self-attention. When examples are placed in the prompt, the model's attention heads learn to associate patterns (e.g., 'sentiment: positive' after a movie review) and apply that mapping to the final query. The number of examples (shots) and their ordering significantly affect performance. ICL is sensitive to prompt formatting, example quality, and model size—larger models tend to perform ICL more reliably. Operators can use ICL for quick prototyping, data labeling, or task switching without retraining. However, ICL consumes context tokens, so long examples reduce the space available for the actual task. It also does not guarantee consistent performance across diverse inputs, and the model may overfit to spurious patterns in the examples. For mission-critical tasks, fine-tuning or RLHF is often preferred over ICL.

Practical example

An operator wants to classify customer emails as 'urgent' or 'normal'. Instead of fine-tuning a model, they craft a prompt with three examples: 'Email: "Server down!" -> urgent', 'Email: "Password reset request" -> normal', 'Email: "Meeting rescheduled" -> normal'. Then they append the new email: 'Email: "Billing issue, payment failed" ->'. The model outputs 'urgent' based on the pattern. This few-shot ICL works on a 7B model running on an RTX 3060 12 GB at ~20 tok/s, but consumes ~200 tokens of context for the examples.

Workflow example

In LM Studio or Ollama, an operator loads a model (e.g., Llama 3.1 8B) and types a prompt with few-shot examples directly in the chat interface. In llama.cpp, they run ./main -m model.gguf -p "Translate English to French: hello -> bonjour, goodbye -> au revoir, cat ->" to see ICL in action. In Hugging Face Transformers, they set tokenizer.apply_chat_template with a list of example messages. The operator must ensure the total prompt length (examples + query) fits within the context window; if it exceeds, the model may truncate or ignore later examples.

Reviewed by Fredoline Eruo. See our editorial policy.

Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
When it doesn't work
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →