04. Few-Shot Prompting
Chapter 4 of 25 · 15 min
Few-shot prompting places worked examples directly in the prompt. Each example is an input-output pair that shows the model how to handle your specific task. This anchors the model's behavior to your desired pattern instead of leaving it to guess the format from instructions alone.
The structure is: instructions + several (input → output) pairs + new input. The examples must be representative of the full range of cases you expect.
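The structure above can be assembled programmatically. The following is a minimal sketch, assuming examples are stored as (input, output) pairs; `build_few_shot_prompt` is a hypothetical helper name, not part of any library, and the "Example N / Input / Output" layout is one reasonable choice among many.

```python
import json

def build_few_shot_prompt(instructions, examples, new_input):
    """Assemble instructions + numbered (input -> output) pairs + the new input."""
    lines = [instructions]
    for i, (text, output) in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(f'Input: "{text}"')
        # json.dumps serializes every example output the same way
        lines.append(f"Output: {json.dumps(output)}")
    lines.append("Now extract from:")
    lines.append(f'Input: "{new_input}"')
    return "\n".join(lines)

examples = [
    ("Apple reported $394.3 billion in revenue for FY2023.",
     {"company": "Apple", "revenue": "$394.3 billion", "fiscal_year": "FY2023"}),
    ("Tesla generated $96.8B in annual sales during 2024.",
     {"company": "Tesla", "revenue": "$96.8B", "fiscal_year": "2024"}),
]

prompt = build_few_shot_prompt(
    "Extract the company name, revenue, and fiscal year from financial text.",
    examples,
    "Nvidia posted $60.9 billion in revenue for fiscal year 2024.",
)
print(prompt)
```

Generating the example outputs from data rather than typing them by hand makes it harder for formatting differences to slip in between examples.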
Extract the company name, revenue, and fiscal year from financial text.
Example 1:
Input: "Apple reported $394.3 billion in revenue for FY2023."
Output: {"company": "Apple", "revenue": "$394.3 billion", "fiscal_year": "FY2023"}
Example 2:
Input: "Tesla generated $96.8B in annual sales during 2024."
Output: {"company": "Tesla", "revenue": "$96.8B", "fiscal_year": "2024"}
Example 3:
Input: "Microsoft's cloud segment contributed $85.2 billion to 2023 revenue."
Output: {"company": "Microsoft", "revenue": "$85.2 billion", "fiscal_year": "2023"}
Now extract from:
Input: "Nvidia posted $60.9 billion in revenue for fiscal year 2024."
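The examples fix the output format and field names, and they show that revenue and fiscal year values are copied as written in the source text (for example "$96.8B" and "FY2023") rather than normalized. The model applies this pattern to the new input.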
Few-shot quality depends on example selection:
- Cover the range of cases: In classification tasks, include at least one example of each category, plus any edge cases you expect
- Use realistic inputs: Synthetic examples may not match the distribution of real data
- Maintain consistency: Varying the format across examples teaches the model to improvise
- Limit to 5-10 examples: Beyond that, additional examples rarely improve results and always increase token cost
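Some of these criteria can be checked mechanically before a prompt is sent. The sketch below assumes each example output is a dict; `check_example_set` is a hypothetical helper, and the default cap of 10 is simply the upper bound of the guideline above.

```python
def check_example_set(outputs, max_examples=10):
    """Return a list of warnings about an example set's size and field consistency."""
    warnings = []
    if len(outputs) > max_examples:
        warnings.append(f"{len(outputs)} examples exceeds the cap of {max_examples}")
    # Every example output should expose the same field names
    key_sets = {frozenset(o) for o in outputs}
    if len(key_sets) > 1:
        warnings.append("examples use different field names")
    return warnings

consistent = [{"company": "Apple", "revenue": "$394.3 billion", "fiscal_year": "FY2023"}] * 5
mixed = consistent + [{"name": "Tesla", "sales": "$96.8B"}]  # deliberately inconsistent
print(check_example_set(consistent))
print(check_example_set(mixed))
```

Realism and coverage still need human judgment; a check like this only catches the mechanical problems.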
Failure modes include:
- Unrepresentative examples: If you only show positive cases for classification, the model may never predict the negative class
- Conflicting patterns: If examples contradict each other, the model may output inconsistent results
- Overfitting to examples: The model may copy surface details of the examples, such as specific values or phrasing, rather than the underlying pattern
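The first two failure modes can be caught with a quick audit of a classification example set. This is a rough sketch: `audit_examples` is a hypothetical helper, the sentiment examples are invented for illustration, and matching inputs by lowercased, whitespace-normalized text is an assumption that will miss paraphrased conflicts.

```python
from collections import defaultdict

def audit_examples(examples, expected_labels):
    """Report labels with no example and inputs that map to more than one label."""
    seen = defaultdict(set)
    for text, label in examples:
        # Normalize case and whitespace so near-duplicate inputs are grouped together
        seen[" ".join(text.lower().split())].add(label)
    used = set().union(*seen.values())
    missing = sorted(set(expected_labels) - used)
    conflicts = sorted(text for text, labels in seen.items() if len(labels) > 1)
    return missing, conflicts

# Invented examples: the third contradicts the first
examples = [
    ("Great product, works perfectly.", "positive"),
    ("Fast shipping and friendly support.", "positive"),
    ("great product,  works perfectly.", "negative"),
]
missing, conflicts = audit_examples(examples, ["positive", "negative", "neutral"])
print("Labels with no example:", missing)
print("Inputs with conflicting labels:", conflicts)
```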
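# Testing few-shot consistency
import json

# `model` and `prompt_with_3_examples` are placeholders for your own client and prompt
outputs = []
for trial in range(5):
    response = model.generate(prompt_with_3_examples)
    try:
        outputs.append(json.loads(response))
    except json.JSONDecodeError:
        outputs.append(None)  # record unparseable responses instead of crashing

# Check consistency: compare field names, since every parsed JSON object has type dict
formats = {tuple(sorted(o)) if isinstance(o, dict) else type(o).__name__ for o in outputs}
print(f"Output formats: {formats}")  # Should contain exactly one entry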
EXERCISE
Create a few-shot prompt for a task in your workflow. Test it with 10 diverse inputs and count how many produce correctly formatted output.
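One possible way to do the counting is sketched below. It assumes the task returns JSON with the three fields from the extraction example; `is_well_formed`, `count_well_formed`, and `fake_generate` are hypothetical names, and `fake_generate` is a stand-in you would replace with a call to your own model.

```python
import json

# Assumed schema, taken from the extraction example in this chapter
REQUIRED_KEYS = {"company", "revenue", "fiscal_year"}

def is_well_formed(response):
    """True if the response parses as a JSON object with exactly the required keys."""
    try:
        parsed = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(parsed) == REQUIRED_KEYS

def count_well_formed(generate, prompts):
    """Run each prompt through `generate` and count correctly formatted responses."""
    return sum(is_well_formed(generate(p)) for p in prompts)

def fake_generate(prompt):
    # Stand-in for a real model call: returns malformed output for one input
    if "broken" in prompt:
        return "Company: Nvidia"
    return '{"company": "Nvidia", "revenue": "$60.9 billion", "fiscal_year": "2024"}'

prompts = [f"input {i}" for i in range(9)] + ["broken input"]
score = count_well_formed(fake_generate, prompts)
print(f"{score}/{len(prompts)} correctly formatted")
```

This only measures formatting. Whether the extracted values are correct needs a separate comparison against known answers.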