RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.co · Independently operated
Prompt Engineering Fundamentals

16. Self-Consistency

Chapter 16 of 25 · 15 min
KEY INSIGHT

The mechanism relies on answer agreement across independent reasoning paths, not on confidence calibration or metadata.

```python
import re
from collections import Counter

def extract_final_answer(sample):
    """Return the text after the last 'ANSWER:' marker, or None if absent."""
    matches = re.findall(r'ANSWER:\s*(.+)', sample)
    return matches[-1].strip() if matches else None

def self_consistency_query(model, problem, n_samples=5):
    """Generate multiple independent solutions and vote on the answer."""
    # The prompt is identical for every sample; only the seed changes
    prompt = (
        f"Problem: {problem}\n"
        "Reason through this step by step. Show your reasoning.\n"
        "Your final answer should be clearly marked as:\n"
        "ANSWER: [your answer]"
    )

    # Sample independently (different random seeds).
    # model.generate stands in for whatever sampling call your client exposes.
    samples = [
        model.generate(prompt, temperature=0.8, seed=i * 42)
        for i in range(n_samples)
    ]

    # Extract answers (simplified parsing); drop samples with no ANSWER marker
    answers = [a for a in map(extract_final_answer, samples) if a is not None]
    if not answers:
        return None, 0.0, Counter()

    # Majority vote
    vote_counts = Counter(answers)
    consensus_answer, top_votes = vote_counts.most_common(1)[0]
    # Vote share over all requested samples, not a calibrated probability
    confidence = top_votes / n_samples
    return consensus_answer, confidence, vote_counts
```

The temperature parameter controls stochasticity. Values below 0.3 produce near-identical samples, which defeats the purpose of sampling more than once. Values above 1.0 generate increasingly random output that is less likely to contain a valid solution. Verified optimal range: 0.6–0.9 for most models.

**Failure mode:** Voting on answers without canonical format normalization produces false disagreements. The same mathematical answer may appear as "3", "three", "③", or "=3", and the voting mechanism counts these as distinct answers.

```python
# Normalization step required before voting
import re
import unicodedata

# Small illustrative vocabulary; extend it for your task
NUM_WORDS = {
    'one': '1', 'two': '2', 'three': '3',
    'first': '1', 'second': '2', 'third': '3',
}

def normalize_answer(text):
    """Canonicalize answer formats before voting."""
    # Fold compatibility characters such as "③" into plain digits
    text = unicodedata.normalize('NFKC', text).lower()
    # Remove punctuation, but keep a decimal point or minus sign
    # that is directly followed by a digit ("3.5", "-3")
    text = re.sub(r'\.(?!\d)|-(?!\d)|[^\w\s.\-]', '', text)
    # Convert words to numbers where applicable
    for word, num in NUM_WORDS.items():
        text = re.sub(rf'\b{word}\b', num, text)
    return text.strip()
```

Self-consistency with 20 samples improved accuracy on reasoning benchmarks by 4–9% over single-sample chain-of-thought reasoning. Returns diminish above 15 samples: each additional sample adds compute cost without a proportional accuracy improvement.
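To make the failure mode concrete, here is a minimal sketch that tallies the same six answers twice, once raw and once after canonicalization. The helper `canonical` and the `NUMBER_WORDS` table are hypothetical names introduced only for this illustration, and the answer list is made up rather than taken from a real model.

```python
import unicodedata
from collections import Counter

# Hypothetical, deliberately tiny vocabulary for the illustration
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3"}

def canonical(answer):
    """Map surface variants of the same answer onto one voting key."""
    # NFKC folds compatibility characters, e.g. "③" becomes "3"
    text = unicodedata.normalize("NFKC", answer).lower().strip()
    # Drop a leading "=" as in "=3"
    text = text.lstrip("=").strip()
    return NUMBER_WORDS.get(text, text)

# Made-up sample answers: four spellings of 3, two votes for 4
raw_answers = ["3", "three", "③", "=3", "4", "4"]

raw_winner, raw_votes = Counter(raw_answers).most_common(1)[0]
norm_winner, norm_votes = Counter(map(canonical, raw_answers)).most_common(1)[0]

print(raw_winner, raw_votes)    # the raw tally splits the "3" votes
print(norm_winner, norm_votes)  # the normalized tally merges them
```

In this toy input the raw tally elects "4" with two votes, because the four spellings of three each count once. After canonicalization, "3" wins with four of six votes.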

Self-consistency prompting generates multiple solution paths for a single problem, then selects the most frequently occurring answer. The insight is that correct reasoning paths tend to converge on the same answer, while incorrect paths tend to scatter across different wrong answers, even when each one sounds equally confident.
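The convergence argument can be checked with a toy simulation. The sketch below is not a model of any real LLM: it assumes each sample is independently correct with probability `p_correct` and that wrong samples are spread uniformly over `n_wrong` distinct wrong answers. All names and default values are illustrative assumptions.

```python
import random
from collections import Counter

def simulate(p_correct=0.6, n_wrong=4, n_samples=5, trials=2000, seed=0):
    """Compare single-sample accuracy with plurality-vote accuracy."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wrong = [f"wrong-{k}" for k in range(n_wrong)]
    single_hits = 0
    vote_hits = 0
    for _ in range(trials):
        # Each sample is right with probability p_correct,
        # otherwise it lands on one of the wrong answers at random
        answers = [
            "right" if rng.random() < p_correct else rng.choice(wrong)
            for _ in range(n_samples)
        ]
        single_hits += answers[0] == "right"
        vote_hits += Counter(answers).most_common(1)[0][0] == "right"
    return single_hits / trials, vote_hits / trials

single_acc, vote_acc = simulate()
print(f"single sample: {single_acc:.3f}  plurality vote: {vote_acc:.3f}")
```

Under these assumptions the vote beats the single sample because the correct answer collects its votes in one bucket while errors split theirs. If wrong answers were correlated (for example, a systematic misreading of the problem), the advantage would shrink or disappear.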

EXERCISE

Implement self-consistency querying for a code generation task. Generate 5 samples for each of 10 test cases, record the consensus answer and vote margin, then compare consensus accuracy against single-sample baseline accuracy.
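One possible starting point for the bookkeeping is sketched below. It assumes a hypothetical `sample_fn(problem, seed)` that returns a comparable answer key; for code generation that key might be the program's output on fixed test inputs rather than its source text, since two correct programs rarely match character for character. The `stub_sampler` is a stand-in for a real model call and exists only so the sketch runs on its own.

```python
from collections import Counter

def vote(answers):
    """Return (consensus, margin); margin is the lead over the runner-up as a share of votes."""
    ranked = Counter(answers).most_common(2)
    top_answer, top_votes = ranked[0]
    runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
    return top_answer, (top_votes - runner_up_votes) / len(answers)

def evaluate(cases, sample_fn, n_samples=5):
    """cases is a list of (problem, expected) pairs."""
    single_hits = 0
    consensus_hits = 0
    rows = []
    for problem, expected in cases:
        answers = [sample_fn(problem, seed) for seed in range(n_samples)]
        consensus, margin = vote(answers)
        single_hits += answers[0] == expected   # baseline: first sample only
        consensus_hits += consensus == expected
        rows.append((problem, consensus, margin))
    n = len(cases)
    return single_hits / n, consensus_hits / n, rows

def stub_sampler(problem, seed):
    """Hypothetical stand-in for a model: errs on a fixed pattern of seeds."""
    if (problem + seed) % 4 == 0:
        return f"wrong-{seed}"
    return str(problem * problem)

# Ten toy test cases: square the integers 0..9
cases = [(i, str(i * i)) for i in range(10)]
single_acc, consensus_acc, rows = evaluate(cases, stub_sampler)
print(single_acc, consensus_acc)
for problem, consensus, margin in rows:
    print(problem, consensus, margin)
```

Recording the margin next to each consensus answer makes it easy to see whether the cases the vote gets wrong are also the ones with narrow margins.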
