Advanced Chain-of-Thought — Advanced Prompt Engineering (Chapter 2)

Chain-of-thought (CoT) prompting grew out of a simple observation: on multi-step problems, models asked to reason step by step perform better than models asked to answer directly. The benefit is not uniform. It was first reported mainly for large models, and how much a given model gains varies with its size and training. Basic CoT, which simply asks the model to "think step by step", is a starting point, not an endpoint.

The characteristic failure mode of basic CoT is unanchored reasoning: the model generates plausible-sounding intermediate steps that don't actually connect to the final answer. On computation-heavy tasks, this produces confident wrong answers. The model appears to reason, but the reasoning is self-referential rather than grounded in the problem.
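For arithmetic-heavy tasks, one cheap way to catch ungrounded steps is to recompute them outside the model. The sketch below assumes the model writes each step in the form `a op b = c` (for example `12 * 4 = 48`); the step format, the function name `check_arithmetic_steps`, and the rounding tolerance are illustrative choices, not part of any library. It catches only one kind of ungrounded step, an intermediate result that does not follow from its operands, but it catches that kind reliably.

```python
import operator
import re
from fractions import Fraction

# Operator symbols we expect inside a step, mapped to exact arithmetic.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# Assumed step format: "<number> <op> <number> = <number>", e.g. "12 * 4 = 48".
NUM = r"-?\d+(?:\.\d+)?"
STEP = re.compile(rf"({NUM})\s*([-+*/])\s*({NUM})\s*=\s*({NUM})")


def check_arithmetic_steps(reasoning: str, tol: Fraction = Fraction(1, 100)) -> list[str]:
    """Recompute each 'a op b = c' step; report those off by more than tol."""
    errors = []
    for left, op, right, stated in STEP.findall(reasoning):
        # Fractions keep the recomputation exact, so only the model's rounding matters.
        a, b, claimed = Fraction(left), Fraction(right), Fraction(stated)
        if op == "/" and b == 0:
            errors.append(f"{left} / {right}: division by zero")
            continue
        actual = OPS[op](a, b)
        # tol absorbs legitimate rounding such as "10 / 3 = 3.33".
        if abs(actual - claimed) > tol:
            errors.append(f"{left} {op} {right} = {stated} (actual: {actual})")
    return errors
```

For example, `check_arithmetic_steps("Step 1: 12 * 4 = 48. Step 2: 48 + 7 = 56.")` flags the second step, which a reader skimming fluent reasoning could easily miss.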

Advanced CoT variants address this anchoring problem. Phrasings that imply verification, such as "Let me work through this carefully", tend to produce better results than "think step by step". The distinction matters because it changes what the model is asked to do: not just "generate reasoning" but "check reasoning as you generate it". How much checking a given model actually performs varies by model and task, so the gain is worth measuring rather than assuming.
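One way to make the verification request concrete is to require an explicit check after every step and then confirm that the checks were written. This is a minimal sketch; the instruction wording, the `Step N:` / `Check:` / `Answer:` line conventions, and both function names are hypothetical choices for illustration.

```python
import re

# Hypothetical instructions that turn "check as you go" into a visible format.
VERIFY_INSTRUCTIONS = """Let me work through this carefully.
Write each step on its own line starting with "Step N:".
Immediately after each step, add a line starting with "Check:" that
verifies the step against the problem statement before moving on.
End with a line starting with "Answer:"."""


def build_verifying_prompt(question: str) -> str:
    """Wrap a question in instructions that ask for check-as-you-go reasoning."""
    return f"{VERIFY_INSTRUCTIONS}\n\nProblem: {question}"


def unchecked_steps(response: str) -> list[str]:
    """Return 'Step N:' lines that are not immediately followed by a 'Check:' line."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    missing = []
    for i, line in enumerate(lines):
        if re.match(r"Step \d+:", line):
            following = lines[i + 1] if i + 1 < len(lines) else ""
            if not following.startswith("Check:"):
                missing.append(line)
    return missing
```

The format check confirms only that a check was written, not that it is correct; it is a tripwire for responses that silently fell back to plain generation.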

Another advanced pattern explicitly separates analysis from synthesis. The prompt asks the model to first identify the relevant facts, then apply reasoning to those facts, and finally combine the results into an answer. Discrete stages are easier to verify than one continuous stream of reasoning, both for the model checking itself and for a reader or program checking the output.

import ollama

SYSTEM_PROMPT = """You are a precise analyst. Follow this exact format:

## ANALYSIS
List the specific facts from the input that are relevant to the question.
Number each fact. Do not add facts not present in the input.

## REASONING
Apply logic to derive the answer from the facts listed above.
Reference each fact by number when using it.

## CONCLUSION
State the final answer based only on the REASONING section.
"""

def structured_cot_query(question: str, context: str) -> str:
    """Chain-of-thought with explicit stage separation."""
    # The format instructions travel once, via the system parameter.
    # Repeating them in the user prompt would only duplicate them.
    user_prompt = f"""Question: {question}

Context:
{context}

Format your response according to the required structure."""

    response = ollama.generate(
        model='llama3.2',
        prompt=user_prompt,
        system=SYSTEM_PROMPT,  # sent as the system message; overrides any SYSTEM in the Modelfile
    )

    return response['response']

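A related variant, "least-to-most" prompting, splits the work into two stages: the model first decomposes the question into simpler subquestions, then answers them in order, with each earlier answer available when it solves the next. This is particularly effective for compound questions where the user assumes implicit knowledge the model might not have, because decomposition forces those unstated intermediate facts into the open where the model has to establish them.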
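In its usual formulation, least-to-most prompting runs as a decomposition call followed by one call per subquestion, each seeing the answers so far. A minimal sketch of that loop: `ask` is a hypothetical placeholder for any function that sends a prompt to a model and returns its text (for instance a thin wrapper around `ollama.generate`), and the prompt wording is illustrative.

```python
from typing import Callable

# Placeholder type: any prompt -> completion function, not a library API.
Ask = Callable[[str], str]


def _transcript(solved: list[tuple[str, str]]) -> str:
    """Render solved subquestions as a Q/A transcript for the next prompt."""
    return "\n".join(f"Q: {q}\nA: {a}" for q, a in solved)


def least_to_most(question: str, ask: Ask) -> str:
    """Decompose a question into subquestions, then answer them in order."""
    # Stage 1: decomposition, one subquestion per line, simplest first.
    plan = ask(
        "Break the following question into the simpler subquestions that must be "
        "answered first, one per line, simplest first. Do not answer them.\n\n"
        f"Question: {question}"
    )
    subquestions = [line.strip("- ").strip() for line in plan.splitlines() if line.strip()]

    # Stage 2: sequential solving; every earlier answer is in the next prompt.
    solved: list[tuple[str, str]] = []
    for sub in subquestions:
        answer = ask(f"{_transcript(solved)}\n\nQ: {sub}\nA:".strip())
        solved.append((sub, answer.strip()))

    # Final call: the original question, with all intermediate answers in view.
    return ask(f"{_transcript(solved)}\n\nQ: {question}\nA:".strip())
```

Passing `ask` in as a parameter keeps the control flow testable with a stub and independent of any particular model backend.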

The anchoring problem in CoT gets worse as the context grows. When the input contains a lot of irrelevant information, the model's reasoning steps tend to draw on it alongside the relevant facts. Advanced CoT prompts counter this by explicitly instructing the model to flag information that is missing or contradictory rather than guessing.
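Flagging is most useful when the flags are machine-readable, so the calling code can act on them. A minimal sketch, assuming a hypothetical convention of `MISSING:` and `CONFLICT:` lines; the marker words and function names are illustrative, and models emit such lines only when the prompt asks for them.

```python
import re

# Hypothetical flag convention appended to a CoT prompt.
FLAG_INSTRUCTIONS = """If the context lacks a fact you need, write a line
"MISSING: <what is missing>" instead of guessing.
If two parts of the context disagree, write a line
"CONFLICT: <what disagrees>" and do not pick a side silently."""


def add_flag_instructions(prompt: str) -> str:
    """Append the missing/contradictory-information instructions to a prompt."""
    return f"{prompt.rstrip()}\n\n{FLAG_INSTRUCTIONS}"


def extract_flags(response: str) -> dict[str, list[str]]:
    """Collect MISSING and CONFLICT lines so callers can act on them."""
    flags: dict[str, list[str]] = {"MISSING": [], "CONFLICT": []}
    pattern = r"^\s*(MISSING|CONFLICT):\s*(.+)$"
    for kind, detail in re.findall(pattern, response, re.MULTILINE):
        flags[kind].append(detail.strip())
    return flags
```

When `extract_flags` reports anything under `MISSING`, the caller can go back to the user or retrieve more context instead of accepting a conclusion built on a guess.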