Model-Specific Prompting: Llama — Prompt Engineering Fundamentals (Chapter 12)

Llama models (including Llama 2, Llama 3, and variants) have specific behaviors that affect prompt engineering. Understanding these improves output quality.

System prompt handling: Base (non-chat) Llama models respond less consistently to system prompts than chat-tuned variants, and instructions placed in a "system" role may be ignored. Solution: put critical instructions in the user prompt, prefixed with clear delimiters:

INSTRUCTIONS: [your task]
INPUT: [your data]
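
As a minimal sketch of the delimiter template above, the prompt can be assembled with a small helper. The function name build_delimited_prompt and the sample strings are illustrative assumptions, not part of any Llama API.

```python
def build_delimited_prompt(instructions: str, data: str) -> str:
    """Put the instructions and the input in the user prompt under labelled delimiters."""
    # Strip stray whitespace so each delimiter is followed directly by its content.
    return f"INSTRUCTIONS: {instructions.strip()}\nINPUT: {data.strip()}"


prompt = build_delimited_prompt(
    "Summarize the text in one sentence.",
    "  Quarterly revenue rose while costs stayed flat.  ",
)
print(prompt)
```

The resulting string is sent as the user message, so the instructions do not depend on a system role being honored.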

Llama 3 instruction following: Llama 3 was trained with specialized instruction data and responds better to natural language prompts than earlier versions. Overly mechanical prompting ("TASK: X PARAM: Y") may reduce quality.

Context length considerations: Llama 3.1 and later variants support up to 128K tokens of context (the original Llama 3 release was limited to 8K), but performance on complex reasoning may degrade well before the limit, often after roughly 32K tokens. If you need long-context reasoning, chunk the input and aggregate the results.

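import json

def split_into_chunks(text, chunk_size):
    # Character-based split; use a tokenizer-aware splitter for strict token budgets.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def process_long_document(document, generate, chunk_size=8000):
    # `generate` is a stand-in for your model call: it takes a prompt string
    # and returns the model's text response.
    chunks = split_into_chunks(document, chunk_size)
    claims = []

    for i, chunk in enumerate(chunks):
        prompt = f"""Analyze this chunk (part {i+1} of {len(chunks)}).

Task: Extract key claims and their supporting evidence.

Chunk: {chunk}

Output JSON with 'claims' array."""

        response = generate(prompt)
        try:
            claims.extend(json.loads(response).get("claims", []))
        except (json.JSONDecodeError, AttributeError):
            continue  # Skip chunks whose output is not a valid JSON object

    # Aggregate across chunks
    return {"claims": claims}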
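
One possible refinement of the chunk-and-aggregate approach is to overlap adjacent chunks, so a claim that straddles a boundary appears whole in at least one chunk, and then to de-duplicate claims when merging. The sketch below is an assumption-laden illustration: split_with_overlap and merge_claims are hypothetical helper names, chunks are measured in characters rather than tokens, and each claim is assumed to be a dict with a "claim" key.

```python
def split_with_overlap(text: str, chunk_size: int = 8000, overlap: int = 200) -> list[str]:
    """Split text into character chunks that share `overlap` characters with their neighbor."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final chunk is not just a repeat of the overlap region.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def merge_claims(results: list[dict]) -> dict:
    """Combine per-chunk results, dropping claims whose normalized text was already seen."""
    seen = set()
    merged = []
    for result in results:
        for claim in result.get("claims", []):
            # Assumed shape: {"claim": "...", "evidence": "..."}
            key = " ".join(claim["claim"].lower().split())
            if key not in seen:
                seen.add(key)
                merged.append(claim)
    return {"claims": merged}


chunks = split_with_overlap("abcdefghij", chunk_size=4, overlap=1)
merged = merge_claims([
    {"claims": [{"claim": "Revenue rose", "evidence": "Q3 report"}]},
    {"claims": [{"claim": "revenue  rose", "evidence": "Q3 report"},
                {"claim": "Costs were flat", "evidence": "Q3 report"}]},
])
```

Overlap trades a little extra token usage for fewer claims cut in half; the de-duplication step keeps the overlap from inflating the final result.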

Llama temperature behavior: Llama models can show more output variance at temperature 0.7 than other model families do at the same setting. For consistent output, use a temperature of 0.3-0.5 for most tasks, and reserve higher temperatures for creative tasks.
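
The temperature guidance above can be captured as a small lookup, sketched here under assumptions: the task labels and the exact values are illustrative picks within the ranges stated in the text (the creative value is simply "higher"), and pick_temperature is a hypothetical helper, not a library function.

```python
# Illustrative defaults: 0.3-0.5 for most tasks, higher only for creative work.
TASK_TEMPERATURES = {
    "extraction": 0.3,
    "summarization": 0.4,
    "question_answering": 0.4,
    "creative": 0.9,  # assumed value; the text only says "higher"
}


def pick_temperature(task_type: str, default: float = 0.4) -> float:
    """Return a sampling temperature for the task, falling back to a mid-range default."""
    return TASK_TEMPERATURES.get(task_type, default)


extraction_temp = pick_temperature("extraction")
unknown_temp = pick_temperature("translation")
```

The returned value would be passed as the temperature parameter of whichever inference client is in use.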

Prompt format preference:

Llama 3: Natural language, minimal markup works best
Llama 2: Structure helps, especially for extraction tasks
Code Llama: Accepts docstring-style prompts better than free-form
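
The format preferences listed above could be applied with a single dispatch function. This is a sketch only: the family labels ("llama3", "llama2", "codellama") and the exact layouts are assumptions chosen to mirror the list, not formats prescribed by the models.

```python
def format_prompt(family: str, task: str, text: str) -> str:
    """Lay out the same task in the style each model family is said to prefer."""
    if family == "llama3":
        # Natural language, minimal markup.
        return f"{task}\n\n{text}"
    if family == "llama2":
        # Explicit structure, useful for extraction tasks.
        return f"INSTRUCTIONS: {task}\nINPUT: {text}"
    if family == "codellama":
        # Docstring-style prompt: the task is phrased as a docstring.
        return f'"""{task}\n\nInput:\n{text}\n"""\n'
    raise ValueError(f"unknown model family: {family}")


llama2_prompt = format_prompt("llama2", "Extract all dates.", "We met on 3 May.")
llama3_prompt = format_prompt("llama3", "Extract all dates.", "We met on 3 May.")
```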

Common Llama failure modes:

Repetition loops: Llama may repeat phrases at context boundaries. If you see repetition, truncate context or add "Do not repeat information already stated."
Under-specified outputs: Llama may truncate structured output prematurely. Always include "Complete all fields" or "Return valid JSON with all required fields."
Instruction ignoring: System-level instructions may be overridden by user content. Use explicit role assignment in the main prompt.
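
Two of the failure modes above, repetition loops and prematurely truncated structured output, can be checked after generation. The following is a rough sketch: has_repetition_loop and missing_fields are hypothetical helpers, and the n-gram size and repeat threshold are arbitrary assumptions to tune for your own outputs.

```python
import json
from collections import Counter


def has_repetition_loop(text: str, n: int = 5, max_repeats: int = 3) -> bool:
    """Flag output in which any n-word sequence occurs at least max_repeats times."""
    words = text.split()
    ngrams = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return any(count >= max_repeats for count in ngrams.values())


def missing_fields(response: str, required: list[str]) -> list[str]:
    """Return the required fields absent from a JSON response (all of them if it fails to parse)."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return list(required)  # Truncated or malformed JSON
    if not isinstance(data, dict):
        return list(required)
    return [field for field in required if field not in data]


looping = has_repetition_loop("the cat sat on the mat " * 4)
absent = missing_fields('{"claims": []}', ["claims", "summary"])
```

A caller could retry with a shorter context when a loop is flagged, or re-prompt with "Return valid JSON with all required fields." when fields are missing.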

You are an expert data analyst. Task: Extract structured information from text.

[Continue with task details]