12. Model-Specific Prompting: Llama
Llama models (including Llama 2, Llama 3, and variants) have specific behaviors that affect prompt engineering. Understanding these improves output quality.
System prompt handling: Base Llama models respond to system prompts less consistently than chat-tuned variants, and instructions placed in a "system" role may be ignored altogether. Solution: put critical instructions in the user prompt, prefixed with clear delimiters:
INSTRUCTIONS: [your task]
INPUT: [your data]
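As a minimal sketch of the delimiter approach, the template above can be filled in by a small helper. The function name `build_delimited_prompt` and the sample strings are illustrative assumptions, not part of any Llama tooling.

```python
def build_delimited_prompt(instructions: str, data: str) -> str:
    """Put instructions and input in the user prompt under labelled delimiters."""
    # Strip stray whitespace so each delimiter starts its own line cleanly.
    return f"INSTRUCTIONS: {instructions.strip()}\nINPUT: {data.strip()}"


# Hypothetical task and input, used only to show the resulting layout.
prompt = build_delimited_prompt(
    "Summarize the text in one sentence.",
    "The quarterly report shows revenue growth in two of three regions.",
)
print(prompt)
```

The resulting string is sent as the user message, so the instructions still reach the model if the system role is ignored.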
Llama 3 instruction following: The instruction-tuned Llama 3 models respond better to natural language prompts than earlier versions. Overly mechanical prompting ("TASK: X PARAM: Y") may reduce quality.
Context length considerations: The original Llama 3 release has an 8K-token context window; Llama 3.1 and later variants support up to 128K tokens. Even with the larger window, performance on complex reasoning tends to degrade after roughly 32K tokens. If you need long-context reasoning, chunk the input and aggregate the results.
# model, split_into_chunks, parse, and aggregate_results are placeholders
# for your own client and helper functions.
def process_long_document(document, chunk_size=8000):
    chunks = split_into_chunks(document, chunk_size)
    results = []
    for i, chunk in enumerate(chunks):
        # Tell the model where this chunk sits in the whole document.
        prompt = f"""Analyze this chunk (part {i+1} of {len(chunks)}).
Task: Extract key claims and their supporting evidence.
Chunk: {chunk}
Output JSON with 'claims' array."""
        response = model.generate(prompt, format="json")
        results.append(parse(response))
    # Aggregate across chunks
    return aggregate_results(results)
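The two helpers left undefined above could look something like the following. This is only a sketch under stated assumptions: `chunk_size` is treated as a character count (the original does not say whether it means characters or tokens), and each parsed result is assumed to be a dict with a `claims` list.

```python
import json


def split_into_chunks(document: str, chunk_size: int) -> list[str]:
    """Split text into pieces of at most chunk_size characters, preferring whitespace breaks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    chunks = []
    start = 0
    while start < len(document):
        end = min(start + chunk_size, len(document))
        if end < len(document):
            # Back up to the last space in the window so words are not cut in half.
            cut = document.rfind(" ", start, end)
            if cut > start:
                end = cut
        chunks.append(document[start:end].strip())
        start = end
    # Drop pieces that were whitespace only.
    return [c for c in chunks if c]


def aggregate_results(results: list[dict]) -> dict:
    """Merge per-chunk 'claims' arrays, dropping exact duplicates and keeping order."""
    merged = []
    seen = set()
    for result in results:
        for claim in result.get("claims", []):
            # Serialize so that dict-shaped claims can be compared too.
            key = json.dumps(claim, sort_keys=True)
            if key not in seen:
                seen.add(key)
                merged.append(claim)
    return {"claims": merged}
```

Exact-match deduplication is the simplest possible aggregation; claims that are paraphrased across chunks would need a fuzzier comparison or a final summarization pass.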
Llama temperature behavior: Llama models can show higher output variance at temperature 0.7 than some other model families. For consistent output, use a temperature of 0.3-0.5 for most tasks, and reserve higher temperatures for creative tasks.
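One way to apply this guidance is a small lookup of temperature by task type. The task names and the exact values below are illustrative assumptions chosen to match the guidance above; they are not measured settings.

```python
# Hypothetical task categories; adjust names and values to your own workload.
TEMPERATURE_BY_TASK = {
    "extraction": 0.3,     # structured output, consistency matters most
    "summarization": 0.4,
    "qa": 0.5,
    "creative": 0.9,       # assumed value for tasks that benefit from variety
}


def pick_temperature(task_type: str, default: float = 0.4) -> float:
    """Return the configured temperature for a task, or a conservative default."""
    return TEMPERATURE_BY_TASK.get(task_type, default)
```

Keeping the values in one table makes it easy to tune them after measuring consistency on your own prompts.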
Prompt format preference:
- Llama 3: Natural language, minimal markup works best
- Llama 2: Structure helps, especially for extraction tasks
- CodeLlama: Responds better to docstring-style prompts than to free-form ones
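The preferences in this list can be sketched as one template per family. The templates and the `format_prompt` helper are illustrative assumptions, not official prompt formats published for any of these models.

```python
# Hypothetical templates, one per model family named in the list above.
TEMPLATES = {
    # Plain sentences with no labels or markup.
    "llama3": "{task}\n\n{text}",
    # Labelled sections give the model explicit structure.
    "llama2": "TASK: {task}\nTEXT: {text}\nOUTPUT:",
    # A function stub whose docstring carries the request; the model completes the body.
    "codellama": 'def solve(text: str):\n    """{task}\n\n    Example input: {text}\n    """\n',
}


def format_prompt(family: str, task: str, text: str) -> str:
    """Render the same task in the style associated with a model family."""
    if family not in TEMPLATES:
        raise ValueError(f"unknown model family: {family!r}")
    return TEMPLATES[family].format(task=task, text=text)


print(format_prompt("llama2", "Extract all dates.", "The invoice is due 3 May."))
```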
Common Llama failure modes:
Repetition loops: Llama may repeat phrases at context boundaries. If you see repetition, truncate context or add "Do not repeat information already stated."
Under-specified outputs: Llama may truncate structured output prematurely. Always include "Complete all fields" or "Return valid JSON with all required fields."
Instruction ignoring: System-level instructions may be overridden by user content. Use explicit role assignment in the main prompt, for example:
You are an expert data analyst. Task: Extract structured information from text.
[Continue with task details]
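The first two failure modes can also be caught after generation. The sketch below assumes the model returns a string, and that a repetition loop can be approximated by the same run of words appearing several times; the function names and thresholds are hypothetical.

```python
import json
from collections import Counter


def has_repetition_loop(text: str, n: int = 5, threshold: int = 3) -> bool:
    """Return True if any sequence of n words occurs at least `threshold` times."""
    words = text.split()
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not ngrams:
        return False
    return Counter(ngrams).most_common(1)[0][1] >= threshold


def missing_fields(raw_output: str, required: list[str]) -> list[str]:
    """Return the required keys absent from the output.

    Output that is truncated or is not a JSON object counts as missing every field.
    """
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return list(required)
    if not isinstance(data, dict):
        return list(required)
    return [key for key in required if key not in data]
```

A caller might retry with a shorter context when `has_repetition_loop` returns True, or re-prompt with the missing field names when `missing_fields` returns a non-empty list.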
Take a prompt that produces inconsistent results with Llama and rewrite it with clearer structure and explicit format requirements. Test across five runs and measure consistency.
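One possible way to measure consistency for this exercise is the share of runs that produce the most common output. The metric, the `consistency_rate` helper, and the stub generator are assumptions for illustration; in practice `generate` would wrap a call to your Llama endpoint.

```python
import itertools
from collections import Counter
from typing import Callable


def consistency_rate(generate: Callable[[str], str], prompt: str, runs: int = 5) -> float:
    """Fraction of runs whose whitespace-normalized output equals the most common output."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # Collapse whitespace so trivial formatting differences do not count as disagreement.
    outputs = [" ".join(generate(prompt).split()) for _ in range(runs)]
    most_common_count = Counter(outputs).most_common(1)[0][1]
    return most_common_count / runs


# Stub standing in for a real model call: four matching answers and one outlier.
canned = itertools.cycle(["positive", "positive", "negative", "positive", "positive"])


def fake_generate(prompt: str) -> str:
    return next(canned)


rate = consistency_rate(fake_generate, "Classify the sentiment of this review.")
print(rate)  # 0.8
```

Exact string matching is strict; for free-text outputs, comparing parsed fields or using a similarity score may give a more useful picture.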