01. Beyond Basic Prompting
Basic prompting follows a straightforward pattern: provide an instruction, receive a response. A user asks for a haiku about recursion, and the model responds. This works for exploratory, one-off tasks. Production systems require more.
Consider a task whose output can be measured: extracting structured data from unstructured text. A basic prompt might request JSON output. The model complies for most inputs, but edge cases break silently: nested structures flatten unexpectedly, dates are formatted inconsistently, and the model occasionally prepends explanatory text. These failures stay invisible unless the output is validated programmatically.
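The failure modes listed above can be made visible with a small check on each raw response. The sketch below is illustrative only: the field names (`address`, `start_date`) and the ISO 8601 date convention are assumptions chosen for the example, not part of any particular schema.

```python
import json
from datetime import date


def diagnose(raw: str) -> list[str]:
    """Return a list of problems found in one raw model response."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Typical cause: the model wrapped the JSON in explanatory text.
        return ["not parseable as JSON"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]

    problems = []
    # Hypothetical schema: 'address' should remain a nested object.
    if not isinstance(data.get("address"), dict):
        problems.append("address is missing or flattened")
    # Hypothetical convention: dates must be ISO 8601 (YYYY-MM-DD).
    try:
        date.fromisoformat(str(data.get("start_date")))
    except ValueError:
        problems.append("start_date is not an ISO date")
    return problems
```

An empty list means the response passed every check; anything else names a failure that would otherwise pass unnoticed.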
Advanced prompting begins with an explicit specification of the desired output format, often called "output formatting." Such a specification is a request written into the prompt, not a guarantee; constrained decoding, which restricts the tokens the model can emit, is a separate mechanism. More importantly, advanced prompting treats every interaction as a measurement opportunity. How often does the model produce parseable JSON? How often does the parsed output pass validation? This is the shift from "does it look right?" to "does it measure right?"
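The two questions above translate directly into two rates over a batch of collected responses. A minimal sketch follows, assuming the raw responses have already been gathered as strings; the function name `measure` and the key-only validation are illustrative simplifications.

```python
import json


def measure(raw_outputs: list[str], required_keys: set[str]) -> dict[str, float]:
    """Compute the parse rate and validation rate over raw responses."""
    parsed = 0
    valid = 0
    for raw in raw_outputs:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        parsed += 1
        # Validation here is only a key check; a real schema would be stricter.
        if isinstance(data, dict) and required_keys.issubset(data):
            valid += 1
    total = len(raw_outputs)
    return {
        "parse_rate": parsed / total if total else 0.0,
        "valid_rate": valid / total if total else 0.0,
    }
```

A gap between the two rates is itself informative: it counts responses that parse cleanly but would still break downstream code.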
The second dimension beyond basic prompting is iteration awareness. The same prompt with identical input may produce different outputs on different runs because the model samples its tokens, and a higher temperature increases that variation. Sophisticated prompting accounts for this variance by running multiple samples or implementing self-consistency checks rather than assuming a single-shot output is representative.
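One simple form of self-consistency check is a majority vote over several samples. The sketch below assumes a caller-supplied `sample_fn` (a hypothetical callable that returns one model answer per call) and normalizes answers only by trimming whitespace and lowercasing, which is a simplification.

```python
from collections import Counter
from typing import Callable


def majority_answer(sample_fn: Callable[[], str], n: int = 5) -> tuple[str, float]:
    """Draw n samples and return the most common answer with its agreement rate."""
    answers = [sample_fn().strip().lower() for _ in range(n)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / n
```

A low agreement rate is a signal that the input is ambiguous or the prompt is underspecified, and such cases can be flagged for review instead of being accepted.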
A third consideration is context management. Basic prompting ignores the limits of the context window. Advanced prompting tracks token counts, manages truncation strategies, and decides deliberately what context to include based on task requirements.
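Token tracking and truncation can be sketched without a tokenizer. The example below assumes roughly four characters per token, which is a crude heuristic and not a property of any specific model; a real system would count with the model's own tokenizer. The head-and-tail truncation strategy is one option among several.

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: about 4 characters per token (an assumption)."""
    return (len(text) + 3) // 4


def fit_to_budget(context: str, max_tokens: int) -> str:
    """Truncate context to the budget, keeping the head and tail."""
    if estimate_tokens(context) <= max_tokens:
        return context
    max_chars = max_tokens * 4
    marker = "\n[...truncated...]\n"
    if max_chars <= len(marker):
        # Budget too small for a marker: keep only the head.
        return context[:max_chars]
    keep = max_chars - len(marker)
    head = keep // 2
    tail = keep - head
    # Drop the middle, on the assumption that the start and end matter most.
    return context[:head] + marker + context[-tail:]
```

Keeping the head and tail is a deliberate choice here; for other tasks, keeping the most recent content or the most relevant passages may fit better.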
import json

import ollama


def basic_extraction_prompt(text: str) -> dict:
    """Basic approach: works most of the time, fails silently."""
    prompt = f"""Extract the person name and their role from this text:
{text}
Return JSON with 'name' and 'role' fields."""
    response = ollama.generate(
        model='llama3.2',
        prompt=prompt
    )
    # No error handling: invalid JSON raises here, and JSON with the
    # wrong shape passes through to downstream code unchecked
    return json.loads(response['response'])


def measured_extraction_prompt(text: str, max_retries: int = 3) -> dict:
    """Measured approach: validates output, handles failures."""
    prompt = f"""Extract the person name and their role from this text.
Return ONLY valid JSON with 'name' (string) and 'role' (string) fields.
No markdown, no explanation, just JSON.
Text: {text}"""
    for attempt in range(max_retries):
        response = ollama.generate(
            model='llama3.2',
            prompt=prompt,
            options={'temperature': 0.1}  # Low temperature to reduce output variance
        )
        try:
            result = json.loads(response['response'].strip())
        except json.JSONDecodeError:
            continue  # Unparseable output: try again
        # Validate that the result is an object with the expected keys
        if isinstance(result, dict) and 'name' in result and 'role' in result:
            return result
    raise ValueError(f"Failed to extract a valid result after {max_retries} attempts")
The measured version adds explicit requirements ("no markdown, no explanation"), a low temperature to reduce run-to-run variance, and structured retry logic with validation. This is minimal advanced prompting: nothing constrains decoding, so the model still has full flexibility in how it generates the response.
Take a basic prompt from an existing workflow. Add output validation, measure the failure rate over 50 runs, and record whether the failures are silent (output in the wrong format that is accepted without any error) or noisy (parse errors). Document the baseline before attempting improvements.
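A baseline for this exercise could be collected with a harness along the following lines. It is a sketch under stated assumptions: `run_once` and `is_valid` are hypothetical callables supplied by the reader, standing in for the existing workflow and its validation rule.

```python
from collections import Counter
from typing import Any, Callable


def baseline(
    run_once: Callable[[], Any],
    is_valid: Callable[[Any], bool],
    runs: int = 50,
) -> Counter:
    """Run a pipeline repeatedly and tally ok, silent, and noisy outcomes."""
    tally = Counter()
    for _ in range(runs):
        try:
            result = run_once()
        except Exception:
            # Noisy failure: the pipeline raised (for example, a parse error).
            tally["noisy"] += 1
            continue
        # Silent failure: a result came back but does not pass validation.
        tally["ok" if is_valid(result) else "silent"] += 1
    return tally
```

Dividing each count by `runs` gives the rates to record before any prompt changes are attempted.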