21. Automated Optimization

Chapter 21 of 25 · 20 min

KEY INSIGHT

Automated optimization works when evaluation is fast and objective. Optimizing for style or naturalness requires human feedback that automation cannot replicate.

```python
class AutomatedPromptOptimizer:
    def __init__(self, base_prompt, model, evaluator):
        self.model = model
        self.evaluator = evaluator
        self.base_prompt = base_prompt
        self.history = []

    def generate_variant(self, current_prompt, feedback):
        """Generate improved variant based on evaluator feedback."""
        improvement_prompt = f"""Given this prompt:
{current_prompt}
---
And this evaluation feedback:
---
{feedback}
---
Generate an improved version of the prompt that addresses the feedback.
Changes should be specific, not vague rewording.
Output only the new prompt, no explanation."""
        variant = self.model.generate(improvement_prompt)
        return variant

    def optimize(self, test_cases, max_iterations=10, threshold=0.95):
        """
        Iterative optimization loop.
        Returns (best prompt, history) when the threshold is met
        or iterations are exhausted.
        """
        current_prompt = self.base_prompt
        # Score the baseline once; later scores come from accepted variants,
        # so the current prompt is not re-evaluated on every iteration.
        current_score = self.evaluator.evaluate(current_prompt, test_cases)['avg_correctness']
        rejections_in_a_row = 0

        for iteration in range(max_iterations):
            # Record current state
            self.history.append({
                'iteration': iteration,
                'prompt': current_prompt,
                'score': current_score
            })

            if current_score >= threshold:
                print(f"Threshold reached at iteration {iteration}")
                return current_prompt, self.history

            # Generate feedback for improvement
            feedback = self.evaluator.detailed_feedback(current_prompt, test_cases)

            # Check for score stagnation: no accepted variant in the last two tries.
            # (Comparing the last two history entries would mix in rejected records.)
            if rejections_in_a_row >= 2:
                feedback += " Consider structural changes, not rewording."

            # Generate and test variant
            variant = self.generate_variant(current_prompt, feedback)
            variant_score = self.evaluator.evaluate(variant, test_cases)['avg_correctness']

            # Accept improvement, keep current on regression
            if variant_score > current_score:
                current_prompt, current_score = variant, variant_score
                rejections_in_a_row = 0
            else:
                rejections_in_a_row += 1
                self.history.append({
                    'iteration': iteration,
                    'prompt': f"<REJECTED: score={variant_score}>",
                    'score': variant_score
                })

        # Record the final state so a variant accepted on the last iteration is logged
        self.history.append({
            'iteration': max_iterations,
            'prompt': current_prompt,
            'score': current_score
        })
        return current_prompt, self.history
```

**Failure mode:** Optimization converges to local maxima that exploit evaluation blind spots. A prompt that includes test case answers as hints within its instructions will score 100% on evaluation while failing on unseen inputs. Countermeasure: hold out test cases that are never used during optimization.

```python
import random


def split_test_cases(all_cases, holdout_ratio=0.2):
    """Reserve test cases for final validation only."""
    shuffled = list(all_cases)  # copy so the caller's list is not reordered
    random.shuffle(shuffled)
    split_point = int(len(shuffled) * (1 - holdout_ratio))
    return {
        'development': shuffled[:split_point],
        'holdout': shuffled[split_point:]
    }

# Optimization uses only development set
# Final report shows scores on both sets
# Discrepancy > 10% indicates evaluation exploitation
```

Automated optimization typically yields 5–15% improvement over manually written baseline prompts within 10 iterations. Gains plateau after 15 iterations in most cases; additional iterations rarely produce proportional improvement.
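The holdout check described above can be sketched as a small report function. This is a minimal illustration rather than a prescribed implementation: the name `holdout_report` and the `max_gap` parameter are hypothetical, scores are assumed to be fractions between 0 and 1, and the 10% rule is read here as a gap of 10 percentage points.

```python
def holdout_report(dev_score, holdout_score, max_gap=0.10):
    """Compare development and holdout scores for one prompt.

    Assumes both scores are fractions in [0, 1]. max_gap=0.10 interprets
    the 10% discrepancy rule as 10 percentage points (an assumption).
    """
    gap = dev_score - holdout_score
    return {
        'development': dev_score,
        'holdout': holdout_score,
        'gap': gap,
        # A large positive gap suggests the prompt fits the development
        # cases rather than the task itself.
        'suspected_exploitation': gap > max_gap,
    }


# A prompt that memorized development answers: perfect there, weak on holdout.
report = holdout_report(dev_score=1.00, holdout_score=0.70)
print(report['suspected_exploitation'])
```

Only a positive gap is flagged; a holdout score higher than the development score is more likely noise from a small holdout set than exploitation.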

Automated prompt optimization uses meta-prompting to improve prompts without manual iteration. The system generates prompt variants, evaluates them against test cases, and selects improvements iteratively.
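The evaluation step can be sketched for a classification task as follows. Everything here is an assumption made for illustration: the class names `ExactMatchEvaluator` and `KeywordModel` are hypothetical, the model is assumed to expose `generate(text)` returning a string, and each test case is assumed to be a dict with `input` and `expected` keys. The method names `evaluate` and `detailed_feedback` and the `avg_correctness` key match what the chapter's optimizer calls.

```python
class ExactMatchEvaluator:
    """Hypothetical evaluator: scores a prompt by exact-match accuracy."""

    def __init__(self, model):
        self.model = model  # assumed to expose generate(text) -> str

    def _run(self, prompt, case):
        # Send the prompt plus one input to the model and normalize the label.
        output = self.model.generate(f"{prompt}\n\nInput: {case['input']}")
        return output.strip().lower()

    def evaluate(self, prompt, test_cases):
        if not test_cases:
            raise ValueError("test_cases must not be empty")
        correct = sum(
            self._run(prompt, case) == case['expected'].lower()
            for case in test_cases
        )
        return {'avg_correctness': correct / len(test_cases)}

    def detailed_feedback(self, prompt, test_cases):
        # List each failing case so the meta-prompt has concrete errors to fix.
        failures = []
        for case in test_cases:
            got = self._run(prompt, case)
            if got != case['expected'].lower():
                failures.append(
                    f"input={case['input']!r} "
                    f"expected={case['expected']!r} got={got!r}"
                )
        if not failures:
            return "All test cases passed."
        return "Failed cases:\n" + "\n".join(failures)


class KeywordModel:
    """Stand-in model for the demo: labels by one keyword in the text."""

    def generate(self, text):
        return "positive" if "great" in text.lower() else "negative"


cases = [
    {'input': 'Great product', 'expected': 'positive'},
    {'input': 'Broke in a day', 'expected': 'negative'},
    {'input': 'Love it', 'expected': 'positive'},
]
evaluator = ExactMatchEvaluator(KeywordModel())
scores = evaluator.evaluate("Classify the sentiment.", cases)
feedback = evaluator.detailed_feedback("Classify the sentiment.", cases)
print(scores['avg_correctness'])
print(feedback)
```

Exact-match scoring keeps evaluation fast and objective. Note that this feedback format includes expected labels, which gives the meta-prompt a route to copying answers into the prompt, so scores from it should be confirmed on cases the optimizer never saw.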

EXERCISE

Implement automated prompt optimization for a classification task. Split test cases with 20% holdout. Run 10 iterations of optimization and report improvement on both development and holdout sets. Document any evaluation exploitation discrepancies.