24. Prompt Version Control
Prompt version control applies software engineering practices—commits, branches, diffs, and rollback—to prompt engineering. Unlike code, prompts lack formal syntax for change tracking, requiring conventions to compensate.
classifier (active: v2.1)
v2.1 - 2025-05-28
- Removed "be concise" directive causing under-generation
- Added entity type enumeration before generation
- Test accuracy: 0.91 (up from 0.87 in v2.0)
- Edge case: legal IDs now extracted correctly (was missing hyphen handling)
v2.0 - 2025-05-15
- Changed from JSON to natural language output
- Rationale: User preference survey showed natural language preferred 3:1
- Test accuracy: 0.87 (down from 0.94 in v1.5)
- Edge case: Introduced regression on technical document extraction
v1.5 - 2025-04-30
- Added explicit date format handling (ISO 8601)
- Test accuracy: 0.94
- Note: Format constraint caused 12% latency increase
v1.0 - 2025-03-01
- Initial production prompt
- Test accuracy: 0.88 """
def commit_prompt(prompt_path, changelog_entry, test_results): """ Commit updated prompt with diff and metadata. """ import subprocess
# Generate diff against HEAD
diff = subprocess.check_output(
['git', 'diff', 'HEAD', prompt_path],
text=True
)
# Create archive copy with version marker
with open(f'prompts/archive/v{get_next_version()}_{prompt_path}', 'w') as f:
f.write(read_prompt(prompt_path))
# Stage changes
subprocess.run(['git', 'add', prompt_path, f'prompts/archive/'])
# Commit with structured message
commit_msg = f"""Update {prompt_path}
Version: {get_next_version()} Test accuracy: {test_results['avg_correctness']:.2f} Latency (ms): {test_results['avg_latency_ms']:.0f}
{changlog_entry}
Diff: {diff}"""
subprocess.run(['git', 'commit', '-m', commit_msg])
**Failure mode:** Prompt changes without corresponding test updates produce false confidence. A prompt updated with "improved instructions" but run against outdated test cases may report artificially high accuracy that reflects test set memorization, not genuine improvement.
```python
def audit_test_currency(test_set_path, prompt_path):
"""Verify test set is newer than most recent prompt update."""
import os
test_mtime = os.path.getmtime(test_set_path)
prompt_mtime = os.path.getmtime(prompt_path)
if prompt_mtime > test_mtime:
import warnings
warnings.warn(
f"Tests ({test_mtime}) are older than prompt ({prompt_mtime}). "
f"Results may not reflect current prompt behavior.",
UserWarning
)
return False
return True
Teams moving from ad-hoc prompt management to version-controlled prompts reported 60% reduction in production incidents caused by undocumented prompt changes. The 40% of incidents not prevented involved multi-variable interactions that still escaped single-prompt diff tracking.
Implement version control for the prompts in your prompt kit. Create a CHANGELOG, archive your current version, make one improvement with a corresponding test run, and commit with a structured message documenting the change and results.