Prompt Version Control — Prompt Engineering Fundamentals (Chapter 24)

Prompt version control applies software engineering practices—commits, branches, diffs, and rollback—to prompt engineering. Unlike code, prompts lack formal syntax for change tracking, requiring conventions to compensate.

classifier (active: v2.1)

v2.1 - 2025-05-28

Removed "be concise" directive causing under-generation
Added entity type enumeration before generation
Test accuracy: 0.91 (up from 0.87 in v2.0)
Edge case: legal IDs now extracted correctly (was missing hyphen handling)

v2.0 - 2025-05-15

Changed from JSON to natural language output
Rationale: User preference survey showed natural language preferred 3:1
Test accuracy: 0.87 (down from 0.94 in v1.5)
Edge case: Introduced regression on technical document extraction

v1.5 - 2025-04-30

Added explicit date format handling (ISO 8601)
Test accuracy: 0.94
Note: Format constraint caused 12% latency increase

v1.0 - 2025-03-01

Initial production prompt
Test accuracy: 0.88 """

def commit_prompt(prompt_path, changelog_entry, test_results): """ Commit updated prompt with diff and metadata. """ import subprocess

# Generate diff against HEAD
diff = subprocess.check_output(
    ['git', 'diff', 'HEAD', prompt_path],
    text=True
)

# Create archive copy with version marker
with open(f'prompts/archive/v{get_next_version()}_{prompt_path}', 'w') as f:
    f.write(read_prompt(prompt_path))

# Stage changes
subprocess.run(['git', 'add', prompt_path, f'prompts/archive/'])

# Commit with structured message
commit_msg = f"""Update {prompt_path}

Version: {get_next_version()} Test accuracy: {test_results['avg_correctness']:.2f} Latency (ms): {test_results['avg_latency_ms']:.0f}

{changlog_entry}

Diff: {diff}"""

subprocess.run(['git', 'commit', '-m', commit_msg])


**Failure mode:** Prompt changes without corresponding test updates produce false confidence. A prompt updated with "improved instructions" but run against outdated test cases may report artificially high accuracy that reflects test set memorization, not genuine improvement.

```python
def audit_test_currency(test_set_path, prompt_path):
    """Verify test set is newer than most recent prompt update."""
    import os
    test_mtime = os.path.getmtime(test_set_path)
    prompt_mtime = os.path.getmtime(prompt_path)
    
    if prompt_mtime > test_mtime:
        import warnings
        warnings.warn(
            f"Tests ({test_mtime}) are older than prompt ({prompt_mtime}). "
            f"Results may not reflect current prompt behavior.",
            UserWarning
        )
        return False
    return True

Teams moving from ad-hoc prompt management to version-controlled prompts reported 60% reduction in production incidents caused by undocumented prompt changes. The 40% of incidents not prevented involved multi-variable interactions that still escaped single-prompt diff tracking.