RUNLOCALAI · v38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Prompt Engineering Fundamentals

24. Prompt Version Control

Chapter 24 of 25 · 20 min
KEY INSIGHT

Prompt version control fails without change attribution. Each modification requires a rationale in the commit message; otherwise the version history becomes unreadable.

Version control structure for prompts:

```text
prompts/
├── archive/
│   ├── v1.2_classifier_stable.txt
│   ├── v1.3_classifier_added_constraints.txt
│   └── v1.5_classifier_reverted_format.txt
├── active/
│   ├── classifier.txt
│   └── summarizer.txt
└── CHANGELOG.md
```

Prompt version control applies software engineering practices—commits, branches, diffs, and rollback—to prompt engineering. Unlike code, prompts lack formal syntax for change tracking, requiring conventions to compensate.
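Because prompts have no syntax-aware tooling, a plain line diff is usually the first convention to adopt. The following is a minimal sketch using Python's standard `difflib`; the function name `diff_prompts` and the two sample prompts are illustrative assumptions, not part of the course material.

```python
import difflib


def diff_prompts(old_text, new_text, old_label="v2.0", new_label="v2.1"):
    """Return a unified diff between two prompt versions as a string."""
    diff_lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",  # inputs carry no trailing newlines, so emit none either
    )
    return "\n".join(diff_lines)


# Hypothetical prompt versions, for illustration only
old_prompt = "Extract all entities.\nBe concise."
new_prompt = "Extract all entities.\nList entity types before extracting."

print(diff_prompts(old_prompt, new_prompt))
```

Identical inputs produce an empty string, which makes the function convenient as a "did anything change?" check before committing.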

CHANGELOG.md format for prompts:

```markdown
# Prompt Changelog

## classifier (active: v2.1)

### v2.1 - 2025-05-28
- Removed "be concise" directive causing under-generation
- Added entity type enumeration before generation
- Test accuracy: 0.91 (up from 0.87 in v2.0)
- Edge case: legal IDs now extracted correctly (was missing hyphen handling)

### v2.0 - 2025-05-15
- Changed from JSON to natural language output
- Rationale: User preference survey showed natural language preferred 3:1
- Test accuracy: 0.87 (down from 0.94 in v1.5)
- Edge case: Introduced regression on technical document extraction

### v1.5 - 2025-04-30
- Added explicit date format handling (ISO 8601)
- Test accuracy: 0.94
- Note: Format constraint caused 12% latency increase

### v1.0 - 2025-03-01
- Initial production prompt
- Test accuracy: 0.88
```
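A changelog in this shape is regular enough to check mechanically. The sketch below assumes the `### vX.Y - DATE` heading and `- Test accuracy: N` line formats used in the changelog above; `parse_accuracies` and `find_regressions` are hypothetical helpers, and the embedded changelog is trimmed to the lines the parser reads.

```python
import re

CHANGELOG = """\
### v2.1 - 2025-05-28
- Test accuracy: 0.91 (up from 0.87 in v2.0)
### v2.0 - 2025-05-15
- Test accuracy: 0.87 (down from 0.94 in v1.5)
### v1.5 - 2025-04-30
- Test accuracy: 0.94
### v1.0 - 2025-03-01
- Test accuracy: 0.88
"""

VERSION_RE = re.compile(r"^### (v\d+\.\d+) - \d{4}-\d{2}-\d{2}$")
ACCURACY_RE = re.compile(r"^- Test accuracy: (\d\.\d+)")


def parse_accuracies(changelog_text):
    """Return [(version, accuracy), ...] ordered oldest to newest."""
    entries = []
    current_version = None
    for line in changelog_text.splitlines():
        line = line.strip()
        version_match = VERSION_RE.match(line)
        if version_match:
            current_version = version_match.group(1)
            continue
        accuracy_match = ACCURACY_RE.match(line)
        if accuracy_match and current_version is not None:
            entries.append((current_version, float(accuracy_match.group(1))))
    # The changelog lists newest first, so reverse for chronological order
    return entries[::-1]


def find_regressions(entries):
    """Return (previous, current) version pairs where accuracy dropped."""
    return [
        (prev_version, version)
        for (prev_version, prev_acc), (version, acc) in zip(entries, entries[1:])
        if acc < prev_acc
    ]


history = parse_accuracies(CHANGELOG)
print(find_regressions(history))  # [('v1.5', 'v2.0')]
```

Run in CI, a check like this would surface the v2.0 drop from 0.94 to 0.87 at commit time, so the trade-off gets recorded deliberately and is not discovered later.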

```python
import os
import subprocess


def commit_prompt(prompt_path, changelog_entry, test_results):
    """Commit updated prompt with diff and metadata."""
    # get_next_version() and read_prompt() are helpers assumed to be defined elsewhere.
    # Call get_next_version() once so the archive name and commit message agree.
    version = get_next_version()

    # Generate diff against HEAD
    diff = subprocess.check_output(
        ['git', 'diff', 'HEAD', '--', prompt_path],
        text=True
    )

    # Create archive copy with version marker (file name only, not the full path)
    archive_path = f'prompts/archive/v{version}_{os.path.basename(prompt_path)}'
    with open(archive_path, 'w') as f:
        f.write(read_prompt(prompt_path))

    # Stage changes
    subprocess.run(['git', 'add', prompt_path, 'prompts/archive/'], check=True)

    # Commit with structured message
    commit_msg = f"""Update {prompt_path}

Version: {version}
Test accuracy: {test_results['avg_correctness']:.2f}
Latency (ms): {test_results['avg_latency_ms']:.0f}

{changelog_entry}

Diff:
{diff}"""

    subprocess.run(['git', 'commit', '-m', commit_msg], check=True)
```
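`commit_prompt` calls two helpers, `get_next_version` and `read_prompt`, that the chapter does not define. One possible shape for them, offered purely as an assumption: derive the next version from archive file names such as `v1.5_classifier_reverted_format.txt` and bump the minor number. The `archive_dir` parameter, the `1.0` starting version, and the minor-bump policy are illustrative choices, not the course's method.

```python
import re
from pathlib import Path

# Matches archive names that start with "v<major>.<minor>_"
ARCHIVE_NAME_RE = re.compile(r"^v(\d+)\.(\d+)_")


def get_next_version(archive_dir="prompts/archive"):
    """Return the next minor version string, e.g. '1.6' after 'v1.5_...'."""
    versions = []
    archive = Path(archive_dir)
    if archive.is_dir():
        for entry in archive.iterdir():
            match = ARCHIVE_NAME_RE.match(entry.name)
            if match:
                versions.append((int(match.group(1)), int(match.group(2))))
    if not versions:
        return "1.0"  # assumed starting version for an empty or missing archive
    major, minor = max(versions)  # integer tuples, so 1.10 sorts above 1.9
    return f"{major}.{minor + 1}"


def read_prompt(prompt_path):
    """Read a prompt file as UTF-8 text."""
    return Path(prompt_path).read_text(encoding="utf-8")
```

Comparing versions as integer tuples avoids the string-sorting trap where `v1.10` would otherwise rank below `v1.9`. A team that tracks versions per prompt would also need to filter by prompt name, which this sketch leaves out.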

**Failure mode:** Prompt changes without corresponding test updates produce false confidence. A prompt updated with "improved instructions" but run against outdated test cases may report artificially high accuracy that reflects test set memorization, not genuine improvement.

```python
import os
import warnings


def audit_test_currency(test_set_path, prompt_path):
    """Verify test set is newer than most recent prompt update."""
    test_mtime = os.path.getmtime(test_set_path)
    prompt_mtime = os.path.getmtime(prompt_path)

    if prompt_mtime > test_mtime:
        warnings.warn(
            f"Tests ({test_mtime}) are older than prompt ({prompt_mtime}). "
            "Results may not reflect current prompt behavior.",
            UserWarning
        )
        return False
    return True
```

Teams moving from ad-hoc prompt management to version-controlled prompts reported a 60% reduction in production incidents caused by undocumented prompt changes. The remaining 40% of incidents involved multi-variable interactions that escaped single-prompt diff tracking.
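Single-prompt diffs miss interactions because they record each file in isolation. One way to narrow that gap, sketched here as an assumption and not as a method the chapter prescribes, is to store a fingerprint of the whole active prompt set next to each test run and treat the results as stale once the fingerprint changes. The names `prompt_set_fingerprint`, `record_test_run`, and `results_are_current`, the JSON results file, and the restriction to `.txt` files are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def prompt_set_fingerprint(active_dir):
    """Hash every .txt prompt in active_dir into one SHA-256 hex digest."""
    digest = hashlib.sha256()
    # Sort so the fingerprint does not depend on directory listing order
    for path in sorted(Path(active_dir).glob("*.txt")):
        digest.update(path.name.encode("utf-8"))
        digest.update(b"\0")  # separator between file name and contents
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()


def record_test_run(active_dir, results, results_path):
    """Save test results together with the fingerprint they were run against."""
    payload = {
        "fingerprint": prompt_set_fingerprint(active_dir),
        "results": results,
    }
    Path(results_path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def results_are_current(active_dir, results_path):
    """Return True if no active prompt changed since the results were recorded."""
    payload = json.loads(Path(results_path).read_text(encoding="utf-8"))
    return payload["fingerprint"] == prompt_set_fingerprint(active_dir)
```

Content hashes also sidestep a weakness of modification-time checks: Git does not preserve file mtimes, so a fresh clone or checkout can make every file look equally recent.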

EXERCISE

Implement version control for the prompts in your prompt kit. Create a CHANGELOG, archive your current version, make one improvement with a corresponding test run, and commit with a structured message documenting the change and results.
