RUNLOCALAIv38
->Will it run?Best GPUCompareTroubleshootStartLearnPulseModelsHardwareToolsBench
Run check
RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DIR
  • Models
  • Hardware
  • Tools
  • Benchmarks
TOOLS
  • Will it run?
  • Compare hardware
  • Cost vs cloud
  • Choose my GPU
  • Prompting kits
  • Quick answers
REF
  • All buyer guides
  • Learn local AI
  • Methodology
  • Glossary
  • Errors KB
  • Trust
EDITOR
  • About
  • Author
  • How we make money
  • Editorial policy
  • Contact
LEGAL
  • Privacy
  • Terms
  • Sitemap
MAIL · MONTHLY DIGEST
Get monthly local AI changes
Monthly recap. No spam.
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.coIndependently operated
RUNLOCALAI · v38
  1. >
  2. Home
  3. /Learn
  4. /Courses
  5. /Advanced Prompt Engineering
  6. /Ch. 18
Advanced Prompt Engineering

18. Prompt Framework Project

Chapter 18 of 18 · 25 min
KEY INSIGHT

A framework without tests is just scaffolding—a production prompt system requires the same rigor as software development: version control, testing, CI/CD, and monitoring.

This final chapter integrates previous concepts into a complete, production-ready prompt framework. The project structure demonstrates how all components work together.

Project Structure

prompt-framework/
├── prompts/
│   ├── customer-service/
│   │   ├── v1.0.yaml
│   │   ├── v1.1.yaml
│   │   └── v2.0.yaml
│   └── technical-support/
│       └── v1.0.yaml
├── src/
│   ├── __init__.py
│   ├── loader.py
│   ├── generator.py
│   ├── evaluator.py
│   └── security.py
├── tests/
│   ├── test_loader.py
│   ├── test_generator.py
│   ├── test_evaluator.py
│   └── regression/
│       └── golden_set.yaml
├── pyproject.toml
└── README.md

Core Framework Implementation

# src/loader.py
from pathlib import Path
import yaml
from dataclasses import dataclass
from typing import Optional

@dataclass
class Prompt:
    name: str
    version: str
    template: str
    model: str
    parameters: dict
    metadata: dict

class PromptLoader:
    def __init__(self, prompts_dir: str = "prompts"):
        self.prompts_dir = Path(prompts_dir)
    
    def load(self, name: str, version: Optional[str] = None) -> Prompt:
        """Load a prompt by name and optional version."""
        if version is None:
            version = self.get_latest_version(name)
        
        path = self.prompts_dir / name / f"{version}.yaml"
        
        with open(path) as f:
            data = yaml.safe_load(f)
        
        return Prompt(
            name=name,
            version=version,
            template=data["template"],
            model=data.get("model", "gpt-4"),
            parameters=data.get("parameters", {}),
            metadata=data.get("metadata", {})
        )
    
    def get_latest_version(self, name: str) -> str:
        """Find highest version number for a prompt."""
        versions = []
        for path in (self.prompts_dir / name).glob("v*.yaml"):
            versions.append(path.stem)
        return sorted(versions)[-1]
    
    def list_versions(self, name: str) -> list[str]:
        """List all versions of a prompt."""
        return sorted([
            path.stem for path in (self.prompts_dir / name).glob("v*.yaml")
        ])
# src/generator.py
import ollama
from typing import Optional
from .loader import Prompt
from .security import sanitize_input, validate_output

class PromptGenerator:
    def __init__(self, default_model: str = "llama3:70b"):
        self.default_model = default_model
        self.model_configs = {
            "ollama": {"api_base": "http://localhost:11434"},
            "vllm": {"api_base": "http://localhost:8000"}
        }
    
    def generate(
        self, 
        prompt: Prompt, 
        context: dict,
        model: Optional[str] = None,
        validate: bool = True
    ) -> dict:
        """Generate response from prompt and context."""
        
        # Sanitize context inputs
        sanitized_context = {
            key: sanitize_input(str(val)) 
            for key, val in context.items()
        }
        
        # Format template
        formatted = prompt.template.format(**sanitized_context)
        
        # Select model
        model = model or prompt.model or self.default_model
        
        # Call model
        response = self._call_model(model, formatted, prompt.parameters)
        
        # Validate output if requested
        if validate:
            is_valid, violations = validate_output(response["content"])
            if not is_valid:
                raise ValueError(f"Output validation failed: {violations}")
        
        return {
            "content": response["content"],
            "model": model,
            "prompt_version": prompt.version,
            "tokens_used": response.get("tokens", 0)
        }
    
    def _call_model(self, model: str, prompt: str, params: dict):
        """Call appropriate model backend."""
        if model.startswith("ollama:"):
            model_name = model.replace("ollama:", "")
            return self._call_ollama(model_name, prompt, params)
        elif model.startswith("vllm:"):
            model_name = model.replace("vllm:", "")
            return self._call_vllm(model_name, prompt, params)
        else:
            return self._call_ollama(model, prompt, params)
    
    def _call_ollama(self, model: str, prompt: str, params: dict):
        """Call Ollama API."""
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            options=params
        )
        return {
            "content": response["message"]["content"],
            "tokens": response.get("eval_count", 0)
        }
    
    def _call_vllm(self, model: str, prompt: str, params: dict):
        """Call vLLM API."""
        import requests
        response = requests.post(
            f"{self.model_configs['vllm']['api_base']}/v1/chat/completions",
            json={
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                **params
            }
        )
        data = response.json()
        return {
            "content": data["choices"][0]["message"]["content"],
            "tokens": data["usage"]["total_tokens"]
        }
# src/evaluator.py
import re
from typing import List, Optional

class PromptEvaluator:
    def __init__(self, golden_set_path: str = "tests/regression/golden_set.yaml"):
        self.golden_set = self._load_golden_set(golden_set_path)
    
    def evaluate(
        self, 
        prompt, 
        test_cases: Optional[List[dict]] = None
    ) -> dict:
        """Run evaluation against test cases."""
        cases = test_cases or self.golden_set
        
        results = {
            "total": len(cases),
            "passed": 0,
            "failed": 0,
            "details": []
        }
        
        for case in cases:
            response = prompt.generator.generate(prompt, case["input"])
            
            passed = self._check_patterns(
                response["content"], 
                case.get("expected_patterns", [])
            )
            
            if passed:
                results["passed"] += 1
            else:
                results["failed"] += 1
            
            results["details"].append({
                "input": case["input"],
                "response": response["content"],
                "passed": passed
            })
        
        results["pass_rate"] = results["passed"] / results["total"]
        return results
    
    def _check_patterns(self, response: str, patterns: List[str]) -> bool:
        """Check if response matches expected patterns."""
        for pattern in patterns:
            if not re.search(pattern, response, re.IGNORECASE):
                return False
        return True

Running the Framework

# main.py
from src.loader import PromptLoader
from src.generator import PromptGenerator
from src.evaluator import PromptEvaluator

def main():
    # Initialize components
    loader = PromptLoader("prompts")
    generator = PromptGenerator(default_model="llama3:70b")
    evaluator = PromptEvaluator()
    
    # Load latest version of customer service prompt
    prompt = loader.load("customer-service")
    
    # Attach generator to prompt for evaluation
    prompt.generator = generator
    
    # Run evaluation
    results = evaluator.evaluate(prompt)
    print(f"Pass rate: {results['pass_rate']:.1%}")
    
    # Generate response for new query
    response = generator.generate(
        prompt, 
        {"query": "I need to return an item"}
    )
    print(response["content"])

if __name__ == "__main__":
    main()

CI/CD Integration

# .github/workflows/prompt-ci.yml
name: Prompt Framework CI
on:
  push:
    paths:
      - 'prompts/**'
      - 'src/**'
      - 'tests/**'

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install -e .
      - name: Start Ollama
        run: |
          curl -fsSL https://ollama.com/install.sh | sh
          ollama pull llama3:70b
      - name: Run prompt tests
        run: pytest tests/ -v --tb=short
      - name: Run regression suite
        run: python -m src.evaluator --regression
EXERCISE

Complete the framework by implementing the security module (sanitize_input, validate_output), adding at least 5 test cases to the golden set, and running the full test suite against a local model. End of course content for I012: Advanced Prompt Engineering (Chapters 10-18)

← Chapter 17
Prompt Compression
Course complete →
Browse all courses