
OP·Eruo Fredoline


© 2026 runlocalai.co · Independently operated


Prompt Engineering Fundamentals

25. Final Project: Prompt Framework

Chapter 25 of 25 · 30 min

KEY INSIGHT

A prompt framework succeeds when it makes the right choice obvious and the wrong choice impossible. Constraints codified in code outperform conventions documented in text.

### Framework Architecture

The framework consists of five interconnected modules:

```python
# framework/
# ├── __init__.py
# ├── core/
# │   ├── template.py      # Template management
# │   ├── schema.py        # Input/output validation
# │   └── render.py        # Multi-model rendering
# ├── testing/
# │   ├── harness.py       # Evaluation infrastructure
# │   ├── ab_test.py       # A/B testing integration
# │   └── optimizer.py     # Automated improvement
# ├── deployment/
# │   ├── router.py        # Model routing
# │   └── monitor.py       # Production monitoring
# └── cli.py               # Command-line interface

# core/template.py
from .render import render_for_model  # Not shown in this chapter
from .schema import schema_validator


class PromptTemplate:
    """Production-ready prompt template."""

    def __init__(self, name, template_str, input_schema, output_schema):
        self.name = name
        self.template = template_str
        self.input_schema = schema_validator(input_schema)
        self.output_schema = schema_validator(output_schema)
        self.models = []  # Model compatibility list
        self.metadata = {}

    def register_model(self, model_name, model_config):
        """Register model-specific rendering."""
        self.models.append({
            'name': model_name,
            'config': model_config,
            'format_variant': model_config.get('format_variant', 'default')
        })

    def resolve_model(self, model_name=None):
        """Return the entry for model_name, or the first registered model."""
        if not self.models:
            raise ValueError(f"No models registered for template '{self.name}'")
        if model_name is None:
            return self.models[0]
        for model in self.models:
            if model['name'] == model_name:
                return model
        raise KeyError(f"Model '{model_name}' is not registered for '{self.name}'")

    def render(self, model_name=None, **kwargs):
        """Render for specific model or default."""
        validated_input = self.input_schema.validate(kwargs)
        model = self.resolve_model(model_name)
        # validate() returns a pydantic instance, so convert it back to a dict
        return render_for_model(self.template, model['format_variant'],
                                **validated_input.model_dump())
```

### Schema-Based Validation

The framework enforces input/output schemas so that malformed data is rejected before any model call:

```python
# core/schema.py
from typing import Generic, Literal, Type, TypeVar

from pydantic import BaseModel, Field, ValidationError

T = TypeVar('T', bound=BaseModel)


class PromptSchema(Generic[T]):
    """Schema wrapper that adds prompt-specific validation."""

    def __init__(self, model_cls: Type[T]):
        self.model_cls = model_cls

    def validate(self, data: dict) -> T:
        """Validate and return typed instance."""
        instance = self.model_cls(**data)
        self._validate_prompt_constraints(instance)
        return instance

    def _validate_prompt_constraints(self, instance: T) -> None:
        """Hook for prompt-specific validation rules."""
        pass


def schema_validator(schema):
    """Assumed helper (called by PromptTemplate): wrap a bare model class."""
    return schema if isinstance(schema, PromptSchema) else PromptSchema(schema)


class DocumentModel(BaseModel):
    """Standard input for document processing tasks."""
    text: str = Field(min_length=10, max_length=50000)
    modality: Literal['legal', 'technical', 'casual'] = 'casual'
    language: str = Field(default='en', pattern=r'^[a-z]{2}$')
    priority: Literal['low', 'normal', 'high'] = 'normal'


# validate() is an instance method, so wrap the model once and reuse the wrapper
DocumentInput = PromptSchema(DocumentModel)

# Validation catches errors before model call
try:
    validated = DocumentInput.validate({
        'text': 'Short',  # Too short
        'modality': 'legal'
    })
except ValidationError as e:
    print(e)  # Error raised before API call
```

### Testing Infrastructure

The testing module evaluates templates on fixture cases, sampling each case several times and reporting the mean score with a normal-approximation 95% confidence interval:

```python
# testing/harness.py
from pathlib import Path

import numpy as np
import yaml


class EvaluationHarness:
    def __init__(self, tests_dir='tests/fixtures'):
        self.tests_dir = Path(tests_dir)
        self.results_cache = {}

    def load_test_cases(self, prompt_name):
        """Load test cases from fixtures directory."""
        path = self.tests_dir / f'{prompt_name}.yaml'
        if path.exists():
            return yaml.safe_load(path.read_text())['cases']
        return []

    def evaluate(self, template, model_client, n_samples=5):
        """Statistical evaluation with confidence intervals."""
        # _collect_samples, _compute_consensus and _score are not shown in this chapter
        cases = self.load_test_cases(template.name)
        results = {'cases': []}
        for case in cases:
            samples = self._collect_samples(template, model_client, case, n_samples)
            consensus = self._compute_consensus(samples)
            results['cases'].append({
                'input': case['input'],
                'expected': case['expected'],
                'samples': samples,
                'consensus': consensus,
                'consensus_correct': self._score(consensus, case['expected'])
            })
        results['summary'] = self._summarize(results['cases'])
        return results

    def _summarize(self, cases):
        """Compute aggregate metrics with confidence intervals."""
        scores = np.array([c['consensus_correct'] for c in cases], dtype=float)
        n = len(scores)
        if n == 0:
            raise ValueError("No test cases to summarize")
        mean = scores.mean()
        # Sample standard deviation (ddof=1); undefined for a single case
        std = scores.std(ddof=1) if n > 1 else 0.0
        half_width = 1.96 * std / np.sqrt(n)  # Normal approximation
        return {
            'n': n,
            'mean': mean,
            'std': std,
            'p5': np.percentile(scores, 5),
            'p95': np.percentile(scores, 95),
            'ci95_lower': mean - half_width,
            'ci95_upper': mean + half_width
        }
```

### Deployment Router

Routing selects a model per request by checking rules in descending priority order:

```python
# deployment/router.py
class PromptRouter:
    """Route requests to optimal model based on task and model capabilities."""

    def __init__(self, model_registry):
        self.registry = model_registry
        self.routing_rules = []
        self.templates = {}  # Registered templates, keyed by name

    def register(self, template):
        """Minimal registration so router.register() in the integration test resolves."""
        self.templates[template.name] = template

    def add_rule(self, condition_fn, model_name, priority=0):
        """Register routing rule with condition function."""
        self.routing_rules.append({
            'condition': condition_fn,
            'model': model_name,
            'priority': priority
        })
        self.routing_rules.sort(key=lambda r: r['priority'], reverse=True)

    def route(self, template, input_data):
        """Select optimal model for this template+input combination."""
        for rule in self.routing_rules:
            if rule['condition'](template, input_data):
                return rule['model']
        # Default: use template's first registered model
        if template.models:
            return template.models[0]['name']
        return self._fallback_model()

    def _fallback_model(self):
        """Return most reliable fallback model."""
        return 'gpt4o'  # Hard-coded here; read it from configuration in production


# Example routing rules (model_registry is assumed to be defined elsewhere)
router = PromptRouter(model_registry)
router.add_rule(
    condition_fn=lambda t, i: i.get('priority') == 'high',
    model_name='claude',
    priority=100
)
router.add_rule(
    condition_fn=lambda t, i: 'code' in t.name or 'code' in i.get('text', ''),
    model_name='deepseek',
    priority=80
)
router.add_rule(
    condition_fn=lambda t, i: t.name == 'summarizer' and len(i.get('text', '')) > 5000,
    model_name='gpt4o',
    priority=50
)
```

### Integration and Testing

Assemble the framework and run the test suite:

```python
# Full integration test
# SUMMARIZER_TEMPLATE, SummaryOutput, the *_CONFIG dicts, model_registry, monitor
# and DeploymentManager are assumed to be defined elsewhere in the project.
def test_framework_integration():
    """End-to-end test of framework lifecycle."""
    # 1. Create template with schemas
    summarizer = PromptTemplate(
        name='document_summarizer',
        template_str=SUMMARIZER_TEMPLATE,  # Matches the constructor's parameter name
        input_schema=DocumentInput,
        output_schema=SummaryOutput
    )
    summarizer.register_model('claude', CLAUDE_CONFIG)
    summarizer.register_model('gpt4o', GPT4O_CONFIG)
    summarizer.register_model('deepseek', DEEPSEEK_CONFIG)

    # 2. Add routing rules
    router.add_rule(
        condition_fn=lambda t, i: i.get('modality') == 'technical',
        model_name='deepseek',
        priority=70
    )

    # 3. Evaluate across models
    harness = EvaluationHarness()
    results = harness.evaluate(summarizer, model_registry)
    assert results['summary']['mean'] > 0.85, "Accuracy below threshold"
    assert results['summary']['p5'] > 0.70, "Bottom 5% below acceptable"

    # 4. Deploy
    router.register(summarizer)
    deployment = DeploymentManager(router, monitor)
    deployment.deploy('document_summarizer', canary=0.05)


# Run full test
test_framework_integration()
```
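The evaluation harness calls `_collect_samples`, `_compute_consensus`, and `_score` without showing them. One plausible reading of "consensus" is a majority vote over the sampled outputs, scored by exact match. The sketch below assumes that reading; the function names `compute_consensus` and `score`, the lowercase/strip normalization, and the exact-match rule are illustrative choices, not the chapter's implementation.

```python
from collections import Counter


def compute_consensus(samples):
    """Return (most common answer, agreement rate) for a list of sampled outputs.

    Answers are compared after stripping whitespace and lowercasing.
    Ties go to the answer that appeared first.
    """
    if not samples:
        raise ValueError("need at least one sample")
    normalized = [s.strip().lower() for s in samples]
    winner, votes = Counter(normalized).most_common(1)[0]
    return winner, votes / len(normalized)


def score(consensus, expected):
    """Exact-match score: 1.0 if the consensus equals the expected answer, else 0.0."""
    return 1.0 if consensus == expected.strip().lower() else 0.0


winner, agreement = compute_consensus(["Legal", "legal ", "technical", "legal", "casual"])
print(winner, agreement, score(winner, "Legal"))
```

Exact match suits classification-style outputs. Free-form outputs such as summaries would need a different scoring function, and the agreement rate is worth logging because low agreement often signals an unstable prompt even when the consensus happens to be correct.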

This capstone integrates all previous chapters into a production-ready prompt framework. The framework handles the complete lifecycle, from template creation through evaluation to deployment, with cross-model compatibility built in throughout.

EXERCISE

Build a production-ready prompt framework for two related tasks (example: document classification and entity extraction). Time: 3 hours.

Requirements:

- Create core/ module with schema-validated templates for both tasks
- Implement testing/ module with statistical evaluation harness
- Build deployment/ module with model routing based on input characteristics
- Create test fixtures with 30 test cases per task
- Run evaluation and document results with confidence intervals
- Write CLI commands for: test, deploy, revert, status
- Implement version control with CHANGELOG
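As a starting point for the four CLI commands in the requirements, here is a minimal `argparse` sketch. Only the command names come from the exercise. The argument names (`prompt`, `--samples`, `--canary`, `--to`) and their defaults are assumptions chosen to mirror the harness's `n_samples=5` and the integration test's `canary=0.05`, and the dispatch is a placeholder that prints the parsed arguments.

```python
import argparse


def build_parser():
    """Build a parser with one subcommand per required CLI command."""
    parser = argparse.ArgumentParser(
        prog="framework", description="Prompt framework CLI (sketch)"
    )
    sub = parser.add_subparsers(dest="command", required=True)

    test = sub.add_parser("test", help="Run the evaluation harness for a prompt")
    test.add_argument("prompt")
    test.add_argument("--samples", type=int, default=5)

    deploy = sub.add_parser("deploy", help="Deploy a prompt, optionally as a canary")
    deploy.add_argument("prompt")
    deploy.add_argument("--canary", type=float, default=0.05)

    revert = sub.add_parser("revert", help="Roll a prompt back to an earlier version")
    revert.add_argument("prompt")
    revert.add_argument("--to", dest="version", required=True)

    status = sub.add_parser("status", help="Show deployment status")
    status.add_argument("prompt", nargs="?")  # Omit to list every prompt
    return parser


def main(argv=None):
    """Parse arguments and dispatch; real handlers would replace the print."""
    args = build_parser().parse_args(argv)
    print(f"{args.command}: {vars(args)}")
    return args
```

Calling `main(["deploy", "document_summarizer", "--canary", "0.1"])` exercises the parser without touching `sys.argv`, which keeps the CLI itself easy to unit-test.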

Deliverables:

- Complete Python module with reproducible structure
- Test report with per-model metrics
- Deployment configuration with at least one routing rule
- CHANGELOG documenting template evolution
- README with setup instructions and usage examples

Success criteria:

- Templates render without errors for all registered models
- Evaluation harness produces results with 95% confidence intervals
- Router selects correct model based on routing rules
- Tests pass on held-out test cases not used during development
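A note on the confidence-interval criterion: with 30 pass/fail cases per task, the normal-approximation interval (mean ± 1.96·std/√n) can extend past 1.0 when accuracy is high, and it collapses to zero width when every case passes. The Wilson score interval is a common alternative for binary outcomes. The sketch below is a swapped-in technique, not the chapter's method, and the function name `wilson_interval` is illustrative.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    successes: number of passing cases; n: total cases;
    z: normal quantile (1.96 gives roughly 95% coverage).
    Returns (lower, upper), clamped to [0, 1].
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Example: 27 of 30 held-out cases pass
lower, upper = wilson_interval(27, 30)
print(f"accuracy 0.90, 95% CI [{lower:.3f}, {upper:.3f}]")
```

With only 30 cases the interval is wide, so a pass threshold such as the 0.85 used in the integration test is better compared against the lower bound than against the point estimate.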
