14. Cross-Model Portability
Chapter 14 of 18 · 20 min
Prompts often need to work across different models. A prompt optimized for one model may behave differently on another due to architecture, training data, or tokenization differences.
Model-Specific Prompt Characteristics
| Model Type | Characteristics | Prompt Adjustment |
|---|---|---|
| Decoder-only (Llama, Mistral) | Instruction-tuned variants follow instructions well; benefit from clear structure | Use explicit formatting |
| Encoder-decoder (T5, Flan-T5) | Task-agnostic; needs the task stated explicitly | Use a stronger task prefix |
| Fine-tuned instruction models | Baked-in behavior patterns that may conflict with prompt instructions | Drop or reword instructions that fight the trained behavior |
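To make the table concrete, here is a minimal sketch of shaping one task for each model type. The function name `adjust_prompt`, the model-type labels, and the section markers are all hypothetical choices for illustration, not part of any library.

```python
def adjust_prompt(model_type: str, task: str, text: str) -> str:
    """Shape the same task for a given model type (illustrative only)."""
    if model_type == "encoder-decoder":
        # T5-style models conventionally take a short task prefix.
        return f"{task}: {text}"
    if model_type == "decoder-only":
        # Explicit sections make the expected structure unambiguous.
        return f"### Instruction\n{task}\n\n### Input\n{text}\n\n### Response\n"
    # Fine-tuned instruction models: keep the prompt plain so it does not
    # compete with behavior the model already learned.
    return f"{task}\n\n{text}"
```

Calling `adjust_prompt("encoder-decoder", "summarize", article)` yields a prefixed input, while the decoder-only branch spells out where the instruction, the input, and the response belong.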
Architecture for Model Portability
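# prompts/adapters.py
from abc import ABC, abstractmethod


class PromptAdapter(ABC):
    """Backend-specific prompt formatting and generation settings."""

    @abstractmethod
    def format(self, template: str, **kwargs) -> str:
        """Fill the template and apply any backend-specific wrapping."""

    @abstractmethod
    def get_model_config(self) -> dict:
        """Return generation settings in the shape the backend expects."""


class OllamaAdapter(PromptAdapter):
    def format(self, template: str, **kwargs) -> str:
        # Ollama applies the model's chat template server-side,
        # so only the placeholders need filling
        return template.format(**kwargs)

    def get_model_config(self) -> dict:
        return {
            "options": {
                "num_ctx": 4096,
                "temperature": 0.7
            }
        }


class VLLMAdapter(PromptAdapter):
    def format(self, template: str, **kwargs) -> str:
        # Raw completion requests to vLLM get no chat template, so wrap manually.
        # Llama-2-style [INST] tags are shown; match the tags to the served model.
        prompt = template.format(**kwargs)
        return f"[INST] {prompt} [/INST]"

    def get_model_config(self) -> dict:
        return {
            "temperature": 0.7,
            "max_tokens": 512
        }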
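One way to pick the right adapter is to resolve it from a prefixed model id such as `ollama:llama3`. The sketch below is self-contained, so it uses two plain functions as stand-ins for the adapters' `format` methods; `FORMATTERS`, `resolve`, and both stand-in functions are hypothetical names.

```python
from typing import Callable, Dict, Tuple


# Stand-ins for OllamaAdapter.format and VLLMAdapter.format (hypothetical).
def plain_format(template: str, **kwargs) -> str:
    return template.format(**kwargs)


def inst_format(template: str, **kwargs) -> str:
    return f"[INST] {template.format(**kwargs)} [/INST]"


FORMATTERS: Dict[str, Callable[..., str]] = {
    "ollama": plain_format,
    "vllm": inst_format,
}


def resolve(model_id: str) -> Tuple[Callable[..., str], str]:
    """Split 'backend:model' and return (formatter, model name)."""
    # partition splits on the first colon only, so tags such as
    # 'ollama:llama3:8b' keep their ':8b' suffix in the model name.
    backend, sep, name = model_id.partition(":")
    if not sep or backend not in FORMATTERS:
        raise ValueError(f"Unknown backend in model id: {model_id!r}")
    return FORMATTERS[backend], name
```

Keeping the lookup in one place means a new backend needs one registry entry rather than another `if` branch at every call site.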
Testing Across Models
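# test_cross_model.py
import ollama

from prompts.adapters import OllamaAdapter, VLLMAdapter

# call_vllm and evaluate_response are project helpers assumed to be defined elsewhere.


def test_prompt_portability(prompt_name, models, test_cases):
    """Run one prompt against every model and record the outcome per model."""
    results = {}
    for model in models:
        try:
            # prompt_name is used as the prompt template itself;
            # literal braces in it must be escaped as {{ and }}
            if model.startswith("ollama:"):
                adapter = OllamaAdapter()
                reply = ollama.chat(
                    model=model.removeprefix("ollama:"),
                    messages=[{"role": "user", "content": adapter.format(prompt_name)}],
                    options=adapter.get_model_config()["options"],
                )
                response = reply["message"]["content"]
            elif model.startswith("vllm:"):
                adapter = VLLMAdapter()
                response = call_vllm(model.removeprefix("vllm:"), adapter.format(prompt_name))
            else:
                raise ValueError(f"Unsupported model prefix: {model}")
            results[model] = {
                "success": True,
                "response": response,
                "quality": evaluate_response(response, test_cases)
            }
        except Exception as e:
            # Record the failure and keep testing the remaining models
            results[model] = {
                "success": False,
                "error": str(e)
            }
    return results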
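The test harness relies on an `evaluate_response` helper that the chapter does not define. A minimal sketch follows, assuming each test case is a dict with a `must_include` list of keywords; that schema and the scoring rule are assumptions for illustration, and a real evaluation would likely be richer.

```python
def evaluate_response(response: str, test_cases: list[dict]) -> float:
    """Return the fraction of test cases whose keywords all appear in the response."""
    if not test_cases:
        return 0.0
    text = response.lower()
    # A case passes only if every one of its keywords is present (case-insensitive).
    passed = sum(
        all(keyword.lower() in text for keyword in case["must_include"])
        for case in test_cases
    )
    return passed / len(test_cases)
```

Keyword matching is crude, but it gives the same yardstick to every model, which is what a portability comparison needs first.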
Portability Pain Points
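Tokenization differences are a frequent source of portability issues. Each model family has its own tokenizer, so the same text produces different token counts, and a prompt that fits in context for one model may overflow for another: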
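# Check token length across models
from transformers import AutoTokenizer

# get_context_window is assumed to be a helper, defined elsewhere,
# that returns a model's context size in tokens.


def check_token_lengths(prompt, models):
    results = {}
    for model_name in models:
        # model_name must be a Hugging Face model ID or a local tokenizer path
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        token_count = len(tokenizer.encode(prompt))
        results[model_name] = {
            "tokens": token_count,
            "fits": token_count < get_context_window(model_name)
        }
    return results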
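The token check calls `get_context_window`, which is not shown in this chapter. One possible sketch is a lookup table with a conservative default, plus a variant of the fit check that reserves room for the model's reply. The model names and window sizes below are placeholders, not real values, and `fits_with_headroom` is a hypothetical helper.

```python
# Placeholder names and sizes; look up the real values for the models you use.
CONTEXT_WINDOWS = {
    "example-small": 4096,
    "example-large": 8192,
}
DEFAULT_CONTEXT_WINDOW = 2048  # conservative fallback for unknown models


def get_context_window(model_name: str) -> int:
    """Return the context size in tokens, falling back to a safe default."""
    return CONTEXT_WINDOWS.get(model_name, DEFAULT_CONTEXT_WINDOW)


def fits_with_headroom(token_count: int, model_name: str, max_new_tokens: int = 512) -> bool:
    """True if the prompt plus the reserved output budget fits in the window."""
    return token_count + max_new_tokens <= get_context_window(model_name)
```

Reserving output tokens matters because the context window covers the prompt and the generated text together; a prompt that exactly fills the window leaves no room for an answer.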
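System prompt conflicts occur when the prompt's instructions overlap with or contradict behavior the model already learned during fine-tuning. One way to manage them is to record how the system prompt should change for each model: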
# Conflict resolution strategy: per-model notes on how to adjust the system prompt
SYSTEM_PROMPTS = {
    "llama3": "Remove redundant instructions; the model already follows instructions",
    "mistral": "Add explicit reasoning-chain instructions",
    "gpt-4": "Use a minimal system prompt; the model is strong at following implicit structure",
}
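Notes like these can be turned into code that builds the message list per model. The sketch below assumes the common `role`/`content` chat message shape; `SYSTEM_OVERRIDES`, `DEFAULT_SYSTEM`, `build_messages`, and the example system texts are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical overrides; None means "send no system message at all".
SYSTEM_OVERRIDES: Dict[str, Optional[str]] = {
    "llama3": None,
    "mistral": "Think through the problem step by step before answering.",
}
DEFAULT_SYSTEM = "You are a helpful assistant."


def build_messages(model: str, user_prompt: str) -> List[Dict[str, str]]:
    """Build a chat message list with a model-specific system prompt."""
    # Models without an override fall back to the shared default.
    system = SYSTEM_OVERRIDES.get(model, DEFAULT_SYSTEM)
    messages: List[Dict[str, str]] = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_prompt})
    return messages
```

Keeping the user prompt identical and varying only the system message makes it easier to tell whether a behavior change comes from the model or from the prompt.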
EXERCISE
Take one prompt and test it across three different models (e.g., llama3 via Ollama, a vLLM-served model, and GPT-4). Document what works and what breaks on each, then adjust the prompt to maximize portability.