14. Cross-Model Portability

Chapter 14 of 18 · 20 min

Prompts often need to work across different models. A prompt optimized for one model may behave differently on another due to architecture, training data, or tokenization differences.

Model-Specific Prompt Characteristics

Model Type | Characteristics | Prompt Adjustment
Decoder-only (Llama, Mistral) | Follows instructions well, needs clear structure | Explicit formatting helps
Encoder-decoder (T5, Flan) | Task-agnostic, needs task specification | Stronger task prefix
Fine-tuned instruction models | Baked-in behavior patterns | May conflict with prompt instructions

Architecture for Model Portability

# prompts/adapters.py
from abc import ABC, abstractmethod

class PromptAdapter(ABC):
    @abstractmethod
    def format(self, template: str, **kwargs) -> str:
        pass
    
    @abstractmethod
    def get_model_config(self) -> dict:
        pass

class OllamaAdapter(PromptAdapter):
    def format(self, template: str, **kwargs) -> str:
        # Ollama handles system prompts well
        return template.format(**kwargs)
    
    def get_model_config(self) -> dict:
        return {
            "options": {
                "num_ctx": 4096,
                "temperature": 0.7
            }
        }

class VLLMAdapter(PromptAdapter):
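    def format(self, template: str, **kwargs) -> str:
        # A raw prompt sent to vLLM (rather than chat messages) gets no chat
        # template applied, so fill in the template and wrap it manually.
        # These markers follow the Llama 2 chat format; other model families
        # expect different ones.
        prompt = template.format(**kwargs)
        return f"[INST] <<SYS>>\n{prompt}\n<</SYS>>\n\n[/INST]"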
    
    def get_model_config(self) -> dict:
        return {
            "temperature": 0.7,
            "max_tokens": 512
        }
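
The `[INST]` wrapping above only suits Llama 2-style chat models, so a portable adapter needs some way to pick the wrapper per model family. The following is a minimal sketch of that idea, not a definitive implementation: `wrap_llama2`, `wrap_chatml`, `CHAT_WRAPPERS`, `wrap_prompt`, and the family keys are all hypothetical names introduced for illustration. When a tokenizer ships its own chat template, that template should be treated as the reference.

```python
# Hypothetical per-family chat wrappers; names and keys are illustrative only.

def wrap_llama2(system: str, user: str) -> str:
    # Llama 2 chat format: the system prompt is nested inside the first [INST] block.
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"


def wrap_chatml(system: str, user: str) -> str:
    # ChatML format: each turn is delimited by <|im_start|> and <|im_end|>.
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


CHAT_WRAPPERS = {
    "llama2": wrap_llama2,
    "chatml": wrap_chatml,
}


def wrap_prompt(family: str, system: str, user: str) -> str:
    """Render one system + user turn in the format the model family expects."""
    try:
        wrapper = CHAT_WRAPPERS[family]
    except KeyError:
        raise ValueError(f"No chat wrapper registered for family: {family!r}")
    return wrapper(system, user)


print(wrap_prompt("llama2", "Be brief.", "Summarize this ticket."))
```

Keeping the wrappers in a lookup table means adding a model family is a one-line change, and an unknown family fails loudly instead of silently sending an unwrapped prompt.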

Testing Across Models

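# test_cross_model.py
import ollama

from prompts.adapters import OllamaAdapter, VLLMAdapter

# call_vllm and evaluate_response are project helpers assumed to be defined elsewhere.


def test_prompt_portability(prompt, models, test_cases):
    """Run one prompt against each model and record the outcome per model.

    prompt is the prompt text itself, with no unfilled {placeholders}.
    """
    results = {}

    for model in models:
        try:
            if model.startswith("ollama:"):
                adapter = OllamaAdapter()
                reply = ollama.chat(
                    model=model.replace("ollama:", "", 1),
                    messages=[{"role": "user", "content": adapter.format(prompt)}],
                    options=adapter.get_model_config()["options"],
                )
                response = reply["message"]["content"]
            elif model.startswith("vllm:"):
                adapter = VLLMAdapter()
                response = call_vllm(model.replace("vllm:", "", 1), adapter.format(prompt))
            else:
                # Without this branch, response would be undefined or stale
                raise ValueError(f"Unsupported model prefix: {model}")

            results[model] = {
                "success": True,
                "response": response,
                "quality": evaluate_response(response, test_cases)
            }
        except Exception as e:
            results[model] = {
                "success": False,
                "error": str(e)
            }

    return results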
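
Once the results are collected, it helps to reduce them to a short report of which models passed and how far their quality scores diverge. The sketch below assumes the result shape used above and additionally assumes that `evaluate_response` returns a number; `summarize_results` is a hypothetical helper, and the example values are made up purely to show the shape.

```python
def summarize_results(results: dict) -> dict:
    """Split a portability results dict into passing and failing models.

    Assumes each value has "success", plus "quality" (a number) on success
    or "error" (a string) on failure.
    """
    passed = {m: r["quality"] for m, r in results.items() if r["success"]}
    failed = {m: r["error"] for m, r in results.items() if not r["success"]}
    summary = {"passed": passed, "failed": failed, "quality_spread": None}
    if passed:
        # A large spread suggests the prompt is tuned to one model's habits.
        summary["quality_spread"] = max(passed.values()) - min(passed.values())
    return summary


# Made-up example data, only to illustrate the expected input shape.
example = {
    "ollama:llama3": {"success": True, "response": "...", "quality": 0.9},
    "vllm:mistral": {"success": True, "response": "...", "quality": 0.6},
    "ollama:missing": {"success": False, "error": "model not found"},
}
summary = summarize_results(example)
print(summary["failed"])
print(sorted(summary["passed"]))
```

Reporting failures separately from low scores keeps infrastructure errors (a model that is not pulled or not served) from being mistaken for prompt problems.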

Portability Pain Points

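Tokenization differences are a common source of portability issues. Each model family uses its own tokenizer, so the same text produces a different token count on each one, and a prompt that fits in context for one model may overflow for another: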

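# Check token length across models.
# Model names must be Hugging Face model IDs. get_context_window is a project
# helper (assumed to be defined elsewhere) that returns the window in tokens.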
from transformers import AutoTokenizer

def check_token_lengths(prompt, models):
    results = {}
    for model_name in models:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        token_count = len(tokenizer.encode(prompt))
        results[model_name] = {
            "tokens": token_count,
            "fits": token_count < get_context_window(model_name)
        }
    return results
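
A prompt that fits in the context window can still fail in practice, because the window also has to hold the tokens the model generates. The sketch below adds that reservation to the check. `fits_with_headroom` and `remaining_budget` are hypothetical helpers, and the numbers reuse the `num_ctx` of 4096 and `max_tokens` of 512 from the adapter configs earlier in this chapter.

```python
def fits_with_headroom(prompt_tokens: int, context_window: int, max_new_tokens: int) -> bool:
    """Return True if the prompt plus the reserved output budget fits the window."""
    return prompt_tokens + max_new_tokens <= context_window


def remaining_budget(prompt_tokens: int, context_window: int, max_new_tokens: int) -> int:
    """Tokens left for extra context; negative means the prompt must shrink."""
    return context_window - prompt_tokens - max_new_tokens


# A 3700-token prompt is under 4096, but leaves no room for 512 output tokens.
print(fits_with_headroom(3700, 4096, 512))  # False
print(remaining_budget(3700, 4096, 512))    # -116
print(fits_with_headroom(3000, 4096, 512))  # True
```

Running this check with each model's own token count shows how much a prompt has to shrink on the model with the least efficient tokenizer for that text.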

System prompt conflicts occur when instructions in the prompt overlap with or contradict behavior the model already learned during instruction tuning:

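# Conflict resolution strategy: per-model notes on how to adjust the system
# prompt. These are guidance for the prompt author, not the prompts themselves.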
SYSTEM_PROMPTS = {
    "llama3": "Remove redundant instruction - model already follows instructions",
    "mistral": "Add explicit reasoning chain instructions",
    "gpt-4": "Use minimal system prompt - model is strong at following implicit structure"
}
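
One way to act on notes like these is to keep a single shared set of system prompt rules and apply per-model changes in code, so the differences between models stay explicit and reviewable. The following is a minimal sketch under that assumption. `BASE_RULES`, `MODEL_OVERRIDES`, and `build_system_prompt` are hypothetical names, and the specific rules and overrides are illustrative rather than recommendations.

```python
# Shared rules, keyed so that individual rules can be dropped per model.
BASE_RULES = {
    "role": "You are a support assistant.",
    "follow": "Follow the user's instructions exactly.",
    "sources": "Cite the document section you used.",
}

# Illustrative per-model adjustments: rules to drop and extra lines to add.
MODEL_OVERRIDES = {
    "llama3": {"drop": ["follow"], "add": []},
    "mistral": {
        "drop": [],
        "add": ["Explain your reasoning step by step before answering."],
    },
}


def build_system_prompt(model: str) -> str:
    """Compose the system prompt for a model from base rules plus overrides."""
    override = MODEL_OVERRIDES.get(model, {"drop": [], "add": []})
    kept = [text for key, text in BASE_RULES.items() if key not in override["drop"]]
    return "\n".join(kept + override["add"])


print(build_system_prompt("llama3"))
```

Models without an entry in `MODEL_OVERRIDES` receive the base rules unchanged, which gives a sensible default when a new model is added to the test matrix.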

EXERCISE

Take one prompt and test it across three different models (e.g., llama3 via Ollama, a vLLM-served model, and GPT-4). Document what works and what breaks on each, then adjust the prompt to maximize portability.