Cost-Per-Token Optimization — Advanced Prompt Engineering (Chapter 16)

Every token has a cost. Trimming token usage lowers spend without degrading output quality, and a more concise prompt often improves it.

Token Cost Breakdown

Model	Input Cost/1M tokens	Output Cost/1M tokens	Notes
GPT-4o	$5.00	$15.00	Higher quality, higher cost
GPT-3.5-turbo	$0.50	$1.50	Lower cost, acceptable quality
Llama 3 70B (Ollama)	~$0.00	~$0.00	No per-token fee; you pay self-hosting costs (GPU + electricity)
Mistral 7B (Ollama)	~$0.00	~$0.00	No per-token fee; lower resource requirements
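
To make the table concrete, the arithmetic for a single request can be sketched as follows. This is a minimal illustration that assumes the hosted-model prices listed above are still current; `PRICE_PER_MILLION` and `request_cost` are hypothetical names introduced here, not part of any library.

```python
# Prices in USD per 1M tokens, copied from the table above (verify before use).
PRICE_PER_MILLION = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed per-1M-token prices."""
    price = PRICE_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Example: a request with 1,000 input tokens and 500 output tokens.
gpt4o_cost = request_cost("gpt-4o", 1_000, 500)
gpt35_cost = request_cost("gpt-3.5-turbo", 1_000, 500)
print(f"gpt-4o: ${gpt4o_cost:.5f}  gpt-3.5-turbo: ${gpt35_cost:.5f}")
```

Because output tokens are priced higher than input tokens in both rows, limiting response length can matter as much as shortening the prompt.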

Prompt Compression Techniques

Remove redundancy while preserving meaning:

# Before: 287 tokens
"""
You are an expert data analyst working for a Fortune 500 company.
Your role is to analyze datasets and provide insights. You have
access to tools that can help you process data. When given a
dataset, first explore its structure, then identify key patterns,
and finally present your findings in a clear format.

Dataset: {dataset}
"""

# After: 89 tokens (69% reduction)
"""
Analyze this dataset and report key patterns: {dataset}
"""

# Preserved: task definition, input/output specification
# Dropped: persona, tool mention, step-by-step procedure
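
The savings from a compression like this can be estimated with simple arithmetic. The sketch below uses the token counts quoted in the comments above; actual counts depend on the tokenizer, so measure them rather than rely on these figures. `reduction_percent` and `savings_usd` are hypothetical helper names, and the $5.00 per 1M input tokens price is taken from the cost table.

```python
def reduction_percent(before_tokens, after_tokens):
    """Percentage of tokens removed by compressing a prompt."""
    if before_tokens <= 0:
        raise ValueError("before_tokens must be positive")
    return (before_tokens - after_tokens) / before_tokens * 100


def savings_usd(before_tokens, after_tokens, requests, input_price_per_million):
    """Input-cost savings in USD across a number of requests."""
    saved_tokens = (before_tokens - after_tokens) * requests
    return saved_tokens * input_price_per_million / 1_000_000


# Token counts quoted in the example above (assumed, not re-measured).
percent = reduction_percent(287, 89)
saved = savings_usd(287, 89, requests=1_000_000, input_price_per_million=5.00)
print(f"{percent:.0f}% fewer input tokens, ${saved:.2f} saved per 1M requests")
```

A per-request saving looks negligible on its own; it becomes meaningful only when multiplied by request volume, which is why the second helper takes a request count.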

Structure Optimization

Consolidate redundant messages so the same instructions take fewer tokens:

# Verbose structure
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Answer questions accurately."},
    {"role": "system", "content": "Be concise in your responses."},
    {"role": "user", "content": "What is Python?"}
]

# Optimized structure
messages = [
    {"role": "system", "content": "Helpful assistant. Answer accurately, be concise."},
    {"role": "user", "content": "What is Python?"}
]
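
The merging step can be automated; the rewording into a terser instruction remains a manual edit. The following is a minimal sketch, assuming it is acceptable to move every system message to the front of the conversation (which may not hold if system messages are deliberately interleaved with other turns). `merge_system_messages` is a hypothetical helper, not part of any SDK.

```python
def merge_system_messages(messages):
    """Collapse all system messages into a single leading system message."""
    system_parts = [m["content"].strip() for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return others
    # Join the original instructions with spaces; wording is left unchanged.
    merged = {"role": "system", "content": " ".join(system_parts)}
    return [merged] + others


verbose = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "system", "content": "Answer questions accurately."},
    {"role": "system", "content": "Be concise in your responses."},
    {"role": "user", "content": "What is Python?"},
]
compact = merge_system_messages(verbose)
print(len(verbose), "messages ->", len(compact), "messages")
```

Merging alone removes only the per-message formatting overhead; most of the saving in the optimized structure above comes from the shorter wording.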

Dynamic Few-Shot Selection

Include only the examples most relevant to the current query:

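def select_few_shot_examples(query, example_bank, max_examples=2):
    """Choose the examples most similar to the query to minimize token use.

    Assumes embed() and cosine_similarity() are provided elsewhere,
    for example by your embedding library.
    """
    query_embedding = embed(query)

    scored_examples = []
    for example in example_bank:
        # For large example banks, precompute and cache these embeddings.
        example_embedding = embed(example["input"])
        similarity = cosine_similarity(query_embedding, example_embedding)
        scored_examples.append((similarity, example))

    # Sort by score only: sorting the raw tuples would compare the example
    # dicts on ties and raise TypeError. Keep the top-k, not all.
    scored_examples.sort(key=lambda pair: pair[0], reverse=True)
    return [example for _, example in scored_examples[:max_examples]]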
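
For experimenting without an embedding service, the selection logic can be run end to end with stand-in helpers. In the sketch below, `embed` is a toy bag-of-words counter and `cosine_similarity` operates on those counts; both are assumptions made for illustration and are much cruder than a real embedding model. The example bank is invented for the demo.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_few_shot_examples(query, example_bank, max_examples=2):
    """Return the max_examples entries whose input is most similar to the query."""
    query_embedding = embed(query)
    scored = [
        (cosine_similarity(query_embedding, embed(ex["input"])), ex)
        for ex in example_bank
    ]
    # Sort on the score only, so ties never fall back to comparing dicts.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ex for _, ex in scored[:max_examples]]


# Hypothetical example bank for the demo.
example_bank = [
    {"input": "convert celsius to fahrenheit", "output": "F = C * 9/5 + 32"},
    {"input": "sort a list in python", "output": "sorted(items)"},
    {"input": "reverse a list in python", "output": "items[::-1]"},
]
selected = select_few_shot_examples("how do I sort a python list", example_bank)
for ex in selected:
    print(ex["input"])
```

Only the two list-related examples are selected here, so the unrelated temperature example never reaches the prompt and never costs tokens.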

Cost Monitoring

Track cost per output to identify optimization opportunities:

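import tiktoken

# Per-token prices in USD, derived from the per-1M-token table above
# (rough estimates; verify current pricing).
PRICING = {
    "gpt-4o": {"input": 5.00 / 1_000_000, "output": 15.00 / 1_000_000},
    "gpt-3.5-turbo": {"input": 0.50 / 1_000_000, "output": 1.50 / 1_000_000},
}

def estimate_cost(prompt, model="gpt-4o", expected_output_tokens=0):
    """Estimate the USD cost of a request from its prompt and expected output length."""
    # encoding_for_model needs a tiktoken release that recognizes the model name.
    encoder = tiktoken.encoding_for_model(model)
    input_tokens = len(encoder.encode(prompt))

    rates = PRICING[model]
    # Output tokens are priced separately, so include them when an estimate is available.
    return input_tokens * rates["input"] + expected_output_tokens * rates["output"]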
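
Estimating a single prompt is only a starting point; monitoring means accumulating actual usage over many requests. The sketch below assumes you can obtain input and output token counts for each call, for instance from the usage data most hosted APIs return with a response. `CostTracker` is a hypothetical class introduced for illustration, and the prices are the GPT-4o figures from the cost table.

```python
from dataclasses import dataclass, field


@dataclass
class CostTracker:
    """Accumulates token usage and cost across requests."""

    input_price: float   # USD per input token
    output_price: float  # USD per output token
    calls: list = field(default_factory=list)

    def record(self, input_tokens, output_tokens):
        """Store one request's usage and return its cost in USD."""
        cost = input_tokens * self.input_price + output_tokens * self.output_price
        self.calls.append((input_tokens, output_tokens, cost))
        return cost

    def total_cost(self):
        return sum(cost for _, _, cost in self.calls)

    def average_cost(self):
        """Mean cost per request, or 0.0 if nothing has been recorded."""
        return self.total_cost() / len(self.calls) if self.calls else 0.0


# Assumed prices: $5.00 and $15.00 per 1M tokens (verify current pricing).
tracker = CostTracker(input_price=5.00 / 1_000_000, output_price=15.00 / 1_000_000)
tracker.record(input_tokens=1_000, output_tokens=500)
tracker.record(input_tokens=200, output_tokens=100)
print(f"total ${tracker.total_cost():.4f}, average ${tracker.average_cost():.4f}")
```

Comparing the average cost per request before and after a prompt change is a simple way to check whether an optimization actually paid off.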