10. Chat Template

Chapter 10 of 24 · 15 min

Chat templates provide a programmatic way to apply consistent formatting across datasets and models. Rather than manually constructing format strings, chat templates use a declarative specification that handles edge cases and model-specific requirements automatically.

The Hugging Face transformers library introduced chat templates as a standardized interface for conversation formatting. Each model can define a template specifying how roles, messages, and special tokens combine. The same API works across different models, reducing format-related bugs.

Templates use Jinja2 syntax for logic and formatting. This allows conditional inclusion of elements, looping over messages, and applying transformations. A basic template might simply concatenate messages, while a complex template handles system prompts, multi-modal content, and tool use definitions.
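To make the looping and conditional logic concrete, here is a minimal sketch that renders a small ChatML-style template with the jinja2 package directly. The template string and the render_chat helper are illustrative assumptions, not any specific model's template. transformers renders templates in its own Jinja environment, which may treat whitespace differently, so take this as an approximation of the mechanics rather than of the library's exact output.

```python
from jinja2 import Template

# Illustrative ChatML-style template (an assumption for this sketch,
# not copied from any specific model).
CHAT_TEMPLATE = (
    "{% for message in messages %}"
    "<|im_start|>{{ message['role'] }}\n{{ message['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

def render_chat(messages, add_generation_prompt=False):
    """Render a list of role/content dicts with the illustrative template."""
    return Template(CHAT_TEMPLATE).render(
        messages=messages, add_generation_prompt=add_generation_prompt
    )

messages = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]

# The loop emits one block per message; the conditional appends the
# assistant header only when a generation prompt is requested.
print(render_chat(messages, add_generation_prompt=True))
```

The for loop handles any number of turns, and the if block is what separates an inference prompt (ends with the assistant header) from a complete training example (ends with the last message).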

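The apply_chat_template method on tokenizers handles the actual formatting. It accepts a list of message dictionaries, each with a role and a content key. By default it returns token IDs; pass tokenize=False to get the formatted string instead, which is easier to inspect. When it tokenizes, the method also accepts the usual truncation and padding arguments. Because the template typically inserts special tokens itself, tokenize a formatted string with add_special_tokens=False to avoid duplicating them.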

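For fine-tuning, the template must be applied consistently during training data preparation. The formatted strings are then tokenized, and labels are set so that loss is computed only on the assistant response tokens; prompt tokens are typically masked with -100. Any format mismatch between training and inference, even a missing space or newline around a role marker, changes the token sequence the model sees and can degrade output quality.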
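The label computation can be sketched without any model. The example below assumes the prompt token IDs form an exact prefix of the full sequence's token IDs, which is worth verifying for your template and tokenizer. The mask_prompt_labels helper and the toy token IDs are invented for illustration; -100 is the index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # default ignore_index of PyTorch's cross-entropy loss

def mask_prompt_labels(prompt_ids, full_ids):
    """Return labels that keep only the response tokens of full_ids."""
    n_prompt = len(prompt_ids)
    # Refuse to guess if the prefix assumption does not hold.
    if full_ids[:n_prompt] != prompt_ids:
        raise ValueError("prompt tokens are not a prefix of the full sequence")
    return [IGNORE_INDEX] * n_prompt + full_ids[n_prompt:]

# Toy token IDs, invented for illustration.
prompt_ids = [1, 15, 27, 42]         # formatted system and user turns
full_ids = prompt_ids + [88, 91, 2]  # same prefix plus the assistant response

labels = mask_prompt_labels(prompt_ids, full_ids)
print(labels)  # [-100, -100, -100, -100, 88, 91, 2]
```

Raising on a prefix mismatch is a deliberate choice here: silently masking the wrong positions would train the model on prompt tokens or skip part of the response.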

Template evolution is a challenge: newer model versions may introduce different formatting conventions. When fine-tuning, use the template version that your target inference framework will apply. A mismatch between the training template and the inference template can cause symptoms that are hard to diagnose, such as responses that fail to terminate or role markers appearing in the output.
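One lightweight way to catch template drift is to record a fingerprint of the template string at training time and compare it before serving. This is a sketch of that idea, not an established library feature: template_fingerprint and templates_match are hypothetical helpers, and the two template strings are made up. In practice the strings would come from the chat_template attribute of the training and inference tokenizers.

```python
import hashlib

def template_fingerprint(template: str) -> str:
    """Short, stable hash of a chat template string."""
    return hashlib.sha256(template.encode("utf-8")).hexdigest()[:12]

def templates_match(training_template: str, inference_template: str) -> bool:
    """True when both templates hash to the same fingerprint."""
    return template_fingerprint(training_template) == template_fingerprint(
        inference_template
    )

# Made-up templates that differ only in how the role is bracketed.
train_template = "{% for m in messages %}[{{ m['role'] }}] {{ m['content'] }}\n{% endfor %}"
infer_template = "{% for m in messages %}<{{ m['role'] }}> {{ m['content'] }}\n{% endfor %}"

print(templates_match(train_template, train_template))  # True
print(templates_match(train_template, infer_template))  # False
```

Storing the fingerprint alongside the fine-tuned checkpoint gives a cheap check to run when loading the model in a different framework.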

EXERCISE

Load a tokenizer with a chat template and apply it to a test conversation. Examine the resulting format and verify special token insertion.

# chat_template_demo.py
from transformers import AutoTokenizer

def load_and_inspect_chat_template(model_name: str):
    """Load tokenizer and inspect its chat template."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    print(f"Model: {model_name}")
    print(f"Tokenizer class: {type(tokenizer).__name__}")
    print()
    
    # The attribute can exist but be None, so check its value rather than its presence
    if getattr(tokenizer, "chat_template", None):
        print("Chat template found:")
        print("-" * 40)
        print(tokenizer.chat_template)
        print("-" * 40)
    else:
        print("No chat template defined")
        return None
    
    return tokenizer

def test_template_formatting(tokenizer, test_conversation: list):
    """Test chat template with a sample conversation."""
    print("\nInput messages:")
    for msg in test_conversation:
        print(f"  {msg['role']}: {msg['content'][:50]}...")
    
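    # Apply template. Request a generation prompt (the header that cues the
    # model to answer) only when the conversation does not already end with
    # an assistant message.
    formatted = tokenizer.apply_chat_template(
        test_conversation,
        tokenize=False,
        add_generation_prompt=test_conversation[-1]["role"] != "assistant"
    )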
    
    print("\nFormatted output:")
    print(formatted)
    print()
    
    # Tokenize and check
    tokens = tokenizer.encode(formatted, add_special_tokens=False)
    print(f"Total tokens: {len(tokens)}")
    
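    # Show first/last few tokens to check special token insertion
    print(f"First 10 tokens: {tokens[:10]}")
    print(f"First 10 decoded: {tokenizer.decode(tokens[:10])}")
    print(f"Last 10 tokens: {tokens[-10:]}")
    print(f"Last 10 decoded: {tokenizer.decode(tokens[-10:])}")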
    
    return formatted

def create_training_labels(tokenizer, conversation: list) -> dict:
    """
    Create training labels from a conversation.
    Assumes the last message is the assistant response to train on.
    """
    # Format the prompt alone (everything before the final assistant message)
    # and the full conversation
    prompt_text = tokenizer.apply_chat_template(
        conversation[:-1], tokenize=False, add_generation_prompt=True
    )
    full_text = tokenizer.apply_chat_template(conversation, tokenize=False)

    # The template already inserts special tokens, so do not add them again
    prompt_ids = tokenizer.encode(prompt_text, add_special_tokens=False)
    full_ids = tokenizer.encode(full_text, add_special_tokens=False)

    # Mask the prompt so loss is computed only on the assistant response.
    # This assumes the prompt tokens are a prefix of the full token sequence,
    # which is model-specific; verify it for your template before training.
    labels = list(full_ids)
    n_prompt = min(len(prompt_ids), len(labels))
    labels[:n_prompt] = [-100] * n_prompt

    return {
        "input_ids": full_ids,
        "labels": labels
    }

# Example usage
test_conv = [
    {"role": "system", "content": "You are a helpful coding assistant."},
    {"role": "user", "content": "Write a Python function to calculate fibonacci numbers."},
    {"role": "assistant", "content": "def fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)"}
]

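# Note: Replace with any model whose tokenizer defines a chat template.
# Llama 2 is a gated repository, so request access on the Hub and log in first.
# tokenizer = load_and_inspect_chat_template("meta-llama/Llama-2-7b-chat-hf")
# if tokenizer is not None:
#     test_template_formatting(tokenizer, test_conv)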