RUNLOCALAIv38
OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
Fine-Tuning with LoRA and QLoRA

09. Data Formatting

Chapter 9 of 24 · 20 min
KEY INSIGHT

Training format must match inference format; consistent use of special tokens and role markers teaches the model to interpret and generate structured outputs.

Formatting determines how the model interprets training examples. The model learns to produce outputs that match the formatting it observes in training data. Inconsistent or confusing formatting degrades the model's ability to follow instructions correctly.

For instruction-tuning, a common format includes system, user, and assistant message roles with clear delimiters. The model learns that content between specific markers represents different message types and should be handled differently. This segmentation teaches the model conversational structure.
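Here is a concrete illustration of role delimiters. The sketch below wraps each message in ChatML-style markers (`<|im_start|>role ... <|im_end|>`), a layout used by several open model families such as Qwen. ChatML is only an example here, and the function name `render_chatml` is hypothetical. A real model's exact template should be taken from its own tokenizer.

```python
from typing import Dict, List

def render_chatml(messages: List[Dict[str, str]]) -> str:
    """Wrap each message in ChatML-style role delimiters (illustrative sketch)."""
    parts = []
    for msg in messages:
        # <|im_start|>{role}\n ... <|im_end|> marks where one message begins and ends
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    return "".join(parts)

conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "4"},
]
print(render_chatml(conversation))
```

Because every message carries its role inside the delimiter, the model can tell a system instruction apart from user text even when both contain similar wording.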

The chosen format should match the format the model will encounter at inference time. Fine-tuning on one format and prompting with another creates a mismatch that confuses the model. When deploying adapters, ensure the inference code uses identical formatting conventions.
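One way to keep training and inference aligned is to build the training text with the same function that builds the inference prompt. The prompt is then guaranteed to be an exact prefix of every training example. This is a minimal sketch. It assumes a made-up `### Role:` layout and hypothetical helper names `format_prompt` and `format_training_example`:

```python
from typing import Dict, List

def format_prompt(messages: List[Dict[str, str]]) -> str:
    """Render the conversation up to the point where the assistant must reply."""
    text = ""
    for msg in messages:
        text += f"### {msg['role'].capitalize()}:\n{msg['content']}\n\n"
    # The trailing header is the generation prompt: the model continues from here
    return text + "### Assistant:\n"

def format_training_example(messages: List[Dict[str, str]], reply: str) -> str:
    """Training text = the exact inference prompt followed by the target reply."""
    return format_prompt(messages) + reply

history = [{"role": "user", "content": "Name a primary color."}]
train_text = format_training_example(history, "Red.")
infer_text = format_prompt(history)
# At inference the model sees exactly the prefix it was trained on
print(train_text.startswith(infer_text))
```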

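Special tokens play a crucial role in formatting. Markers such as <s>, </s>, [INST], and [/INST] mark boundaries between content types, but they are not all handled the same way. <s> and </s> are typically registered special tokens (beginning- and end-of-sequence) that each encode to a single ID. In Llama 2, by contrast, [INST] and [/INST] are ordinary text that the tokenizer splits into several subword pieces. What matters is that training and inference produce identical token sequences for these markers. Most instruction-tuned models ship with the special tokens their template needs. If you introduce new markers, add them to the tokenizer's vocabulary and resize the model's embeddings before training.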
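Tokenizers typically split out registered special tokens before applying subword tokenization. That is why an unregistered marker ends up as several ordinary pieces. The toy splitter below is a hypothetical helper, not any library's API. It mimics only that first step: registered markers come out as single units, and everything else stays as plain text. In Hugging Face Transformers, new markers are registered with `tokenizer.add_special_tokens(...)`, followed by `model.resize_token_embeddings(len(tokenizer))`.

```python
import re
from typing import List

def split_special(text: str, special_tokens: List[str]) -> List[str]:
    """Toy pre-split: each registered special token becomes one atomic piece."""
    if not special_tokens:
        return [text] if text else []
    # Try longer markers first so a long marker is never cut by a shorter one
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(tok) for tok in ordered) + ")"
    # The capture group keeps matched markers in the output; drop empty strings
    return [piece for piece in re.split(pattern, text) if piece]

# [INST] registered: it survives as a single unit
print(split_special("<s>[INST] Hi [/INST]", ["<s>", "</s>", "[INST]", "[/INST]"]))
# [INST] not registered: it stays inside ordinary text and would be sub-tokenized
print(split_special("<s>[INST] Hi [/INST]", ["<s>", "</s>"]))
```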

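Conversation formats vary across model families. Llama 2 chat models wrap each user turn in [INST] and [/INST] markers and place the system prompt in a <<SYS>> block inside the first turn. Llama 3 switched to header tokens such as <|start_header_id|> and <|eot_id|>. Early Mistral Instruct releases use [INST] markers with slightly different spacing and no dedicated system block. Vicuna and related models use plain-text USER: and ASSISTANT: prefixes. Training on the wrong format for a given model produces poor results.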
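To make the differences visible, the sketch below renders the same exchange with simplified, single-exchange versions of the three layouts. These strings are approximations written for illustration, not official specifications. The authoritative template for any model is the `chat_template` shipped in its `tokenizer_config.json`. Folding the system prompt into the Mistral user turn is a common workaround, assumed here. `TEMPLATES` and `render` are hypothetical names.

```python
# Simplified one-exchange templates (approximations, not official specs)
TEMPLATES = {
    # Llama 2 chat: system prompt nested inside the first [INST] block
    "llama2": "<s>[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST] {assistant} </s>",
    # Early Mistral Instruct: no system slot, so it is folded into the user turn here
    "mistral": "<s>[INST] {system}\n\n{user} [/INST] {assistant}</s>",
    # Vicuna v1.1 style: plain-text role prefixes after the system prompt
    "vicuna": "{system} USER: {user} ASSISTANT: {assistant}</s>",
}

def render(family: str, system: str, user: str, assistant: str) -> str:
    """Render one system/user/assistant exchange in the given family's layout."""
    return TEMPLATES[family].format(system=system, user=user, assistant=assistant)

for family in TEMPLATES:
    print(f"{family}: {render(family, 'Be concise.', 'Capital of France?', 'Paris.')!r}")
```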

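Handling multi-turn conversations requires deciding how much context to include. You can include the full conversation history (more memory, better context) or only the current turn (less memory, less context). Most fine-tuning pipelines enforce a maximum sequence length. Note that default tokenizer truncation usually cuts from the end of the sequence, which can remove the final response. Dropping older turns before tokenizing is therefore the safer way to stay within the limit.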
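This is a minimal sketch of history truncation. It assumes messages are dicts with `role` and `content` keys, alternating user/assistant after an optional system message. It drops the oldest user/assistant pairs first, so the system prompt and the most recent exchange survive. `truncate_history` is a hypothetical name, and the whitespace word count stands in for a real tokenizer's token count.

```python
from typing import Callable, Dict, List

def truncate_history(
    messages: List[Dict[str, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[Dict[str, str]]:
    """Drop the oldest user/assistant pairs until the conversation fits."""
    system = messages[:1] if messages and messages[0]["role"] == "system" else []
    turns = messages[len(system):]

    def total(msgs: List[Dict[str, str]]) -> int:
        return sum(count_tokens(m["content"]) for m in msgs)

    # Always keep the system prompt and the final exchange; anything still
    # too long is left for the tokenizer's own truncation to handle
    while len(turns) > 2 and total(system + turns) > max_tokens:
        turns = turns[2:]  # remove one user/assistant pair from the front
    return system + turns

def word_count(text: str) -> int:
    # Stand-in for a real tokenizer's token count
    return len(text.split())

chat = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four five"},
    {"role": "user", "content": "six"},
    {"role": "assistant", "content": "seven"},
]
print(truncate_history(chat, max_tokens=5, count_tokens=word_count))
```

Dropping whole pairs keeps the role alternation intact. Some chat templates, Llama 2's among them, enforce that alternation.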

EXERCISE

Implement a formatter that converts raw conversation data into tokenized sequences for a specific model family. Verify the output matches expected special token placement.

# data_formatter.py
from typing import List, Dict, Optional

class ChatFormatter:
    """Format conversations for instruction-tuning."""
    
    def __init__(
        self,
        tokenizer,
        # Alpaca-style prompt template; {response} holds the training target
        prompt_template: str = "Below is an instruction that describes a task, paired with an input that provides further context.\n\n### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{response}",
        system_message: str = "You are a helpful assistant."
    ):
        self.tokenizer = tokenizer
        self.prompt_template = prompt_template
        self.system_message = system_message
    
    def format_single_turn(
        self,
        instruction: str,
        input_text: str,
        response: str
    ) -> Dict[str, str]:
        """Format a single instruction-input-response example."""
        formatted = self.prompt_template.format(
            instruction=instruction,
            input=input_text or "N/A",  # placeholder when the example has no input
            response=response
        )
        return {"text": formatted}
    
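    def format_conversation(
        self,
        messages: List[Dict[str, str]]
    ) -> str:
        """
        Format a multi-turn conversation in the Llama 2 chat style:
        <s>[INST] <<SYS>> {system} <</SYS>> {user} [/INST] {assistant} </s><s>[INST] {user} [/INST]
        (newlines inside the <<SYS>> block omitted here). Falls back to
        self.system_message when no system message is given. A conversation
        ending in a user turn already ends with "[/INST]", which is the
        generation prompt for the assistant's reply.
        """
        system: Optional[str] = self.system_message
        if messages and messages[0].get("role") == "system":
            system = messages[0]["content"]
            messages = messages[1:]
        
        result = ""
        for msg in messages:
            role = msg.get("role", "user")
            content = msg["content"].strip()
            
            if role == "user":
                if system:
                    # Llama 2 puts the system prompt inside the first [INST] block
                    content = f"<<SYS>>\n{system}\n<</SYS>>\n\n{content}"
                    system = None
                result += f"<s>[INST] {content} [/INST]"
            elif role == "assistant":
                # Assistant reply follows [/INST]; </s> closes the exchange
                result += f" {content} </s>"
        
        return result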
    
    def tokenize_for_training(
        self,
        example: Dict[str, str],
        max_length: int = 2048
    ) -> Dict[str, List[int]]:
        """
        Tokenize an Alpaca-style example for training.
        Returns input_ids, attention_mask, and labels; prompt and padding
        positions get label -100 so the loss covers only the response.
        """
        text = example["text"]
        response_marker = "### Response:\n"
        response_start = text.find(response_marker)
        
        if response_start == -1:
            # No response marker: treat everything as prompt (fully masked)
            prompt, response = text, ""
        else:
            split = response_start + len(response_marker)
            prompt, response = text[:split], text[split:]
        
        # Tokenizing prompt and response separately gives an exact boundary
        # instead of estimating it from a re-encoded prefix
        prompt_ids = self.tokenizer(prompt)["input_ids"]  # includes BOS if the tokenizer adds one
        response_ids = self.tokenizer(response, add_special_tokens=False)["input_ids"]
        if response:
            # Append EOS so the model learns where a response ends
            response_ids = response_ids + [self.tokenizer.eos_token_id]
        
        input_ids = (prompt_ids + response_ids)[:max_length]
        labels = ([-100] * len(prompt_ids) + response_ids)[:max_length]
        attention_mask = [1] * len(input_ids)
        
        # Right-pad to max_length; padded positions are ignored by attention and loss
        pad_id = self.tokenizer.pad_token_id
        if pad_id is None:  # e.g. Llama tokenizers define no pad token
            pad_id = self.tokenizer.eos_token_id
        num_pad = max_length - len(input_ids)
        input_ids += [pad_id] * num_pad
        attention_mask += [0] * num_pad
        labels += [-100] * num_pad
        
        return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

# Verify formatting
def verify_formatting(formatter: ChatFormatter, tokenizer):
    """Verify special tokens are handled correctly."""
    example_messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."}
    ]
    
    formatted = formatter.format_conversation(example_messages)
    print("Formatted conversation:")
    print(formatted)
    print()
    
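    # add_special_tokens=False: any <s> in the string must come from the
    # formatter, not from the tokenizer prepending its own BOS
    tokens = tokenizer.encode(formatted, add_special_tokens=False)
    
    # Each literal <s> / </s> should encode to exactly one BOS / EOS id;
    # a mismatch means the markers were split into ordinary text pieces
    print("BOS count matches:", tokens.count(tokenizer.bos_token_id) == formatted.count("<s>"))
    print("EOS count matches:", tokens.count(tokenizer.eos_token_id) == formatted.count("</s>"))
    
    # Decoding can change whitespace around special tokens, so ignore whitespace
    decoded = tokenizer.decode(tokens)
    print("Re-decoded matches original:", "".join(formatted.split()) == "".join(decoded.split()))

# Example usage (requires access to a Llama 2 chat tokenizer):
# from transformers import AutoTokenizer
# tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
# verify_formatting(ChatFormatter(tok), tok)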
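You can check the intended label layout without downloading a tokenizer. The sketch below builds `input_ids`, `attention_mask`, and `labels` from made-up integer token IDs. Prompt positions and padding get -100 (the default `ignore_index` of PyTorch's cross-entropy loss), and an EOS id is appended to the response so the model learns to stop. `build_labels` and all the IDs are hypothetical.

```python
from typing import Dict, List

def build_labels(
    prompt_ids: List[int],
    response_ids: List[int],
    eos_id: int,
    pad_id: int,
    max_length: int,
) -> Dict[str, List[int]]:
    """Mask prompt and padding with -100 so only response tokens (plus EOS) are learned."""
    input_ids = (prompt_ids + response_ids + [eos_id])[:max_length]
    labels = ([-100] * len(prompt_ids) + response_ids + [eos_id])[:max_length]
    attention_mask = [1] * len(input_ids)
    num_pad = max_length - len(input_ids)  # zero when the sequence was truncated
    return {
        "input_ids": input_ids + [pad_id] * num_pad,
        "attention_mask": attention_mask + [0] * num_pad,
        "labels": labels + [-100] * num_pad,
    }

out = build_labels([1, 10, 11], [20, 21], eos_id=2, pad_id=0, max_length=8)
print(out["input_ids"])  # [1, 10, 11, 20, 21, 2, 0, 0]
print(out["labels"])     # [-100, -100, -100, 20, 21, 2, -100, -100]
```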