02. Fine-Tuning vs RAG

Chapter 2 of 24 · 15 min

Retrieval-Augmented Generation and fine-tuning represent complementary approaches to improving model outputs, each addressing different failure modes. RAG augments inference by retrieving relevant documents from an external corpus and including them in the context window. Fine-tuning modifies model weights to change the underlying behavior.

RAG excels when the model needs access to information that changes frequently, exceeds context window capacity, or requires verification against authoritative sources. A legal research assistant benefits from RAG because case law updates continuously and the system must cite specific precedents. RAG also provides traceability: users can verify exactly which documents influenced the response.
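The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not a production retriever: the corpus, the document IDs, and the function names `retrieve` and `build_prompt` are hypothetical, and a bag-of-words cosine similarity stands in for the embedding model and vector database a real system would likely use. Each passage keeps its document ID in the prompt, which is what makes the answer traceable.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the IDs and texts are invented for illustration.
CORPUS = {
    "case-001": "court ruled the contract void due to misrepresentation",
    "case-002": "appeal dismissed because the statute of limitations had expired",
    "case-003": "tenant awarded damages after landlord breached the lease contract",
}


def _vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k (doc_id, text) pairs most similar to the query."""
    q = _vectorize(query)
    ranked = sorted(
        CORPUS.items(),
        key=lambda item: _cosine(q, _vectorize(item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Place the retrieved passages, tagged with their IDs, in the context."""
    context = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query, k))
    return (
        "Answer using only the sources below and cite their IDs.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}"
    )


print(build_prompt("statute of limitations expired"))
```

Updating the system's knowledge here means editing `CORPUS`; no model weights change.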

Fine-tuning excels when the model needs to learn patterns, formats, or behaviors that cannot be easily specified in prompts. A customer service chatbot might require fine-tuning to adopt a consistent brand voice, respond to objections fluently, or follow company-specific escalation protocols. These behaviors involve learned patterns rather than retrieved facts.

Combining both approaches often yields the best results. A support system might use fine-tuning to handle conversation flow and tone while using RAG to retrieve current product documentation and policy details. This architecture separates concerns: fine-tuning handles "how to respond" while RAG handles "what information to include."
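One way to picture this separation of concerns is a small pipeline in which retrieval and generation are independent, swappable components. The sketch below is an assumption-laden illustration: `answer`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the stubs merely stand in for a real retriever and a fine-tuned model.

```python
from typing import Callable


def answer(
    query: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str], str],
) -> str:
    """Combine retrieved facts ("what to include") with a model that
    controls tone and flow ("how to respond")."""
    passages = retrieve(query)
    references = "\n".join(f"- {p}" for p in passages)
    prompt = f"Reference material:\n{references}\n\nCustomer: {query}\nAgent:"
    return generate(prompt)


def fake_retrieve(query: str) -> list[str]:
    """Stub retriever: a real system would query current product docs."""
    return ["Refunds are accepted within 30 days of purchase."]


def fake_generate(prompt: str) -> str:
    """Stub for a fine-tuned model: applies a fixed 'brand voice' greeting
    and repeats the reference lines it was given."""
    facts = [line[2:] for line in prompt.splitlines() if line.startswith("- ")]
    return "Happy to help! " + " ".join(facts)


print(answer("Can I return my order?", fake_retrieve, fake_generate))
```

Because the two components only meet at the prompt, the documentation can be refreshed without retraining, and the model can be retuned without touching the corpus.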

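Memory and compute requirements differ substantially. RAG typically requires maintaining a vector database and retrieval infrastructure, and each request pays for a retrieval step and a longer prompt, but it uses no additional training compute. Fine-tuning requires training compute that grows with both model size and training set size, but adds no inference-time infrastructure overhead beyond storing the tuned weights, or only the adapter weights when a parameter-efficient method is used.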
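A back-of-the-envelope calculation makes the storage side of this trade-off concrete. The figures below are hypothetical inputs, not measurements. The index estimate assumes an uncompressed flat index of float32 vectors, and the adapter estimate assumes low-rank (LoRA-style) adapters, which is one common kind of adapter and not necessarily the one a given system uses.

```python
def flat_index_bytes(num_vectors: int, dim: int, bytes_per_value: int = 4) -> int:
    """Raw storage for an uncompressed vector index (float32 by default).
    Ignores metadata, the original text, and any index structure overhead."""
    return num_vectors * dim * bytes_per_value


def lora_adapter_params(d_in: int, d_out: int, rank: int, num_matrices: int) -> int:
    """Parameters added by rank-`rank` adapters on `num_matrices` weight
    matrices of shape (d_out, d_in): each adds rank * (d_in + d_out)."""
    return rank * (d_in + d_out) * num_matrices


# Hypothetical scenario: one million chunks embedded in 768 dimensions.
index_bytes = flat_index_bytes(num_vectors=1_000_000, dim=768)
print(f"vector index: {index_bytes / 1e9:.2f} GB")

# Hypothetical scenario: rank-8 adapters on 64 matrices of size 4096 x 4096,
# stored at 2 bytes per parameter.
adapter_bytes = lora_adapter_params(4096, 4096, rank=8, num_matrices=64) * 2
print(f"adapter weights: {adapter_bytes / 2**20:.0f} MiB")
```

Under these assumptions the index is a few gigabytes that must stay online for every request, while the adapter is a few megabytes that ships with the model.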

EXERCISE

Analyze an existing application and categorize each capability as better suited for RAG, fine-tuning, or both. Document the reasoning for each classification.

# capability_classifier.py
from enum import Enum

class CapabilityType(Enum):
    FACT_RETRIEVAL = "retrieval"
    PATTERN_LEARNING = "finetuning"
    HYBRID = "both"

def classify_capability(description: str) -> CapabilityType:
    """Classify a capability by counting indicator phrases in its description."""
    retrieval_indicators = ["up-to-date", "specific document", "citation needed"]
    finetuning_indicators = ["consistent format", "style", "tone", "protocol"]

    # Lowercase once so matching is case-insensitive.
    # Matching is by substring, so "tone" also matches inside "milestone".
    text = description.lower()
    retrieval_score = sum(1 for ind in retrieval_indicators if ind in text)
    finetuning_score = sum(1 for ind in finetuning_indicators if ind in text)

    if retrieval_score > finetuning_score:
        return CapabilityType.FACT_RETRIEVAL
    if finetuning_score > retrieval_score:
        return CapabilityType.PATTERN_LEARNING
    # A tie, including the case where no indicator matches, is treated as HYBRID.
    return CapabilityType.HYBRID