Fixed-Size vs Semantic Tradeoffs — Advanced RAG — Chunking, Retrieval, Re-ranking (Chapter 3)

The two chunking strategies have distinct performance profiles that suit different use cases.

Fixed-size chunking offers predictable memory usage, uniform embedding computation, and simplicity. Implementation requires only character/word counting and slice operations. However, this approach frequently splits sentences mid-thought and separates related concepts across chunk boundaries.
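The slice-based approach described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the function name `fixed_size_chunk` and the `chunk_size` parameter are assumptions made for this example, and sizes are counted in characters.

```python
from typing import List


def fixed_size_chunk(text: str, chunk_size: int = 400) -> List[str]:
    """Slice text into consecutive chunks of at most chunk_size characters.

    Hypothetical helper for illustration; it ignores sentence and
    paragraph boundaries entirely.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in fixed strides and slice each window.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


if __name__ == "__main__":
    sample = "Chunking splits documents. Retrieval finds the relevant pieces."
    for chunk in fixed_size_chunk(sample, chunk_size=20):
        print(repr(chunk))
```

With a 20-character window, the first chunk of the sample ends partway through the word "documents", which is the mid-thought split the paragraph above warns about.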

Semantic chunking preserves document structure but makes boundary detection more complex and produces chunks of variable size. Because boundary detection runs on every document, its overhead becomes significant when processing millions of documents.
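To make the boundary-detection step concrete, here is a deliberately simple sketch. It assumes paragraph breaks and sentence-ending punctuation are acceptable stand-ins for semantic boundaries; production systems may rely on a proper sentence segmenter or on embedding similarity instead. The name `semantic_chunk` and the `max_chars` parameter are hypothetical.

```python
import re
from typing import List


def semantic_chunk(document: str, max_chars: int = 400) -> List[str]:
    """Group whole sentences into chunks without crossing paragraph breaks.

    Illustrative only: boundaries come from regexes, not from a model.
    A single sentence longer than max_chars is kept whole.
    """
    chunks: List[str] = []
    # Blank lines are treated as hard boundaries between paragraphs.
    for paragraph in re.split(r"\n\s*\n", document):
        paragraph = " ".join(paragraph.split())  # collapse internal whitespace
        if not paragraph:
            continue
        # Naive sentence split: break after '.', '!' or '?' followed by space.
        sentences = re.split(r"(?<=[.!?])\s+", paragraph)
        current = ""
        for sentence in sentences:
            candidate = f"{current} {sentence}".strip()
            if current and len(candidate) > max_chars:
                # Adding this sentence would exceed the limit: close the chunk.
                chunks.append(current)
                current = sentence
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Chunk lengths here depend on where sentences and paragraphs happen to end, so the output sizes vary from chunk to chunk, and every document pays for the regex passes.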

Factor	Fixed-Size	Semantic
Implementation complexity	Low	Medium-High
Chunk size variance	Low	High
Context preservation	Poor	Good
Index size predictability	High	Medium
Retrieval on fragmented docs	Degraded	Stable
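The "Chunk size variance" row can be checked empirically on your own corpus. The sketch below assumes chunk length in characters is the quantity of interest; the helper name `chunk_size_stats` is hypothetical.

```python
import statistics
from typing import Dict, List


def chunk_size_stats(chunks: List[str]) -> Dict[str, float]:
    """Summarize chunk lengths (in characters) for a list of chunks."""
    lengths = [len(c) for c in chunks]
    if not lengths:
        return {"count": 0, "mean": 0.0, "stdev": 0.0, "min": 0, "max": 0}
    return {
        "count": len(lengths),
        "mean": statistics.mean(lengths),
        # Population standard deviation: the chunks are the whole population.
        "stdev": statistics.pstdev(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


if __name__ == "__main__":
    # Uniform lengths give a standard deviation of zero.
    print(chunk_size_stats(["abcd", "efgh", "ijkl"]))
    # Mixed lengths give a larger spread.
    print(chunk_size_stats(["a short one", "a noticeably longer chunk of text"]))
```

Running this over the output of each strategy on the same documents gives a rough, corpus-specific view of the variance tradeoff in the table.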

Hybrid approaches offer a middle path. First, identify semantic boundaries (headings, paragraph breaks). Then apply fixed-size chunking within sections. This preserves section context while maintaining reasonable size uniformity.

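import re
from typing import List


def hybrid_chunk(document: str, target_size: int = 400) -> List[str]:
    """Chunk by sections first, then apply size limits within sections.

    target_size is measured in characters. Long sections are split by word
    count, assuming roughly 5 characters per word, so the resulting chunks
    only approximate target_size.
    """
    # Split before headings: lines starting with 1-6 '#' characters, or
    # ALL-CAPS lines ending in ':'. [A-Z ] (not \s) keeps the match on one line.
    section_pattern = r'(?=\n#{1,6}\s|\n[A-Z][A-Z ]+:\n)'
    sections = re.split(section_pattern, document)

    # Word budget per chunk; at least 1 so range() never gets a zero step.
    words_per_chunk = max(1, target_size // 5)

    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        if len(section) <= target_size:
            # Short sections are kept whole
            chunks.append(section)
        else:
            # Apply fixed-size (word-count) chunking within long sections
            words = section.split()
            for i in range(0, len(words), words_per_chunk):
                chunks.append(' '.join(words[i:i + words_per_chunk]))

    return chunks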

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
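One way to capture the details listed above is a small record written next to the exercise. This is a sketch using only the standard library; the function name `environment_record`, the field names, and the example path are assumptions, not part of the chapter's tooling.

```python
import json
import platform
import sys
from importlib import metadata
from typing import Dict, List, Optional


def environment_record(packages: List[str], data_path: str,
                       observed_output: str) -> Dict[str, object]:
    """Collect runtime details so a run can be reproduced later."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
        "data_path": data_path,
        "observed_output": observed_output,
    }


if __name__ == "__main__":
    # Hypothetical values; replace with the packages and paths you used.
    record = environment_record(["pip"], "data/sample.txt", "3 chunks")
    print(json.dumps(record, indent=2))
```

Constraints such as model size, vector count, or backend are not detected automatically here; add them to the record by hand if the result depends on them.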