RUNLOCALAI v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.co · Independently operated
Advanced RAG — Chunking, Retrieval, Re-ranking

02. Semantic Chunking at Scale

Chapter 2 of 24 · 15 min
KEY INSIGHT

Semantic chunking preserves topic coherence at boundaries, allowing retrieval to return whole arguments rather than fragments.

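Fixed-size chunking cuts text every N characters or tokens regardless of document structure, so a boundary can land mid-sentence or split one argument across two chunks. Semantic chunking instead identifies natural topic boundaries (sentences, paragraphs, headings) and splits there, which tends to improve retrieval relevance because each chunk holds a coherent unit.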
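To make the contrast concrete, here is a minimal fixed-size baseline. It is a sketch under stated assumptions: the name `fixed_size_chunk` is hypothetical, sizes are counted in characters (many pipelines count tokens instead), and consecutive windows share `overlap` characters.

```python
from typing import List


def fixed_size_chunk(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Cut text into windows of `size` characters; consecutive windows share
    `overlap` characters. Sentence and paragraph boundaries are ignored."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop before a window would consist solely of the previous overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With `size=20` and no overlap, "The cache is warm. The model loads fast." is cut into "The cache is warm. T" and "he model loads fast.": the boundary lands inside a word, which is exactly the failure semantic chunking tries to avoid.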

Sentence splitting forms the foundation. Naive regex-based splitting fails on abbreviations ("Dr. Smith", "e.g."), decimal numbers, and similar edge cases. Use a sentence segmenter built for your target language, such as NLTK's Punkt tokenizer or spaCy's sentence segmentation, rather than a bare punctuation regex.
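The difference between a naive split and an abbreviation-aware one can be sketched with the standard library alone. This is an illustration, not a substitute for a trained segmenter: `naive_split`, `split_sentences`, and the tiny `ABBREVIATIONS` set are hypothetical names chosen for this example, and the set covers only a handful of English abbreviations.

```python
import re
from typing import List

# Hypothetical, deliberately small abbreviation list (lowercase, no final
# period). Trained segmenters cover far more cases.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "e.g", "i.e", "etc", "vs", "fig"}


def naive_split(text: str) -> List[str]:
    """Split after every '.', '!' or '?' that is followed by whitespace."""
    return re.split(r'(?<=[.!?])\s+', text)


def split_sentences(text: str) -> List[str]:
    """Like naive_split, but skip periods that end a known abbreviation."""
    sentences: List[str] = []
    start = 0
    for match in re.finditer(r'[.!?]\s+', text):
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ''
        if match.group()[0] == '.' and last_word in ABBREVIATIONS:
            continue  # e.g. "Dr. Smith": not a sentence end, keep scanning
        sentences.append(text[start:match.start() + 1].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

On "Dr. Smith ran the eval, e.g. on 3.14 GB of logs. It passed!", `naive_split` returns four fragments, breaking after "Dr." and "e.g.", while `split_sentences` returns the two real sentences. The sketch still misreads an abbreviation that genuinely ends a sentence (for example "etc." followed by a new sentence), which is one reason trained segmenters exist.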

Paragraph detection identifies topic shifts within documents. Sections separated by blank lines or headings typically represent distinct concepts. Long paragraphs may contain multiple sub-topics requiring subdivision.
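One way to detect blocks is to split at blank lines and classify each block. The sketch below assumes Markdown-style `#` headings; `detect_blocks` and `HEADING` are hypothetical names, and other formats (HTML, reStructuredText, PDF extractions) need their own heading test.

```python
import re
from typing import List, Tuple

# Assumes Markdown ATX headings ("# Title" through "###### Title").
HEADING = re.compile(r'^#{1,6}\s+\S')


def detect_blocks(text: str) -> List[Tuple[str, str]]:
    """Split text at blank lines and label each block 'heading' or
    'paragraph'. A heading line glued to the text below it (no blank line
    in between) becomes its own block."""
    blocks: List[Tuple[str, str]] = []
    for raw in re.split(r'\n\s*\n', text):
        body: List[str] = []
        for line in raw.strip().splitlines():
            if HEADING.match(line):
                if body:  # close any paragraph text that preceded the heading
                    blocks.append(('paragraph', '\n'.join(body)))
                    body = []
                blocks.append(('heading', line.strip()))
            else:
                body.append(line)
        if body:
            blocks.append(('paragraph', '\n'.join(body)))
    return blocks
```

Because it works line by line, a `#` comment inside a fenced code block would be misread as a heading; a production version should skip fenced regions first.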

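Hierarchical merging combines short segments up to a target size while respecting semantic boundaries: merge paragraphs until the next one would exceed the target chunk size, but always start a new chunk at a heading. The semantic_chunk function below covers the size-bounded merge and a sentence-level fallback for oversized paragraphs; it treats every blank-line-separated block as a paragraph and does not detect headings.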
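Heading-aware merging can be sketched as a greedy pass over a list of `(kind, text)` pairs, where `kind` is `'heading'` or `'paragraph'`. The function name `merge_blocks`, the input shape, and the character-based `target_size` are assumptions for illustration.

```python
from typing import List, Tuple


def merge_blocks(blocks: List[Tuple[str, str]], target_size: int = 500) -> List[str]:
    """Greedily merge paragraphs up to target_size characters. A heading
    always closes the current chunk, so no chunk mixes two sections."""
    chunks: List[str] = []
    current: List[str] = []
    length = 0        # characters in current, counting '\n' separators
    has_body = False  # True once current holds at least one paragraph

    for kind, text in blocks:
        sep = 1 if current else 0
        # Close the chunk at a section boundary, or when it is full. A chunk
        # holding only a heading is never closed for size, so a heading is
        # not stranded apart from its first paragraph.
        if current and (kind == 'heading'
                        or (has_body and length + sep + len(text) > target_size)):
            chunks.append('\n'.join(current))
            current, length, has_body, sep = [], 0, False, 0
        current.append(text)
        length += sep + len(text)
        has_body = has_body or kind == 'paragraph'

    if current:
        chunks.append('\n'.join(current))
    return chunks
```

With `target_size=40`, the blocks "# Install", "Run the script.", "It takes a minute.", "## Usage", "Call main()." merge into three chunks: "# Install\nRun the script.", "It takes a minute.", and "## Usage\nCall main().". The middle chunk has lost its heading; one possible refinement is to repeat the section heading at the start of each continuation chunk so every chunk keeps its context.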

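import re
from typing import List, Tuple


def semantic_chunk(
    text: str,
    min_chunk_size: int = 100,
    max_chunk_size: int = 500,
    overlap: int = 50
) -> List[Tuple[str, dict]]:
    """Split text into paragraph-aligned chunks with character overlap.

    Paragraphs are packed into a chunk until the next one would push it past
    max_chunk_size characters; paragraphs longer than max_chunk_size are split
    at sentence boundaries. Each new chunk starts with the last `overlap`
    characters of the previous one. A trailing remainder shorter than
    min_chunk_size is folded into the previous chunk, and a single sentence
    longer than max_chunk_size is kept whole, so either can exceed the limit.
    """
    chunks: List[str] = []
    current_chunk: List[str] = []  # pieces of the chunk being built
    current_length = 0             # characters, counting '\n' separators
    new_pieces = 0                 # pieces added after the overlap seed

    def flush() -> None:
        """Emit the current chunk and seed the next one with its tail."""
        nonlocal current_chunk, current_length, new_pieces
        joined = '\n'.join(current_chunk)
        chunks.append(joined)
        tail = joined[-overlap:] if overlap > 0 else ''
        current_chunk = [tail] if tail else []
        current_length = len(tail)
        new_pieces = 0

    def add(piece: str) -> None:
        """Append a paragraph or sentence, flushing first if it won't fit."""
        nonlocal current_length, new_pieces
        sep = 1 if current_chunk else 0
        # Flush only when the chunk holds new text, never just the overlap seed
        if new_pieces and current_length + sep + len(piece) > max_chunk_size:
            flush()
            sep = 1 if current_chunk else 0
        current_chunk.append(piece)
        current_length += sep + len(piece)
        new_pieces += 1

    # Split into paragraphs at blank lines
    for para in re.split(r'\n\s*\n', text):
        para = para.strip()
        if not para:
            continue
        if len(para) > max_chunk_size:
            # Oversized paragraph: fall back to sentence boundaries
            for sentence in split_into_sentences(para):
                add(sentence)
        else:
            add(para)

    if new_pieces:
        new_text = '\n'.join(current_chunk[-new_pieces:])
        if chunks and len(new_text) < min_chunk_size:
            # Fold a too-small remainder into the previous chunk
            chunks[-1] += '\n' + new_text
        else:
            chunks.append('\n'.join(current_chunk))

    return [(c, {'index': i, 'length': len(c)}) for i, c in enumerate(chunks)]


def split_into_sentences(text: str) -> List[str]:
    """Naive punctuation-based sentence splitter.

    Splits where '.', '!' or '?' is followed by whitespace and an uppercase
    letter, so it mis-splits abbreviations such as "Dr. Smith". Swap in a
    trained sentence segmenter for production use.
    """
    sentences = re.split(r'(?<=[.!?])\s+(?=[A-Z])', text)
    return [s.strip() for s in sentences if s.strip()]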
EXERCISE

Implement semantic_chunk and compare results against fixed-size chunking on a technical blog post. Count how many code blocks, lists, or distinct topics span multiple chunks in each approach.
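For the code-block part of the exercise, a small helper can count fenced blocks that no single chunk contains whole. This is a sketch: `count_split_code_blocks` is a hypothetical name, it assumes triple-backtick fences, and it compares text with whitespace collapsed so chunkers that strip or re-join lines are not penalized. Lists and topic shifts need their own detectors.

```python
import re
from typing import List

# Matches fenced code blocks; `{3} avoids writing a literal triple backtick.
FENCE = re.compile(r'`{3}.*?`{3}', re.DOTALL)


def _normalize(s: str) -> str:
    # Collapse whitespace so stripped or re-joined lines still compare equal
    return ' '.join(s.split())


def count_split_code_blocks(document: str, chunks: List[str]) -> int:
    """Count fenced code blocks in `document` that no single chunk contains
    whole, i.e. blocks a chunker cut across a boundary."""
    normalized_chunks = [_normalize(c) for c in chunks]
    return sum(
        1
        for block in FENCE.findall(document)
        if not any(_normalize(block) in c for c in normalized_chunks)
    )
```

Pass plain strings: for semantic_chunk output, use `[text for text, _ in semantic_chunk(post)]`; for a fixed-size chunker, pass its list directly. Overlap can make a block appear whole in the next chunk even when a boundary falls inside it, so it is worth comparing runs with `overlap=0` as well.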

← Chapter 1: RAG Pipeline Anatomy
Chapter 3: Fixed-Size vs Semantic Tradeoffs →