Sliding Window Context — RAG Systems: Part 2 (Chapter 14)

When a document is longer than your chunk size, sliding windows split it into overlapping chunks. The whole document stays searchable, and each chunk stays short enough to embed well.

Window Configuration

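A sliding window has two independent parameters. chunk_size is the window length, measured in tokens (or in words when no tokenizer is given). overlap is how many units consecutive chunks share. The third quantity, stride (the step between window starts), follows from them: stride = chunk_size - overlap. This means overlap must be smaller than chunk_size.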
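To make the relationship concrete, here is a minimal sketch that counts how many windows a document produces. The helper name window_count is hypothetical. It assumes chunking stops as soon as a window reaches the end of the document, so no window is fully contained in the one before it.

```python
import math


def window_count(num_tokens: int, chunk_size: int, overlap: int) -> int:
    """Number of sliding windows for a document of num_tokens units.

    Hypothetical helper. Assumes the chunker stops once a window reaches
    the end of the document, so no trailing window is fully contained
    in its predecessor.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if num_tokens <= 0:
        return 0
    if num_tokens <= chunk_size:
        return 1
    stride = chunk_size - overlap
    # The first window covers chunk_size units; each later window adds
    # stride new units until the end of the document is reached.
    return 1 + math.ceil((num_tokens - chunk_size) / stride)
```

For example, a 1,200-token document with chunk_size=500 and overlap=100 has stride 400. That gives windows [0, 500), [400, 900) and [800, 1200), so window_count(1200, 500, 100) is 3.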

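def sliding_window_chunks(document: str,
                          chunk_size: int = 500,
                          overlap: int = 100,
                          tokenizer=None) -> list:
    """Split a document into overlapping sliding-window chunks.

    Consecutive chunks share `overlap` units, so each window start advances
    by stride = chunk_size - overlap. With a tokenizer the units are tokens;
    without one they are whitespace-separated words.
    """
    if not 0 <= overlap < chunk_size:
        # overlap >= chunk_size would give stride <= 0 and loop forever
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    stride = chunk_size - overlap

    if tokenizer is not None:
        tokens = tokenizer.encode(document)
    else:
        # Simple word-based fallback (splits on whitespace, not characters)
        tokens = document.split()

    chunks = []
    start = 0

    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))

        if tokenizer is not None:
            chunk_text = tokenizer.decode(tokens[start:end])
        else:
            chunk_text = " ".join(tokens[start:end])

        chunks.append({
            "text": chunk_text,
            "start_token": start,   # positions are word indices in the fallback
            "end_token": end
        })

        # Stop once a window reaches the end of the document; sliding again
        # would only add a tail window fully contained in this one
        if end == len(tokens):
            break
        start += stride

    return chunks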

Context Reconstruction

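When relevant information falls in the overlap region between chunks, or spans several consecutive chunks, a single retrieved chunk may hold only part of it. You then need to reconstruct the full context by merging the hit chunks and their neighbors into one passage. Because neighboring windows share text, the merge must emit each overlap region only once.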

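def reconstruct_full_context(hit_chunks: list, all_chunks: list,
                             overlap_tokens: int = 100,
                             tokenizer=None) -> str:
    """Rebuild one contiguous passage around the retrieved chunks.

    Consecutive windows share tokens, so text from each overlap region is
    emitted only once. Pass the same tokenizer that was used for chunking.
    """
    if not hit_chunks:
        return ""

    # Span covered by the hits, padded by overlap_tokens on each side
    min_start = min(c["start_token"] for c in hit_chunks)
    max_end = max(c["end_token"] for c in hit_chunks)
    expanded_start = max(0, min_start - overlap_tokens)
    expanded_end = max_end + overlap_tokens

    # All chunks that intersect the padded span, in document order
    contributing = sorted(
        (c for c in all_chunks
         if c["start_token"] < expanded_end and c["end_token"] > expanded_start),
        key=lambda c: c["start_token"],
    )

    merged_tokens = []
    covered_end = None  # document position up to which tokens are already merged
    for chunk in contributing:
        # Re-split the chunk text the same way sliding_window_chunks did.
        # encode(decode(x)) is not guaranteed to equal x for every tokenizer.
        if tokenizer is not None:
            toks = tokenizer.encode(chunk["text"])
        else:
            toks = chunk["text"].split()
        # Skip leading tokens that earlier chunks already contributed
        skip = 0 if covered_end is None else max(0, covered_end - chunk["start_token"])
        merged_tokens.extend(toks[skip:])
        if covered_end is None or chunk["end_token"] > covered_end:
            covered_end = chunk["end_token"]

    if tokenizer is not None:
        return tokenizer.decode(merged_tokens)
    return " ".join(merged_tokens)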
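Padding from the first hit to the last hit pulls in everything between them, even when two hits sit thousands of tokens apart. One possible variant, sketched here as a hypothetical helper merge_hit_spans, pads each hit span separately and merges only spans that touch or overlap. This is standard interval merging, not a method taken from the text above. Each resulting range can then be rebuilt as its own passage.

```python
def merge_hit_spans(hit_spans: list, pad: int, doc_len: int) -> list:
    """Pad each (start, end) hit span and merge spans that touch or overlap.

    Hypothetical helper. Returns sorted, disjoint (start, end) ranges
    clipped to [0, doc_len].
    """
    padded = sorted((max(0, s - pad), min(doc_len, e + pad)) for s, e in hit_spans)
    merged = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            # Touches or overlaps the previous range: extend that range
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # Separated by a gap: start a new passage
            merged.append((start, end))
    return merged
```

With hits at (0, 500) and (4000, 4500) and pad=100, this returns two passages, [(0, 600), (3900, 4600)], instead of one 4,600-token span.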

Skip-Window Search

For very long documents, you do not have to index every window. Skip-window indexing keeps only every Nth window, which trades some recall for a smaller index.

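def skip_window_index(document: str,
                      chunk_size: int = 500,
                      skip_rate: int = 3,
                      tokenizer=None) -> list:
    """Select every skip_rate-th window for indexing to reduce index size."""
    if skip_rate < 1:
        raise ValueError("skip_rate must be >= 1")

    # 50% overlap: window starts advance by chunk_size - chunk_size // 2
    all_windows = sliding_window_chunks(document, chunk_size,
                                        overlap=chunk_size // 2,
                                        tokenizer=tokenizer)

    # Keep every skip_rate-th window. With 50% overlap and an even chunk_size,
    # skip_rate=2 places kept windows end to end; larger values leave gaps
    # between them, and the final window may be dropped as well.
    selected = all_windows[::skip_rate]

    return selected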
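To see what a given skip rate actually leaves out, you can compute the uncovered token ranges directly. The sketch below uses a hypothetical helper, uncovered_ranges. It assumes window starts advance by chunk_size - overlap and that chunking stops once a window reaches the end of the document.

```python
def uncovered_ranges(num_tokens: int, chunk_size: int,
                     overlap: int, skip_rate: int) -> list:
    """Token ranges that no kept window covers when only every
    skip_rate-th sliding window is indexed.

    Hypothetical helper; assumes chunking stops at the document end.
    """
    if not 0 <= overlap < chunk_size or skip_rate < 1:
        raise ValueError("need 0 <= overlap < chunk_size and skip_rate >= 1")
    stride = chunk_size - overlap

    # Start positions of all windows the chunker would produce
    starts = []
    start = 0
    while start < num_tokens:
        starts.append(start)
        if start + chunk_size >= num_tokens:
            break
        start += stride

    gaps = []
    covered_end = 0
    for s in starts[::skip_rate]:
        if s > covered_end:
            gaps.append((covered_end, s))  # hole before this kept window
        covered_end = max(covered_end, min(s + chunk_size, num_tokens))
    if covered_end < num_tokens:
        gaps.append((covered_end, num_tokens))  # tail after the last kept window
    return gaps
```

For a 2,000-token document with chunk_size=500 and overlap=250, skip_rate=3 leaves [(500, 750), (1250, 1500)] unindexed, while skip_rate=2 leaves nothing. At 1,600 tokens, even skip_rate=2 misses the tail (1500, 1600), because the last window is dropped by the slice.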

Failure Modes

Overlapping windows can retrieve the same fact twice, so the LLM sees redundant text and spends context budget on it. Large skip rates can miss relevant content that falls between indexed windows. Always test with documents where the answer appears near chunk boundaries.
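One simple way to limit redundant hits is to drop a retrieved window when most of its span is already covered by a higher-ranked one. The sketch below is a hypothetical heuristic named dedupe_hits, not an established method. It assumes hits are dicts with start_token and end_token, ordered best first.

```python
def dedupe_hits(hits: list, max_shared_fraction: float = 0.5) -> list:
    """Drop hits that mostly repeat a higher-ranked hit.

    Hypothetical heuristic. hits must be in rank order (best first).
    A hit is dropped when more than max_shared_fraction of its span
    overlaps any single hit that has already been kept.
    """
    kept = []
    for hit in hits:
        length = hit["end_token"] - hit["start_token"]
        # Overlap with a kept hit is negative when the spans are disjoint,
        # so disjoint hits never count as redundant
        redundant = length > 0 and any(
            (min(hit["end_token"], k["end_token"])
             - max(hit["start_token"], k["start_token"])) / length
            > max_shared_fraction
            for k in kept
        )
        if not redundant:
            kept.append(hit)
    return kept
```

With chunk_size=500 and overlap=100, adjacent windows share only 20% of their tokens, so the default threshold keeps both. If repeated overlap text is the problem, lower the threshold or merge adjacent hits into one passage with the overlap emitted once.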