RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
RAG Systems: Part 2

14. Sliding Window Context

Chapter 14 of 22 · 20 min
KEY INSIGHT

Sliding windows make long documents searchable by splitting them into overlapping chunks, but you must still handle cases where relevant information spans multiple chunks.

When documents exceed your chunk size, sliding windows let you search across the full document while keeping chunks small enough for embedding quality.

Window Configuration

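A sliding window is described by three parameters: chunk_size (the window length, measured in tokens, or in whitespace-separated words when no tokenizer is supplied), overlap (how many units consecutive chunks share), and stride (the step between window starts). Only two of them are independent, because stride = chunk_size - overlap.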

def sliding_window_chunks(document: str,
                          chunk_size: int = 500,
                          overlap: int = 100,
                          tokenizer=None) -> list[dict]:
    """Create overlapping sliding window chunks."""
    if not 0 <= overlap < chunk_size:
        # A stride of zero or less would never advance and loop forever
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    if tokenizer:
        tokens = tokenizer.encode(document)
    else:
        # Simple whitespace (word-based) fallback
        tokens = document.split()

    chunks = []
    start = 0

    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))

        if tokenizer:
            chunk_text = tokenizer.decode(tokens[start:end])
        else:
            chunk_text = " ".join(tokens[start:end])

        chunks.append({
            "text": chunk_text,
            "start_token": start,
            "end_token": end  # exclusive
        })

        if end == len(tokens):
            # This window reached the end; another step would only repeat its tail
            break

        # Slide by the stride: chunk_size - overlap
        start += chunk_size - overlap

    return chunks
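
The relationship between chunk_size, overlap, and stride can be checked with a little arithmetic before indexing anything. The following is a minimal sketch, not part of the chapter's code: window_starts is a hypothetical helper that returns the start offsets needed for the windows to reach the end of a document of n_tokens units, assuming stride = chunk_size - overlap.

```python
import math


def window_starts(n_tokens: int, chunk_size: int = 500, overlap: int = 100) -> list[int]:
    """Start offsets of the windows needed to cover n_tokens units.

    Hypothetical helper for illustration; assumes stride = chunk_size - overlap.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if n_tokens <= 0:
        return []
    if n_tokens <= chunk_size:
        return [0]  # the whole document fits in one window

    stride = chunk_size - overlap
    # One window covers the first chunk_size units; each further window
    # extends coverage by one stride.
    n_windows = math.ceil((n_tokens - chunk_size) / stride) + 1
    return [i * stride for i in range(n_windows)]


print(window_starts(1000))  # [0, 400, 800]
```

With the defaults, a 1000-token document needs three windows. Raising overlap shrinks the stride, so the same document produces more chunks and a larger index.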

Context Reconstruction

When relevant information falls in the overlap region between chunks, you need to reconstruct the full context.

def reconstruct_full_context(hit_chunks: list, all_chunks: list, 
                             overlap_tokens: int = 100) -> str:
    """Reconstruct full context when hits are in overlap regions."""
    
    if not hit_chunks:
        # min()/max() below would raise ValueError on an empty list
        return ""
    
    # Get token positions of hits
    hit_positions = [(c["start_token"], c["end_token"]) for c in hit_chunks]
    
    # Find minimum start and maximum end
    min_start = min(p[0] for p in hit_positions)
    max_end = max(p[1] for p in hit_positions)
    
    # Expand to include overlap buffer
    expanded_start = max(0, min_start - overlap_tokens)
    expanded_end = max_end + overlap_tokens
    
    # Find all chunks that intersect this range
    contributing = []
    for chunk in all_chunks:
        if (chunk["start_token"] < expanded_end and 
            chunk["end_token"] > expanded_start):
            contributing.append(chunk)
    
    # Merge in order. Note: text in an overlap region appears once for
    # every chunk that contains it, so the result repeats those tokens.
    contributing.sort(key=lambda x: x["start_token"])
    merged = " ".join(c["text"] for c in contributing)
    
    return merged
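
Joining the chunk texts directly repeats whatever sits in the overlap regions. One way to avoid that is to trim each chunk by the number of positions already emitted. The sketch below is an illustration under a stated assumption, not the chapter's method: merge_without_duplicates is a hypothetical helper, and it only works for chunks built with whitespace splitting, where the i-th word of a chunk sits at position start_token + i. Chunks produced by a subword tokenizer would need the token ids instead.

```python
def merge_without_duplicates(chunks: list[dict]) -> str:
    """Merge overlapping chunks so each word position is emitted once.

    Assumption: chunks were built by whitespace splitting, so word i of
    chunk["text"] is at position chunk["start_token"] + i.
    """
    merged_words: list[str] = []
    covered_end = 0  # first position that has not been emitted yet

    for chunk in sorted(chunks, key=lambda c: c["start_token"]):
        words = chunk["text"].split()
        # Drop the leading words an earlier chunk already contributed.
        # If there is a gap before this chunk, nothing is dropped.
        skip = max(0, covered_end - chunk["start_token"])
        merged_words.extend(words[skip:])
        covered_end = max(covered_end, chunk["end_token"])

    return " ".join(merged_words)


# Two windows of four words that share two words
example = [
    {"text": "a b c d", "start_token": 0, "end_token": 4},
    {"text": "c d e f", "start_token": 2, "end_token": 6},
]
print(merge_without_duplicates(example))  # a b c d e f
```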

Skip-Window Search

For very long documents, you do not have to index every window. Skip-window indexing keeps only every Nth window, trading recall for a smaller index.

def skip_window_index(document: str, 
                       chunk_size: int = 500,
                       skip_rate: int = 3) -> list:
    """Index every Nth window to reduce index size."""
    
    all_windows = sliding_window_chunks(document, chunk_size, 
                                        overlap=chunk_size // 2)
    
    # Select every skip_rate-th window
    selected = all_windows[::skip_rate]
    
    return selected
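
Whether a given skip rate leaves parts of the document unsearchable can be computed from the start_token and end_token fields of the selected windows. The following is a minimal sketch; uncovered_ranges is a hypothetical helper, not part of the chapter's code. In the example, 500-token windows with 50% overlap start every 250 tokens, so keeping every third one leaves indexed windows 750 tokens apart, each spanning only 500.

```python
def uncovered_ranges(indexed: list[dict], n_tokens: int) -> list[tuple[int, int]]:
    """Return (start, end) token ranges that no indexed window covers.

    Hypothetical helper; expects dicts with "start_token" and an
    exclusive "end_token".
    """
    gaps = []
    covered_end = 0  # everything before this position is covered

    for window in sorted(indexed, key=lambda w: w["start_token"]):
        if window["start_token"] > covered_end:
            gaps.append((covered_end, window["start_token"]))
        covered_end = max(covered_end, window["end_token"])

    if covered_end < n_tokens:
        gaps.append((covered_end, n_tokens))  # unindexed tail
    return gaps


# 2000-token document, 500-token windows starting every 250 tokens
windows = [{"start_token": s, "end_token": min(s + 500, 2000)}
           for s in range(0, 2000, 250)]
indexed = windows[::3]  # keep every third window: starts 0, 750, 1500
print(uncovered_ranges(indexed, 2000))  # [(500, 750), (1250, 1500)]
```

With a skip rate of 2 in the same setup, the selected windows sit exactly end to end: there are no gaps, but there is also no overlap left.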

Failure Modes

Overlapping windows can retrieve the same fact twice, making the LLM see redundant information. Large skip rates can miss relevant content that falls between indexed windows. Always test with documents where the answer appears near chunk boundaries.
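
A boundary test can be as simple as checking whether some single chunk contains the whole answer span. The sketch below assumes you know the token positions of the answer in the source document; span_fully_contained is a hypothetical helper. A span no longer than the overlap always fits inside at least one window, while a longer span near a boundary may not.

```python
def span_fully_contained(chunks: list[dict], span_start: int, span_end: int) -> bool:
    """True if at least one chunk contains the whole span [span_start, span_end).

    Hypothetical helper; expects dicts with "start_token" and an
    exclusive "end_token".
    """
    return any(
        c["start_token"] <= span_start and span_end <= c["end_token"]
        for c in chunks
    )


# Two windows with chunk_size=500 and overlap=100
chunks = [
    {"start_token": 0, "end_token": 500},
    {"start_token": 400, "end_token": 900},
]

# A 70-token answer that crosses the end of the first window still fits
# inside the second window.
print(span_fully_contained(chunks, 450, 520))  # True

# A 200-token answer is longer than the overlap and straddles both windows.
print(span_fully_contained(chunks, 350, 550))  # False
```

When the check fails, no single retrieved chunk can hold the complete answer, so the retrieval step has to return and merge neighbouring chunks.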

EXERCISE

Index a 50-page PDF using sliding windows with 100-token overlap. Query for a fact that appears near a chunk boundary. Verify that the retrieved context contains the complete answer.
