14. Sliding Window Context
When documents exceed your chunk size, sliding windows let you search across the full document while keeping chunks small enough for embedding quality.
Window Configuration
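A sliding window is described by three parameters: chunk_size (measured in tokens, or in whitespace-separated words when no tokenizer is supplied), overlap (how many units consecutive chunks share), and stride (the step between window starts). Only two of them are independent: stride = chunk_size - overlap, so overlap must be smaller than chunk_size.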
def sliding_window_chunks(document: str,
                          chunk_size: int = 500,
                          overlap: int = 100,
                          tokenizer=None) -> list:
    """Create overlapping sliding window chunks."""
    if not 0 <= overlap < chunk_size:
        # A non-positive stride would loop forever
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    if tokenizer:
        tokens = tokenizer.encode(document)
    else:
        # Simple whitespace-split fallback: one word counts as one token
        tokens = document.split()
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        if tokenizer:
            chunk_text = tokenizer.decode(tokens[start:end])
        else:
            chunk_text = " ".join(tokens[start:end])
        chunks.append({
            "text": chunk_text,
            "start_token": start,
            "end_token": end
        })
        if end == len(tokens):
            # Last window reached the end; another would only repeat its tail
            break
        # Slide by the stride: chunk_size - overlap
        start += chunk_size - overlap
    return chunks
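To make the stride arithmetic concrete, the sketch below computes where each window would start for a document of a given length. The helper name window_starts is hypothetical and not part of the chunker; it only illustrates the relationship stride = chunk_size - overlap.

```python
import math

def window_starts(n_tokens: int, chunk_size: int, overlap: int) -> list:
    """Return the start offset of every window needed to cover n_tokens."""
    stride = chunk_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    if n_tokens <= chunk_size:
        return [0] if n_tokens > 0 else []
    # The first window covers chunk_size tokens; each further window
    # contributes `stride` new tokens.
    n_windows = 1 + math.ceil((n_tokens - chunk_size) / stride)
    return [i * stride for i in range(n_windows)]

starts = window_starts(n_tokens=1200, chunk_size=500, overlap=100)
print(starts)  # [0, 400, 800]
```

With a stride of 400, three windows (0-500, 400-900, 800-1200) cover a 1200-token document, and each neighboring pair shares 100 tokens.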
Context Reconstruction
When a relevant passage straddles the overlap region between two chunks, a single retrieved chunk may hold only part of it, so you need to reconstruct the surrounding context from the neighboring chunks.
def reconstruct_full_context(hit_chunks: list, all_chunks: list,
                             overlap_tokens: int = 100) -> str:
    """Reconstruct full context when hits are in overlap regions."""
    if not hit_chunks:
        return ""
    # Span covered by the hits: earliest start to latest end
    min_start = min(c["start_token"] for c in hit_chunks)
    max_end = max(c["end_token"] for c in hit_chunks)
    # Expand to include overlap buffer
    expanded_start = max(0, min_start - overlap_tokens)
    expanded_end = max_end + overlap_tokens
    # Find all chunks that intersect this range, in document order
    contributing = sorted(
        (c for c in all_chunks
         if c["start_token"] < expanded_end and c["end_token"] > expanded_start),
        key=lambda c: c["start_token"])
    # Merge in order, skipping words already covered by an earlier chunk so
    # overlap text is not repeated. Assumes whitespace-split chunks (the
    # no-tokenizer path), where one word is one token; with a subword
    # tokenizer, merge token ids and decode once instead.
    merged, covered = [], 0
    for c in contributing:
        words = c["text"].split()
        merged.extend(words[max(0, covered - c["start_token"]):])
        covered = max(covered, c["end_token"])
    return " ".join(merged)
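If the original token (or word) list is still available at query time, an alternative is to compute the token span of the hits and slice the document once, which sidesteps repeated overlap text entirely. This is a minimal sketch under that assumption; context_span and buffer_tokens are hypothetical names, and the chunk dicts are assumed to carry start_token and end_token as produced by the chunker above.

```python
def context_span(hit_chunks: list, n_tokens: int, buffer_tokens: int = 100) -> tuple:
    """Return (start, end) token offsets covering all hits plus a buffer."""
    if not hit_chunks:
        return (0, 0)
    start = min(c["start_token"] for c in hit_chunks) - buffer_tokens
    end = max(c["end_token"] for c in hit_chunks) + buffer_tokens
    # Clamp the expanded span to the document bounds
    return (max(0, start), min(n_tokens, end))

# Stand-in for a tokenized document (hypothetical data)
tokens = [f"w{i}" for i in range(1200)]
hits = [{"start_token": 400, "end_token": 900},
        {"start_token": 800, "end_token": 1200}]

start, end = context_span(hits, n_tokens=len(tokens))
context = " ".join(tokens[start:end])
print(start, end)  # 300 1200
```

The trade-off is memory: this approach needs the full token list per document at query time, whereas merging chunk texts needs only the stored chunks.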
Skip-Window Search
For very long documents, not every window needs to be indexed. Skip-window indexing keeps only every Nth window, trading some recall for a smaller index.
def skip_window_index(document: str,
                      chunk_size: int = 500,
                      skip_rate: int = 3) -> list:
    """Index every Nth window to reduce index size."""
    all_windows = sliding_window_chunks(document, chunk_size,
                                        overlap=chunk_size // 2)
    # Select every skip_rate-th window
    selected = all_windows[::skip_rate]
    return selected
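It can help to work out how much of the document a given skip rate leaves unindexed. The sketch below does that arithmetic; skip_window_gap is a hypothetical helper, and it ignores edge effects at the end of the document.

```python
def skip_window_gap(chunk_size: int, overlap: int, skip_rate: int) -> int:
    """Tokens left unindexed between consecutive selected windows (0 if none)."""
    stride = chunk_size - overlap
    # Distance between the starts of two consecutive selected windows
    effective_stride = stride * skip_rate
    return max(0, effective_stride - chunk_size)

# Same settings as skip_window_index: overlap is half the chunk size
for rate in (1, 2, 3, 4):
    gap = skip_window_gap(chunk_size=500, overlap=250, skip_rate=rate)
    print(rate, gap)
# 1 0
# 2 0
# 3 250
# 4 500
```

With 50% overlap, a skip rate of 2 still tiles the document with no gaps. At the default skip rate of 3, selected windows start every 750 tokens but cover only 500 of them, so roughly a third of the text is never indexed.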
Failure Modes
Overlapping windows can retrieve the same fact twice, making the LLM see redundant information. Large skip rates can miss relevant content that falls between indexed windows. Always test with documents where the answer appears near chunk boundaries.
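One way to limit redundant context is to merge hits whose token spans overlap into a single span before building the prompt. The sketch below shows plain interval merging; merge_hit_spans is a hypothetical helper, and it assumes each hit carries start_token and end_token offsets from the same document.

```python
def merge_hit_spans(hits: list) -> list:
    """Merge overlapping or touching (start, end) token spans of hits."""
    spans = sorted((h["start_token"], h["end_token"]) for h in hits)
    merged = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous span: extend it instead of adding a duplicate
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

hits = [{"start_token": 400, "end_token": 900},
        {"start_token": 0, "end_token": 500},
        {"start_token": 2000, "end_token": 2500}]
print(merge_hit_spans(hits))  # [(0, 900), (2000, 2500)]
```

The two neighboring hits collapse into one span, so their shared 100 tokens would be sent to the LLM once, while the distant hit stays separate.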
Index a 50-page PDF using sliding windows with 100-token overlap. Query for a fact that appears near a chunk boundary. Verify that the retrieved context contains the complete answer.
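Before running this on a real PDF, the boundary behavior can be checked on a synthetic document. The sketch below plants a made-up fact across word 500, where a non-overlapping chunker would cut, and checks whether any single chunk contains it whole. It covers only the chunking step, not PDF parsing, embedding, or retrieval; word_chunks is a compact stand-in for the chunker, and the fact text is invented test data.

```python
def word_chunks(words: list, chunk_size: int, overlap: int) -> list:
    """Compact word-level sliding window chunker (stand-in for indexing)."""
    stride = chunk_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    for start in range(0, len(words), stride):
        end = min(start + chunk_size, len(words))
        chunks.append({"text": " ".join(words[start:end]),
                       "start_token": start, "end_token": end})
        if end == len(words):
            break  # last window reached the end of the document
    return chunks

# Synthetic document: filler words plus a planted six-word fact
fact = "the reactor setpoint is 347 kelvin".split()
words = [f"filler{i}" for i in range(2000)]
words[497:497 + len(fact)] = fact  # fact occupies words 497..502

def contains_fact(chunks: list) -> bool:
    """True if at least one chunk holds the complete fact."""
    return any(" ".join(fact) in c["text"] for c in chunks)

with_overlap = word_chunks(words, chunk_size=500, overlap=100)
no_overlap = word_chunks(words, chunk_size=500, overlap=0)
print(contains_fact(with_overlap), contains_fact(no_overlap))  # True False
```

Without overlap the fact is split between the first two chunks; with a 100-word overlap the second chunk (words 400-900) contains it intact.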