RUNLOCALAIv38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo

© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Use Recursive Character Text Splitter

intermediate·15 min·By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

LangChain installed (the splitter ships in the langchain-text-splitters package)

What this does

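RecursiveCharacterTextSplitter is LangChain's general-purpose chunker and the usual starting point for RAG pipelines. It works through an ordered list of separators, coarsest first: by default paragraph breaks ("\n\n"), then line breaks ("\n"), then spaces, then individual characters. Any piece still longer than chunk_size is split again with the next separator, and adjacent small pieces are merged back together up to chunk_size. Sentence-level splitting is not part of the defaults; add ". " to the separator list if you want it, as step 1 does. The result keeps paragraphs and lines intact where possible while capping chunks at chunk_size, which is measured in characters unless you supply a different length function. Chunks are bounded in size, not uniform.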
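To make the recursion concrete, here is a minimal pure-Python sketch of the idea. It is not LangChain's implementation: the function name recursive_split is hypothetical, separators are treated as plain strings, and chunk_overlap is ignored. It only shows the "split on the coarsest separator, merge what fits, recurse on what does not" pattern.

```python
from typing import List


def recursive_split(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Illustrative recursive splitter (hypothetical helper, no overlap support)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Nothing coarser left to try: fall back to hard character slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        if not piece:
            continue  # skip empty pieces left by consecutive separators
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits, keep merging
            continue
        if current:
            chunks.append(current)  # flush what has been merged so far
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


text = "First paragraph.\n\nSecond paragraph is a little longer than the first."
pieces = recursive_split(text, chunk_size=30, separators=["\n\n", "\n", " ", ""])
for p in pieces:
    print(len(p), repr(p))
```

The first paragraph survives as one chunk because it fits; only the oversized second paragraph is broken down further, at word boundaries.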

Steps

  1. Create the splitter with sensible defaults.

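    # The splitter lives in the langchain-text-splitters package;
    # older releases also exposed it as langchain.text_splitter.
    from langchain_text_splitters import RecursiveCharacterTextSplitter
    
    splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,    # maximum chunk length, in characters by default
        chunk_overlap=50,  # up to 50 characters shared between neighboring chunks
        separators=["\n\n", "\n", ". ", " ", ""],  # tried in order, coarsest first
    )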
    
  2. Split raw text directly.

    sample = "Natural language processing enables computers to understand text. RAG combines retrieval with generation."
    # The sample is shorter than chunk_size=500, so it comes back as a single chunk.
    chunks = splitter.split_text(sample)
    print(f"Split into {len(chunks)} chunk(s)")
    for i, chunk in enumerate(chunks):
        print(f"  [{i}] ({len(chunk)} chars): {chunk}")
    
  3. Split pre-loaded LangChain documents.

    from langchain_core.documents import Document
    
    docs = [Document(page_content=sample, metadata={"source": "demo"})]
    # split_documents copies each source document's metadata onto its chunks.
    chunked_docs = splitter.split_documents(docs)
    print(f"Produced {len(chunked_docs)} document chunks")
    
  4. Customize separators for code.

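    # Prefer class and function boundaries, then indented lines, then any newline.
    # There is no "" fallback here, so a single token longer than chunk_size stays whole.
    code_splitter = RecursiveCharacterTextSplitter(
        chunk_size=200,
        chunk_overlap=20,
        separators=["\nclass ", "\ndef ", "\n    ", "\n", " "],
    )
    print(code_splitter.split_text("class Agent:\n    def run(self):\n        pass"))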
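The chunk_size and chunk_overlap values chosen in step 1 also determine how many chunks you end up embedding. The sketch below estimates that count under a simplifying assumption: fixed-stride character windows. The real splitter breaks at separators, so actual counts will differ somewhat. The helper name estimate_chunks is hypothetical.

```python
import math


def estimate_chunks(text_len: int, chunk_size: int, chunk_overlap: int) -> int:
    """Estimate the chunk count for fixed-stride windows (an approximation only)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    if text_len <= chunk_size:
        return 1
    stride = chunk_size - chunk_overlap  # new characters covered by each extra chunk
    return math.ceil((text_len - chunk_overlap) / stride)


# Compare a few overlap settings for a 10,000-character document.
for overlap in (0, 50, 250):
    print(f"overlap={overlap}: about {estimate_chunks(10_000, 500, overlap)} chunks")
```

Under this approximation, raising the overlap from 10% to 50% of chunk_size nearly doubles the number of chunks to embed and store, which is one reason to start small.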
    

Verification

python -c "
from langchain_text_splitters import RecursiveCharacterTextSplitter
s = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
c = s.split_text('A B C D E F G H I J K L M N O P Q R S T U V W X Y Z')
print(f'Chunks: {len(c)}, max len: {max(len(x) for x in c)}')
"
# Expected: Chunks: 1, max len: 51 (the input is 51 characters, so it fits in one chunk)

Common failures

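  • Overlap causes duplicate concepts. A large chunk_overlap repeats the same passage in neighboring chunks, so retrieval returns near-duplicates. Start at about 10% of chunk_size.
  • Empty result. split_text returns an empty list for empty or whitespace-only input. Guard with assert chunks before embedding.
  • Whitespace-heavy chunks. The splitter strips leading and trailing whitespace by default (strip_whitespace=True), but runs of blank lines or spaces inside a chunk remain. Collapse them before splitting, or clean each chunk in a post-processing step.
  • Chunks too large for embedding. chunk_size counts characters by default, not tokens. If your embedding model has a token limit (for example 512 tokens), lower chunk_size or build the splitter with a token-based length function such as RecursiveCharacterTextSplitter.from_tiktoken_encoder.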
  • Version mismatch. The installed package or runtime differs from the one the commands were written for. Check the installed version first, then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being picked up. Print the active binary path and configuration before changing the guide steps.
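Several of the failures above can be caught with a small post-processing pass before embedding. The following is a minimal sketch using only the standard library. The helper names (clean_chunks, oversized, rough_token_count) are hypothetical, and the word-count function is a crude stand-in for your embedding model's real tokenizer.

```python
from typing import Callable, List


def clean_chunks(chunks: List[str]) -> List[str]:
    """Strip surrounding whitespace and drop chunks that end up empty."""
    stripped = (c.strip() for c in chunks)
    return [c for c in stripped if c]


def oversized(chunks: List[str], limit: int,
              length_fn: Callable[[str], int] = len) -> List[int]:
    """Return the indices of chunks whose measured length exceeds limit."""
    return [i for i, c in enumerate(chunks) if length_fn(c) > limit]


def rough_token_count(text: str) -> int:
    # Assumption: whitespace-separated words approximate tokens.
    # Swap in the tokenizer of your embedding model for real budgets.
    return len(text.split())


raw = ["  Alpha beta.  ", "\n\n", "Gamma delta epsilon zeta.", ""]
cleaned = clean_chunks(raw)
assert cleaned, "splitter produced no usable chunks"
too_long = oversized(cleaned, limit=3, length_fn=rough_token_count)
print(cleaned)
print("Chunks over the limit:", too_long)
```

Passing the length function as a parameter keeps the character-versus-token distinction explicit, which is the usual source of "chunk too large" errors.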

Related guides

  • How to Implement Semantic Chunking with LangChain
  • How to Optimize Chunk Size and Overlap Strategy