What this does

The RecursiveCharacterTextSplitter is a common default chunking strategy in LangChain pipelines. It tries a list of separators in order (by default paragraph breaks, then line breaks, then spaces, then individual characters) and re-splits only the pieces that are still too large on the next separator, until every chunk fits the target size. This keeps paragraphs and lines together where possible while capping chunk length; chunks are bounded in size, not uniform.
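The recursive idea can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: the function name recursive_split is hypothetical, it ignores chunk_overlap, and it drops the separator it splits on.

```python
def recursive_split(text, chunk_size, separators):
    """Split on the coarsest separator first; recurse only into oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, chunk_size, finer)

    chunks, current = [], ""
    for piece in text.split(sep):
        if not piece:
            continue
        if len(piece) > chunk_size:
            # Flush what we have, then split the oversized piece more finely.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(recursive_split(piece, chunk_size, finer))
            continue
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


demo_text = (
    "First paragraph here.\n\n"
    "Second paragraph is a bit longer than the first one.\n\n"
    "Third."
)
demo_chunks = recursive_split(demo_text, 40, ["\n\n", "\n", " ", ""])
for i, chunk in enumerate(demo_chunks):
    print(i, len(chunk), repr(chunk))
```

Only the second paragraph exceeds the 40-character limit, so only that paragraph is broken on spaces; the short paragraphs survive intact.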

Steps

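Create the splitter with sensible defaults. chunk_size and chunk_overlap are measured in characters unless you pass a different length_function.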

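# Recent releases ship this class in the langchain-text-splitters package;
# older installs may still need: from langchain.text_splitter import ...
from langchain_text_splitters import RecursiveCharacterTextSplitter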

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " ", ""],
)
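chunk_overlap caps how much text consecutive chunks may share. To see how much is actually shared in your output, a small helper like the one below can measure it. The name shared_overlap is hypothetical and the function is plain Python, independent of LangChain.

```python
def shared_overlap(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    limit = min(len(left), len(right))
    for size in range(limit, 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def overlaps(chunks):
    """Shared character count for each pair of neighboring chunks."""
    return [shared_overlap(a, b) for a, b in zip(chunks, chunks[1:])]


example = ["retrieval with generation", "with generation improves answers"]
print(overlaps(example))
```

Running overlaps on the list returned by split_text shows whether the configured overlap is being used at all; short inputs that fit in one chunk produce an empty list.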

Split raw text directly.

sample = "Natural language processing enables computers to understand text. RAG combines retrieval with generation."
chunks = splitter.split_text(sample)
print(f"Split into {len(chunks)} chunk(s)")
for i, chunk in enumerate(chunks):
    print(f"  [{i}] ({len(chunk)} chars): {chunk}")

Split pre-loaded LangChain documents.

from langchain_core.documents import Document

docs = [Document(page_content=sample, metadata={"source": "demo"})]
chunked_docs = splitter.split_documents(docs)
print(f"Produced {len(chunked_docs)} document chunks")
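split_documents is expected to carry each source document's metadata onto every chunk it produces, which is what lets a retrieved chunk be traced back to its source. The sketch below mimics that behavior in plain Python. Doc, split_docs, by_sentence, and the chunk_index field are hypothetical stand-ins for illustration, not LangChain APIs.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a LangChain Document (assumption: same two fields)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split_docs(docs, split_fn):
    """Apply split_fn to each document and copy its metadata onto every chunk."""
    chunked = []
    for doc in docs:
        for index, piece in enumerate(split_fn(doc.page_content)):
            metadata = copy.deepcopy(doc.metadata)  # never mutate the source
            metadata["chunk_index"] = index         # extra field, not added by LangChain
            chunked.append(Doc(page_content=piece, metadata=metadata))
    return chunked


def by_sentence(text):
    """Toy splitter used only for this demo."""
    return [s.strip() for s in text.split(". ") if s.strip()]


source_docs = [
    Doc(
        "Natural language processing enables computers to understand text. "
        "RAG combines retrieval with generation.",
        {"source": "demo"},
    )
]
demo_chunked = split_docs(source_docs, by_sentence)
for d in demo_chunked:
    print(d.metadata, "->", d.page_content)
```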

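Customize separators for code. List the coarsest boundaries first so that classes and functions stay together where they fit; for common languages, RecursiveCharacterTextSplitter.from_language offers preset separator lists.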

code_splitter = RecursiveCharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20,
    separators=["\nclass ", "\ndef ", "\n    ", "\n", " "],
)
print(code_splitter.split_text("class Agent:\n    def run(self):\n        pass"))

Verification

python -c "
from langchain_text_splitters import RecursiveCharacterTextSplitter
s = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=10)
c = s.split_text('A B C D E F G H I J K L M N O P Q R S T U V W X Y Z')
print(f'Chunks: {len(c)}, max len: {max(len(x) for x in c)}')
"
# Expected: Chunks: 1, max len: 51 (the 51-character input fits in a single chunk; max len must never exceed 100)

Common failures

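Overlap causes duplicate concepts. Large overlaps repeat the same sentences in neighboring chunks and in retrieval results; start with an overlap of roughly 10% of chunk_size.
Empty chunks produced. split_text returns an empty list for empty or whitespace-only input, so check for that case (for example, if not chunks) before indexing or embedding.
Whitespace-heavy chunks. Strip with chunk.strip() in a post-processing step and drop any chunk that ends up empty.
Chunks too large for embedding. chunk_size counts characters by default, not tokens. To enforce a token budget such as 512, build the splitter with RecursiveCharacterTextSplitter.from_tiktoken_encoder or pass a token-counting length_function.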
Version mismatch. The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
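Several of the failures above can be caught with one post-processing pass. The following is a minimal sketch, assuming chunks is the list of strings returned by split_text; the helper name clean_chunks and the max_chars parameter are hypothetical.

```python
def clean_chunks(chunks, max_chars):
    """Strip whitespace, drop empty chunks, and report any that exceed max_chars."""
    cleaned = [chunk.strip() for chunk in chunks]
    cleaned = [chunk for chunk in cleaned if chunk]
    oversized = [chunk for chunk in cleaned if len(chunk) > max_chars]
    return cleaned, oversized


raw = ["  alpha  ", "", "   ", "b" * 12]
cleaned, oversized = clean_chunks(raw, max_chars=10)
print(f"kept {len(cleaned)} chunk(s), {len(oversized)} oversized")
```

An oversized list that is not empty usually means a separator list without a final fallback, or a length limit expressed in tokens while the splitter counts characters.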
