What this does

A Retrieval-Augmented Generation (RAG) pipeline connects a document store to a language model so that answers are grounded in your own data rather than generic training knowledge. This guide walks through creating an end-to-end pipeline using LangChain and Ollama, covering document loading, chunking, embedding, vector storage, and answering queries.

Steps

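Install and configure Ollama, then install the Python packages used below (pip install langchain langchain-community langchain-ollama chromadb). Ensure the Ollama service is reachable at the default local endpoint, http://localhost:11434.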
```
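import os
# Default local endpoint. The LangChain Ollama classes below already default to
# this address; to target another host, pass base_url=... to them explicitly.
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"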
```
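
Before running the later steps, it can help to confirm that something answers at the endpoint at all. The sketch below does this with the standard library only. The helper name ollama_is_up and the 2-second timeout are illustrative assumptions, not part of LangChain or Ollama.

```python
import urllib.error
import urllib.request


def ollama_is_up(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to base_url answers with status 200.

    Hypothetical helper: it only checks that a server responds at the
    address, not that any particular model is available.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, DNS failure, HTTP error status,
        # or a malformed URL.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_up())
```

If this prints False, the first two entries under Common failures are the usual place to start.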

Load documents. Use a text loader to ingest raw files.

```
from langchain_community.document_loaders import TextLoader

loader = TextLoader("context/sample.txt")
docs = loader.load()
```

Split text into chunks. Chunking controls how much context fits in each retrieval unit; chunk_size and chunk_overlap are counted in characters by default.

```
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(docs)
```
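
To make the effect of chunk size and overlap concrete, here is a minimal sketch of fixed-window chunking in plain Python. It is a deliberate simplification: RecursiveCharacterTextSplitter also tries to break on separators such as paragraphs and sentences, which this version ignores. The function name sliding_chunks is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


print(sliding_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
print([len(c) for c in sliding_chunks("x" * 1200, 500, 50)])
# [500, 500, 300]
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.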

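Create embeddings and store vectors. Ollama serves the embedding model, and Chroma stores the resulting vectors; without a persist_directory argument the store lives in memory and is rebuilt on each run.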

```
from langchain_ollama import OllamaEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OllamaEmbeddings(model="llama3")
db = Chroma.from_documents(chunks, embeddings)
```
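
Conceptually, a vector store ranks stored chunks by how close their embedding is to the query's embedding. The sketch below illustrates that idea with cosine similarity over hand-written three-dimensional vectors. The vectors, the toy_index name, and the helper functions are all made up for illustration; a real store such as Chroma may use a different distance metric and an approximate index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, index, k=2):
    """Return the k chunk texts whose vectors are most similar to query_vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Hypothetical embeddings: real ones have hundreds or thousands of dimensions.
toy_index = {
    "cats purr": [1.0, 0.0, 0.0],
    "dogs bark": [0.9, 0.1, 0.0],
    "tax forms": [0.0, 0.0, 1.0],
}

print(top_k([1.0, 0.0, 0.0], toy_index, k=2))
# ['cats purr', 'dogs bark']
```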

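Set up the retrieval chain. RetrievalQA combines the retriever with the LLM: for each question it fetches the most similar chunks and passes them to the model as context. Recent LangChain releases mark RetrievalQA as deprecated, so the import may warn or move depending on the installed version.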

```
from langchain_ollama import ChatOllama
from langchain.chains import RetrievalQA

llm = ChatOllama(model="llama3")
chain = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())
```
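
RetrievalQA.from_chain_type defaults to the "stuff" chain type, which places all retrieved chunks into a single prompt. The sketch below mirrors that idea in plain Python. The template wording and the build_prompt name are hypothetical and are not LangChain's actual prompt.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Join retrieved chunks into one context block and append the question."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n"
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_prompt(
    "What is chunk overlap for?",
    ["Overlap repeats text across chunk boundaries.", "Chunks are retrieval units."],
))
```

Because every chunk is concatenated, the prompt grows with the number and size of retrieved chunks, which is one reason chunk size matters for models with a small context window.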

Query the pipeline. Pass a natural-language question.
```
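# RetrievalQA reads the question from the "query" key and returns the answer under "result".
result = chain.invoke({"query": "What does the document say about retrieval?"})
print(result["result"])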
```
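Expected output: an answer grounded in the retrieved chunks. By default the chain returns only the answer text; pass return_source_documents=True to RetrievalQA.from_chain_type to also receive the chunks under result["source_documents"].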
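
To show which chunks backed an answer, the result can be formatted together with its sources. The sketch below assumes the chain was built with return_source_documents=True, so the result dictionary carries a "source_documents" list whose items expose page_content and metadata. StubDocument and format_answer are hypothetical names; the stub only stands in for LangChain's Document so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class StubDocument:
    """Stand-in exposing the two attributes read below."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_answer(result: dict, preview_chars: int = 60) -> str:
    """Render the answer followed by a numbered list of source previews."""
    lines = [result["result"], "", "Sources:"]
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        source = doc.metadata.get("source", "unknown")
        preview = doc.page_content[:preview_chars].replace("\n", " ")
        lines.append(f"[{i}] {source}: {preview}")
    return "\n".join(lines)


# Hand-written example result, not real model output.
fake_result = {
    "query": "What does the document say about retrieval?",
    "result": "It describes retrieval as fetching similar chunks.",
    "source_documents": [
        StubDocument("Retrieval fetches the chunks most similar to the query.",
                     {"source": "context/sample.txt"}),
    ],
}
print(format_answer(fake_result))
```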

Verification

```
python -c "from langchain_ollama import ChatOllama; print(ChatOllama(model='llama3').invoke('Hi'))"
# Expected: an AIMessage whose content is a short model-generated reply; the exact text varies between runs.
```

Common failures

Ollama server not running. Verify with curl http://localhost:11434. Start with ollama serve if the connection is refused.
Model not pulled. Run ollama pull llama3 before executing the chain.
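Chunk size too large for small documents. With chunk_size=500 characters, a short file may yield a single chunk, leaving the retriever nothing to choose between; lower chunk_size for small inputs. The 50-character chunk_overlap keeps text that straddles a chunk boundary retrievable.
Embedding model mismatch. The embedding model and the chat model are independent and need not be the same. What must match is the embedding model used to index the chunks and the one used to embed queries; if they differ, similarity scores are meaningless and retrieval degrades. Rebuild the vector store after changing the embedding model.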
Version mismatch. The installed LangChain packages or Python runtime differ from what the steps assume; check the versions first (pip show langchain langchain-ollama) and rerun the smallest verification command.
Local environment drift. A different service, virtual environment, model, or path is in use; print the active binary path and configuration before changing the guide steps.
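
The "model not pulled" case can be checked programmatically from the JSON that Ollama returns for GET /api/tags, which lists locally available models. The sketch below parses such a payload. The helper name model_is_pulled is hypothetical, and the sample payload is hand-written and trimmed to the one field used here.

```python
import json


def model_is_pulled(tags_payload: str, model: str) -> bool:
    """Check whether model appears in the JSON body returned by GET /api/tags."""
    names = [entry.get("name", "") for entry in json.loads(tags_payload).get("models", [])]
    # Assumption: a bare name such as "llama3" refers to the "latest" tag.
    wanted = model if ":" in model else f"{model}:latest"
    return wanted in names


sample = '{"models": [{"name": "llama3:latest"}, {"name": "nomic-embed-text:latest"}]}'
print(model_is_pulled(sample, "llama3"))   # True
print(model_is_pulled(sample, "mistral"))  # False
```

A real payload can be fetched with curl http://localhost:11434/api/tags once the server is running.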

How to Build a Basic RAG Pipeline with LangChain
