How to Build a Basic RAG Pipeline with LangChain
Prerequisites: Python 3.10+; the langchain, langchain-community, langchain-ollama, and chromadb packages installed; Ollama running locally
What this does
A Retrieval-Augmented Generation (RAG) pipeline connects a document store to a language model so that answers are grounded in your own data rather than generic training knowledge. This guide walks through creating an end-to-end pipeline using LangChain and Ollama, covering document loading, chunking, embedding, vector storage, and answering queries.
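To make the stages concrete before any libraries are involved, here is a minimal, dependency-free sketch of the same flow: chunk, embed, store, retrieve, and assemble a prompt. It is an illustration only, not how LangChain or Ollama work internally. The word-count "embedding", the fixed-size character windows, and every function name are assumptions made for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, chunk_size: int = 80, overlap: int = 20) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, store: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text a language model would receive."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


document = (
    "Refunds are processed within five business days of the return being received. "
    "Shipping to Europe takes roughly two weeks and is tracked by courier. "
    "Support is available by email on weekdays."
)

# "Vector store": each chunk paired with its toy embedding
store = [(chunk, embed(chunk)) for chunk in chunk_text(document)]

question = "How long do refunds take?"
top = retrieve(question, store, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In the real pipeline a trained embedding model replaces `embed`, a vector database replaces the `store` list, and the prompt is sent to the LLM instead of being printed.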
Steps
1. Install and configure Ollama. Ensure the service is reachable at the default local endpoint. `ChatOllama` and `OllamaEmbeddings` also accept the address directly through their `base_url` argument.

       import os

       # Default local Ollama endpoint
       os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"

2. Load documents. Use a text loader to ingest raw files.

       from langchain_community.document_loaders import TextLoader

       loader = TextLoader("context/sample.txt")
       docs = loader.load()

3. Split text into chunks. Chunking controls how much context fits in each retrieval unit. With this splitter, `chunk_size` and `chunk_overlap` are measured in characters by default.

       # Current import path; older guides import from langchain.text_splitter
       from langchain_text_splitters import RecursiveCharacterTextSplitter

       splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
       chunks = splitter.split_documents(docs)

4. Create embeddings and store vectors. Ollama powers the embedding model.

       from langchain_ollama import OllamaEmbeddings
       from langchain_community.vectorstores import Chroma

       embeddings = OllamaEmbeddings(model="llama3")
       db = Chroma.from_documents(chunks, embeddings)

5. Set up the retrieval chain. Combine a retriever with the LLM.

       from langchain_ollama import ChatOllama
       from langchain.chains import RetrievalQA

       llm = ChatOllama(model="llama3")
       chain = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())

6. Query the pipeline. Pass a natural-language question.

       result = chain.invoke("What does the document say about retrieval?")
       print(result["result"])

   Expected output: an answer grounded in the retrieved chunks; the exact wording varies by model. The source chunks themselves are returned only when the chain is built with `return_source_documents=True`.
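If you want the answer to show which chunks it drew on, one option is to build the chain with `return_source_documents=True`, in which case the result dictionary also carries a `source_documents` list. The sketch below formats such a result. `format_answer_with_sources` is a hypothetical helper written for this guide, and the `SimpleNamespace` objects are stand-ins that only mimic the `page_content` and `metadata` attributes of LangChain documents, so the snippet runs without Ollama.

```python
from types import SimpleNamespace


def format_answer_with_sources(result: dict, preview_chars: int = 60) -> str:
    """Render the answer followed by a numbered list of the chunks it was built from."""
    lines = [result["result"], "", "Sources:"]
    for i, doc in enumerate(result.get("source_documents", []), start=1):
        source = doc.metadata.get("source", "unknown")
        # Collapse whitespace so each preview stays on one line
        preview = " ".join(doc.page_content.split())[:preview_chars]
        lines.append(f"[{i}] {source}: {preview}")
    return "\n".join(lines)


# Stand-in for the dictionary chain.invoke(...) would return (assumed shape)
fake_result = {
    "query": "What does the document say about retrieval?",
    "result": "Retrieval selects the chunks most similar to the question.",
    "source_documents": [
        SimpleNamespace(
            page_content="Retrieval ranks chunks\nby similarity.",
            metadata={"source": "context/sample.txt"},
        ),
    ],
}

report = format_answer_with_sources(fake_result)
print(report)
```

Against the real pipeline, this would be called as `format_answer_with_sources(chain.invoke({"query": "..."}))` after creating the chain with `RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever(), return_source_documents=True)`.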
Verification
python -c "from langchain_ollama import ChatOllama; print(ChatOllama(model='llama3').invoke('Hi'))"
# Expected: an AIMessage whose content is the model's reply to 'Hi' (the wording varies between runs)
Common failures
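- Ollama server not running. Verify with `curl http://localhost:11434`. Start with `ollama serve` if the connection is refused.
- Model not pulled. Run `ollama pull llama3` before executing the chain.
- Chunk size too large for small documents. A document shorter than `chunk_size` ends up as a single chunk, so lower `chunk_size` (500 characters here) until the file splits into several retrievable units. The 50-character overlap helps prevent context gaps at chunk boundaries.
- Embedding model mismatch. Use the same embedding model for indexing and for querying; vectors produced by different embedding models are not comparable, which causes poor retrieval accuracy. The chat model does not have to match the embedding model.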
- Version mismatch. The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
- Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
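The first two failures can be checked from Python before running the chain. The sketch below queries Ollama's `/api/tags` endpoint, which lists locally available models, using only the standard library. The helper names are hypothetical, and the matching rule in `has_model` (an untagged name such as `llama3` matches any tag, such as `llama3:latest`) is an assumption made for this example.

```python
import json
import urllib.request


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def list_ollama_models(base_url: str = "http://localhost:11434", timeout: float = 3.0):
    """Return the model names the server reports, or None if it cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers non-JSON replies
        return None
    return parse_model_names(payload)


def has_model(names: list[str], wanted: str) -> bool:
    """Check whether `wanted` is installed; an untagged name matches any tag."""
    if ":" in wanted:
        return wanted in names
    return any(name.split(":", 1)[0] == wanted for name in names)


# Offline demonstration with a sample payload in the shape /api/tags returns
sample = {"models": [{"name": "llama3:latest"}, {"name": "nomic-embed-text:latest"}]}
names = parse_model_names(sample)
print(has_model(names, "llama3"))
print(has_model(names, "mistral"))
```

For a live check, call `list_ollama_models()`: a result of `None` points to the server not running, while a list that fails `has_model(..., "llama3")` points to a model that still needs pulling.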