RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
HOW-TO · RAG

How to Build a Basic RAG Pipeline with LangChain

Intermediate · 30 min · By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

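Python 3.10+; the langchain, langchain-community, langchain-ollama, langchain-text-splitters, and chromadb packages installed; Ollama running with the llama3 model pulled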

What this does

A Retrieval-Augmented Generation (RAG) pipeline connects a document store to a language model so that answers are grounded in your own data rather than generic training knowledge. This guide walks through creating an end-to-end pipeline using LangChain and Ollama, covering document loading, chunking, embedding, vector storage, and answering queries.

Steps

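  1. Install and configure Ollama. Ensure the service is reachable at the default local endpoint. The Ollama Python client that langchain_ollama wraps reads OLLAMA_HOST, so set it before creating any model object, or pass base_url to the constructors instead.

    import os
    # http://localhost:11434 is already the default; set this only to point at another host.
    os.environ["OLLAMA_HOST"] = "http://localhost:11434"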
    
  2. Load documents. Use a text loader to ingest raw files.

    from langchain_community.document_loaders import TextLoader
    
    loader = TextLoader("context/sample.txt")
    docs = loader.load()
    
  3. Split text into chunks. Chunking controls how much context fits in each retrieval unit.

    from langchain_text_splitters import RecursiveCharacterTextSplitter
    
    # chunk_size and chunk_overlap are measured in characters by default, not tokens.
    splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    chunks = splitter.split_documents(docs)
    
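  4. Create embeddings and store vectors. Ollama serves the embedding model, and Chroma holds the vectors in memory unless you pass a persist_directory.

    from langchain_ollama import OllamaEmbeddings
    from langchain_community.vectorstores import Chroma
    
    # The same embedding model must be used when indexing and when querying.
    embeddings = OllamaEmbeddings(model="llama3")
    db = Chroma.from_documents(chunks, embeddings)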
    
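  5. Set up the retrieval chain. Combine a retriever with the LLM. RetrievalQA is a legacy chain that recent LangChain releases mark as deprecated, so a deprecation warning here is expected.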

    from langchain_ollama import ChatOllama
    from langchain.chains import RetrievalQA
    
    llm = ChatOllama(model="llama3")
    chain = RetrievalQA.from_chain_type(llm=llm, retriever=db.as_retriever())
    
  6. Query the pipeline. Pass a natural-language question.

    result = chain.invoke("What does the document say about retrieval?")
    print(result["result"])
    

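    Expected output: an answer grounded in the retrieved chunks. RetrievalQA does not list its sources unless you pass return_source_documents=True when building the chain.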
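
To make steps 4 to 6 less opaque, here is a minimal pure-Python sketch of what a retriever and a "stuff"-style chain do conceptually: rank stored chunks by vector similarity to the query, then paste the best ones into the prompt. The three-dimensional vectors, the INDEX contents, and the prompt wording are all made up for illustration. Real embeddings have hundreds or thousands of dimensions, and Chroma's distance metric and RetrievalQA's prompt template may differ from what is shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

def build_prompt(question, context_chunks):
    """Concatenate retrieved chunks into one prompt (illustrative wording)."""
    context = "\n\n".join(context_chunks)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"

# Hypothetical toy vectors standing in for real embeddings.
INDEX = [
    ("Retrieval finds the chunks closest to the query.", [0.9, 0.1, 0.0]),
    ("Chunking splits documents into smaller pieces.", [0.1, 0.9, 0.0]),
    ("Ollama serves models on localhost.", [0.0, 0.1, 0.9]),
]

query_vec = [0.8, 0.2, 0.1]  # pretend this is the embedded question
hits = top_k(query_vec, INDEX, k=2)
prompt = build_prompt("What does retrieval do?", hits)
print(prompt)
```

The chunk about retrieval ranks first because its vector points in nearly the same direction as the query vector. In the real pipeline the LLM receives a prompt built this way, which is why answer quality depends heavily on which chunks are retrieved.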

Verification

python -c "from langchain_ollama import ChatOllama; print(ChatOllama(model='llama3').invoke('Hi'))"
# Expected: an AIMessage whose content is a short greeting. The exact wording varies between runs and models.
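
Before running the model check, it can help to confirm that the server itself answers. The sketch below uses only the standard library; the function name ollama_is_reachable and the 2-second timeout are illustrative choices, and it assumes the server replies with HTTP 200 at its root URL, as a default Ollama install does.

```python
import http.client
import urllib.error
import urllib.request

def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url with status 200."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or an error response.
        return False

if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_reachable())
```

A False result points at the server (not started, wrong port, firewall) rather than at LangChain or the model, which narrows down where to look next.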

Common failures

  • Ollama server not running. Verify with curl http://localhost:11434. Start with ollama serve if the connection is refused.
  • Model not pulled. Run ollama pull llama3 before executing the chain.
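  • Chunk size too large for small documents. chunk_size and chunk_overlap are counted in characters by default, not tokens, so a document shorter than 500 characters becomes a single chunk. Lower chunk_size if retrieval keeps returning the whole document. The 50-character overlap keeps text that straddles a chunk boundary retrievable.
  • Embedding model mismatch. Use the same embedding model when indexing and when querying; vectors from different models are not comparable, and mixing them causes poor retrieval accuracy. The chat model does not have to match the embedding model.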
  • Version mismatch. The installed package or runtime differs from the command shown; check the version first and rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being used; print the active binary path and configuration before changing the guide steps.
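
To see how chunk size and overlap interact, the following simplified sketch splits text into fixed-size character windows. It is not how RecursiveCharacterTextSplitter works internally: that splitter prefers to break on paragraph, line, and word boundaries, so its chunk edges will differ. The function name sliding_chunks is hypothetical, and the defaults mirror the values used in step 3.

```python
def sliding_chunks(text, chunk_size=500, chunk_overlap=50):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "x" * 1200
sizes = [len(chunk) for chunk in sliding_chunks(sample)]
print(sizes)

short_note = "A short note fits in one chunk."
print(len(sliding_chunks(short_note)))
```

With these settings each window starts 450 characters after the previous one, so neighbouring chunks share 50 characters. A text shorter than chunk_size yields exactly one chunk, which is the situation the first bullet on chunk size describes.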

Related guides

  • How to Implement Hybrid Search RAG (BM25 + Vector)
  • How to Add Reranking to Your RAG Pipeline