Simple RAG Pipeline — LangChain for Local AI (Chapter 14)

Retrieval-Augmented Generation (RAG) combines document retrieval with LLM generation. The pipeline has four steps: embed the documents, store the embeddings in a vector database, retrieve the chunks most relevant to the user's query, and pass those chunks to the LLM along with the question.
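To make the four steps concrete before bringing in any libraries, here is a minimal sketch in plain Python. It is not how LangChain or Chroma work internally: the bag-of-words `embed` function, the in-memory `store` list, and the sample `docs` are all hypothetical stand-ins chosen so the example runs without a model server.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, store: list, k: int = 2) -> list:
    """Step 3: return the k stored texts most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question: str, chunks: list) -> str:
    """Step 4: place the retrieved chunks and the question into one prompt."""
    context = "\n".join(chunks)
    return f"Context: {context}\n\nQuestion: {question}\n\nAnswer:"

# Hypothetical sample documents, for illustration only.
docs = [
    "Chapter 3 assignment: build a text splitter.",
    "Chapter 5 covers vector databases.",
    "Office hours are on Tuesday.",
]

# Steps 1 and 2: embed each document and keep (text, vector) pairs in memory.
store = [(d, embed(d)) for d in docs]

question = "What was the chapter 3 assignment?"
top = retrieve(question, store, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In the real pipeline the prompt would then be sent to the LLM; the rest of this chapter replaces each toy piece with an Ollama or Chroma component.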

First, set up the embedding and vector store.

from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma

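# Embedding model served by a local Ollama instance (11434 is Ollama's default port)
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434"
)

# Chroma collection persisted to disk so the index survives restarts
vectorstore = Chroma(
    collection_name="course_content",
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)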

Load and chunk the documents, then add the chunks to the vector store.

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = TextLoader("./course_notes.txt")
documents = loader.load()

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

vectorstore.add_documents(chunks)
print(f"Indexed {len(chunks)} chunks")
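To see what chunk_size=500 and chunk_overlap=50 mean in practice, the sketch below uses a fixed-size sliding window over characters. This is a simplification: RecursiveCharacterTextSplitter prefers to break on separators such as paragraph breaks, newlines, and spaces, so its real chunks are usually shorter than 500 characters. The `sliding_chunks` helper and the sample text are hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size windows; each window repeats the tail of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks

# Hypothetical 1200-character input, for illustration only.
sample = "".join(str(i % 10) for i in range(1200))
pieces = sliding_chunks(sample, chunk_size=500, chunk_overlap=50)

print([len(p) for p in pieces])          # [500, 500, 300]
print(pieces[0][-50:] == pieces[1][:50])  # True: 50 characters are shared
```

The overlap exists so that a sentence falling on a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.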

Now build the retrieval and generation chain.

from langchain_ollama import ChatOllama
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate

llm = ChatOllama(model="llama3.2", base_url="http://localhost:11434")

retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

prompt = PromptTemplate.from_template("""
Use the following context to answer the question.

Context: {context}

Question: {question}

Answer:""")

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",  # Stuff all retrieved docs into prompt
    chain_type_kwargs={"prompt": prompt},  # Use the custom prompt defined above
    return_source_documents=True
)

result = qa_chain.invoke({"query": "What was the assignment from chapter 3?"})
print(result["result"])
print(f"\nSources: {[doc.metadata for doc in result['source_documents']]}")

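The chain_type="stuff" parameter puts all retrieved documents into a single prompt. This works as long as the retrieved chunks and the question fit in the model's context window; this chapter treats roughly 4 retrieved documents as a practical ceiling. For larger retrieval sets, use map_reduce, which runs the LLM on each document separately and then combines the partial answers in a final call.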
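The difference between the two strategies is easiest to see by counting LLM calls. The sketch below is an illustration of the idea, not LangChain's implementation: `stuff`, `map_reduce`, and `fake_llm` are hypothetical helpers, and the fake model only records the prompts it receives.

```python
from typing import Callable

def stuff(llm: Callable[[str], str], question: str, docs: list[str]) -> str:
    """One call: every document goes into the same prompt."""
    context = "\n\n".join(docs)
    return llm(f"Context: {context}\n\nQuestion: {question}\n\nAnswer:")

def map_reduce(llm: Callable[[str], str], question: str, docs: list[str]) -> str:
    """One call per document (map), then one call to merge the answers (reduce)."""
    partials = [
        llm(f"Context: {doc}\n\nQuestion: {question}\n\nAnswer:")
        for doc in docs
    ]
    merged = "\n".join(partials)
    return llm(
        f"Partial answers:\n{merged}\n\nQuestion: {question}\n\nFinal answer:"
    )

calls: list[str] = []

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: records the prompt and returns a label."""
    calls.append(prompt)
    return f"answer#{len(calls)}"

docs = ["chunk A", "chunk B", "chunk C"]

stuff(fake_llm, "Q?", docs)
stuff_calls = len(calls)
longest_stuff_prompt = max(len(p) for p in calls)

calls.clear()
map_reduce(fake_llm, "Q?", docs)
map_reduce_calls = len(calls)

print(stuff_calls, map_reduce_calls)  # 1 4
```

With a local model, this is the trade-off to weigh: "stuff" needs one call but a prompt that grows with every retrieved document, while map_reduce keeps each prompt small at the cost of one extra call per document plus the final merge.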

Local verification checkpoint

Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
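One way to capture those details is a small script that writes them out as JSON next to the exercise. This is a sketch using only the standard library; the `environment_record` helper, the package list, and the data path are assumptions based on the imports used in this chapter, so adjust them to match the local setup.

```python
import json
import platform
import sys
from importlib import metadata

def package_version(name: str) -> str:
    """Return the installed version of a distribution, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

def environment_record(packages: list[str], data_path: str) -> dict:
    """Collect the runtime details worth noting beside an exercise."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "data_path": data_path,
        "packages": {name: package_version(name) for name in packages},
    }

# Assumed distribution names for the packages imported in this chapter.
record = environment_record(
    ["langchain", "langchain-ollama", "langchain-chroma", "chromadb"],
    data_path="./course_notes.txt",
)
print(json.dumps(record, indent=2))
```

Observed output, model size, and CPU/GPU backend are not detected by this script and still need to be noted by hand.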