14. Simple RAG Pipeline
Retrieval-Augmented Generation (RAG) combines document retrieval with LLM generation. The pipeline has four steps: embed the documents, store the embeddings in a vector database, retrieve the chunks most relevant to the user's query, and pass those chunks to the LLM along with the question.
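Before bringing in real libraries, the four steps can be sketched in plain Python. This is a toy illustration only: the word-count "embedding", the in-memory list used as a store, and the helper names (embed, cosine, retrieve, build_prompt) are assumptions made for this sketch, not part of LangChain, Chroma, or Ollama. The final LLM call is left out; the sketch stops at the prompt that would be sent.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, store, k=2):
    """Return the k stored texts whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    """Place the retrieved chunks and the question into one prompt string."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

# Steps 1 and 2: embed the documents and store (text, vector) pairs.
docs = [
    "chapter 3 assignment implement a binary search tree",
    "chapter 1 covers course logistics and grading",
    "chapter 5 assignment write a hash map",
]
store = [(d, embed(d)) for d in docs]

# Steps 3 and 4: retrieve the best chunk and build the prompt for the LLM.
question = "what is the chapter 3 assignment"
top_chunks = retrieve(question, store, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

A real embedding model captures meaning rather than exact word overlap, so the ranking quality here is not representative; only the shape of the pipeline is.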
First, set up the embedding and vector store.
from langchain_ollama import OllamaEmbeddings
from langchain_chroma import Chroma
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434"
)
vectorstore = Chroma(
    collection_name="course_content",
    embedding_function=embeddings,
    persist_directory="./chroma_db"
)
Load and chunk the documents, then add the chunks to the vector store.
from langchain_community.document_loaders import TextLoader
# Splitters live in the separate langchain-text-splitters package in current releases
from langchain_text_splitters import RecursiveCharacterTextSplitter
loader = TextLoader("./course_notes.txt")
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
vectorstore.add_documents(chunks)
print(f"Indexed {len(chunks)} chunks")
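To see what chunk_size and chunk_overlap control, here is a simplified sliding-window chunker. It is an assumption-laden sketch: RecursiveCharacterTextSplitter prefers to split on separators such as paragraph and line breaks, whereas this hypothetical chunk_text helper cuts at fixed character offsets. The small sizes are chosen only to make the overlap visible.

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
pieces = chunk_text(sample, chunk_size=10, chunk_overlap=3)
for piece in pieces:
    print(piece)
```

Each piece begins with the last three characters of the previous one. The overlap exists so that a sentence straddling a boundary still appears whole in at least one chunk.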
Now build the retrieval and generation chain.
from langchain_ollama import ChatOllama
from langchain.chains import RetrievalQA
from langchain_core.prompts import PromptTemplate
llm = ChatOllama(model="llama3.2", base_url="http://localhost:11434")
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
prompt = PromptTemplate.from_template("""
Use the following context to answer the question.
Context: {context}
Question: {question}
Answer:""")
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",  # Stuff all retrieved docs into one prompt
    chain_type_kwargs={"prompt": prompt},  # Use the custom prompt defined above
    return_source_documents=True
)
result = qa_chain.invoke({"query": "What was the assignment from chapter 3?"})
print(result["result"])
print(f"\nSources: {[doc.metadata for doc in result['source_documents']]}")
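The chain_type="stuff" parameter puts all retrieved documents into a single prompt. This works while the combined chunks fit in the model's context window; around 4 retrieved documents is a safe rule of thumb. For larger retrieval sets, use map_reduce, which runs the LLM on each document separately and then combines the partial answers in a final call.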
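The difference between the two strategies can be sketched without a model. The functions below are hypothetical stand-ins written for this illustration, not LangChain's implementation, and fake_llm only records the prompts it receives. The point is the call pattern: one large prompt versus several small prompts plus a combining call.

```python
def stuff(llm, question, docs):
    """One LLM call with every document placed in a single prompt."""
    context = "\n\n".join(docs)
    return llm(f"Context: {context}\nQuestion: {question}\nAnswer:")

def map_reduce(llm, question, docs):
    """One LLM call per document (map), then one call to merge the answers (reduce)."""
    partial = [llm(f"Context: {d}\nQuestion: {question}\nAnswer:") for d in docs]
    combined = "\n".join(partial)
    return llm(f"Combine these partial answers.\n{combined}\nQuestion: {question}\nAnswer:")

calls = []

def fake_llm(prompt):
    """Stand-in for a model: records each prompt and returns a placeholder answer."""
    calls.append(prompt)
    return f"answer#{len(calls)}"

docs = ["chunk A", "chunk B", "chunk C"]

stuff_answer = stuff(fake_llm, "q?", docs)
stuff_calls = len(calls)

calls.clear()
map_reduce_answer = map_reduce(fake_llm, "q?", docs)
map_reduce_calls = len(calls)

print(f"stuff: {stuff_calls} call, map_reduce: {map_reduce_calls} calls")
```

The trade-off is prompt size against number of calls: stuff is cheapest when everything fits, while map_reduce keeps each prompt small at the cost of extra model calls.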
Local verification checkpoint
Run the smallest example from this chapter in a local workspace and record the package version, runtime, data path, and observed output. If the result depends on model size, vector count, CPU/GPU backend, or available memory, note that constraint beside the exercise so the lesson remains reproducible.
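One way to capture that record is a short script that prints the interpreter version, platform, data path, and installed package versions. This is a sketch under assumptions: the package names listed are the PyPI distributions this chapter appears to rely on, and the helper names are made up for illustration. Adjust both to your own environment.

```python
import json
import platform
import sys
from importlib import metadata

def package_version(name):
    """Return the installed version of a distribution, or a marker if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"

def environment_report(packages, data_path):
    """Collect the details worth recording beside an exercise."""
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "data_path": data_path,
        "packages": {name: package_version(name) for name in packages},
    }

report = environment_report(
    ["langchain", "langchain-ollama", "langchain-chroma", "chromadb"],  # assumed package list
    data_path="./course_notes.txt",
)
print(json.dumps(report, indent=2))
```

Model size, backend, and observed output still have to be noted by hand; the script covers only what can be read from the Python environment.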
Create a RAG pipeline using three separate text files as sources. Query it with a question answerable by only one file, then verify the source documents come from that file.
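For the verification step, a small helper can check the metadata of the returned source documents. TextLoader records the file path under the "source" metadata key; the helper name sources_match and the file names below are hypothetical, chosen only for this sketch.

```python
from pathlib import Path

def sources_match(source_metadata, expected_file):
    """True if at least one source was returned and all of them come from expected_file."""
    expected = Path(expected_file).name
    names = [Path(m.get("source", "")).name for m in source_metadata]
    return bool(names) and all(n == expected for n in names)

# Hypothetical metadata, shaped like [doc.metadata for doc in result["source_documents"]]
example_metadata = [{"source": "./notes/chapter3.txt"}, {"source": "./notes/chapter3.txt"}]
mixed_metadata = [{"source": "./notes/chapter3.txt"}, {"source": "./notes/chapter1.txt"}]

print(sources_match(example_metadata, "chapter3.txt"))
print(sources_match(mixed_metadata, "chapter3.txt"))
```

With k=3 the retriever always returns three chunks, so a mixed result does not necessarily mean the pipeline is broken. It may only mean the target file has fewer than three relevant chunks.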