01. What is RAG?
RAG stands for Retrieval-Augmented Generation. The name describes the process: retrieve relevant documents, then generate an answer using those documents as context.
The problem RAG solves is fundamental to LLMs: models are trained on a fixed snapshot of data. Your product documentation, internal policies, and customer records were not part of that snapshot. When a user asks about your specific product, the model can only hallucinate an answer or say it does not know.
RAG gives the model a third option: look up the answer in real documents.
The retrieval-generation pipeline
A RAG system has three stages:
Ingestion: Convert documents into a format the system can process. PDFs become text. HTML pages are parsed. Markdown files are read.
Indexing: Split documents into chunks, embed each chunk, and store embeddings in a vector database with references back to the original text.
Query: When a user asks a question, embed the question, find the most similar chunks in the vector database, and pass those chunks to the LLM as context.
The LLM then generates an answer using both the user's question and the retrieved context. This is why RAG answers feel grounded: they are explicitly tied to source documents.
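The three stages can be sketched in a few lines of plain Python. This is a minimal illustration, not a production design: the word-count "embedding", the in-memory list standing in for a vector database, the function names, and the sample documents are all assumptions made for this example. A real system would use an embedding model and a vector store.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def chunk(text, max_words=12):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def build_index(documents):
    """Indexing: chunk each document, embed each chunk, keep a reference to its source."""
    index = []
    for source, text in documents.items():
        for piece in chunk(text):
            index.append({"source": source, "text": piece, "vector": embed(piece)})
    return index

def retrieve(index, question, k=2):
    """Query: embed the question and return the k most similar chunks."""
    q = embed(question)
    return sorted(index, key=lambda e: cosine(q, e["vector"]), reverse=True)[:k]

# Ingestion is assumed done: these made-up documents are already plain text.
documents = {
    "returns.md": "Electronics purchased after January 2024 can be returned within 30 days.",
    "shipping.md": "Standard shipping takes five business days. Express shipping takes two.",
}
index = build_index(documents)
top = retrieve(index, "What is the return policy for electronics?", k=1)
print(top[0]["source"], "->", top[0]["text"])
```

The retrieved chunk keeps its source field, which is what lets the final answer point back to a specific document.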
Why not just fine-tune?
Fine-tuning teaches a model new patterns by training on examples. It works for style transfer and task specialization. It does not work well for injecting specific facts that change frequently.
Fine-tuning costs GPU time and takes hours to days. Updating a RAG index is typically far faster and cheaper: re-embedding a handful of changed documents usually takes seconds and costs cents. If your product inventory changes daily, fine-tuning is the wrong tool. RAG lets you update the knowledge base without retraining anything.
Concrete example
User question: "What is the return policy for electronics purchased after January 2024?"
Without RAG: The model might output a generic return policy from 2022 training data.
With RAG: The system retrieves chunks discussing the January 2024 electronics return policy update. The LLM generates an answer citing those specific sections. If the policy changes, you update the index. The model does not need retraining.
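One way to picture the "With RAG" path is the prompt that gets sent to the LLM. The sketch below assumes the chunks have already been retrieved. The build_prompt helper, the file name, and the policy wording are hypothetical, invented only to show the shape of a grounded prompt.

```python
def build_prompt(question, chunks):
    """Assemble a prompt that asks the model to answer only from the given chunks."""
    context = "\n".join(
        f"[{i}] ({c['source']}) {c['text']}" for i, c in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite sources by their bracketed number. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )

# Hypothetical retrieved chunk; the policy text is made up for illustration.
retrieved = [
    {
        "source": "returns-2024.md",
        "text": "Electronics purchased after January 2024 may be returned within 30 days.",
    },
]
question = "What is the return policy for electronics purchased after January 2024?"
prompt = build_prompt(question, retrieved)
print(prompt)

# When the policy changes, only the stored chunk changes; the model is untouched.
retrieved[0]["text"] = "Electronics purchased after January 2024 may be returned within 14 days."
updated_prompt = build_prompt(question, retrieved)
```

The same question produces a different grounded prompt as soon as the stored text changes, which is the practical meaning of updating the index instead of retraining.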
The three failure modes
RAG quality depends on three things: document quality, chunk quality, and retrieval quality. If your documents are poorly formatted, chunks will be incoherent. If chunks are too large or too small, retrieval will miss relevant content. If your embedding model does not match your domain, similarity search will return wrong documents.
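Chunk size is the easiest of these to experiment with. The following sketch splits text into fixed-size word windows with optional overlap. The chunk_words name and the word-based splitting are assumptions for illustration; real pipelines often split by tokens, sentences, or document structure instead.

```python
def chunk_words(text, size, overlap=0):
    """Split text into chunks of `size` words, repeating `overlap` words between neighbors."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks

text = "one two three four five six seven eight nine ten"
print(chunk_words(text, size=4))
# ['one two three four', 'five six seven eight', 'nine ten']
print(chunk_words(text, size=4, overlap=2))
# ['one two three four', 'three four five six', 'five six seven eight', 'seven eight nine ten']
```

Overlap trades a larger index for a lower chance that a sentence relevant to the question is cut in half at a chunk boundary.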
This course teaches you to control all three.
Install the packages you will use in this course: pip install pymupdf chromadb openai tiktoken. PyMuPDF is imported under the module name fitz, so verify the installation by running python -c "import fitz; print(fitz.__version__)". If you see a version number, PyMuPDF installed correctly.
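To check all four packages at once, a short script along these lines can help. It is only a sketch: the PACKAGES mapping and the check_packages helper are names made up for this example, and it tests whether each module can be found, not whether it works.

```python
import importlib.util

# pip package name -> module name used in import statements.
# PyMuPDF is installed as "pymupdf" but imported as "fitz".
PACKAGES = {
    "pymupdf": "fitz",
    "chromadb": "chromadb",
    "openai": "openai",
    "tiktoken": "tiktoken",
}

def check_packages(packages=PACKAGES):
    """Return {pip_name: True/False} depending on whether the module can be located."""
    return {
        pip_name: importlib.util.find_spec(module) is not None
        for pip_name, module in packages.items()
    }

status = check_packages()
for pip_name, ok in status.items():
    print(f"{pip_name}: {'ok' if ok else 'MISSING'}")
```

Any package reported as MISSING can be installed again with pip before continuing.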