RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
RAG Systems: Part 1

01. What is RAG?

Chapter 1 of 22 · 15 min
KEY INSIGHT

RAG gives LLMs real-time access to your documents by retrieving relevant chunks at query time instead of relying on training data.

```python
# The three stages in pseudo-code
documents = ingest("your/documents/")
chunks = chunk(documents)
index(chunks)
context = retrieve("user query")
answer = generate("user query", context)
```

RAG stands for Retrieval-Augmented Generation. The name describes the process: retrieve relevant documents, then generate an answer using those documents as context.

The problem RAG solves is fundamental to LLMs. Models are trained on a fixed snapshot of data. Your product documentation, internal policies, and customer records were not part of that training data. When a user asks about your specific product, the model can do one of two things: hallucinate an answer or say it does not know.

RAG gives the model a third option: look up the answer in real documents.

The retrieval-generation pipeline

A RAG system has three stages:

  1. Ingestion: Convert documents into a format the system can process. PDFs become text. HTML pages are parsed. Markdown files are read.

  2. Indexing: Split documents into chunks, embed each chunk, and store embeddings in a vector database with references back to the original text.

  3. Query: When a user asks a question, embed the question, find the most similar chunks in the vector database, and pass those chunks to the LLM as context.

The LLM then generates an answer using both the user's question and the retrieved context. This is why RAG answers feel grounded: they are explicitly tied to source documents.
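The index-then-query flow above can be sketched with nothing but the standard library. This is a toy illustration, not the course's implementation: the `embed` function below is a word-count stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample chunks are made up.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[Counter, str]]:
    """Indexing: store each embedding next to the original chunk text."""
    return [(embed(chunk), chunk) for chunk in chunks]


def retrieve(index: list[tuple[Counter, str]], question: str, k: int = 2) -> list[str]:
    """Query: embed the question and return the k most similar chunks."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


# Made-up sample chunks, for illustration only.
chunks = [
    "Electronics can be returned within 30 days of purchase.",
    "Shipping is free for orders over 50 dollars.",
    "Gift cards cannot be refunded.",
]
index = build_index(chunks)
context = retrieve(index, "How many days do I have to return electronics?", k=1)
print(context)
```

Word overlap is a crude similarity signal (it does not even match "return" with "returned"), which is one reason real systems use learned embeddings instead.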

Why not just fine-tune?

Fine-tuning teaches a model new patterns by training on examples. It works for style transfer and task specialization. It does not work well for injecting specific facts that change frequently.

Fine-tuning costs GPU time and takes hours to days. Updating a RAG index means re-embedding only the documents that changed, which typically takes seconds and costs cents. If your product inventory changes daily, fine-tuning is the wrong tool. RAG lets you update the knowledge base without retraining anything.
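To make the update path concrete, here is a minimal sketch of an in-memory index keyed by document ID. The `upsert` helper, the `sku-123` ID, and the stock figures are hypothetical, and the word-count `embed` is a stand-in for a real embedding model. The point is only that changing a fact means overwriting one entry, with no training step involved.

```python
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (assumption: stands in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


# Hypothetical in-memory index: document ID -> (embedding, original text).
index: dict[str, tuple[Counter, str]] = {}


def upsert(doc_id: str, text: str) -> None:
    """Insert a new document or overwrite the existing entry with the same ID."""
    index[doc_id] = (embed(text), text)


# Monday's inventory record (made-up data).
upsert("sku-123", "Blue widget: 14 in stock.")
# Tuesday's update replaces the old record; nothing is retrained.
upsert("sku-123", "Blue widget: 9 in stock.")

print(index["sku-123"][1])
```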

Concrete example

User question: "What is the return policy for electronics purchased after January 2024?"

Without RAG: The model might output a generic return policy from 2022 training data.

With RAG: The system retrieves chunks discussing the January 2024 electronics return policy update. The LLM generates an answer citing those specific sections. If the policy changes, you update the index. The model does not need retraining.
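One way the generation step might tie an answer to specific sections is to number the retrieved chunks in the prompt and ask the model to cite them. The sketch below only builds the prompt string. The `build_prompt` helper, the prompt wording, the `source` labels, and the chunk text are all illustrative assumptions, and the call to an actual LLM is left out.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a prompt that numbers each retrieved chunk so the answer can cite it."""
    sources = "\n\n".join(
        f"[{i}] ({chunk['source']}) {chunk['text']}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder chunks standing in for whatever the retriever returned.
retrieved = [
    {"source": "returns-policy.md, section 3", "text": "<text of the January 2024 electronics update>"},
    {"source": "returns-policy.md, section 1", "text": "<text of the general returns section>"},
]
question = "What is the return policy for electronics purchased after January 2024?"
prompt = build_prompt(question, retrieved)
print(prompt)
```

Because the prompt is rebuilt from the index on every query, a policy change only requires updating the indexed chunks.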

The three failure modes

RAG quality depends on three things: document quality, chunk quality, and retrieval quality. If your documents are poorly formatted, chunks will be incoherent. If chunks are too large or too small, retrieval will miss relevant content. If your embedding model does not match your domain, similarity search will return wrong documents.
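Chunk size is the easiest of these to experiment with. The following sketch splits text into fixed-size word windows with optional overlap. The `chunk_words` name, the word-based splitting, and the sample values are assumptions chosen for illustration; real pipelines often split on tokens, sentences, or headings instead.

```python
def chunk_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of `size` words, repeating `overlap` words between neighbours."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must satisfy 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "a b c d e f g h i j"
print(chunk_words(sample, size=4, overlap=1))  # three overlapping chunks
print(chunk_words(sample, size=10))            # one chunk holding everything
```

Small windows give precise matches but can cut a sentence away from the context that explains it; large windows keep context but dilute the similarity signal.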

This course teaches you to control all three.

EXERCISE

Install the packages you will use in this course: pip install pymupdf chromadb openai tiktoken. Verify the installation by running python -c "import fitz; print(fitz.__version__)". If you see a version number, PyMuPDF installed correctly.
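If you would rather check all four packages in one go, a short script along these lines may help. It is a sketch, not part of the course material: `check_packages` is a hypothetical helper, and it assumes the import names are `fitz` (for PyMuPDF), `chromadb`, `openai`, and `tiktoken`. It only reports whether each module can be located and does not import or configure anything.

```python
import importlib.util


def check_packages(module_names: list[str]) -> dict[str, bool]:
    """Report, for each top-level module name, whether Python can locate it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


# Assumed import names for the packages installed in the exercise.
status = check_packages(["fitz", "chromadb", "openai", "tiktoken"])
for name, found in status.items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```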
