RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
DISCLOSURE

Some links on this site are affiliate links (Amazon Associates and other first-class retailers). When you buy through them, we earn a small commission at no extra cost to you. Affiliate links do not influence our verdicts — there are cards we rate highly that we don't have affiliate relationships with, and cards that sell well that we refuse to recommend. Read more →

© 2026 runlocalai.co · Independently operated
COURSE · FND · B013

RAG Systems: Part 1

Learn the foundations of RAG systems through RunLocalAI's practical lens: retrieval, chunking and ingestion, hardware fit, runtime settings, verification habits and local-vs-cloud tradeoffs.

22 chapters · 10h · Foundations track · By Fredoline Eruo
PREREQUISITES
  • B011
  • B012

Why this course matters

RAG Systems: Part 1 is for new local AI users who need clean mental models before changing settings. It connects RAG concepts (retrieval, chunking, ingestion and pipeline design) to the questions RunLocalAI wants every reader to answer before they install, upgrade or scale a model: will it run, what will it cost in memory, what setting changes the result, and how do you verify the answer instead of trusting a demo?

What you will be able to do

By the end, you should be able to explain the main tradeoffs in plain language, choose a safe next experiment, and use the chapter exercises as a repeatable operator checklist. The course favors local evidence, hardware fit, context limits, latency and failure modes over generic AI vocabulary.

How to use this course

Start at chapter one if the topic is new. If you already have a working stack, scan for chapters such as "What is RAG?", "RAG Architecture Overview", "PDF Ingestion with PyMuPDF" and "HTML Ingestion", and use those lessons as a quality-control pass before changing a workstation, team workflow or production-like local deployment.

CHAPTERS
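  1. 01 · What is RAG? · 15 min
     RAG gives LLMs access to your documents by retrieving relevant chunks at query time instead of relying only on training data.
     ```python
     # The three stages in pseudo-code (the helper functions are placeholders, not a real API)
     # Stage 1: ingestion and indexing, done ahead of time
     documents = ingest("your/documents/")
     chunks = chunk(documents)
     index(chunks)
     # Stage 2: retrieval at query time
     context = retrieve("user query")
     # Stage 3: generation grounded in the retrieved context
     answer = generate("user query", context)
     ```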
  2. 02 · RAG Architecture Overview · 20 min
     RAG architecture flows from documents through loading, chunking, embedding, and storage to retrieval, with metadata tracking provenance at every step.
     ```python
     # Class skeleton showing the architecture (components are assigned before use)
     class RAGPipeline:
         def __init__(self):
             self.loader = None           # Document loader
             self.splitter = None         # Text splitter
             self.embedding_model = None  # Embedding model
             self.vector_store = None     # ChromaDB

         def ingest(self, documents):
             texts = self.loader.load(documents)
             chunks = self.splitter.split(texts)
             embeddings = self.embedding_model.embed(chunks)
             self.vector_store.add(chunks, embeddings)

         def query(self, user_query, top_k=5):
             query_embedding = self.embedding_model.embed([user_query])
             results = self.vector_store.similarity_search(query_embedding, k=top_k)
             return results
     ```
  3. 03 · PDF Ingestion with PyMuPDF · 25 min
     PDF extraction quality depends on layout analysis. Use position-based sorting for multi-column documents and validate output to catch encoding and OCR failures.
     ```python
     import fitz  # PyMuPDF

     # Minimal working example
     doc = fitz.open("document.pdf")
     for page in doc:
         print(page.get_text())
     doc.close()
     ```
  4. 04 · HTML Ingestion · 25 min
     HTML's semantic structure lets you extract content by heading boundaries, preserving contextual relationships that PDFs lack.
     ```python
     from bs4 import BeautifulSoup

     # Minimal working example (the "lxml" parser requires the lxml package)
     html = "<html><body><h1>Title</h1><p>Content here</p></body></html>"
     soup = BeautifulSoup(html, "lxml")
     print(soup.find("p").get_text())
     ```
  5. 05 · Markdown Ingestion · 25 min
     Markdown's heading syntax maps directly to document hierarchy, making heading-based chunking natural and semantically coherent.
     ```python
     # Minimal working example: list the heading lines of a Markdown file
     from pathlib import Path

     md_content = Path("readme.md").read_text(encoding="utf-8")
     sections = [line for line in md_content.split("\n") if line.startswith("#")]
     print(sections)
     ```
  6. 06 · Fixed-Size Chunking · 25 min
     Fixed-size chunking is fast but ignores semantic boundaries, often splitting paragraphs and separating headers from their content.
     ```python
     import tiktoken

     # Count tokens, the unit that fixed-size chunk limits are measured in
     encoder = tiktoken.get_encoding("cl100k_base")
     text = "Hello world"
     tokens = encoder.encode(text)
     print(f"Token count: {len(tokens)}")  # Token count: 2
     ```
  7. 07 · Semantic Chunking · 25 min
     Semantic chunking keeps related sentences together by measuring embedding similarity, producing internally coherent chunks even when token counts vary.
     ```python
     from sentence_transformers import SentenceTransformer

     encoder = SentenceTransformer("all-MiniLM-L6-v2")
     sentences = ["The cat sat on the mat.", "It was a sunny day."]
     embeddings = encoder.encode(sentences)
     print(f"Embedding shape: {embeddings.shape}")  # (2, 384)
     ```
  8. 08 · Recursive Character Splitter · 30 min
     Recursive character splitting respects document structure (paragraphs, lines) while guaranteeing chunk sizes stay within limits by falling back to smaller separators.
     ```python
     # Minimal working example: the first pass splits on paragraph breaks
     text = "Paragraph one.\n\nParagraph two.\n\nParagraph three."
     separator = "\n\n"
     chunks = text.split(separator)
     print(chunks)  # ['Paragraph one.', 'Paragraph two.', 'Paragraph three.']
     ```
  9. 09 · Document Metadata Extraction · 30 min
     Metadata adds structured filtering on top of similarity search, enabling queries like "only documents from this year" or "only from this section."
     ```python
     # Minimal metadata example
     chunk = {
         "text": "The return policy...",
         "metadata": {
             "source": "policies.pdf",
             "year": 2024,
             "section": "electronics",
         },
     }
     ```
  10. 10 · Embedding Pipeline · 30 min
      Embedding quality determines retrieval quality. Batch by token limits to prevent context overflow, and normalize vectors for consistent similarity calculations.
      ```python
      from sentence_transformers import SentenceTransformer

      model = SentenceTransformer("all-MiniLM-L6-v2")
      emb = model.encode("Hello world")
      # Report the dimension from the vector itself rather than hard-coding it
      print(f"Embedding: {emb[:5]}... ({len(emb)} dims)")
      ```
  11. 11 · Storing Embeddings in ChromaDB · 30 min
      ChromaDB stores embeddings alongside metadata, enabling fast similarity search with metadata filtering. Batch insertion and proper indexing are essential for handling large document sets.
      ```python
      import chromadb

      # In-memory client: data is not persisted between runs
      client = chromadb.Client()
      collection = client.get_or_create_collection("test")
      collection.add(ids=["1"], embeddings=[[1.0, 2.0]], documents=["hello"])
      print(collection.query(query_embeddings=[[1.0, 2.0]], n_results=1))
      ```
  12. 12 · Retrieval Strategies · 20 min
      Hybrid search with reranking consistently outperforms any single retrieval method across diverse query types.
  13. 13 · Dense Retrieval · 20 min
      Dense retrieval quality depends more on embedding model choice and fine-tuning than on index parameters.
  14. 14 · Sparse Retrieval (BM25) · 20 min
      BM25 excels at exact-term queries but needs to be paired with dense retrieval to handle semantic queries effectively.
  15. 15 · Context Assembly · 20 min
      Context assembly quality matters as much as retrieval quality. Well-organized context reduces hallucinations caused by confusing source ordering.
  16. 16 · Prompt with Retrieved Context · 25 min
      Requiring citations in the prompt reduces hallucination by forcing the model to attribute each claim to retrieved context.
  17. 17 · Basic Generation Pipeline · 20 min
      Pipeline quality depends on the weakest link. Optimize retrieval quality first: generation cannot fix poor context.
  18. 18 · RAG Evaluation: Hit Rate · 25 min
      Set hit rate targets based on application tolerance for missed information, not arbitrary thresholds.
  19. 19 · RAG Evaluation: MRR · 25 min
      MRR (mean reciprocal rank) captures ranking quality. Systems with a high hit rate but low MRR retrieve relevant content but rank it poorly; reranking addresses this.
  20. 20 · Common RAG Failures · 25 min
      80% of RAG failures trace to retrieval problems, not generation problems. Debug retrieval before adjusting prompts or models.
  21. 21 · RAG Pipeline Optimization · 25 min
      Optimize the bottleneck stage first. For most RAG systems, LLM generation is the latency bottleneck, so try a faster model before optimizing retrieval speed.
  22. 22 · Part 1 Final Project · 30 min
      This capstone integrates all course concepts. A well-organized pipeline with proper configuration management is more maintainable than clever one-liners.
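
Chapters 18 and 19 in the list above name two retrieval metrics, hit rate and MRR. As a rough illustration of how they differ, here is a minimal sketch in plain Python. The function names, the chunk IDs and the evaluation data are all hypothetical and are not taken from the course material; the sketch assumes each query has a ranked list of retrieved chunk IDs and a set of IDs judged relevant.

```python
def hit_rate(results, relevant, k=5):
    """Fraction of queries with at least one relevant ID in the top k results."""
    hits = sum(
        1
        for retrieved, gold in zip(results, relevant)
        if any(doc_id in gold for doc_id in retrieved[:k])
    )
    return hits / len(results)


def mean_reciprocal_rank(results, relevant):
    """Average of 1/rank of the first relevant ID (0 when none is retrieved)."""
    total = 0.0
    for retrieved, gold in zip(results, relevant):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in gold:
                total += 1.0 / rank
                break
    return total / len(results)


# Hypothetical evaluation data: ranked chunk IDs per query, and the relevant IDs.
retrieved_ids = [["c1", "c7", "c3"], ["c4", "c2", "c9"], ["c5", "c6", "c8"]]
relevant_ids = [{"c1"}, {"c9"}, {"c0"}]

print(f"hit rate@3: {hit_rate(retrieved_ids, relevant_ids, k=3):.3f}")
print(f"MRR:        {mean_reciprocal_rank(retrieved_ids, relevant_ids):.3f}")
```

In this toy data the second query finds its relevant chunk only at rank 3, so it counts fully toward hit rate but contributes just 1/3 to MRR. That gap is the "retrieves but ranks poorly" pattern chapter 19 describes.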
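
Chapter 8 describes recursive character splitting as falling back to smaller separators when a piece is still too large. The sketch below shows one way that fallback could work in plain Python. It is an illustrative assumption, not the course's implementation or any library's API: the function name `recursive_split`, the `max_chars` limit and the separator order are all hypothetical, and sizes are measured in characters rather than tokens.

```python
def recursive_split(text, max_chars=40, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most max_chars, trying coarse separators first."""
    if len(text) <= max_chars:
        return [text] if text else []
    if not separators:
        # No separators left: hard-cut at the size limit.
        return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = f"{current}{sep}{part}" if current else part
        if len(candidate) <= max_chars:
            current = candidate  # Keep merging neighbours while they fit.
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_chars:
            current = part
        else:
            # This part alone is too large: retry it with finer separators.
            chunks.extend(recursive_split(part, max_chars, rest))
    if current:
        chunks.append(current)
    return chunks


sample = (
    "Paragraph one.\n\n"
    "Paragraph two is quite a bit longer than the limit.\n\n"
    "Three."
)
chunks = recursive_split(sample, max_chars=40)
for piece in chunks:
    print(len(piece), repr(piece))
```

Short paragraphs stay whole, and only the oversized middle paragraph is broken at word boundaries, so every chunk respects the limit without cutting words in half.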