18. Multi-Modal RAG

Chapter 18 of 24 · 25 min
EXERCISE

Extract the images from a PDF, use a vision model to describe each one, index those descriptions alongside the text chunks, and run a query that retrieves both text and image content. Verify that the generated answer references both. (15 min)
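
One possible starting point is sketched below. It is a minimal, standard-library-only outline of the indexing and retrieval steps, not a reference solution. Several pieces are assumptions made for illustration: `describe_image` is a hypothetical stand-in that returns a canned caption where a real vision-model call would go, the text chunks and image bytes are hard-coded placeholders rather than output from a real PDF (a library such as PyMuPDF could supply them), and bag-of-words cosine similarity is swapped in for a real embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    kind: str     # "text" or "image"
    source: str   # where the chunk came from, e.g. "page 3, image 1"
    content: str  # chunk text, or the description generated for an image


def describe_image(image_bytes: bytes) -> str:
    """Hypothetical stand-in for a vision-model call (assumption, not a real API)."""
    return "Bar chart of quarterly revenue by region, with Europe highest in Q4."


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts; a toy substitute for an embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(text_chunks, images):
    """Put text chunks and image descriptions into one shared index."""
    chunks = [Chunk("text", src, txt) for src, txt in text_chunks]
    chunks += [Chunk("image", src, describe_image(data)) for src, data in images]
    return [(chunk, tokenize(chunk.content)) for chunk in chunks]


def retrieve(index, query, k=3):
    """Return the k chunks most similar to the query, best first."""
    q = tokenize(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def covers_both_modalities(chunks) -> bool:
    """True if the retrieved set contains at least one text and one image chunk."""
    return {"text", "image"} <= {c.kind for c in chunks}


def answer_cites_both(answer: str, chunks) -> bool:
    """True if the answer mentions a source label of each kind (assumes the
    generator was prompted to cite sources by their labels)."""
    cited = {c.kind for c in chunks if c.source in answer}
    return {"text", "image"} <= cited


# Placeholder inputs standing in for real PDF extraction output.
text_chunks = [
    ("page 2", "Revenue grew 12 percent year over year, driven by the Europe region."),
    ("page 5", "The onboarding process for new employees takes two weeks."),
]
images = [("page 3, image 1", b"placeholder image bytes")]

index = build_index(text_chunks, images)
hits = retrieve(index, "Which region had the highest revenue?", k=2)
for hit in hits:
    print(f"[{hit.kind}] {hit.source}: {hit.content}")
print("Both modalities retrieved:", covers_both_modalities(hits))
```

Storing the image description as an ordinary chunk with a `kind` tag keeps a single index and a single retrieval path, and the tag is what makes the final verification step checkable. For the generated answer itself, one option is to prompt the model to cite each chunk by its `source` label and then pass the answer through a check like `answer_cites_both`.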