Recall

Recall measures the fraction of relevant items that a retrieval or classification system successfully finds. In local AI, recall appears in two contexts: (1) RAG retrieval, where it is the share of the truly relevant documents that the retriever returns from a vector database, and (2) classification tasks, where it is true positives divided by all actual positives (true positives plus false negatives). High recall means few relevant items are missed. Raising recall usually costs precision, because casting a wider net also admits more false positives. Operators tune this tradeoff via retrieval parameters (e.g., top_k, similarity threshold) or classification thresholds. Recall matters because a RAG pipeline with low recall fails to retrieve documents the LLM needs, which degrades answer quality.
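The threshold tradeoff described above can be illustrated with a minimal sketch. The scores, labels, and the helper name precision_recall are hypothetical and chosen only for illustration; they do not come from any particular library or dataset.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) when items scoring >= threshold are predicted positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical classifier scores with ground-truth labels (1 = actual positive).
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

strict = precision_recall(scores, labels, threshold=0.75)
loose = precision_recall(scores, labels, threshold=0.25)

print("threshold 0.75 -> precision, recall:", strict)
print("threshold 0.25 -> precision, recall:", loose)
```

With this toy data, the strict threshold finds only half of the actual positives but makes no false-positive errors, while the loose threshold finds all of them at the cost of two false positives.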

In a RAG pipeline using ChromaDB with all-MiniLM-L6-v2 embeddings, suppose a query has 5 truly relevant documents. Recall@10 is the fraction of those 5 that appear among the top 10 retrieved chunks: if only 3 of the 5 are returned, recall@10 is 3/5 = 0.6. Raising top_k (for example, from 5 to 20) typically improves recall, but it also increases context size and latency.
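The arithmetic in this example can be reproduced with a short sketch. It assumes each retrieved chunk is tagged with the ID of its source document and that the set of relevant IDs has been labeled by hand; the IDs and the helper name recall_at_k are hypothetical, and no ChromaDB call is made.

```python
def recall_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of the relevant IDs that appear in the first k retrieved IDs."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("recall is undefined when there are no relevant items")
    hits = relevant.intersection(retrieved_ids[:k])
    return len(hits) / len(relevant)


# Hypothetical ground truth: the 5 documents that are truly relevant to the query.
relevant_ids = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]

# Hypothetical ranked retriever output (best match first); "n*" entries are irrelevant.
retrieved_ids = [
    "doc_a", "n1", "doc_c", "n2", "n3", "n4", "doc_e", "n5", "n6", "n7",
    "n8", "doc_b", "n9", "n10", "n11", "n12", "n13", "n14", "n15", "n16",
]

recall_10 = recall_at_k(retrieved_ids, relevant_ids, k=10)
recall_20 = recall_at_k(retrieved_ids, relevant_ids, k=20)

print("recall@10:", recall_10)
print("recall@20:", recall_20)
```

In this made-up ranking, 3 of the 5 relevant documents fall in the top 10, and a fourth only shows up once top_k is widened to 20, at the price of ten extra chunks in the context.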

When evaluating a RAG pipeline with ragas, the context_recall metric estimates how much of a ground-truth reference answer is supported by the retrieved chunks. In LM Studio, operators set the 'Number of chunks to retrieve' slider (top_k) in the RAG settings; raising it from 3 to 10 improves recall but may crowd the LLM's context window with less relevant chunks.
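Before changing a top_k setting, it can help to measure recall over a small labeled evaluation set at several values. The sketch below is a simple ID-based approximation, not the ragas context_recall implementation; the evaluation data and the helper name mean_recall_at_k are hypothetical.

```python
def mean_recall_at_k(eval_set, k):
    """Average recall@k over (ranked_retrieved_ids, relevant_ids) pairs."""
    per_query = []
    for retrieved_ids, relevant_ids in eval_set:
        relevant = set(relevant_ids)
        hits = relevant.intersection(retrieved_ids[:k])
        per_query.append(len(hits) / len(relevant))
    return sum(per_query) / len(per_query)


# Hypothetical evaluation set: for each query, the ranked chunk IDs the
# retriever returned and the hand-labeled IDs that are actually relevant.
eval_set = [
    (["a1", "x1", "a2", "x2", "x3", "a3", "x4", "x5", "x6", "x7"], ["a1", "a2", "a3"]),
    (["y1", "b1", "y2", "y3", "y4", "y5", "y6", "b2", "y7", "y8"], ["b1", "b2"]),
]

# Compare candidate top_k settings side by side.
sweep = {k: mean_recall_at_k(eval_set, k) for k in (3, 5, 10)}

for k, value in sweep.items():
    print(f"top_k={k}: mean recall {value:.2f}")
```

A sweep like this shows where extra chunks stop paying off: in this toy data, moving from 3 to 5 chunks gains nothing, while moving to 10 recovers every relevant chunk.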

Reviewed by Fredoline Eruo. See our editorial policy.
