02. Retrieval Metrics: Hit Rate

Chapter 2 of 18 · 15 min

Hit Rate is the simplest retrieval metric. Given a query, the system returns a ranked list of documents. If at least one relevant document appears in that list, the query scores as a hit. Otherwise, it scores as a miss. Hit Rate is the fraction of queries that score as a hit.

The calculation requires labeled data: for each query, someone must mark which documents are relevant. This ground truth determines what counts as success. A document can be relevant to a query about "refund policies" even if it never uses the word "refund". Ground truth must reflect the actual information need, not keyword overlap.

def hit_rate(results: list[list[str]], relevance: list[set[str]]) -> float:
    """
    Calculate hit rate across queries.
    
    Args:
        results: List of ranked doc IDs for each query
        relevance: List of sets containing relevant doc IDs for each query
    Returns:
        Fraction of queries with at least one hit
    """
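    if len(results) != len(relevance):
        raise ValueError("results and relevance must have the same length")
    if not results:
        return 0.0  # No queries: return 0.0 rather than dividing by zero
    hits = 0
    for retrieved, relevant in zip(results, relevance):
        # A query is a hit if any retrieved doc ID is in its relevant set
        if any(doc_id in relevant for doc_id in retrieved):
            hits += 1
    return hits / len(results)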


# Example usage
results = [
    ["doc_42", "doc_18", "doc_7"],   # Query 1: hit (doc_42 is relevant)
    ["doc_99", "doc_12", "doc_3"],   # Query 2: miss (no relevant docs)
    ["doc_21", "doc_55", "doc_0"],   # Query 3: hit (doc_55 is relevant)
]

relevance = [
    {"doc_42", "doc_55"},  # Relevant docs for query 1
    {"doc_77"},            # Relevant docs for query 2
    {"doc_55"},            # Relevant docs for query 3
]

print(f"Hit Rate: {hit_rate(results, relevance):.3f}")  # Hit Rate: 0.667

Hit Rate treats all hits equally: a relevant document in first position counts the same as one in fifth. This works when any relevant document in the returned list is enough to answer the query. For FAQ-style systems where each question has a single answer, Hit Rate captures most of what matters.

The metric fails when document order matters. As long as a relevant document appears somewhere in the list, a system that ranks an irrelevant document first scores identically to one that ranks the relevant document first. This limitation matters for RAG, where the first retrieved document often receives disproportionate attention during generation.
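
To make this order-blindness concrete, here is a minimal sketch comparing two hypothetical systems. The names system_a and system_b and the doc IDs are invented for illustration; the hit_rate helper mirrors the function defined earlier in this chapter.

```python
def hit_rate(results: list[list[str]], relevance: list[set[str]]) -> float:
    """Fraction of queries with at least one relevant doc in the results."""
    if not results:
        return 0.0
    hits = sum(
        1 for retrieved, relevant in zip(results, relevance)
        if any(doc_id in relevant for doc_id in retrieved)
    )
    return hits / len(results)


# Hypothetical ground truth for two queries
relevance = [{"doc_1"}, {"doc_2"}]

# System A ranks the relevant document first for both queries
system_a = [["doc_1", "doc_8", "doc_9"], ["doc_2", "doc_8", "doc_9"]]
# System B puts the relevant document in last position
system_b = [["doc_8", "doc_9", "doc_1"], ["doc_8", "doc_9", "doc_2"]]

score_a = hit_rate(system_a, relevance)
score_b = hit_rate(system_b, relevance)
print(score_a, score_b)  # 1.0 1.0
```

Both systems score 1.0, even though only System A would hand a generator the relevant document first.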

Choose Hit Rate when retrieval serves primarily as a recall mechanism and when downstream generation can tolerate or recover from imperfect ranking. Combine it with position-aware metrics when ranking quality matters.
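
One common position-aware companion is Mean Reciprocal Rank (MRR), which averages 1/rank of the first relevant document across queries. The sketch below illustrates that pairing as an assumption, not a prescribed choice; the function name mean_reciprocal_rank and the sample data are hypothetical.

```python
def mean_reciprocal_rank(results: list[list[str]], relevance: list[set[str]]) -> float:
    """Average of 1/rank of the first relevant doc per query (0 when none is found)."""
    if not results:
        return 0.0
    total = 0.0
    for retrieved, relevant in zip(results, relevance):
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break  # Only the first relevant document counts
    return total / len(results)


# Hypothetical data: both queries are hits, but at different positions
results = [["doc_1", "doc_8"], ["doc_8", "doc_2"]]
relevance = [{"doc_1"}, {"doc_2"}]

mrr = mean_reciprocal_rank(results, relevance)
print(mrr)  # 0.75
```

Hit Rate for this data is 1.0, while MRR is 0.75. The gap reflects the second query's relevant document sitting at rank 2.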

EXERCISE

Implement a function that computes hit rate at different cutoffs k (k=1, k=3, k=10), counting only the top k results your retrieval system returns for each query. Run it on your seed query set and observe how the scores change. If the k=1 and k=10 scores differ substantially, ranking matters for your use case.
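
As a possible starting point, the sketch below truncates each ranked list to its top k entries before checking for a hit. The name hit_rate_at_k and the sample data are placeholders; substitute your own retrieval output and labels.

```python
def hit_rate_at_k(results: list[list[str]], relevance: list[set[str]], k: int) -> float:
    """Hit rate counting only the top-k retrieved docs for each query."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not results:
        return 0.0
    hits = 0
    for retrieved, relevant in zip(results, relevance):
        # Slicing keeps only the first k doc IDs of the ranked list
        if any(doc_id in relevant for doc_id in retrieved[:k]):
            hits += 1
    return hits / len(results)


# Placeholder data: relevant docs at ranks 1 and 3; the third query is a miss
results = [
    ["doc_1", "doc_8", "doc_9"],
    ["doc_8", "doc_9", "doc_2"],
    ["doc_8", "doc_9", "doc_7"],
]
relevance = [{"doc_1"}, {"doc_2"}, {"doc_3"}]

scores = {k: hit_rate_at_k(results, relevance, k) for k in (1, 3, 10)}
for k, score in scores.items():
    print(f"Hit Rate@{k}: {score:.3f}")
```

With this placeholder data the score rises between k=1 and k=3 because one relevant document sits at rank 3, which is the kind of gap the exercise asks you to look for.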