What this does

Metadata filtering narrows vector search results to documents whose metadata fields (such as date, category, or source) match a filter. This keeps off-topic documents out of the results. In stores that apply the filter before the similarity search, it can also reduce latency by shrinking the candidate pool.
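To make the mechanism concrete, here is a minimal pure-Python sketch of filter-then-rank. It assumes toy 2-dimensional vectors and exact-match conditions only. The names cosine and filtered_search are illustrative and not part of any vector store API; real stores do this internally and may order the steps differently.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query_vec, records, where, k=3):
    """Keep records whose metadata matches every key in `where`, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r["metadata"].get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return candidates[:k]

# Hypothetical toy data: hand-written vectors stand in for real embeddings.
records = [
    {"text": "Q4 earnings", "vector": [1.0, 0.0], "metadata": {"quarter": "Q4"}},
    {"text": "Product launch", "vector": [0.9, 0.1], "metadata": {"quarter": "Q1"}},
    {"text": "Q4 hiring", "vector": [0.0, 1.0], "metadata": {"quarter": "Q4"}},
]

hits = filtered_search([1.0, 0.0], records, {"quarter": "Q4"}, k=2)
print([h["text"] for h in hits])
```

The "Product launch" record is close to the query vector but never gets ranked, because the filter removes it from the candidate pool first.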

Steps

Index documents with metadata. Attach metadata when adding documents so filters have fields to match against.

from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain_core.documents import Document  # current home of Document; langchain.schema is the legacy path

embeddings = OllamaEmbeddings(model="nomic-embed-text")

docs = [
    Document(page_content="Q4 earnings exceeded expectations.",
             metadata={"year": 2025, "quarter": "Q4", "source": "finance"}),
    Document(page_content="New product launch in March.",
             metadata={"year": 2025, "quarter": "Q1", "source": "product"}),
    Document(page_content="Engineering hiring plan for H2.",
             metadata={"year": 2025, "quarter": "H2", "source": "hr"}),
]

vectorstore = Chroma.from_documents(docs, embeddings)

Apply a metadata filter during retrieval. Pass a filter dict to similarity_search; only documents whose metadata matches it are ranked.

results = vectorstore.similarity_search(
    "What happened in Q4?",
    k=3,
    filter={"quarter": "Q4"}
)

for r in results:
    print(r.page_content, r.metadata)

Use complex filters with operators. For stores that support it, combine multiple conditions.

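# ChromaDB supports $and, $or operators (v0.4+)
# A Chroma filter takes exactly one top-level key, so wrap multiple conditions in $and
where = {
    "$and": [
        {"$or": [
            {"source": "finance"},
            {"source": "product"},
        ]},
        {"year": 2025},
    ]
}
results = vectorstore.similarity_search("revenue", k=3, filter=where)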
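Nested filters are easy to get wrong by hand. As a hedged sketch, a small helper can assemble them. It assumes Chroma's rules that a filter has a single top-level key and that $and/$or take at least two sub-conditions. build_filter and any_of are hypothetical names, not LangChain or Chroma functions.

```python
def build_filter(*conditions):
    """Combine condition dicts into one filter with a single top-level key."""
    if not conditions:
        return None  # no filtering requested
    if len(conditions) == 1:
        return conditions[0]  # a lone condition needs no $and wrapper
    return {"$and": list(conditions)}

def any_of(field, values):
    """Match when `field` equals any of `values`, expressed with $or."""
    values = list(values)
    if len(values) == 1:
        return {field: values[0]}  # $or with one branch is assumed invalid
    return {"$or": [{field: v} for v in values]}

where = build_filter(any_of("source", ["finance", "product"]), {"year": 2025})
print(where)
```

The resulting dict can be passed as the filter argument to similarity_search in the same way as a hand-written one.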

Use as a retriever in a chain. Pass the filter in search_kwargs when building the retriever; it then applies to every query the retriever handles.

retriever = vectorstore.as_retriever(
    search_kwargs={"k": 3, "filter": {"source": "finance"}}
)

Verification

python -c "
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
embeddings = OllamaEmbeddings(model='nomic-embed-text')
vs = Chroma(embedding_function=embeddings)
r = vs.similarity_search('test', k=1, filter={'source': 'finance'})
print(f'Results: {len(r)}')
# Expected: Results: <N> (depends on indexed docs; a fresh in-memory store holds none, so N is 0)
"

Common failures

Metadata field mismatch. Filter references a field name that doesn't exist in the stored metadata. Always inspect a sample document first.
Operator syntax varies by store. ChromaDB uses $and/$or, while Qdrant uses must/should. Check your vector store docs.
Empty results from over-filtering. Combining too many filter conditions returns zero matches. Start with one filter and layer incrementally.
Version mismatch. The installed package or runtime differs from the one the command assumes. Check the version first, then rerun the smallest verification command.
Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
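The field-mismatch check can be automated. The sketch below walks a Chroma-style filter and reports any field names missing from a sample metadata dict. filter_fields and missing_fields are hypothetical helpers, and the sample is hard-coded for illustration. With the LangChain Chroma wrapper, a real sample can likely be read from vectorstore.get(limit=1)["metadatas"], though that call is an assumption to verify against your installed version.

```python
def filter_fields(where):
    """Collect the metadata field names referenced by a Chroma-style filter."""
    fields = set()
    for key, value in where.items():
        if key in ("$and", "$or"):
            # Logical operators hold a list of sub-filters; recurse into each.
            for sub in value:
                fields |= filter_fields(sub)
        else:
            fields.add(key)
    return fields

def missing_fields(where, sample_metadata):
    """Return filter fields absent from the sample metadata, sorted by name."""
    return sorted(filter_fields(where) - set(sample_metadata))

sample = {"year": 2025, "quarter": "Q4", "source": "finance"}
where = {"$and": [{"sourc": "finance"}, {"year": 2025}]}  # note the typo "sourc"
print(missing_fields(where, sample))  # ['sourc']
```

An empty list means every field in the filter exists in the sample. It does not guarantee that the values match, so over-filtering can still return zero results.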

Related guides

How to Build RetrievalQA Chain with Sources
How to Use Vector Store as Agent Memory