HOW-TO · RAG

How to Apply Metadata Filters to Reduce Search Space

intermediate·15 min·By Fredoline Eruo
Target environment
Ubuntu 24.04 · Ollama 0.4.x
PREREQUISITES

Vector store with metadata support, Python 3.10+

What this does

Metadata filtering restricts a vector search to documents whose metadata fields (date, category, source, and so on) match given conditions, so the similarity search runs only over that subset. A smaller candidate pool usually improves relevance and can reduce latency.
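
To make the idea concrete, here is a minimal pure-Python sketch of filter-then-rank. It is not how Chroma or any other store implements filtering internally; the 2-D vectors are made up for illustration, and `CORPUS`, `cosine`, and `filtered_search` are hypothetical names used only here.

```python
import math

# Toy corpus: (text, embedding, metadata). The vectors are invented for
# illustration; a real store would get them from an embedding model.
CORPUS = [
    ("Q4 earnings exceeded expectations.", [0.9, 0.1], {"quarter": "Q4", "source": "finance"}),
    ("New product launch in March.", [0.2, 0.8], {"quarter": "Q1", "source": "product"}),
    ("Engineering hiring plan for H2.", [0.6, 0.4], {"quarter": "H2", "source": "hr"}),
]


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def filtered_search(query_vec, k, where):
    """Keep documents matching every key in `where`, then rank the survivors."""
    # Step 1: metadata pre-filter (exact match on each field).
    candidates = [
        (text, vec)
        for text, vec, meta in CORPUS
        if all(meta.get(field) == value for field, value in where.items())
    ]
    # Step 2: similarity ranking over the reduced candidate pool only.
    ranked = sorted(candidates, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


hits = filtered_search([1.0, 0.0], k=3, where={"source": "hr"})
print(hits)
```

With the filter in place, only the `hr` document is scored at all, even though the finance document is closer to the query vector. That is the trade the rest of this guide relies on: the filter decides what is eligible, similarity decides the order.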

Steps

  • Index documents with metadata. Attach metadata when adding documents so filters have fields to match against.
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain.schema import Document

# Requires a running Ollama server with the model pulled:
#   ollama pull nomic-embed-text
embeddings = OllamaEmbeddings(model="nomic-embed-text")

# Each Document carries a metadata dict; these keys are what filters match on.
docs = [
    Document(page_content="Q4 earnings exceeded expectations.",
             metadata={"year": 2025, "quarter": "Q4", "source": "finance"}),
    Document(page_content="New product launch in March.",
             metadata={"year": 2025, "quarter": "Q1", "source": "product"}),
    Document(page_content="Engineering hiring plan for H2.",
             metadata={"year": 2025, "quarter": "H2", "source": "hr"}),
]

# Without persist_directory the store lives in memory for this process only.
vectorstore = Chroma.from_documents(docs, embeddings)
  • Apply metadata filter during retrieval. Pass a filter dict to similarity_search.
results = vectorstore.similarity_search(
    "What happened in Q4?",
    k=3,
    filter={"quarter": "Q4"}
)

for r in results:
    print(r.page_content, r.metadata)
  • Use complex filters with operators. For stores that support it, combine multiple conditions.
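# ChromaDB supports $and, $or operators (v0.4+)
# A Chroma filter dict takes a single top-level key, so multiple conditions
# must be wrapped in $and rather than placed side by side.
metadata_filter = {
    "$and": [
        {"$or": [
            {"source": "finance"},
            {"source": "product"}
        ]},
        {"year": 2025}
    ]
}
results = vectorstore.similarity_search("revenue", k=3, filter=metadata_filter)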
  • Use as retriever in a chain. Pass the filter when building the retriever object.
retriever = vectorstore.as_retriever(
    search_kwargs={"k": 3, "filter": {"source": "finance"}}
)
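
Hand-writing nested filter dicts gets error-prone once conditions are added and removed. The sketch below builds Chroma-style filters from plain values. It assumes Chroma's rules that a filter has one top-level key and that `$and`/`$or` each need at least two clauses; `build_where` and `any_of` are hypothetical helpers, not part of LangChain or Chroma.

```python
def any_of(field, values):
    """Hypothetical helper: match `field` against any of several values."""
    values = list(values)
    if not values:
        raise ValueError("any_of needs at least one value")
    if len(values) == 1:
        # A one-element $or is not a valid clause list, so fall back
        # to a plain equality condition.
        return {field: values[0]}
    return {"$or": [{field: value} for value in values]}


def build_where(*clauses):
    """Hypothetical helper: combine clauses into one Chroma-style filter."""
    clauses = [clause for clause in clauses if clause]  # drop empty clauses
    if not clauses:
        return None          # no filtering at all
    if len(clauses) == 1:
        return clauses[0]    # a single clause needs no wrapping operator
    return {"$and": clauses}


where = build_where(any_of("source", ["finance", "product"]), {"year": 2025})
print(where)
```

The resulting dict can be passed as `filter=where` to `similarity_search` or inside `search_kwargs`. Other stores use different syntax, so treat the output shape as Chroma-specific.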

Verification

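python -c "
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import OllamaEmbeddings
from langchain.schema import Document
embeddings = OllamaEmbeddings(model='nomic-embed-text')
# A bare Chroma() in a new process is an empty in-memory store,
# so index one matching document before querying.
doc = Document(page_content='Q4 earnings exceeded expectations.', metadata={'source': 'finance'})
vs = Chroma.from_documents([doc], embeddings)
r = vs.similarity_search('test', k=1, filter={'source': 'finance'})
print(f'Results: {len(r)}')
# Expected: Results: 1
"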

Common failures

  • Metadata field mismatch. Filter references a field name that doesn't exist in the stored metadata. Always inspect a sample document first.
  • Operator syntax varies by store. ChromaDB uses $and/$or, while Qdrant uses must/should. Check your vector store docs.
  • Empty results from over-filtering. Every added condition shrinks the candidate pool, and a combination that no document satisfies returns zero matches. Start with one filter and layer incrementally.
  • Version mismatch. The installed package or runtime differs from the one the commands were written for. Check the version first, then rerun the smallest verification command.
  • Local environment drift. Another service, virtual environment, model, or path is being used. Print the active binary path and configuration before changing the guide steps.
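
The first and third failures above can be diagnosed offline from the stored metadata alone. The sketch below checks filter field names against a sample and reports how many documents survive as each condition is layered on. `unknown_fields` and `layer_filters` are hypothetical helpers; in a real setup the metadata would come from your store (with LangChain's Chroma wrapper, `vectorstore.get()` should return it under a `metadatas` key, though that is worth confirming for your version).

```python
def unknown_fields(metadatas, conditions):
    """Return filter fields that appear in no stored metadata dict."""
    known = {key for meta in metadatas for key in meta}
    return [field for field, _ in conditions if field not in known]


def layer_filters(metadatas, conditions):
    """Apply equality conditions one at a time and record the surviving count."""
    remaining = list(metadatas)
    counts = []
    for field, value in conditions:
        remaining = [meta for meta in remaining if meta.get(field) == value]
        counts.append((field, len(remaining)))
    return counts


# Metadata from the three documents indexed in the Steps section.
sample = [
    {"year": 2025, "quarter": "Q4", "source": "finance"},
    {"year": 2025, "quarter": "Q1", "source": "product"},
    {"year": 2025, "quarter": "H2", "source": "hr"},
]

typos = unknown_fields(sample, [("category", "finance")])
counts = layer_filters(sample, [("year", 2025), ("source", "finance"), ("quarter", "Q1")])
print(typos)   # fields the filter uses that no document has
print(counts)  # survivors after each added condition
```

Here `category` is flagged because the documents store that value under `source`, and the layered counts drop to zero at the `quarter` condition, which pinpoints the filter that emptied the result set.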

Related guides

  • How to Build RetrievalQA Chain with Sources
  • How to Use Vector Store as Agent Memory