RUNLOCALAI

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.coIndependently operated

How to run embedding models for semantic search

Intermediate · 15 min · By Fredoline Eruo
PREREQUISITES

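Ollama installed and running on localhost:11434, with an embedding model pulled (for example: ollama pull all-minilm). Step 2 also needs Python with the requests and numpy packages.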

What this does

Embedding models convert text into dense vector representations. Texts with similar meaning map to nearby vectors, so semantic search, clustering, and retrieval-augmented generation (RAG) can compare texts by the cosine similarity of their vectors, for example by ranking documents against a query.
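As a minimal illustration of the cosine-similarity measure mentioned above, the sketch below uses only the standard library. The three-dimensional vectors are made up for the example and are not model output; real embeddings have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumed values, for illustration only).
query = [1.0, 2.0, 0.0]
same_direction = [2.0, 4.0, 0.0]  # the query scaled by 2
orthogonal = [0.0, 0.0, 5.0]      # shares no component with the query

# Scaling a vector does not change its direction, so the score stays close to 1.
print(cosine_similarity(query, same_direction))
# Vectors with no shared components score 0.
print(cosine_similarity(query, orthogonal))
```

Because the measure depends on direction rather than length, a long document and a short query can still score highly if they point the same way in embedding space.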

Steps

  1. Generate an embedding for a text query.

    curl -s http://localhost:11434/api/embeddings \
      -d '{"model": "all-minilm", "prompt": "What is machine learning?"}' \
      | jq '.embedding | length'
    

    Expected: Prints the vector length: 384 for all-minilm, or 768 if you substitute nomic-embed-text.

  2. Embed multiple documents and compute similarity scores.

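    import requests
    import numpy as np
    
    OLLAMA_URL = "http://localhost:11434/api/embeddings"
    
    def embed(text, model="all-minilm"):
        """Return the embedding of `text` as a NumPy array."""
        r = requests.post(OLLAMA_URL, json={"model": model, "prompt": text}, timeout=60)
        r.raise_for_status()  # surface HTTP errors, e.g. a model that is not pulled
        return np.array(r.json()["embedding"])
    
    def cosine(a, b):
        """Cosine similarity between two vectors."""
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    
    docs = ["Python is a programming language", "Cats are mammals",
            "Machine learning uses algorithms"]
    doc_vecs = [embed(d) for d in docs]
    
    query_vec = embed("What language should I learn for AI?")
    scores = [cosine(query_vec, dv) for dv in doc_vecs]
    best = docs[int(np.argmax(scores))]  # index of the highest score
    print(f"Best match: {best} (score: {max(scores):.3f})")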
    
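  3. Store embeddings in a vector database. ChromaDB's PersistentClient keeps the collection on disk between runs (chromadb.Client() holds it in memory only):

    pip install chromadb
    
    import chromadb
    # Reuses embed(), docs, and query_vec from step 2.
    # "./chroma_db" is an example path; any writable directory works.
    client = chromadb.PersistentClient(path="./chroma_db")
    # get_or_create_collection and upsert make the script safe to re-run.
    collection = client.get_or_create_collection("my_docs")
    collection.upsert(embeddings=[embed(d).tolist() for d in docs], documents=docs, ids=[f"doc{i}" for i in range(len(docs))])
    results = collection.query(query_embeddings=[query_vec.tolist()], n_results=1)
    print(results["documents"][0])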
    
  4. Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
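One way to record the run evidence described in the last step is a small JSON record. The sketch below is an assumption about a convenient layout, not a format this guide prescribes: the function name, field names, and placeholder values are all hypothetical.

```python
import json
import platform
from datetime import datetime, timezone

def build_run_record(command, model, package_versions, observed_output):
    """Collect the details needed to reproduce a local run (hypothetical layout)."""
    return {
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "model": model,
        "python_version": platform.python_version(),
        "package_versions": package_versions,
        "observed_output": observed_output,
    }

record = build_run_record(
    command="python semantic_search.py",
    model="all-minilm",
    # Placeholders: replace with the versions your environment actually reports.
    package_versions={"numpy": "<your version>", "requests": "<your version>"},
    observed_output="<paste the printed line here>",
)

# Print the record; redirect stdout to a file to keep it alongside the script.
print(json.dumps(record, indent=2))
```

Keeping the record next to the script means a later run on different hardware or a newer model build can be compared against the original output.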

Verification

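Save the code from step 2 as semantic_search.py, then run:

python semantic_search.py
# Expected output is similar to: "Best match: Machine learning uses algorithms (score: 0.87)"
# The script prints the score to three decimals; the exact value depends on the model and its version.
# The printed document is the one with the highest cosine similarity to the query.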
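The verification above checks only the top match. Printing the full ranking can make it easier to see how far ahead the winner is. The sketch below assumes nothing about Ollama: it uses two-dimensional placeholder vectors (made-up values) in place of real embeddings, and rank_documents is a hypothetical helper name.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def rank_documents(query_vec, docs, doc_vecs):
    """Return (score, document) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), doc)
              for doc, vec in zip(docs, doc_vecs)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

docs = ["Python is a programming language", "Cats are mammals",
        "Machine learning uses algorithms"]

# Placeholder vectors standing in for real embeddings (assumed values).
doc_vecs = [[0.6, 0.8], [0.0, 1.0], [1.0, 0.0]]
query_vec = [1.0, 0.05]

for score, doc in rank_documents(query_vec, docs, doc_vecs):
    print(f"{score:.3f}  {doc}")
```

With real embeddings, replace the placeholder vectors with the output of the embedding call; the ranking logic stays the same.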

Common failures

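  • Model not found: Run ollama list to confirm the embedding model is pulled, and pull it if it is missing (for example, ollama pull all-minilm). Not all models support the /api/embeddings endpoint.
  • Dimension mismatch: Embed the query and every document with the same model. Cosine similarity is only defined for vectors of equal length, and vectors from different models are not comparable even when their lengths match. Re-embed stored documents after switching models.
  • Poor retrieval quality: Embedding models vary in how well they handle a given domain. See "How to compare embedding model performance for your use case" under Related guides to select a suitable one.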
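A dimension mismatch is easier to diagnose when it is caught before any similarity is computed. The guard below is a minimal sketch; check_dimensions is a hypothetical helper, and the zero-filled lists only stand in for real embeddings of length 384 and 768.

```python
def check_dimensions(vectors, expected_dim=None):
    """Return the shared vector length, or raise ValueError on a mismatch."""
    dims = {len(v) for v in vectors}
    if len(dims) > 1:
        raise ValueError(f"Mixed embedding dimensions: {sorted(dims)}")
    if expected_dim is not None and dims and dims != {expected_dim}:
        raise ValueError(
            f"Expected dimension {expected_dim}, got {sorted(dims)[0]}")
    return next(iter(dims)) if dims else 0

# Stand-ins for vectors produced by a single model.
same_model = [[0.0] * 384, [0.0] * 384]
print(check_dimensions(same_model, expected_dim=384))

# Stand-ins for vectors produced by two different models.
mixed = [[0.0] * 384, [0.0] * 768]
try:
    check_dimensions(mixed)
except ValueError as err:
    print(err)
```

This check catches only length differences. Vectors of equal length from different models pass it and still give meaningless scores, so tracking the model name alongside each stored vector remains necessary.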

Related guides

  • How to compare embedding model performance for your use case
  • How to fine-tune embedding batch sizes for your hardware