What this does

Embedding models convert text into dense vector representations. These vectors enable semantic search, clustering, and retrieval-augmented generation (RAG) by measuring cosine similarity between queries and documents.
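
As a minimal sketch of the cosine similarity measure mentioned above, the following uses toy 3-dimensional vectors in place of real embeddings. The function name cosine_similarity and the sample vectors are illustrative assumptions, not part of any library.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors.

    Values near 1.0 mean the vectors point the same way;
    0.0 means they are orthogonal (unrelated).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for real embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
close = [2.0, 0.0, 2.0]      # same direction as the query, different length
unrelated = [0.0, 1.0, 0.0]  # orthogonal to the query

print(cosine_similarity(query, close))
print(cosine_similarity(query, unrelated))
```

Because the measure depends only on direction, the longer vector still scores as a near-perfect match, while the orthogonal one scores zero.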

Steps

Generate an embedding for a text query. The examples assume a local Ollama server listening on its default port, 11434.

curl -s http://localhost:11434/api/embeddings \
  -d '{"model": "all-minilm", "prompt": "What is machine learning?"}' \
  | jq '.embedding | length'

Expected: The command prints the vector length, 384 for all-minilm. If you change the model to nomic-embed-text, it prints 768.

Embed multiple documents and compute similarity scores. Save the following as semantic_search.py:

import numpy as np
import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"

def embed(text, model="all-minilm"):
    """Return the embedding of `text` as a NumPy array."""
    r = requests.post(OLLAMA_URL, json={"model": model, "prompt": text}, timeout=60)
    r.raise_for_status()  # surface HTTP errors such as a missing model
    return np.array(r.json()["embedding"])

def cosine(a, b):
    """Cosine similarity between two vectors of the same dimension."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

docs = ["Python is a programming language", "Cats are mammals",
        "Machine learning uses algorithms"]
doc_vecs = [embed(d) for d in docs]

query_vec = embed("What language should I learn for AI?")
scores = [cosine(query_vec, dv) for dv in doc_vecs]
best = docs[int(np.argmax(scores))]
print(f"Best match: {best} (score: {max(scores):.3f})")
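
The script above reports only the single best match. A sketch of ranking every document in one vectorized pass might look like the following. The helper rank_documents and the 2-dimensional toy vectors are hypothetical stand-ins for real embeddings, and the function assumes no vector is all zeros.

```python
import numpy as np

def rank_documents(query_vec, doc_vecs, docs, top_k=3):
    """Return up to top_k (document, score) pairs, highest cosine similarity first."""
    q = np.asarray(query_vec, dtype=float)
    m = np.asarray(doc_vecs, dtype=float)
    # Normalize to unit length so a dot product equals cosine similarity.
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    scores = m @ q
    order = np.argsort(scores)[::-1][:top_k]
    return [(docs[i], float(scores[i])) for i in order]

# Toy data (hypothetical values, not real model output).
toy_docs = ["about programming", "about animals", "about algorithms"]
toy_vecs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.3]]
toy_query = [1.0, 0.0]

for doc, score in rank_documents(toy_query, toy_vecs, toy_docs, top_k=2):
    print(f"{score:.3f}  {doc}")
```

With real embeddings, pass query_vec, doc_vecs, and docs from the script in place of the toy values.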

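Store embeddings in a vector database. Use ChromaDB for local persistence. The snippet continues the script above and reuses docs, doc_vecs, and query_vec:

pip install chromadb

import chromadb

# PersistentClient writes to disk; chromadb.Client() keeps data in memory only.
client = chromadb.PersistentClient(path="./chroma_db")
# get_or_create_collection avoids an error when the script is rerun.
# "hnsw:space": "cosine" matches the cosine scoring used above (Chroma defaults to L2).
collection = client.get_or_create_collection("my_docs", metadata={"hnsw:space": "cosine"})
# upsert overwrites existing ids, so reruns do not duplicate documents.
collection.upsert(
    ids=[f"doc{i}" for i in range(len(docs))],
    embeddings=[dv.tolist() for dv in doc_vecs],
    documents=docs,
)
results = collection.query(query_embeddings=[query_vec.tolist()], n_results=1)
print(results["documents"][0])  # documents matched for the first (only) query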

Record the local run evidence. Save the exact command, runtime or package version, model name if applicable, and observed output so the result can be reproduced later.
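
One possible way to capture that evidence is a small JSON record, sketched below. The helper build_run_record and its field names are assumptions made for illustration, and the placeholder strings should be replaced with the values from your own run.

```python
import json
import platform
from datetime import datetime, timezone

def build_run_record(command, model, output, package_versions=None):
    """Collect the details needed to reproduce a run into one dictionary."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "model": model,
        "python_version": platform.python_version(),
        "package_versions": package_versions or {},
        "observed_output": output,
    }

record = build_run_record(
    command="python semantic_search.py",
    model="all-minilm",
    output="<paste the printed line here>",          # placeholder
    package_versions={"numpy": "<installed version>"},  # placeholder
)
print(json.dumps(record, indent=2))
```

Redirecting the printed JSON to a file kept next to the script keeps the record with the code it describes.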

Verification

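python semantic_search.py
# Expected output format: "Best match: <document> (score: 0.xxx)"
# The score is printed to three decimals and varies by model and version.
# For this query, the best match should be one of the two technical documents, not "Cats are mammals".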

Common failures

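Model not found: Verify the embedding model is pulled with ollama list; if it is missing, run ollama pull all-minilm. Not all models support the /api/embeddings endpoint.
Dimension mismatch: Ensure all vectors come from the same model. Cosine similarity is only defined for vectors of the same dimension, and vectors from different models are not comparable even when their dimensions match.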
Poor retrieval quality: Some embedding models are better for specific domains. See the comparison guide to select the right one.
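
To catch the dimension mismatch failure early, a guard along these lines could run before any similarity is computed. The function check_dimensions is a hypothetical helper, not part of Ollama, NumPy, or ChromaDB.

```python
def check_dimensions(vectors, expected_dim=None):
    """Raise ValueError if the vectors differ in length or miss the expected dimension.

    Returns the shared dimension, or None for an empty input.
    """
    dims = sorted({len(v) for v in vectors})
    if len(dims) > 1:
        raise ValueError(f"Mixed embedding dimensions: {dims}")
    if expected_dim is not None and dims and dims[0] != expected_dim:
        raise ValueError(f"Expected dimension {expected_dim}, got {dims[0]}")
    return dims[0] if dims else None

# Toy vectors with the lengths quoted earlier for the two models.
minilm_like = [0.0] * 384
nomic_like = [0.0] * 768

print(check_dimensions([minilm_like, minilm_like], expected_dim=384))

try:
    check_dimensions([minilm_like, nomic_like])
except ValueError as err:
    print(f"Caught: {err}")
```

A matching dimension does not prove the vectors came from the same model, so the check complements rather than replaces tracking the model name.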
