Embedding (Vector Embedding)
An embedding is a fixed-length vector representation of text, an image, or other input, typically 384 to 3,072 dimensions, in which semantically similar inputs map to nearby vectors. "Cat" and "kitten" land closer together in embedding space than "cat" and "airplane".
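A minimal sketch of this idea, using tiny hand-crafted vectors in place of real model outputs (a real embedding model would produce hundreds of dimensions):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = same direction, 0 = orthogonal
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical 4-dim "embeddings" for illustration only
cat      = np.array([0.90, 0.80, 0.10, 0.00])
kitten   = np.array([0.85, 0.90, 0.15, 0.05])
airplane = np.array([0.00, 0.10, 0.90, 0.80])

print(cosine_similarity(cat, kitten))    # high: similar meaning
print(cosine_similarity(cat, airplane))  # low: unrelated meaning
```

The specific numbers are invented; the point is that similarity of meaning shows up as a higher cosine score.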
Embeddings are the backbone of semantic search and retrieval-augmented generation (RAG). To find documents relevant to a query, you embed both the query and your document chunks, then retrieve the chunks whose embeddings have the smallest cosine distance to the query's. This works because the embedding model has been trained to place similar meanings close together.
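The retrieval step above can be sketched with NumPy. The chunk and query vectors here are toy stand-ins for real model outputs; after normalizing to unit length, a dot product equals cosine similarity, so ranking by dot product is ranking by cosine distance:

```python
import numpy as np

# Toy stand-ins for real embeddings (a real model outputs e.g. 768-1024 dims)
chunk_texts = ["feline care tips", "jet engine maintenance", "kitten nutrition"]
chunk_vecs = np.array([
    [0.9, 0.7, 0.1],
    [0.1, 0.0, 0.9],
    [0.8, 0.9, 0.2],
], dtype=np.float32)
query_vec = np.array([0.85, 0.8, 0.1], dtype=np.float32)  # "how to feed a cat"

# Normalize rows so dot product == cosine similarity
chunk_unit = chunk_vecs / np.linalg.norm(chunk_vecs, axis=1, keepdims=True)
query_unit = query_vec / np.linalg.norm(query_vec)

scores = chunk_unit @ query_unit          # one cosine score per chunk
top_k = np.argsort(scores)[::-1][:2]      # indices of the 2 best chunks
print([chunk_texts[i] for i in top_k])    # the two cat-related chunks
```

Production systems use the same math, just with a vector index (FAISS, pgvector, etc.) instead of a brute-force matrix product.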
For local-only RAG: BGE-large (1024-dim), E5-large (1024-dim), or nomic-embed-text-v1.5 (768-dim) all work well, and all run on a 4 GB GPU. You don't need a frontier embedding model — the quality gap between BGE and OpenAI's text-embedding-3 models is much smaller than the gap between an 8B LLM and a frontier LLM.
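Dimensionality also drives index size. A rough back-of-envelope sketch, assuming float32 storage and the 1024-dim models mentioned above (numbers are illustrative, not from the source):

```python
# Storage for 1M chunk embeddings at 1024 dims, 4 bytes per float32
n_chunks = 1_000_000
dims = 1024
bytes_per_float = 4

index_bytes = n_chunks * dims * bytes_per_float
print(index_bytes / 2**30)  # roughly 3.8 GiB of raw vectors
```

Quantization or a lower-dimensional model shrinks this proportionally, which is one reason 768-dim models like nomic-embed-text-v1.5 are attractive for local setups.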
Related terms
Reviewed by Fredoline Eruo.