Embedding & reranker models

Embeddings turn text into vectors a retrieval index can search. Rerankers re-score the top-K hits a vector search returned. Together they're the unglamorous engine behind every local-RAG stack — and historically the catalog buried them as 'just more models' instead of giving them their own surface.
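The two stages described above can be sketched in a few lines. This is a minimal illustration with toy three-dimensional vectors, not a real model: `vector_search`, `rerank`, and `overlap_score` are hypothetical names, and the word-overlap scorer only stands in for whatever reranker model would actually score each (query, document) pair.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def vector_search(query_vec, index, k):
    """Stage 1: return the k (doc_id, score) pairs closest to the query vector."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

def rerank(query, hits, texts, score_fn):
    """Stage 2: re-score only the retrieved hits with score_fn and re-sort."""
    rescored = [(doc_id, score_fn(query, texts[doc_id])) for doc_id, _ in hits]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored

def overlap_score(query, text):
    """Toy stand-in for a reranker: fraction of query words found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0

# Hand-written vectors standing in for embedding-model output (assumption).
index = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.7, 0.7, 0.0],
    "c": [0.0, 0.1, 0.9],
}
texts = {
    "a": "how to bake bread at home",
    "b": "install a local embedding model",
    "c": "hiking trails near the coast",
}
query = "local embedding model"
query_vec = [0.8, 0.3, 0.0]

hits = vector_search(query_vec, index, k=2)          # "a" ranks above "b"
reranked = rerank(query, hits, texts, overlap_score)  # "b" moves to the top
```

The point of the split is cost: the cheap vector comparison runs over the whole index, while the more expensive scorer only sees the K hits that survive it.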

Local embedding models that work today: nomic-embed-text-v1.5 (137M parameters, 8K-token context, Matryoshka), bge-large-en-v1.5 (335M, 512-token context, a common MTEB baseline), mxbai-embed-large-v1 (335M, 512-token context), jina-embeddings-v3 (570M, 8K-token context, multilingual). For non-English text: Arctic-embed-l-v2.0 and multilingual-e5-large-instruct. For long-context retrieval: gte-modernbert-base (8K-token context, built on the ModernBERT architecture).
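The "Matryoshka" tag means the model was trained so that a prefix of the embedding is still a usable, lower-dimensional embedding. A rough sketch of what shortening looks like, assuming the vector came from such a model; `truncate_embedding` is a hypothetical helper, and individual model cards may prescribe extra steps before truncation, so check them before relying on this.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit length.

    Only meaningful for models trained with Matryoshka-style objectives;
    truncating an ordinary embedding this way usually hurts quality.
    """
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

# Toy 4-dimensional vector standing in for a real model output (assumption).
full = [0.5, 0.5, 0.5, 0.5]
short = truncate_embedding(full, 2)
```

Shorter vectors shrink the index and speed up search, at some cost in retrieval quality that depends on the model and the target dimension.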

Rerankers are smaller but matter a lot in practice — feeding the top 20 vector hits through bge-reranker-v2-m3 or jina-reranker-v2 typically lifts NDCG@10 by 5-15 points over vector-only retrieval. Each row below calls out embedding dimension, max sequence length, and license.
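For readers who want to check a reranker's lift on their own data, here is a minimal sketch of NDCG@10 with linear gain. The relevance lists are invented for illustration, and the ideal ordering is computed from the same list, which assumes every relevant document already appears among the retrieved hits.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG normalised by the DCG of the best possible ordering of the same list."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0

# Relevance labels (1 = relevant) in ranked order; made-up example data.
vector_only = [0, 1, 0, 1, 0]    # relevant docs at ranks 2 and 4
after_rerank = [1, 1, 0, 0, 0]   # same docs, relevant ones moved to the top

before = ndcg_at_k(vector_only)   # about 0.65
after = ndcg_at_k(after_rerank)   # 1.0
```

Reranking does not add documents; it only reorders what the vector search returned, so the metric improves only when relevant hits were already retrieved but ranked too low.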

Other / from-scratch

Building a RAG stack locally?