The RAG building blocks the catalog used to bury inside /models: nomic-embed-text, bge-large, mxbai, jina v3, Arctic Embed L v2, and gte-modernbert, plus their reranker companions, each listed with embedding dimension, max sequence length, MTEB score, and license terms.
Embeddings turn text into vectors a retrieval index can search. Rerankers re-score the top-K hits a vector search returned. Together they're the unglamorous engine behind every local-RAG stack — and historically the catalog buried them as 'just more models' instead of giving them their own surface.
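The two-stage flow described above can be sketched in a few lines. This is a minimal illustration, not any runtime's API: cosine similarity over plain Python lists stands in for the vector index, and `rerank_score` is a hypothetical callable standing in for a cross-encoder reranker.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_then_rerank(query_vec, doc_vecs, rerank_score, k=20, final_n=5):
    """Stage 1: vector search keeps the top-k doc indices by cosine similarity.
    Stage 2: a (hypothetical) reranker re-scores only those k candidates."""
    candidates = sorted(
        range(len(doc_vecs)),
        key=lambda i: cosine(query_vec, doc_vecs[i]),
        reverse=True,
    )[:k]
    # rerank_score(i) stands in for a cross-encoder call on (query, doc i).
    reranked = sorted(candidates, key=rerank_score, reverse=True)
    return reranked[:final_n]

# Toy data: doc 2 has the best reranker score but is orthogonal to the query,
# so stage 1 drops it and the reranker never sees it.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.0]
toy_scores = {0: 0.2, 1: 0.9, 2: 0.99, 3: 0.5}
print(retrieve_then_rerank(query, docs, toy_scores.get, k=3, final_n=2))  # [1, 3]
```

The toy case shows the design constraint: a reranker can only reorder what the vector search returned, so K has to be large enough to contain the relevant hits.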
Local embedding models that work today: nomic-embed-text-v1.5 (137M, 8K context, Matryoshka), bge-large-en-v1.5 (335M, 512-token context, the MTEB anchor), mxbai-embed-large-v1 (335M, 512-token context), and jina-embeddings-v3 (570M, 8K context, multilingual). For non-English text: Arctic-embed-l-v2.0 and multilingual-e5-large-instruct. For long-context retrieval: gte-modernbert-base (8K context on the ModernBERT architecture).
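Context length decides how documents must be chunked before embedding: a 512-token model needs many more chunks than an 8K model for the same text. A minimal sliding-window chunker, assuming one whitespace-separated word is roughly one token (real tokenizers usually emit more tokens than words, so leave headroom):

```python
def chunk_words(text, max_tokens, overlap=0):
    """Split text into windows of at most max_tokens 'tokens', overlapping by `overlap`.
    Assumption: one whitespace-separated word ~ one token (an approximation only)."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    words = text.split()
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this window already reaches the end of the text
    return chunks

doc = " ".join(f"w{i}" for i in range(1000))
print(len(chunk_words(doc, 512)))              # 512-token model: 2 chunks
print(len(chunk_words(doc, 8192)))             # 8K model: 1 chunk
print(len(chunk_words(doc, 512, overlap=64)))  # with overlap: 3 chunks
```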
Rerankers get less attention than embedders but matter a lot in practice: feeding the top 20 vector hits through bge-reranker-v2-m3 or jina-reranker-v2 typically lifts NDCG@10 by 5-15 points over vector-only retrieval. Each row below calls out embedding dimension, max sequence length, and license.
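To make "points of NDCG@10" concrete, here is a small sketch of the linear-gain variant, DCG@k = Σ rel_i / log2(i + 1) for i = 1..k, divided by the DCG of the ideal ordering. It assumes the relevance list covers every judged document, and the two rankings are invented toy data, not measurements of any model.

```python
import math

def dcg_at_k(relevances, k):
    """Linear-gain DCG: rank 1 is discounted by log2(2), rank 2 by log2(3), ..."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Toy binary relevance labels for the same 10 candidates, before and after reranking.
vector_only = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
reranked    = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
before, after = ndcg_at_k(vector_only), ndcg_at_k(reranked)
print(f"{before:.3f} -> {after:.3f} (+{100 * (after - before):.1f} points)")
```

Here moving two relevant hits up a few ranks takes NDCG@10 from about 0.871 to about 0.967, roughly 9.6 points on the 0-100 scale.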
all-MiniLM-L6-v2 is a 22M-parameter sentence-transformers embedder producing 384-dim vectors with a 256-token context, distilled from a larger Microsoft MiniLM teacher and fine-tuned on 1B+ sentence pairs across 32 datasets.
Nomic Embed Text v1.5 is a 137M-parameter English embedding model with an 8192-token context window, trained with Matryoshka Representation Learning so the 768-dim output can be truncated to 64/128/256/512 dims with minimal quality loss.
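Matryoshka truncation amounts to keeping a prefix of the vector and re-normalizing it. A minimal sketch on plain lists; note that Nomic's own usage example also applies a layer norm before truncating, which this sketch skips.

```python
import math

def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components of a Matryoshka embedding, then L2-normalize
    so that cosine similarity equals the dot product on the shorter vector."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

full = [0.01 * ((i % 7) - 3) for i in range(768)]  # stand-in for a 768-dim output
short = truncate_and_normalize(full, 256)          # 3x less storage per vector
print(len(short), round(sum(x * x for x in short), 6))  # 256 1.0
```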
BGE Large EN v1.5 is the 335M-parameter English flagship from BAAI's FlagEmbedding family, producing 1024-dim embeddings with a 512-token context window. Released in late 2023 under MIT license, it became the de facto MTEB baseline.
all-mpnet-base-v2 is a 109M-parameter sentence-transformers embedder based on Microsoft's MPNet, producing 768-dim vectors with a 384-token context. It was trained on the same 1B+ sentence pairs as all-MiniLM-L6-v2, but with a substantially larger backbone (109M vs 22M parameters).
BGE Reranker v2 M3 is BAAI's multilingual cross-encoder for re-ranking RAG candidates.
paraphrase-multilingual-MiniLM-L12-v2 is a 118M-parameter multilingual sentence-transformers embedder built on a knowledge-distilled MiniLM-L12, producing 384-dim vectors across 50+ languages. It is the long-standing default lightweight multilingual choice in sentence-transformers.
Jina Embeddings v3 is a 572M-parameter multilingual encoder with 8192-token context and five task-specific LoRA adapters (retrieval-query, retrieval-passage, separation, classification, text-matching) selectable at inference time.
Multilingual E5 Large Instruct is a 560M-parameter XLM-RoBERTa-large encoder fine-tuned by Microsoft's intfloat team with task instructions prepended to queries, producing 1024-dim embeddings across 100 languages.
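The instruction is plain text glued onto the query before encoding. A minimal sketch of the input formatting, following the `Instruct: {task}\nQuery: {query}` template shown on the intfloat model cards (E5-Mistral-7B-Instruct uses the same template); `build_inputs` is a hypothetical helper name, and the encoder call itself is left out.

```python
def e5_instruct_query(task, query):
    """Prefix a query with a one-line task description, E5-instruct style."""
    return f"Instruct: {task}\nQuery: {query}"

def build_inputs(task, queries, passages):
    """Hypothetical helper: queries get the instruction, passages are embedded as-is."""
    return [e5_instruct_query(task, q) for q in queries], list(passages)

task = "Given a web search query, retrieve relevant passages that answer the query"
q_inputs, p_inputs = build_inputs(task, ["how much protein should a female eat"],
                                  ["Protein needs depend on body weight and activity."])
print(q_inputs[0])
```

Keeping documents instruction-free means the passage index does not have to be rebuilt when the query-side task description changes.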
mxbai-embed-large-v1 is a 335M-parameter BERT-large English embedder from Mixedbread AI, producing 1024-dim vectors and supporting both Matryoshka truncation (down to 512/256 dims) and binary/int8 quantization. At release (March 2024) it reported state-of-the-art MTEB results among BERT-large-sized models.
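Binary quantization keeps one bit per dimension, which is why it shrinks a 1024-dim float32 vector from 4096 bytes to 128 (int8 would be 1024 bytes). A minimal sketch, assuming the common sign-threshold scheme (bit = 1 when the component is positive) and Hamming distance for comparison; production setups often rescore the binary top hits with full-precision vectors.

```python
def binarize(vec):
    """Pack one bit per dimension into an int: bit i is 1 if vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits

def hamming(a, b):
    """Number of differing bits between two packed vectors; lower is more similar."""
    return bin(a ^ b).count("1")

doc = binarize([0.5, -0.2, 0.1, -0.9])    # bits 0 and 2 set -> 0b0101
query = binarize([0.4, 0.3, 0.2, -0.1])   # bits 0, 1, 2 set -> 0b0111
print(hamming(query, doc))                # 1

dim = 1024
print(dim * 4, dim * 1, dim // 8)         # float32, int8, binary bytes: 4096 1024 128
```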
Arctic Embed L v2.0 is a 568M-parameter multilingual embedder from Snowflake based on XLM-RoBERTa, producing 1024-dim Matryoshka vectors with an 8192-token context. It is a rare commercial-friendly (Apache-2.0) multilingual embedder.
Jina Reranker v2 Base Multilingual is a 278M-parameter cross-encoder from Jina AI with a 1024-token context, trained on 100+ languages plus code and structured data (function-calling JSON, SQL). It is roughly 6x faster than its predecessor, jina-reranker-v1-base-en.
GTE ModernBERT Base is a 149M-parameter English embedder built on AnswerDotAI's ModernBERT backbone, producing 768-dim vectors with native 8192-token context via alternating local/global attention. It pairs ModernBERT's efficient long-context backbone with Alibaba-NLP's GTE contrastive training recipe.
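The alternating local/global pattern is what keeps 8K context affordable. A rough sketch, assuming ModernBERT-base-like settings (22 layers, every third layer global, a 128-token local window, i.e. 64 tokens on each side); these values are assumptions to check against the model's `config.json`, and the helper names are hypothetical.

```python
def layer_kinds(num_layers=22, global_every=3):
    """Assumption: layer i uses full attention when i % global_every == 0."""
    return ["global" if i % global_every == 0 else "local" for i in range(num_layers)]

def can_attend(i, j, kind, window=128):
    """Local layers only see tokens within window // 2 positions on either side."""
    return kind == "global" or abs(i - j) <= window // 2

def attended_pairs(n, kind, window=128):
    """Exact number of (query, key) pairs one layer scores for a length-n sequence."""
    if kind == "global":
        return n * n
    half = window // 2
    return sum(min(n - 1, i + half) - max(0, i - half) + 1 for i in range(n))

n = 8192
kinds = layer_kinds()
mixed = sum(attended_pairs(n, k) for k in kinds)
all_global = len(kinds) * n * n
print(kinds.count("global"), round(mixed / all_global, 3))  # 8 0.374
```

Under these assumptions the mixed stack scores only about 37% of the attention pairs an all-global stack would at 8192 tokens.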
E5-Mistral-7B-Instruct is a 7.11B-parameter decoder-based embedder fine-tuned from Mistral-7B by Microsoft's intfloat team, producing 4096-dim embeddings with the model's native 32K context. It uses task-conditioned instructions prepended to queries, while documents are embedded without one.
mxbai-rerank-large-v2 is a 1.54B-parameter listwise reranker from Mixedbread AI built on Qwen2.5-1.5B, supporting 100+ languages and a 32K-token context with native code and instruction-following retrieval awareness. It is published under Apache-2.0.
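Embedding dimension translates directly into index memory. A back-of-the-envelope sketch using dimensions stated in the rows above; it counts raw vector storage only (ANN graph or list structures add more), and the one-million-chunk corpus is an arbitrary example size.

```python
# Output dimensions as stated in the catalog rows.
EMBED_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "nomic-embed-text-v1.5": 768,
    "bge-large-en-v1.5": 1024,
    "e5-mistral-7b-instruct": 4096,
}
BYTES_PER_COMPONENT = {"float32": 4, "float16": 2, "int8": 1, "binary": 1 / 8}

def index_size_gib(dim, num_vectors, dtype="float32"):
    """Raw vector bytes for num_vectors embeddings of size dim, in GiB."""
    return dim * BYTES_PER_COMPONENT[dtype] * num_vectors / 2**30

for name, dim in EMBED_DIMS.items():
    print(f"{name:24s} {dim:5d} dims {index_size_gib(dim, 1_000_000):7.2f} GiB")
print(f"bge-large binary: {index_size_gib(1024, 1_000_000, 'binary'):.3f} GiB")
```

For one million chunks this comes to roughly 1.4 GiB at 384 dims versus roughly 15 GiB at 4096 dims in float32, which is where Matryoshka truncation and binary quantization earn their keep.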
Each model page has the GGUF/ONNX availability matrix, the recommended runtime (Ollama/llama.cpp embed mode, sentence-transformers, fastembed), and the recipe operators actually use. Pair an embedding model with a reranker from this same hub; see the RAG-with-local-embeddings playbook.