RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

OP·Fredoline Eruo
© 2026 runlocalai.co · Independently operated
BLK · EMBEDDINGS · retrieval · RAG · rerank

Embedding & reranker models

The RAG building blocks the catalog used to bury inside /models. nomic-embed-text, bge-large, mxbai, jina v3, Arctic Embed L v2, gte-modernbert — plus their reranker companions — listed with embedding dim, max-seq, MTEB score, and license tone.

Models curated: 14
Vendors: 8
Commercial OK: 12/14
Benchmarked: 0/14

Embeddings turn text into vectors a retrieval index can search. Rerankers re-score the top-K hits a vector search returned. Together they're the unglamorous engine behind every local-RAG stack — and historically the catalog buried them as 'just more models' instead of giving them their own surface.
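The two stages described above can be sketched in a few lines. This is a minimal illustration, not a recipe from the catalog: the 3-dim vectors stand in for real embedder output, and `retrieve`, `rerank`, and `toy_scores` are hypothetical names invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, k):
    """Stage 1: ids of the k documents whose vectors are closest to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:k]

def rerank(candidates, score_fn):
    """Stage 2: re-order the shortlist with a slower, more precise scorer."""
    return sorted(candidates, key=score_fn, reverse=True)

# Toy 3-dim vectors; a real embedder would produce 384 to 4096 dims.
doc_vecs = {
    "gpu-guide": [0.9, 0.1, 0.0],
    "cpu-guide": [0.7, 0.3, 0.1],
    "recipes": [0.0, 0.2, 0.9],
}
query_vec = [1.0, 0.0, 0.0]

top2 = retrieve(query_vec, doc_vecs, k=2)

# Hypothetical reranker scores. A real cross-encoder would compute these
# from the query text and each candidate's text.
toy_scores = {"gpu-guide": 0.4, "cpu-guide": 0.8}
final = rerank(top2, lambda doc_id: toy_scores[doc_id])
```

In this toy run the vector search shortlists the two hardware guides and the reranker swaps their order, which is the whole point of the second stage: it only has to score the shortlist, so it can afford a heavier model.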

Local embedding models that work today: nomic-embed-text-v1.5 (137M, 8K context, Matryoshka), bge-large-en-v1.5 (335M, 512-token context, MTEB anchor), mxbai-embed-large-v1 (335M, 512-token context), jina-embeddings-v3 (572M, 8K context, multilingual, CC-BY-NC license). For non-English text: Arctic-embed-l-v2.0 and multilingual-e5-large-instruct. For long-context retrieval: gte-modernbert-base (8K context on the ModernBERT architecture).

Rerankers are smaller but matter a lot in practice — feeding the top 20 vector hits through bge-reranker-v2-m3 or jina-reranker-v2 typically lifts NDCG@10 by 5-15 points over vector-only retrieval. Each row below calls out embedding dimension, max sequence length, and license.
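For readers who want to measure that lift on their own data, NDCG@10 can be computed as below. This is a sketch under stated assumptions: the relevance labels are made up for illustration, and the ideal ordering is taken from the retrieved list itself, which is only correct when that list contains every relevant document.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (rank 1 is undiscounted)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    """DCG divided by the DCG of the best possible ordering of the same labels."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal else 0.0

# Made-up relevance labels (1 = relevant) in the order each stage returned them.
vector_only = [0, 1, 0, 0, 1]
reranked = [1, 1, 0, 0, 0]

print(f"vector only: {ndcg_at_k(vector_only):.3f}")
print(f"reranked:    {ndcg_at_k(reranked):.3f}")
```

A "point" of NDCG is usually quoted on a 0 to 100 scale, so multiply these values by 100 before comparing them with the range given above.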

FAM · OTHER

Other / from-scratch

14 models
all-MiniLM-L6-v2
22M params · sentence-transformers
▸ Browser-side and edge RAG where 100MB footprint + Apache-2.0 + 384-dim storage win over quality

all-MiniLM-L6-v2 is a 22M-parameter sentence-transformers embedder producing 384-dim vectors with a 256-token context, distilled from a larger Microsoft MiniLM teacher and fine-tuned on 1B+ sentence pairs across 32 datas…

License: apache-2.0 · OK
Context: 256 tokens
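The storage advantage of 384-dim vectors is simple arithmetic. The sketch below assumes uncompressed float32 values and ignores any index overhead a vector database adds; `index_size_mib` is a hypothetical helper name, and the dimensions are those listed for the models on this page.

```python
def index_size_mib(num_vectors, dim, bytes_per_value=4):
    """Raw vector storage in MiB (float32 by default), excluding index overhead."""
    return num_vectors * dim * bytes_per_value / (1024 ** 2)

# Dimensions used by models in this catalog: MiniLM, mpnet/Nomic, BGE/mxbai, E5-Mistral.
for dim in (384, 768, 1024, 4096):
    size = index_size_mib(1_000_000, dim)
    print(f"{dim:>5} dims: {size:,.0f} MiB per million vectors")
```

Storage grows linearly with dimension, so a 1024-dim index needs about 2.7 times the space of a 384-dim one for the same corpus.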
Nomic Embed Text v1.5
137M params · Nomic AI
▸ Long-context RAG retrieval with adjustable 64-768 dim Matryoshka embeddings on edge hardware

Nomic Embed Text v1.5 is a 137M-parameter English embedding model with an 8192-token context window, trained with Matryoshka Representation Learning so the 768-dim output can be truncated to 64/128/256/512 dims with mini…

License: apache-2.0 · OK
Context: 8K
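Matryoshka truncation amounts to keeping the leading dimensions and re-normalizing. The sketch below assumes that plain truncate-then-L2-normalize is sufficient; the model card may call for an extra normalization step before truncation, so check it before relying on this. The 8-dim vector is a stand-in for a real 768-dim output, and `truncate_embedding` is a hypothetical helper.

```python
import math

def truncate_embedding(vec, dim):
    """Keep the first `dim` values and rescale the result to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head

# Toy 8-dim vector standing in for a 768-dim embedding.
full = [0.5, 0.5, 0.5, 0.5, 0.1, 0.1, 0.1, 0.1]
short = truncate_embedding(full, 4)
```

Re-normalizing matters because cosine and dot-product scores are only comparable across documents when every stored vector has the same length.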
BGE Large EN v1.5
335M params · BAAI
▸ Short-passage English retrieval where the 1024-dim BGE ecosystem (rerankers, fine-tunes) is already in use

BGE Large EN v1.5 is the 335M-parameter English flagship from BAAI's FlagEmbedding family, producing 1024-dim embeddings with a 512-token context window. Released in late 2023 under MIT license, it became the de facto MT…

License: mit · OK
Context: 512 tokens
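A 512-token window means longer documents have to be chunked before embedding. The sketch below shows one common approach, fixed-size windows with overlap. It is not taken from the catalog: the overlap of 64 is an arbitrary illustrative value, `chunk_tokens` is a hypothetical helper, and real token counts should come from the model's own tokenizer, which also reserves a few positions for special tokens.

```python
def chunk_tokens(tokens, max_len=512, overlap=64):
    """Split a token list into windows of at most max_len, each sharing
    `overlap` tokens with the previous window."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks

# Integers stand in for token ids from a real tokenizer.
chunks = chunk_tokens(list(range(1000)))
print([len(c) for c in chunks])
```

The overlap keeps a sentence that straddles a boundary fully visible in at least one chunk, at the cost of embedding some tokens twice.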
all-mpnet-base-v2
109M params · sentence-transformers
▸ Default 768-dim English embedder for general-purpose retrieval with a vector DB

all-mpnet-base-v2 is a 109M-parameter sentence-transformers embedder based on Microsoft's MPNet, producing 768-dim vectors with a 384-token context. Trained on the same 1B+ sentence pairs as all-MiniLM-L6-v2 but with a s…

License: apache-2.0 · OK
Context: 384 tokens
BGE Reranker v2 M3
570M params · BAAI
▸ RAG reranker

BGE M3 reranker. Cross-encoder for re-ranking RAG candidates; multilingual.

License: mit · OK
Context: 8K
paraphrase-multilingual-MiniLM-L12-v2
118M params · sentence-transformers
▸ Multilingual semantic similarity in browser or edge where Arctic-l-v2 is too heavy

paraphrase-multilingual-MiniLM-L12-v2 is a 118M-parameter multilingual sentence-transformers embedder built on a knowledge-distilled MiniLM-L12, producing 384-dim vectors across 50+ languages. It is the long-standing def…

License: apache-2.0 · OK
Context: 128 tokens
Jina Embeddings v3
572M params · Jina AI
▸ Multilingual RAG with task-switched LoRA adapters — research and non-commercial deployments only

Jina Embeddings v3 is a 572M-parameter multilingual encoder with 8192-token context and five task-specific LoRA adapters (retrieval-query, retrieval-passage, separation, classification, text-matching) selectable at infer…

License: cc-by-nc-4.0
Context: 8K
Multilingual E5 Large Instruct
560M params · Microsoft (intfloat)
▸ Short-passage multilingual RAG with MIT license requirement and chunking pipeline already in place

Multilingual E5 Large Instruct is a 560M-parameter XLM-RoBERTa-large encoder fine-tuned by Microsoft's intfloat team with task instructions prepended to queries, producing 1024-dim embeddings across 100 languages. It scor…

License: mit · OK
Context: 512 tokens
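Instruction conditioning for this model family is a string-formatting step applied to queries only. The template below follows the `Instruct: ... / Query: ...` pattern as we understand it from the model card; verify it against the card before use. `format_query` and the example task text are illustrative names, not part of any library.

```python
def format_query(task, query):
    """Wrap a search query with its task instruction.
    Documents are embedded as plain text, without an instruction."""
    return f"Instruct: {task}\nQuery: {query}"

# Illustrative task description; word it to match your retrieval use case.
task = "Given a web search query, retrieve relevant passages that answer the query"
prompt = format_query(task, "how much VRAM does a 7B model need")
print(prompt)
```

Because the instruction consumes part of the context window, short-context models leave less room for the query itself, which is rarely a problem for queries but is one reason documents are left unwrapped.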
mxbai-embed-large-v1
335M params · Mixedbread AI
▸ English RAG with binary-quantized 1024-dim vectors for max storage efficiency on edge devices

mxbai-embed-large-v1 is a 335M-parameter BERT-large English embedder from Mixedbread AI, producing 1024-dim vectors and supporting both Matryoshka truncation (down to 512/256 dims) and binary/int8 quantization. At releas…

License: apache-2.0 · OK
Context: 512 tokens
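Binary quantization keeps only the sign of each dimension, so a 1024-dim float32 vector (4096 bytes) shrinks to 1024 bits (128 bytes), and similarity becomes a Hamming distance. The sketch below is a generic illustration of that idea, not Mixedbread's own implementation: it assumes a zero threshold, uses 4-dim toy vectors, and `binarize` and `hamming` are hypothetical helpers.

```python
def binarize(vec):
    """Pack the sign of each dimension into one integer bitmask (1 bit per dim)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits; smaller means more similar."""
    return bin(a ^ b).count("1")

# Toy 4-dim vectors standing in for 1024-dim embeddings.
query = binarize([0.3, -0.1, 0.8, -0.5])
doc_a = binarize([0.2, -0.4, 0.1, -0.9])  # same sign pattern as the query
doc_b = binarize([-0.3, 0.2, 0.5, 0.1])   # mostly opposite signs

print(hamming(query, doc_a), hamming(query, doc_b))
```

A common pattern is to search the binary index first and then rescore the shortlist with full-precision or int8 vectors, trading a small second pass for the 32x storage saving.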
Snowflake Arctic Embed L v2.0
568M params · Snowflake
▸ Commercial multilingual RAG where Apache-2.0 license is required and jina-v3's CC-BY-NC is a blocker

Arctic Embed L v2.0 is a 568M-parameter multilingual embedder from Snowflake based on XLM-RoBERTa, producing 1024-dim Matryoshka vectors with an 8192-token context. It is the rare commercial-friendly (Apache-2.0) multili…

License: apache-2.0 · OK
Context: 8K
Jina Reranker v2 Base Multilingual
278M params · Jina AI
▸ Latency-sensitive multilingual reranking (including code/function-calling) for non-commercial use

Jina Reranker v2 Base Multilingual is a 278M-parameter cross-encoder from Jina AI with a 1024-token context, trained on 100+ languages plus code and structured data (function-calling JSON, SQL). It is roughly 6x faster t…

License: cc-by-nc-4.0
Context: 1K
GTE ModernBERT Base
149M params · Alibaba NLP
▸ Edge English RAG where 8K context, 64+ MTEB, and sub-200M parameter footprint are all required

GTE ModernBERT Base is a 149M-parameter English embedder built on AnswerDotAI's ModernBERT backbone, producing 768-dim vectors with native 8192-token context via alternating local/global attention. It pairs ModernBERT's…

License: apache-2.0 · OK
Context: 8K
E5 Mistral 7B Instruct
7.11B params · Microsoft (intfloat)
▸ Maximum-quality English retrieval where GPU budget is available and instruction-conditioning matters

E5-Mistral-7B-Instruct is a 7.11B-parameter decoder-based embedder fine-tuned from Mistral-7B by Microsoft's intfloat team, producing 4096-dim embeddings with the model's native 32K context. It uses task-conditioned inst…

License: mit · OK
Context: 32K
mxbai-rerank-large-v2
1.54B params · Mixedbread AI
▸ High-accuracy reranking for English+multilingual RAG when GPU budget allows a 1.5B decoder pass

mxbai-rerank-large-v2 is a 1.54B-parameter listwise reranker from Mixedbread AI built on Qwen2.5-1.5B, supporting 100+ languages and a 32K-token context with native code and instruction-following retrieval awareness. Pub…

License: apache-2.0 · OK
Context: 32K
COVERAGE

Building a RAG stack locally?

Each model page has the GGUF/ONNX availability matrix, recommended runtime (Ollama/llama.cpp embed mode, sentence-transformers, fastembed), and the actual recipe operators use. Pair an embedding model with a reranker from this same hub — see the RAG-with-local-embeddings playbook.