BGE Reranker v2 M3
BGE M3 reranker. Cross-encoder for re-ranking RAG candidates; multilingual.
Positioning
BAAI's BGE Reranker V2 M3 is the canonical companion reranker to BGE-M3 and the default open-weight cross-encoder reranker for production RAG pipelines in 2026. ~568M parameters (XLM-RoBERTa base, same architecture as BGE-M3 but trained as a cross-encoder), 8K context, multilingual coverage matching BGE-M3 (100+ languages). Released under MIT license — fully permissive commercial use. The model takes (query, document) pairs and outputs a relevance score — used as the second stage in retrieve-then-rerank pipelines after fast first-stage retrieval via BGE-M3 or other dense embedders.
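The (query, document) → score interface can be sketched as follows. `score_pairs` is a hypothetical stand-in for whatever backend scores the pairs (e.g. Sentence Transformers' `CrossEncoder.predict` on `BAAI/bge-reranker-v2-m3`); this is a sketch of the pattern, not a production implementation:

```python
from typing import Callable, List, Sequence, Tuple

def rerank(
    query: str,
    docs: Sequence[str],
    score_pairs: Callable[[List[Tuple[str, str]]], List[float]],
    top_k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every (query, doc) pair with a cross-encoder backend and
    return the top_k documents sorted by descending relevance score."""
    pairs = [(query, d) for d in docs]
    scores = score_pairs(pairs)
    ranked = sorted(zip(docs, scores), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]

# With a real backend (assumption — requires downloading the model):
#   from sentence_transformers import CrossEncoder
#   model = CrossEncoder("BAAI/bge-reranker-v2-m3")
#   results = rerank(q, docs, lambda p: model.predict(p).tolist())
```

The key property of the cross-encoder interface: every candidate costs one forward pass, which is why it only runs on a short candidate list.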
Strengths
- Best-in-class open-weight reranker for multilingual RAG pipelines.
- Tight integration with BGE-M3: same architecture base, same multilingual coverage, designed to chain.
- 8K context handling matches BGE-M3 — long-document chunks rerank without truncation issues.
- MIT license = unconstrained commercial use.
- Small and fast. At 568M parameters, it reranks hundreds of (query, doc) pairs per second on a single GPU.
- Real quality lift over no-reranker baseline. Adding BGE Reranker V2 M3 to a BGE-M3 retrieval pipeline typically improves NDCG@10 by 8-15% vs dense-only retrieval.
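For reference, NDCG@10 — the metric behind the 8-15% figure — can be computed from graded relevance labels like this (a minimal sketch using the linear-gain formulation; some tools use 2^rel − 1 instead):

```python
import math
from typing import Sequence

def dcg(rels: Sequence[float], k: int) -> float:
    # Discounted cumulative gain: rel_i / log2(i + 1) for 1-indexed rank i.
    return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

def ndcg_at_k(ranked_rels: Sequence[float], k: int = 10) -> float:
    """ranked_rels: the relevance label of each result, in ranked order.
    Normalizes DCG by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(ranked_rels, reverse=True), k)
    return dcg(ranked_rels, k) / ideal if ideal > 0 else 0.0
```

A reranker improves NDCG@10 by moving highly relevant documents toward the top of the list, where the logarithmic discount is smallest.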
Limitations
- Cross-encoder inference is more expensive than dense retrieval. Each (query, doc) pair requires a forward pass — only practical for re-ranking the top-N (typically 50-200) candidates from first-stage dense retrieval.
- Not as strong as the best proprietary rerankers on specific English-domain tasks. Cohere Rerank 3 and voyage-rerank-2 may win on English-only benchmarks.
- Code reranking is not its strength. For code retrieval reranking, specialized code rerankers win.
- Architecture is conservative. Newer cross-encoders may surpass on specific MTEB reranking benchmarks but BGE Reranker V2 M3 remains the default for "good enough" plus open-weight.
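A quick back-of-the-envelope on why cross-encoding is limited to the top-N: latency scales linearly with the number of pairs scored. Using ~200 pairs/sec (mid-range of the consumer-GPU throughput cited in this card — an assumption, measure on your own hardware):

```python
def rerank_latency_s(num_candidates: int, pairs_per_sec: float) -> float:
    """Cross-encoder cost is one forward pass per (query, doc) pair,
    so per-query latency is simply N / throughput."""
    return num_candidates / pairs_per_sec

# top-100 rerank at 200 pairs/sec: ~0.5 s per query — practical.
# cross-encoding a 1M-doc corpus: ~5000 s per query — never do this;
# that is the job of the dense first stage.
```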
Real-world performance
- vs Cohere Rerank 3 (API): Cohere wins on best-in-class English. BGE Reranker V2 M3 wins on cost (self-hosted), multilingual, and unconstrained commercial use.
- vs voyage-rerank-2 (API): voyage-rerank-2 wins on best English domain quality; BGE Reranker V2 M3 wins on cost + multilingual.
- vs no-reranker dense retrieval: 8-15% NDCG@10 improvement on most retrieval tasks. Worth the inference cost for accuracy-sensitive pipelines.
- vs older bge-reranker-large: Strict upgrade with multilingual + 8K context.
Should you run this locally?
Yes if you have any RAG pipeline where retrieval quality matters. The retrieve-then-rerank pattern (BGE-M3 dense retrieval → BGE Reranker V2 M3 cross-encoder reranking → top-K to LLM context) is the canonical open-weight RAG retrieval architecture in 2026.
Pair with: BGE-M3 for first-stage dense retrieval. The combination is the default open-weight RAG retrieval stack.
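The retrieve-then-rerank pattern in schematic form: dense first-stage retrieval via dot product over precomputed normalized embeddings, then a cross-encoder pass over the survivors only. `embed` and `score_pairs` are placeholders for BGE-M3 and BGE Reranker V2 M3 calls (e.g. via the FlagEmbedding package); this is a sketch of the architecture, not a tuned implementation:

```python
import numpy as np

def retrieve_then_rerank(query, corpus, corpus_emb, embed, score_pairs,
                         first_stage_n=100, top_k=10):
    """corpus: list of N documents; corpus_emb: (N, d) L2-normalized
    embeddings; embed(text) -> (d,) normalized vector;
    score_pairs(pairs) -> list of cross-encoder relevance scores."""
    # Stage 1: dense retrieval — cosine similarity via dot product.
    sims = corpus_emb @ embed(query)
    cand_idx = np.argsort(-sims)[:first_stage_n]
    candidates = [corpus[i] for i in cand_idx]
    # Stage 2: cross-encoder rerank of the shortlisted candidates only.
    scores = score_pairs([(query, d) for d in candidates])
    order = sorted(range(len(candidates)), key=lambda i: -scores[i])
    return [(candidates[i], scores[i]) for i in order[:top_k]]
```

The top-K documents returned here are what you pack into the LLM's context window.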
How it compares
- vs BGE-M3: Different roles. BGE-M3 is the dense embedder (encoder); Reranker V2 M3 is the cross-encoder reranker. Use both in a retrieve-then-rerank pipeline.
- vs older bge-reranker-large: V2 M3 is the strict upgrade — multilingual, 8K context.
- vs Cohere Rerank 3 (API): API wins on English; BGE wins on cost + multilingual + unconstrained license.
- vs cross-encoder/ms-marco-MiniLM-L-12-v2: Older smaller cross-encoder. BGE Reranker V2 M3 strict upgrade.
Run this yourself
- CPU-only: Functional via SentenceTransformers CrossEncoder API. 10-30 pairs/sec on modern CPU.
- Single GPU: Any modern GPU with 4+ GB VRAM. 100-500 pairs/sec on consumer GPU.
- Production: Text Embeddings Inference (TEI) supports rerankers — same serving infrastructure as embeddings.
- Pipeline pattern: BGE-M3 retrieves 100 candidates → BGE Reranker V2 M3 reranks → top-10 to LLM.
- Vendor: BAAI / Hugging Face: BAAI/bge-reranker-v2-m3.
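When serving with TEI, reranking is a POST to the server's `/rerank` endpoint with a single query and a list of candidate texts. The sketch below assumes a TEI instance running `BAAI/bge-reranker-v2-m3` at `localhost:8080` (the URL and port are assumptions; field names follow TEI's rerank API):

```python
import json
from urllib import request

def tei_rerank_payload(query: str, texts: list) -> dict:
    # TEI's /rerank takes one query plus the candidate texts to score.
    return {"query": query, "texts": texts}

def tei_rerank(query, texts, url="http://localhost:8080/rerank"):
    """POST to a running TEI server; the response is a list of
    {'index': i, 'score': s} entries sorted by relevance."""
    body = json.dumps(tei_rerank_payload(query, texts)).encode()
    req = request.Request(url, data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:  # requires a live TEI server
        return json.loads(resp.read())
```

Because TEI serves both embedders and rerankers, the same deployment tooling covers the full retrieve-then-rerank stack.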
Quantization variants
Each quantization trades model quality for file size and VRAM; for this reranker, the FP16 release is the standard deployment format.
| Quantization | File size | VRAM required |
|---|---|---|
| FP16 | 1.1 GB | 2 GB |
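The FP16 file size in the table follows directly from the parameter count: 568M parameters × 2 bytes per FP16 weight ≈ 1.1 GB, with activation and workspace overhead accounting for the gap up to the 2 GB VRAM figure. A quick check:

```python
def fp16_weight_gb(num_params: float) -> float:
    # FP16 stores each parameter in 2 bytes.
    return num_params * 2 / 1e9

print(round(fp16_weight_gb(568e6), 2))  # 1.14 — matches the ~1.1 GB in the table
```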
Get the model
Original weights on Hugging Face: BAAI/bge-reranker-v2-m3. No prebuilt quantizations — quantize from the source repository if you need them.
Frequently asked
What's the minimum VRAM to run BGE Reranker v2 M3? About 2 GB for the FP16 weights; any modern GPU with 4+ GB VRAM runs it comfortably.
Can I use BGE Reranker v2 M3 commercially? Yes — it is released under the MIT license, which permits unrestricted commercial use.
What's the context length of BGE Reranker v2 M3? 8K tokens, matching BGE-M3.
Source: huggingface.co/BAAI/bge-reranker-v2-m3
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.