by BAAI (Beijing Academy of Artificial Intelligence)
BAAI's open-weight embedding family. BGE-M3 is the canonical multilingual embedding model in 2026 (100+ languages, 8K context); BGE-Reranker-v2-m3 is the canonical companion cross-encoder re-ranker. Together they form the default open-weight RAG retrieval stack.
Start with BGE-M3 via sentence-transformers on any GPU. BGE-M3 is the best open-weight multilingual embedding model: a single 568M-parameter model generates 1024-dim embeddings that support dense retrieval, sparse (lexical) retrieval, and multi-vector (ColBERT-style) retrieval. It covers 100+ languages and achieves the highest MTEB retrieval score among open-weight embedding models of its size. The model is small: roughly 1.1 GB of VRAM in FP16, so it runs on any GPU with 2 GB+ VRAM, including integrated graphics. For English-only retrieval, BGE-large-en-v1.5 (335M params, 1024-dim) outperforms BGE-M3 on English MTEB by about 1.5 points at half the size. For re-ranking, BGE-Reranker-v2-m3 is the companion cross-encoder; pair M3 retrieval with M3 re-ranking for a two-stage retrieval pipeline. MIT license.
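A minimal sketch of the dense-retrieval path described above, assuming the sentence-transformers package is installed. Calling `encode_corpus()` downloads the BGE-M3 weights (about 2 GB) from the Hugging Face Hub on first use; the ranking helper itself is a plain cosine-similarity sort and is illustrative, not part of any library API.

```python
# Sketch: dense retrieval with BGE-M3 via sentence-transformers.
# encode_corpus() fetches the model from the Hugging Face Hub on first call.
import numpy as np

def encode_corpus(texts):
    """Embed texts with BGE-M3; embeddings come back L2-normalized."""
    from sentence_transformers import SentenceTransformer
    model = SentenceTransformer("BAAI/bge-m3")  # 1024-dim, 8192-token context
    return model.encode(texts, normalize_embeddings=True)

def rank_by_cosine(query_vec, doc_vecs):
    """Indices of doc_vecs sorted by descending cosine similarity to query_vec."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    q = q / np.linalg.norm(q)                              # normalize query
    d = d / np.linalg.norm(d, axis=1, keepdims=True)       # normalize each doc
    return list(np.argsort(d @ q)[::-1])                   # best match first
```

With normalized embeddings, cosine similarity reduces to a dot product, which is why `normalize_embeddings=True` is the usual choice for dense retrieval indexes.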
For single-user RAG: sentence-transformers + BGE-M3 FP16 on an RTX 3060 12 GB encodes roughly 200 docs/second with an 8K-token max input. For production serving: the Infinity embedding server with BGE-M3 on an L4 24 GB serves roughly 500 embeddings/second at batch 32 with continuous batching. For CPU-only: the llama.cpp embedding server with BGE-M3 GGUF Q8_0 on an Apple M3 reaches roughly 80 embeddings/second via Metal. For multi-stage RAG: deploy BGE-M3 for first-pass dense retrieval, then BGE-Reranker-v2-m3 as a cross-encoder re-ranker over the top-100 candidates; this two-stage combination gains about 12% nDCG@10 over dense-only retrieval on BEIR. For sparse retrieval: BGE-M3 outputs sparse token weights natively (no separate BM25 index needed); use the sparse output for hybrid dense+sparse retrieval, worth another ~5% nDCG@10. BGE-M3 uses the XLM-RoBERTa tokenizer; inputs are truncated at 8192 tokens.
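The two-stage pipeline above can be sketched as a small pure-Python function. The two scoring callables are stand-ins, an assumption for illustration: in practice they would wrap BGE-M3 bi-encoder scores and BGE-Reranker-v2-m3 cross-encoder scores (e.g. via the FlagEmbedding package), which are far more expensive per pair, hence the shortlist.

```python
# Sketch: retrieve-then-rerank. Stage 1 scores the whole corpus cheaply;
# stage 2 runs the expensive cross-encoder only on the top candidates.
from typing import Callable, Sequence

def two_stage_search(query: str,
                     docs: Sequence[str],
                     dense_scores: Callable[[str, Sequence[str]], Sequence[float]],
                     rerank_scores: Callable[[str, Sequence[str]], Sequence[float]],
                     first_pass_k: int = 100,
                     final_k: int = 10) -> list:
    """Return indices of the final_k best docs after retrieve-then-rerank."""
    # Stage 1: bi-encoder (e.g. BGE-M3 dense) scores over the whole corpus.
    d = dense_scores(query, docs)
    top = sorted(range(len(docs)), key=lambda i: d[i], reverse=True)[:first_pass_k]
    # Stage 2: cross-encoder (e.g. BGE-Reranker-v2-m3) scores on the shortlist.
    r = rerank_scores(query, [docs[i] for i in top])
    order = sorted(range(len(top)), key=lambda i: r[i], reverse=True)[:final_k]
    return [top[i] for i in order]
```

The design point is asymmetry: dense retrieval touches every document once and is embarrassingly parallel, while the cross-encoder reads the full query-document pair and only pays off on a shortlist of ~100 candidates.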
Models in this family with our verdicts
Verify that BGE (BAAI General Embedding) models run on your specific hardware before committing money.