0.57B parameters
Commercial OK
Reviewed June 2026

BGE M3

BAAI's multilingual embedding flagship. Dense + sparse + ColBERT-style multi-vector. The de-facto open multilingual embedding pick.

License: MIT · Released Jan 30, 2024 · Context: 8,192 tokens

Our verdict

Fredoline Eruo · Verified Jun 12, 2026
unrated

Positioning

BAAI's BGE-M3 (Multi-Functionality, Multi-Linguality, Multi-Granularity) is the canonical open-weight embedding model in 2026, and has largely displaced OpenAI's text-embedding-ada-002 as the default embedder for RAG pipelines running on self-hosted infrastructure. It has ~568M parameters on an XLM-RoBERTa-large backbone, an 8,192-token context, and support for 100+ languages. It is released under the MIT license, which permits unrestricted commercial use. A single model produces three output formats at once: dense embeddings (1024-dim), multi-vector embeddings (ColBERT-style late interaction), and sparse lexical weights. That makes it unusually flexible for hybrid retrieval pipelines.
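Each of the three outputs is scored differently. Below is a minimal, dependency-free sketch of the usual scoring rule for each, assuming you already have the model's outputs in hand (in practice they come from a library such as BAAI's FlagEmbedding, which this sketch does not call). The vectors and token weights are tiny hypothetical toy values, and the sparse weights are keyed by token strings instead of real token IDs for readability.

```python
import math
from typing import Dict, List


def dense_score(q: List[float], d: List[float]) -> float:
    """Cosine similarity between one query vector and one document vector."""
    dot = sum(a * b for a, b in zip(q, d))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_d = math.sqrt(sum(b * b for b in d))
    return dot / (norm_q * norm_d)


def sparse_score(q_weights: Dict[str, float], d_weights: Dict[str, float]) -> float:
    """Lexical matching: sum of weight products over tokens present in both texts."""
    return sum(w * d_weights[tok] for tok, w in q_weights.items() if tok in d_weights)


def colbert_score(q_vecs: List[List[float]], d_vecs: List[List[float]]) -> float:
    """Late interaction (MaxSim): each query vector takes its best dot product
    against any document vector; the per-query-token maxima are then averaged."""
    total = 0.0
    for qv in q_vecs:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d_vecs)
    return total / len(q_vecs)


# Hypothetical toy inputs (real dense vectors have 1024 dimensions).
print(dense_score([0.1, 0.3, 0.5, 0.1], [0.2, 0.1, 0.4, 0.3]))
print(sparse_score({"embedding": 0.25, "model": 0.125}, {"embedding": 0.5, "open": 0.1}))
print(colbert_score([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, 0.0]]))
```

Dense similarity captures overall meaning, the sparse score rewards exact term overlap, and the multi-vector score matches each query token against its best-matching document token.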

Strengths

  • Best-in-class multilingual retrieval. Genuinely strong on 100+ languages — Arabic, Chinese, Japanese, Korean, Russian, Spanish, French, German, Hindi all well-supported.
  • An 8K context is uncommon among embedding models. Most open-weight embedders cap at 512 tokens; BGE-M3's 8K window enables long-document chunk retrieval without aggressive splitting.
  • Three retrieval modes simultaneously. Dense + multi-vector + sparse from one forward pass — your pipeline can hybrid-rank without running multiple models.
  • MIT license = unconstrained commercial use.
  • Small and fast. At 568M parameters it embeds 1,000+ docs/second on a single consumer GPU, with no expensive serving infrastructure needed.
  • Strong on the MTEB benchmark for retrieval, similarity, and classification — competitive with much larger embedding models.
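Hybrid ranking can be as simple as a weighted sum of the three per-mode scores. This sketch assumes each document's (dense, sparse, multi-vector) scores have already been computed; the 0.4/0.2/0.4 weights are an illustrative assumption, not a recommended setting, and should be tuned on your own queries.

```python
from typing import Dict, List, Tuple

# Illustrative (dense, sparse, multi-vector) weights: an assumption, tune on your data.
DEFAULT_WEIGHTS: Tuple[float, float, float] = (0.4, 0.2, 0.4)


def hybrid_rank(
    scores: Dict[str, Tuple[float, float, float]],
    weights: Tuple[float, float, float] = DEFAULT_WEIGHTS,
) -> List[Tuple[str, float]]:
    """Rank documents by a weighted sum of their per-mode scores, best first."""
    fused = {
        doc_id: sum(w * s for w, s in zip(weights, mode_scores))
        for doc_id, mode_scores in scores.items()
    }
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical per-mode scores for two documents.
print(hybrid_rank({"doc_a": (0.9, 0.1, 0.8), "doc_b": (0.7, 0.6, 0.7)}))
```

Because all three scores come from one forward pass, changing the weights only re-sorts existing numbers; no second model call is needed.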

Limitations

  • Trails the strongest proprietary embedders on some English-only domain tasks: OpenAI text-embedding-3-large and Cohere embed-english-v3.0 still win on the English subset of MTEB.
  • Code embeddings are not its strength. For code retrieval, voyage-code-3 or specialized code embedders win.
  • The reranker is a separate model. BGE Reranker V2 M3 is the canonical companion reranker, and pipelines need both for best results.
  • Conservative architecture. The XLM-RoBERTa backbone is older, and newer embedder designs may surpass it on specific benchmarks.

Real-world performance

  • vs OpenAI text-embedding-3-small (API): BGE-M3 is competitive on multilingual and comparable on English, at near-zero marginal cost when self-hosted versus $0.02 per 1M tokens through the API. Self-hosted economics win decisively at sustained volume.
  • vs Cohere embed-multilingual-v3.0 (API): Comparable multilingual quality; BGE-M3 wins on cost (self-hosted) and on its 8K context.
  • vs e5-large-v2: An older open-weight embedder. BGE-M3 is a strict upgrade on multilingual coverage and context length.
  • vs voyage-3-lite (API): Voyage AI wins on English domain-specific quality, but BGE-M3 wins on cost, multilingual coverage, and flexibility.
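To make the cost comparison concrete, here is back-of-the-envelope arithmetic. The $0.02 per 1M tokens price comes from the comparison above; the corpus size, average document length, GPU hourly rate, and 1,000 docs/sec throughput are hypothetical inputs to replace with your own numbers.

```python
def api_cost_usd(total_tokens: int, usd_per_million_tokens: float = 0.02) -> float:
    """API cost at a flat per-token price (default: $0.02 per 1M tokens)."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


def self_hosted_cost_usd(num_docs: int, docs_per_sec: float, gpu_usd_per_hour: float) -> float:
    """GPU rental cost for one embedding pass, counting only busy time."""
    hours = num_docs / docs_per_sec / 3600
    return hours * gpu_usd_per_hour


# Hypothetical workload: 10M documents averaging 500 tokens each.
NUM_DOCS = 10_000_000
AVG_TOKENS = 500
print(f"API:         ${api_cost_usd(NUM_DOCS * AVG_TOKENS):,.2f}")
# Hypothetical $1.00/hour GPU running at 1,000 docs/sec.
print(f"Self-hosted: ${self_hosted_cost_usd(NUM_DOCS, 1000, 1.00):,.2f}")
```

With these assumed inputs, 5B tokens cost about $100 through the API versus under $3 of GPU time. The self-hosted figure ignores idle time and engineering effort, which is why small or bursty workloads can still come out cheaper on the API.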

Should you run this locally?

Yes, if you run any RAG, search, similarity, or classification pipeline. BGE-M3 is the canonical answer to "what embedding model should I self-host?" in 2026. There is essentially no scenario where you should pay OpenAI or Cohere embedding API fees instead of running BGE-M3, unless you specifically need the very best English-only performance and money is no object.

Pair with: BGE Reranker V2 M3 for retrieve-then-rerank pipelines. The combination is the canonical open-weight RAG retrieval stack.
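A retrieve-then-rerank pipeline runs the cheap embedder over the whole corpus and spends the expensive cross-encoder only on a shortlist. The sketch below shows that control flow with pluggable scoring functions. In a real stack, `first_stage` would be BGE-M3 similarity (usually served from a vector index rather than a linear scan) and `reranker` would be BGE Reranker V2 M3; the toy scorers here are hypothetical stand-ins.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def retrieve_then_rerank(
    query: str,
    docs: Sequence[str],
    first_stage: Scorer,
    reranker: Scorer,
    k_retrieve: int = 100,
    k_final: int = 10,
) -> List[Tuple[str, float]]:
    # Stage 1: score every document with the cheap scorer and keep the top k_retrieve.
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k_retrieve]
    # Stage 2: rescore only the shortlist with the expensive reranker.
    reranked = [(d, reranker(query, d)) for d in candidates]
    reranked.sort(key=lambda pair: pair[1], reverse=True)
    return reranked[:k_final]


# Hypothetical toy scorers standing in for the embedder and the cross-encoder.
def word_overlap(q: str, d: str) -> float:
    return float(len(set(q.split()) & set(d.split())))


def prefer_short(q: str, d: str) -> float:
    return -float(len(d))


print(retrieve_then_rerank("cats dogs", ["cats purr", "a b c", "cats and dogs"],
                           word_overlap, prefer_short, k_retrieve=2))
```

The reranker never sees documents that stage 1 dropped, so `k_retrieve` caps recall while `k_final` only controls how much you return.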

How it compares

  • vs BGE Reranker V2 M3: Different roles. BGE-M3 is the encoder/embedder; Reranker V2 is the cross-encoder reranker. Use both in a retrieve-then-rerank pipeline.
  • vs older bge-large-en: BGE-M3 is the strict upgrade — multilingual, longer context, three modes simultaneously.
  • vs e5-mistral-7b-instruct: e5-mistral-7b is a 7B-parameter LLM-based embedder, with much heavier inference for marginal quality gains.
  • vs OpenAI text-embedding-3-large (API): The API wins on best-in-class English quality; BGE-M3 wins on cost, multilingual coverage, and open weights.

Run this yourself

  • CPU-only: Works via llama.cpp or SentenceTransformers, at ~50-150 docs/sec on a modern CPU.
  • Single GPU: Any modern GPU with 4+ GB VRAM; ~1,000-3,000 docs/sec on a consumer GPU.
  • Serving: vLLM is not the natural fit; embeddings are better served by Hugging Face's Text Embeddings Inference (TEI).
  • Production: TEI server plus your favorite vector DB (Qdrant, pgvector, Weaviate).
  • Vendor: BAAI. Hugging Face repo: BAAI/bge-m3.
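A minimal client for a Text Embeddings Inference server, using only the Python standard library. It assumes a TEI instance is already serving BAAI/bge-m3 at `http://localhost:8080` (host and port are assumptions) and that its `/embed` route accepts `{"inputs": [...]}` and returns one dense vector per input. Only the dense output is covered; check the TEI documentation for your version before relying on the request shape.

```python
import json
import urllib.request
from typing import List


def build_embed_request(texts: List[str], base_url: str = "http://localhost:8080") -> urllib.request.Request:
    """Build a POST request for TEI's /embed route (assumed body: {"inputs": [...]})."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts: List[str], base_url: str = "http://localhost:8080", timeout: float = 30.0) -> List[List[float]]:
    """Send texts to a running TEI server and return one dense vector per text."""
    request = build_embed_request(texts, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (requires a running TEI server):
# vectors = embed(["What is BGE-M3?", "Qu'est-ce que BGE-M3 ?"])
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a live server.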

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (bge)
  • BGE Reranker v2 M3 · 0.57B
  • BGE M3 · 0.57B (you are here)

Strengths

  • MIT license
  • Multilingual
  • Dense + sparse + multi-vector

Weaknesses

  • No instruction-tuned variant

Quantization variants

Each quantization trades model quality for file size and VRAM. Only the FP16 build is listed for this model.

Quantization | File size | VRAM required
FP16 | 1.1 GB | 2 GB
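The file size follows from simple arithmetic: weight storage is parameter count times bytes per parameter. A sketch using the ~568M parameter figure (decimal gigabytes). It does not model activations, batch size, or runtime overhead, which presumably account for the gap between 1.1 GB of weights and the 2 GB VRAM figure.

```python
def weights_size_gb(num_params: float, bytes_per_param: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 568e6  # ~568M parameters
for name, nbytes in [("FP32", 4), ("FP16", 2)]:
    print(f"{name}: {weights_size_gb(PARAMS, nbytes):.2f} GB")
# prints "FP32: 2.27 GB" then "FP16: 1.14 GB"
```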

Get the model

HuggingFace

Original weights

huggingface.co/BAAI/bge-m3

Source repository. No pre-quantized builds are linked; quantize the weights yourself if you need a smaller format.

Frequently asked

What's the minimum VRAM to run BGE M3?

About 2 GB of VRAM is enough to run BGE M3 at FP16 (file size 1.1 GB). Unquantized FP32 weights are roughly twice as large (about 2.3 GB) and need more.

Can I use BGE M3 commercially?

Yes. BGE M3 ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of BGE M3?

BGE M3 supports a context window of 8,192 tokens (about 8K).
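Documents longer than 8,192 tokens still need splitting. A minimal sliding-window chunker over token IDs is sketched below. It assumes you have already tokenized the text with the model's own tokenizer, and the 256-token overlap is an arbitrary illustrative default. In practice, leave a few tokens of headroom for the special tokens the tokenizer adds.

```python
from typing import List, Sequence

MAX_TOKENS = 8192  # BGE-M3 context window


def chunk_tokens(token_ids: Sequence[int], max_tokens: int = MAX_TOKENS, overlap: int = 256) -> List[List[int]]:
    """Split token IDs into windows of at most max_tokens, each overlapping the previous by `overlap`."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")
    if not token_ids:
        return []
    step = max_tokens - overlap
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + max_tokens]))
        if start + max_tokens >= len(token_ids):
            break  # this window reached the end of the document
        start += step
    return chunks


# A hypothetical 20,000-token document becomes three overlapping windows.
print([len(c) for c in chunk_tokens(list(range(20000)))])
```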

Source: huggingface.co/BAAI/bge-m3

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
