mxbai-embed-large-v1
mxbai-embed-large-v1 is a 335M-parameter BERT-large English embedder from Mixedbread AI, producing 1024-dim vectors and supporting both Matryoshka truncation (down to 512/256 dims) and binary/int8 quantization. At release it topped MTEB for sub-1B models with a score of ~64.7 and ships with first-class GGUF, ONNX, and OpenVINO artifacts in the repo.
An under-recognized alternative to BGE-large: same parameter count, a slightly higher MTEB score, and GGUF files in the repo, so llama.cpp pipelines work out of the box. Pair Matryoshka truncation to 256 dims with binary quantization and you get 256-bit (32-byte) vectors that still retrieve respectably. The 512-token ceiling is the only real reason not to default to it for English work.
Strengths
- 1024-dim with Matryoshka truncation AND binary quantization: binary alone cuts storage 32x versus float32 (128 bytes instead of 4,096 per vector) at minor quality cost
- MTEB ~64.7 — above BGE-large at the same parameter count
- Ships GGUF + ONNX + OpenVINO in the repo, ready for llama.cpp / DirectML / ONNX Runtime
- Apache-2.0 with no usage restrictions — true commercial-friendly weights
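The storage claim in the first bullet comes down to simple arithmetic, sketched below in plain Python. This is a minimal illustration, not Mixedbread's own implementation: the helper names (`truncate_and_normalize`, `binarize`, `hamming`, `storage_bytes`) are hypothetical, the input vector is random stand-in data rather than a real embedding, and sign-based binarization is assumed as the quantization rule.

```python
import math
import random

FULL_DIMS = 1024  # native output size of the model


def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components (Matryoshka-style) and L2-normalize."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]


def binarize(vec):
    """Pack sign bits (1 if the component is > 0) into bytes, 8 dims per byte."""
    out = bytearray()
    for i in range(0, len(vec), 8):
        chunk = vec[i:i + 8]
        byte = 0
        for x in chunk:
            byte = (byte << 1) | (1 if x > 0 else 0)
        byte <<= 8 - len(chunk)  # left-align a short final chunk
        out.append(byte)
    return bytes(out)


def hamming(a, b):
    """Number of differing bits between two equal-length binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def storage_bytes(dims, bits_per_dim):
    """Bytes needed to store one vector."""
    return dims * bits_per_dim // 8


# Stand-in for a real embedding: random values, fixed seed (assumption).
rng = random.Random(0)
full = [rng.uniform(-1, 1) for _ in range(FULL_DIMS)]

code = binarize(truncate_and_normalize(full, 256))
print(len(code), "bytes per vector after 256-dim truncation + binarization")
print(storage_bytes(FULL_DIMS, 32) // storage_bytes(FULL_DIMS, 1), "x from binary alone")
print(storage_bytes(FULL_DIMS, 32) // len(code), "x from truncation + binary")
```

Binary codes are usually compared with Hamming distance (`hamming` above) for a fast first pass, with the top candidates then rescored against higher-precision vectors. Whether that rescoring step is worth it depends on the corpus and is worth measuring rather than assuming.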
Weaknesses
- 512-token context inherited from BERT-large base — chunking required for documents
- English-only; no multilingual variant in the v1 line
- Newer mxbai-embed-2d/-2 models exist but this v1 is still the canonical mxbai endpoint
- Requires the documented query prefix or recall drops noticeably
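The query-prefix requirement in the last bullet is easy to get wrong because it is asymmetric: only search queries get the prefix, while the documents being indexed are embedded as-is. The sketch below assumes the prefix string published on the model card at the time of writing; check the card before relying on it. The helper names are hypothetical, and the encoder call itself is left out.

```python
# Prefix as given on the Hugging Face model card (assumption: verify that it
# is still current before using it in production).
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def format_query(query: str) -> str:
    """Prepend the retrieval prefix to a search query."""
    return QUERY_PREFIX + query.strip()


def format_document(text: str) -> str:
    """Documents are embedded without any prefix."""
    return text.strip()


queries = [format_query(q) for q in ["how do matryoshka embeddings work?"]]
docs = [format_document(d) for d in ["Matryoshka embeddings can be truncated."]]

# Both lists would then be passed to whatever encoder you use
# (llama.cpp, ONNX Runtime, sentence-transformers, ...).
print(queries[0])
print(docs[0])
```

Keeping the two formatters separate makes it harder to prefix the corpus by accident, which would mean re-embedding everything to fix.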
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 0.2 GB | 1 GB |
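The file size in the table can be sanity-checked with a back-of-the-envelope estimate: parameters times bits per weight, divided by eight. The sketch below assumes roughly 4.85 bits per weight for Q4_K_M, a commonly quoted approximation rather than an exact figure. Real GGUF files mix tensor types and carry metadata, so treat the result as a ballpark. The function name is hypothetical.

```python
PARAMS = 335_000_000  # parameter count stated for this model

# Approximate average bits per weight (assumption; varies by model and tool version).
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q4_K_M": 4.85,
}


def estimate_file_size_gb(params: int, bits_per_weight: float) -> float:
    """Rough weight-only file size in GB (10^9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimate_file_size_gb(PARAMS, bpw):.2f} GB")
```

Under these assumptions Q4_K_M comes out at about 0.2 GB, matching the table. The VRAM figure is higher than the file size because the runtime also needs space for activations and buffers.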
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Frequently asked
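What's the minimum VRAM to run mxbai-embed-large-v1?
About 1 GB, using the Q4_K_M quantization (0.2 GB file) listed above.
Can I use mxbai-embed-large-v1 commercially?
Yes. The weights are released under Apache-2.0, which permits commercial use.
What's the context length of mxbai-embed-large-v1?
512 tokens, inherited from the BERT-large base. Longer documents need to be chunked before embedding.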
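Because the model only sees 512 tokens at a time, longer documents are typically split into overlapping windows before embedding. The following is a minimal sliding-window sketch. `chunk_tokens` is a hypothetical helper, the default overlap of 64 is an arbitrary choice, and the whitespace split is a stand-in for the model's real tokenizer, which produces more tokens per word. The default window of 510 leaves room for the two special tokens ([CLS] and [SEP]) that BERT-style models add.

```python
def chunk_tokens(tokens, max_len=510, overlap=64):
    """Split a token list into windows of at most `max_len` tokens,
    with consecutive windows sharing `overlap` tokens."""
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # this window already reaches the end of the document
        start += step
    return chunks


# Whitespace tokens as a placeholder for real tokenizer output (assumption).
text = "word " * 1000
tokens = text.split()
chunks = chunk_tokens(tokens)
print(len(chunks), "chunks; sizes:", [len(c) for c in chunks])
```

Each chunk is then embedded separately and stored with a pointer back to its source document. The overlap reduces the chance that a relevant sentence is cut in half at a window boundary.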
Source: huggingface.co/mixedbread-ai/mxbai-embed-large-v1
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.