other
0.335B parameters
Commercial OK
Reviewed May 2026

mxbai-embed-large-v1

mxbai-embed-large-v1 is a 335M-parameter BERT-large English embedder from Mixedbread AI, producing 1024-dim vectors and supporting both Matryoshka truncation (down to 512/256 dims) and binary/int8 quantization. At release it topped MTEB for sub-1B models with a score of ~64.7 and ships with first-class GGUF, ONNX, and OpenVINO artifacts in the repo.

License: apache-2.0 · Context: 512 tokens

Our verdict

OP · Fredoline Eruo | Verified May 29, 2026
unrated

An under-recognized alternative to BGE-large: same parameter count, a slightly better MTEB score, and, crucially, GGUF files in the repo, so llama.cpp pipelines work out of the box. Truncate to 256 dims with Matryoshka, apply binary quantization, and you get 256-bit (32-byte) vectors that still retrieve respectably. The 512-token ceiling is the only real reason not to default to it for English work.
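To make the 256-bit figure concrete, here is a minimal sketch of the truncate-then-binarize step in plain Python. It assumes the usual recipe (keep the leading dimensions, re-normalize, threshold at zero, pack bits); the helper names and the toy vector are hypothetical, and a real pipeline would feed in actual model output.

```python
import math

def truncate_and_normalize(vec, dims):
    """Keep the first `dims` components (Matryoshka truncation), then L2-normalize."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def binarize(vec):
    """Map each component to one bit (1 if > 0, else 0), packed 8 bits per byte."""
    if len(vec) % 8 != 0:
        raise ValueError("dimension must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(vec), 8):
        byte = 0
        for x in vec[i:i + 8]:
            byte = (byte << 1) | (1 if x > 0 else 0)
        out.append(byte)
    return bytes(out)

def hamming(a, b):
    """Count differing bits between two equal-length packed vectors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Toy 1024-dim vector standing in for a real embedding (hypothetical values).
toy = [math.sin(i) for i in range(1024)]
code = binarize(truncate_and_normalize(toy, 256))
print(len(code) * 8)  # 256 bits per stored vector
```

Binary codes are compared by Hamming distance rather than cosine similarity; a common pattern is to shortlist candidates with Hamming distance and rescore the shortlist with higher-precision vectors.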

Strengths

  • 1024-dim output with Matryoshka truncation and binary quantization: binary alone cuts storage 32x versus float32 at a minor quality cost, and truncation compounds the savings
  • MTEB ~64.7 — above BGE-large at the same parameter count
  • Ships GGUF + ONNX + OpenVINO in the repo, ready for llama.cpp / DirectML / ONNX Runtime
  • Apache-2.0 with no usage restrictions, so the weights are genuinely commercial-friendly
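The storage figures behind the first strength can be checked with simple arithmetic. The sketch below assumes a float32 baseline (4 bytes per dimension) and counts only raw vector payload, ignoring index overhead; the function name and the configuration labels are illustrative.

```python
def bytes_per_vector(dims, bits_per_dim):
    """Raw payload size of one embedding, in bytes."""
    return dims * bits_per_dim // 8

# Assumed baseline: full 1024-dim output stored as float32.
FULL = bytes_per_vector(1024, 32)

configs = {
    "1024-dim float32": (1024, 32),
    "1024-dim int8": (1024, 8),
    "1024-dim binary": (1024, 1),
    "512-dim binary": (512, 1),
    "256-dim binary": (256, 1),
}

for name, (dims, bits) in configs.items():
    size = bytes_per_vector(dims, bits)
    print(f"{name:18s} {size:5d} bytes  {FULL // size}x smaller")
```

Under these assumptions binary quantization alone accounts for the 32x reduction, and truncating to 512 or 256 dims first raises it to 64x or 128x.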

Weaknesses

  • 512-token context inherited from BERT-large base — chunking required for documents
  • English-only; no multilingual variant in the v1 line
  • Newer mxbai-embed-2d/-2 models exist but this v1 is still the canonical mxbai endpoint
  • Requires the documented query prefix or recall drops noticeably
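The query-prefix requirement is easy to get wrong, so a small guard can help. The sketch below assumes the prefix string published in the model card at the time of writing (verify it against the repository before relying on it) and that it applies to queries only, not to documents; the helper names are hypothetical.

```python
# Assumption: prefix as given in the upstream model card; confirm before use.
QUERY_PREFIX = "Represent this sentence for searching relevant passages: "

def prepare_query(text):
    """Prepend the retrieval prefix to a query, without doubling it up."""
    text = text.strip()
    return text if text.startswith(QUERY_PREFIX) else QUERY_PREFIX + text

def prepare_document(text):
    """Documents are embedded as-is; only whitespace is trimmed."""
    return text.strip()

queries = [prepare_query("how large is the context window?")]
documents = [prepare_document("  The model accepts up to 512 tokens.  ")]
print(queries[0])
print(documents[0])
```

The prepared strings would then be passed to whichever runtime hosts the model; keeping the prefix logic in one place avoids silently indexing documents with the query prefix or querying without it.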

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          0.2 GB       1 GB
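As a rough sanity check on the file size above, weight storage scales with parameter count times bits per weight. The sketch below assumes an average of about 4.85 bits per weight for Q4_K_M, which is an approximation rather than a published figure for this model; actual GGUF files vary because some tensors are kept at higher precision.

```python
def estimate_file_size_gb(params, bits_per_weight):
    """Approximate weight storage in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 335_000_000   # parameter count from the model summary
ASSUMED_BPW = 4.85     # assumption: rough Q4_K_M average, not a measured value

print(round(estimate_file_size_gb(PARAMS, ASSUMED_BPW), 1))  # 0.2
print(round(estimate_file_size_gb(PARAMS, 16), 2))           # 0.67 for fp16 weights
```

The VRAM figure is higher than the file size because the runtime also needs room for activations and buffers on top of the weights.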

Get the model

HuggingFace

Original weights

huggingface.co/mixedbread-ai/mxbai-embed-large-v1

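Source repository with the original weights. The repo also ships GGUF, ONNX, and OpenVINO artifacts; quantize directly from the original weights if you need a variant it does not include.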

Frequently asked

What's the minimum VRAM to run mxbai-embed-large-v1?

1 GB of VRAM is enough to run mxbai-embed-large-v1 at the Q4_K_M quantization (file size 0.2 GB). Higher-quality quantizations need more.

Can I use mxbai-embed-large-v1 commercially?

Yes. mxbai-embed-large-v1 ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of mxbai-embed-large-v1?

mxbai-embed-large-v1 supports a context window of 512 tokens, so longer documents need to be chunked before embedding.
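One way to stay under the limit is a sliding window with overlap. The sketch below works on an already-tokenized sequence and assumes a budget of 510 content tokens, leaving room for the two special tokens a BERT-style tokenizer adds; the overlap of 64 is an arbitrary illustrative choice, and the function name is hypothetical. In practice the token list should come from the model's own tokenizer, not from whitespace splitting.

```python
def chunk_tokens(tokens, max_len=510, overlap=64):
    """Split a token list into windows of at most `max_len`, sharing `overlap` tokens."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be in [0, max_len)")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks

# Integers stand in for token ids (hypothetical input).
doc = list(range(1000))
chunks = chunk_tokens(doc)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk is embedded separately; the overlap reduces the chance that a relevant passage is split across a window boundary.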

Source: huggingface.co/mixedbread-ai/mxbai-embed-large-v1

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
