other
0.022B parameters
Commercial OK
Reviewed May 2026

all-MiniLM-L6-v2

all-MiniLM-L6-v2 is a 22M-parameter sentence-transformers embedder producing 384-dim vectors with a 256-token context, distilled from a larger Microsoft MiniLM teacher and fine-tuned on 1B+ sentence pairs across 32 datasets. It is the canonical default embedder for browser-side RAG, Transformers.js demos, and Chroma's quick-start — the most-downloaded sentence-transformers model on HuggingFace by a wide margin.

License: apache-2.0·Context: 256 tokens

Our verdict

Fredoline Eruo | Verified May 29, 2026
unrated

The 'just embed something' default. all-MiniLM-L6-v2 is what every quick-start tutorial uses for a reason: it fits in 100MB, it runs in any runtime including the browser, and the Apache license has zero friction. For production RAG with quality demands, upgrade to nomic-embed-text-v1.5 or mxbai-embed-large-v1. But for prototypes, in-browser demos, and any deployment where storage cost dominates, this is still the right answer in 2026.
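For the "just embed something" case, a minimal sketch might look like the following. It assumes the `sentence-transformers` Python package is installed and that the weights can be downloaded from the Hugging Face repository listed under "Get the model". The helper names `embed` and `cosine` are illustrative and not part of any library.

```python
import math
from typing import List, Sequence

# Repository id as listed under "Get the model".
MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"


def embed(texts: Sequence[str]) -> List[List[float]]:
    """Embed texts with all-MiniLM-L6-v2 (downloads the weights on first use)."""
    # Imported lazily so the pure-Python helper below works without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    # encode() returns one 384-dim vector per input text.
    return model.encode(list(texts)).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

Calling `embed(["How do I reset my password?", "Steps to recover account access"])` and passing the two resulting vectors to `cosine` gives a similarity score. In a real application you would load the model once and reuse it rather than constructing it on every call.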

Strengths

  • 22M params — sub-100MB footprint, runs in a browser tab or on a Raspberry Pi Zero
  • 384-dim output needs about 3/8 (37.5%) of the storage of 1024-dim BGE/mxbai vectors
  • Apache-2.0 with no acceptable-use restrictions
  • Ubiquitous: Chroma, LangChain, LlamaIndex, Transformers.js, fastembed all ship it as default
  • ONNX, CoreML, and Transformers.js artifacts maintained by Xenova and others
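The storage point in the list above comes down to simple arithmetic: 384/1024 = 0.375, a little over a third. The sketch below works it through for a hypothetical corpus of one million vectors stored as float32. The corpus size and the function name `index_size_bytes` are assumptions for illustration, and real vector stores add metadata and index overhead on top.

```python
BYTES_PER_FLOAT32 = 4


def index_size_bytes(num_vectors: int, dim: int,
                     bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of a dense vector index, ignoring metadata and index overhead."""
    return num_vectors * dim * bytes_per_value


n = 1_000_000  # hypothetical corpus size
small = index_size_bytes(n, 384)    # all-MiniLM-L6-v2
large = index_size_bytes(n, 1024)   # a 1024-dim embedder
ratio = small / large

print(f"384-dim:  {small / 1e9:.2f} GB")   # 1.54 GB
print(f"1024-dim: {large / 1e9:.2f} GB")   # 4.10 GB
print(f"ratio:    {ratio:.3f}")            # 0.375
```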

Weaknesses

  • MTEB English score (~56.3) trails BGE/mxbai-large by ~8 points — quality ceiling is real
  • 256-token context is the shortest in the embedder catalog — chunking is mandatory
  • English-only; for multilingual text, use paraphrase-multilingual-MiniLM-L12-v2
  • Symmetric similarity only — no query/document distinction (use multi-qa-MiniLM if you need that)
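Because of the 256-token limit, long documents have to be split before embedding. The sketch below is one simple way to do that with overlapping word windows. It uses whitespace-separated words as a rough stand-in for tokens, which is an assumption: the defaults of 150 words with a 30-word overlap are illustrative guesses, not tuned values, and for exact limits you would count tokens with the model's own tokenizer.

```python
from typing import List


def chunk_words(text: str, max_words: int = 150, overlap: int = 30) -> List[str]:
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")

    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks
```

Each chunk is then embedded separately. The overlap keeps a sentence that straddles a boundary from losing all of its context in both neighbours.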

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          < 0.1 GB     1 GB
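A back-of-envelope estimate shows why the file size is so small for a 22M-parameter model. The sketch below multiplies the parameter count by an assumed average number of bits per weight. The bit widths are round-number assumptions: real quantized files mix precisions and carry metadata, so actual sizes will differ somewhat.

```python
PARAMS = 22_000_000  # parameter count quoted in this article

# Assumed average bits per weight (illustrative round numbers).
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "4-bit": 4}


def weights_megabytes(params: int, bits: float) -> float:
    """Approximate size of the weights alone, in megabytes (10^6 bytes)."""
    return params * bits / 8 / 1e6


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:>5}: {weights_megabytes(PARAMS, bits):.0f} MB")
```

Under these assumptions the weights come to roughly 88 MB at fp32 and about 11 MB at 4 bits, which is well under 0.1 GB in either case.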

Get the model

HuggingFace

Original weights

huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Source repository with the original weights. No pre-quantized files are listed here, so you need to quantize the model yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of all-MiniLM-L6-v2.


Frequently asked

What's the minimum VRAM to run all-MiniLM-L6-v2?

1 GB of VRAM is enough to run all-MiniLM-L6-v2 at the Q4_K_M quantization (file size under 0.1 GB). Higher-precision quantizations need more.

Can I use all-MiniLM-L6-v2 commercially?

Yes. all-MiniLM-L6-v2 ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of all-MiniLM-L6-v2?

all-MiniLM-L6-v2 supports a context window of 256 tokens (about 0.25K). Longer documents need to be split into chunks before embedding.

Source: huggingface.co/sentence-transformers/all-MiniLM-L6-v2

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
