other

0.118B parameters

Commercial OK

Reviewed May 2026

paraphrase-multilingual-MiniLM-L12-v2

paraphrase-multilingual-MiniLM-L12-v2 is a 118M-parameter multilingual sentence-transformers embedder built on a knowledge-distilled MiniLM-L12, producing 384-dim vectors across 50+ languages. It is the long-standing default for multilingual semantic similarity work and the multilingual companion to all-MiniLM-L6-v2.
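
As a rough illustration of what "producing 384-dim vectors" looks like in practice, here is a minimal sketch that assumes the `sentence-transformers` Python package is installed and the weights can be downloaded from the Hugging Face repository named on this page. The helper names `embed_texts` and `cosine` are ours, chosen for illustration, and are not part of any library.

```python
import math
from typing import List, Sequence

MODEL_ID = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"


def embed_texts(texts: Sequence[str]) -> List[List[float]]:
    """Encode texts into unit-length vectors (downloads weights on first call)."""
    # Imported lazily so the pure-Python helper below works without the library.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(MODEL_ID)
    # normalize_embeddings=True returns unit-length vectors.
    return model.encode(list(texts), normalize_embeddings=True).tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Embedding a sentence and its translation with `embed_texts` and passing the two vectors to `cosine` should yield a noticeably higher score than an unrelated pair, though the exact values depend on the inputs.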

License: apache-2.0 · Context: 128 tokens

Our verdict

OP · Fredoline Eruo | Verified May 29, 2026

unrated

The pragmatic pick when you need a multilingual embedder under 150M parameters. MTEB multilingual quality is mediocre by 2026 standards, but the small footprint and 384-dim output keep it useful for short-passage multilingual RAG on storage-constrained hardware, provided each passage fits the 128-token context. For commercial multilingual quality, ship Arctic-embed-l-v2 instead.

Overview

Strengths

50+ language coverage at sub-150M params — best edge-tier multilingual embedder
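384-dim output, same as all-MiniLM-L6-v2, so an English-only pipeline that needs to add languages can keep its index schema (existing vectors still have to be re-embedded, since the two models do not share an embedding space)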
Apache-2.0 with no restrictions
ONNX and Transformers.js exports widely available

Weaknesses

128-token context is the shortest of any embedder we list
MTEB multilingual score (~50.4) trails Arctic-embed-l-v2 and jina-v3 by 10+ points
Older paraphrase-objective training — less optimized for asymmetric query/doc retrieval
No Matryoshka support — full 384 dims always required at storage
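
The storage cost behind the last weakness is easy to estimate. The sketch below assumes vectors are stored as raw float32 values (4 bytes each) and ignores index overhead such as IDs, metadata, or graph structures, which real vector stores add on top. The function name `index_size_bytes` is hypothetical.

```python
def index_size_bytes(n_vectors: int, dims: int = 384, bytes_per_value: int = 4) -> int:
    """Raw payload size of n_vectors embeddings, excluding any index overhead."""
    return n_vectors * dims * bytes_per_value


# One float32 vector at the full 384 dimensions.
per_vector = index_size_bytes(1)

# One million passages at float32.
one_million = index_size_bytes(1_000_000)

print(f"{per_vector} bytes per vector")
print(f"{one_million / 1e9:.3f} GB per million vectors")
```

Because the model has no Matryoshka training, truncating to fewer dimensions is not a supported way to shrink this figure; storing values at lower precision (for example `bytes_per_value=1`) is a separate technique whose quality impact this page does not measure.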

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	0.1 GB	1 GB
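
For a sense of where the file-size column comes from, a back-of-the-envelope estimate multiplies the parameter count by the bits stored per weight. This is only a sketch: it assumes a uniform bit width across all tensors and ignores file metadata, and the nominal 4 bits used for the quantized case is an assumption rather than the exact average of any particular format.

```python
def weight_size_gb(n_params: int, bits_per_weight: float) -> float:
    """Approximate weight payload in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 118_000_000  # 0.118B parameters, as listed on this page

for label, bits in [("float32", 32), ("float16", 16), ("nominal 4-bit", 4)]:
    print(f"{label:>14}: {weight_size_gb(N_PARAMS, bits):.3f} GB")
```

At a nominal 4 bits the estimate is roughly 0.06 GB, which rounds to the 0.1 GB shown in the table. Runtime VRAM is higher than the file size because activations and framework buffers also need memory.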

Get the model

HuggingFace

Original weights

huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

Source repository with the original weights; you need to quantize them yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of paraphrase-multilingual-MiniLM-L12-v2.

NVIDIA B300 (Blackwell Ultra)

Frequently asked

What's the minimum VRAM to run paraphrase-multilingual-MiniLM-L12-v2?

1 GB of VRAM is enough to run paraphrase-multilingual-MiniLM-L12-v2 at the Q4_K_M quantization (file size 0.1 GB). Higher-quality quantizations need more.

Can I use paraphrase-multilingual-MiniLM-L12-v2 commercially?

Yes. paraphrase-multilingual-MiniLM-L12-v2 ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of paraphrase-multilingual-MiniLM-L12-v2?

paraphrase-multilingual-MiniLM-L12-v2 supports a context window of 128 tokens.
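
With a window this short, longer passages have to be split before embedding, or the text beyond the limit is simply cut off. The sketch below shows one simple approach, a greedy split on whitespace against a token budget. The function `chunk_by_token_budget`, the `reserve` of 2 tokens for special markers, and the whitespace-based counter are all illustrative assumptions; whitespace words undercount subword tokens, so a real tokenizer count should be substituted for production use.

```python
from typing import Callable, List


def chunk_by_token_budget(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 128,
    reserve: int = 2,
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks within the budget.

    A single word that exceeds the budget on its own is emitted as its own
    (oversized) chunk rather than being split.
    """
    budget = max_tokens - reserve
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = current + [word]
        # Close the current chunk if adding this word would exceed the budget.
        if current and count_tokens(" ".join(candidate)) > budget:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current = candidate
    if current:
        chunks.append(" ".join(current))
    return chunks


def whitespace_count(s: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(s.split())
```

To count real tokens, one option is to load the model's tokenizer with `transformers.AutoTokenizer.from_pretrained` and pass `lambda s: len(tokenizer.encode(s, add_special_tokens=False))` as `count_tokens`. The sketch re-counts the growing chunk on every word, which is fine for short documents but quadratic for long ones.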

Source: huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
