Jina Embeddings v3
Jina Embeddings v3 is a 572M-parameter multilingual encoder with an 8192-token context window and five task-specific LoRA adapters (retrieval.query, retrieval.passage, separation, classification, text-matching) that are selected at inference time. It produces 1024-dimensional Matryoshka embeddings that can be truncated to as few as 32 dimensions, and it covers 89 languages. The weights are released under CC-BY-NC-4.0, which permits non-commercial use only.
Technically, this is the best open multilingual encoder of its size, but the CC-BY-NC license is a hard stop for any monetized product. Use it for prototyping or internal tools; for commercial multilingual RAG, ship Arctic-embed-l-v2 or multilingual-e5-large-instruct instead. The LoRA adapter approach is genuinely novel and worth the integration cost when the license allows.
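To make the adapter selection concrete, here is a minimal sketch of choosing a task adapter at encode time. It follows the sentence-transformers usage shown on the model's HuggingFace card (`task=` and `prompt_name=` passed to `encode`); whether those keywords are accepted may depend on your sentence-transformers version. The role names (`"query"`, `"cluster"`, and so on) and the helpers `pick_task`, `load_model`, and `embed` are hypothetical names introduced for illustration, not part of any library.

```python
from functools import lru_cache

# Adapter task names as they appear on the model card; the retrieval
# adapters use a dotted form ("retrieval.query", "retrieval.passage").
ADAPTER_TASKS = {
    "query": "retrieval.query",       # search queries
    "passage": "retrieval.passage",   # documents being indexed
    "cluster": "separation",          # clustering / re-ranking
    "classify": "classification",     # classifier features
    "similarity": "text-matching",    # symmetric similarity (STS-style)
}


def pick_task(role: str) -> str:
    """Map a (hypothetical) pipeline role to an adapter task name."""
    if role not in ADAPTER_TASKS:
        raise ValueError(
            f"unknown role {role!r}; expected one of {sorted(ADAPTER_TASKS)}"
        )
    return ADAPTER_TASKS[role]


@lru_cache(maxsize=1)
def load_model():
    """Load the model once; the weights are downloaded on first use."""
    # Imported lazily so pick_task() works without the dependency installed.
    from sentence_transformers import SentenceTransformer

    # trust_remote_code=True is needed because the architecture is custom.
    return SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True)


def embed(texts, role):
    """Encode `texts` with the adapter that matches `role`."""
    task = pick_task(role)
    return load_model().encode(texts, task=task, prompt_name=task)
```

In a retrieval pipeline you would typically call `embed(docs, "passage")` at indexing time and `embed([question], "query")` at search time, so that both sides use the matching adapter pair.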
Strengths
- Top-tier multilingual MTEB score (~65.5) across 89 languages, including strong Chinese/Japanese/Arabic
- Five LoRA task adapters let one model swap between query-doc retrieval, clustering, and classification
- 1024-dim Matryoshka output truncatable to 32 dims, cutting vector storage by up to 32x
- Native 8K context with RoPE, so typical RAG passages need no chunking
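The Matryoshka property in the list above can be sketched in a few lines: keep the leading components of a vector, rescale to unit length so cosine similarity still behaves, and compare storage cost. This is an illustration in plain Python under stated assumptions (float32 storage at 4 bytes per component, no index overhead); the function names are hypothetical, and real pipelines would normally do this with NumPy or let the library truncate for them.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if not 0 < dim <= len(vec):
        raise ValueError(f"dim must be in 1..{len(vec)}, got {dim}")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def float32_bytes(num_vectors, dim):
    """Raw storage for `num_vectors` float32 vectors (no index overhead)."""
    return num_vectors * dim * 4


full = float32_bytes(1_000_000, 1024)   # 4,096,000,000 bytes
small = float32_bytes(1_000_000, 32)    # 128,000,000 bytes
print(f"1M vectors: {full / 1e9:.2f} GB at 1024 dims, "
      f"{small / 1e9:.3f} GB at 32 dims ({full // small}x smaller)")
```

Truncating this aggressively costs retrieval quality, so the useful dimension is something to measure on your own data rather than assume.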
Weaknesses
- CC-BY-NC-4.0 license: the weights cannot be deployed in revenue-generating products; commercial use requires a paid Jina API plan or commercial license
- Custom architecture requires trust_remote_code=True and the jina_embeddings_v3 Python package
- At 572M parameters it is the largest model in the sub-1B tier, and slower per token than nomic-v1.5 or arctic-l-v2
- No official GGUF; llama.cpp inference is community-maintained only
Quantization variants
Each quantization level trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 0.3 GB | 1 GB |
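As a rough sanity check on the file size in the table, the arithmetic below estimates weight storage from parameter count and bits per weight. The 4.5 bits-per-weight figure for Q4_K_M is an assumed round number (the format mixes block types, so the effective average varies by architecture), and `quantized_size_gb` is a hypothetical helper. The sketch does not model runtime overhead such as activations and buffers, which would sit on top of the weights in any VRAM figure.

```python
def quantized_size_gb(num_params, bits_per_weight):
    """Approximate weight storage in GB (10^9 bytes), ignoring file metadata."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 572_000_000   # parameter count quoted in this article
BITS_Q4_K_M = 4.5      # ASSUMPTION: rough average for Q4_K_M
BITS_FP16 = 16         # unquantized half-precision weights

print(f"Q4_K_M: ~{quantized_size_gb(PARAMS, BITS_Q4_K_M):.2f} GB")
print(f"FP16:   ~{quantized_size_gb(PARAMS, BITS_FP16):.2f} GB")
```

Under these assumptions the Q4_K_M estimate rounds to the 0.3 GB shown in the table.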
Get the model
HuggingFace
Original weights
Source repository. No pre-quantized files are provided, so you must quantize the weights yourself.
Frequently asked
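What's the minimum VRAM to run Jina Embeddings v3?
About 1 GB with the Q4_K_M quantization listed in the table above.
Can I use Jina Embeddings v3 commercially?
Not from the open weights. They are licensed CC-BY-NC-4.0, so commercial use requires a paid Jina API plan or commercial license.
What's the context length of Jina Embeddings v3?
8192 tokens.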
Source: huggingface.co/jinaai/jina-embeddings-v3
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.