0.572B parameters
Restricted
Reviewed May 2026

Jina Embeddings v3

Jina Embeddings v3 is a 572M-parameter multilingual encoder with 8192-token context and five task-specific LoRA adapters (retrieval-query, retrieval-passage, separation, classification, text-matching) selectable at inference. It produces 1024-dim Matryoshka embeddings truncatable to 32 dims and covers 89 languages, with the weights gated behind CC-BY-NC-4.0 for non-commercial use only.

License: CC-BY-NC-4.0 · Context: 8,192 tokens

Our verdict

OP · Fredoline Eruo | Verified May 29, 2026
unrated

Technically the best open multilingual encoder of its size, but the CC-BY-NC license is a hard stop for any monetized product. Use it for prototyping or internal tools; for commercial multilingual RAG, ship Arctic-embed-l-v2 or multilingual-e5-large-instruct instead. The LoRA adapter trick is genuinely novel and worth the integration cost when license allows.

Strengths

  • Top-tier multilingual MTEB score (~65.5) across 89 languages, including strong Chinese/Japanese/Arabic
  • Five LoRA task adapters let one model swap between query-doc retrieval, clustering, and classification
  • 1024-dim Matryoshka output truncatable to as few as 32 dims for compact vector storage
  • Native 8K context with RoPE — no chunking for typical RAG passages
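
The Matryoshka point above can be illustrated with a minimal sketch: keep the leading components of a vector, then rescale to unit length so cosine similarity still behaves. The helper names (truncate_embedding, storage_bytes) and the stand-in vector are hypothetical, and float32 storage is an assumption; this is not the model's own truncation code.

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale the result to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # nothing to rescale
    return [x / norm for x in head]

def storage_bytes(dims, bytes_per_value=4):
    """Bytes needed per vector, assuming float32 (4 bytes per value)."""
    return dims * bytes_per_value

# Stand-in for a real 1024-dim embedding (illustrative values only).
full = [math.sin(i + 1) for i in range(1024)]
small = truncate_embedding(full, 32)

print(len(small), storage_bytes(1024), storage_bytes(32))
```

Under these assumptions a 32-dim vector takes 128 bytes against 4,096 bytes for the full 1024 dims, a 32x reduction per stored vector. How much retrieval quality survives at a given size is something to measure on your own data.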

Weaknesses

  • CC-BY-NC-4.0 license: the open weights cannot be deployed in revenue-generating products; commercial use requires a paid Jina API plan or license
  • Custom architecture requires trust_remote_code=True and the jina_embeddings_v3 Python package
  • 572M is the largest in the sub-1B tier — slower per-token than nomic-v1.5 or arctic-l-v2
  • No official GGUF; llama.cpp inference is community-maintained only
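
To make the trust_remote_code requirement and adapter selection concrete, here is a hedged sketch of loading the weights through Hugging Face transformers. The dotted task identifiers (retrieval.query, retrieval.passage) and the encode(..., task=..., truncate_dim=...) call are how I recall the model card describing the remote code; this page lists the adapters with hyphens, so treat the exact strings and keyword names as assumptions to verify. The role names and the helpers task_for, load_model, and embed are hypothetical.

```python
from functools import lru_cache

# Assumed mapping from a use case to the adapter identifier (verify against the model card).
TASK_BY_ROLE = {
    "query": "retrieval.query",
    "passage": "retrieval.passage",
    "clustering": "separation",
    "classification": "classification",
    "similarity": "text-matching",
}

def task_for(role):
    """Return the adapter identifier for a hypothetical role name."""
    try:
        return TASK_BY_ROLE[role]
    except KeyError:
        raise ValueError(
            f"unknown role {role!r}; expected one of {sorted(TASK_BY_ROLE)}"
        ) from None

@lru_cache(maxsize=1)
def load_model():
    """Download and cache the model; runs the repository's custom code."""
    from transformers import AutoModel  # imported lazily so the helpers above work without it
    return AutoModel.from_pretrained(
        "jinaai/jina-embeddings-v3", trust_remote_code=True
    )

def embed(texts, role, truncate_dim=None):
    """Encode texts with the adapter for `role` (assumed encode() keywords)."""
    kwargs = {"task": task_for(role)}
    if truncate_dim is not None:
        kwargs["truncate_dim"] = truncate_dim
    return load_model().encode(texts, **kwargs)
```

A call such as embed(["What is RoPE?"], "query") for the question and embed(docs, "passage") for the corpus would pair the two retrieval adapters. Because trust_remote_code=True executes code from the repository, pinning a specific revision is a sensible precaution.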

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          0.3 GB       1 GB
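
As a sanity check on the file size above, weight size can be estimated as parameters times bits per weight, divided by eight. The sketch below assumes roughly 4.5 bits per weight on average for a 4-bit K-quant; that figure is an assumption, not a measured value for this model, and the function name is hypothetical.

```python
def weight_file_gb(params, bits_per_weight):
    """Approximate weight file size in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 572_000_000   # parameter count from this page
Q4_BITS = 4.5          # assumed average bits per weight for a 4-bit K-quant

fp16 = weight_file_gb(PARAMS, 16)
q4 = weight_file_gb(PARAMS, Q4_BITS)

print(f"FP16: {fp16:.2f} GB, Q4: {q4:.2f} GB")
```

The estimate comes out near 0.32 GB for the 4-bit file, consistent with the 0.3 GB listed, and about 1.14 GB at FP16. The 1 GB VRAM figure leaves headroom for activations and runtime buffers on top of the weights.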

Get the model

HuggingFace

Original weights

huggingface.co/jinaai/jina-embeddings-v3

Source repository. No pre-quantized files are provided, so you must quantize the weights yourself.

Frequently asked

What's the minimum VRAM to run Jina Embeddings v3?

1 GB of VRAM is enough to run Jina Embeddings v3 at the Q4_K_M quantization (file size 0.3 GB). Higher-quality quantizations need more.

Can I use Jina Embeddings v3 commercially?

Not from the open weights. Jina Embeddings v3 is released under CC-BY-NC-4.0, which permits non-commercial use only; commercial use requires a paid Jina API plan or license. Review the license terms before using it in a product.

What's the context length of Jina Embeddings v3?

Jina Embeddings v3 supports a context window of 8,192 tokens (about 8K).

Source: huggingface.co/jinaai/jina-embeddings-v3

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
