
0.572B parameters

Restricted

Reviewed May 2026

Jina Embeddings v3

Jina Embeddings v3 is a 572M-parameter multilingual encoder with 8192-token context and five task-specific LoRA adapters (retrieval-query, retrieval-passage, separation, classification, text-matching) selectable at inference. It produces 1024-dim Matryoshka embeddings truncatable to 32 dims and covers 89 languages, with the weights gated behind CC-BY-NC-4.0 for non-commercial use only.

License: cc-by-nc-4.0 · Context: 8,192 tokens


Our verdict

Fredoline Eruo | Verified May 29, 2026

unrated

Technically the best open multilingual encoder of its size, but the CC-BY-NC license is a hard stop for any monetized product. Use it for prototyping or internal tools; for commercial multilingual RAG, ship Arctic-embed-l-v2 or multilingual-e5-large-instruct instead. The LoRA adapter approach is genuinely novel and worth the integration cost when the license allows.

Overview

Strengths

Top-tier multilingual MTEB score (~65.5) across 89 languages, including strong Chinese/Japanese/Arabic
Five LoRA task adapters let one model swap between query-doc retrieval, clustering, and classification
1024-dim Matryoshka output truncatable to 32 dims for binary-tier vector storage
Native 8K context with RoPE, so typical RAG passages need no chunking
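The Matryoshka property mentioned above means a shorter embedding can be taken from the front of the full vector. A minimal sketch of that truncation step, assuming the usual approach of keeping the first components and re-normalizing to unit length; the helper names and the toy 8-dim vectors are hypothetical stand-ins for real 1024-dim output:

```python
import math

def truncate_embedding(vec, dims):
    """Keep the first `dims` components and L2-normalize the result."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # all-zero prefix: nothing to normalize
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 8-dim vectors standing in for 1024-dim model output (made-up values).
full_a = [0.9, 0.1, 0.3, 0.2, 0.05, 0.02, 0.01, 0.01]
full_b = [0.8, 0.2, 0.25, 0.3, 0.01, 0.04, 0.02, 0.03]

# Truncate both to 4 dims, the way one would cut 1024 dims down to 32.
small_a = truncate_embedding(full_a, 4)
small_b = truncate_embedding(full_b, 4)
print(len(small_a), round(cosine(small_a, small_b), 3))
```

Query and document vectors must be truncated to the same length before comparison. How much retrieval quality is lost at a given length is something to measure on your own data.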

Weaknesses

CC-BY-NC-4.0 license: the weights cannot be deployed in revenue-generating products; commercial use requires the paid Jina API or a commercial license
Custom architecture requires trust_remote_code=True and the jina_embeddings_v3 Python package
At 572M parameters it is the largest model in the sub-1B tier, and slower per token than nomic-v1.5 or arctic-l-v2
No official GGUF; llama.cpp inference is community-maintained only
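To make the trust_remote_code requirement and the adapter selection concrete, here is a hedged sketch of loading the original weights through Hugging Face transformers. The dotted task names and the encode(texts, task=...) method are recalled from the model card's custom code, not guaranteed here, so check them against the repository before use. The role names and both helper functions are hypothetical.

```python
# Adapter task names as recalled from the model card (assumption: verify them).
TASK_FOR_ROLE = {
    "query": "retrieval.query",
    "passage": "retrieval.passage",
    "cluster": "separation",
    "classify": "classification",
    "match": "text-matching",
}

def task_for(role):
    """Map a hypothetical application-level role to an adapter task name."""
    try:
        return TASK_FOR_ROLE[role]
    except KeyError:
        raise ValueError(f"unknown role: {role!r}") from None

def embed(texts, role, model=None):
    """Embed `texts` with the adapter chosen for `role`.

    If no model is passed in, the weights are downloaded from the Hub.
    trust_remote_code=True executes Python code shipped in the model
    repository, so review that code before running this anywhere sensitive.
    """
    if model is None:
        from transformers import AutoModel  # needs `pip install transformers`
        model = AutoModel.from_pretrained(
            "jinaai/jina-embeddings-v3", trust_remote_code=True
        )
    # `encode` is provided by the repository's custom code (assumption).
    return model.encode(texts, task=task_for(role))
```

In a retrieval setup, queries would go through embed(..., "query") and indexed documents through embed(..., "passage"), reusing one loaded model by passing it as the model argument.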

Quantization variants

Each quantization trades model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	0.3 GB	1 GB
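The 0.3 GB figure can be sanity-checked with simple arithmetic: parameter count times bits per weight, divided by 8. A rough sketch, assuming about 4.5 bits per weight for a 4-bit K-quant (the real average varies by tensor mix and file metadata) and ignoring runtime overhead such as activations:

```python
def estimate_size_gb(params, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

PARAMS = 572_000_000   # 0.572B parameters, from the page header
Q4_BITS = 4.5          # assumption: rough average for a 4-bit K-quant

q4 = estimate_size_gb(PARAMS, Q4_BITS)
fp16 = estimate_size_gb(PARAMS, 16)
print(f"Q4-ish: {q4:.2f} GB, FP16: {fp16:.2f} GB")
# prints "Q4-ish: 0.32 GB, FP16: 1.14 GB"
```

The estimate lands close to the table's 0.3 GB file size. The 1 GB VRAM figure is larger because it also has to cover working memory at inference time.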

Get the model

HuggingFace

Original weights

huggingface.co/jinaai/jina-embeddings-v3

Source repository. No prebuilt quantized files are provided, so you must quantize the weights yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Jina Embeddings v3.

NVIDIA B300 (Blackwell Ultra)


Frequently asked

What's the minimum VRAM to run Jina Embeddings v3?

1 GB of VRAM is enough to run Jina Embeddings v3 at the Q4_K_M quantization (file size 0.3 GB). Higher-quality quantizations need more.

Can I use Jina Embeddings v3 commercially?

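Not from the open weights. Jina Embeddings v3 is released under the CC-BY-NC-4.0 license, which permits non-commercial use only. Commercial use requires the paid Jina API or a separate commercial license, so review the license terms before using it in a product.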

What's the context length of Jina Embeddings v3?

Jina Embeddings v3 supports a context window of 8,192 tokens (about 8K).
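Documents longer than the window still have to be split before embedding. A minimal sketch of that split over already-tokenized input; the function name and the optional overlap are hypothetical, and any special tokens the tokenizer adds would also count against the 8,192 limit:

```python
def split_into_windows(token_ids, max_tokens=8192, overlap=0):
    """Split a token-id list into consecutive windows of at most `max_tokens`."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    windows = []
    for start in range(0, len(token_ids), step):
        windows.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # the last window already reaches the end of the input
    return windows

# A made-up 20,000-token document needs three windows.
doc = list(range(20_000))
print([len(w) for w in split_into_windows(doc)])
# prints [8192, 8192, 3616]
```

Each window would then be embedded separately, and the resulting vectors stored or pooled as your retrieval setup requires.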

Source: huggingface.co/jinaai/jina-embeddings-v3

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
