other

7.85B parameters

Restricted

Reviewed May 2026

NV-Embed v2

NVIDIA's research-grade embedding model. Mistral-7B base. Top of MTEB at release.

License: CC-BY-NC 4.0·Released Sep 9, 2024·Context: 32,768 tokens

Overview

NVIDIA's research-grade embedding model. Mistral-7B base. Top of MTEB at release.

Strengths

MTEB leader at release

Weaknesses

Non-commercial license

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
FP16	15.0 GB	18 GB

Get the model

HuggingFace

Original weights

huggingface.co/nvidia/NV-Embed-v2

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of NV-Embed v2.

Frequently asked

What's the minimum VRAM to run NV-Embed v2?

18GB of VRAM is enough to run NV-Embed v2 at the FP16 quantization (file size 15.0 GB). Higher-quality quantizations need more.

Can I use NV-Embed v2 commercially?

NV-Embed v2 is released under the CC-BY-NC 4.0, which has restrictions for commercial use. Review the license terms before using it in a product.

What's the context length of NV-Embed v2?

NV-Embed v2 supports a context window of 32,768 tokens (about 33K).

Source: huggingface.co/nvidia/NV-Embed-v2

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Compare hardware

Buyer guides

When it doesn't work

Recommended hardware

Before you buy

Verify NV-Embed v2 runs on your specific hardware before committing money.

Will it run on my hardware? →Custom hardware comparison →GPU recommender (4 questions) →