other
7.85B parameters
Restricted
Reviewed June 2026

NV-Embed v2

NVIDIA's research-grade embedding model. Mistral-7B base. Top of MTEB at release.

License: CC-BY-NC 4.0·Released Sep 9, 2024·Context: 32,768 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED JUN 12, 2026
unrated

Positioning

NV-Embed v2 is a research-grade embedding model released by NVIDIA under the CC-BY-NC 4.0 license, which permits non-commercial use. Based on the Mistral-7B architecture, it is a dense 7.85B parameter model with a context length of 32,768 tokens. At the time of its release, it achieved top scores on the MTEB benchmark, making it a notable entry in the open-weight embedding landscape for research purposes.

Strengths

  • State-of-the-art embedding performance: NV-Embed v2 was reported as top of the MTEB leaderboard at release, indicating strong retrieval and embedding capabilities for research evaluation.
  • Large context window: With 32,768 tokens of context, it can process longer documents or passages than many embedding models, which typically support 512 or 2048 tokens.
  • Efficient deployment class: As a 7.85B dense model, it fits within the consumer deployment class, meaning it can run on single GPUs with 12–24 GB VRAM when quantized.
  • Permissive research license: The CC-BY-NC 4.0 license allows free use for non-commercial research, making it accessible for academic projects.

Limitations

  • Non-commercial license only: The CC-BY-NC 4.0 license prohibits commercial use, limiting deployment in production or revenue-generating applications.
  • No community benchmarks available: We do not have independent, community-reported benchmark results for this model. Published vendor metrics should be treated as best-case.
  • Dense architecture at 7.85B: Unlike Mixture-of-Experts models, all parameters are active during inference, meaning compute cost is proportional to the full 7.85B parameters.
  • Research-grade stability: As a research model, it may lack the robustness and support of production-oriented embedding models.

What it takes to run this locally

NV-Embed v2 has 7.85B parameters. Disk space requirements for common quantizations:

  • FP16: ~16 GB
  • Q8_0: ~8 GB
  • Q6_K: ~6.5 GB
  • Q5_K_M: ~5.6 GB
  • Q4_K_M: ~4.4 GB
  • Q3_K_M: ~3.8 GB
  • Q2_K: ~2.6 GB

Add approximately 30–50% for KV cache and framework overhead at typical context lengths. The model is in the consumer deployment class: a single GPU with 12–24 GB VRAM (e.g., RTX 3090/4090) can run it with appropriate quantization. For full FP16 precision, a 24 GB GPU is recommended.

Should you run this locally?

Yes if: You are conducting non-commercial research on text embeddings and want a model that was top of MTEB at release, with a large context window and permissive research license.

No if: You need commercial use rights, require production-grade stability, or prefer a smaller embedding model for faster inference on limited hardware.

Catalog cross-links

  • Mistral 7B – the base architecture for NV-Embed v2.
  • Consumer GPU Guide – hardware recommendations for running 7B-class models.
  • Embedding Models Overview – compare other embedding models.

Overview

NVIDIA's research-grade embedding model. Mistral-7B base. Top of MTEB at release.

Strengths

  • MTEB leader at release

Weaknesses

  • Non-commercial license

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
FP1615.0 GB18 GB

Get the model

HuggingFace

Original weights

huggingface.co/nvidia/NV-Embed-v2

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of NV-Embed v2.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run NV-Embed v2?

18GB of VRAM is enough to run NV-Embed v2 at the FP16 quantization (file size 15.0 GB). Higher-quality quantizations need more.

Can I use NV-Embed v2 commercially?

NV-Embed v2 is released under the CC-BY-NC 4.0, which has restrictions for commercial use. Review the license terms before using it in a product.

What's the context length of NV-Embed v2?

NV-Embed v2 supports a context window of 32,768 tokens (about 33K).

Source: huggingface.co/nvidia/NV-Embed-v2

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify NV-Embed v2 runs on your specific hardware before committing money.