NV-Embed v2
NVIDIA's research-grade embedding model. Mistral-7B base. Top of MTEB at release.
Positioning
NV-Embed v2 is a research-grade embedding model from NVIDIA, released under the CC-BY-NC 4.0 license, which permits non-commercial use only. Built on Mistral-7B, it is a dense 7.85B-parameter model with a context length of 32,768 tokens. It topped the MTEB benchmark at release, which makes it a notable open-weight option for embedding research.
Strengths
- Top MTEB score at release: NV-Embed v2 was reported at the top of the MTEB leaderboard when it was released, which points to strong retrieval and embedding quality in research evaluations.
- Large context window: With 32,768 tokens of context, it can process longer documents or passages than many embedding models, which typically support 512 or 2,048 tokens.
- Consumer deployment class: As a 7.85B dense model, it can run on a single GPU with 12–24 GB VRAM when quantized.
- Free for research use: The CC-BY-NC 4.0 license allows free non-commercial use with attribution, making it accessible for academic projects.
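To make the context-window point concrete, here is a minimal sketch that counts how many chunks a document needs under the three window sizes mentioned above. The function name `chunks_needed` and the example document length are illustrative assumptions, and the sketch assumes simple non-overlapping chunks; real pipelines often add overlap, which raises the count.

```python
def chunks_needed(doc_tokens: int, context_tokens: int) -> int:
    """Number of non-overlapping chunks needed to cover doc_tokens tokens."""
    if context_tokens <= 0:
        raise ValueError("context_tokens must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return (doc_tokens + context_tokens - 1) // context_tokens


# Hypothetical document that exactly fills NV-Embed v2's 32,768-token window.
doc_tokens = 32_768
for context_tokens in (512, 2_048, 32_768):
    print(context_tokens, chunks_needed(doc_tokens, context_tokens))
# prints:
# 512 64
# 2048 16
# 32768 1
```

A document of that size is embedded in one pass at 32,768 tokens, but has to be split into 16 or 64 pieces for the smaller windows, each producing its own vector.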
Limitations
- Non-commercial license only: The CC-BY-NC 4.0 license prohibits commercial use, limiting deployment in production or revenue-generating applications.
- No community benchmarks available: We do not have independent, community-reported benchmark results for this model. Published vendor metrics should be treated as best-case.
- Dense architecture at 7.85B: Unlike Mixture-of-Experts models, all parameters are active during inference, meaning compute cost is proportional to the full 7.85B parameters.
- Research-grade stability: As a research model, it may lack the robustness and support of production-oriented embedding models.
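As a rough illustration of the dense-compute point, the sketch below estimates forward-pass cost from the parameter count. It assumes the common rule of thumb of about 2 FLOPs per parameter per token and ignores the attention term that grows with sequence length, so treat the numbers as a lower-bound order of magnitude, not a measurement. The function name `forward_flops` is hypothetical.

```python
PARAMS = 7.85e9  # parameter count stated on this page


def forward_flops(tokens: int, params: float = PARAMS) -> float:
    """Approximate forward-pass FLOPs for a dense model.

    Assumption: ~2 FLOPs per parameter per token (one multiply and one add),
    with the sequence-length-dependent attention cost left out.
    """
    return 2.0 * params * tokens


for tokens in (512, 2_048, 32_768):
    print(f"{tokens:>6} tokens: ~{forward_flops(tokens):.2e} FLOPs")
```

Because every parameter is active for every token, the estimate scales linearly with input length; a Mixture-of-Experts model would plug in only its active parameter count.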
What it takes to run this locally
NV-Embed v2 has 7.85B parameters. Disk space requirements for common quantizations:
- FP16: ~16 GB
- Q8_0: ~8 GB
- Q6_K: ~6.5 GB
- Q5_K_M: ~5.6 GB
- Q4_K_M: ~4.4 GB
- Q3_K_M: ~3.8 GB
- Q2_K: ~2.6 GB
Add approximately 30–50% for KV cache and framework overhead at typical context lengths. The model is in the consumer deployment class: a single GPU with 12–24 GB VRAM (e.g., RTX 3090/4090) can run it with appropriate quantization. For full FP16 precision, a 24 GB GPU is recommended.
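The sizing rule above can be sketched as a short calculation. It takes the disk sizes listed in this section, adds the 30–50% overhead range, and checks the pessimistic end of that range against a VRAM budget. The names `QUANT_SIZES_GB`, `vram_range_gb`, and `fits` are illustrative, and the result is an estimate under this page's rule of thumb, not a measured requirement.

```python
# Disk sizes in GB, copied from the list in this section.
QUANT_SIZES_GB = {
    "FP16": 16.0,
    "Q8_0": 8.0,
    "Q6_K": 6.5,
    "Q5_K_M": 5.6,
    "Q4_K_M": 4.4,
    "Q3_K_M": 3.8,
    "Q2_K": 2.6,
}


def vram_range_gb(file_gb: float, low: float = 0.30, high: float = 0.50):
    """Return (min, max) estimated VRAM: file size plus 30-50% overhead."""
    return file_gb * (1 + low), file_gb * (1 + high)


def fits(file_gb: float, vram_gb: float) -> bool:
    """True if the upper-bound estimate fits within vram_gb."""
    return vram_range_gb(file_gb)[1] <= vram_gb


for name, size in QUANT_SIZES_GB.items():
    lo, hi = vram_range_gb(size)
    print(f"{name:7s} {lo:5.1f}-{hi:5.1f} GB  "
          f"12 GB: {fits(size, 12)}  24 GB: {fits(size, 24)}")
```

Under these assumptions FP16 tops out at exactly 24 GB, which matches the 24 GB recommendation, and Q8_0 tops out at 12 GB, so smaller quantizations leave more headroom on a 12 GB card.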
Should you run this locally?
Yes if: You are conducting non-commercial research on text embeddings and want a model that was top of MTEB at release, with a large context window and a license that allows free research use.
No if: You need commercial use rights, require production-grade stability, or prefer a smaller embedding model for faster inference on limited hardware.
Catalog cross-links
- Mistral 7B – the base architecture for NV-Embed v2.
- Consumer GPU Guide – hardware recommendations for running 7B-class models.
- Embedding Models Overview – compare other embedding models.
Quantization variants
Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| FP16 | 15.0 GB | 18 GB |
Get the model
HuggingFace: original weights. This is the source repository, so you need to quantize the weights yourself.
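Frequently asked
What's the minimum VRAM to run NV-Embed v2?
This page places it in the single-GPU 12–24 GB class when quantized; full FP16 precision calls for a 24 GB GPU.
Can I use NV-Embed v2 commercially?
No. The CC-BY-NC 4.0 license permits non-commercial use only.
What's the context length of NV-Embed v2?
32,768 tokens.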
Source: huggingface.co/nvidia/NV-Embed-v2
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.