NV-Embed v2
7.85B parameters · Restricted license · Reviewed May 2026
License: CC-BY-NC 4.0 · Released Sep 9, 2024 · Context: 32,768 tokens
Overview
NV-Embed v2 is NVIDIA's research-grade text embedding model, built on a Mistral-7B base. It sat at the top of the MTEB leaderboard at release.
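An embedding model maps text to fixed-length vectors that are compared by cosine similarity. A minimal sketch of that comparison, using short made-up vectors (real NV-Embed v2 outputs are 4096-dimensional):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings for illustration only
query = [0.1, 0.9, 0.2, 0.4]
doc_a = [0.1, 0.8, 0.3, 0.4]   # points roughly the same way as the query
doc_b = [0.9, 0.1, 0.7, 0.0]   # points elsewhere

print(cosine_similarity(query, doc_a))  # high: semantically similar
print(cosine_similarity(query, doc_b))  # low: semantically dissimilar
```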
Strengths
- MTEB leader at release
Weaknesses
- Non-commercial license
Quantization variants
Each quantization trades model quality for file size and VRAM. Only the original FP16 weights are listed below; no prebuilt quantizations are published, so smaller variants must be produced yourself.
| Quantization | File size | VRAM required |
|---|---|---|
| FP16 | 15.0 GB | 18 GB |
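The file size follows from simple arithmetic: parameter count times bits per weight. A quick sanity check of the FP16 row, plus a hypothetical 4-bit quantization (the small gap versus the listed 15.0 GB is expected, since reported sizes vary with how non-weight tensors are counted):

```python
def model_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Naive size estimate: parameters x bits per weight, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9

# 7.85B parameters at 16 bits each (FP16)
print(round(model_file_size_gb(7.85e9, 16), 1))  # ~15.7 GB, near the listed 15.0 GB

# The same weights at a hypothetical 4-bit quantization
print(round(model_file_size_gb(7.85e9, 4), 1))   # ~3.9 GB
```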
Get the model
HuggingFace
Original weights
huggingface.co/nvidia/NV-Embed-v2
Source repository; no prebuilt quantizations are provided, so you must quantize the weights yourself.
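A hedged sketch of loading the weights with Hugging Face `transformers`. The `encode` call and the instruction-prefix convention follow the upstream model card; verify them there before relying on this:

```python
# Sketch only: downloading nvidia/NV-Embed-v2 needs ~15 GB of disk and,
# for inference, roughly 18 GB of VRAM. trust_remote_code is required
# because the repository ships custom pooling/encoding code.

def nv_embed_query(task: str, text: str) -> str:
    # NV-Embed prefixes retrieval queries with an instruction header
    # (format taken from the model card; passages need no prefix).
    return f"Instruct: {task}\nQuery: {text}"

def load_nv_embed():
    from transformers import AutoModel  # deferred import: heavy dependency
    return AutoModel.from_pretrained("nvidia/NV-Embed-v2", trust_remote_code=True)

# Usage (not run here):
# model = load_nv_embed()
# emb = model.encode(
#     [nv_embed_query("Given a question, retrieve relevant passages", "What is MTEB?")],
#     max_length=32768,
# )
```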
Hardware that runs this
Cards with enough VRAM for at least one quantization of NV-Embed v2.
Frequently asked
What's the minimum VRAM to run NV-Embed v2?
18 GB of VRAM is enough to run NV-Embed v2 at FP16 (file size 15.0 GB), the only precision currently listed. FP16 is the full-precision release; lower-precision quantizations, if you produce them, need less.
Can I use NV-Embed v2 commercially?
NV-Embed v2 is released under the CC-BY-NC 4.0 license, which prohibits commercial use. Review the license terms before using it in a product.
What's the context length of NV-Embed v2?
NV-Embed v2 supports a context window of 32,768 tokens (32K).
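For a rough feel of what 32,768 tokens holds, assume ~4 characters per token for English text (a common rule of thumb, not a measured property of this model's tokenizer):

```python
CONTEXT_TOKENS = 32_768
CHARS_PER_TOKEN = 4   # rough English average; actual ratio depends on the tokenizer
AVG_WORD_LEN = 5      # characters per word including the trailing space (assumption)

approx_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
print(f"~{approx_chars:,} characters, about {approx_chars // AVG_WORD_LEN:,} words")
```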
Source: huggingface.co/nvidia/NV-Embed-v2
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Before you buy
Verify NV-Embed v2 runs on your specific hardware before committing money.