Llama 3.1 Nemotron 70B Instruct
NVIDIA's HelpSteer2-tuned version of Llama 3.1 70B. It topped Arena Hard at release and was NVIDIA's reference open-weights model before Nemotron 3.
Strengths
- Top instruction-following at release
- HelpSteer2 tuning
Weaknesses
- Now historical; predates Nemotron 3
- Needs 48 GB or more of VRAM
Quantization variants
Each quantization level trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 40.0 GB | 48 GB |
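The table's figures can be sanity-checked with simple arithmetic. The sketch below derives the implied average bits per weight and the VRAM left over after loading the file. It assumes a nominal 70 billion parameters (taken from the model name) and 1 GB = 10^9 bytes; the function names are illustrative, not part of any tool.

```python
def bits_per_weight(file_size_gb: float, params_billion: float) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    # GB -> gigabits, divided by billions of parameters
    return file_size_gb * 8 / params_billion


def headroom_gb(vram_gb: float, file_size_gb: float) -> float:
    """VRAM left after the weights are loaded (for KV cache, buffers, etc.)."""
    return vram_gb - file_size_gb


# Values from the Q4_K_M row of the table above; 70.0 is the nominal size.
bpw = bits_per_weight(40.0, 70.0)
spare = headroom_gb(48.0, 40.0)
print(f"Q4_K_M: ~{bpw:.2f} bits/weight, {spare:.1f} GB headroom on a 48 GB card")
```

The headroom figure is only a rough guide: how much memory the context actually needs depends on the runtime and the context length you configure.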
Get the model
Ollama
One-line install
ollama run nemotron:70b
HuggingFace
Original weights
Source repository with the original, unquantized weights; you need to quantize them yourself.
Frequently asked
What's the minimum VRAM to run Llama 3.1 Nemotron 70B Instruct?
Can I use Llama 3.1 Nemotron 70B Instruct commercially?
What's the context length of Llama 3.1 Nemotron 70B Instruct?
How do I install Llama 3.1 Nemotron 70B Instruct with Ollama?
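Once `ollama run nemotron:70b` has pulled the model, it can also be queried from a script through Ollama's local HTTP API. The following is a minimal sketch, assuming the Ollama server is running on its default address (`http://localhost:11434`); the helper names `build_request` and `generate` are illustrative.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local address and port.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_TAG = "nemotron:70b"  # same tag as in the `ollama run` command


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": MODEL_TAG, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, timeout: float = 600.0) -> str:
    """Send the prompt and return the generated text.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(prompt), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

With the server running, `print(generate("Summarize what quantization does."))` prints the model's reply. The long default timeout is a guess to allow for slow first-token latency on a 70B model; adjust it to your hardware.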
Source: huggingface.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.