Tulu 3 70B
Tulu 3 at 70B. AI2's fully-open instruct fine-tune — research transparency at scale.
Overview
Tulu 3 70B is Ai2's fully open instruction-tuned fine-tune of Llama 3.1 70B. The training data, recipe, and evaluation suite are all published, making it one of the most transparent post-training efforts at this scale.
How to run it
Tulu 3 70B is Ai2's instruction-tuned 70B model built on Llama 3.1 70B. Tulu is Ai2's research fine-tune focused on improving instruction-following through a curated dataset mix, with the full post-training pipeline released openly. Run it at Q4_K_M via Ollama (ollama pull tulu3:70b) or llama.cpp with -ngl 999 -fa -c 8192. The Q4_K_M file is ~40 GB on disk. Comfortable minimum: a 48 GB card such as the RTX A6000 at Q4_K_M with 4K context; an RTX 4090 (24 GB) can run Q3_K_M with the KV cache offloaded to system RAM. Recommended for serving: an A100 80GB with AWQ-INT4. Expect ~15-25 tok/s on an A6000 at Q4_K_M (4K context) and ~30-45 tok/s on an A100. It is a standard Llama architecture, so it is drop-in compatible with any Llama inference stack. Tulu 3 is instruction-tuned with a chat/agent focus: use it for general chat, instruction-following, agent tasks, and knowledge work. On licensing, the recipe and data are released permissively, but the 70B weights inherit the Llama 3.1 Community License from the base model. Context: Llama 3.1-level (128K nominal; practically 8-16K on 48 GB).
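The quick-start above can be sketched as a shell session. Note the tulu3:70b Ollama tag and the GGUF filename below are assumptions for illustration — check what tags and files actually exist before pulling (see "What breaks first").

```shell
# Pull and chat via Ollama (tag name is an assumption — verify it exists first)
ollama pull tulu3:70b
ollama run tulu3:70b "Summarize the Tulu 3 post-training recipe in two sentences."

# Or run a local GGUF with llama.cpp: all layers on GPU (-ngl 999),
# flash attention (-fa), 8K context (-c 8192)
./llama-cli -m tulu3-70b-Q4_K_M.gguf -ngl 999 -fa -c 8192 -p "Hello"
```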
Hardware guidance
- Minimum: RTX 3090 24GB at Q3_K_M with KV offload (4K context).
- Recommended: RTX A6000 48GB at Q4_K_M (8K context).
- Optimal: A100 80GB at AWQ-INT4; enables up to 32K context.
- VRAM math: 70B dense; Q4_K_M weights ≈ 40 GB; KV cache at 8K ≈ 10 GB; total ≈ 50 GB.
- A6000 48GB: borderline at 8K — trim context to 4K.
- RTX 4090 24GB: Q3_K_M ≈ 30 GB, so some layers plus the KV cache spill to system RAM.
- RTX 5090 32GB: Q4_K_M weights alone are 40 GB — some layers must offload.
- Dual RTX 4090 (48 GB total): Q4 at 8K split across both cards — viable.
- Mac Studio M4 Max 64GB: Q4_K_M at ~5-10 tok/s.
- Cloud: A100 80GB at ~$5-10/hr.
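The VRAM math above can be sanity-checked with a small script. The bits-per-weight figures are approximate averages (Q4_K_M is roughly 4.85 bpw), and real runtimes add KV cache and compute buffers on top, so treat the outputs as lower bounds on what a card needs.

```python
# Rough weight-size estimator for a 70B dense model under common GGUF quants.
# Bits-per-weight values are approximate averages, not exact format specs.
PARAMS = 70e9

BPW = {
    "Q3_K_M": 3.9,   # approximate average bits per weight
    "Q4_K_M": 4.85,
    "Q8_0":   8.5,
}

def weights_gb(quant: str) -> float:
    """Approximate on-disk / in-VRAM weight size in GB for the given quant."""
    return PARAMS * BPW[quant] / 8 / 1e9

for q in BPW:
    print(f"{q}: ~{weights_gb(q):.0f} GB of weights (before KV cache and buffers)")
```

Running this lands Q4_K_M in the low-40s of GB, consistent with the ~40 GB figure above once you allow for rounding.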
What breaks first
1. Tulu chat template. Tulu 3 uses Ai2's chat template, which differs slightly from standard Llama 3.1. Using the Llama 3.1 default template may produce subtly worse instruction-following. Use Tulu's template from tokenizer_config.json.
2. Benchmark overfitting. Tulu 3's training mix includes public benchmarks, so performance on exact benchmark prompts may overstate real-world quality. Test on your own tasks.
3. Q3 quality on instruction-following. Tulu's instruction tuning is relatively shallow compared to base Llama training. At Q3, instruction adherence degrades more than base knowledge — the fine-tuned behavior is more quant-sensitive.
4. Ollama tag freshness. Tulu 3 may not be in Ollama's default catalog. Check huggingface.co/allenai for GGUF availability, or convert from the Hugging Face weights yourself.
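To illustrate the chat-template pitfall, here is a hedged sketch of a Tulu-style prompt formatter. The marker strings (<|system|>, <|user|>, <|assistant|>) are assumptions modeled on earlier Tulu releases — read the chat_template field in tokenizer_config.json (or use your runtime's template support) rather than hard-coding a layout like this.

```python
# Hypothetical Tulu-style chat formatting — verify against tokenizer_config.json.
# The role-marker tokens below are assumptions, not confirmed for Tulu 3.
def format_tulu_prompt(messages):
    """Render a list of {'role', 'content'} dicts into a single prompt string."""
    out = []
    for m in messages:
        out.append(f"<|{m['role']}|>\n{m['content']}\n")
    out.append("<|assistant|>\n")  # trailing marker cues the model to respond
    return "".join(out)

prompt = format_tulu_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain KV cache offload in one sentence."},
])
print(prompt)
```

If the template in tokenizer_config.json disagrees with this sketch, the tokenizer wins: even small differences in newlines or markers measurably affect instruction-following.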
Runtime recommendation
For single-GPU GGUF use, llama.cpp (or Ollama on top of it) with full layer offload is the simplest path. For multi-user serving on an A100 80GB, an AWQ-INT4 build behind a serving stack such as vLLM gives better throughput and longer usable context.
Common beginner mistakes
- Mistake: Using Llama 3.1's default chat template with Tulu 3. Fix: Tulu 3 uses Ai2's template. Check tokenizer_config.json for the exact format, or use the model card's recommended template.
- Mistake: Assuming Tulu 3 matches Llama 3.3 70B quality. Fix: Tulu 3 is fine-tuned on Llama 3.1 70B, not 3.3 — a different base model. Expect quality similar to Llama 3.1 70B with improved instruction-following.
- Mistake: Expecting Tulu 3 to follow system prompts as aggressively as Command R models. Fix: Tulu 3 is instruction-tuned but not specifically system-prompt-optimized. Long system prompts may be ignored or only partially followed.
- Mistake: Running at 128K context on consumer hardware. Fix: As with all ~70B models, the KV cache at 128K runs to 80+ GB. Keep context at 4-8K on 24-48 GB GPUs.
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent/child edges record direct distillation or fine-tune relationships.
Strengths
- Fully-open recipe at 70B — training data, code, and evaluations are all published
Weaknesses
- Llama 3.1 Community License inherited from the base model
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 40.0 GB | 48 GB |
Get the model
HuggingFace
Original weights
Source repository — no official GGUF, so you may need to quantize the weights yourself.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Tulu 3 70B.
Frequently asked
What's the minimum VRAM to run Tulu 3 70B?
About 24 GB for Q3_K_M with KV offload (RTX 3090/4090 class); 48 GB is the comfortable minimum for Q4_K_M at 4K context.
Can I use Tulu 3 70B commercially?
The weights inherit the Llama 3.1 Community License from the base model, which allows commercial use within Meta's terms. Review the license on the Hugging Face model card before deploying.
What's the context length of Tulu 3 70B?
128K tokens nominal, inherited from Llama 3.1 — but KV cache memory makes 4-16K the practical range on 24-48 GB GPUs.
Source: huggingface.co/allenai/Llama-3.1-Tulu-3-70B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Related — keep moving
Verify Tulu 3 70B runs on your specific hardware before committing money.