Command R+ 104B
Cohere's flagship — RAG-tuned, multilingual. Open weights but non-commercial.
Command R+ is Cohere's enterprise-aimed open-weight flagship — 104B parameters with explicit training for retrieval-augmented generation and tool-call workflows. The model to pick when RAG is the primary workload and you can afford workstation-class hardware.
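Cohere ships the official grounded-generation template with the model's chat template, so the sketch below is an illustrative stand-in rather than the real format. It shows the shape of a citation-aware RAG request: retrieved chunks are numbered so the model can cite them inline.

```python
# Illustrative only: the real Command R+ grounded-generation template ships
# with the model's tokenizer/chat template. This generic stand-in just shows
# the shape of a citation-aware RAG prompt.

def build_rag_prompt(question: str, documents: list[str]) -> str:
    """Number each retrieved chunk so the model can cite it as [0], [1], ..."""
    chunks = "\n".join(f"[{i}] {doc}" for i, doc in enumerate(documents))
    return (
        "Answer using only the documents below. Cite sources inline as [n].\n\n"
        f"{chunks}\n\nQuestion: {question}\nAnswer:"
    )

prompt = build_rag_prompt(
    "What license is Command R+ under?",
    ["Command R+ weights are released under CC-BY-NC."],
)
```

In a real deployment you would pass the documents through the model's own chat template instead of hand-rolling the string, since the citation behavior was trained against that format.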
Strengths
- Best-in-class RAG behavior — explicitly trained for citation-aware retrieval.
- Strong multilingual performance — better than Llama on European and Asian languages.
- Mature tool-use format — Cohere's API conventions translate cleanly.
Weaknesses
- CC-BY-NC license: non-commercial use only; commercial deployment requires a separate Cohere license.
- 104B VRAM cost — ~58 GB at Q4, partial-offload required on 24 GB cards.
- General-chat quality below Llama 3.3 70B despite the size advantage.
Quantization options
- Q4_K_M (58 GB) — heavy offload: 12–18 tok/s, 64+ GB system RAM required
- Q5_K_M (68 GB) — workstation only
- Q8_0 (104 GB) — workstation only
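The file sizes above follow from simple bits-per-weight arithmetic. A rough sanity check, using this page's own figures (104B parameters, 58 GB at Q4_K_M); real GGUF files add metadata and keep some tensors at higher precision, so treat these as estimates:

```python
# Back-of-envelope quantization math: file size ≈ params × bits-per-weight / 8.
PARAMS = 104e9  # Command R+ parameter count

def file_size_gb(bits_per_weight: float) -> float:
    return PARAMS * bits_per_weight / 8 / 1e9

def effective_bpw(file_gb: float) -> float:
    return file_gb * 1e9 * 8 / PARAMS

# Q8_0 stores ~8 bits per weight: 104e9 × 8 / 8 bytes ≈ 104 GB, matching the list.
q8 = file_size_gb(8.0)
# Working backwards, a 58 GB Q4_K_M file implies ~4.5 effective bits per weight.
bpw = effective_bpw(58.0)
```

The same arithmetic explains why the 104B model needs partial offload on consumer cards: even at ~4.5 bits per weight, the weights alone exceed any single-GPU VRAM pool short of a workstation card.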
Should you run it?
Yes, for RAG-heavy non-commercial work, or commercial deployments with the Cohere license. No, for general-purpose chat — Llama 3.3 70B is more memory-efficient and equally good.
How it compares
- vs Llama 3.3 70B → Llama wins on general use + license; Command R+ wins on RAG specifically.
- vs Command R 35B → 104B is meaningfully smarter; 35B is the practical-VRAM Cohere pick.
- vs DeepSeek R1 Distill Llama 70B → R1 Distill wins on reasoning; Command R+ wins on RAG/tool-use behavior.
ollama pull command-r-plus:104b-q4_K_M
ollama run command-r-plus:104b-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, --n-gpu-layers ~30, RTX 4090 + 96 GB DDR5
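Once pulled, the model can be driven programmatically through Ollama's local REST API (POST to /api/generate on port 11434). A minimal sketch, assuming a running Ollama server; the request is built separately so the 16384-token context setting above stays visible:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(prompt: str, model: str = "command-r-plus:104b-q4_K_M") -> dict:
    # num_ctx mirrors the 16384-token context suggested above; stream=False
    # returns the completion as a single JSON object instead of chunks.
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": 16384},
    }

def generate(prompt: str) -> str:
    """Send the request to a locally running Ollama server (start it first)."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With heavy offload at 12–18 tok/s, keep prompts short and avoid streaming-off for long generations, since the full response only arrives once decoding finishes.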
Why this rating
7.8/10 — Cohere's enterprise-tuned 104B with built-in RAG and tool-use scaffolding. Excellent at structured RAG workflows, but eclipsed by Llama 3.3 70B for general use. Loses points on license restrictiveness and VRAM cost.
Overview
Cohere's flagship — RAG-tuned, multilingual. Open weights but non-commercial.
Strengths
- Best open RAG model at release
- Multilingual
Weaknesses
- Non-commercial
- 70GB+ VRAM
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 60.0 GB | 70 GB |
Get the model
Ollama
One-line install
ollama run command-r-plus:104b
Read our Ollama review →
Hugging Face
Original weights
Source repository — no prebuilt GGUF here; quantize the weights yourself for local use.
Hardware that runs this
Cards with enough VRAM for at least one quantization of Command R+ 104B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run Command R+ 104B?
About 70 GB to hold the Q4_K_M quantization fully on GPU. With partial offload, a 24 GB card plus 64+ GB of system RAM works, at roughly 12–18 tok/s.
Can I use Command R+ 104B commercially?
Not with the open weights: the CC-BY-NC license is non-commercial only. Commercial deployment requires a separate license from Cohere.
What's the context length of Command R+ 104B?
128K tokens natively; the settings above use 16384 to keep memory usage manageable.
How do I install Command R+ 104B with Ollama?
Run `ollama pull command-r-plus:104b-q4_K_M`, then `ollama run command-r-plus:104b-q4_K_M`.
Source: huggingface.co/CohereForAI/c4ai-command-r-plus
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.