
Command R+ 104B

Cohere's flagship — RAG-tuned, multilingual. Open weights but non-commercial.

License: CC BY-NC 4.0 · Released Aug 30, 2024 · Context: 131,072 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
7.8/10
Positioning

Command R+ is Cohere's enterprise-aimed open-weight flagship — 104B parameters with explicit training for retrieval-augmented generation and tool-call workflows. The model to pick when RAG is the primary workload and you can afford workstation-class hardware.

Strengths
  • Best-in-class RAG behavior — explicitly trained on citation-aware retrieval.
  • Strong multilingual performance — better than Llama on European + Asian languages.
  • Mature tool-use format — Cohere's API conventions translate cleanly.
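The citation-aware retrieval behavior boils down to putting retrieved snippets into the prompt with stable ids the model can cite. A minimal sketch of such a request payload for Ollama's `/api/generate` endpoint is below — the prompt layout here is an illustrative assumption, not Cohere's official grounded-generation template, which uses its own special tokens.

```python
import json

def build_rag_prompt(documents, question):
    """Inline retrieved snippets into one prompt with citable ids.
    NOTE: illustrative layout only -- Cohere publishes a dedicated
    grounded-generation template; this is not that exact format."""
    parts = ["Answer using only the documents below. Cite document ids."]
    for i, doc in enumerate(documents):
        parts.append(f"[doc {i}] {doc}")
    parts.append(f"Question: {question}")
    return "\n\n".join(parts)

docs = [
    "Command R+ has a 131,072-token context window.",
    "Command R+ weights are CC BY-NC 4.0 licensed.",
]
payload = {
    "model": "command-r-plus:104b-q4_K_M",  # tag from the install section below
    "prompt": build_rag_prompt(docs, "What is the context length?"),
    "stream": False,
}
print(json.dumps(payload)[:80])
```

POSTing this JSON to a local Ollama server (`http://localhost:11434/api/generate`) returns a completion that can cite `[doc 0]` / `[doc 1]`.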
Limitations
  • CC BY-NC 4.0 license permits non-commercial use only — commercial deployment requires a separate license from Cohere.
  • Heavy VRAM cost — ~58 GB at Q4; partial offload required on 24 GB cards.
  • General-chat quality below Llama 3.3 70B despite the size advantage.
Real-world performance on RTX 4090
  • Q4_K_M (58 GB) — heavy offload: 12–18 tok/s, 64+ GB system RAM required
  • Q5_K_M (68 GB) — workstation only
  • Q8_0 (104 GB) — workstation only
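The "heavy offload" figure follows from simple arithmetic: the quantized file is far bigger than a 24 GB card, so only a fraction of the layers can live on the GPU. A crude estimator is sketched below — the 64-layer count and the ~3 GB reserve for KV cache/overhead are assumptions; in practice you tune the layer count empirically.

```python
def gpu_layers(model_gb, n_layers, vram_gb, reserve_gb=3.0):
    """Rough partial-offload estimate: split the quantized model evenly
    across layers and see how many fit in free VRAM.
    Assumptions: uniform per-layer size, ~3 GB reserved for KV cache."""
    per_layer = model_gb / n_layers
    return int((vram_gb - reserve_gb) / per_layer)

# 58 GB Q4_K_M file, 64 layers assumed, RTX 4090 with 24 GB
print(gpu_layers(58, 64, 24))
```

The estimate is deliberately conservative; a smaller context or KV-cache quantization frees room for a few more layers, which is how settings like the `--n-gpu-layers ~30` suggested below become workable.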
Should you run this locally?

Yes, for RAG-heavy non-commercial work, or commercial deployments with the Cohere license. No, for general-purpose chat — Llama 3.3 70B is more memory-efficient and equally good.

How it compares
  • vs Llama 3.3 70B → Llama wins on general use + license; Command R+ wins on RAG specifically.
  • vs Command R 35B → 104B is meaningfully smarter; 35B is the practical-VRAM Cohere pick.
  • vs DeepSeek R1 Distill Llama 70B → R1 Distill wins on reasoning; Command R+ wins on RAG/tool-use behavior.
Run this yourself
ollama pull command-r-plus:104b-q4_K_M
ollama run command-r-plus:104b-q4_K_M
Settings: Q4_K_M GGUF, 16384 ctx, --n-gpu-layers ~30, RTX 4090 + 96 GB DDR5
Why this rating

7.8/10 — Cohere's enterprise-tuned 104B with built-in RAG and tool-use scaffolding. Excellent at structured RAG workflows, but eclipsed by Llama 3.3 70B for general use. Loses points on license restrictiveness and VRAM cost.

Overview

Cohere's flagship — RAG-tuned, multilingual. Open weights but non-commercial.

Strengths

  • Best open RAG model at release
  • Multilingual

Weaknesses

  • Non-commercial
  • 70 GB+ VRAM

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization   File size   VRAM required
Q4_K_M         60.0 GB     70 GB
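The file size falls out of parameter count times average bits per weight. A back-of-envelope check is below — the bits-per-weight averages are approximations (K-quants mix block formats), chosen to roughly match the sizes quoted in this review.

```python
def gguf_size_gb(params_b, bits_per_weight):
    """Back-of-envelope GGUF file size: parameters * avg bits/weight.
    bits-per-weight values are approximate averages, not exact."""
    return params_b * bits_per_weight / 8

# assumed averages: Q4_K_M ~4.5 bpw, Q5_K_M ~5.2 bpw, Q8_0 ~8.0 bpw
for name, bpw in [("Q4_K_M", 4.5), ("Q5_K_M", 5.2), ("Q8_0", 8.0)]:
    print(f"{name}: ~{gguf_size_gb(104, bpw):.0f} GB")
```

The VRAM-required column adds headroom on top of the file size for KV cache and activations, which is why 60 GB of weights wants ~70 GB of memory.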

Get the model

Ollama

One-line install

ollama run command-r-plus:104b

HuggingFace

Original weights

huggingface.co/CohereForAI/c4ai-command-r-plus

Source repository — raw weights only; you'll need to quantize them yourself (e.g. to GGUF).

Hardware that runs this

Cards with enough VRAM for at least one quantization of Command R+ 104B.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Step up
More capable — bigger memory footprint
No reviewed models in the next tier up yet.

Frequently asked

What's the minimum VRAM to run Command R+ 104B?

70 GB of VRAM is enough to run Command R+ 104B at the Q4_K_M quantization (file size 60.0 GB). Higher-quality quantizations need more.

Can I use Command R+ 104B commercially?

Command R+ 104B is released under the CC BY-NC 4.0 license, which prohibits commercial use of the weights. Review the license terms before using it in a product.

What's the context length of Command R+ 104B?

Command R+ 104B supports a context window of 131,072 tokens (about 131K).
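For RAG workloads, the practical question is how much of that 131,072-token window the retrieved context eats before the model starts answering. Quick arithmetic, with a hypothetical ~500 tokens/page figure:

```python
CTX = 131_072  # Command R+ context window, in tokens

def generation_budget(prompt_tokens, ctx=CTX):
    """Tokens left for the model's reply after the prompt is loaded."""
    return max(ctx - prompt_tokens, 0)

# e.g. ~100 pages of retrieved text at an assumed ~500 tokens/page
print(generation_budget(100 * 500))  # 81072
```

Note that the local runtime caps this too: the `16384 ctx` setting suggested above uses only a fraction of the model's window to keep KV-cache memory manageable.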

How do I install Command R+ 104B with Ollama?

Run `ollama pull command-r-plus:104b` to download, then `ollama run command-r-plus:104b` to start a chat session. The default quantization is Q4_K_M.

Source: huggingface.co/CohereForAI/c4ai-command-r-plus

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.