RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP·Fredoline Eruo

command-r
104B parameters
Restricted
Reviewed May 2026

Command R+ (Aug 2024)

Cohere's August 2024 Command R+ refresh. RAG-optimized; non-commercial license. Strong tool-calling and citation discipline.

License: CC-BY-NC-4.0 · Released Aug 30, 2024 · Context: 131,072 tokens

Our verdict

OP · Fredoline Eruo | Verified May 8, 2026
unrated

Positioning

Cohere Command R+ (08-2024) is the open-weight refresh of Cohere's flagship retrieval-augmented model and one of the few 100B+ class open-weight models with explicit RAG / tool-use tuning. It is a 104-billion-parameter dense model with a 128K context window, released under a research and non-commercial license (CC-BY-NC-4.0 + Cohere Acceptable Use). The 08-2024 update brought meaningful improvements over the original April 2024 release: better retrieval-grounding accuracy, expanded multilingual coverage (10+ well-supported languages, including Arabic, Korean, and Hebrew), and stronger tool-use chain-of-thought.

Strengths

  • Retrieval grounding is a genuine differentiator. Command R+ was trained with explicit RAG document-citation as a first-class capability — citation accuracy on multi-document QA is meaningfully better than equal-parameter Llama 3 / Qwen 3 models that bolt RAG on after training.
  • Multilingual coverage is real. Genuinely useful for Arabic, Korean, Hebrew, Indonesian, Vietnamese — languages where Llama 3 lags.
  • Tool-use is well-tuned. Function calling + multi-step tool use shows up clearly in agentic benchmarks.
  • 128K context with stable degradation curve. Performance at 64K-128K context is closer to short-context performance than typical 100B-class models.
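The citation discipline described above is easiest to see in Command R+'s grounded-generation output, which wraps cited spans in inline markers pointing back at the retrieved documents. Below is a minimal parser sketch; the `<co: N>…</co: N>` marker syntax is taken from Cohere's grounded-generation prompt template and should be treated as an assumption to verify against the model card.

```python
import re

# Grounded answers mark cited spans with <co: N>...</co: N>, where N
# indexes the retrieved document. NOTE: the exact marker syntax is an
# assumption based on Cohere's grounded-generation prompt template.
CITE_RE = re.compile(r"<co:\s*(\d+)>(.*?)</co:\s*\1>", re.DOTALL)

def extract_citations(grounded_answer: str):
    """Return (plain_text, citations), where citations is a list of
    (doc_index, cited_span) pairs in order of appearance."""
    citations = [(int(m.group(1)), m.group(2))
                 for m in CITE_RE.finditer(grounded_answer)]
    plain = CITE_RE.sub(lambda m: m.group(2), grounded_answer)
    return plain, citations

answer = "The refresh shipped in <co: 0>August 2024</co: 0> with <co: 1>128K context</co: 1>."
text, cites = extract_citations(answer)
# text  -> "The refresh shipped in August 2024 with 128K context."
# cites -> [(0, "August 2024"), (1, "128K context")]
```

Keeping the doc index attached to each span is what lets a RAG frontend render per-sentence source links instead of a single footnote.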

Limitations

  • License is non-commercial. CC-BY-NC-4.0 + Cohere AUP — production commercial deployments require Cohere licensing. This is the single biggest practical limitation vs Llama 3.3 / Qwen 3 (which ship under permissive open-weight licenses).
  • Compute requirements are real. 104B FP16 needs ~210 GB; 104B Q4 needs ~55-60 GB. Won't fit a single consumer GPU. MI300X / PRO 6000 Blackwell / Mac Studio M3 Ultra is the floor.
  • Reasoning is not class-leading. DeepSeek V3 and Qwen 3 Reasoning beat Command R+ on math/code/logic benchmarks.
  • Latency is workstation-tier. Decode runs at ~25-40 tok/s on a PRO 6000 Blackwell at Q4. Production multi-tenant serving needs a proper serving stack on an H100/H200 cluster.
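The footprint figures above fall out of simple arithmetic: bytes ≈ parameters × bits-per-weight / 8. A rough sketch follows; the 4.5 bits/weight figure for Q4-class quants is an approximation (quantization scales and zero-points push effective bits above 4), and real deployments add KV-cache and activation overhead on top of the weights.

```python
def weight_footprint_gb(params_b: float, bits_per_weight: float) -> float:
    """Rough weight-memory footprint in GB for a dense model.
    params_b: parameter count in billions.
    bits_per_weight: 16 for FP16, ~4.5 for Q4_K_M-style quants."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

# Command R+ is 104B dense:
fp16 = weight_footprint_gb(104, 16)    # 208 GB, matching the ~210 GB figure
q4   = weight_footprint_gb(104, 4.5)   # 58.5 GB, in the quoted 55-60 GB band
```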

Real-world performance

  • Llama 3 70B vs Command R+ 104B: Llama 3 70B is faster (smaller, better optimized) and equally capable on English-only tasks. Command R+ wins clearly on multilingual + RAG citation + tool-use chains.
  • Qwen 3 235B-A22B MoE vs Command R+ 104B dense: Qwen 3 235B is faster (MoE active params ~22B) and stronger on reasoning. Command R+ wins on retrieval grounding + multilingual.
  • Claude 3.5 Sonnet via API vs Command R+ self-hosted: The API is faster and stronger on most tasks, but it rules out data sovereignty and on-prem deployment. Pick by deployment requirements.

Should you run this locally?

Yes if you specifically need on-prem multilingual RAG at the 100B-class capability tier and your deployment context is research / non-commercial / Cohere-licensed. Production-grade RAG with strong citation accuracy is genuinely a Command R+ strength. Pick Mac Studio M3 Ultra (192 GB) or MI300X (192 GB) for single-card deployment.

No if you need permissive open-weight licensing (pick Llama 3.3 70B or Qwen 3 235B-A22B), reasoning-heavy workloads (pick DeepSeek V3 / Qwen 3 Reasoning), or production at scale where commercial licensing cost beats self-hosting math.

How it compares

  • vs Llama 3.3 70B: Llama is smaller, faster, and permissively licensed. Llama wins for production commercial use; Command R+ wins for multilingual RAG.
  • vs Qwen 3 235B-A22B: Qwen has more total parameters, stronger reasoning, and a permissive license. Qwen 3 wins on most metrics; Command R+ wins on retrieval-grounding citation accuracy.
  • vs DeepSeek V3 (671B MoE): DeepSeek V3 is a dramatically larger MoE with stronger reasoning. Command R+ wins on dense-model deployment simplicity.
  • vs command-r-35b: Smaller Cohere sibling — same retrieval focus at lower parameter count. Pick R+ for full capability, R for cheaper inference.

Run this yourself

  • Single-card workstation: PRO 6000 Blackwell (96 GB) at Q4 — fits comfortably with full context.
  • Single-card AMD: MI300X (192 GB) at FP16 with full 128K context.
  • Mac Studio: Mac Studio M3 Ultra (192 GB) at FP16 via MLX or llama.cpp Metal.
  • Datacenter: 2× NVLinked H100 PCIe at FP8 for production serving.
  • Cloud rental: Runpod / Lambda H100 PCIe ~$2.50-3.50/hr.
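For the cloud-rental option, a quick tokens-per-dollar sanity check. The decode rate and hourly price below are assumptions drawn from the figures above; both vary by provider, quantization, and serving stack.

```python
def tokens_per_dollar(decode_tok_s: float, price_per_hr: float) -> float:
    """Generated tokens per dollar at a steady decode rate."""
    return decode_tok_s * 3600 / price_per_hr

# Assumed: ~30 tok/s decode on a rented H100 at ~$3.00/hr.
tokens_per_dollar(30, 3.0)   # -> 36000.0 tokens per dollar
```

Running the same math against a commercial API's per-token pricing is the fastest way to decide whether self-hosting pencils out for your volume.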

Overview

Cohere's August 2024 Command R+ refresh. RAG-optimized; non-commercial license. Strong tool-calling and citation discipline.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model
  • Command R+ 104B · 104B · Datacenter
Family siblings (command-r)
  • Command R 35B · 35B · Workstation
  • Command R+ (Aug 2024) · 104B · You are here
  • Command R+ 104B · 104B · Datacenter

Strengths

  • Strongest open RAG-tuned model in 2024
  • Citation discipline

Weaknesses

  • Non-commercial license blocks production deployment

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
AWQ-INT4        60.0 GB      72 GB

Get the model

HuggingFace

Original weights

huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Command R+ (Aug 2024).

  • NVIDIA GB200 NVL72 · 13824 GB
  • AMD Instinct MI355X · 288 GB
  • AMD Instinct MI325X · 256 GB
  • AMD Instinct MI300X · 192 GB
  • NVIDIA B200 · 192 GB
  • NVIDIA H100 NVL · 188 GB
  • NVIDIA H200 · 141 GB
  • AMD Instinct MI300A (APU) · 128 GB

Frequently asked

What's the minimum VRAM to run Command R+ (Aug 2024)?

72 GB of VRAM is enough to run Command R+ (Aug 2024) at the AWQ-INT4 quantization (file size 60.0 GB). Higher-quality quantizations need more.

Can I use Command R+ (Aug 2024) commercially?

Command R+ (Aug 2024) is released under the CC-BY-NC-4.0 license, which restricts commercial use. Review the license terms before using it in a product.

What's the context length of Command R+ (Aug 2024)?

Command R+ (Aug 2024) supports a context window of 131,072 tokens (128K).
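That window is only usable if the KV cache fits alongside the weights. A back-of-envelope sketch, assuming the architecture values published in the HuggingFace config (64 layers, 8 KV heads via GQA, head dim 128); verify these against the repo before relying on the number.

```python
def kv_cache_gib(tokens: int, layers: int = 64, kv_heads: int = 8,
                 head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    """FP16 KV-cache size in GiB: 2 tensors (K and V) per layer per KV
    head, one entry per token. Defaults are assumed from the published
    Command R+ config."""
    total_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens
    return total_bytes / 2**30

kv_cache_gib(131_072)   # -> 32.0 GiB at full 128K context in FP16
```

This is why the single-card options quoted earlier need headroom well beyond the weight file: at Q4 (~58 GB weights), a full-context FP16 KV cache pushes the total toward the 96 GB card's limit.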

Source: huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Compare hardware
  • Dual 3090 vs RTX 5090 (48 GB or 32 GB) →
  • RTX 3090 vs RTX 4090 →
Buyer guides
  • 16 GB vs 24 GB VRAM — what 70B-class models need →
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
  • Best used GPU for local AI →
When it doesn't work
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →
  • Model keeps crashing →
Recommended hardware
  • NVIDIA GB200 NVL72 →
  • AMD Instinct MI355X →
  • AMD Instinct MI325X →
  • AMD Instinct MI300X →
  • NVIDIA B200 →
Alternatives
  • Command R+ 104B
  • Command R 35B
Before you buy

Verify Command R+ (Aug 2024) runs on your specific hardware before committing money.

  • Will it run on my hardware? →
  • Custom hardware comparison →
  • GPU recommender (4 questions) →
Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Same tier
Models in the same parameter band as this one
  • DeepSeek V4 Pro (1.6T MoE)
    deepseek · 1600B
    unrated
  • Qwen 3.5 235B-A17B (MoE)
    qwen · 397B
    unrated
  • Qwen 3 235B-A22B
    qwen · 235B
    unrated
  • DeepSeek V4 Flash (284B MoE)
    deepseek · 284B
    unrated
Step up
More capable — bigger memory footprint
No verdicted models in the next tier up yet.
Step down
Smaller — faster, runs on weaker hardware
  • Llama 3.3 70B Instruct
    llama · 70B
    9.1/10
  • DeepSeek R1 Distill Llama 70B
    deepseek · 70B
    9.0/10