command-r
32B parameters
Restricted
Reviewed June 2026

Aya Expanse 32B

Cohere's multilingual Aya model at 32B parameters. It covers 23 languages and was the strongest open-weight multilingual model in late 2024. Qwen 2.5 32B is the Apache-2.0 alternative, but Aya has deeper coverage of long-tail languages.

License: CC-BY-NC-4.0 · Released Oct 22, 2024 · Context: 8,192 tokens

Our verdict

By Fredoline Eruo · Verified Jun 12, 2026 · Unrated

Positioning

Cohere Aya Expanse 32B is the latest model in Cohere For AI's multilingual research lineage: a 32-billion-parameter dense model, instruction-tuned for 23 languages, with explicit balance across Arabic, Chinese, Japanese, Korean, Turkish, Russian, Spanish, French, German, and 14 others. It is released under CC-BY-NC-4.0 (research and non-commercial use). The model builds on a pre-trained model from Cohere's Command family and adds Cohere's Aya multilingual post-training recipe. In 2026 it remains the canonical open-weight, 30B-class multilingual model.

Strengths

  • Multilingual coverage is best-in-class for the parameter tier. Balanced quality across 23 languages is meaningfully better than Llama 3 or Qwen 3 at a similar parameter count, since both lean toward their dominant training languages.
  • Strong on under-served languages. Arabic, Korean, Hebrew, Turkish, and Vietnamese are all languages where Llama 3 lags meaningfully.
  • Hardware needs are moderate once quantized. The 32B dense weights take about 64 GB at FP16, which calls for an 80 GB card or two 48 GB cards (RTX 6000 Ada, L40S). At 8-bit the weights (about 32 GB) fit on a single 48 GB GPU, and at 4-bit they fit on a 24 GB card (RTX 4090). A 32 GB RTX 5090 leaves headroom for a slightly larger quantization.
  • Instruction-tuning is conservative and predictable. It lacks the "personality" RLHF of Llama 3.x but is reliable for production translation and multilingual chat workflows.
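The memory figures for a 32B dense model follow from simple arithmetic: parameter count times bits per weight. A minimal sketch of that estimate is below. The function name is hypothetical, and the result covers weights only; KV cache, activations, and runtime overhead come on top, and real quantized files carry extra metadata (this page lists 19.0 GB for the AWQ-INT4 file, above the raw 4-bit figure).

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB.

    params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    Excludes KV cache, activations, and framework overhead.
    """
    return params_billion * bits_per_weight / 8


# Rough weight footprint of a 32B dense model at common precisions.
for label, bits in [("FP16", 16), ("INT8", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(32, bits):.0f} GB of weights")
```

Treat these as lower bounds when sizing a GPU: the card needs the weight footprint plus several GB of working memory.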

Limitations

  • License is non-commercial. Under CC-BY-NC-4.0, production commercial deployments require separate licensing from Cohere. This is the single biggest practical limitation.
  • Reasoning is not class-leading. DeepSeek V3 and Qwen 3 dramatically beat Aya on math/code/logic.
  • English-only quality is below Llama 3.1 70B / Qwen 3 32B. The multilingual-balanced training trades English performance for cross-language consistency.
  • Tool-use / function-calling is basic. The model is tuned for chat, not optimized for agentic workflows.
  • No long-context strength. The listed context window is 8,192 tokens, and quality degrades when the model is pushed to 16K+.

Real-world performance

  • vs Llama 3.1 8B / Llama 3.1 70B: Llama wins for English-only at the parameter-equivalent tier. Aya Expanse 32B wins clearly on Arabic/Korean/Japanese/Vietnamese.
  • vs Qwen 3 32B: Qwen 3 32B is stronger overall and has Chinese-English balance. Aya Expanse 32B has wider language coverage but weaker per-language depth.
  • vs Command R+ 104B: Command R+ is the larger Cohere sibling with retrieval-grounding focus. Aya Expanse 32B is the cheaper-to-serve multilingual chat option.
  • vs Google Gemma 2 27B: Comparable parameter tier. Gemma stronger on English; Aya stronger on multilingual.

Should you run this locally?

Yes if you need 30B-class multilingual chat, your target language mix includes under-served languages (Arabic, Korean, Vietnamese, Hebrew, Turkish), and your deployment is research, academic, or otherwise non-commercial.

No if you need permissive commercial licensing (pick Llama 3.1 70B or Qwen 3 32B), reasoning-heavy workloads (pick DeepSeek/Qwen 3), or English-only workflows (Llama / Qwen win).

How it compares

  • vs aya-23-35b: Aya Expanse is the architectural successor with refined instruction-tuning.
  • vs aya-23-8b: Aya 23 8B is the smaller sibling, offering cheaper inference at a lower capability tier.
  • vs Command R 35B: Command R is RAG-tuned; Aya is multilingual-tuned. Different specializations.
  • vs Google Gemma 2 27B: Gemma stronger English; Aya stronger multilingual.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (aya)

  • Aya 23 8B (8B): Consumer
  • Aya Expanse 32B (32B): You are here
  • Aya 23 35B (35B): Workstation

Strengths

  • 23-language coverage
  • Strong multilingual baseline

Weaknesses

  • CC-BY-NC license blocks commercial deployment

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point in general; AWQ-INT4 is the only variant listed for this model.

Quantization   File size   VRAM required
AWQ-INT4       19.0 GB     22 GB
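To check a card against the listed variants, the comparison can be sketched as a small lookup. This is an illustrative helper, not part of any tool: the `Quant` class and function names are hypothetical, and the only data is the AWQ-INT4 row listed on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quant:
    name: str
    file_gb: float   # size of the weights on disk
    vram_gb: float   # VRAM this page lists as required


# Only the variant listed on this page; add rows as more are published.
AYA_EXPANSE_32B = [Quant("AWQ-INT4", 19.0, 22.0)]


def usable_quants(gpu_vram_gb: float, quants=AYA_EXPANSE_32B) -> list[str]:
    """Names of the quantizations whose listed VRAM fits on the card."""
    return [q.name for q in quants if q.vram_gb <= gpu_vram_gb]


def headroom_gb(q: Quant) -> float:
    """VRAM budgeted beyond the file itself (KV cache, activations, runtime)."""
    return q.vram_gb - q.file_gb


print(usable_quants(24))                 # a 24 GB card
print(usable_quants(16))                 # a 16 GB card
print(headroom_gb(AYA_EXPANSE_32B[0]))   # GB reserved beyond the weights
```

The gap between file size and required VRAM is the working memory the runtime needs, so a card with exactly 19 GB free would not be enough.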

Get the model

HuggingFace

Original weights

huggingface.co/CohereForAI/aya-expanse-32b

This is the source repository with the original weights; you will need to quantize them yourself.

Frequently asked

What's the minimum VRAM to run Aya Expanse 32B?

22 GB of VRAM is enough to run Aya Expanse 32B at the AWQ-INT4 quantization (file size 19.0 GB). Higher-quality quantizations need more.

Can I use Aya Expanse 32B commercially?

Aya Expanse 32B is released under the CC-BY-NC-4.0 license, which does not permit commercial use. Review the license terms before using it in a product; commercial deployments require separate licensing from Cohere.

What's the context length of Aya Expanse 32B?

Aya Expanse 32B supports a context window of 8,192 tokens (about 8K).
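In practice the window is shared between the prompt and the generated reply, so the two must be budgeted together. A minimal sketch, assuming the 8,192-token figure above; the function name is hypothetical and token counts must come from the model's own tokenizer:

```python
CONTEXT_WINDOW = 8192  # tokens, as listed on this page


def max_new_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for generation once the prompt is counted."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # A prompt at or beyond the window leaves nothing to generate.
    return max(0, context_window - prompt_tokens)


print(max_new_tokens(6000))  # room left after a 6,000-token prompt
print(max_new_tokens(9000))  # prompt already exceeds the window
```

A result of 0 means the prompt has to be truncated or split into chunks before the model can respond.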

Source: huggingface.co/CohereForAI/aya-expanse-32b

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
