RUNLOCALAI v38
command-r · 32B parameters · Restricted · Reviewed May 2026

Aya Expanse 32B

Cohere's multilingual Aya at 32B. Covers 23 languages; strongest open-weight multilingual model in late 2024 — Apache-2.0 alternative is Qwen 2.5 32B but Aya has deeper coverage on the long-tail languages.

License: CC-BY-NC-4.0 · Released Oct 22, 2024 · Context: 8,192 tokens

Our verdict

By Fredoline Eruo · Verified May 8, 2026 · Unrated

Positioning

Cohere Aya Expanse 32B is the latest in Cohere For AI's multilingual research lineage: a 32-billion-parameter dense model, instruction-tuned for 23 languages with explicit balance across Arabic, Chinese, Japanese, Korean, Turkish, Russian, Spanish, French, German, and 13 others. It is released under CC-BY-NC-4.0 (research/non-commercial). The model is trained from a Command-family base with Cohere's Aya multilingual pretraining and instruction-tuning recipe, making it the canonical "open-weight 30B-class multilingual model" in 2026.

Strengths

  • Multilingual coverage is genuinely best-in-class for the parameter tier. 23 languages with balanced quality is meaningfully better than Llama 3 or Qwen 3 at the same parameter count, both of which lean English-heavy.
  • Strong on under-served languages. Arabic, Korean, Hebrew, Turkish, Vietnamese — languages where Llama 3 lags meaningfully.
  • 32B dense weights fit on a single 48 GB GPU at 8-bit (RTX 6000 Ada, L40S) or on 24 GB at Q4-Q5 (RTX 4090 / RTX 5090). Full FP16 weights are ~64 GB, beyond any single workstation card.
  • Instruction-tuning is conservative and predictable. Doesn't have the "personality" RLHF of Llama 3.x but is reliable for production translation + multilingual chat workflows.
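The fits/doesn't-fit claims above follow from simple arithmetic. A minimal sketch (pure back-of-envelope: weights only, ignoring KV cache and framework overhead):

```python
# Back-of-envelope weight memory for a 32B-parameter dense model.
# Weights only: a real deployment also needs KV cache and runtime overhead.

PARAMS = 32e9  # 32 billion parameters

def weight_gb(params: float, bits_per_weight: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

print(weight_gb(PARAMS, 16))   # FP16: 64.0 GB -> multi-GPU or big unified memory
print(weight_gb(PARAMS, 8))    # INT8/FP8: 32.0 GB -> fits a 48 GB card with headroom
print(weight_gb(PARAMS, 4.5))  # Q4-Q5 average: 18.0 GB -> fits a 24 GB card
```

The 4.5 bits/weight figure is a rough midpoint for Q4-Q5 GGUF quants, not an exact number for any specific file.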

Limitations

  • License is non-commercial. CC-BY-NC-4.0 — production commercial deployments require Cohere licensing. Single biggest practical limitation.
  • Reasoning is not class-leading. DeepSeek V3 and Qwen 3 dramatically beat Aya on math/code/logic.
  • English-only quality is below Llama 3.1 70B / Qwen 3 32B. The multilingual-balanced training trades English performance for cross-language consistency.
  • Tool-use / function-calling is basic. Pre-trained for chat, not optimized for agentic workflows.
  • No long-context strength. 8K context standard, with degradation at 16K+.

Real-world performance

  • vs Llama 3.1 8B / Llama 3.1 70B: Llama wins for English-only at the parameter-equivalent tier. Aya Expanse 32B wins clearly on Arabic/Korean/Japanese/Vietnamese.
  • vs Qwen 3 32B: Qwen 3 32B is stronger overall + has Chinese-English balance. Aya Expanse 32B has wider language coverage but weaker per-language depth.
  • vs Command R+ 104B: Command R+ is the larger Cohere sibling with retrieval-grounding focus. Aya Expanse 32B is the cheaper-to-serve multilingual chat option.
  • vs Google Gemma 2 27B: Comparable parameter tier. Gemma stronger on English; Aya stronger on multilingual.

Should you run this locally?

Yes if you specifically need 30B-class multilingual chat for research / non-commercial use, your target language mix includes underserved languages (Arabic, Korean, Vietnamese, Hebrew, Turkish), and your deployment is research / academic / non-commercial.

No if you need permissive commercial licensing (pick Llama 3.1 70B or Qwen 3 32B), reasoning-heavy workloads (pick DeepSeek/Qwen 3), or English-only workflows (Llama / Qwen win).

How it compares

  • vs aya-23-35b: Aya Expanse is the architectural successor with refined instruction-tuning.
  • vs aya-23-8b: Aya 8B is the smaller sibling for cheaper inference at lower capability tier.
  • vs Command R 35B: Command R is RAG-tuned; Aya is multilingual-tuned. Different specializations.
  • vs Google Gemma 2 27B: Gemma stronger English; Aya stronger multilingual.

Run this yourself

  • Single 24 GB GPU at Q4-Q5: RTX 4090, RTX 5090, used 3090.
  • Single 48 GB workstation at INT8/FP8 (~32 GB weights): RTX 6000 Ada, L40S.
  • Apple Silicon at FP16: Mac Studio M3 Ultra / MacBook Pro M4 Max (96+ GB).
  • vLLM serving: vllm serve CohereForAI/aya-expanse-32b --max-model-len 8192.
  • Cloud rental: Runpod / Lambda L40S ~$1.50-2.50/hr.
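Once the vLLM command above is serving, it exposes an OpenAI-compatible endpoint (port 8000 by default). A minimal stdlib-only client sketch; the endpoint path and default port are standard vLLM conventions, and the prompt is just an illustration:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "CohereForAI/aya-expanse-32b",
                       base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a chat-completions request for a local vLLM OpenAI-compatible server."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the server running:
# with urllib.request.urlopen(build_chat_request("Merhaba, nasılsın?")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```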


Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (aya)
  • Aya 23 8B · 8B · Consumer
  • Aya Expanse 32B · 32B · you are here
  • Aya 23 35B · 35B · Workstation

Strengths

  • 23-language coverage
  • Strong multilingual baseline

Weaknesses

  • CC-BY-NC license blocks commercial deployment

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
AWQ-INT4        19.0 GB      22 GB
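The 19.0 GB figure is consistent with back-of-envelope arithmetic: 4 payload bits per weight plus per-group scales/zero-points and mixed-precision layers. The 0.75-bit overhead below is a rough assumption, not a published number:

```python
def quant_file_gb(params: float, bits_per_weight: float,
                  overhead_bits: float = 0.75) -> float:
    """Approximate quantized file size in GB: payload bits per weight plus
    scale/zero-point and mixed-precision overhead (0.75 bits is an assumption)."""
    return params * (bits_per_weight + overhead_bits) / 8 / 1e9

print(round(quant_file_gb(32e9, 4), 1))  # ~19.0 GB, matching the AWQ-INT4 row
```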

Get the model

HuggingFace

Original weights

huggingface.co/CohereForAI/aya-expanse-32b

Source repository with original weights only; quantize them yourself if you need GGUF/AWQ variants.

Hardware that runs this

Cards with enough VRAM for at least one quantization of Aya Expanse 32B.

  • NVIDIA GB200 NVL72 · 13,824 GB
  • AMD Instinct MI355X · 288 GB
  • AMD Instinct MI325X · 256 GB
  • AMD Instinct MI300X · 192 GB
  • NVIDIA B200 · 192 GB
  • NVIDIA H100 NVL · 188 GB
  • NVIDIA H200 · 141 GB
  • Intel Gaudi 3 · 128 GB

Frequently asked

What's the minimum VRAM to run Aya Expanse 32B?

22GB of VRAM is enough to run Aya Expanse 32B at the AWQ-INT4 quantization (file size 19.0 GB). Higher-quality quantizations need more.
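The gap between the 19.0 GB file and the 22 GB VRAM floor is mostly KV cache plus runtime buffers. A generic FP16 KV-cache estimator; the layer/head numbers in the example are illustrative for a 30B-class GQA model, not Aya's published config:

```python
def kv_cache_gb(n_layers: int, n_kv_heads: int, head_dim: int,
                context_tokens: int, bytes_per_value: int = 2) -> float:
    """FP16 KV-cache size in GB: one K and one V tensor per layer, per token."""
    return (2 * n_layers * n_kv_heads * head_dim
            * bytes_per_value * context_tokens) / 1e9

# Illustrative 30B-class config: 40 layers, 8 KV heads (GQA), head_dim 128.
print(round(kv_cache_gb(40, 8, 128, 8192), 2))  # ~1.34 GB at the full 8K window
```

The remainder of the headroom goes to activation buffers and allocator overhead, which vary by runtime.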

Can I use Aya Expanse 32B commercially?

Aya Expanse 32B is released under CC-BY-NC-4.0, which prohibits commercial use without a separate license from Cohere. Review the license terms before using it in a product.

What's the context length of Aya Expanse 32B?

Aya Expanse 32B supports a context window of 8,192 tokens (about 8K).

Source: huggingface.co/CohereForAI/aya-expanse-32b

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Same tier
Models in the same parameter band as this one
  • Qwen 3 30B-A3B · qwen · 30B · unrated
  • Gemma 4 31B Dense · gemma · 31B · unrated
  • Nemotron 3 Nano (30B-A3B) · other · 30B · unrated
  • DeepSeek Coder V3 · deepseek · 33B · unrated
Step up
More capable — bigger memory footprint
  • Llama 3.3 70B Instruct · llama · 70B · 9.1/10
  • DeepSeek R1 Distill Llama 70B · deepseek · 70B · 9.0/10
Step down
Smaller — faster, runs on weaker hardware
  • DeepSeek V3 Lite (16B MoE) · deepseek · 16B · unrated
  • Mistral Small 3 24B · mistral · 24B · 8.4/10