RUNLOCALAI · v38

Operator-grade instrument for local-AI hardware intelligence. Hand-written verdicts. Real benchmarks. Reproducible commands.

OP·Fredoline Eruo

Text & Reasoning
Open-weight
CC-BY-NC-4.0

Aya

by Cohere For AI

Cohere For AI's multilingual research family. Aya 23 and Aya Expanse cover 23 languages with explicit balance — the strongest open-weight multilingual chat models for underserved languages (Arabic, Korean, Hebrew, Vietnamese).

Best entry point for local use

Start with Aya Expanse 8B at Q4_K_M via Ollama — it fits on a single RTX 3060 12 GB, using about 5 GB of VRAM. Aya is the only open-weight family with coverage of 101 languages (via the original Aya 101 model); it builds on the Aya Collection (513M instances across 114 languages) and the human-annotated Aya Dataset. If your use case is non-English text generation, Aya Expanse 8B outperforms Llama 3.1 8B and Qwen 3 8B on low-resource languages by 30-50% on native-speaker evaluations. For higher quality, Aya Expanse 32B at Q4 (~20 GB) fits on an RTX 4090 24 GB. Skip Aya 23 35B — the Aya Expanse generation outperforms it in every language category. Mind the license: the Aya datasets are Apache 2.0 and fully available for reproduction, but the model weights ship under CC-BY-NC 4.0 — non-commercial use only, so commercial deployment needs a separate agreement with Cohere.
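The fit claims above can be sanity-checked with simple arithmetic — a rough sketch, assuming Q4_K_M averages about 4.8 bits per weight and reserving ~1.5 GB for KV cache and runtime overhead (both figures are ballpark assumptions, not measured values):

```python
def fits_in_vram(params_b: float, vram_gb: float,
                 bits_per_weight: float = 4.8, overhead_gb: float = 1.5) -> bool:
    """Rough fit check: quantized weight size plus fixed overhead vs. available VRAM."""
    weights_gb = params_b * bits_per_weight / 8  # GB of weights at the given quant level
    return weights_gb + overhead_gb <= vram_gb

# Aya Expanse 8B at Q4_K_M on an RTX 3060 12 GB
print(fits_in_vram(8, 12))   # → True (~4.8 GB weights + overhead)
# Aya Expanse 32B at Q4 on an RTX 4090 24 GB
print(fits_in_vram(32, 24))  # → True (~19.2 GB weights + overhead)
```

This is a first-pass estimate only; long contexts grow the KV cache well past the fixed overhead assumed here.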

Deployment guidance

For single-user local use: Ollama + aya-expanse:8b Q4_K_M on an RTX 3060 12 GB, or on Apple silicon (e.g. M3) via MLX-LM. Aya is a dense decoder-only transformer derived from Cohere's Command R architecture — the major engines (llama.cpp/Ollama, vLLM, Transformers) all support it. For multi-user serving: vLLM 0.6.0+ with AWQ 4-bit on 2× L4 24 GB — deploy separate instances per language group if traffic patterns vary by region. For translation pipelines: pair Aya Expanse with faster-whisper for multilingual speech-to-text — Aya's tokenizer handles multilingual input natively. That tokenizer is a Command R-derived 256K vocabulary, markedly more token-efficient for non-English text than Llama's English-optimized vocab. For regulated multilingual environments (government, healthcare, legal), Aya's full dataset transparency is a genuine compliance asset — but the weights are CC-BY-NC 4.0, so production use there requires a commercial agreement with Cohere.
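Once a model is pulled, Ollama exposes a local REST API for scripting. A minimal sketch of the request body for its /api/chat endpoint — the aya-expanse:8b tag and the default port 11434 follow Ollama's conventions, but verify the tag against your own `ollama list`:

```python
import json

def ollama_chat_payload(prompt: str, model: str = "aya-expanse:8b",
                        temperature: float = 0.3) -> dict:
    """Build a non-streaming request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one complete response instead of chunks
        "options": {"temperature": temperature},
    }

# POST this body to http://localhost:11434/api/chat
# after running `ollama pull aya-expanse:8b`
body = json.dumps(
    ollama_chat_payload("이 문장을 한국어로 요약해 줘."),  # Korean prompt — Aya's strength
    ensure_ascii=False,
)
```

The same payload shape works for any multilingual prompt; swap the model tag to run the 32B variant.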

Featured models

Models in this family with our verdicts

  • Aya 23 8B
  • Aya Expanse 32B
  • Aya 23 35B

Recommended runtimes

vLLM

Related families

Command R

Related — keep moving

Compare hardware
  • RTX 3090 vs RTX 4090 →
  • RTX 4090 vs RTX 5090 →
Buyer guides
  • Best GPU for local AI →
  • Best laptop for local AI →
  • Best Mac for local AI →
  • Best used GPU for local AI →
  • Will it run on my hardware? →
When it doesn't work
  • CUDA out of memory →
  • Ollama running slowly →
  • ROCm not detected →
  • Model keeps crashing →
Runtimes that fit
  • vLLM →
Alternatives
Command R
Before you buy

Verify Aya runs on your specific hardware before committing money.

  • Will it run on my hardware? →
  • Custom hardware comparison →
  • GPU recommender (4 questions) →