Aya 23 8B

Positioning

Aya 23 8B is a dense 8-billion-parameter multilingual model released by Cohere For AI under the CC-BY-NC 4.0 license, which restricts use to research and non-commercial applications. It covers 23 languages and has a context window of 8,192 tokens, which makes it a specialized tool for multilingual NLP research. Its relatively small size and dense architecture place it in the consumer deployment class, within reach of operators with modest hardware.

Strengths

Multilingual coverage: Supports 23 languages, making it a rare open-weight option for cross-lingual research without requiring multiple models.
Consumer-friendly size: At 8B parameters, quantized versions fit on a single consumer GPU (e.g., Q4_K_M is about 5.0 GB on disk).
Dense architecture simplicity: Unlike mixture-of-experts models, dense models have predictable memory and compute requirements, simplifying deployment.
Research-focused licensing: CC-BY-NC encourages academic exploration and reproducibility, though commercial use is prohibited.
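
The "predictable memory" point for dense models comes down to simple arithmetic: weight storage is roughly parameter count times bits per weight. The sketch below illustrates that relationship. It assumes a nominal 8e9 parameters (taken from the "8B" label, so the exact count may differ) and decimal gigabytes; the function names are illustrative only.

```python
# Rough weight-size arithmetic for a dense model.
# Assumption: 8e9 parameters is the nominal figure implied by "8B".
NOMINAL_PARAMS = 8e9


def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def effective_bits_per_weight(file_gb: float, params: float) -> float:
    """Back out the average bits per weight implied by a file size."""
    return file_gb * 1e9 * 8 / params


if __name__ == "__main__":
    # FP16 stores 16 bits per weight.
    print(f"FP16 weights: {weights_gb(NOMINAL_PARAMS, 16):.1f} GB")
    # 5.0 GB is the Q4_K_M file size listed in the quantization table.
    print(f"Q4_K_M average: {effective_bits_per_weight(5.0, NOMINAL_PARAMS):.1f} bits/weight")
```

The second function is useful for sanity-checking a listed file size: a "4-bit" quant that works out to around 5 bits per weight on average is plausible, because some tensors are typically kept at higher precision.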

Limitations

Non-commercial license only: CC-BY-NC prohibits commercial deployment, limiting its use in production or revenue-generating applications.
Modest context length: 8,192 tokens may be insufficient for long-document tasks or extended conversations.
No community benchmarks available: We do not have verified operator measurements for this model; published vendor metrics should be treated as best-case.
Single-language performance unknown: While multilingual, its per-language quality may vary; operators should evaluate for their specific target languages.
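
One common way to work within the 8,192-token limit on long documents is to split the tokenized input into windows that leave room for the model's output. The following is a minimal sketch of that idea, not a recommended pipeline. It operates on a plain list of token IDs, which is assumed to come from whatever tokenizer you use with the model; `reserve_for_output` and `overlap` are hypothetical parameters chosen for illustration.

```python
from typing import List

CONTEXT_WINDOW = 8192  # context length stated for Aya 23 8B


def chunk_tokens(
    token_ids: List[int],
    reserve_for_output: int = 512,
    overlap: int = 0,
) -> List[List[int]]:
    """Split token IDs into chunks that fit the context window.

    Each chunk holds at most CONTEXT_WINDOW - reserve_for_output tokens.
    Consecutive chunks share `overlap` tokens.
    """
    budget = CONTEXT_WINDOW - reserve_for_output
    if budget <= 0:
        raise ValueError("reserve_for_output leaves no room for input")
    if not 0 <= overlap < budget:
        raise ValueError("overlap must be in [0, budget)")

    step = budget - overlap
    chunks: List[List[int]] = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + budget])
        if start + budget >= len(token_ids):
            break  # the last chunk already reaches the end
    return chunks


if __name__ == "__main__":
    fake_document = list(range(20_000))  # placeholder IDs, not real tokens
    parts = chunk_tokens(fake_document)
    print(len(parts), [len(p) for p in parts])
```

Chunking trades away cross-chunk context, so tasks that need the whole document at once are still better served by a longer-context model.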

What it takes to run this locally

At 8B parameters, model file sizes range from about 16 GB at FP16 down to ~2.6 GB at Q2_K. For practical use, a Q4_K_M (~5.0 GB) or Q5_K_M (~5.7 GB) quant offers a good balance of quality and memory footprint. Add ~30-50% on top of the file size for KV cache and framework overhead at typical context lengths. Under that rule of thumb, Q4_K_M needs roughly 6.5-7.5 GB and fits a single 8 GB consumer GPU, while Q5_K_M can exceed 8 GB at the upper end and is safer on a 12 GB card (e.g., RTX 3060 12 GB). No tokens-per-second measurements are available for this model.
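
The rule of thumb above (file size plus 30-50% for KV cache and framework overhead) can be turned into a quick fit check. This is a rough sketch under that assumption only: real usage depends on context length, runtime, and batch size. The file sizes are the approximate figures quoted on this page (5.0 GB for Q4_K_M per the quantization table), and the function names are illustrative.

```python
from typing import Dict, Tuple


def vram_estimate_gb(file_size_gb: float, overhead: float) -> float:
    """File size plus a fractional allowance for KV cache and runtime overhead."""
    return file_size_gb * (1.0 + overhead)


def fit_report(
    file_size_gb: float,
    vram_gb: float,
    overhead_range: Tuple[float, float] = (0.30, 0.50),  # rule of thumb from the text
) -> Dict[str, object]:
    """Estimate the low and high VRAM need and whether the high end fits."""
    low = vram_estimate_gb(file_size_gb, overhead_range[0])
    high = vram_estimate_gb(file_size_gb, overhead_range[1])
    return {
        "low_gb": round(low, 2),
        "high_gb": round(high, 2),
        "fits_worst_case": high <= vram_gb,
    }


if __name__ == "__main__":
    quants = {"Q4_K_M": 5.0, "Q5_K_M": 5.7}  # approximate sizes quoted on this page
    for name, size_gb in quants.items():
        for card_gb in (8, 12):
            print(name, f"{card_gb} GB card:", fit_report(size_gb, card_gb))
```

Checking the worst case rather than the average is a deliberately conservative choice: running out of VRAM mid-generation is usually more disruptive than picking a slightly smaller quant up front.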

Should you run this locally?

Yes if you are conducting multilingual NLP research and need an open-weight model whose license permits research use and that runs on consumer hardware. Its small size and dense architecture make it easy to experiment with on a single GPU.

No if you need commercial deployment rights, require longer context windows, or need state-of-the-art performance in a single language. The CC-BY-NC license and 8K context limit may be dealbreakers for production use.

Catalog cross-links

Aya 23 35B – larger sibling with more parameters
Cohere For AI – vendor page
Consumer GPU guide

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Quantization variants

Quantization	File size	VRAM required
Q4_K_M	5.0 GB	8 GB

Frequently asked

What's the minimum VRAM to run Aya 23 8B?

8 GB of VRAM is enough to run Aya 23 8B at the Q4_K_M quantization (file size 5.0 GB). Higher-quality quantizations need more.

Can I use Aya 23 8B commercially?

No. Aya 23 8B is released under the CC-BY-NC 4.0 license, which prohibits commercial use. It is limited to research and other non-commercial applications, so review the license terms before building anything around it.

What's the context length of Aya 23 8B?

Aya 23 8B supports a context window of 8,192 tokens (about 8K).
