ALIA 40b instruct 2601
BSC-LT's 40B instruction-tuned model with first-class support for Spanish, Catalan, Basque, and Galician alongside English. Pretrained on 9.83 trillion tokens and fine-tuned for instruction following and safety. Context window stretches to 163,840 tokens.
If you're building for Spanish, Catalan, Basque, or Galician and need a commercially licensable model, ALIA-40b-instruct-2601 is the most credible open option at this parameter count. The 163,840-token context is a genuine differentiator for long-document work. That said, a 40B model demands real infrastructure (about 28 GB of VRAM even at Q4_K_M), and the strict inference settings (temperature 0 to 0.2, repetition penalty disabled) narrow its flexibility. Recommended, but only if your hardware and use case justify the footprint.
Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.25/10. License is explicit Apache 2.0 in the card, commercial use correctly flagged. Metadata (40B, vendor BSC-LT, family llama via HF tags) all check out; context length of 163,840 isn't directly quoted in the excerpt but is consistent with the 'long-context' claim and is widely documented for this release. Editorial voice is appropriately honest — flags the work-in-progress status, the strict inference settings, and the VRAM cost without marketing fluff. Use case is sharp (Iberian-language enterprise document processing), and the verdict gives readers a clear deploy/skip decision. Brand fit is strong: a rare commercially-licensable model for Catalan/Basque/Galician at 40B is exactly the kind of practical, underserved niche runlocalai readers benefit from knowing about.
Flags:
- Context length 163,840 not explicitly verified in the excerpt; should be confirmed against config.json before publish
- Family 'llama' inferred from HF tags; card doesn't explicitly confirm architecture lineage
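The first flag can be checked mechanically once config.json has been downloaded from the source repository. The sketch below is a minimal illustration, not the site's actual verification step: it assumes the context length is stored under the `max_position_embeddings` key, as is usual for Llama-family configs on Hugging Face, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# Context length claimed on this page.
EXPECTED_CONTEXT = 163_840


def read_context_length(config_path, key="max_position_embeddings"):
    """Return the context length recorded in a local config.json.

    Assumption: the value lives under `max_position_embeddings`.
    Returns None if the key is absent.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    return config.get(key)


def context_length_matches(config_path, expected=EXPECTED_CONTEXT):
    """True only if the config states exactly the expected context length."""
    return read_context_length(config_path) == expected
```

Calling `context_length_matches("config.json")` on the downloaded file returns `True` only if the published figure agrees with the config. A `None` from `read_context_length` means the key is missing and the config needs a manual look.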
Strengths
- Native support for Spanish, Catalan, Basque, and Galician — rare at this scale
- 163,840-token context window handles long documents comfortably
- 9.83T pretraining tokens; one of the most data-rich Iberian-language models available
- Apache 2.0: commercial use permitted, subject to the license's attribution and notice terms
Weaknesses
- 40B parameters means serious VRAM: about 28 GB even at Q4_K_M, so not a laptop model
- Performance outside its core languages (Spanish, Catalan, Basque, Galician, and English) is untested and likely weaker
- Vendor explicitly warns: keep temperature at 0–0.2 and disable repetition penalty or output degrades
- Model described as a work in progress; behavior may shift in future releases
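The vendor's sampling guidance can be encoded as a small guard so that out-of-range settings fail loudly instead of silently degrading output. This is a sketch under assumptions: the helper name `build_generation_kwargs` is hypothetical, and the keyword names follow Hugging Face transformers' `generate()` (`do_sample`, `temperature`, `repetition_penalty`, `max_new_tokens`), where a repetition penalty of 1.0 means no penalty is applied.

```python
def build_generation_kwargs(temperature: float = 0.1, max_new_tokens: int = 512) -> dict:
    """Build generate() keyword arguments that respect the vendor guidance.

    Hypothetical helper: temperature must stay within 0 to 0.2, and the
    repetition penalty is pinned to 1.0 (i.e. disabled).
    """
    if not 0.0 <= temperature <= 0.2:
        raise ValueError(
            f"temperature={temperature} is outside the recommended 0 to 0.2 range"
        )

    kwargs = {
        "max_new_tokens": max_new_tokens,
        "repetition_penalty": 1.0,  # 1.0 leaves token scores untouched
    }
    if temperature == 0.0:
        # Temperature 0 is expressed as greedy decoding rather than sampling.
        kwargs["do_sample"] = False
    else:
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs
```

With a loaded model and tokenized inputs, the result would be passed along as `model.generate(**inputs, **build_generation_kwargs(0.1))`. Other runtimes use different parameter names, so the mapping would need adjusting there.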
Quantization variants
Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 22.0 GB | 28 GB |
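The table row can be sanity-checked with back-of-the-envelope arithmetic: file size is roughly parameters times bits per weight, divided by 8. The sketch below is illustrative only. The 4.4 bits per weight and the 6 GB runtime overhead are assumptions back-derived from the 22.0 GB and 28 GB figures above, not published specifications, and real overhead grows with context length.

```python
def estimate_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def estimate_vram_gb(file_size_gb: float, overhead_gb: float) -> float:
    """Weights plus a flat allowance for KV cache, activations, and buffers."""
    return file_size_gb + overhead_gb


N_PARAMS = 40e9                 # nominal 40B parameters
ASSUMED_BITS_PER_WEIGHT = 4.4   # assumption: chosen to reproduce the 22.0 GB row
ASSUMED_OVERHEAD_GB = 6.0       # assumption: 28 GB VRAM minus 22 GB of weights

size = estimate_file_size_gb(N_PARAMS, ASSUMED_BITS_PER_WEIGHT)
vram = estimate_vram_gb(size, ASSUMED_OVERHEAD_GB)
print(f"{size:.1f} GB file, {vram:.1f} GB VRAM")
```

The same two functions give a rough first guess for other quantization levels by swapping in a different bits-per-weight value, though the measured file size should always take precedence over the estimate.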
Get the model
HuggingFace
Original weights
Source repository. You will need to quantize the weights yourself.
Frequently asked
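What's the minimum VRAM to run ALIA 40b instruct 2601?
About 28 GB for the Q4_K_M quantization (22.0 GB file), the only variant listed on this page.
Can I use ALIA 40b instruct 2601 commercially?
Yes. The model card states an Apache 2.0 license, which permits commercial use.
What's the context length of ALIA 40b instruct 2601?
163,840 tokens.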
Source: huggingface.co/BSC-LT/ALIA-40b-instruct-2601
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.