EXAONE 4.0.1 32B
EXAONE 4.0.1 is a 32B model from LG AI Research with a 131K context window and a hybrid sliding-window/full-attention architecture. It runs in either standard chat mode or an explicit reasoning mode, and handles English, Korean, and Spanish. Tool-use for agentic pipelines is built in.
If you need strong Korean language handling plus a reasoning mode in one 32B package, EXAONE 4.0.1 is technically interesting. The hybrid attention and 131K context are real advantages for long-document work. That said, the commercial restriction is a hard blocker for most operators — this is research / internal tooling territory only. Hedge: worth testing if you're running non-commercial Korean pipelines on vLLM, but don't build a product on it until you have written clearance from LG.
›Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.15/10. License is explicit 'exaone' custom license, correctly flagged non-commercial with honest weakness about needing written clearance. All metadata (32B, 131K context, hybrid attention, Korean/English/Spanish) is directly verifiable from the card. Editorial voice is operator-grade — names the patch nature, calls out narrow inference support, and hedges appropriately. Use case is specific (Korean-English bilingual reasoning + agentic, non-commercial). Brand fit is slightly weaker since the commercial restriction limits the runlocalai audience to internal/research users, but the row handles this honestly rather than hiding it.
Overview
EXAONE 4.0.1 is a 32B model from LG AI Research with a 131K context window and a hybrid sliding-window/full-attention architecture. It runs in either standard chat mode or an explicit reasoning mode, and handles English, Korean, and Spanish. Tool-use for agentic pipelines is built in.
Strengths
- Switchable reasoning mode — no separate model needed for chain-of-thought tasks
- 131K token context window via hybrid attention (sliding window + full attention)
- Native Korean support alongside English and Spanish
- Agentic tool-use built into the base model
Weaknesses
- Not commercially usable without explicit permission from LG AI Research — check the EXAONE license before any production deployment
- Inference engine support is narrow: vLLM and TensorRT-LLM confirmed, others untested
- Low community adoption so far (6.6K downloads, 27 likes) — limited real-world reports to draw on
- 4.0.1 is a patch release specifically to reduce unintended outputs; the underlying issue is mitigated, not fully resolved
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 17.6 GB | 23 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of EXAONE 4.0.1 32B.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
What's the minimum VRAM to run EXAONE 4.0.1 32B?
Can I use EXAONE 4.0.1 32B commercially?
What's the context length of EXAONE 4.0.1 32B?
Source: huggingface.co/LGAI-EXAONE/EXAONE-4.0.1-32B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
Related — keep moving
Verify EXAONE 4.0.1 32B runs on your specific hardware before committing money.