GLM-5.2
GLM-5.2 is Zhipu AI's (Z.ai) flagship open-weight LLM, released on Hugging Face as `zai-org/GLM-5.2` (2026-06). It is a Mixture-of-Experts decoder-only transformer with a new "IndexShare" sparse-attention design (one indexer shared across every four sparse-attention layers). Hugging Face safetensors metadata reports 753.3B total parameters (BF16); secondary sources cite ~744B total / ~40B active. `config.json` sets a 1,048,576-token (1M) context window. The weights are MIT-licensed (commercial use permitted) and the model is text-only (English + Chinese). The model card publishes a benchmark table versus GLM-5.1, Claude Opus 4.8, and GPT-5.5 — all figures vendor-reported and not independently verified by RunLocalAI.
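To confirm the context window yourself, you can read it from a downloaded copy of `config.json`. The sketch below is a minimal example under one loud assumption: that the value sits under the key `max_position_embeddings`, which many Hugging Face configs use. The source does not name the key GLM-5.2 uses, and the sample payload is a stand-in, not the real file.

```python
import json

# Assumption: the context length is stored under "max_position_embeddings".
# GLM-5.2's config.json may use a different key; inspect the file to be sure.
CONTEXT_KEY = "max_position_embeddings"


def context_window(config_text: str, key: str = CONTEXT_KEY) -> int:
    """Return the context window recorded in a config.json payload."""
    config = json.loads(config_text)
    if key not in config:
        raise KeyError(f"{key!r} not found; inspect the file for the real key name")
    return int(config[key])


# Stand-in payload for illustration only, not the real file contents.
sample = json.dumps({CONTEXT_KEY: 1_048_576})
tokens = context_window(sample)

# 1,048,576 is exactly 2**20, which is why it is rounded to "1M".
print(tokens, tokens == 2**20)
```

In practice you would pass the text of the real file, for example `context_window(open("config.json").read())`.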
Positioning
GLM-5.2 is Zhipu AI's June 2026 flagship open-weight MoE: a 753B-parameter model (Hugging Face safetensors metadata; secondary sources cite ~744B total / ~40B active) released under the MIT license with a 1M-token context window. It is the successor to GLM-5 and, per the Artificial Analysis Intelligence Index, the strongest open-weight model of the month (reported #1 among open models and #4 overall, behind only closed frontier models).
What stands out
The headline is its IndexShare sparse attention — one indexer shared across every four sparse-attention layers — which the vendor reports cuts long-context FLOPs substantially, making 1M context cheaper to serve than dense attention. The genuinely permissive MIT license is the real differentiator: unlike many "open" models with usage caveats, GLM-5.2's weights are clean for commercial use.
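The sharing pattern can be pictured with a toy sketch. This is not Z.ai's implementation: it only illustrates the stated ratio of one indexer per four sparse-attention layers. The grouping of consecutive layers and the layer count of 12 are assumptions made for the example.

```python
import math

# From the description: one indexer is shared across four sparse-attention layers.
SHARE_EVERY = 4


def indexer_group(sparse_layer_idx: int, share_every: int = SHARE_EVERY) -> int:
    """Map a 0-based sparse-attention layer to the indexer it would reuse.

    Assumption: consecutive layers form a group. The source does not say
    how layers are actually grouped.
    """
    return sparse_layer_idx // share_every


def indexer_passes(n_sparse_layers: int, share_every: int = SHARE_EVERY) -> int:
    """Indexer evaluations per forward pass when indexers are shared."""
    return math.ceil(n_sparse_layers / share_every)


n_layers = 12  # hypothetical layer count, chosen only for the example
groups = [indexer_group(i) for i in range(n_layers)]
print(groups)
print(indexer_passes(n_layers), "indexer passes instead of", n_layers)
```

Under these assumptions the indexer runs once per group rather than once per layer. How much of the total long-context cost that removes depends on details the source does not give.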
Honest caveats
Every benchmark on the model card is vendor-reported: Z.ai published no third-party numbers at launch, and some commentators flag possible benchmark tuning. We have not independently reproduced the figures. At 753B parameters (about 1.5 TB of weights in BF16, at 2 bytes per parameter), this model needs multi-GPU, server-class hardware and will not fit on a single consumer GPU. Realistic self-hosting means running vLLM or SGLang across multiple datacenter cards, or using a hosted API.
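The 1.5 TB figure follows from simple arithmetic, sketched below along with a lower bound on card count. The 80 GB card size is a hypothetical chosen for illustration. The estimate covers weights only and ignores KV cache, activations, and framework overhead, so real deployments need more headroom.

```python
import math

TOTAL_PARAMS = 753.3e9      # Hugging Face safetensors metadata, per the text
BYTES_PER_PARAM_BF16 = 2    # BF16 stores each weight in 16 bits


def weights_terabytes(params: float, bytes_per_param: float) -> float:
    """Size of the raw weights in decimal terabytes."""
    return params * bytes_per_param / 1e12


def min_cards_for_weights(params: float, bytes_per_param: float, card_gb: float) -> int:
    """Lower bound on card count: weights only, no KV cache or activations."""
    weight_gb = params * bytes_per_param / 1e9
    return math.ceil(weight_gb / card_gb)


CARD_GB = 80  # hypothetical datacenter card size, an assumption for the example

print(round(weights_terabytes(TOTAL_PARAMS, BYTES_PER_PARAM_BF16), 2), "TB in BF16")
print(min_cards_for_weights(TOTAL_PARAMS, BYTES_PER_PARAM_BF16, CARD_GB), "cards minimum")
```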
Verdict
Run it if you need the most capable permissively licensed open model for long-horizon agentic coding and you have server-class hardware (or use a hosted provider). Skip self-hosting on consumer hardware and reach for a smaller GLM model or a distilled variant instead. The MIT license plus 1M context make it a strong base for teams that need to own their stack.
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
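Until measured file sizes are listed, a rough estimate can be derived from the parameter count. The sketch below multiplies the 753.3B figure by bits per weight. The Q8_0 and Q4_K_M values are approximate bits-per-weight figures for llama.cpp-style formats and should be treated as assumptions. Real files differ because some tensors are kept at higher precision, and VRAM needs exceed file size once KV cache is included.

```python
TOTAL_PARAMS = 753.3e9  # Hugging Face safetensors metadata, per the text

# Approximate bits per weight. BF16 is exact; the quantized values are
# assumptions based on typical llama.cpp-style formats, not measurements
# of any GLM-5.2 file.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}


def estimated_file_gb(params: float, bits_per_weight: float) -> float:
    """Rough weight-file size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


for name, bpw in BITS_PER_WEIGHT.items():
    print(f"{name:7s} ~{estimated_file_gb(TOTAL_PARAMS, bpw):,.0f} GB")
```

Even at roughly 4 to 5 bits per weight the estimate stays in the hundreds of gigabytes, which is consistent with the server-class hardware caveat.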
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Hardware that runs this
Cards with enough VRAM for at least one quantization of GLM-5.2.
Models worth comparing
Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.
Frequently asked
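Can I use GLM-5.2 commercially?
Yes. The weights are MIT-licensed, which permits commercial use.
What's the context length of GLM-5.2?
`config.json` sets a 1,048,576-token (1M) context window.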
Source: huggingface.co/zai-org/GLM-5.2
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.