glm
753B parameters
Commercial OK
Reviewed June 2026

GLM-5.2

GLM-5.2 is Zhipu AI's (Z.ai) flagship open-weight LLM, released on Hugging Face as `zai-org/GLM-5.2` (2026-06). It is a Mixture-of-Experts decoder-only transformer with a new "IndexShare" sparse-attention design (one indexer shared across every four sparse-attention layers). Hugging Face safetensors metadata reports 753.3B total parameters (BF16); secondary sources cite ~744B total / ~40B active. `config.json` sets a 1,048,576-token (1M) context window. The weights are MIT-licensed (commercial use permitted) and the model is text-only (English + Chinese). The model card publishes a benchmark table versus GLM-5.1, Claude Opus 4.8, and GPT-5.5 — all figures vendor-reported and not independently verified by RunLocalAI.

License: MIT · Released Jun 16, 2026 · Context: 1,048,576 tokens

Our verdict

Fredoline Eruo · Verified Jun 29, 2026
unrated

Positioning

GLM-5.2 is Zhipu AI's June 2026 flagship open-weight MoE — a 753B-parameter model (Hugging Face safetensors metadata; secondary sources cite ~744B total / ~40B active) under a clean MIT license with a 1M-token context. It is the successor to GLM-5 and, per the Artificial Analysis Intelligence Index, the strongest open-weight model of the month (reported #1 open / #4 overall, behind only the closed frontier).

What stands out

The headline is its IndexShare sparse attention — one indexer shared across every four sparse-attention layers — which the vendor reports cuts long-context FLOPs substantially, making 1M context cheaper to serve than dense attention. The genuinely permissive MIT license is the real differentiator: unlike many "open" models with usage caveats, GLM-5.2's weights are clean for commercial use.
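
The sharing pattern described above can be pictured with a small sketch. It assumes the four layers that share an indexer are consecutive, which the model card summary here does not state, and the layer count of 12 is purely hypothetical. The names `indexer_for_layer` and `group_layers` are illustrative and not taken from the GLM-5.2 code.

```python
from collections import defaultdict

# From the description: one indexer is shared across every four sparse-attention layers.
LAYERS_PER_INDEXER = 4


def indexer_for_layer(sparse_layer_idx: int) -> int:
    """Map a sparse-attention layer index to the indexer it would share.

    Assumption: layers are grouped consecutively (0-3, 4-7, ...).
    """
    return sparse_layer_idx // LAYERS_PER_INDEXER


def group_layers(num_sparse_layers: int) -> dict[int, list[int]]:
    """Return {indexer_id: [layer indices that reuse that indexer]}."""
    groups: dict[int, list[int]] = defaultdict(list)
    for layer_idx in range(num_sparse_layers):
        groups[indexer_for_layer(layer_idx)].append(layer_idx)
    return dict(groups)


NUM_SPARSE_LAYERS = 12  # hypothetical count, for illustration only
groups = group_layers(NUM_SPARSE_LAYERS)
for indexer_id, layers in groups.items():
    print(f"indexer {indexer_id}: layers {layers}")
```

Under these assumptions the indexer runs once per group rather than once per layer, which is the intuition behind the reported savings; the actual size of the saving is a vendor claim.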

Honest caveats

Every benchmark on the model card is vendor-reported: Z.ai published no third-party numbers at launch, and some commentators flag possible benchmark-tuning. We have not independently reproduced any of them. At 753B parameters (about 1.5 TB of weights in BF16, at 2 bytes per parameter), this is a multi-GPU, server-class model, not something that fits on a single consumer GPU. Realistic self-hosting means vLLM or SGLang across multiple datacenter cards; otherwise, use an API.
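
The size argument above is simple arithmetic, sketched here for readers who want to plug in their own hardware. The 80 GB card size and the 90% usable-memory fraction are assumptions chosen for illustration, and the result covers weights only: KV cache and activations need additional memory.

```python
import math

TOTAL_PARAMS = 753.3e9     # total parameters, per Hugging Face safetensors metadata
BYTES_PER_PARAM_BF16 = 2   # BF16 stores each weight in 16 bits


def weight_bytes(params: float, bytes_per_param: float) -> float:
    """Raw size of the weights in bytes."""
    return params * bytes_per_param


def min_gpus(total_bytes: float, gpu_mem_gb: float, usable_fraction: float = 0.9) -> int:
    """Lower bound on card count needed to hold the weights alone.

    usable_fraction is an assumed allowance for runtime overhead.
    """
    usable_bytes = gpu_mem_gb * 1e9 * usable_fraction
    return math.ceil(total_bytes / usable_bytes)


bf16_bytes = weight_bytes(TOTAL_PARAMS, BYTES_PER_PARAM_BF16)
print(f"BF16 weights: {bf16_bytes / 1e12:.2f} TB")
print(f"80 GB cards needed for weights alone: {min_gpus(bf16_bytes, 80)}")
print(f"24 GB cards needed for weights alone: {min_gpus(bf16_bytes, 24)}")
```

Treat the card counts as a floor, not a deployment plan: long-context serving adds KV-cache memory on top of the weights.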

Verdict

Run it if you need the most capable permissively-licensed open model for long-horizon agentic coding and you have server-class hardware (or use a host). Skip local self-hosting on consumer hardware — reach for a smaller GLM or a distill. The MIT license plus 1M context make it a strong base for teams that need to own their stack.

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (glm)
GLM-5 (200B): Frontier
GLM-5.2 (753B): You are here

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
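
No quantized files are listed for this model, so the following is only a rough way to estimate what they might weigh. The bits-per-weight figures are approximate assumptions for common GGUF-style formats (real files vary by tensor mix), and nothing here implies that such quantizations of GLM-5.2 exist yet.

```python
TOTAL_PARAMS = 753.3e9  # total parameters, per Hugging Face safetensors metadata

# Assumed average bits per weight; approximate, real files differ.
ASSUMED_BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,
}


def estimated_size_gb(params: float, bits_per_weight: float) -> float:
    """Estimated weight file size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


estimates = {
    name: estimated_size_gb(TOTAL_PARAMS, bpw)
    for name, bpw in ASSUMED_BITS_PER_WEIGHT.items()
}
for name, size_gb in estimates.items():
    print(f"{name:8s} ~{size_gb:,.0f} GB")
```

Even at roughly 4 to 5 bits per weight, the estimate stays in the hundreds of gigabytes, which is consistent with the server-class verdict above.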

Get the model

HuggingFace

Original weights

huggingface.co/zai-org/GLM-5.2

Source repository only. No pre-quantized files are listed, so you need to quantize the weights yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of GLM-5.2.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below, so you can decide what actually fits your hardware.

Step up
More capable, bigger memory footprint
No verdicted models in the next tier up yet.

Frequently asked

Can I use GLM-5.2 commercially?

Yes. GLM-5.2 ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of GLM-5.2?

GLM-5.2 supports a context window of 1,048,576 tokens (2^20, usually written as 1M).
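
To check the figure yourself, you can read it out of the model's `config.json`. The sketch below parses an inline sample rather than the real file, and it assumes the value lives under `max_position_embeddings`, the conventional key in Hugging Face configs; the actual key in the GLM-5.2 config may differ.

```python
import json

# Hypothetical excerpt standing in for the real config.json.
# The key name is an assumption based on common Hugging Face conventions.
SAMPLE_CONFIG = '{"max_position_embeddings": 1048576}'


def context_window(config_text: str, key: str = "max_position_embeddings") -> int:
    """Return the context window recorded under `key` in a config JSON string."""
    config = json.loads(config_text)
    return int(config[key])


tokens = context_window(SAMPLE_CONFIG)
print(f"{tokens:,} tokens")
print("equals 2**20:", tokens == 2**20)
```

For the real file, pass the text of the downloaded `config.json` in place of `SAMPLE_CONFIG`.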

Source: huggingface.co/zai-org/GLM-5.2

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
