glm
13.9B parameters
Restricted
Multimodal
Reviewed June 2026

GLM-4V 9B

GLM-4 with vision encoder. Strong on Chinese document Q&A; restricted commercial license.

License: GLM License · Released Jun 4, 2024 · Context: 8,192 tokens

Our verdict

By Fredoline Eruo · Verified Jun 12, 2026
unrated

Positioning

GLM-4V 9B is a dense vision-language model from Zhipu AI, a Chinese vendor, released under the GLM License, which permits use but attaches conditions to commercial deployment. It has 13.9B parameters in total (the GLM-4 9B language backbone plus a vision encoder) and an 8,192-token context window, and it is aimed primarily at Chinese document Q&A. Its distinct value in the open-weight landscape is that focus on Chinese-language document processing: it is a specialized tool rather than a general-purpose VLM.

Strengths

  • Chinese document VLM specialization: GLM-4V 9B is purpose-built for Chinese document Q&A, targeting tasks such as OCR, table extraction, and document comprehension in Chinese.
  • Dense architecture, predictable resource needs: As a dense 13.9B-parameter model, it uses every parameter for every token, so memory footprint and per-token compute follow directly from the parameter count and the chosen precision.
  • Consumer-deployment class: With quantized sizes as low as ~4.5 GB (Q2_K) and an 8K context, the model can run on a single consumer GPU with 8–12 GB of VRAM at the lower quants, enabling local deployment for many users.
  • Vision-language capability: The integrated vision encoder allows processing of images and documents, extending beyond pure text to handle visual inputs like scanned documents and charts.
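Because the model is dense, its weight memory is simply parameter count times bytes per weight. The sketch below is a minimal illustration, assuming the 13.9B total on this page and 1 GB = 10^9 bytes: it reproduces the FP16 size and back-computes the average bits per weight implied by each quantized file size quoted on this page. The helper names are hypothetical, and real quantized files also carry metadata and may keep some tensors at higher precision, so treat the output as approximate.

```python
from typing import Dict

PARAMS = 13.9e9  # total parameter count listed on this page

# File sizes (GB) quoted on this page for each quantization.
QUANT_SIZES_GB: Dict[str, float] = {
    "Q8_0": 15.0, "Q6_K": 11.5, "Q5_K_M": 9.9, "Q4_K_M": 8.5, "Q2_K": 4.5,
}


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight memory in GB (1 GB = 10**9 bytes) for a dense model."""
    return params * bits_per_weight / 8 / 1e9


def implied_bits_per_weight(file_gb: float, params: float = PARAMS) -> float:
    """Average bits per weight implied by a quantized file size."""
    return file_gb * 1e9 * 8 / params


if __name__ == "__main__":
    print(f"FP16 weights: {weight_gb(PARAMS, 16):.1f} GB")  # prints 27.8 GB
    for name, size in QUANT_SIZES_GB.items():
        print(f"{name}: ~{implied_bits_per_weight(size):.2f} bits/weight")
```

Q4_K_M works out to roughly 4.9 bits per weight and Q8_0 to roughly 8.6. These are whole-file averages, so they will not exactly match a format's nominal bit width.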

Limitations

  • Restricted commercial license: The GLM License is not fully open-source; commercial use may require additional permissions or fees, limiting deployment flexibility.
  • Short context window: At 8,192 tokens, the context is shorter than that of many modern models (128K or more), which limits long documents and long multi-turn conversations. Image inputs draw on the same token budget.
  • No community benchmarks available: We do not have independent, community-reported benchmark results for this model. Published vendor metrics should be treated as best-case.
  • Niche focus: While strong on Chinese documents, performance on English or general-purpose vision tasks may be less competitive; operators should verify fit for their specific use case.
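With only 8,192 tokens, it helps to budget the window explicitly before sending a long document. The sketch below is a minimal illustration, assuming you already have token IDs from the model's tokenizer. The per-image token cost is a parameter because this page does not state it, the 1,024-token reply reserve is an arbitrary default, and `remaining_for_text` and `chunk` are hypothetical helper names.

```python
from typing import List

CONTEXT_TOKENS = 8192  # GLM-4V 9B context window, per this page


def remaining_for_text(image_tokens: int, reserve_for_reply: int = 1024,
                       context: int = CONTEXT_TOKENS) -> int:
    """Tokens left for prompt text once images and the reply reserve are set aside."""
    return max(context - image_tokens - reserve_for_reply, 0)


def chunk(token_ids: List[int], max_len: int) -> List[List[int]]:
    """Split a document's token IDs into pieces of at most max_len tokens."""
    if max_len <= 0:
        raise ValueError("no room left for text in the context window")
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]


if __name__ == "__main__":
    # image_tokens=1500 is a placeholder, not a documented GLM-4V figure.
    budget = remaining_for_text(image_tokens=1500)
    doc = list(range(12_000))  # stand-in for a tokenized 12,000-token document
    pieces = chunk(doc, budget)
    print(budget, [len(p) for p in pieces])
```

Each chunk is then sent as its own request, and the per-chunk answers are merged afterwards. That merge step depends on the application.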

What it takes to run this locally

At FP16 the weights take about 28 GB on disk and roughly the same in VRAM (13.9B parameters × 2 bytes), which puts full precision beyond most consumer GPUs. Quantization cuts this substantially. The sizes below are file sizes; budget an extra 30–50% for KV cache and framework overhead at typical context lengths:

  • Q8_0 (15 GB): a single 24 GB GPU.
  • Q6_K (11.5 GB) and Q5_K_M (9.9 GB): 16 GB GPUs, with Q6_K close to the limit once overhead is added.
  • Q4_K_M (~8.5 GB): 12 GB GPUs, with little headroom.
  • Q2_K (~4.5 GB): 8 GB GPUs.

Deployment class is consumer: a single GPU with 12–24 GB of VRAM can run the quantized builds, while FP16 needs a workstation or datacenter GPU.
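The 30–50% overhead rule of thumb can be turned into a quick fit check. The sketch below is a minimal illustration, assuming the file sizes quoted on this page (8.5 GB for Q4_K_M, matching the quantization table) and treating overhead as a flat multiplier on file size. Real usage depends on the runner, context length, and image inputs, and `vram_range` and `best_fit` are hypothetical helper names.

```python
from typing import Optional, Tuple

# File sizes (GB) quoted on this page; FP16 is the unquantized size.
SIZES_GB = {
    "FP16": 28.0,
    "Q8_0": 15.0,
    "Q6_K": 11.5,
    "Q5_K_M": 9.9,
    "Q4_K_M": 8.5,
    "Q2_K": 4.5,
}


def vram_range(file_gb: float, low: float = 0.30, high: float = 0.50) -> Tuple[float, float]:
    """Estimated VRAM (GB) under the page's 30-50% overhead rule of thumb."""
    return file_gb * (1 + low), file_gb * (1 + high)


def best_fit(gpu_gb: float, conservative: bool = True) -> Optional[str]:
    """Largest build whose estimated VRAM fits the GPU, or None.

    conservative=True checks the 50% overhead bound; False checks 30%.
    """
    for name, size in sorted(SIZES_GB.items(), key=lambda kv: kv[1], reverse=True):
        low, high = vram_range(size)
        if (high if conservative else low) <= gpu_gb:
            return name
    return None


if __name__ == "__main__":
    for gpu in (8, 12, 16, 24):
        print(f"{gpu} GB: {best_fit(gpu)} at 50% overhead, "
              f"{best_fit(gpu, conservative=False)} at 30% overhead")
```

At 12 GB, Q4_K_M passes only under the 30% bound, and at 16 GB, Q6_K passes only under the 30% bound. Both are the "tight" cases, where a shorter context or fewer images may be needed.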

Should you run this locally?

Yes if you need a VLM specialized for Chinese document Q&A and can accept the restricted GLM License for your use case. The model’s quantized sizes make it accessible on consumer hardware.

No if you require a fully open license (Apache 2.0 or MIT), need longer context (e.g., 32K+), or are targeting general-purpose English vision tasks where broader models may be more suitable.

Catalog cross-links


Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Parent / base model: GLM-4 9B (9B, consumer class)

Strengths

  • Chinese document Q&A
  • Vision-capable GLM

Weaknesses

  • Restricted license

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M | 8.5 GB | 12 GB

Get the model

HuggingFace

Original weights

huggingface.co/THUDM/glm-4v-9b

Source repository only; no pre-quantized builds are linked, so you will need to quantize the weights yourself.


Frequently asked

What's the minimum VRAM to run GLM-4V 9B?

12 GB of VRAM is enough to run GLM-4V 9B at the Q4_K_M quantization (file size 8.5 GB), with little headroom to spare. Higher-quality quantizations need more.

Can I use GLM-4V 9B commercially?

GLM-4V 9B is released under the GLM License, which has restrictions for commercial use. Review the license terms before using it in a product.

What's the context length of GLM-4V 9B?

GLM-4V 9B supports a context window of 8,192 tokens (about 8K).

Does GLM-4V 9B support images?

Yes. GLM-4V 9B is multimodal and accepts text and image inputs. Vision support requires a runner that implements its vision encoder and image-conditioning path; a text-only runner will not process images.

Source: huggingface.co/THUDM/glm-4v-9b

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
