GLM-4V 9B
GLM-4 with vision encoder. Strong on Chinese document Q&A; restricted commercial license.
Positioning
GLM-4V 9B is a dense vision-language model from Zhipu AI, a Chinese vendor, released under the GLM License, which restricts commercial use. It has 13.9B parameters and an 8,192-token context window, and it pairs the GLM-4 language model with a vision encoder for multimodal understanding. It is aimed primarily at Chinese document Q&A, which makes it a specialized tool in the open-weight landscape rather than a general-purpose VLM.
Strengths
- Chinese document VLM specialization: GLM-4V 9B is purpose-built for Chinese document Q&A, offering strong performance on tasks like OCR, table extraction, and document comprehension in Chinese.
- Dense architecture for predictable scaling: As a dense 13.9B-parameter model, every parameter is active for each token, so memory and compute requirements follow directly from the parameter count and are straightforward to estimate.
- Consumer deployment class: With quantized sizes as low as ~4.5 GB (Q2_K) and an 8K context, the model can run on a single consumer GPU with 8–12 GB of VRAM at lower quants, which puts local deployment within reach of many users.
- Vision-language capability: The integrated vision encoder allows processing of images and documents, extending beyond pure text to handle visual inputs like scanned documents and charts.
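The "straightforward to estimate" point for a dense model can be illustrated with a little arithmetic. The sketch below is a rough estimate only: the function name `weight_size_gb` is hypothetical, the bit widths are nominal, and real quantization formats store extra scaling metadata, so actual files run somewhat larger than these figures.

```python
def weight_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight storage in decimal GB for a dense model.

    Assumes every parameter is stored at the same nominal bit width,
    which ignores per-block scales used by real quantization formats.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


PARAMS_B = 13.9  # parameter count quoted on this page

if __name__ == "__main__":
    # Nominal bit widths only; these are not specific GGUF quant types.
    for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label}: ~{weight_size_gb(PARAMS_B, bits):.1f} GB")
```

At 16 bits per weight this gives about 27.8 GB, in line with the roughly 28 GB FP16 figure quoted on this page.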
Limitations
- Restricted commercial license: The GLM License is not fully open-source; commercial use may require additional permissions or fees, limiting deployment flexibility.
- Short context window: At 8,192 tokens, the context is shorter than many modern models (e.g., 128K+), restricting handling of long documents or multi-turn conversations with large context.
- No community benchmarks available: We do not have independent, community-reported benchmark results for this model. Published vendor metrics should be treated as best-case.
- Niche focus: While strong on Chinese documents, performance on English or general-purpose vision tasks may be less competitive; operators should verify fit for their specific use case.
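One common way to work within a short context window is to split a long document into overlapping windows and query each one separately. The sketch below assumes the document has already been tokenized into a list; `split_into_windows`, the `reserved` budget (for the prompt, image tokens, and the answer), and the `overlap` size are all hypothetical choices for illustration, not values published for this model.

```python
from typing import List, Sequence


def split_into_windows(
    tokens: Sequence[int],
    context_limit: int = 8192,  # context window quoted on this page
    reserved: int = 1024,       # assumed headroom for prompt, image, answer
    overlap: int = 128,         # assumed overlap between adjacent windows
) -> List[Sequence[int]]:
    """Split a token sequence into overlapping windows that fit the budget."""
    budget = context_limit - reserved
    if budget <= overlap:
        raise ValueError("reserved + overlap leaves no room for document text")
    step = budget - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + budget])
        if start + budget >= len(tokens):
            break  # the last window already reaches the end of the document
    return windows


if __name__ == "__main__":
    fake_tokens = list(range(20000))  # stand-in for real token ids
    for i, window in enumerate(split_into_windows(fake_tokens)):
        print(f"window {i}: {len(window)} tokens")
```

Answers drawn from separate windows still need to be merged by the caller, and questions that depend on distant parts of a document may not survive this kind of splitting.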
What it takes to run this locally
At FP16, the weights take about 28 GB on disk and roughly the same in VRAM, which puts the model beyond most consumer GPUs. Quantization cuts this substantially: Q8_0 (15 GB) fits on a single 24 GB GPU; Q6_K (11.5 GB) and Q5_K_M (9.9 GB) fit on 16 GB GPUs; Q4_K_M (~8.5 GB) fits on a 12 GB GPU; and smaller quants can fit on 8 GB cards. Budget an extra 30–50% on top of the file size for KV cache and framework overhead at typical context lengths. In short, this is a consumer-class deployment: a single GPU with 12–24 GB of VRAM runs the quantized versions, while FP16 needs a workstation or datacenter GPU.
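The sizing rule above (file size plus 30–50% overhead) can be turned into a quick check. This is a minimal sketch under stated assumptions: the helper names `vram_estimate` and `smallest_tier` and the list of GPU tiers are hypothetical, and the file sizes are the ones quoted on this page rather than measured values.

```python
from typing import Optional, Sequence

GPU_TIERS_GB = (8, 12, 16, 24)  # assumed common consumer VRAM sizes

# File sizes in GB as quoted on this page. Q4_K_M uses the 8.5 GB
# figure from the quantization table.
QUANT_FILE_GB = {"Q8_0": 15.0, "Q6_K": 11.5, "Q5_K_M": 9.9, "Q4_K_M": 8.5}


def vram_estimate(file_gb: float, overhead: float) -> float:
    """File size plus a fractional overhead for KV cache and framework."""
    return file_gb * (1 + overhead)


def smallest_tier(need_gb: float,
                  tiers: Sequence[int] = GPU_TIERS_GB) -> Optional[int]:
    """Smallest listed VRAM tier that covers the estimate, or None."""
    for tier in sorted(tiers):
        if need_gb <= tier:
            return tier
    return None


if __name__ == "__main__":
    for name, size in QUANT_FILE_GB.items():
        low = vram_estimate(size, 0.30)
        high = vram_estimate(size, 0.50)
        print(f"{name}: {low:.1f}-{high:.1f} GB estimated, "
              f"tier at 30%: {smallest_tier(low)} GB, "
              f"tier at 50%: {smallest_tier(high)} GB")
```

At the 50% end of the overhead range some quants land one tier higher than at 30%, so it is worth measuring actual usage at your intended context length before choosing a card.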
Should you run this locally?
Yes if you need a VLM specialized for Chinese document Q&A and can accept the restricted GLM License for your use case. The model’s quantized sizes make it accessible on consumer hardware.
No if you require a fully open license (Apache 2.0 or MIT), need longer context (e.g., 32K+), or are targeting general-purpose English vision tasks where broader models may be more suitable.
Catalog cross-links
Family & lineage
How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.
Strengths
- Chinese document Q&A
- Vision-capable GLM
Weaknesses
- Restricted license
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 8.5 GB | 12 GB |
Get the model
HuggingFace
Original weights
Source repository — direct quantization required.
Frequently asked
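What's the minimum VRAM to run GLM-4V 9B?
About 12 GB for Q4_K_M, per the quantization table. Lower quants such as Q2_K (~4.5 GB file) can fit on 8 GB cards.
Can I use GLM-4V 9B commercially?
Only within the terms of the GLM License, which restricts commercial use. Review the license before deploying commercially.
What's the context length of GLM-4V 9B?
8,192 tokens.
Does GLM-4V 9B support images?
Yes. The integrated vision encoder accepts images such as scanned documents and charts.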
Source: huggingface.co/THUDM/glm-4v-9b
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.