SigLIP SO400M (patch14-384)
The default open contrastive encoder. Unless you specifically need SigLIP 2 features or a tiny patch-16 variant, this is the one to reach for. Almost every open VLM you've heard of uses it as the eyes.
Overview
428M-parameter Shape-Optimized (SO400M) vision-language encoder, trained on WebLI with a pairwise sigmoid contrastive loss rather than CLIP's softmax loss. It reaches about 83% zero-shot ImageNet-1k top-1 at 384px, which places it among the strongest open contrastive encoders in its size class, and it is the vision tower used by PaliGemma, Idefics3, and many other open VLMs.
Strengths
- Best-in-class zero-shot ImageNet: ~83% top-1 at 384px with only 428M params
- Sigmoid loss enables stable training at large batch sizes — outperforms equivalent-size CLIP
- Apache-2.0 license, no usage strings attached
- SO400M 'shape-optimized' arch — Pareto-better params-vs-quality than ViT-L/H
- Universal embedder: powers PaliGemma, Idefics3, Mantis, MiniCPM-V and many open VLMs
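The sigmoid loss mentioned above scores every image-text pair in a batch independently, instead of normalizing across the batch as a softmax does. The following is a minimal NumPy sketch of that idea, not the training code used for this model. The function name is hypothetical, and the `temperature` and `bias` values are fixed here for illustration; in real training they would be learned.

```python
import numpy as np


def sigmoid_contrastive_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a batch of image/text embeddings.

    img_emb, txt_emb: arrays of shape (n, d); row i of each is a matched pair.
    temperature and bias are illustrative constants (assumption).
    """
    # L2-normalise so the dot product is a cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = temperature * (img @ txt.T) + bias      # shape (n, n)
    labels = 2.0 * np.eye(len(img)) - 1.0            # +1 on the diagonal, -1 elsewhere

    # -log(sigmoid(x)) == log(1 + exp(-x)); logaddexp keeps this numerically stable.
    per_pair = np.logaddexp(0.0, -labels * logits)
    return per_pair.sum() / len(img)
```

Because each pair contributes its own binary term, no global normalization over the batch is needed, which is the property usually credited for the loss behaving well across batch sizes.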
Weaknesses
- Pure encoder — no generative head, you build the downstream task
- Text tower caps at 64 tokens, so only short captions fit
- Patch-14 is heavier than the patch-16 variant at the same resolution
- Superseded for some tasks by SigLIP 2 (released later) — check before committing
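The patch-14 versus patch-16 cost difference comes down to how many image tokens the encoder has to process. A rough sketch of the arithmetic, assuming non-overlapping patches, no class token, and that a partial patch at the image border is dropped (the helper name is hypothetical):

```python
def num_patch_tokens(image_size: int, patch_size: int) -> int:
    """Tokens produced by a non-overlapping patch embedding (assumes no class token)."""
    per_side = image_size // patch_size   # a partial patch at the border is dropped
    return per_side * per_side


p14 = num_patch_tokens(384, 14)   # 27 * 27 patches
p16 = num_patch_tokens(384, 16)   # 24 * 24 patches
print(p14, p16, round(p14 / p16, 2))
```

Under these assumptions patch-14 yields 729 tokens against 576 for patch-16 at 384px, roughly 27% more, and self-attention cost grows faster than linearly with that count.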
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 0.3 GB | 1 GB |
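The file size in the table can be sanity-checked with a back-of-the-envelope estimate: parameters times average bits per weight. The sketch below assumes about 4.85 bits per weight for Q4_K_M, which is a rough figure rather than an exact property of the format, and uses the 428M parameter count stated on this page. The function name is hypothetical.

```python
def estimated_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in decimal gigabytes, ignoring metadata overhead."""
    return n_params * bits_per_weight / 8 / 1e9


q4_size = estimated_file_size_gb(428e6, 4.85)    # assumed average for Q4_K_M
fp16_size = estimated_file_size_gb(428e6, 16.0)  # unquantized half precision
print(f"Q4_K_M ~ {q4_size:.2f} GB, FP16 ~ {fp16_size:.2f} GB")
```

The VRAM figure is higher than the file size because activations and runtime buffers sit on top of the weights.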
Get the model
HuggingFace
Original weights in the source repository. No pre-quantized files are provided there, so you need to quantize the weights yourself.
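For the unquantized weights, a minimal zero-shot classification sketch with the Hugging Face `transformers` library might look like the following. It assumes `torch`, `transformers`, `Pillow`, and `sentencepiece` are installed; the prompt template and the helper names `build_prompts` and `zero_shot_scores` are illustrative choices, not part of the library.

```python
MODEL_ID = "google/siglip-so400m-patch14-384"


def build_prompts(labels):
    """Turn class names into short captions; the template is an illustrative choice."""
    return [f"a photo of a {label}" for label in labels]


def zero_shot_scores(image_path, labels, model_id=MODEL_ID):
    """Return {label: score}; each score is an independent sigmoid probability."""
    # Imported lazily so build_prompts stays usable without these packages.
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained(model_id).eval()
    processor = AutoProcessor.from_pretrained(model_id)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompts(labels),
        images=image,
        padding="max_length",   # pad captions to the text tower's fixed length
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits_per_image   # shape (1, len(labels))
    probs = torch.sigmoid(logits)[0]
    return {label: float(p) for label, p in zip(labels, probs)}
```

Calling `zero_shot_scores("cat.jpg", ["cat", "dog"])` (with a hypothetical local image) returns one score per label. Because the model is trained with a sigmoid loss, the scores are not expected to sum to 1.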
Frequently asked
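What's the minimum VRAM to run SigLIP SO400M (patch14-384)?
About 1 GB for the Q4_K_M quantization listed on this page.
Can I use SigLIP SO400M (patch14-384) commercially?
Yes. The weights are released under Apache-2.0, which permits commercial use.
What's the context length of SigLIP SO400M (patch14-384)?
The text tower accepts up to 64 tokens per caption. Images are processed at 384px.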
Source: huggingface.co/google/siglip-so400m-patch14-384
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.