other

0.428B parameters

Commercial OK

Reviewed May 2026

SigLIP SO400M (patch14-384)

428M-parameter Shape-Optimized vision-language encoder trained with the sigmoid (not softmax) contrastive loss on WebLI. Hits ~83% zero-shot ImageNet-1k top-1 at 384px — the strongest open contrastive encoder in its size class and the de facto vision tower for PaliGemma, Idefics, and most modern open VLMs.

License: apache-2.0 · Context: not applicable (encoder; the text tower accepts up to 64 tokens)

Our verdict

OP · Eruo Fredoline | Verified May 29, 2026

unrated

The default open contrastive encoder. Unless you specifically need SigLIP 2 features or a tiny patch-16 variant, this is the one to reach for. Almost every open VLM you've heard of uses it as the eyes.

Strengths

Best-in-class zero-shot ImageNet: ~83% top-1 at 384px with only 428M params
Sigmoid loss enables stable training at large batch sizes — outperforms equivalent-size CLIP
Apache-2.0, no additional usage restrictions
SO400M 'shape-optimized' arch — Pareto-better params-vs-quality than ViT-L/H
Universal embedder: powers PaliGemma, Idefics3, Mantis, MiniCPM-V and many open VLMs
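
The sigmoid loss mentioned above can be sketched in a few lines. This is a minimal NumPy illustration, not the training code used for SigLIP: it treats every image-text pair in a batch as an independent binary classification (match or no match) instead of normalizing over the whole batch as a softmax loss does. The function names, the batch of random embeddings, and the starting values for the temperature `t` and bias `b` are assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def sigmoid_contrastive_loss(img: np.ndarray, txt: np.ndarray,
                             t: float = 10.0, b: float = -10.0) -> float:
    """Pairwise sigmoid loss over an (n, d) batch of image and text embeddings.

    Row i of `img` is assumed to be the true match for row i of `txt`.
    `t` (temperature) and `b` (bias) are illustrative constants here; in
    training they would be learnable scalars.
    """
    img, txt = l2_normalize(img), l2_normalize(txt)
    logits = t * (img @ txt.T) + b          # (n, n) pairwise scores
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0          # +1 on the diagonal, -1 elsewhere
    # -log(sigmoid(z)) == log(1 + exp(-z)), computed stably with logaddexp
    return float(np.sum(np.logaddexp(0.0, -labels * logits)) / n)


rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
aligned = sigmoid_contrastive_loss(emb, emb)          # every pair matches
shuffled = sigmoid_contrastive_loss(emb, emb[::-1])   # every pair mismatched
print(f"aligned={aligned:.3f} shuffled={shuffled:.3f}")
```

Because each pair contributes its own term, the loss needs no batch-wide normalization, which is one reason it is reported to behave well at large batch sizes.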

Weaknesses

Pure encoder — no generative head, you build the downstream task
Text tower input caps at 64 tokens, so only short captions fit
Patch-14 is heavier than the patch-16 variant at the same resolution
Superseded for some tasks by SigLIP 2 (released later) — check before committing
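
To make the patch-14 versus patch-16 cost concrete, here is a small arithmetic sketch. It assumes non-overlapping square patches and that any leftover pixels at the image edge are dropped (floor division); the helper name `patch_tokens` is hypothetical.

```python
def patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of image tokens for a square image split into square patches."""
    per_side = image_size // patch_size   # leftover edge pixels are dropped
    return per_side * per_side


p14 = patch_tokens(384, 14)   # 27 patches per side
p16 = patch_tokens(384, 16)   # 24 patches per side
print(p14, p16, round(p14 / p16, 2))
```

At 384px this gives 729 tokens for patch-14 against 576 for patch-16, roughly 27% more tokens per image, and self-attention cost grows faster than linearly with token count.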

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	0.3 GB	1 GB
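
The file size in the table can be sanity-checked with back-of-the-envelope arithmetic: weight storage is roughly parameter count times bits per weight. The sketch below assumes about 4.85 bits per weight for Q4_K_M, which is a commonly quoted average rather than a figure from this page, and it ignores activations, framework overhead, and any layers kept at higher precision.

```python
PARAMS = 428_000_000  # parameter count listed for SigLIP SO400M


def weights_gb(params: int, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


fp32_gb = weights_gb(PARAMS, 32)
fp16_gb = weights_gb(PARAMS, 16)
q4_gb = weights_gb(PARAMS, 4.85)   # assumed average for Q4_K_M
print(f"fp32={fp32_gb:.2f} GB  fp16={fp16_gb:.2f} GB  q4={q4_gb:.2f} GB")
```

Under these assumptions the 4-bit estimate lands near 0.26 GB, consistent with the 0.3 GB file size listed; the 1 GB VRAM figure leaves headroom for activations and runtime overhead.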

Get the model

HuggingFace

Original weights

huggingface.co/google/siglip-so400m-patch14-384

Source repository — direct quantization required.
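
For readers who want to try the original (unquantized) weights first, the following is a hedged sketch of zero-shot scoring with the Hugging Face `transformers` library. It assumes `torch`, `transformers`, `Pillow`, and `sentencepiece` are installed and that the weights can be downloaded; the function name `zero_shot_scores` and the prompt template are illustrative choices, not part of the model release.

```python
MODEL_ID = "google/siglip-so400m-patch14-384"


def zero_shot_scores(image_path: str, labels: list[str]) -> dict[str, float]:
    """Score one image against candidate labels; returns label -> probability.

    Heavy imports live inside the function so the module can be imported
    without the ML stack installed.
    """
    import torch
    from PIL import Image
    from transformers import AutoModel, AutoProcessor

    model = AutoModel.from_pretrained(MODEL_ID).eval()
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    prompts = [f"a photo of a {label}" for label in labels]
    # padding="max_length" pads text to the tower's fixed 64-token length
    inputs = processor(text=prompts, images=image,
                       padding="max_length", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Sigmoid, not softmax: each label gets an independent probability
    probs = torch.sigmoid(outputs.logits_per_image)[0]
    return {label: float(p) for label, p in zip(labels, probs)}
```

A call such as `zero_shot_scores("cat.jpg", ["cat", "dog"])` (with a hypothetical local image) returns one probability per label; because the scores come from a sigmoid, they are not expected to sum to 1.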

Hardware that runs this

Cards with enough VRAM for at least one quantization of SigLIP SO400M (patch14-384).

NVIDIA GB200 NVL72

13824GB · nvidia

AMD Instinct MI350X

NVIDIA B300 (Blackwell Ultra)

288GB · nvidia

AMD Instinct MI355X

AMD Instinct MI325X

AMD Instinct MI300X

192GB · amd

NVIDIA H100 NVL

188GB · nvidia

Frequently asked

What's the minimum VRAM to run SigLIP SO400M (patch14-384)?

1 GB of VRAM is enough to run SigLIP SO400M (patch14-384) at the Q4_K_M quantization (file size 0.3 GB). Higher-precision variants need more.

Can I use SigLIP SO400M (patch14-384) commercially?

Yes. SigLIP SO400M (patch14-384) ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of SigLIP SO400M (patch14-384)?

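SigLIP SO400M (patch14-384) is an encoder, not a generative model, so it has no context window in the LLM sense; this is why the listing shows 0 tokens. Its text tower accepts up to 64 tokens per input.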

Source: huggingface.co/google/siglip-so400m-patch14-384

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
