Florence-2 Large
770M-parameter unified vision foundation model with a DaViT image encoder and BART-style seq2seq decoder. One model, one set of weights — handles captioning, OCR, region/grounding, segmentation, and dense detection via task-prompt tokens. Trained on FLD-5B (5.4B annotations over 126M images).
Absurd value for 770M params and the most under-rated vision model Microsoft has shipped. Use it when you need many vision tasks on cheap hardware and a chat interface is not the point.
Strengths
- One 770M checkpoint for caption / detailed-caption / OCR / OCR-with-region / grounding / detection / segmentation
- Reported to outperform many larger task-specialist models, some around 10x its size, on COCO, RefCOCO, and TextVQA
- MIT license, which permits commercial use (keep the copyright and license notice)
- Tiny by VLM standards — runs in <2GB VRAM at FP16, viable on CPU and edge devices
- Task-prompt API: <CAPTION>, <OD>, <OCR>, <REFERRING_EXPRESSION_SEGMENTATION>, etc.
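The task-prompt interface in the list above can be sketched roughly as follows. This is a minimal sketch, assuming the loading and generation pattern shown on the model's Hugging Face card; exact arguments may differ between transformers versions. The helper names build_prompt and run_task are hypothetical, and run_task reloads the model on every call, which is fine for a demo but not for production.

```python
from typing import Any, Dict

MODEL_ID = "microsoft/Florence-2-large"

# Task tokens named in the list above; the model card documents more.
TASKS = ("<CAPTION>", "<OD>", "<OCR>", "<REFERRING_EXPRESSION_SEGMENTATION>")


def build_prompt(task: str, text_input: str = "") -> str:
    """Return the task token, optionally followed by free text
    (for example a referring expression). Hypothetical helper."""
    if not (task.startswith("<") and task.endswith(">")):
        raise ValueError(f"task must look like '<NAME>', got {task!r}")
    return task + text_input


def run_task(image_path: str, task: str, text_input: str = "") -> Dict[str, Any]:
    """Run one task prompt on one image and return the parsed result."""
    # Heavy imports live here so build_prompt works without torch installed.
    import torch
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    # trust_remote_code=True executes model code from the repository.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=dtype, trust_remote_code=True
    ).to(device)
    processor = AutoProcessor.from_pretrained(MODEL_ID, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        text=build_prompt(task, text_input), images=image, return_tensors="pt"
    ).to(device, dtype)

    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    # Converts the generated token string into a task-specific structure.
    return processor.post_process_generation(
        raw, task=task, image_size=(image.width, image.height)
    )
```

Calling run_task("photo.jpg", "<OD>") with a local image path of your own should return a dictionary keyed by the task token. Switching tasks only changes the prompt string; the weights stay the same.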
Weaknesses
- Not a conversational VLM — pure task-prompt, no free-form chat
- Caption outputs are short and factual; verbose narration is weaker than Qwen2.5-VL's
- OCR is good for English print but lags GOT-OCR2 on formulas, complex tables, CJK
- trust_remote_code=True is required in transformers, which adds friction for locked-down deployments
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 0.4 GB | 1 GB |
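The file-size figure can be sanity-checked with back-of-the-envelope arithmetic: parameter count times bits per weight. The sketch below assumes the 770M parameter count stated on this page and an average of 4.5 bits per weight for a 4-bit K-quant. That average is an assumption for illustration, not a measured value for this model. Runtime VRAM sits above the weight size because activations and framework overhead also need memory.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 770e6  # parameter count stated on this page

fp16 = weight_size_gb(N_PARAMS, 16)
# 4.5 bits/weight is an assumed average for a 4-bit K-quant, not a measurement.
q4 = weight_size_gb(N_PARAMS, 4.5)

print(f"FP16 weights:     {fp16:.2f} GB")  # 1.54 GB
print(f"~4.5-bit weights: {q4:.2f} GB")    # 0.43 GB
```

Both estimates line up with the table's 0.4 GB file size and with the under-2 GB FP16 figure quoted in the strengths list.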
Get the model
HuggingFace
Original weights
Source repository. No pre-quantized files are provided here, so you have to quantize the weights yourself.
Frequently asked
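What's the minimum VRAM to run Florence-2 Large?
About 1 GB for the Q4_K_M quantization listed above; under 2 GB at FP16.
Can I use Florence-2 Large commercially?
Yes. The weights are released under the MIT license, which permits commercial use.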
What's the context length of Florence-2 Large?
Source: huggingface.co/microsoft/Florence-2-large
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.