stepfun

0.58B parameters

Commercial OK

Reviewed May 2026

GOT-OCR 2.0

580M-parameter end-to-end OCR-2.0 model: a vision encoder paired with a Qwen-based decoder, trained specifically for general OCR including math formulas (LaTeX out), tables (Markdown/HTML out), sheet music, geometric shapes, and dense multi-column documents.

License: Apache-2.0 · Context: not listed

Our verdict

OP · Eruo Fredoline | Verified May 29, 2026

unrated

The open-source answer for formula and table OCR — beats Nougat decisively and runs on a potato. Standard transformers integration is rough; commit to the custom inference path before adopting.

Strengths

End-to-end formula OCR — outputs LaTeX directly, beats Nougat and most pipelines
Table OCR straight to Markdown/HTML, preserving structure
Apache-2.0, fully commercial-friendly
Only 580M params — sub-2GB VRAM at FP16, viable on edge/CPU
Multilingual including CJK; supports interactive region/color-based OCR

Weaknesses

Requires custom inference code (loaded with trust_remote_code=True); not a vanilla transformers pipeline
Single-purpose: only OCR, no captioning, VQA, or chat
Quality on noisy phone-camera photos lags commercial OCR (Azure Document Intelligence, Textract)
Limited fine-tuning recipes published — adapting to a new domain is non-obvious
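
The custom inference path mentioned above can be sketched roughly as follows. This is a minimal sketch, not a verified recipe: it assumes the repository's remote code exposes a chat(tokenizer, image_path, ocr_type=...) method with "ocr" and "format" modes, as described on the model card, and that a CUDA GPU is available. The helper names load_got_ocr and run_ocr and the OCR_TYPES mapping are hypothetical and exist only for illustration. Check the repository before relying on any of it.

```python
from pathlib import Path

MODEL_ID = "stepfun-ai/GOT-OCR2_0"  # repository named in this review

# Assumption: these are the ocr_type values the repo's chat() method accepts
# for plain text and for formatted (LaTeX/Markdown) output.
OCR_TYPES = {"plain": "ocr", "formatted": "format"}


def load_got_ocr():
    """Load tokenizer and model. trust_remote_code=True executes Python
    code downloaded from the model repository, so review it first."""
    # Imported lazily so the rest of this module works without transformers.
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(
        MODEL_ID,
        trust_remote_code=True,
        use_safetensors=True,
    )
    return tokenizer, model.eval().cuda()  # assumes a CUDA device


def run_ocr(tokenizer, model, image_path: str, mode: str = "plain") -> str:
    """Run OCR on one image file and return the recognized text."""
    if mode not in OCR_TYPES:
        raise ValueError(f"mode must be one of {sorted(OCR_TYPES)}")
    if not Path(image_path).is_file():
        raise FileNotFoundError(image_path)
    # Assumed signature of the repo-provided chat() method.
    return model.chat(tokenizer, image_path, ocr_type=OCR_TYPES[mode])
```

Typical use would be tokenizer, model = load_got_ocr() followed by run_ocr(tokenizer, model, "page.png", mode="formatted"), where page.png is a placeholder file name. Keeping the model call behind a small wrapper like this makes it easier to swap in a different loading path later if the integration changes.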

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	0.3 GB	1 GB
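
The file size above, and the sub-2GB FP16 figure under Strengths, can be sanity-checked with simple arithmetic on the parameter count. The sketch below covers weights only. The 4.5 bits-per-weight average for Q4_K_M is an assumption (the real figure depends on how the quantizer mixes precisions per tensor), and the function name weight_memory_gb is made up for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 580e6  # parameter count quoted in this review

fp16_gb = weight_memory_gb(PARAMS, 16)   # FP16 stores 2 bytes per weight
q4_gb = weight_memory_gb(PARAMS, 4.5)    # assumed average for a 4-bit k-quant

print(f"FP16 weights:   {fp16_gb:.2f} GB")  # 1.16 GB
print(f"Q4_K_M weights: {q4_gb:.2f} GB")    # 0.33 GB
```

Actual VRAM use is higher than the weight size because activations, the image encoder's intermediate tensors, and the decoder's KV cache also need memory, which is consistent with the table listing 1 GB of VRAM for a 0.3 GB file.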

Get the model

HuggingFace

Original weights

huggingface.co/stepfun-ai/GOT-OCR2_0

Source repository with the original weights. No prequantized files are listed, so you need to quantize the weights yourself.

Hardware that runs this

Cards with enough VRAM for at least one quantization of GOT-OCR 2.0.

NVIDIA GB200 NVL72

13824GB · nvidia

AMD Instinct MI350X

NVIDIA B300 (Blackwell Ultra)

288GB · nvidia

AMD Instinct MI355X

AMD Instinct MI325X

AMD Instinct MI300X

192GB · amd

NVIDIA H100 NVL

188GB · nvidia

Frequently asked

What's the minimum VRAM to run GOT-OCR 2.0?

1GB of VRAM is enough to run GOT-OCR 2.0 at the Q4_K_M quantization (file size 0.3 GB). Higher-quality quantizations need more.

Can I use GOT-OCR 2.0 commercially?

Yes. GOT-OCR 2.0 ships under the Apache-2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of GOT-OCR 2.0?

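This listing does not record a context length for GOT-OCR 2.0; the field shows 0 tokens, which is a placeholder rather than a real limit. Check the model configuration in the source repository for the actual value.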

Source: huggingface.co/stepfun-ai/GOT-OCR2_0

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
