stepfun
0.58B parameters
Commercial OK
Reviewed May 2026

GOT-OCR 2.0

580M-parameter end-to-end OCR-2.0 model: a vision encoder paired with a Qwen-based decoder, trained specifically for general OCR including math formulas (LaTeX out), tables (Markdown/HTML out), sheet music, geometric shapes, and dense multi-column documents.

License: apache-2.0·Context: 0 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED MAY 29, 2026
unrated

The open-source answer for formula and table OCR — beats Nougat decisively and runs on a potato. Standard transformers integration is rough; commit to the custom inference path before adopting.

Overview

580M-parameter end-to-end OCR-2.0 model: a vision encoder paired with a Qwen-based decoder, trained specifically for general OCR including math formulas (LaTeX out), tables (Markdown/HTML out), sheet music, geometric shapes, and dense multi-column documents.

Strengths

  • End-to-end formula OCR — outputs LaTeX directly, beats Nougat and most pipelines
  • Table OCR straight to Markdown/HTML, preserving structure
  • Apache-2.0, fully commercial-friendly
  • Only 580M params — sub-2GB VRAM at FP16, viable on edge/CPU
  • Multilingual including CJK; supports interactive region/color-based OCR

Weaknesses

  • Requires custom inference code (trust-remote-code) — not a vanilla transformers pipeline
  • Single-purpose: only OCR, no captioning, VQA, or chat
  • Quality on noisy phone-camera photos lags commercial OCR (Azure Document Intelligence, Textract)
  • Limited fine-tuning recipes published — adapting to a new domain is non-obvious

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M0.3 GB1 GB

Get the model

HuggingFace

Original weights

huggingface.co/stepfun-ai/GOT-OCR2_0

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of GOT-OCR 2.0.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Step up
More capable — bigger memory footprint
Step down
Smaller — faster, runs on weaker hardware
No verdicted models in the next tier down yet.

Frequently asked

What's the minimum VRAM to run GOT-OCR 2.0?

1GB of VRAM is enough to run GOT-OCR 2.0 at the Q4_K_M quantization (file size 0.3 GB). Higher-quality quantizations need more.

Can I use GOT-OCR 2.0 commercially?

Yes — GOT-OCR 2.0 ships under the apache-2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of GOT-OCR 2.0?

GOT-OCR 2.0 supports a context window of 0 tokens (about 0K).

Source: huggingface.co/stepfun-ai/GOT-OCR2_0

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify GOT-OCR 2.0 runs on your specific hardware before committing money.