VibeThinker-3B
VibeThinker-3B is a compact open-weight reasoning model from WeiboAI (Sina Weibo), fine-tuned from Qwen2.5-Coder-3B (Hugging Face `WeiboAI/VibeThinker-3B`, 2026-06). It is a dense ~3B-parameter model that runs in roughly 6.7 GB of VRAM and is specialized for verifiable math, coding, and STEM reasoning; it is not a general-purpose assistant. `config.json` exposes a 131,072-token context, though the authors describe a ~64K effective training window. The model is MIT-licensed, so commercial use is permitted. Author-reported benchmarks (arXiv 2606.16140) cite strong AIME, LiveCodeBench, and IMO-AnswerBench scores; these are vendor-reported and not independently verified. Its small size makes it the easiest reasoning model in the June 2026 batch to run locally on consumer or edge hardware.
Positioning
VibeThinker-3B is the standout small model of June 2026: a dense ~3B reasoning specialist from WeiboAI, fine-tuned from Qwen2.5-Coder-3B and released under the MIT license. Its entire pitch is verifiable reasoning (math, coding, STEM) on hardware most people already own.
What stands out
It runs in about 6.7 GB of VRAM, which fits a single 8 GB consumer GPU (or CPU inference), yet the authors report math and coding scores (AIME, LiveCodeBench, IMO-AnswerBench) that are frontier-class for its size. For offline, air-gapped, or edge reasoning, a 3B MIT-licensed model that produces real chain-of-thought is useful where a 70B model will not fit. This is the most consumer- and edge-friendly release of the month; check whether your GPU clears it on will-it-run.
Honest caveats
It is a specialist, not a generalist: it has no tool-use or agent training and is weaker on broad knowledge (GPQA-Diamond ~70). All scores are author-reported (arXiv 2606.16140) and have not been verified by third parties. The config exposes a 128K (131,072-token) context, but the authors describe a ~64K effective training window, so treat very-long-context use cautiously.
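One way to act on the context caveat is to cap prompts at the smaller of the advertised and trained windows. The sketch below is illustrative only: it assumes the repository's `config.json` stores the advertised length under the usual Hugging Face key `max_position_embeddings`, and it reads "~64K" as 65,536 tokens. The function name `usable_context` is hypothetical.

```python
import json

# Assumption: "~64K effective training window" is read as 64 * 1024 tokens.
ASSUMED_TRAINED_WINDOW = 64 * 1024


def usable_context(config: dict, trained_window: int = ASSUMED_TRAINED_WINDOW) -> int:
    """Return a conservative context budget in tokens.

    Takes the smaller of the length advertised in the config and the
    window the model is described as having been trained on.
    """
    # Assumption: the advertised length lives under this key in config.json.
    advertised = int(config["max_position_embeddings"])
    return min(advertised, trained_window)


# Stand-in for the contents of config.json (only the relevant key is shown).
config = json.loads('{"max_position_embeddings": 131072}')
print(usable_context(config))  # 65536
```

Prompts longer than this budget may still load, but they fall outside the window the authors describe training on.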
Verdict
Run it if you want local, offline, verifiable math and coding reasoning on a single consumer GPU, and keep it to its lane. Do not expect a general assistant or agentic tool use. For its size and license it is the easiest "reasoning model on my own machine" entry point in the catalog; pair it with Ollama and a small quantization.
Quantization variants
Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
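The size and VRAM trade-off can be approximated with simple arithmetic: weight size is roughly parameter count times bits per weight, divided by 8. The sketch below is a rough estimate, not a measurement. It assumes a nominal 3 billion parameters, and the 0.7 GB runtime overhead default is a hypothetical value chosen only because 6.0 GB of 16-bit weights plus 0.7 GB matches the ~6.7 GB figure quoted on this page. Real quantized files use somewhat more than their nominal bit width, and the KV cache grows with context length.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, vram_gb: float,
         overhead_gb: float = 0.7) -> bool:
    """Rough check: do the weights plus an assumed overhead fit in VRAM?

    overhead_gb is an illustrative assumption, not a measured value.
    """
    return weight_footprint_gb(n_params, bits_per_weight) + overhead_gb <= vram_gb


N_PARAMS = 3e9  # nominal "~3B"; the exact count is not given on this page

print(weight_footprint_gb(N_PARAMS, 16))  # 16-bit weights: 6.0
print(weight_footprint_gb(N_PARAMS, 4))   # nominal 4-bit weights: 1.5
print(fits(N_PARAMS, 16, vram_gb=8.0))    # True under these assumptions
```

Under these assumptions the unquantized weights already fit an 8 GB card, so quantization mainly buys headroom for longer contexts or smaller devices.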
Get the model
Hugging Face
Original weights
Source repository; you must quantize the weights yourself.
Frequently asked
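Can I use VibeThinker-3B commercially?
Yes. The model is MIT-licensed, which permits commercial use.
What's the context length of VibeThinker-3B?
`config.json` exposes a 131,072-token (128K) context, but the authors describe an effective training window of about 64K tokens, so treat very long contexts cautiously.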
Source: huggingface.co/WeiboAI/VibeThinker-3B
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.