LLM-jp 4 8B Thinking
LLM-jp 4 8B Thinking is an 8B-parameter Japanese-English model from NII's Large Language Model R&D Center, post-trained with SFT and DPO to improve reasoning. It offers a generous 65,536-token context window and is free for commercial use under Apache 2.0. This is a research release — safety tuning has not been applied.
If you need a commercially usable, long-context Japanese model with reasoning post-training, LLM-jp 4 8B Thinking is worth a look, especially at 8B, where VRAM cost is manageable (about 6 GB for Q4_K_M). The 11.7T-token pre-training run is substantial, and the SFT and DPO reasoning post-training sets it apart at this size. That said, the missing safety tuning rules it out for anything customer-facing unless you add your own guardrails. Test it internally first, and budget time for tokenizer compatibility checks.
Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.50/10. License is explicitly apache-2.0 on the HF card and commercial use is correctly flagged. Metadata (8B params, 65,536 context, Llama family, Japanese/English) all match the card's architecture table verbatim. The description is honest and operator-voiced, properly flagging the lack of safety tuning, the tokenizer incompatibility with openai-harmony (a real footgun pulled directly from the card), and the undisclosed portions of the training data. The best use case is specific enough (Japanese-English reasoning and document pipelines), and the weaknesses give a reader real signal before committing. Brand fit is strong: a commercially licensed bilingual reasoning model at 8B is exactly the RunLocalAI sweet spot.
Overview
Strengths
- Pre-trained on 11.7T tokens across multi-stage and mid-training phases
- 65,536-token context — handles long documents and conversations
- Bilingual Japanese and English support
- Apache 2.0 license: commercial use permitted with no royalties
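To make the 65,536-token figure concrete, here is a minimal sketch of a context-budget check. The function name and the idea of reserving part of the window for the reply are illustrative assumptions, not something taken from the model card, and the token counts would have to come from the model's own tokenizer.

```python
CONTEXT_WINDOW = 65_536  # context length stated on this page


def remaining_context(prompt_tokens: int, reserved_output_tokens: int,
                      context_window: int = CONTEXT_WINDOW) -> int:
    """Return how many tokens are left after the prompt and the reserved reply.

    A negative result means the request does not fit in the window.
    """
    if prompt_tokens < 0 or reserved_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return context_window - prompt_tokens - reserved_output_tokens


# Hypothetical request: a 60,000-token document plus 4,096 tokens for the reply.
print(remaining_context(60_000, 4_096))  # 1440
```

Since this is a reasoning-tuned model, it is probably worth reserving a larger output budget than you would for a plain chat model; how much is something to measure on your own prompts.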
Weaknesses
- Explicitly not safety-tuned — not suitable for public-facing deployments without your own guardrails
- Portions of training data are undisclosed due to third-party licensing restrictions
- Tokenizer is incompatible with the openai-harmony library — check your stack before integrating
- Low community traction so far (19K downloads, 38 likes) — limited third-party evaluations available
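Given the tokenizer caveat above, one basic sanity check before integrating is an encode/decode round trip on representative Japanese and English text. The sketch below assumes a tokenizer object with Hugging Face-style `encode` and `decode` methods; the helper name and the toy tokenizer are hypothetical and exist only so the example runs offline. It does not test openai-harmony compatibility itself.

```python
from typing import Iterable, List


def failed_round_trips(tokenizer, samples: Iterable[str]) -> List[str]:
    """Return the samples that do not survive encode -> decode unchanged.

    `tokenizer` is any object with Hugging Face-style `encode` and `decode`
    methods. A mismatch can be benign (for example Unicode normalization),
    so treat the result as a list of strings to inspect by hand.
    """
    failures = []
    for text in samples:
        ids = tokenizer.encode(text, add_special_tokens=False)
        decoded = tokenizer.decode(ids, skip_special_tokens=True)
        if decoded != text:
            failures.append(text)
    return failures


class CodePointTokenizer:
    """Toy stand-in (one token per code point), used only for the demo."""

    def encode(self, text, add_special_tokens=False):
        return [ord(ch) for ch in text]

    def decode(self, ids, skip_special_tokens=True):
        return "".join(chr(i) for i in ids)


samples = ["こんにちは、世界", "Hello, world", "混在 mixed テキスト 123"]
print(failed_round_trips(CodePointTokenizer(), samples))  # []
```

To run the check against the real model, you would pass in its tokenizer instead of the toy class, for example one loaded with `transformers.AutoTokenizer.from_pretrained("llm-jp/llm-jp-4-8b-thinking")`. That loading path is an assumption based on the source repository name and is not confirmed on this page.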
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 4.4 GB | 6 GB |
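As a rough illustration of how to read the table, the sketch below checks which listed quantizations fit a given GPU. The figures are copied from the table above; the function name and the `reserve_gb` margin are hypothetical, and real memory use also depends on context length and runtime.

```python
# VRAM requirements copied from the table above, in GB.
VRAM_REQUIRED_GB = {"Q4_K_M": 6.0}


def fitting_quantizations(gpu_vram_gb: float, reserve_gb: float = 0.0) -> list:
    """List quantizations whose stated VRAM requirement fits on the GPU.

    `reserve_gb` is a hypothetical safety margin for other processes
    (display, other models) that also occupy GPU memory.
    """
    usable = gpu_vram_gb - reserve_gb
    return [name for name, need in VRAM_REQUIRED_GB.items() if need <= usable]


print(fitting_quantizations(8.0))                  # ['Q4_K_M']
print(fitting_quantizations(6.0, reserve_gb=1.0))  # []
```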
Get the model
Hugging Face (original weights). This is the source repository, so you need to quantize the weights yourself.
Frequently asked
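What's the minimum VRAM to run LLM-jp 4 8B Thinking?
About 6 GB, the requirement listed for the Q4_K_M quantization (4.4 GB file).
Can I use LLM-jp 4 8B Thinking commercially?
Yes. The model is released under Apache 2.0, which permits commercial use with no royalties.
What's the context length of LLM-jp 4 8B Thinking?
65,536 tokens.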
Source: huggingface.co/llm-jp/llm-jp-4-8b-thinking
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.