llama
8B parameters
Commercial OK
Reviewed May 2026

LLM-jp 4 8B Thinking

LLM-jp 4 8B Thinking is an 8B-parameter Japanese-English model from NII's Large Language Model R&D Center, post-trained with SFT and DPO to improve reasoning. It offers a generous 65,536-token context window and is free for commercial use under Apache 2.0. This is a research release — safety tuning has not been applied.

License: apache-2.0·Context: 65,536 tokens
BLK · VERDICT

Our verdict

OP · Fredoline Eruo|VERIFIED MAY 28, 2026
9.5/10

If you need a commercially licensed, long-context Japanese model with reasoning post-training, LLM-jp 4 8B Thinking is worth a look — especially at 8B where VRAM cost is manageable. The 11.7T token pre-training base is serious, and the DPO reasoning work is a real differentiator at this size. That said, the missing safety tuning makes it a skip for anything customer-facing without extra work on your end. Hedge: test it internally first, and budget time for tokenizer compatibility checks.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.50/10. License is explicitly apache-2.0 on the HF card and commercial use is correctly flagged. Metadata (8B params, 65,536 context, llama family, Japanese/English) all match the card's architecture table verbatim. The description is honest and operator-voiced, properly flagging the lack of safety tuning, the tokenizer incompatibility with openai-harmony (a real footgun pulled directly from the card), and undisclosed training data portions. Best use case is specific enough (Japanese-English reasoning and document pipelines), and weaknesses give a reader real signal before committing. Brand fit is strong: a commercially-licensed bilingual reasoning model at 8B is exactly the runlocalai sweet spot.

Overview

LLM-jp 4 8B Thinking is an 8B-parameter Japanese-English model from NII's Large Language Model R&D Center, post-trained with SFT and DPO to improve reasoning. It offers a generous 65,536-token context window and is free for commercial use under Apache 2.0. This is a research release — safety tuning has not been applied.

Strengths

  • Pre-trained on 11.7T tokens across multi-stage and mid-training phases
  • 65,536-token context — handles long documents and conversations
  • Bilingual Japanese and English support
  • Apache 2.0 license: commercial use permitted with no royalties

Weaknesses

  • Explicitly not safety-tuned — not suitable for public-facing deployments without your own guardrails
  • Portions of training data are undisclosed due to third-party licensing restrictions
  • Tokenizer is incompatible with the openai-harmony library — check your stack before integrating
  • Low community traction so far (19K downloads, 38 likes) — limited third-party evaluations available

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

QuantizationFile sizeVRAM required
Q4_K_M4.4 GB6 GB

Get the model

HuggingFace

Original weights

huggingface.co/llm-jp/llm-jp-4-8b-thinking

Source repository — direct quantization required.

Hardware that runs this

Cards with enough VRAM for at least one quantization of LLM-jp 4 8B Thinking.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run LLM-jp 4 8B Thinking?

6GB of VRAM is enough to run LLM-jp 4 8B Thinking at the Q4_K_M quantization (file size 4.4 GB). Higher-quality quantizations need more.

Can I use LLM-jp 4 8B Thinking commercially?

Yes — LLM-jp 4 8B Thinking ships under the apache-2.0, which permits commercial use. Always read the license text before deployment.

What's the context length of LLM-jp 4 8B Thinking?

LLM-jp 4 8B Thinking supports a context window of 65,536 tokens (about 66K).

Source: huggingface.co/llm-jp/llm-jp-4-8b-thinking

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.

Related — keep moving

Before you buy

Verify LLM-jp 4 8B Thinking runs on your specific hardware before committing money.