Swallow 7B
Swallow 7B is a Japanese-English base model built by continual pre-training of Llama 2 7B on additional Japanese text. TokyoTech-LLM also expanded the tokenizer vocabulary to represent Japanese more efficiently, which lowers the token count for Japanese text and so speeds up inference. This is a raw base model: it has no instruction tuning or chat formatting out of the box.
If you need a Japanese-capable 7B base to fine-tune on your own data, Swallow is a credible starting point with a real tokenizer improvement over raw Llama 2. That said, it inherits the Llama 2 Community License, which permits commercial use only under conditions (notably, products with more than 700 million monthly active users need a separate license from Meta), and this listing conservatively treats it as not cleared for commercial deployment. For research or internal tooling where licensing is not a concern, it's worth a look. If you need a permissively licensed model you can ship commercially without reviewing those terms, look elsewhere.
Why this rating
Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.03/10. License is correctly identified as llama2 and matches the HF card; the licenseCommercialOk: false call is defensible given Llama 2's restrictions and is explicitly flagged in weaknesses and verdict. Metadata (7B params, llama family, vendor, context 4096) aligns with the Llama 2 base. Description is honest and operator-voiced, correctly noting this is a raw base model with no instruction tuning, tokenizer expansion benefit, and slight English regression — all consistent with the Swallow paper. Best use case is appropriately narrow (Japanese fine-tuning base). Brand fit is moderate: it's a base model with no GGUF mentioned and Llama 2 license blocks commercial use, which limits the runlocalai audience, but it's still a legitimate fine-tuning starting point worth cataloging.
Flags:
- Llama 2 commercial use is technically permitted under the 700M MAU threshold; 'blocks commercial use entirely' is slightly overstated, though acceptable as a conservative editorial stance
- No GGUF/quantization availability mentioned for local deployment path
Strengths
- Continual pre-training on Japanese data measurably improves Japanese benchmark scores over base Llama 2 7B
- Expanded Japanese vocabulary tokenizer lowers token count for Japanese text, improving inference throughput
- Bilingual — handles both English and Japanese
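The throughput gain from the expanded vocabulary comes from emitting fewer tokens for the same Japanese text. The sketch below illustrates that effect with two toy tokenizers. They are stand-ins, not the actual Llama 2 or Swallow tokenizers: one falls back to one token per UTF-8 byte for non-ASCII characters, the other has a dedicated token for every character. The function names and the sample string are assumptions chosen for illustration.

```python
from typing import Callable, Dict, List


def byte_fallback_tokenize(text: str) -> List[str]:
    """Toy tokenizer with no Japanese vocabulary (an assumption for illustration).

    ASCII characters become one token each; any other character falls back
    to one token per UTF-8 byte.
    """
    tokens: List[str] = []
    for ch in text:
        if ch.isascii():
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


def dedicated_vocab_tokenize(text: str) -> List[str]:
    """Toy tokenizer whose vocabulary has one token for every character."""
    return list(text)


def token_counts(
    text: str, tokenizers: Dict[str, Callable[[str], List[str]]]
) -> Dict[str, int]:
    """Count how many tokens each tokenizer produces for the same text."""
    return {name: len(tokenize(text)) for name, tokenize in tokenizers.items()}


sample = "東京工業大学"  # six characters, three UTF-8 bytes each
counts = token_counts(
    sample,
    {
        "byte_fallback": byte_fallback_tokenize,
        "dedicated_vocab": dedicated_vocab_tokenize,
    },
)
print(counts)  # {'byte_fallback': 18, 'dedicated_vocab': 6}
```

To measure real tokenizers instead of the toys, `token_counts` accepts any callable that maps a string to a list of tokens, for example the `tokenize` method of a tokenizer loaded with the `transformers` library.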
Weaknesses
- No instruction tuning — not usable as a chat assistant without further fine-tuning
- English performance regresses slightly versus base Llama 2 on several standard benchmarks
- Llama 2 Community License restricts commercial use (a separate license from Meta is required above 700 million monthly active users)
- 4096-token context is tight by current standards
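Because a base model only continues text, a common workaround before any fine-tuning is to write the prompt as a pattern for the model to complete. The following is a minimal sketch of that idea together with a check against the 4096-token window. The translation task, the `English:`/`Japanese:` labels, and both function names are assumptions for illustration, and the token count passed in would come from whichever tokenizer you use.

```python
from typing import List, Tuple

CONTEXT_LENGTH = 4096  # context window stated on this page


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (English, Japanese) pairs as plain text that ends where the
    model is expected to continue."""
    blocks = [f"English: {en}\nJapanese: {ja}" for en, ja in examples]
    blocks.append(f"English: {query}\nJapanese:")
    return "\n\n".join(blocks)


def max_new_tokens(prompt_token_count: int, context_length: int = CONTEXT_LENGTH) -> int:
    """Tokens left for generation once the prompt occupies part of the window."""
    return max(context_length - prompt_token_count, 0)


prompt = build_few_shot_prompt(
    [
        ("Good morning.", "おはようございます。"),
        ("Thank you.", "ありがとうございます。"),
    ],
    "Where is the station?",
)
print(prompt)
print(max_new_tokens(prompt_token_count=60))  # 4036
```

Few-shot examples consume part of the window, so long example sets leave less room for the completion.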
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 3.9 GB | 5 GB |
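As a rough way to reason about the table, the sketch below encodes its single row, checks which variants fit a given amount of VRAM, and derives the average bits per weight implied by the file size. It assumes a nominal 7 billion parameters and decimal gigabytes; the exact parameter count may differ, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class QuantVariant:
    name: str
    file_size_gb: float
    vram_required_gb: float


# The only row listed in the table above.
VARIANTS = [QuantVariant("Q4_K_M", file_size_gb=3.9, vram_required_gb=5.0)]


def variants_that_fit(available_vram_gb: float, variants: List[QuantVariant] = VARIANTS) -> List[str]:
    """Names of the variants whose stated VRAM requirement fits the card."""
    return [v.name for v in variants if v.vram_required_gb <= available_vram_gb]


def implied_bits_per_weight(file_size_gb: float, n_params: float) -> float:
    """Average bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return file_size_gb * 1e9 * 8 / n_params


print(variants_that_fit(8.0))   # ['Q4_K_M']
print(variants_that_fit(4.0))   # []
print(round(implied_bits_per_weight(3.9, 7e9), 2))  # 4.46
```

The gap between file size and VRAM requirement in the table is headroom for the runtime and the key-value cache, which grows with the number of tokens in context.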
Get the model
HuggingFace
Original weights
Source repository. It hosts the original weights, so you need to quantize them yourself.
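For the unquantized weights, one plausible route is the Hugging Face `transformers` library. The sketch below is a minimal example under stated assumptions: it takes the repository name from the Source line of this page, assumes `torch` and `transformers` are installed, and assumes enough memory for the full-precision or half-precision weights. The `complete` helper is a hypothetical name, not part of any library.

```python
MODEL_ID = "tokyotech-llm/Swallow-7b-hf"  # repository named in this page's Source line


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the base model; downloads the weights on first call."""
    # Imported lazily so that defining this helper does not require the libraries.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Use half precision on a GPU when one is available, full precision on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `complete("東京工業大学の主なキャンパスは、")` would return the prompt followed by the model's continuation. Since this is a base model, expect plain text continuation, not a chat-style answer.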
Frequently asked
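What's the minimum VRAM to run Swallow 7B?
About 5 GB, for the Q4_K_M quantization listed above.
Can I use Swallow 7B commercially?
Only under the terms of the Llama 2 Community License, which permits commercial use below 700 million monthly active users. This listing conservatively flags the model as not cleared for commercial use.
What's the context length of Swallow 7B?
4096 tokens.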
Source: huggingface.co/tokyotech-llm/Swallow-7b-hf
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.