llama
7B parameters
Restricted
Reviewed May 2026

Swallow 7B

Swallow 7B is a Japanese-English base model built by continual pre-training on top of Llama 2 7B with additional Japanese text. TokyoTech-LLM also expanded the tokenizer vocabulary to represent Japanese more efficiently, which reduces token count and speeds up inference. This is a raw base model — it has no instruction tuning or chat formatting out of the box.

License: llama2 · Context: 4,096 tokens

Our verdict

OP · Fredoline Eruo | Verified May 28, 2026
9.0/10

If you need a Japanese-capable 7B base to fine-tune on your own data, Swallow is a credible starting point with a real tokenizer improvement over raw Llama 2. That said, the Llama 2 license restricts commercial deployment: it is permitted only below a 700M monthly-active-user threshold and subject to the license's other terms, and we treat that as a hard stop for most production use cases. For research or internal tooling where licensing is not a concern, it's worth a look. If you need something you can ship commercially without reviewing those terms, skip this and look elsewhere.

Why this rating

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.03/10. License is correctly identified as llama2 and matches the HF card; the licenseCommercialOk: false call is defensible given Llama 2's restrictions and is explicitly flagged in weaknesses and verdict. Metadata (7B params, llama family, vendor, context 4096) aligns with the Llama 2 base. Description is honest and operator-voiced, correctly noting this is a raw base model with no instruction tuning, tokenizer expansion benefit, and slight English regression — all consistent with the Swallow paper. Best use case is appropriately narrow (Japanese fine-tuning base). Brand fit is moderate: it's a base model with no GGUF mentioned and Llama 2 license blocks commercial use, which limits the runlocalai audience, but it's still a legitimate fine-tuning starting point worth cataloging.

Flags:

  • Llama 2 commercial use is technically permitted under the 700M MAU threshold; 'blocks commercial use entirely' is slightly overstated, though acceptable as a conservative editorial stance
  • No GGUF/quantization availability mentioned for local deployment path

Strengths

  • Continual pre-training on Japanese data measurably improves Japanese benchmark scores over base Llama 2 7B
  • Expanded Japanese vocabulary tokenizer lowers token count for Japanese text, improving inference throughput
  • Bilingual — handles both English and Japanese

Weaknesses

  • No instruction tuning — not usable as a chat assistant without further fine-tuning
  • English performance regresses slightly versus base Llama 2 on several standard benchmarks
  • Llama 2 license restricts commercial use (permitted only below its 700M monthly-active-user threshold and subject to its other terms)
  • 4096-token context is tight by current standards

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          3.9 GB       5 GB
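A rough way to sanity-check figures like these is to multiply parameter count by average bits per weight. The sketch below is a back-of-the-envelope estimate, not how the table was produced. The 4.5 bits-per-weight average and the 1 GB runtime overhead are assumptions picked for illustration; real quantized files and runtimes vary, and memory use grows with the context length you actually fill.

```python
def estimate_file_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight file size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def estimate_vram_gb(file_size_gb: float, overhead_gb: float = 1.0) -> float:
    """Weights plus an assumed flat allowance for KV cache and runtime buffers."""
    return file_size_gb + overhead_gb


# Assumptions: 7e9 parameters, ~4.5 bits per weight on average for a 4-bit quant.
size_gb = estimate_file_size_gb(7e9, 4.5)
vram_gb = estimate_vram_gb(size_gb)

print(f"estimated file size: {size_gb:.1f} GB")
print(f"estimated VRAM:      {vram_gb:.1f} GB")
```

Under these assumptions the estimate lands close to the table's 3.9 GB and 5 GB, which suggests the listed figures are plausible for a 7B model at roughly 4 bits per weight.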

Get the model

HuggingFace

Original weights

huggingface.co/tokyotech-llm/Swallow-7b-hf

Source repository with the original weights only; you need to quantize them yourself.
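For readers who want to try the original weights before quantizing, a minimal loading sketch with the Hugging Face transformers library might look like the following. It assumes transformers, torch, and accelerate are installed and that you have enough memory for half-precision weights (roughly 14 GB for 7B parameters at 2 bytes each). The helper names load_swallow and complete are hypothetical. Because this is a base model, the prompt is plain text to be continued, with no chat template.

```python
MODEL_ID = "tokyotech-llm/Swallow-7b-hf"  # repo id taken from the URL above


def load_swallow(model_id: str = MODEL_ID):
    """Load tokenizer and model; heavy imports are deferred until called."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,  # assumption: half precision fits your hardware
        device_map="auto",          # requires the accelerate package
    )
    return tokenizer, model


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue a plain-text prompt; no instruction or chat formatting is applied."""
    tokenizer, model = load_swallow()
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling complete("東京工業大学の主なキャンパスは、") would download the weights on first use and return the prompt followed by the model's continuation.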

Frequently asked

What's the minimum VRAM to run Swallow 7B?

5 GB of VRAM is enough to run Swallow 7B at the Q4_K_M quantization (file size 3.9 GB). Higher-quality quantizations need more.

Can I use Swallow 7B commercially?

Swallow 7B is released under the Llama 2 license (llama2), which restricts commercial use. Review the license terms before using it in a product.

What's the context length of Swallow 7B?

Swallow 7B supports a context window of 4,096 tokens (about 4K).
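In practice the 4,096-token window is shared between the prompt and whatever the model generates. The sketch below shows one simple way to budget for that, assuming you already have the prompt as a list of token IDs. The helper names are hypothetical, and dropping the oldest tokens is only one possible truncation policy.

```python
CONTEXT_WINDOW = 4096  # context length stated on this page


def max_prompt_tokens(max_new_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the prompt after reserving room for generation."""
    if not 0 < max_new_tokens < context_window:
        raise ValueError("max_new_tokens must be between 1 and context_window - 1")
    return context_window - max_new_tokens


def truncate_left(token_ids: list, max_new_tokens: int,
                  context_window: int = CONTEXT_WINDOW) -> list:
    """Keep only the most recent tokens that fit the prompt budget."""
    budget = max_prompt_tokens(max_new_tokens, context_window)
    return token_ids[-budget:]


# Example: a 5,000-token prompt with 512 tokens reserved for the reply.
prompt_ids = list(range(5000))
trimmed = truncate_left(prompt_ids, max_new_tokens=512)
print(len(trimmed))
```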

Source: huggingface.co/tokyotech-llm/Swallow-7b-hf

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
