Swallow 7B

Swallow 7B is a Japanese-English base model built by continual pre-training of Llama 2 7B on additional Japanese text. TokyoTech-LLM also expanded the tokenizer vocabulary to represent Japanese more efficiently, which reduces the token count for Japanese text and speeds up inference. This is a raw base model: it has no instruction tuning or chat formatting out of the box.
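Because a base model only continues text, a task is usually framed as a document to complete rather than as a chat turn. The sketch below builds a few-shot prompt as plain text. The function name `build_few_shot_prompt`, the `日本語:` / `English:` labels, and the example sentence pairs are illustrative assumptions, not a format prescribed by the Swallow authors.

```python
def build_few_shot_prompt(examples, query, src_label="日本語", tgt_label="English"):
    """Format (source, target) pairs as a plain-text document that ends
    exactly where the model should start writing."""
    blocks = [f"{src_label}: {src}\n{tgt_label}: {tgt}" for src, tgt in examples]
    # Leave the final target open so the model's continuation is the answer.
    blocks.append(f"{src_label}: {query}\n{tgt_label}:")
    return "\n\n".join(blocks)


# Hypothetical example pairs, for illustration only.
prompt = build_few_shot_prompt(
    [
        ("おはようございます。", "Good morning."),
        ("ありがとうございます。", "Thank you."),
    ],
    "今日は天気がいいですね。",
)
print(prompt)
```

The resulting string would be passed to whatever runtime hosts the model, typically with a stop sequence such as a blank line so that generation ends after one answer.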

License: llama2 · Context: 4,096 tokens

Auto-generated rating (Opus 4.7 judge, claude-opus-4-7). Overall 9.03/10. License is correctly identified as llama2 and matches the HF card; the licenseCommercialOk: false call is defensible given Llama 2's restrictions and is explicitly flagged in weaknesses and verdict. Metadata (7B params, llama family, vendor, context 4096) aligns with the Llama 2 base. Description is honest and operator-voiced, correctly noting this is a raw base model with no instruction tuning, tokenizer expansion benefit, and slight English regression — all consistent with the Swallow paper. Best use case is appropriately narrow (Japanese fine-tuning base). Brand fit is moderate: it's a base model with no GGUF mentioned and Llama 2 license blocks commercial use, which limits the runlocalai audience, but it's still a legitimate fine-tuning starting point worth cataloging.

Flags:
- Llama 2 commercial use is technically permitted under 700M MAU threshold; 'blocks commercial use entirely' is slightly overstated, though acceptable as a conservative editorial stance
- No GGUF/quantization availability mentioned for local deployment path

Overview

Quantization	File size	VRAM required
Q4_K_M	3.9 GB	5 GB

Frequently asked

What's the minimum VRAM to run Swallow 7B?

5 GB of VRAM is enough to run Swallow 7B at the Q4_K_M quantization (file size 3.9 GB). Higher-quality quantizations need more.
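As a rough illustration of that arithmetic, the sketch below checks whether a given amount of VRAM covers the Q4_K_M file plus some headroom. The two figures come from the table on this page. Treating the 1.1 GB gap between them as headroom for the KV cache and runtime buffers is an assumption, and the name `fits_in_vram` is hypothetical.

```python
# Figures from the table on this page, in decimal megabytes.
Q4_K_M_FILE_MB = 3900
LISTED_VRAM_MB = 5000

# Assumption: the gap between the two is headroom for the KV cache,
# activations, and runtime buffers. This is inferred, not documented.
IMPLIED_HEADROOM_MB = LISTED_VRAM_MB - Q4_K_M_FILE_MB


def fits_in_vram(available_mb, file_mb=Q4_K_M_FILE_MB, headroom_mb=IMPLIED_HEADROOM_MB):
    """Return True if the weights plus headroom fit in the available VRAM."""
    return available_mb >= file_mb + headroom_mb


print(IMPLIED_HEADROOM_MB)   # 1100
print(fits_in_vram(6000))    # True
print(fits_in_vram(4000))    # False
```

Actual usage depends on the runtime, the context length in use, and how many layers are offloaded to the GPU, so this is a first-pass check rather than a guarantee.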

Can I use Swallow 7B commercially?

Swallow 7B is released under the llama2 license, which places restrictions on commercial use. Review the license terms before using it in a product.

What's the context length of Swallow 7B?

Swallow 7B supports a context window of 4,096 tokens (about 4K).
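In practice the prompt and the generated text share that window, so a long prompt leaves less room for output. A minimal sketch of the budget, assuming the prompt's token count has already been measured with the model's tokenizer (the name `max_new_tokens` is hypothetical):

```python
CONTEXT_WINDOW = 4096  # tokens, as listed on this page


def max_new_tokens(prompt_tokens, context_window=CONTEXT_WINDOW):
    """Tokens left for generation once the prompt occupies the window."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    # Clamp at zero: a prompt longer than the window leaves no room at all.
    return max(0, context_window - prompt_tokens)


print(max_new_tokens(3000))  # 1096
print(max_new_tokens(5000))  # 0
```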
