InternLM 3 8B
Shanghai AI Lab's open-research line. InternLM 3 at 8B; strong on Chinese-language tasks.
Positioning
InternLM 3 8B is a dense 8-billion-parameter model released by Shanghai AI Lab under the InternLM License. It has a 32,768-token context window and targets Chinese-language workloads on consumer hardware. It belongs to the open-research InternLM line, and its license is listed as allowing commercial use, though the exact terms should be checked before deployment.
Strengths
- Chinese-language focus: Built and optimized for Chinese-language workloads, making it a strong choice for applications requiring native Chinese understanding.
- Permissive license: The InternLM License allows commercial deployment, giving operators flexibility for proprietary use.
- Consumer-friendly size: At 8B parameters, the model fits within consumer GPU memory constraints, especially with quantization.
- Long context: A 32K-token context window lets extended conversations or long documents fit without truncation, up to that limit.
Limitations
- No community benchmarks available: We do not have independent measurements for this model. Operators should treat published vendor metrics as best-case until verified.
- Niche language strength: While strong on Chinese, its performance on English or other languages is unverified and may be weaker than general-purpose models.
- Dense architecture: Unlike Mixture-of-Experts models, which activate only a subset of their parameters per token, all 8B parameters are active on every forward pass, so per-token compute is that of a full 8B model.
- License restrictions: The InternLM License may have specific terms that differ from Apache 2.0 or MIT; operators should review the full license text before deployment.
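The dense-architecture point can be made concrete with a widely used rule of thumb: a forward pass costs roughly 2 FLOPs per active parameter per token. The sketch below applies that approximation. The Mixture-of-Experts figure is a hypothetical value chosen only for contrast, not a measurement of any real model.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: about 2 FLOPs per active parameter per token."""
    return 2.0 * active_params


# Dense model: every one of the nominal 8e9 parameters is active for each token.
DENSE_ACTIVE_PARAMS = 8e9

# Hypothetical MoE model for contrast (assumption): 3e9 parameters active per token.
HYPOTHETICAL_MOE_ACTIVE_PARAMS = 3e9

dense_cost = forward_flops_per_token(DENSE_ACTIVE_PARAMS)
moe_cost = forward_flops_per_token(HYPOTHETICAL_MOE_ACTIVE_PARAMS)

print(f"dense 8B:         {dense_cost:.2e} FLOPs/token")
print(f"hypothetical MoE: {moe_cost:.2e} FLOPs/token")
print(f"ratio:            {dense_cost / moe_cost:.2f}x")
```

This ignores attention cost over long contexts and memory bandwidth, which often dominates local inference, so treat it as a rough comparison only.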
What it takes to run this locally
At FP16, the weights take ~16 GB of disk space. Quantized versions are much smaller: Q8_0 ~9 GB, Q6_K ~6.6 GB, Q5_K_M ~5.7 GB, Q4_K_M ~4.5 GB, Q3_K_M ~3.9 GB, Q2_K ~2.6 GB. Budget an extra ~30-50% of VRAM on top of the file size for KV cache and framework overhead at typical context lengths. This places the model in the consumer deployment class: Q4_K_M and lower quantizations fit on an 8 GB card, and a single GPU with 12-24 GB VRAM runs them comfortably, with headroom for longer contexts or a higher-precision quantization.
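The sizing rule above (file size plus roughly 30-50% overhead) can be sketched as a small calculator. The file sizes are the approximate figures quoted in this section. The function names and the choice to test against the pessimistic +50% bound are illustrative assumptions, not a vendor-published formula.

```python
# Approximate file sizes in GB, as quoted in this section.
FILE_SIZE_GB = {
    "FP16": 16.0,
    "Q8_0": 9.0,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.5,
    "Q3_K_M": 3.9,
    "Q2_K": 2.6,
}


def vram_range_gb(file_size_gb, low=0.30, high=0.50):
    """Return (optimistic, pessimistic) VRAM estimates: file size plus 30-50% overhead."""
    return file_size_gb * (1 + low), file_size_gb * (1 + high)


def fits(file_size_gb, vram_gb):
    """True if the pessimistic (+50%) estimate fits in the available VRAM."""
    return vram_range_gb(file_size_gb)[1] <= vram_gb


def largest_fitting(vram_gb):
    """Name of the largest listed quantization that fits, or None if none does."""
    candidates = [name for name, size in FILE_SIZE_GB.items() if fits(size, vram_gb)]
    return max(candidates, key=FILE_SIZE_GB.get) if candidates else None


for card_gb in (8, 12, 16):
    print(f"{card_gb} GB card -> {largest_fitting(card_gb)}")
```

Actual usage depends on context length, batch size, and the inference framework, so verify on real hardware before relying on these estimates.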
Should you run this locally?
Yes if you need a permissively licensed model optimized for Chinese-language tasks and have a consumer GPU with at least 8 GB VRAM (for Q4_K_M or lower). The 32K context window is beneficial for Chinese document processing or long-form dialogue.
No if your primary language is English or you require verified benchmark performance. Without community benchmarks, the model's capabilities are uncertain. Consider a more widely tested model if reproducibility is critical.
Catalog cross-links
- InternLM 2 7B
- Qwen 2.5 7B
- Consumer GPU guide
Overview
Strengths
- Chinese-language strength
- Active research lineage
Weaknesses
- Commercial use is subject to the InternLM License terms; review them before deployment
Quantization variants
Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is a common starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| Q4_K_M | 4.7 GB | 6 GB |
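To see what a quantized file size implies, it can be converted into average bits per weight. The sketch below assumes a nominal 8e9 parameters (implied by "8B", though the exact count may differ) and decimal gigabytes (1 GB = 1e9 bytes). Both are assumptions, and the function names are illustrative.

```python
# Nominal parameter count implied by "8B" (assumption: the exact count may differ).
PARAMS = 8e9


def bits_per_weight(file_size_gb: float, params: float = PARAMS) -> float:
    """Average bits stored per parameter for a file of the given size (1 GB = 1e9 bytes)."""
    return file_size_gb * 1e9 * 8 / params


def file_size_gb(bits: float, params: float = PARAMS) -> float:
    """Approximate file size in GB for a given average bit width per parameter."""
    return params * bits / 8 / 1e9


print(f"Q4_K_M at 4.7 GB -> {bits_per_weight(4.7):.2f} bits per weight")
print(f"16-bit weights   -> {file_size_gb(16):.1f} GB")
```

Under these assumptions the 4.7 GB row works out to about 4.7 bits per weight, against 16 for FP16, which is the source of the size and VRAM savings.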
Get the model
Hugging Face: original weights from the source repository. Only unquantized weights are listed, so you will need to quantize them yourself.
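As a starting point for fetching the original weights, the sketch below uses `snapshot_download` from the `huggingface_hub` package. The repository ID is taken from the Source line on this page. The local directory name and the helper function are hypothetical choices, and the download is roughly the size of the FP16 weights, so check disk space first.

```python
# Repository ID as given in the Source line of this page.
REPO_ID = "internlm/internlm3-8b-instruct"


def download_original_weights(local_dir: str = "internlm3-8b-instruct") -> str:
    """Download the full repository snapshot and return the local path.

    Requires `pip install huggingface_hub` and network access.
    The default directory name is an arbitrary choice for illustration.
    """
    # Imported here so this module loads even if the package is not installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir)


# Usage (not executed here because it starts a multi-gigabyte download):
# path = download_original_weights()
# print("Weights saved to", path)
```

Quantizing the downloaded weights is a separate step that depends on the inference framework you choose.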
Source: huggingface.co/internlm/internlm3-8b-instruct
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.