
Yi 1.5 34B

01.AI's 34B model. Solid bilingual EN/ZH performance, Apache 2.0.

License: Apache 2.0 · Released May 12, 2024 · Context: 16,384 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
7.4/10
Positioning

A capable 34B-class generalist with good multilingual support, especially Chinese. The right pick for users who specifically want 01.AI's training distribution, or who need a known-quantity baseline.

Strengths
  • 34B fits 24 GB at Q4_K_M — full GPU on a 4090.
  • Apache 2.0 license — clean commercial terms.
  • Strong Chinese-English — better than Qwen 2.5 32B on Chinese-specific tasks.
Limitations
  • Beaten by Qwen 3 32B on most general benchmarks.
  • Long-context recall is weaker in practice than the 16K spec suggests.
  • Knowledge cutoff dated — 2024-era data.
Real-world performance on RTX 4090
  • Q4_K_M (20.7 GB): 65–80 tok/s decode — full GPU
  • Q5_K_M (24.4 GB): partial offload, 22–30 tok/s
  • Q8_0 (37 GB): exceeds any single consumer card; workstation territory
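To put the decode rates in wall-clock terms, a quick back-of-envelope sketch (the reply length and rate are illustrative assumptions, not benchmark numbers):

```shell
# Time to stream a reply at the low end of the Q4_K_M decode range.
tokens=500   # assumed reply length
rate=65      # tok/s, low end of the 65-80 range above
secs=$(( (tokens + rate - 1) / rate ))   # ceiling division
echo "~${secs}s for a ${tokens}-token reply"
```

At the top of the range (80 tok/s) the same reply streams in about 6 seconds, comfortably interactive for chat use.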
Should you run this locally?

Yes, for Chinese-English-specific work, or as an Apache-licensed alternative to Qwen. No, for general English work — Qwen 3 32B is stronger at the same VRAM.

How it compares
  • vs Qwen 3 32B → Qwen wins on general capability; Yi has cleaner license.
  • vs Mistral Small 3 24B → Mistral wins on instruction polish; Yi has slight edge on Chinese.
  • vs Llama 3.3 70B → Llama 3.3 70B is much smarter; Yi 34B is the full-GPU pick.
Run this yourself
ollama pull yi:34b-chat-v1.5-q4_K_M
ollama run yi:34b-chat-v1.5-q4_K_M
Settings: Q4_K_M GGUF, 8192 ctx, full GPU on RTX 4090
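Ollama's default context window is shorter than the 8192 tokens used in these settings; one way to pin it is a Modelfile (the alias `yi-34b-8k` below is our own name, not an official tag):

```
FROM yi:34b-chat-v1.5-q4_K_M
PARAMETER num_ctx 8192
```

Save this as `Modelfile`, then run `ollama create yi-34b-8k -f Modelfile` followed by `ollama run yi-34b-8k`.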
Why this rating

7.4/10 — 01.AI's 34B that fits 24 GB at Q4. Solid, but mostly eclipsed by Qwen 3 32B and Mistral Small 3 24B. Loses points by sitting in an awkward middle without a clear differentiator.

Overview

01.AI's 34B model. Solid bilingual EN/ZH performance, Apache 2.0.

Strengths

  • Apache 2.0
  • Bilingual

Weaknesses

  • Outpaced by Qwen 3 32B on general benchmarks

Quantization variants

Each quantization trades model quality for smaller file size and lower VRAM use. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          20.7 GB      24 GB
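As a sanity check on the table, GGUF file size is roughly parameters × bits-per-weight. The bits-per-weight figure below is an approximation (Q4_K_M mixes quantization types across layers), not a spec value:

```shell
# Rough GGUF size estimate: params * bits-per-weight / 8 bytes.
params=34400000000   # Yi 1.5 34B has ~34.4e9 weights
bpw=485              # Q4_K_M averages ~4.85 bits/weight (x100 for integer math)
bytes=$(( params / 8 * bpw / 100 ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "~${gib} GiB on disk"
```

This lands around 20.9 GB (about 19.4 GiB), close to the published file size, which is why a 24 GB card fits the whole model with room left for KV cache.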

Get the model

  • Ollama: one-line install (see the commands above).
  • HuggingFace: original weights at huggingface.co/01-ai/Yi-1.5-34B-Chat (source repository; you will need to quantize the weights yourself for local use).

Hardware that runs this

Cards with enough VRAM for at least one quantization of Yi 1.5 34B.

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Yi 1.5 34B?

24 GB of VRAM is enough to run Yi 1.5 34B at the Q4_K_M quantization (file size 20.7 GB). Higher-quality quantizations need more.

Can I use Yi 1.5 34B commercially?

Yes — Yi 1.5 34B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Yi 1.5 34B?

Yi 1.5 34B supports a context window of 16,384 tokens (about 16K).

How do I install Yi 1.5 34B with Ollama?

Run `ollama pull yi:34b-chat-v1.5-q4_K_M` to download, then `ollama run yi:34b-chat-v1.5-q4_K_M` to start a chat session. This pulls the Q4_K_M quantization used throughout this review.

Source: huggingface.co/01-ai/Yi-1.5-34B-Chat

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.