
Yi 1.5 34B

01.AI's 34B model. Solid bilingual EN/ZH performance, Apache 2.0.

License: Apache 2.0 · Released May 12, 2024 · Context: 16,384 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
7.4/10
Positioning

A capable 34B-class generalist with good multilingual support, especially Chinese. The right pick for users who specifically want 01.AI's training distribution, or who need a known-quantity baseline.

Strengths
  • 34B fits 24 GB at Q4_K_M — full GPU on a 4090.
  • Apache 2.0 license — clean commercial terms.
  • Strong Chinese-English — better than Qwen 2.5 32B on Chinese-specific tasks.
Limitations
  • Beaten by Qwen 3 32B on most general benchmarks.
  • Long-context recall is weaker in practice than the 16K spec suggests.
  • Knowledge cutoff dated — 2024-era data.
Real-world performance on RTX 4090
  • Q4_K_M (20.7 GB): 65–80 tok/s decode — full GPU
  • Q5_K_M (24.4 GB): partial offload, 22–30 tok/s
  • Q8_0 (37 GB): exceeds any single consumer card; workstation territory
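To put the decode rates in wall-clock terms, a quick back-of-envelope sketch (the reply length and rate are illustrative assumptions, not benchmark numbers):

```shell
# Time to stream a reply at the low end of the Q4_K_M decode range.
tokens=500   # assumed reply length
rate=65      # tok/s, low end of the 65-80 range above
secs=$(( (tokens + rate - 1) / rate ))   # ceiling division
echo "~${secs}s for a ${tokens}-token reply"
```

At the top of the range (80 tok/s) the same reply streams in about 6 seconds, comfortably interactive for chat use.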
Should you run this locally?

Yes, for Chinese-English-specific work, or as an Apache-licensed alternative to Qwen. No, for general English work — Qwen 3 32B is stronger at the same VRAM.

How it compares
  • vs Qwen 3 32B → Qwen wins on general capability; Yi has cleaner license.
  • vs Mistral Small 3 24B → Mistral wins on instruction polish; Yi has slight edge on Chinese.
  • vs Llama 3.3 70B → Llama 3.3 70B is much smarter; Yi 34B is the full-GPU pick.
Run this yourself
ollama pull yi:34b-chat-v1.5-q4_K_M
ollama run yi:34b-chat-v1.5-q4_K_M
Settings: Q4_K_M GGUF, 8192 ctx, full GPU on RTX 4090
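Ollama's default context window is shorter than the 8192 tokens used in these settings; one way to pin it is a Modelfile (the alias `yi-34b-8k` below is our own name, not an official tag):

```
FROM yi:34b-chat-v1.5-q4_K_M
PARAMETER num_ctx 8192
```

Save this as `Modelfile`, then run `ollama create yi-34b-8k -f Modelfile` followed by `ollama run yi-34b-8k`.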
Why this rating

7.4/10 — 01.AI's 34B that fits 24 GB at Q4. Solid, but mostly eclipsed by Qwen 3 32B and Mistral Small 3 24B. Loses points by sitting in an awkward middle without a clear differentiator.

Overview

01.AI's 34B model. Solid bilingual EN/ZH performance, Apache 2.0.

Strengths

  • Apache 2.0
  • Bilingual

Weaknesses

  • Outpaced by Qwen 3 32B on general benchmarks

Quantization variants

Each quantization trades model quality for smaller file size and lower VRAM use. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
Q4_K_M          20.7 GB      24 GB
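As a sanity check on the table, GGUF file size is roughly parameters × bits-per-weight. The bits-per-weight figure below is an approximation (Q4_K_M mixes quantization types across layers), not a spec value:

```shell
# Rough GGUF size estimate: params * bits-per-weight / 8 bytes.
params=34400000000   # Yi 1.5 34B has ~34.4e9 weights
bpw=485              # Q4_K_M averages ~4.85 bits/weight (x100 for integer math)
bytes=$(( params / 8 * bpw / 100 ))
gib=$(( bytes / 1024 / 1024 / 1024 ))
echo "~${gib} GiB on disk"
```

This lands around 20.9 GB (about 19.4 GiB), close to the published file size, which is why a 24 GB card fits the whole model with room left for KV cache.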

Get the model

  • Ollama: one-line install (see the commands above).
  • HuggingFace: original weights at huggingface.co/01-ai/Yi-1.5-34B-Chat (source repository; you will need to quantize the weights yourself for local use).

Hardware that runs this

Cards with enough VRAM for at least one quantization of Yi 1.5 34B.

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Yi 1.5 34B?

24 GB of VRAM is enough to run Yi 1.5 34B at the Q4_K_M quantization (file size 20.7 GB). Higher-quality quantizations need more.

Can I use Yi 1.5 34B commercially?

Yes — Yi 1.5 34B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Yi 1.5 34B?

Yi 1.5 34B supports a context window of 16,384 tokens (about 16K).

How do I install Yi 1.5 34B with Ollama?

Run `ollama pull yi:34b-chat-v1.5-q4_K_M` to download, then `ollama run yi:34b-chat-v1.5-q4_K_M` to start a chat session. This pulls the Q4_K_M quantization used throughout this review.

Source: huggingface.co/01-ai/Yi-1.5-34B-Chat

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.