
Dolphin 3.0 Mistral 24B

Eric Hartford's Dolphin fine-tune of Mistral Small 3 — uncensored, function-calling, agent-friendly.

License: Apache 2.0 · Released Jan 30, 2025 · Context: 32,768 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
7.5/10
Positioning

The uncensored counterpart to Mistral Small 3 24B. Same hardware footprint, same Apache base license, refusal layer minimized. Right pick for research, red-team, or technical-writing workflows where Mistral's base alignment gets in the way.

Strengths
  • Apache base license carries through — uncensored and license-clean.
  • 24B body is meaningfully more capable than Hermes 3's 8B base.
  • Strong technical/research output — handles dual-use prompts that base models refuse.
Limitations
  • Niche use case — most users want base Mistral Small 3 24B.
  • Creative writing is slightly weaker than base; the alignment training Dolphin strips out is also what adds stylistic polish.
  • Refusals minimized but not zero — output still requires user judgment.
Real-world performance on RTX 4090
  • Q4_K_M (14.6 GB): 75–92 tok/s decode — full GPU
  • Q5_K_M (17.3 GB): 62–78 tok/s
  • Q8_0 (26 GB): partial offload only; the file exceeds the 4090's 24 GB of VRAM
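
These decode rates are easy to sanity-check on your own card: Ollama's REST API returns timing stats on non-streaming requests. A minimal sketch, assuming the dolphin-mistral:24b tag from the install section below, a local Ollama on its default port, and jq for the arithmetic (eval_duration comes back in nanoseconds):

# one-shot generation; stream=false so the response carries eval stats
curl -s http://localhost:11434/api/generate -d '{
  "model": "dolphin-mistral:24b",
  "prompt": "Explain KV caching in two sentences.",
  "stream": false
}' | jq '{tok_per_s: (.eval_count / (.eval_duration / 1e9))}'
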
Should you run this locally?

Yes, for technical research and security workflows where base Mistral refusals block legitimate work. No, for general chat — base Mistral Small 3 24B is the right default.

How it compares
  • vs Mistral Small 3 24B (base) → Dolphin minus alignment layer. Pick base for general use, Dolphin for research/technical.
  • vs Hermes 3 Llama 3.1 8B → Dolphin is bigger and Apache-licensed; Hermes is smaller and on Llama base.
  • vs Hermes 3 Llama 3.1 70B → 70B Hermes is smarter; Dolphin is more accessible (full GPU on 24 GB) and Apache-licensed.
Run this yourself
ollama pull dolphin-mistral:24b
ollama run dolphin-mistral:24b
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4090
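
To pin the 16K context instead of passing it per request, one option is a small Ollama Modelfile; the dolphin-mistral-16k name below is just an illustration:

# bake num_ctx into a derived model tag
cat > Modelfile <<'EOF'
FROM dolphin-mistral:24b
PARAMETER num_ctx 16384
EOF
ollama create dolphin-mistral-16k -f Modelfile
ollama run dolphin-mistral-16k
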
Why this rating

7.5/10 — Eric Hartford's uncensored, alignment-stripped fine-tune of Mistral Small 3 24B. Same caveats as Hermes 3, but on a stronger 24B base. Loses points to the base Mistral Small 3 24B for general use.

Overview

Eric Hartford's Dolphin fine-tune of Mistral Small 3 — uncensored, function-calling, agent-friendly.

Strengths

  • Uncensored
  • Function calling
  • Apache 2.0
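
Function calling is the headline feature, so a rough sketch of what a tool-call request looks like through Ollama's /api/chat endpoint may help; the get_weather tool here is hypothetical, and the model card is the authority on the exact chat template the fine-tune expects:

# the tool schema is a made-up example; inspect .message.tool_calls in the reply
curl -s http://localhost:11434/api/chat -d '{
  "model": "dolphin-mistral:24b",
  "messages": [{"role": "user", "content": "What is the weather in Lagos right now?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}' | jq '.message.tool_calls'
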

Weaknesses

  • Uncensored, so apply your own guardrails in production

Quantization variants

Each quantization trades model quality for a smaller file size and VRAM footprint. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M       | 14.0 GB   | 18 GB
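
The 18 GB requirement is roughly weights plus KV cache plus runtime overhead. A back-of-envelope sketch, assuming Mistral Small 3's architecture numbers (40 layers, 8 KV heads, head dim 128, fp16 cache; verify against the repo's config.json):

# KV bytes per token = 2 (K and V) x 40 layers x 8 KV heads x 128 head_dim x 2 bytes (fp16) = 163,840
echo "14.0 + (16384 * 163840) / (1024^3)" | bc -l   # ~16.5 GB at 16K ctx; overhead pushes it toward 18 GB
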

Get the model

Ollama

One-line install

ollama run dolphin-mistral:24b

HuggingFace

Original weights

huggingface.co/cognitivecomputations/Dolphin3.0-Mistral-24B

Source repository with the original safetensors weights; direct quantization is required before local inference (see the sketch below).
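
If you take the HuggingFace route, the usual path is llama.cpp's conversion tooling. A sketch under the assumption of a recent llama.cpp checkout (script and binary names match current releases; output filenames are illustrative):

# fetch the safetensors weights (roughly 48 GB at bf16 for a 24B model)
huggingface-cli download cognitivecomputations/Dolphin3.0-Mistral-24B --local-dir Dolphin3.0-Mistral-24B
# convert to an f16 GGUF, then quantize down to Q4_K_M
python convert_hf_to_gguf.py Dolphin3.0-Mistral-24B --outtype f16 --outfile dolphin3-24b-f16.gguf
./llama-quantize dolphin3-24b-f16.gguf dolphin3-24b-Q4_K_M.gguf Q4_K_M
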

Hardware that runs this

Cards with enough VRAM for at least one quantization of Dolphin 3.0 Mistral 24B.

Compare alternatives

Models in the same parameter band, plus one tier above and one below, so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Dolphin 3.0 Mistral 24B?

18 GB of VRAM is enough to run Dolphin 3.0 Mistral 24B at the Q4_K_M quantization (file size 14.0 GB). Higher-quality quantizations need more.

Can I use Dolphin 3.0 Mistral 24B commercially?

Yes — Dolphin 3.0 Mistral 24B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Dolphin 3.0 Mistral 24B?

Dolphin 3.0 Mistral 24B supports a context window of 32,768 tokens (32K).

How do I install Dolphin 3.0 Mistral 24B with Ollama?

Run `ollama pull dolphin-mistral:24b` to download, then `ollama run dolphin-mistral:24b` to start a chat session. The default quantization is Q4_K_M.
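
For a quick smoke test after the pull, ollama run also accepts a one-shot prompt and exits when the completion finishes:

ollama run dolphin-mistral:24b "Summarize the Apache 2.0 license in one sentence."
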

Source: huggingface.co/cognitivecomputations/Dolphin3.0-Mistral-24B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.