
Dolphin 3.0 Mistral 24B

Eric Hartford's Dolphin fine-tune of Mistral Small 3 — uncensored, function-calling, agent-friendly.

License: Apache 2.0 · Released Jan 30, 2025 · Context: 32,768 tokens
Our verdict
By Fredoline Eruo · Last verified May 6, 2026
7.5/10
Positioning

The uncensored counterpart to Mistral Small 3 24B. Same hardware footprint, same Apache base license, refusal layer minimized. Right pick for research, red-team, or technical-writing workflows where Mistral's base alignment gets in the way.

Strengths
  • Apache base license carries through — uncensored and license-clean.
  • 24B body is meaningfully more capable than Hermes 3's 8B base.
  • Strong technical/research output — handles dual-use prompts that base models refuse.
Limitations
  • Niche use case — most users want base Mistral Small 3 24B.
  • Creative writing is slightly weaker than base; the alignment training Dolphin strips out is also what adds stylistic polish.
  • Refusals minimized but not zero — output still requires user judgment.
Real-world performance on RTX 4090
  • Q4_K_M (14.6 GB): 75–92 tok/s decode — full GPU
  • Q5_K_M (17.3 GB): 62–78 tok/s
  • Q8_0 (26 GB): partial offload only; the file exceeds the 4090's 24 GB of VRAM
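
These decode rates are easy to sanity-check on your own card: Ollama's REST API returns timing stats on non-streaming requests. A minimal sketch, assuming the dolphin-mistral:24b tag from the install section below, a local Ollama on its default port, and jq for the arithmetic (eval_duration comes back in nanoseconds):

# one-shot generation; stream=false so the response carries eval stats
curl -s http://localhost:11434/api/generate -d '{
  "model": "dolphin-mistral:24b",
  "prompt": "Explain KV caching in two sentences.",
  "stream": false
}' | jq '{tok_per_s: (.eval_count / (.eval_duration / 1e9))}'
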
Should you run this locally?

Yes, for technical research and security workflows where base Mistral refusals block legitimate work. No, for general chat — base Mistral Small 3 24B is the right default.

How it compares
  • vs Mistral Small 3 24B (base) → Dolphin minus alignment layer. Pick base for general use, Dolphin for research/technical.
  • vs Hermes 3 Llama 3.1 8B → Dolphin is bigger and Apache-licensed; Hermes is smaller and on Llama base.
  • vs Hermes 3 Llama 3.1 70B → 70B Hermes is smarter; Dolphin is more accessible (full GPU on 24 GB) and Apache-licensed.
Run this yourself
ollama pull dolphin-mistral:24b
ollama run dolphin-mistral:24b
Settings: Q4_K_M GGUF, 16384 ctx, full GPU on RTX 4090
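
To pin the 16K context instead of passing it per request, one option is a small Ollama Modelfile; the dolphin-mistral-16k name below is just an illustration:

# bake num_ctx into a derived model tag
cat > Modelfile <<'EOF'
FROM dolphin-mistral:24b
PARAMETER num_ctx 16384
EOF
ollama create dolphin-mistral-16k -f Modelfile
ollama run dolphin-mistral-16k
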
Why this rating

7.5/10 — Eric Hartford's uncensored, alignment-stripped fine-tune of Mistral Small 3 24B. Same caveats as Hermes 3, but on a stronger 24B base. Loses points to the base Mistral Small 3 24B for general use.

Overview

Eric Hartford's Dolphin fine-tune of Mistral Small 3 — uncensored, function-calling, agent-friendly.

Strengths

  • Uncensored
  • Function calling
  • Apache 2.0
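
Function calling is the headline feature, so a rough sketch of what a tool-call request looks like through Ollama's /api/chat endpoint may help; the get_weather tool here is hypothetical, and the model card is the authority on the exact chat template the fine-tune expects:

# the tool schema is a made-up example; inspect .message.tool_calls in the reply
curl -s http://localhost:11434/api/chat -d '{
  "model": "dolphin-mistral:24b",
  "messages": [{"role": "user", "content": "What is the weather in Lagos right now?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Look up current weather for a city",
      "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"]
      }
    }
  }],
  "stream": false
}' | jq '.message.tool_calls'
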

Weaknesses

  • Uncensored, so apply your own guardrails in production

Quantization variants

Each quantization trades model quality for a smaller file size and VRAM footprint. Q4_K_M is the most popular starting point.

Quantization | File size | VRAM required
Q4_K_M       | 14.0 GB   | 18 GB
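
The 18 GB requirement is roughly weights plus KV cache plus runtime overhead. A back-of-envelope sketch, assuming Mistral Small 3's architecture numbers (40 layers, 8 KV heads, head dim 128, fp16 cache; verify against the repo's config.json):

# KV bytes per token = 2 (K and V) x 40 layers x 8 KV heads x 128 head_dim x 2 bytes (fp16) = 163,840
echo "14.0 + (16384 * 163840) / (1024^3)" | bc -l   # ~16.5 GB at 16K ctx; overhead pushes it toward 18 GB
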

Get the model

Ollama

One-line install

ollama run dolphin-mistral:24b

HuggingFace

Original weights

huggingface.co/cognitivecomputations/Dolphin3.0-Mistral-24B

Source repository with the original safetensors weights; direct quantization is required before local inference (see the sketch below).
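
If you take the HuggingFace route, the usual path is llama.cpp's conversion tooling. A sketch under the assumption of a recent llama.cpp checkout (script and binary names match current releases; output filenames are illustrative):

# fetch the safetensors weights (roughly 48 GB at bf16 for a 24B model)
huggingface-cli download cognitivecomputations/Dolphin3.0-Mistral-24B --local-dir Dolphin3.0-Mistral-24B
# convert to an f16 GGUF, then quantize down to Q4_K_M
python convert_hf_to_gguf.py Dolphin3.0-Mistral-24B --outtype f16 --outfile dolphin3-24b-f16.gguf
./llama-quantize dolphin3-24b-f16.gguf dolphin3-24b-Q4_K_M.gguf Q4_K_M
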

Hardware that runs this

Cards with enough VRAM for at least one quantization of Dolphin 3.0 Mistral 24B.

Compare alternatives

Models in the same parameter band, plus one tier above and one below, so you can decide what actually fits your hardware.

Frequently asked

What's the minimum VRAM to run Dolphin 3.0 Mistral 24B?

18 GB of VRAM is enough to run Dolphin 3.0 Mistral 24B at the Q4_K_M quantization (file size 14.0 GB). Higher-quality quantizations need more.

Can I use Dolphin 3.0 Mistral 24B commercially?

Yes — Dolphin 3.0 Mistral 24B ships under the Apache 2.0 license, which permits commercial use. Always read the license text before deployment.

What's the context length of Dolphin 3.0 Mistral 24B?

Dolphin 3.0 Mistral 24B supports a context window of 32,768 tokens (32K).

How do I install Dolphin 3.0 Mistral 24B with Ollama?

Run `ollama pull dolphin-mistral:24b` to download, then `ollama run dolphin-mistral:24b` to start a chat session. The default quantization is Q4_K_M.
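
For a quick smoke test after the pull, ollama run also accepts a one-shot prompt and exits when the completion finishes:

ollama run dolphin-mistral:24b "Summarize the Apache 2.0 license in one sentence."
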

Source: huggingface.co/cognitivecomputations/Dolphin3.0-Mistral-24B

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.