other
1.55B parameters
Commercial OK
Multimodal
Reviewed June 2026

Whisper Large v3

OpenAI's flagship open speech-to-text model. 99 languages, MIT license. The de-facto open ASR baseline.

License: MIT · Released Nov 6, 2023 · Context: n/a (fixed ~30-second audio segments)

Our verdict

OP · Fredoline Eruo | Verified Jun 12, 2026
unrated

Positioning

Whisper Large v3 is OpenAI's flagship open speech-to-text model, released under the permissive MIT license. With 1.55 billion dense parameters and support for 99 languages, it has become the de-facto open ASR baseline for researchers and developers. Its architecture is a standard encoder-decoder transformer, making it straightforward to deploy and fine-tune.

Strengths

  • Permissive MIT license – Allows unrestricted commercial use, modification, and redistribution without licensing fees.
  • Broad language support – Covers 99 languages, making it suitable for multilingual transcription tasks out of the box.
  • Small model footprint – At 1.55B parameters, even FP16 is only ~3 GB on disk, and quantized versions (e.g., Q4_K_M at ~0.9 GB) fit easily on consumer hardware.
  • Established ecosystem – As the de-facto open ASR baseline, it benefits from extensive community tooling, fine-tuning recipes, and integration with popular frameworks.

Limitations

  • Fixed 30-second input window – Whisper processes audio in fixed-length segments of about 30 seconds rather than exposing an LLM-style context window, so streaming and long-form transcription need additional segmentation logic.
  • Encoder-decoder overhead – Unlike pure decoder models, the encoder-decoder architecture requires more memory and compute for inference, especially at longer audio lengths.
  • No real-time streaming support – Designed for batch transcription; real-time applications require custom chunking and overlap handling.
  • Benchmark data not provided – We do not have independent benchmark scores for this model. Operators should treat published vendor metrics as best-case and validate on their own data.
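
The chunking and overlap handling mentioned in the limitations can be sketched as follows. This is a minimal illustration, not Whisper's own long-form algorithm: the helper name `chunk_spans` and the 5-second overlap are assumptions chosen for the example, and only the 30-second window comes from the text above.

```python
def chunk_spans(total_s: float, window_s: float = 30.0,
                overlap_s: float = 5.0) -> list[tuple[float, float]]:
    """Split an audio duration into overlapping (start, end) spans in seconds.

    window_s matches the ~30-second segment length described above;
    overlap_s is an assumed value, tune it for your own audio.
    """
    if window_s <= 0 or not 0 <= overlap_s < window_s:
        raise ValueError("need window_s > 0 and 0 <= overlap_s < window_s")

    step = window_s - overlap_s  # how far each new window advances
    spans = []
    start = 0.0
    while start < total_s:
        end = min(start + window_s, total_s)
        spans.append((start, end))
        if end >= total_s:  # the last window reached the end of the audio
            break
        start += step
    return spans


if __name__ == "__main__":
    for span in chunk_spans(70.0):
        print(span)
```

Each span would be transcribed separately, and the text from the overlapping regions then has to be de-duplicated when the pieces are stitched back together. That merge step is application-specific and is not shown here.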

What it takes to run this locally

Whisper Large v3 is firmly in the consumer deployment class. At FP16 (about 3.1 GB on disk), it can run on a modern GPU with 4+ GB of VRAM. Quantized versions reduce the footprint further: Q8_0 (2 GB), Q4_K_M (0.9 GB), or Q2_K (0.5 GB). Add ~30-50% on top of the file size for framework overhead and audio processing buffers. A single consumer GPU (e.g., RTX 3060 12 GB or RTX 4090) is more than sufficient.
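
The sizing arithmetic above can be written out as a short sketch. The function names are hypothetical, and the only inputs are figures already quoted on this page (1.55B parameters, 16 bits per weight at FP16, 30-50% overhead). It is a rough planning estimate, not a measurement.

```python
def weights_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


def vram_range_gb(file_gb: float, low: float = 0.30,
                  high: float = 0.50) -> tuple[float, float]:
    """Apply the 30-50% overhead rule of thumb to a model file size."""
    return file_gb * (1 + low), file_gb * (1 + high)


if __name__ == "__main__":
    fp16 = weights_gb(1.55e9, 16)  # 1.55B parameters at 2 bytes each
    lo, hi = vram_range_gb(fp16)
    print(f"FP16 weights: {fp16:.2f} GB, estimated VRAM: {lo:.2f}-{hi:.2f} GB")
```

With the FP16 figure this works out to roughly 4.0 to 4.7 GB, so a 4 GB card may be tight at the upper end of the overhead range. For the quantized variants, pass the listed file size straight to `vram_range_gb` rather than deriving it from a bit width.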

Should you run this locally?

Yes if: You need a permissively licensed, multilingual speech-to-text model that runs on consumer hardware and serves as a reliable baseline for transcription tasks.

No if: You require real-time streaming, very low latency, or native long-form transcription without custom segmentation logic. In those cases, consider specialized streaming ASR models or endpoint solutions.

Catalog cross-links

  • Whisper.cpp – Optimized C++ inference for Whisper models
  • OpenAI Whisper – Other Whisper variants in the catalog

Family & lineage

How this model relates to others in its lineage. Family members share architecture and training-data roots; parent / children edges record direct distillation or fine-tune relationships.

Family siblings (whisper)
  • Whisper Large v3 Turbo (0.81B) – Edge
  • Whisper Large v3 (1.55B) – You are here
Distilled / fine-tuned from this

Strengths

  • MIT license
  • 99-language coverage
  • Industry baseline

Weaknesses

  • No streaming-native variant

Quantization variants

Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.

Quantization    File size    VRAM required
FP16            3.1 GB       4 GB

Get the model

HuggingFace

Original weights

huggingface.co/openai/whisper-large-v3

Source repository with the original weights. To use a quantized variant, you need to quantize these weights yourself.
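
As a starting point, the original weights can be loaded through the Hugging Face `transformers` speech-recognition pipeline. The sketch below assumes `transformers` and `torch` are installed and that `ffmpeg` is available for decoding audio files. The function name `transcribe` and the device and dtype choices are illustrative, not something this page prescribes.

```python
MODEL_ID = "openai/whisper-large-v3"  # repository named in the section above


def transcribe(audio_path: str, use_gpu: bool = True) -> str:
    """Transcribe one audio file and return the recognized text."""
    # Imported lazily so this file can be loaded without the heavy dependencies.
    import torch
    from transformers import pipeline

    on_gpu = use_gpu and torch.cuda.is_available()
    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        torch_dtype=torch.float16 if on_gpu else torch.float32,
        device="cuda:0" if on_gpu else "cpu",
    )
    # chunk_length_s asks the pipeline to split long audio into 30-second pieces.
    result = asr(audio_path, chunk_length_s=30)
    return result["text"]


if __name__ == "__main__":
    # "sample.wav" is a placeholder path, replace it with a real recording.
    print(transcribe("sample.wav"))
```

The first call downloads the weights from the repository, so expect a download of roughly the FP16 file size listed above.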

Hardware that runs this

Cards with enough VRAM for at least one quantization of Whisper Large v3.

Compare alternatives

Models worth comparing

Same parameter band, plus what's one tier above and below — so you can decide what actually fits your hardware.

Step down
Smaller — faster, runs on weaker hardware
No verdicted models in the next tier down yet.

Frequently asked

What's the minimum VRAM to run Whisper Large v3?

4 GB of VRAM is enough to run Whisper Large v3 at FP16 (file size 3.1 GB). Lower-bit quantizations need less.

Can I use Whisper Large v3 commercially?

Yes. Whisper Large v3 ships under the MIT license, which permits commercial use. Always read the license text before deployment.

What's the context length of Whisper Large v3?

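A token context length does not really apply here. Whisper Large v3 is a speech-to-text model that processes audio in fixed segments of about 30 seconds, which is why the catalog lists 0 tokens.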

Does Whisper Large v3 support images?

No. Whisper Large v3 is a speech-to-text model: it takes audio as input and produces text. It has no vision component and does not accept images.

Source: huggingface.co/openai/whisper-large-v3

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
