Whisper Large v3
OpenAI's flagship open speech-to-text model. 99 languages, MIT license. The de-facto open ASR baseline.
Positioning
Whisper Large v3 is OpenAI's flagship open speech-to-text model, released under the permissive MIT license. With 1.55 billion dense parameters and support for 99 languages, it has become the de-facto open ASR baseline for researchers and developers. Its architecture is a standard encoder-decoder transformer, making it straightforward to deploy and fine-tune.
Strengths
- Permissive MIT license – Allows unrestricted commercial use, modification, and redistribution without licensing fees.
- Broad language support – Covers 99 languages, making it suitable for multilingual transcription tasks out of the box.
- Small model footprint – At 1.55B parameters, even FP16 is only ~3 GB on disk, and quantized versions (e.g., Q4_K_M at ~0.9 GB) fit easily on consumer hardware.
- Established ecosystem – As the de-facto open ASR baseline, it benefits from extensive community tooling, fine-tuning recipes, and integration with popular frameworks.
Limitations
- No long-form context – Whisper processes audio in fixed 30-second segments, and the decoder emits only a bounded number of tokens per segment (448 in the reference implementation). Nothing carries across a whole recording, so long-form or streaming transcription needs additional segmentation logic.
- Encoder-decoder overhead – Unlike decoder-only models, Whisper runs a full encoder pass and then autoregressive decoding for every segment, so inference cost grows with the length of the recording.
- No real-time streaming support – Designed for batch transcription; real-time applications require custom chunking and overlap handling.
- Benchmark data not provided – We do not have independent benchmark scores for this model. Operators should treat published vendor metrics as best-case and validate on their own data.
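The custom chunking and overlap handling mentioned in the limitations can be sketched roughly as follows. This is a minimal illustration, not Whisper's own segmentation logic: the function name `chunk_bounds` and the 5-second overlap are assumptions made for the example, while the 16 kHz sample rate and 30-second window reflect the model's expected input.

```python
from typing import Iterator, Tuple

SAMPLE_RATE = 16_000     # Whisper expects 16 kHz mono audio
WINDOW_SECONDS = 30.0    # fixed segment length the model processes
OVERLAP_SECONDS = 5.0    # hypothetical overlap; tune on your own audio


def chunk_bounds(
    num_samples: int,
    sample_rate: int = SAMPLE_RATE,
    window_s: float = WINDOW_SECONDS,
    overlap_s: float = OVERLAP_SECONDS,
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices for overlapping windows."""
    window = int(window_s * sample_rate)
    step = window - int(overlap_s * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        yield start, end
        if end == num_samples:
            break  # the last window reached the end of the audio
        start += step


if __name__ == "__main__":
    # A 70-second recording yields windows at 0-30 s, 25-55 s, and 50-70 s.
    for start, end in chunk_bounds(70 * SAMPLE_RATE):
        print(start / SAMPLE_RATE, end / SAMPLE_RATE)
```

Each window would then be transcribed separately, and the text from overlapping regions reconciled afterwards. How to merge the overlaps (timestamps, longest common subsequence, or similar) is left open here because it depends on the application.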
What it takes to run this locally
Whisper Large v3 is firmly in the consumer deployment class. At FP16 (about 3 GB on disk), it can run on a modern GPU with 4 GB or more of VRAM, although a 4 GB card leaves little headroom. Quantized versions reduce the footprint further: Q8_0 (2 GB), Q4_K_M (0.9 GB), or Q2_K (0.5 GB). Add ~30-50% to the file size for framework overhead and audio processing buffers. A single consumer GPU (e.g., RTX 3060 12 GB or RTX 4090) is more than sufficient.
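The sizing rule above can be worked through as simple arithmetic. The sketch below is only a rule-of-thumb calculator using the file sizes and the 30-50% overhead range quoted in this section; the helper names are made up for illustration, and real memory use depends on the runtime, batch size, and decoding settings.

```python
# Parameter count and on-disk sizes are the figures quoted in the text above.
PARAMS = 1.55e9
DISK_GB = {"FP16": 3.0, "Q8_0": 2.0, "Q4_K_M": 0.9, "Q2_K": 0.5}


def fp16_weights_gb(params: float = PARAMS) -> float:
    """FP16 stores 2 bytes per parameter; result is in decimal gigabytes."""
    return params * 2 / 1e9


def vram_range_gb(disk_gb: float, low: float = 0.30, high: float = 0.50):
    """Apply the 30-50% overhead rule of thumb to an on-disk size."""
    return disk_gb * (1 + low), disk_gb * (1 + high)


def fits(disk_gb: float, vram_gb: float, overhead: float = 0.50) -> bool:
    """Conservative check that assumes the upper end of the overhead range."""
    return disk_gb * (1 + overhead) <= vram_gb


if __name__ == "__main__":
    print(f"FP16 weights: {fp16_weights_gb():.1f} GB")
    for name, size in DISK_GB.items():
        lo, hi = vram_range_gb(size)
        print(f"{name}: {lo:.1f}-{hi:.1f} GB estimated VRAM")
```

Under these assumptions FP16 lands at roughly 3.9-4.5 GB, so a 4 GB card only fits at the optimistic end of the overhead range, while every quantized variant listed fits with room to spare.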
Should you run this locally?
Yes if: You need a permissively licensed, multilingual speech-to-text model that runs on consumer hardware and serves as a reliable baseline for transcription tasks.
No if: You require real-time streaming, very low latency, or native long-form transcription without custom segmentation logic. In those cases, consider specialized streaming ASR models or endpoint solutions.
Catalog cross-links
- Whisper.cpp – Optimized C++ inference for Whisper models
- OpenAI Whisper – Other Whisper variants in the catalog
Strengths
- MIT license
- 99-language coverage
- Industry baseline
Weaknesses
- No streaming-native variant
Quantization variants
Each quantization trades model quality for file size and VRAM. Q4_K_M is the most popular starting point.
| Quantization | File size | VRAM required |
|---|---|---|
| FP16 | 3.1 GB | 4 GB |
Get the model
HuggingFace
Original weights
Source repository – no pre-quantized files are provided, so you must quantize the weights yourself.
Frequently asked
What's the minimum VRAM to run Whisper Large v3?
Can I use Whisper Large v3 commercially?
What's the context length of Whisper Large v3?
Does Whisper Large v3 support images?
Source: huggingface.co/openai/whisper-large-v3
Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.