other

0.336B parameters

Restricted

Reviewed May 2026

F5-TTS

Flow-matching, non-autoregressive TTS built on a Diffusion Transformer (DiT) backbone with ConvNeXt text refinement. Trained on the 100K-hour Emilia dataset; supports zero-shot voice cloning with strong naturalness and a low real-time factor (RTF ~0.15 on a single GPU).

License: CC-BY-NC-4.0 · Context: not applicable (speech model; listed as 0 tokens)

Our verdict

Fredoline Eruo · Verified May 29, 2026

unrated

Architecturally interesting and high-quality, but the non-commercial license rules it out for most products. Track for the inevitable commercial-friendly successor.

Strengths

Flow-matching architecture — fast and stable, no autoregressive drift
Zero-shot voice cloning competitive with XTTS-v2
Strong English + Mandarin out of the box; community fine-tunes for more languages
RTF ~0.15 on a consumer GPU; faster than diffusion-based competitors
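
To make the RTF figure concrete: real-time factor is synthesis time divided by the duration of the audio produced, so values below 1.0 mean faster than real time. The sketch below shows one way to measure it. The `synthesize` callable and its return type (a sequence of audio samples) are hypothetical stand-ins for illustration, not the F5-TTS API.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(synthesize: Callable[[str], Sequence[float]],
                text: str, sample_rate: int) -> float:
    """Time one call to a hypothetical synthesize(text) that returns audio samples."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)


# Example: 1.5 s of compute for 10 s of speech corresponds to the quoted RTF.
example_rtf = real_time_factor(1.5, 10.0)
print(example_rtf)
```

At an RTF of about 0.15, a 10-second clip takes roughly 1.5 seconds to generate; measure on your own GPU, since the figure depends on hardware and sampling settings.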

Weaknesses

CC-BY-NC-4.0: non-commercial use only; commercial use requires separate licensing
Mandarin and English only in the base checkpoint; other languages need fine-tunes
Diffusion-style sampling means GPU is effectively mandatory
Less mature tooling ecosystem than Coqui/Piper
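
Why sampling cost matters: a flow-matching model generates audio by integrating a learned velocity field from noise (t = 0) to data (t = 1), and each integration step costs one forward pass of the network. The following is a minimal sketch of fixed-step Euler integration, not the F5-TTS sampler itself. The toy velocity function, the target vector, and the step count of 16 are all assumptions chosen so the example runs without a trained model.

```python
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 x0: Vector, steps: int) -> Vector:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = velocity(x, t)  # in a real model: one network forward pass per step
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy stand-in for the trained network: the velocity of a straight-line
# path from the current point to a known target (hypothetical values).
TARGET = [0.5, -1.0, 2.0]


def toy_velocity(x: Vector, t: float) -> Vector:
    return [(target_i - x_i) / (1.0 - t) for target_i, x_i in zip(TARGET, x)]


noise = [0.0, 0.0, 0.0]
result = euler_sample(toy_velocity, noise, steps=16)
print(result)
```

Total compute scales with the number of steps times the cost of one forward pass, which is why a GPU is the practical choice even though no token-by-token decoding is involved.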

Quantization variants

Each quantization trades some model quality for a smaller file and lower VRAM use. Q4_K_M is the most popular starting point.

Quantization	File size	VRAM required
Q4_K_M	0.2 GB	1 GB
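
As a sanity check on the table, file size can be estimated from the parameter count and the average bits stored per weight. The sketch below assumes roughly 4.85 bits per weight for Q4_K_M, an approximate figure that varies by model, and ignores metadata overhead; the VRAM column is larger than the file size because activations and runtime buffers need memory too.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Estimate weight storage in GB (1 GB = 1e9 bytes) from size and precision."""
    if params_billion <= 0 or bits_per_weight <= 0:
        raise ValueError("inputs must be positive")
    # billions of params * bits per weight / 8 bits per byte = billions of bytes
    return params_billion * bits_per_weight / 8.0


PARAMS_BILLION = 0.336        # from the listing above
Q4_K_M_BITS = 4.85            # assumption: approximate average for Q4_K_M
FP16_BITS = 16.0

q4_size = quantized_size_gb(PARAMS_BILLION, Q4_K_M_BITS)
fp16_size = quantized_size_gb(PARAMS_BILLION, FP16_BITS)
print(round(q4_size, 2), round(fp16_size, 2))
```

Under these assumptions the Q4_K_M estimate rounds to the 0.2 GB shown in the table, and unquantized 16-bit weights would come to well under 1 GB.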

Get the model

HuggingFace

Original weights

huggingface.co/SWivid/F5-TTS

This is the source repository for the original weights; you must quantize them yourself.
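
One way to fetch the original weights is with the `huggingface_hub` Python package, sketched below. It assumes the package is installed (`pip install huggingface_hub`) and that the repository is publicly downloadable; the `local_dir` value in the usage comment is a hypothetical path. The import is deferred so that the helper can be defined without the package present.

```python
def repo_id_from_url(url: str) -> str:
    """Turn 'huggingface.co/SWivid/F5-TTS' into the repo id 'SWivid/F5-TTS'."""
    path = url.split("huggingface.co/", 1)[-1]
    return path.strip("/")


def download_weights(url: str, local_dir: str) -> str:
    """Download every file in the repository and return the local folder path."""
    # Imported here so this module loads even if huggingface_hub is missing.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)


# Usage (requires network access), with a hypothetical target directory:
#   download_weights("huggingface.co/SWivid/F5-TTS", "./f5-tts-weights")
print(repo_id_from_url("huggingface.co/SWivid/F5-TTS"))
```

The download includes the full-precision checkpoint files, so expect it to be larger than the quantized size listed above.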

Hardware that runs this

Cards with enough VRAM for at least one quantization of F5-TTS.

NVIDIA GB200 NVL72

13824GB · nvidia

AMD Instinct MI350X

NVIDIA B300 (Blackwell Ultra)

288GB · nvidia

AMD Instinct MI355X

AMD Instinct MI325X

AMD Instinct MI300X

192GB · amd

NVIDIA H100 NVL

188GB · nvidia

Frequently asked

What's the minimum VRAM to run F5-TTS?

1 GB of VRAM is enough to run F5-TTS at the Q4_K_M quantization (file size 0.2 GB). Higher-precision variants need more.

Can I use F5-TTS commercially?

F5-TTS is released under the CC-BY-NC-4.0 license, which does not permit commercial use. Review the license terms, and arrange separate licensing if needed, before using it in a product.

What's the context length of F5-TTS?

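A token context window does not apply to F5-TTS: it is a text-to-speech model, not a language model, which is why the listing records its context as 0 tokens.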

Source: huggingface.co/SWivid/F5-TTS

Reviewed by RunLocalAI Editorial. See our editorial policy for how we research and verify model claims.
