Generating music from text prompts or melody references. MusicGen, Stable Audio, Suno-clone open-weight models.
pip install audiocraft (Meta's MusicGen, the leading open-weight music generation model). The first call to MusicGen.get_pretrained downloads the model weights (~1.5 GB for small, ~3.5 GB for medium, ~7 GB for large).
from audiocraft.models import MusicGen
import soundfile as sf
model = MusicGen.get_pretrained("facebook/musicgen-medium")
model.set_generation_params(duration=30) # 30-second track
wav = model.generate(["Upbeat electronic dance music with a driving bassline, 120 BPM, synth melody, energetic drop"])
sf.write("output.wav", wav[0].cpu().numpy().T, model.sample_rate)
The main open-weight alternative is Stable Audio (pip install stable-audio-tools), which is better for ambient/soundscape work and worse for structured music.

MusicGen Small (1.5 GB) runs on CPU at roughly 2-5× slower than real time, so a 30-second track takes 1-2 minutes. Any $300 laptop handles this. MusicGen Medium (3.5 GB) on a used GTX 1060 6 GB ($60) generates 30 seconds in 10-15 seconds, near real-time. For a full music generation rig: GTX 1060 6 GB ($60) + refurbished PC ($150) + 16 GB RAM ($30). Total: ~$240. Music generation is one of the most accessible creative AI tasks; even CPU-only laptops produce usable results. The bottleneck is your musical taste and prompt engineering, not hardware.
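To sanity-check the CPU numbers above, here is a minimal timing sketch assuming the same audiocraft API; the prompt and output filename are placeholders:

import time
import soundfile as sf
from audiocraft.models import MusicGen

# Load the smallest checkpoint and force CPU inference (no GPU required).
model = MusicGen.get_pretrained("facebook/musicgen-small", device="cpu")
model.set_generation_params(duration=30)

start = time.perf_counter()
wav = model.generate(["Lo-fi hip hop beat, 80 BPM, warm Rhodes chords, vinyl crackle"])
elapsed = time.perf_counter() - start

# On a typical laptop CPU, expect roughly 2-5x the clip length.
print(f"Generated 30 s of audio in {elapsed:.0f} s")
sf.write("cpu_test.wav", wav[0].cpu().numpy().T, model.sample_rate)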
Used RTX 3060 12 GB ($200-250, see /hardware/rtx-3060-12gb) is more than sufficient for production music generation. MusicGen Large (7 GB) generates 30 seconds in 5-10 seconds — faster than real-time. Can batch-generate 100 tracks/hour for music library building. For melody-conditioned generation (hum a tune → full production), an RTX 3060 handles it in near-real-time. Total build: ~$700-900. Music generation is VRAM-light and fast — even entry-level GPUs handle the largest open-weight models. Spend your budget on studio monitors and a MIDI keyboard, not GPU.
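For melody conditioning, a sketch assuming audiocraft's musicgen-melody checkpoint and its generate_with_chroma method; the reference-audio path and prompt are placeholders:

import torchaudio
import soundfile as sf
from audiocraft.models import MusicGen

# The melody-conditioned checkpoint (~3.5 GB) fits easily in 12 GB of VRAM.
model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=30)

# Load a hummed or played reference melody (path is a placeholder).
melody, sr = torchaudio.load("hummed_melody.wav")

# Condition on both the text prompt and the melody's chroma features.
wav = model.generate_with_chroma(
    ["Full band arrangement, funk groove, 110 BPM, horns and electric bass"],
    melody[None],  # add a batch dimension: [1, channels, samples]
    sr,
)
sf.write("melody_conditioned.wav", wav[0].cpu().numpy().T, model.sample_rate)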
The mistake: Generating a 30-second track with MusicGen, thinking "this sounds great," generating 10 more with different prompts, then trying to arrange them into a song, only to discover every track has wildly different key, tempo, and mix.

Why it fails: MusicGen generates independent clips. Clip 1 might be in C minor at 120 BPM with heavy reverb. Clip 2 is in E major at 140 BPM with dry production. Nothing matches.

The fix: Treat MusicGen as an idea generator, not a song producer. Export stems. Import into a DAW (Ableton, FL Studio, Reaper). Use MusicGen to generate individual elements (bassline in Cm at 120 BPM, drum loop, synth pad) with consistent prompts specifying key and tempo, as in the sketch below. Arrange, mix, and master in the DAW. AI generates raw material; you produce the track. Professional AI-assisted music production is AI + DAW, never AI alone.
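A sketch of that stem workflow under the same audiocraft API; the element names, prompts, key, and tempo are illustrative choices, not fixed values:

import soundfile as sf
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-medium")
model.set_generation_params(duration=30)

# Pin key and tempo in every prompt so the clips can be layered in a DAW.
stems = {
    "bassline": "Solo electric bassline, C minor, 120 BPM, funky, no drums",
    "drums": "Solo drum loop, 120 BPM, tight acoustic kit, no melody",
    "pad": "Warm analog synth pad, C minor, 120 BPM, slow attack, no drums",
}

# generate() accepts a list of prompts, so all stems render in one batch.
wav = model.generate(list(stems.values()))
for clip, name in zip(wav, stems):
    # Write one WAV per stem for import into Ableton/FL Studio/Reaper.
    sf.write(f"stem_{name}.wav", clip.cpu().numpy().T, model.sample_rate)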
Audio models are surprisingly forgiving on hardware. OpenAI's Whisper, whisper.cpp, and Coqui TTS all run well on 8-12 GB GPUs. The bottleneck is rarely the GPU; it's audio preprocessing and disk I/O for batch transcription.
Voice cloning, TTS, and audio generation models trade VRAM for output quality — most operators undersize here.