An open source text-to-audio model developed by Stability AI that generates audio samples up to 47 seconds long. Use Stable Audio Open online for free.
Stable Audio Open Generator
Comprises three components: an autoencoder that compresses waveforms, a T5-based text embedding for conditioning, and a transformer-based diffusion model operating in the autoencoder's latent space.
Based on a transformer architecture and trained using a latent diffusion model approach.
Trained on 486,492 audio recordings from FreeSound (472,618) and Free Music Archive (13,874), all licensed under CC0, CC BY, or CC Sampling+.
Training data was rigorously screened to ensure no copyrighted music was included.
Designed to be used with the open source stable-audio-tools library for inference and fine-tuning.
Users can fine-tune the model on their own custom audio data, e.g., a drummer fine-tuning on their own drum recordings.
Available under Stability AI's non-commercial research community agreement license.
Model weights accessible on Hugging Face after agreeing to the license.
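
For readers who want to try inference with stable-audio-tools, the following is a minimal Python sketch, not an official recipe. The helper names `build_conditioning`, `generate`, and `MAX_SECONDS` are hypothetical and exist only for this illustration. The model identifier `stabilityai/stable-audio-open-1.0`, the sampler settings, and the conditioning keys are assumptions taken from the usage example commonly published with the model, and may differ between library versions. Clamping the duration to a minimum of 1 second is also an assumption. Running `generate` requires `torch`, `torchaudio`, `einops`, and `stable-audio-tools` to be installed, and the license to be accepted on Hugging Face.

```python
MAX_SECONDS = 47  # upper bound on clip length stated in the description above


def build_conditioning(prompt, seconds_total=30):
    """Build the text and timing conditioning for one clip (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    # Clamp the requested duration to 1..MAX_SECONDS (the lower bound is an assumption).
    seconds_total = max(1, min(int(seconds_total), MAX_SECONDS))
    return [{"prompt": prompt, "seconds_start": 0, "seconds_total": seconds_total}]


def generate(prompt, seconds_total=30, out_path="output.wav", steps=100, cfg_scale=7):
    """Generate audio from a text prompt and save it as a 16-bit WAV file."""
    # Heavy imports are deferred so build_conditioning can be used without them.
    import torch
    import torchaudio
    from einops import rearrange
    from stable_audio_tools import get_pretrained_model
    from stable_audio_tools.inference.generation import generate_diffusion_cond

    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Assumed model identifier; downloading requires accepting the license.
    model, model_config = get_pretrained_model("stabilityai/stable-audio-open-1.0")
    model = model.to(device)

    # Run the latent diffusion sampler with the assumed default settings.
    output = generate_diffusion_cond(
        model,
        steps=steps,
        cfg_scale=cfg_scale,
        conditioning=build_conditioning(prompt, seconds_total),
        sample_size=model_config["sample_size"],
        sigma_min=0.3,
        sigma_max=500,
        sampler_type="dpmpp-3m-sde",
        device=device,
    )

    # Fold the batch dimension into the time axis: (b, d, n) -> (d, b*n).
    output = rearrange(output, "b d n -> d (b n)")
    # Peak-normalize, clip to [-1, 1], and convert to 16-bit integers.
    output = output.to(torch.float32)
    output = output.div(torch.max(torch.abs(output))).clamp(-1, 1)
    output = output.mul(32767).to(torch.int16).cpu()

    torchaudio.save(out_path, output, model_config["sample_rate"])
    return out_path
```

A call such as `generate("128 BPM tech house drum loop", seconds_total=30)` would, under these assumptions, write `output.wav` to the working directory. The conditioning helper is kept separate so that the duration limit can be checked without loading the model.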