Three of the models (Stable Audio 3.0 Small SFX, Stable Audio 3.0 Small, and Stable Audio 3.0 Medium) are being released as open weights through Hugging Face. The company’s largest model, Stable Audio 3.0 Large, remains proprietary and is available only through Stability AI’s API, enterprise licensing, or the partner infrastructure provider fal.ai.
Stable Audio 3.0 Small SFX and Stable Audio 3.0 Small each contain 459 million parameters and can generate audio clips up to two minutes long. The SFX version is optimized for sound effects and lightweight hardware like smartphones and consumer laptops, while the standard Small model focuses on short music generation.
The larger Stable Audio 3.0 Medium model has 1.4 billion parameters and can create tracks longer than six minutes. Stability AI said a redesigned architecture built around a semantic-acoustic autoencoder lets the system generate longer outputs while also speeding up inference.
The company also highlighted new editing capabilities across the model family, including inpainting tools that allow users to modify individual sections of audio tracks, update multiple segments simultaneously, or extend existing compositions beyond their original length.
Stability AI is also releasing LoRA (low-rank adaptation) training documentation for the Small and Medium models, allowing developers to fine-tune the models on their own music or sound libraries. Enterprise customers will additionally have access to guided fine-tuning support.
Under the Stability AI Community License, generated audio can be used commercially, but organizations earning more than $1 million annually must obtain an enterprise license.
The licensing structure is becoming increasingly important across the AI music industry as legal scrutiny intensifies around training data and copyright protections. Stability AI repeatedly emphasized that Stable Audio 3.0 was trained on licensed material and pointed to partnerships with Universal Music Group and Warner Music Group.
The company’s positioning contrasts with ongoing lawsuits involving AI music startups Suno and Udio, both of which face allegations tied to copyrighted recordings and music generation outputs. Stability AI also referenced a recent German court ruling involving OpenAI and copyrighted song lyrics as part of the broader legal backdrop surrounding generative AI training data.
Stable Audio 3.0 also marks a broader shift inside Stability AI itself. After helping popularize open image generation through Stable Diffusion, the company has increasingly focused on audio systems following financial challenges and leadership changes, including the departure of former CEO Emad Mostaque.
The new release builds on several earlier audio products from the company, including Stable Audio Open Small, a lightweight text-to-audio model designed for smartphones, and Stable Audio 2.5, which focused on longer-form professional music production workflows. Stability AI said Stable Audio 3.0 now serves as the foundation for its next generation of licensed audio models.
This analysis is based on reporting from The Decoder.
This article was generated with AI assistance and reviewed for accuracy and quality.