Low-Resource Guidance for Controllable Latent Audio Diffusion
2026-03-04 • Sound
Sound, Artificial Intelligence, Machine Learning
AI summary
The authors examine why current ways of steering audio generation toward specific features, such as pitch or beats, are slow and costly: guidance-based control has to backpropagate through the decoder at every step. They introduce Latent-Control Heads (LatCHs), which operate directly in the model's latent space, so guidance skips the expensive decoder step and needs little training (7M parameters and about 4 hours). In tests on Stable Audio Open, the approach controls intensity, pitch, and beats, alone or in combination, while maintaining sound quality at far lower computational cost.
generative audio, diffusion models, latent space, audio synthesis, model guidance, decoder backpropagation, pitch control, beat control, Stable Audio, model training efficiency
Authors
Zachary Novack, Zack Zukowski, CJ Carr, Julian Parker, Zach Evans, Josiah Taylor, Taylor Berg-Kirkpatrick, Julian McAuley, Jordi Pons
Abstract
Generative audio requires fine-grained controllable outputs, yet most existing methods require model retraining on specific controls or inference-time controls (e.g., guidance) that can also be computationally demanding. By examining the bottlenecks of existing guidance-based controls, in particular their high cost-per-step due to decoder backpropagation, we introduce a guidance-based approach through selective TFG and Latent-Control Heads (LatCHs), which enables controlling latent audio diffusion models with low computational overhead. LatCHs operate directly in latent space, avoiding the expensive decoder step, and requiring minimal training resources (7M parameters and ≈4 hours of training). Experiments with Stable Audio Open demonstrate effective control over intensity, pitch, and beats (and a combination of those) while maintaining generation quality. Our method balances precision and audio fidelity with far lower computational costs than standard end-to-end guidance. Demo examples can be found at https://zacharynovack.github.io/latch/latch.html.
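To make the idea of guiding in latent space concrete, here is a minimal toy sketch in plain Python. It is not the LatCH architecture, the selective TFG procedure, or the Stable Audio Open API: the linear head, its weights, the shrinkage "denoiser", and all function names are hypothetical stand-ins chosen only for illustration. What it shows is the structural point from the abstract: when the control predictor reads the latent directly, the guidance gradient can be computed without evaluating or differentiating a decoder.

```python
# Toy sketch of guidance applied directly in latent space.
# All names, weights, and the fake denoiser are hypothetical; this is NOT
# the LatCH architecture or any real Stable Audio Open interface.

def head_predict(z, w, b):
    """Hypothetical control head: linear map from a latent vector to one scalar feature."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b

def guidance_grad(z, w, b, target):
    """Analytic gradient of (head(z) - target)^2 with respect to z.

    The head reads the latent directly, so no decoder is evaluated
    or differentiated here.
    """
    err = head_predict(z, w, b) - target
    return [2.0 * err * wi for wi in w]

def toy_denoise_step(z):
    """Stand-in for one sampler update (assumption: simple shrinkage toward zero)."""
    return [0.9 * zi for zi in z]

def guided_sampling(z, w, b, target, steps=50, scale=0.1):
    """Alternate a sampler update with a gradient step toward the target control value."""
    for _ in range(steps):
        z = toy_denoise_step(z)
        g = guidance_grad(z, w, b, target)
        z = [zi - scale * gi for zi, gi in zip(z, g)]
    return z

w = [0.5, -0.25, 1.0, 0.75]   # made-up head weights
b = 0.1                       # made-up head bias
z0 = [1.0, -2.0, 0.5, 3.0]    # made-up initial latent
target = 2.0                  # desired value of the controlled feature

z_guided = guided_sampling(z0, w, b, target)             # with guidance
z_plain = guided_sampling(z0, w, b, target, scale=0.0)   # guidance switched off

err_guided = abs(head_predict(z_guided, w, b) - target)
err_plain = abs(head_predict(z_plain, w, b) - target)
print(err_guided < err_plain)
```

In this toy the guided latent ends up closer to the target than the unguided one, but it does not hit the target exactly, because the sampler update and the guidance step pull in different directions. The guidance scale sets that balance, which is one simple way to picture the trade-off between control precision and fidelity that the abstract mentions.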