Elastic Time: Dynamic Frame Rate Bottlenecks for Neural Audio Coding

2026-06-25

Sound
AI summary

The authors present Elastic Time, a method that lets neural audio autoencoders vary their latent frame rate over time. Instead of spending the same number of frames on every stretch of audio, the system skips frames that a lightweight predictor can reconstruct and keeps detail only where it is needed, which shortens the latent sequence. They report a better tradeoff between efficiency and audio quality than baselines, with the rate adjustable at deployment time. The technique may help downstream tasks such as audio generation and long-context modeling.

neural audio autoencoder, variable bitrate, latent frame-rate, temporal resolution, latent predictor, greedy boundary selection, rate control, audio compression, feature extraction, generation
Authors
Dimitrios Bralios, Paris Smaragdis, Minje Kim
Abstract
Neural audio autoencoders have become a core component of compression, feature extraction, and generation. However, while existing systems support variable bitrate, the vast majority of models still operate at a fixed latent frame-rate, allocating equal temporal budget to regions with very different information density, which can result in unnecessarily long sequences. We introduce Elastic Time, a dynamic frame-rate bottleneck that converts fixed-frame-rate autoencoders to dynamic ones. Our method learns a lightweight latent predictor used to decide which frames can be skipped and later reconstructed, enabling efficient greedy boundary selection at inference. Experiments show our method enables deployment-time rate control while improving efficiency-quality tradeoffs relative to baselines. Overall, we provide a flexible mechanism for adjusting temporal resolution in audio autoencoders, potentially facilitating more efficient downstream modeling for generation and long-context tasks.
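
The abstract does not spell out the predictor or the selection criterion, so the following is only a minimal sketch of what greedy, predictor-driven frame skipping could look like. Everything in it is an assumption made for illustration: the names `hold_predictor`, `greedy_select`, and `reconstruct`, the use of an L2 error, and the `threshold` knob. In particular, `hold_predictor` simply repeats the last kept frame and stands in for the learned latent predictor described above.

```python
import numpy as np


def hold_predictor(kept_frames: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the learned predictor: repeat the last kept frame."""
    return kept_frames[-1]


def greedy_select(latents: np.ndarray, threshold: float, predictor=hold_predictor) -> list:
    """Scan frames left to right and return the indices of frames to keep.

    A frame is kept when the predictor's guess, computed only from frames
    kept so far, misses it by more than `threshold` (L2 distance, assumed).
    """
    kept = [0]  # the first frame is always kept
    for t in range(1, len(latents)):
        guess = predictor(latents[kept])
        if np.linalg.norm(latents[t] - guess) > threshold:
            kept.append(t)
    return kept


def reconstruct(latents_kept: np.ndarray, kept: list, num_frames: int,
                predictor=hold_predictor) -> np.ndarray:
    """Rebuild a full-rate sequence: copy kept frames, predict skipped ones."""
    out = np.empty((num_frames, latents_kept.shape[1]), dtype=latents_kept.dtype)
    pos = 0  # number of kept frames consumed so far
    for t in range(num_frames):
        if pos < len(kept) and kept[pos] == t:
            out[t] = latents_kept[pos]
            pos += 1
        else:
            # Same predictor and same inputs as at selection time,
            # so the decoder reproduces the encoder's guess exactly.
            out[t] = predictor(latents_kept[:pos])
    return out
```

In this sketch, `threshold` plays the role of a deployment-time rate control: raising it keeps fewer frames at the cost of larger reconstruction error, and lowering it moves back toward the fixed-frame-rate sequence. A caller would run `kept = greedy_select(z, threshold)`, transmit `z[kept]` together with `kept`, and call `reconstruct(z[kept], kept, len(z))` on the receiving side.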