A Cold Diffusion Approach for Percussive Dereverberation
2026-05-11 • Sound
Sound, Artificial Intelligence
AI summary
The authors address audio dereverberation for drum and percussion sounds, which is harder than for speech because these sounds consist of sharp transients packed closely in time. They model reverberation as a deterministic process that gradually turns clean (anechoic) drum sounds into reverberant ones, and train a cold diffusion model to reverse it. Comparing two prediction types (direct next-state and a normalized residual) and two backbones (UNet and diffusion Transformer), they report that their approach outperforms score-based and conditional diffusion baselines on acoustic and electronic drum recordings with both synthetic and real room impulse responses. The work targets cleaning up reverberant drum stems in music production.
audio dereverberation, percussive audio, reverberation, cold diffusion, UNet, diffusion Transformer, room impulse response, sharp transients, stereo drum stems
Authors
Dimos Makris, András Barják, Maximos Kaliakatsos-Papakostas
Abstract
Most recent advances in audio dereverberation focus almost exclusively on speech, leaving percussive and drum signals largely unexplored despite their importance in music production. Percussive dereverberation poses distinct challenges due to sharp transients and dense temporal structure. In this work, we propose a cold diffusion framework for dereverberating stereo drum stems (downmixes), modeling reverberation as a deterministic degradation process that progressively transforms anechoic signals into reverberant ones. We investigate two reverse-process parameterizations, Direct (next-state) and a Delta-normalized residual (velocity-style) prediction, and implement the framework using both a UNet and a diffusion Transformer backbone. The models are trained and evaluated on curated datasets comprising both acoustic and electronic drum recordings, with reverberation generated using a combination of synthetic and real room impulse responses. Extensive experiments on in-domain and fully out-of-domain test sets demonstrate that the proposed method consistently outperforms strong score-based and conditional diffusion baselines, evaluated using signal-based and perceptual metrics tailored to percussive audio.
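The abstract does not spell out the degradation operator or the exact form of the two prediction targets, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the degradation is a linear blend between the anechoic signal and its reverberant version over `num_steps` steps, and that the Delta-normalized target is the one-step residual scaled to unit RMS. The helper names (`reverberate`, `degrade`, `direct_target`, `delta_target`) and the toy room impulse response are hypothetical.

```python
import numpy as np

def reverberate(dry, rir):
    """Convolve each channel of a (channels, samples) signal with a mono RIR,
    trimmed back to the input length."""
    return np.stack([np.convolve(ch, rir)[: dry.shape[1]] for ch in dry])

def degrade(dry, wet, t, num_steps):
    """ASSUMED degradation: linear blend from dry (t=0) to wet (t=num_steps)."""
    alpha = t / num_steps
    return (1.0 - alpha) * dry + alpha * wet

def direct_target(dry, wet, t, num_steps):
    """Direct (next-state) target: the less-degraded state at step t-1."""
    return degrade(dry, wet, t - 1, num_steps)

def delta_target(dry, wet, t, num_steps, eps=1e-8):
    """Residual target: the step from x_t to x_{t-1}, scaled to unit RMS.
    The RMS normalization is an assumption made for this sketch."""
    delta = degrade(dry, wet, t - 1, num_steps) - degrade(dry, wet, t, num_steps)
    scale = np.sqrt(np.mean(delta ** 2)) + eps
    return delta / scale, scale

# Toy stereo "drum" signal: one unit impulse every 64 samples.
rng = np.random.default_rng(0)
dry = np.zeros((2, 256))
dry[:, ::64] = 1.0

# Toy synthetic RIR: direct path followed by exponentially decaying noise.
rir = np.exp(-np.arange(64) / 16.0) * rng.standard_normal(64)
rir[0] = 1.0
wet = reverberate(dry, rir)

# Oracle reverse pass: start from the reverberant signal and apply the true
# residual at every step. A trained network would predict this residual
# from (x_t, t) instead of reading it from the clean signal.
T = 10
x = wet.copy()
for t in range(T, 0, -1):
    delta, scale = delta_target(dry, wet, t, T)
    x = x + delta * scale
```

In this sketch the two targets carry the same information: adding the rescaled residual to the current state gives the Direct target. The residual form only changes what the network is asked to regress, which is the design axis the abstract describes as the two reverse-process parameterizations.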