Domain-incremental audio classification using domain-specific experts and prototype classifier

2026-06-22

Machine Learning
AI summary

The authors designed a system that classifies audio from domains arriving one stage at a time, without access to past or future domain data all at once. They trained small expert models stage by stage and then froze them, concatenating the experts' features at the final stage to classify sounds. To retain earlier-domain knowledge without storing raw audio, they trained some experts with DeepInversion-based generative replay, which synthesizes training data, and used a regression imputer to fill in features of experts that did not yet exist at earlier stages. Their final system, an ensemble across the expert stacks, outperformed every individual backbone on the challenge development set.

domain-incremental learning, audio classification, frozen features, expert models, catastrophic forgetting, DeepInversion, generative replay, prototype classifier, feature imputation, ensemble learning
Authors
Jongyeon Park, Do-Hyeon Lim, Sang-won Park, Hong Kook Kim, Kyungdeuk Ko, Hyeongcheol Geum, Jeong Eun Lim
Abstract
This technical report presents our submission systems for Task 7 (domain-incremental audio classification) of the DCASE 2026 Challenge. The main obstacle is that the system cannot access data from past or future domains all at once. We approach domain-incremental learning (DIL) as a frozen-feature replay problem. At each incremental stage, one or two compact experts are trained and then kept fixed; at the final stage, the penultimate features from all frozen experts are concatenated and used to train a lightweight per-class prototype classifier solely on cached features. This design prevents catastrophic forgetting by preserving every expert model unchanged at inference. To retain earlier-domain knowledge without storing raw audio, some experts are trained with DeepInversion-based generative replay. A cross-stage regression imputer is trained to fill the feature slots of experts that did not yet exist at an earlier stage. We submit four fully DIL-compliant systems: three systems built on diverse frozen five-expert backbones, plus their cross-stack ensemble, which achieves 78.15% micro / 77.03% macro on the development set and outperforms every individual backbone on both metrics.
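The final-stage pipeline described in the abstract (impute missing expert features, concatenate, classify by prototype) can be illustrated with a minimal NumPy sketch. The abstract does not specify the imputer's form or the classifier's similarity measure, so this sketch assumes a ridge-regression imputer and a cosine-similarity nearest-prototype rule with class-mean prototypes. All function names, feature dimensions, and the toy data are hypothetical and are not taken from the submitted systems.

```python
import numpy as np

def fit_imputer(known, missing, ridge=1e-3):
    """Ridge regression from an existing expert's features to a later expert's features."""
    X = np.hstack([known, np.ones((len(known), 1))])   # append a bias column
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ missing)           # shape: (d_known + 1, d_missing)

def impute(known, W):
    """Predict the missing expert's feature slot from the known features."""
    X = np.hstack([known, np.ones((len(known), 1))])
    return X @ W

def fit_prototypes(features, labels, num_classes):
    """One prototype per class: the mean of the L2-normalised cached features."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    return protos / np.linalg.norm(protos, axis=1, keepdims=True)

def predict(features, protos):
    """Assign each sample to the prototype with the highest cosine similarity."""
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    return np.argmax(feats @ protos.T, axis=1)

# Toy cached features (assumption): expert A exists for every sample,
# expert B only for samples cached at the later stage.
rng = np.random.default_rng(0)
num_classes, n = 3, 120
labels = np.repeat(np.arange(num_classes), n // num_classes)
centers = 3.0 * rng.normal(size=(num_classes, 8))
feat_a = centers[labels] + 0.1 * rng.normal(size=(n, 8))
feat_b = feat_a @ rng.normal(size=(8, 4)) + 0.1 * rng.normal(size=(n, 4))
has_b = np.arange(n) % 2 == 1                          # rows where expert B was available

# Fit the imputer where both experts exist, then fill the empty slots.
W = fit_imputer(feat_a[has_b], feat_b[has_b])
feat_b_filled = feat_b.copy()
feat_b_filled[~has_b] = impute(feat_a[~has_b], W)

# Concatenate the expert features and classify with per-class prototypes.
stacked = np.hstack([feat_a, feat_b_filled])
protos = fit_prototypes(stacked, labels, num_classes)
accuracy = float((predict(stacked, protos) == labels).mean())
```

Because the experts stay frozen, only the imputer weights and the prototypes are fitted at the final stage, and both operate on cached feature vectors rather than raw audio.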