APEX: Audio Prototype EXplanations for Classification Tasks
2026-05-11 • Sound
Sound, Machine Learning
AI summary
The authors created APEX, a tool that helps explain how audio classifiers make decisions without changing the original model. Unlike previous methods that used image-based techniques on sound data, APEX respects the unique features of audio, like timing and frequency. It breaks down explanations into four types, showing different ways sounds can be recognized, making the results easier to understand. This approach offers clearer explanations compared to typical techniques that rely on gradients.
Explainable AI (XAI), audio classification, spectrogram, prototype reasoning, post-hoc explanation, output invariance, transient events, time-frequency analysis
Authors
Piotr Kawa, Kornel Howil, Piotr Borycki, Miłosz Adamczyk, Przemysław Spurek, Piotr Syga
Abstract
Explainable AI (XAI) has achieved remarkable success in image classification, yet the audio domain lacks equally mature solutions. Current methods apply vision-based attribution techniques to spectrograms, overlooking fundamental differences between visual and acoustic signals. While prototype reasoning is promising, acoustic similarity remains multidimensional. We introduce APEX (Audio Prototype EXplanations), a post-hoc framework for interpreting pre-trained audio classifiers. Crucially, APEX requires no fine-tuning of the original backbone and strictly preserves output invariance. APEX disentangles explanations into four perspectives: Square-based prototypes localizing transient events, Time-based prototypes capturing temporal patterns, Frequency-based prototypes highlighting spectral bands, and Time-Frequency-based prototypes integrating both axes. This yields intuitive, example-based explanations that respect acoustic properties, providing greater semantic clarity than standard gradient-based methods.
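To make the four prototype perspectives concrete, the sketch below builds a toy magnitude spectrogram with NumPy and shows the four region shapes the abstract describes: a localized square patch (transient events), a full-band time window, a full-duration frequency band, and a time-frequency rectangle. This is an illustrative assumption about what the region shapes look like on a spectrogram grid, not the paper's actual prototype-extraction procedure; all names (`region_mask`, the STFT parameters) are hypothetical.

```python
import numpy as np

# Toy magnitude spectrogram of a synthetic 1-second signal
# (assumption: a magnitude STFT is the input representation).
rng = np.random.default_rng(0)
sr, n_fft, hop = 16000, 512, 256
t = np.arange(sr) / sr
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * rng.standard_normal(sr)

# Frame the signal, window it, and take the magnitude of the real FFT.
frames = np.lib.stride_tricks.sliding_window_view(signal, n_fft)[::hop]
spec = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=1)).T  # (freq, time)

def region_mask(kind, f0=40, f1=80, t0=10, t1=20):
    """Boolean mask selecting one of the four prototype region shapes."""
    m = np.zeros(spec.shape, dtype=bool)
    if kind == "square":            # small localized patch: transient events
        m[f0:f0 + 8, t0:t0 + 8] = True
    elif kind == "time":            # all frequencies within a time window
        m[:, t0:t1] = True
    elif kind == "frequency":       # one spectral band across all times
        m[f0:f1, :] = True
    elif kind == "time-frequency":  # rectangle combining both axes
        m[f0:f1, t0:t1] = True
    return m

for kind in ["square", "time", "frequency", "time-frequency"]:
    m = region_mask(kind)
    print(f"{kind:>14}: {m.sum()} bins selected")
```

Masking a spectrogram this way keeps the explanation in the signal's native time-frequency coordinates, which is the property the abstract contrasts with image-style attribution maps.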