SIGMA: Saliency-Guided Sparse Mask Attacks for Speech Emotion Recognition

2026-06-29

Sound
AI summary

The authors present SIGMA, a new method to attack Speech Emotion Recognition (SER) systems by subtly changing parts of the speech data that models find important. They use explainable AI techniques to identify these key parts and then apply small changes only there, making the attack both efficient and easier to understand. Their approach balances how strong the attack is with how well the changes align with explainability, and it works across different models and attack types. Tests on standard emotion datasets show SIGMA performs well compared to existing methods.

Speech Emotion Recognition, adversarial attack, sparsity, explainable AI, saliency maps, self-supervised learning, magnitude-bound perturbation, IEMOCAP dataset, TESS dataset
Authors
Qiyang Sun, Yi Chang, Zixing Zhang, Björn W. Schuller
Abstract
Speech conveys rich emotional information. As Speech Emotion Recognition (SER) is usually deployed in privacy-sensitive and reliability-critical environments, adversarial attacks on SER have attracted increasing attention. Existing sparse attacks control the number of perturbed elements, yet they often lack explainability guidance and explicit measures of explanation consistency. A unified treatment of sparsity and magnitude constraints is also uncommon. In addition, transferability across attack families and target models remains limited. Hence, we propose a SalIency-Guided sparse Mask Attack (SIGMA). On self-supervised speech features, we use post-hoc explainable artificial intelligence (XAI) techniques to produce saliency maps and identify the scope of the mask, and then restrict magnitude-bounded updates to this mask. The mask is computed once and can be reused across models and different sparsity attacks to amortise cost. We evaluate on the IEMOCAP and TESS datasets. Under matched budgets and across multiple sparse-attack settings, SIGMA maintains competitive attack success rates, navigating a conscious trade-off between attack efficacy and explanation consistency. SIGMA therefore provides an efficient and interpretable framework for analysing the vulnerability and explanation behaviour of SER models under structured perturbations.
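The mask-then-perturb idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: a toy linear softmax classifier stands in for the SER model, random arrays stand in for self-supervised speech features, plain input-gradient magnitude stands in for the post-hoc XAI saliency method, and the helper names (`loss_grad`, `saliency_mask`, `masked_attack`) and the budgets `k` and `eps` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss_grad(W, b, x, y):
    """Gradient of the cross-entropy loss w.r.t. the input features x
    for a toy linear softmax model (stand-in for an SER model)."""
    p = softmax(W @ x.ravel() + b)
    p[y] -= 1.0                          # dL/dlogits = p - onehot(y)
    return (W.T @ p).reshape(x.shape)

def saliency_mask(saliency, k):
    """Boolean mask marking the k entries with the largest |saliency|."""
    flat = np.abs(saliency).ravel()
    idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.shape, dtype=bool)
    mask[idx] = True
    return mask.reshape(saliency.shape)

def masked_attack(W, b, x, y, mask, eps=0.5, step=0.1, iters=20):
    """Sign-gradient ascent on the loss, restricted to the mask (sparsity)
    and clipped to an L-infinity ball of radius eps (magnitude bound)."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        g = loss_grad(W, b, x + delta, y)
        delta = delta + step * np.sign(g) * mask   # update masked entries only
        delta = np.clip(delta, -eps, eps)          # enforce the magnitude bound
    return x + delta

# Toy setup: 20 frames x 8 feature dimensions, 4 emotion classes (all hypothetical).
rng = np.random.default_rng(0)
T, D, C = 20, 8, 4
W = rng.normal(size=(C, T * D))
b = np.zeros(C)
x = rng.normal(size=(T, D))
y = int(np.argmax(W @ x.ravel() + b))    # attack the model's own prediction

k, eps = 16, 0.5
# The mask is computed once from the clean input and then reused by the attack.
mask = saliency_mask(loss_grad(W, b, x, y), k)
x_adv = masked_attack(W, b, x, y, mask, eps=eps)

changed = int(np.count_nonzero(x_adv - x))
print(f"perturbed entries: {changed} of {x.size} (budget k={k})")
print("prediction before:", y, "after:", int(np.argmax(W @ x_adv.ravel() + b)))
```

Because the mask is built once from the clean input, the same `mask` array could in principle be passed to a different attack routine or a different model, which is the cost-amortisation idea the abstract mentions. Whether the prediction actually flips in this toy setup depends on the random draw and the budgets chosen.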