Beyond Artifacts: Towards Generalizable Synthetic Song Detection via Music-Intrinsic Features

2026-06-15 • Sound

Sound • Machine Learning • Multimedia

AI summary

The authors developed Sofia, a new method to detect songs made by AI by looking at internal features of the music instead of easy-to-fake surface clues. Sofia uses different expert models focused on vocals, audio effects, and overall structure, combining their strengths flexibly. They also created a tough test set called MUSIC8K with recent AI-generated songs and noisy audio to check performance. Their experiments show Sofia better recognizes AI-made songs across different generators and remains reliable even with audio distortions.

Synthetic Song Detection • AI Music Generators • Feature-specific Experts • Mixture-of-Experts • Vocal Features • Audio Effects • Global Structure • Benchmark Dataset • Robustness • F1 Score

Authors

Yan Han, Zhibin Wen, Yuan Wang, Shuangrun Shao, Xiaobing Li, Yang Xu, Wei Li

Abstract

The rapid advancement of AI music generators highlights the urgent need for reliable Synthetic Song Detection (SSD). Existing SSD methods often rely on low-level artifacts or fixed feature assumptions and struggle to capture generator-agnostic cues. To address this, we propose Sofia (Synthetic-song detection framework via music features), a flexible framework that models music-intrinsic attributes via feature-specific experts and an adaptive Mixture-of-Experts (MoE) module. By configuring Sofia with representative Vocal, Audio-effect, and Global-structure features, as well as their combinations, we show their individual and complementary contributions. To comprehensively evaluate our framework, we further construct MUSIC8K, a challenging benchmark featuring the latest generators and realistic audio perturbations. Experiments show that Sofia learns generator-agnostic representations from music-intrinsic features, improving the F1 score by 18.5 points over the strongest baseline on MUSIC8K-O while maintaining strong robustness.
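To make the idea of adaptively combining feature-specific experts concrete, the following is a minimal sketch of a generic softmax-gated mixture over three expert embeddings. It is not Sofia's actual architecture, which the abstract does not specify: the function names (`softmax`, `gate_logits`, `moe_fuse`), the linear gate, the embedding sizes, and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_logits(embeddings: Dict[str, List[float]],
                gate_weights: Dict[str, List[float]]) -> List[float]:
    """Hypothetical linear gate: one logit per expert, computed from the
    concatenation of all expert embeddings (so the weights depend on the input)."""
    concat = [v for name in embeddings for v in embeddings[name]]
    return [sum(w * x for w, x in zip(gate_weights[name], concat))
            for name in embeddings]


def moe_fuse(embeddings: Dict[str, List[float]],
             gate_weights: Dict[str, List[float]]) -> List[float]:
    """Fuse expert embeddings as a softmax-weighted sum (assumed fusion rule)."""
    weights = softmax(gate_logits(embeddings, gate_weights))
    dim = len(next(iter(embeddings.values())))
    fused = [0.0] * dim
    for w, name in zip(weights, embeddings):
        for i, v in enumerate(embeddings[name]):
            fused[i] += w * v
    return fused


# Toy 2-dimensional embeddings from three assumed experts.
experts = {
    "vocal": [1.0, 0.0],
    "audio_effect": [0.0, 1.0],
    "global_structure": [1.0, 1.0],
}
# Zero gate weights give every expert the same logit, hence equal mixing.
uniform_gate = {name: [0.0] * 6 for name in experts}
fused_uniform = moe_fuse(experts, uniform_gate)
print(fused_uniform)
```

In a trained system the gate weights would be learned jointly with the experts, so that the mixture can lean on whichever feature family is most informative for a given song; a downstream classifier would then operate on the fused vector.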
