HAIM: Human-AI Music Datasets for AI Music Production Tracking Benchmark
2026-06-01 • Sound
Sound, Artificial Intelligence
AI summary
The authors explain that AI tools are now used at many points in making music, not just to create songs from scratch but also to help improve or finish music made by humans. Current methods only try to tell if music is made by AI or a human, which doesn’t work well because music often involves both. To address this, the authors created a new dataset called HAIM to track where and how AI is used in the music-making process. They found that existing detectors struggle with this more detailed task and suggest their benchmark will help improve AI music detection.
Generative AI, Music Production Workflow, Vocal Synthesis, Mastering, AI Detection, Hybrid Music Production, Dataset, AI Integration, Audio Forensics
Authors
Seonghyeon Go, Yumin Kim
Abstract
As generative platforms such as Suno and Udio reach human-grade audio quality, the scope of AI's utility has expanded across the entire music production workflow. Beyond simple track generation, these advances have driven the adoption of AI-based methods for vocal synthesis, arrangement, and professional mastering. However, current detection research remains largely confined to a binary "AI-or-human" paradigm that fails to reflect the realities of contemporary music production. In real-world production, AI tools are increasingly used to refine or master human-produced tracks, and human engineers likewise post-process AI-generated material to ensure professional quality. Moreover, users often employ adversarial tactics to bypass AI detectors, such as applying human mastering to AI-generated tracks. The result is a grey area that a simple binary classification cannot capture. In this paper, we define and investigate "AI Music Tracking": the challenge of identifying specific AI integration across the multifaceted spectrum of music production. To this end, we introduce HAIM, a dataset with diverse labels for the stages of music production. It is designed to isolate the stages of AI intervention, including hybrid production and agent-level tracking. Our evaluation of state-of-the-art detectors reveals systemic flaws. By releasing HAIM, we propose a new benchmark that shifts the field beyond binary classification toward a granular, structured evaluation of AI music.
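To make the contrast between binary detection and stage-level tracking concrete, here is a minimal Python sketch. The abstract does not specify HAIM's label schema, so everything in it is a hypothetical illustration and not the dataset's actual format: the `Source` enum, the `TrackLabel` class, and the three stage names (`arrangement`, `vocals`, `mastering`, taken from the production stages the abstract mentions).

```python
from dataclasses import dataclass, fields
from enum import Enum


class Source(Enum):
    """Who performed a production stage (hypothetical label values)."""
    HUMAN = "human"
    AI = "ai"


@dataclass(frozen=True)
class TrackLabel:
    """Hypothetical per-stage provenance label; not HAIM's real schema."""
    arrangement: Source
    vocals: Source
    mastering: Source

    def ai_stages(self) -> list[str]:
        # Names of the stages in which AI was involved, in field order.
        return [f.name for f in fields(self) if getattr(self, f.name) is Source.AI]

    def binary_label(self) -> Source:
        # Collapse to the "AI-or-human" view: any AI involvement counts as AI.
        return Source.AI if self.ai_stages() else Source.HUMAN

    def is_hybrid(self) -> bool:
        # Hybrid means AI handled some stages, but not all of them.
        return 0 < len(self.ai_stages()) < len(fields(self))


# An AI-generated track that a human engineer mastered.
ai_then_human = TrackLabel(Source.AI, Source.AI, Source.HUMAN)
# A human-made track that an AI tool mastered.
human_then_ai = TrackLabel(Source.HUMAN, Source.HUMAN, Source.AI)

for label in (ai_then_human, human_then_ai):
    print(label.binary_label().value, label.is_hybrid(), label.ai_stages())
```

Under this any-AI collapsing rule, both example tracks receive the same binary label even though AI entered at different stages. That is the grey area the abstract describes: a stage-level label keeps the two cases distinct.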