Score-Agnostic Structure Analysis in Large-Scale Performance Datasets
2026-05-25 • Sound
AI summary
The authors focus on improving how large collections of automatically transcribed piano performances can be analyzed. Since different performances of the same classical piece may vary structurally, they developed a method to group these transcriptions based on how they interpret the score, such as repeats or edition differences. They use sequence alignment and clustering techniques to compare and organize these performances without needing the original score or audio. This helps researchers compare performances more meaningfully by focusing on musical structure rather than exact accuracy. Their method was tested on about 1,500 transcriptions from a large piano performance dataset.
Automatic Music Transcription, Piano Performance, Sequence Alignment, Hierarchical Clustering, Structural Interpretation, Performance Variability, Score-Agnostic, Musical Coherence, Data Grouping, Large-Scale Dataset
Authors
Patricia Hu, Silvan Peter, Gerhard Widmer
Abstract
In recent years, thanks to advances in automatic music transcription (AMT), several large-scale datasets of automatically transcribed piano solo music have been released. While these datasets undoubtedly offer extensive material for performance studies, they vary substantially in quality. In the case of classical music, performances often differ not only in expressive aspects such as tempo, but also in their structural interpretation of the score (including repeat patterns and edition-specific variants). To meaningfully use large-scale transcribed datasets for performance research, transcriptions of the same piece must be grouped according to their underlying structural realisation to support valid comparison. We address this by applying sequence-to-sequence alignment followed by hierarchical clustering: we create pairwise alignments for all pairs of transcriptions of a given piece, and use the alignment cost and (dis)similarity of performed sequence lengths to resolve structural mismatches as features for grouping. We propose this approach as a first step towards automatically evaluating large-scale transcribed datasets that lack ground-truth score and/or audio, shifting the evaluation criterion from truth-based accuracy to musical coherence and plausibility. We demonstrate our score-agnostic approach on around 1,500 transcriptions of 88 compositions from a recently published large-scale transcribed piano performance dataset.
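The grouping idea described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the abstract does not name the alignment algorithm, so dynamic time warping (DTW) over MIDI pitch sequences stands in for it here, and the way the two features are combined (a plain sum), the average-linkage rule, the threshold of 0.2, and the toy performances are all assumptions made for illustration.

```python
from itertools import combinations

def dtw_cost(a, b):
    """Length-normalised DTW cost between two pitch sequences.
    Assumption: 0/1 mismatch cost per aligned note pair."""
    n, m = len(a), len(b)
    inf = float("inf")
    prev = [0.0] + [inf] * m
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            step = 0.0 if a[i - 1] == b[j - 1] else 1.0
            cur[j] = step + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m] / (n + m)

def length_dissimilarity(a, b):
    """0 when both sequences are equally long, approaching 1 as they diverge."""
    return 1.0 - min(len(a), len(b)) / max(len(a), len(b))

def pairwise_distances(seqs):
    """Hypothetical combined feature: alignment cost plus length dissimilarity."""
    return {
        (i, j): dtw_cost(seqs[i], seqs[j]) + length_dissimilarity(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)
    }

def cluster(dist, n, threshold):
    """Agglomerative clustering with average linkage; stops merging once the
    closest pair of clusters is farther apart than `threshold`."""
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        total = sum(dist[(min(i, j), max(i, j))] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[x], clusters[y]) > threshold:
            break
        clusters[x] += clusters[y]
        del clusters[y]  # safe: combinations guarantees x < y
    return clusters

# Toy data: section A followed by section B, played with or without repeating A.
A = [60, 62, 64, 65]
B = [67, 69, 71, 72]
with_repeat = A + A + B
without_repeat = A + B
performances = [
    with_repeat,
    with_repeat[:5] + [63] + with_repeat[6:],        # same structure, one wrong note
    without_repeat,
    without_repeat[:6] + [70] + without_repeat[7:],  # same structure, one wrong note
]

dist = pairwise_distances(performances)
groups = cluster(dist, len(performances), threshold=0.2)
print(groups)
```

In this toy case the two performances that repeat section A end up in one group and the two that skip the repeat in another, despite the isolated wrong notes. On real transcriptions the note representation, the cost function, and the stopping threshold would all need to be chosen with care, and a quadratic-time alignment over every pair becomes expensive for long pieces.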