Semi-Supervised Hyperbolic Hierarchical Clustering with Set-Level Structural Priors

2026-06-01 · Machine Learning

AI summary

The authors address a limitation of semi-supervised hierarchical clustering: supervision is usually given only as relations between individual data points, so it does not indicate which groups of points should form coherent branches. They introduce sets of points that act as soft priors on how subtrees should form in the hierarchy. Their approach learns similarities consistent with the given constraints, builds the sets from them, and optimizes the tree in hyperbolic space. Experiments on eleven benchmark datasets show that the method matches ground-truth label groupings better than representative hierarchical clustering baselines while also improving similarity-based tree quality.

semi-supervised learning, hierarchical clustering, pairwise constraints, hyperbolic embedding, structural priors, set-level supervision, tree optimization, constraint-consistent similarity, subtree coherence
Authors
Junjing Zheng, Xinyu Zhang, Xiangfeng Qiu, Chengliang Song, Weidong Jiang
Abstract
Semi-supervised hierarchical clustering aims to learn a tree structure consistent with data patterns and user-provided supervision. Supervision is usually given as leaf-level relations, such as pairwise must-link/cannot-link constraints or triplet-wise must-link-before constraints. Although useful for regulating local sample relations, such supervision does not directly indicate which samples should form coherent subtrees. Consequently, the non-leaf structure of the learned tree may deviate from the hierarchical organization preferred by ground-truth labels. To address this limitation, we propose a semi-supervised hyperbolic hierarchical clustering method with set-level structural priors. The main contribution is to introduce sets as basic modeling units for hierarchy learning. Each set denotes samples expected to cohere within a subtree and is induced from leaf-level supervision together with a learned constraint-consistent similarity structure. These sets act as soft structural priors for subtree-level supervision, allowing supervision to guide non-leaf hierarchy formation beyond local leaf-level relations. Specifically, we first learn constraint-consistent embeddings to obtain a reliable set partition, then construct constraint-induced sets and estimate inter-set similarities to form set-level structural priors. Finally, these priors are incorporated into a hyperbolic hierarchy objective for continuous tree optimization. Experiments on eleven benchmark datasets and ablation studies show that the proposed method consistently improves label consistency over representative hierarchical clustering baselines while also enhancing similarity-based tree quality.
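To make the idea of constraint-induced sets and inter-set similarities concrete, here is a minimal sketch, not the authors' algorithm. It assumes sets are formed only from the transitive closure of must-link pairs and that inter-set similarity is the mean pairwise similarity between members. The abstract states that the actual method also relies on a learned constraint-consistent similarity structure, and it does not specify the averaging rule. The function names `must_link_sets` and `inter_set_similarity` are hypothetical, and cannot-link constraints are ignored here.

```python
import numpy as np


def must_link_sets(n, must_link):
    """Group n samples into sets via the transitive closure of must-link pairs.

    Assumption: this union-find grouping stands in for the paper's set
    construction, which additionally uses a learned similarity structure.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as root so output order is deterministic.
            parent[max(ra, rb)] = min(ra, rb)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return [groups[root] for root in sorted(groups)]


def inter_set_similarity(S, sets):
    """Mean pairwise similarity between sets (hypothetical averaging rule)."""
    k = len(sets)
    out = np.zeros((k, k))
    for p in range(k):
        for q in range(k):
            # np.ix_ selects the block of S with rows in set p, columns in set q.
            out[p, q] = S[np.ix_(sets[p], sets[q])].mean()
    return out


# Toy example: five samples, three must-link pairs.
sets = must_link_sets(5, [(0, 1), (1, 2), (3, 4)])  # [[0, 1, 2], [3, 4]]
S = np.full((5, 5), 0.1)   # low similarity across sets
S[:3, :3] = 0.9            # high similarity inside the first set
S[3:, 3:] = 0.8            # high similarity inside the second set
set_sim = inter_set_similarity(S, sets)
```

In this toy case `set_sim` is a 2×2 matrix whose diagonal holds the within-set averages and whose off-diagonal entries hold the between-set averages. A matrix of this kind could serve as the coarse, set-level input to a tree objective, which is the role the abstract assigns to set-level structural priors.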