Curvature-Guided Sheaf Diffusion for Unsupervised Community Detection on Heterophilic Graphs
2026-06-29 • Machine Learning
Machine LearningArtificial Intelligence
AI summaryⓘ
The authors address the problem of finding groups in graphs where connected nodes often belong to different classes, which is difficult for existing unsupervised methods. They propose a new algorithm called Curvature-Guided Sheaf Diffusion (CGSD) that uses a measure called Forman-Ricci curvature on edges to guide the process. CGSD includes a special encoder and a spectral clustering step that both use curvature information without needing labels. Their tests on five datasets show CGSD performs well, especially due to the curvature-aware clustering step, and provides interpretable results based on curvature differences within and between communities.
heterophilic graphscommunity detectionForman-Ricci curvaturesheaf diffusionspectral clusteringmodularityunsupervised learninggraph embeddingk-NN affinity
Authors
Feifan Wang
Abstract
Detecting communities in heterophilic graphs -- where connected nodes often belong to different classes -- is hard for unsupervised methods: classical modularity and spectral methods are feature agnostic, while deep graph-clustering methods rely on contrastive or generative machinery that is opaque. We propose Curvature-Guided Sheaf Diffusion (CGSD), a fully unsupervised community-detection algorithm that uses the discrete Forman--Ricci curvature of each edge as its single topological signal, propagated through every stage of an end-to-end pipeline. CGSD makes three concrete contributions: (i)~a curvature-gated sheaf-diffusion encoder that gates edge messages by $σ(κ_e)$ and is trained from three label-free structural losses (modularity, anti-collapse, curvature-weighted reconstruction); (ii)~a curvature-aware spectral clusterer (CSpec) that re-weights the $k$-NN affinity of the embedding by $σ(ακ_{e^*})$ before Ng--Jordan--Weiss; and (iii)~a unified label-free evaluation against nine truly-unsupervised baselines. On five heterophilic benchmarks (Cora, Cornell, Texas, Wisconsin, Chameleon), CGSD wins outright on Wisconsin and Chameleon and is competitive on the remaining three against nine unsupervised baselines. The gain over the strongest baseline is driven by the clusterer, not the encoder: on the same embedding, CSpec improves mean NMI from $0.091$ with $K$-Means to $0.107$ ($+15\%$, paired $t$-test $p=0.008$). The mechanism is interpretable: intra-community and inter-community curvature distributions are visibly separated. Code is open-sourced at https://github.com/woodywff/cgsd.