Thresholded Local Hyper-Flow Diffusion

2026-06-08 • Machine Learning

Machine Learning

AI summaryⓘ

The authors introduce Thresholded Local Hyper-Flow Diffusion (TL-HFD), a new method for clustering data in hypergraphs that keeps computations focused around initial seed points, rather than spreading everywhere. Their approach updates the cluster progressively by expanding only important boundary areas, ensuring efficiency without losing accuracy. They prove mathematically that these localized updates are just as exact as global ones and provide guarantees on how well their method performs, even if stopped early. Tests show TL-HFD works as well or better than previous methods while examining less data, especially helping to avoid including unrelated points when the data is noisy.

Hyper-Flow DiffusionSeeded ClusteringSubmodular HypergraphsProjected Subgradient MethodCheeger-type GuaranteeLocal UpdatesSweep-cut GuaranteeDual SuboptimalityThresholdingOracle Sensitivity

Authors

Meher Chaitanya, Sebastian Dalleiger, Luana Ruiz

Abstract

Local Hyper-Flow Diffusion (HFD) gives an edge-size-independent Cheeger-type guarantee for seeded clustering in general submodular hypergraphs, but existing HFD solvers do not keep intermediate computation local at every iteration. We introduce Thresholded Local HFD (TL-HFD), a first-order method that maintains an active region around the seeds, performs projected subgradient updates on that region and its immediate boundary, and expands via thresholded (top-k) boundary activation. We prove that the local update is exact: the degree-preconditioned projected subgradient step restricted to the active region and its boundary coincides with the unrestricted global update. We establish finite-time dual suboptimality for both exact and thresholded updates, treating the latter as inexact projected subgradient steps with explicit skipped-boundary error. We further derive an additive activated-volume bound controlled by realized local subgradient norms and the minimum boundary-push among newly activated vertices, and translate approximate dual optimality with localized support into a robust sweep-cut guarantee for early-stopped iterates. For general submodular cut-costs, each iteration is local in the scanned region and oracle-sensitive in the hyperedge primitive. Empirically, TL-HFD often matches or improves over HFD while activating less volume, with the largest gains on noisy instances where diffusion tends to absorb non-target vertices.

View PDFOpen arXiv