Keywords
persistent homology, point clouds, graphs, classification, embedding, descriptor selection, margin bound, covariance shrinkage, Mahalanobis margin, confidence certificate
Authors
Sushovan Majhi, Atish Mitra, Žiga Virk, Pramita Bagchi
Abstract
We introduce PLACE (Persistence-Landmark Analytic Classification Engine), a closed-form pipeline for classifying point clouds and graphs through their persistent-homology signatures. Three quantitative guarantees -- a margin-based excess-risk rate, a closed-form descriptor-selection rule, and a per-prediction certificate -- are derived from training labels alone, with no learned weights and no held-out calibration set. The embedding sums Mitra-Virk single-point coordinate functions over a sparse landmark grid; closed-form weights maximize a structural distortion constant $\lambda(\nu)$ (a Lipschitz lower bound on $\mathcal{D}_n$ under non-interference). The guarantees are: (i) an $O(kR/(\Delta\sqrt{m_{\min}}))$ margin bound, driven by the class-mean separation $\Delta$ and embedding radius $R$ and matched by a minimax lower bound in the sample-starved regime; (ii) the Mahalanobis margin under Ledoit-Wolf-shrunk covariance is the strongest closed-form descriptor selector on a heterogeneous 64-descriptor chemical-graph pool (mean Spearman $\rho \approx +0.54$ across 10 benchmarks, positive on 9 of 10), while the isotropic surrogate $\Delta/\sqrt{\ell}$ admits a closed-form selection-consistency rate on homogeneous (14-15 descriptor) protein/social pools; (iii) a certificate decided at training time with no per-prediction overhead, available in a non-asymptotic Pinelis form and an asymptotic Gaussian plug-in form. Empirically, PLACE is the strongest diagram-based method on Orbit5k and matches the strongest topology-based baseline within statistical noise on MUTAG and COX2. The remaining gaps fall into two diagnosable regimes: descriptor blindness on NCI1/NCI109 and pool-coverage limits elsewhere. At our training-set sizes, both certificate radii exceed the firing threshold $\hat\Delta/2$ on every benchmark, dominated by the $\sqrt{\ell}$ scaling of the multivariate-norm bound, so the per-prediction certificate is constructive but not yet operational at these sizes.
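To make the descriptor-selection rule concrete: the Mahalanobis margin scores a candidate descriptor by the distance between the two class means under a shrunk pooled covariance. The sketch below is an illustrative two-class implementation under our own assumptions (it uses scikit-learn's `LedoitWolf` estimator on pooled within-class deviations; the function name `mahalanobis_margin` and the synthetic data are ours, not the authors' code), and it also reports the isotropic surrogate $\Delta/\sqrt{\ell}$ for comparison.

```python
import numpy as np
from sklearn.covariance import LedoitWolf

def mahalanobis_margin(X, y):
    """Mahalanobis distance between the two class means under a
    Ledoit-Wolf-shrunk pooled covariance (illustrative sketch)."""
    c0, c1 = np.unique(y)
    X0, X1 = X[y == c0], X[y == c1]
    # Pool within-class deviations, then shrink the pooled covariance.
    centered = np.vstack([X0 - X0.mean(axis=0), X1 - X1.mean(axis=0)])
    cov = LedoitWolf().fit(centered).covariance_
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def isotropic_surrogate(X, y):
    """Euclidean class-mean separation scaled by 1/sqrt(dimension)."""
    c0, c1 = np.unique(y)
    diff = X[y == c1].mean(axis=0) - X[y == c0].mean(axis=0)
    return float(np.linalg.norm(diff) / np.sqrt(X.shape[1]))

# Synthetic stand-in for one descriptor's embedded features.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 8)),
               rng.normal(1.0, 1.0, (50, 8))])
y = np.repeat([0, 1], 50)
print(mahalanobis_margin(X, y), isotropic_surrogate(X, y))
```

Under the selection rule, each descriptor in the pool is scored this way on the training embedding alone, and the descriptor with the largest margin is kept; no held-out data is consumed.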