Estimating Staged Event Tree Models via Hierarchical Clustering on the Simplex
2026-03-16 • Machine Learning
Machine Learning
AI summaryⓘ
The authors study a way to build staged tree models, which improve Bayesian networks by adding detailed context-specific rules. They introduce a new method that uses hierarchical clustering with different ways to measure distance between probabilities. Through simulations, they find that using Total Variation distance with Ward.D2 linkage gives the best balance of model quality and speed. Compared to a common alternative, their approach works just as well but runs faster, making it better for big or quick projects.
Staged tree modelsBayesian networksHierarchical clusteringProbability simplexTotal Variation distanceWard.D2 linkageBackward Hill ClimbingBayesian Information CriterionHamming distance
Authors
Muhammad Shoaib, Eva Riccomagno, Manuele Leonelli, Gherardo Varando
Abstract
Staged tree models enhance Bayesian networks by incorporating context-specific dependencies through a stage-based structure. In this study, we present a new framework for estimating staged trees using hierarchical clustering on the probability simplex, utilizing simplex basesd divergences. We conduct a thorough evaluation of several distance and divergence metrics including Total Variation, Hellinger, Fisher, and Kaniadakis; alongside various linkage methods such as Ward.D2, average, complete, and McQuitty. We conducted the simulation experiments that reveals Total Variation, especially when combined with Ward.D2 linkage, consistently produces staged trees with better model fit, structure recovery, and computational efficiency. We assess performance by utilizing relative Bayesian Information Criterion (BIC), and Hamming distance. Our findings indicate that although Backward Hill Climbing (BHC) delivers competitive outcomes, it incurs a significantly higher computational cost. On the other, Total Variation divergence with Ward.D2 linkage, achieves similar performance while providing significantly better computational efficiency, making it a more viable option for large-scale or time sensitive tasks.