Encoding Phylogenetic Networks with Least Common Ancestor Constraints

2026-06-15 · Discrete Mathematics

AI summary

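The authors study how to encode phylogenetic networks, modeled as directed acyclic graphs with a fixed leaf set, using only information about least common ancestors (LCAs) of leaf pairs. The underlying relation records, for two pairs of leaves, that both LCAs are well-defined and that the first is a descendant of the second. They show that this relation determines exactly the 2-regularization of the graph, which is obtained by removing all vertices that are not LCAs of one or two leaves and then deleting shortcut edges. As a consequence, several classes of networks are uniquely identified by their LCA relation, including phylogenetic trees, regular level-1 networks, and binary normal networks. They also introduce a sparse restriction that uses only comparisons of the form (ab, ac) and show that, for graphs with the 2-LCA property, it still allows several of these classes to be reconstructed in polynomial time.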

phylogenetic network, least common ancestor (LCA), directed acyclic graph (DAG), 2-regularization, reconstruction, shortcut edges, level-1 networks, clustering systems, weak hierarchies, normal networks
Authors
Marc Hellmuth, Anna Lindeberg, Vincent Moulton
Abstract
Encoding phylogenetic networks by suitable substructures is a central problem in phylogenetic combinatorics. We study encodings based on least common ancestor (LCA) constraints. For a directed acyclic graph (DAG) $G$ with leaf set $X$, we consider the relation on pairs of leaves in which $(ab,xy)$ records that the LCAs of $a,b$ and $x,y$ are well-defined and that the former is a descendant of the latter. We first identify precisely which part of $G$ is determined by this relation. To this end, we compare the canonical DAG constructed from the LCA relation with the 2-regularization of $G$, obtained by removing all vertices that are not LCAs of one or two leaves and then deleting shortcut edges. We prove that these two DAGs are isomorphic. Hence the obstruction to encoding a graph by its LCA relation is exactly the information lost under 2-regularization. This yields a general reconstruction principle, which we apply to several natural classes of phylogenetic networks. In particular, we show that shortcut-free 2-LCA-relevant DAGs, phylogenetic trees, regular level-1 networks, regular networks with binary clustering systems, regular networks whose clustering systems are closed weak hierarchies, strong-phylogenetic normal networks, separated phylogenetic normal networks, and binary normal networks are encoded by their LCA relations. We also introduce a sparse triple-like restriction consisting only of comparisons of the form $(ab,ac)$, where $a,b,c\in X$ are pairwise distinct. For graphs with the 2-LCA property, we show that this sparse relation, together with the leaf set, determines the full LCA relation after a natural closure operation. Consequently, several of the above classes can be reconstructed, up to isomorphism, from the sparse relation in polynomial time.
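To make the definitions in the abstract concrete, the following Python sketch computes the LCA relation and its sparse restriction for a small DAG by brute force. It is an illustration only and not the authors' construction or reconstruction algorithm. Several points are assumptions made for this sketch: the DAG is given as a dictionary mapping each vertex to a list of its children, "descendant" is read reflexively (a vertex is a descendant of itself), an LCA is "well-defined" when there is exactly one minimal common ancestor, a pair may consist of a single leaf (a = b), and unordered pairs are stored as sorted tuples. All function names (`descendants`, `unique_lca`, `lca_relation`, `sparse_relation`) are hypothetical.

```python
from itertools import combinations_with_replacement


def descendants(dag):
    """Map each vertex to the set of its descendants, including itself."""
    memo = {}

    def visit(v):
        if v not in memo:
            memo[v] = {v}.union(*(visit(c) for c in dag[v]))
        return memo[v]

    for v in dag:
        visit(v)
    return memo


def unique_lca(desc, a, b):
    """Return the LCA of leaves a, b if it is unique, otherwise None."""
    common = [v for v in desc if a in desc[v] and b in desc[v]]
    # A common ancestor is minimal if no other common ancestor lies below it.
    minimal = [v for v in common
               if not any(u != v and u in desc[v] for u in common)]
    return minimal[0] if len(minimal) == 1 else None


def lca_relation(dag):
    """All (p, q) with well-defined LCAs such that lca(p) is a descendant of lca(q)."""
    desc = descendants(dag)
    leaves = sorted(v for v in dag if not dag[v])
    lca = {}
    for a, b in combinations_with_replacement(leaves, 2):
        v = unique_lca(desc, a, b)
        if v is not None:
            lca[(a, b)] = v
    return {(p, q) for p in lca for q in lca if lca[p] in desc[lca[q]]}


def sparse_relation(rel):
    """Keep only comparisons (ab, ac) on three pairwise distinct leaves."""
    return {(p, q) for p, q in rel
            if len(set(p)) == 2 and len(set(q)) == 2
            and len(set(p) & set(q)) == 1}


# Hypothetical example: a rooted tree ((a, b), c) with inner vertex u and root r.
tree = {"r": ["u", "c"], "u": ["a", "b"], "a": [], "b": [], "c": []}
rel = lca_relation(tree)
sparse = sparse_relation(rel)

# Hypothetical example: a and b have two incomparable common ancestors p and q,
# so their LCA is not well-defined and the pair (a, b) never appears.
diamond = {"r": ["p", "q"], "p": ["a", "b"], "q": ["a", "b"], "a": [], "b": []}
rel_diamond = lca_relation(diamond)
```

In the tree example, `(("a", "b"), ("a", "c"))` belongs to `rel` because the LCA of a and b lies below the root, which is the LCA of a and c, while the reversed comparison does not. The brute-force search over all vertices is chosen for readability rather than efficiency.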