Asymptotic enumeration of admixed arrays and a different independence heuristic

2026-04-10Information Theory

Information Theory
AI summary

The authors study special pairs of binary matrices called admixed arrays, which represent weighted connections in graphs and are useful in genetics. They explore two types of rules these arrays must follow, involving sums along rows and paired columns, and figure out how many such arrays exist under each rule alone and combined. By analyzing large-size cases, they link these counting problems to concepts from information theory and find a unique correction factor different from classical models. They also provide numerical evidence supporting their theory and release software to help count these matrices efficiently.

admixed arraysbinary matrixbipartite graphmarginal constraintsrow-sum constraintcolumn-sum constraintasymptotic enumerationentropy inequalitysaddle-point approximationinformation theory
Authors
Alan J. Aw
Abstract
We introduce a class of paired binary matrices called admixed arrays, which arise in analyses of large-scale genetic data and can be viewed as weighted edge colorings of complete bipartite graphs. This combinatorial structure gives rise to two natural families of marginal constraints: a row-sum constraint and a paired column-sum constraint, the latter inducing an inequality among entries of the matrix pair. We study the enumeration of admixed arrays under these constraints in dense regimes. First, we obtain exact formulas for the sizes of the families defined by each constraint in isolation and derive a finite-size criterion characterizing when one constraint is more restrictive than the other. In the large-dimension limit, this comparison simplifies to an entropy inequality, yielding an information-theoretic interpretation and a quantifiable error bound in the semi-regular case. We then analyze the asymptotic enumeration of the doubly constrained family in a semi-regular setting. Using saddle-point approximation and probabilistic techniques, we derive a detailed asymptotic expansion for the logarithm of the count, isolating an explicit fourth-moment contribution and establishing quantitative control of the higher-order remainder. A consequence of this analysis is a phenomenon absent from classical binary and integer matrix models: in the regime $N=Θ(P)$ with uniform margins and density bounded away from zero, the two constraint families obey the independence heuristic with a correction factor $1/\sqrt[4]{e}$ rather than the familiar $e^{\pm1/2}$. Numerical experiments corroborate the analytical approximations, and we implement and extend an algorithm of Miller and Harrison (2013) as open-source software to enumerate constrained admixed arrays.