The Algebra of Units: From Buckingham's Pi-grec Theorem to Latent-Variable Learning
2026-06-15 • Machine Learning
AI summary
The author shows how the dimensionless numbers that describe a physical system can be found automatically from data, without expert knowledge. Applying singular value decomposition (SVD) to log-transformed measurements reveals a low-dimensional structure that corresponds to these numbers. On a synthetic compressor dataset of 16,000 measurements, the method recovered known engineering groups, such as the Mach number, and reproduced the performance map with an error below 0.01%. This work connects traditional dimensional analysis with modern data-driven methods, suggesting new ways to build interpretable and efficient physical models.
Buckingham Pi theorem, dimensionless numbers, singular value decomposition, logarithmic transformation, compressor performance, dimensional analysis, manifold, engineering groups, data-driven modeling, scaling laws
Authors
Mauro Valorani
Abstract
Engineers often measure many quantities (speed, pressure, temperature, length) expressed in different physical units. The Buckingham Pi-grec theorem states that these variables can always be combined into a smaller set of dimensionless numbers whose values fully determine the system's behaviour. Identifying the appropriate dimensionless groups has traditionally required expert knowledge and physical insight. This paper shows that they can instead be discovered automatically from data, without prior knowledge of the governing physics. The key observation is that, after logarithmic transformation, measurements collected under different scalings of the same system lie on a low-dimensional manifold whose geometry is determined by the underlying dimensionless groups. Singular value decomposition (SVD) identifies this manifold directly from data. A subsequent search over integer-exponent combinations recovers candidate dimensionless quantities, while a repeating-variable filter retains only those constructed from the machine's characteristic scales. This procedure recovers familiar engineering groups, including the flow coefficient, head coefficient, and Mach number, while excluding equivalent but less interpretable alternatives. The method is demonstrated on a synthetic compressor dataset containing 16,000 measurements. Starting from raw dimensional variables and no physics input, it recovers the correct dimensionless groups to numerical precision and reproduces the compressor performance map with an error below 0.01%. More broadly, the work reveals a close connection between classical dimensional analysis and modern data-driven learning. Both rely on the same underlying algebraic structure, suggesting new approaches for building physical models that are simultaneously interpretable, scalable, and data-efficient.
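The three steps named in the abstract (SVD of log-transformed data, integer-exponent search, repeating-variable filter) can be illustrated with a minimal Python sketch. It is not the paper's implementation. The variable set (Q, N, D, gH, a), the base operating point, the exponent range of -3 to 3, the tolerances, and the specific form of the filter are all assumptions made for illustration. The data are synthesized by rescaling the units of length and time at one operating point, so the log-measurements span a two-dimensional subspace, and any exponent vector orthogonal to that subspace defines a dimensionless group.

```python
import itertools
import numpy as np

# Hypothetical variable set for a turbomachine (names are illustrative):
# volumetric flow Q, shaft speed N, diameter D, specific head gH, sound speed a.
names = ["Q", "N", "D", "gH", "a"]

# Dimension matrix (rows: length, time). Used ONLY to synthesize the data;
# the recovery steps below never read it.
dim = np.array([[3.0, 0.0, 1.0, 2.0, 1.0],
                [-1.0, -1.0, 0.0, -2.0, -1.0]])

rng = np.random.default_rng(0)
x0 = np.array([2.0, 50.0, 0.5, 800.0, 340.0])  # arbitrary base operating point
log_lam = rng.normal(size=(500, 2))            # random log-rescalings of length and time
log_x = np.log(x0) + log_lam @ dim             # (500, 5) log-measurements

# Step 1: SVD of the centered log-data. The directions with non-negligible
# singular values span the subspace swept out by the rescalings.
centered = log_x - log_x.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
rank = int(np.sum(s > 1e-8 * s[0]))
scaling_basis = vt[:rank]

# Step 2: brute-force search over small integer exponent vectors that are
# orthogonal to the scaling subspace, i.e. invariant under every rescaling.
grid = np.array(list(itertools.product(range(-3, 4), repeat=len(names))))
grid = grid[np.any(grid != 0, axis=1)]         # drop the all-zero vector
invariant = np.max(np.abs(grid @ scaling_basis.T), axis=1) < 1e-8
candidates = grid[invariant]

# Step 3: repeating-variable filter (one possible reading): keep groups in
# which exactly one non-repeating variable appears, with exponent +1, and
# everything else is built from the assumed characteristic scales N and D.
repeating = {"N", "D"}
free = [i for i, n in enumerate(names) if n not in repeating]
groups = [tuple(int(v) for v in e) for e in candidates
          if np.count_nonzero(e[free]) == 1 and e[free].sum() == 1]

for e in groups:
    print(" * ".join(f"{n}^{p}" for n, p in zip(names, e) if p != 0))
```

Under these assumptions the surviving groups are Q/(N D^3) (a flow coefficient), gH/(N^2 D^2) (a head coefficient), and a/(N D), which is the reciprocal of a Mach number based on N D. Fixing the non-repeating exponent at +1 is a convention chosen here to pick one representative per group; the paper's filter may normalize differently.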