A Geometric Lens on Physics-Aligned Data Compression

2026-06-02 · Machine Learning

AI summary

The authors study how training learned compressors to preserve important physical properties of scientific data affects the quality of the reconstructed data. They find that, at a fixed bitrate, prioritizing a target physical observable can degrade standard reconstruction fidelity, because the latent directions along which compression error should be suppressed for the observable do not always line up with those that matter for the distortion metric. The authors explain this tradeoff with a local geometric theory and introduce a diagnostic, based on the overlap of dominant eigenspaces, that measures how well these directions align. Their experiments across scientific domains indicate that the alignment diagnostic correlates with the observed tradeoff between preserving physics and maintaining data quality.

AI compression, physics-informed loss, rate-distortion, latent space, entropy model, distortion metric, anisotropic error, eigenspace overlap, tangent space, observable preservation
Authors
Aleix Segui, Wesley Armour
Abstract
In AI for Science, physics-informed losses are increasingly used to train learned compressors for scientific data, but their rate-distortion implications remain poorly understood. At fixed bitrate, these objectives often improve preservation of a target physical observable while degrading standard reconstruction fidelity. We develop a local geometric theory showing that this tradeoff is governed by the interaction of latent-space sensitivities induced by the entropy model, the physical observable, and the distortion metric. At each operating point, these induce preferred directions along which compression noise should be suppressed, yielding an anisotropic error-allocation mechanism. When these directions are misaligned, improving the observable at fixed rate necessarily worsens standard distortion, establishing a fundamental limit on simultaneous preservation. We formalise this through a local tangent-space rate-distortion law and introduce a practical alignment diagnostic based on dominant eigenspace overlap. Experiments across scientific domains test the theory and validate that the alignment diagnostic correlates with observed data- and physics-space trade-offs.
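
To make the idea of an alignment diagnostic based on dominant eigenspace overlap concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation, and the abstract does not give their exact definition. The sketch assumes that each sensitivity (for example, that of the physical observable or of the distortion metric) is summarized by a symmetric positive semidefinite matrix over latent directions. It scores alignment as the normalized squared Frobenius norm of the projection between the two top-k eigenspaces, which equals the mean squared cosine of their principal angles. The function names `top_eigenspace` and `eigenspace_overlap`, the choice of `k`, and the toy matrices are all hypothetical.

```python
import numpy as np


def top_eigenspace(M, k):
    """Return an orthonormal basis (as columns) for the k dominant
    eigenvectors of a symmetric matrix M."""
    M = 0.5 * (M + M.T)                 # symmetrize against round-off
    _, eigvecs = np.linalg.eigh(M)      # eigenvalues come back in ascending order
    return eigvecs[:, -k:]              # last k columns = largest eigenvalues


def eigenspace_overlap(A, B, k):
    """Overlap in [0, 1] between the top-k eigenspaces of A and B.

    1.0 means the dominant subspaces coincide (aligned sensitivities);
    0.0 means they are orthogonal (fully misaligned).
    """
    U = top_eigenspace(A, k)
    V = top_eigenspace(B, k)
    return float(np.linalg.norm(U.T @ V, "fro") ** 2 / k)


# Toy sensitivity matrices (assumed for illustration only).
G_observable = np.diag([10.0, 5.0, 0.1, 0.05])    # sensitive along axes 0 and 1
G_aligned = np.diag([8.0, 6.0, 0.2, 0.1])         # distortion sensitive along the same axes
G_misaligned = np.diag([0.1, 0.2, 6.0, 8.0])      # distortion sensitive along axes 2 and 3

print(eigenspace_overlap(G_observable, G_aligned, k=2))
print(eigenspace_overlap(G_observable, G_misaligned, k=2))
```

In this toy setup the first pair shares its dominant subspace and the second pair has orthogonal dominant subspaces, so the two scores sit at the opposite ends of the [0, 1] range. Under the abstract's argument, a low score would indicate an operating point where improving the observable at fixed rate comes at the expense of standard distortion.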