Neural Legendre-Fenchel transform with Hessian Preconditioning

2026-06-08Machine Learning

Machine Learning
AI summary

The authors studied a method to better approximate something called the Legendre-Fenchel transform, which is important in convex analysis and machine learning. They improved a neural network approach by using math tricks related to affine transformations and Hessians to simplify the problem near the function’s minimum. This preconditioning makes it easier for the neural network to learn the transform, especially for hard-to-handle functions, without adding much computational cost. Their experiments showed faster and more accurate results, though the authors also noted some limitations of their approach.

Legendre-Fenchel transformconvex analysisconvex conjugateaffine invarianceHessianpreconditioningresidual networksTaylor approximationeigendecompositionmachine learning
Authors
Basile Plus-Gourdon, Frank Nielsen
Abstract
The Legendre-Fenchel (LF) transform is a fundamental tool in convex analysis and machine learning that maps lower semi-continuous functions to their convex conjugates. In practice, when closed-form formula are not available for expressing convex conjugates of given functions, one must approximate them using various techniques. One recent such versatile numerical method is the deep Legendre transform method which relies on neural networks although it remains challenging particularly for tackling ill-conditioned functions. This work builds on the reformulation of the LF transform as a projective polarity. A notable property of this framework is its affine invariance. We leverage this affine invariance to introduce a Hessian-based preconditioning strategy. Specifically, we apply an affine deformation around a minimizer so that the second-order Taylor approximation of the function coincides with the canonical paraboloid, whose conjugation map is the identity. A residual network initialized near the identity can then learn this simplified mapping, while the original conjugation map is recovered through the inverse deformation. The proposed preconditioning incurs only a modest computational overhead, consisting of a single eigendecomposition during initialization and two matrix-vector multiplications per query. Experiments on a diverse set of convex functions, including high-dimensional benchmarks, demonstrate improved convergence rates and enhanced numerical accuracy of the conjugation, with particularly significant gains for ill-conditioned problems. Finally, we discuss the scope of applicability of our proposed method and highlight several of its limitations.