Functional Gradient Descent with Adaptive Representations

2026-06-15 • Machine Learning

AI summary

The authors study functional gradient descent (FGD), which optimizes directly over functions rather than over the parameters of a fixed representation such as a neural network. This is hard in practice because functional gradients are infinite-dimensional and cannot be fully computed or stored. They propose an FGD algorithm that adapts the representation of the functional gradients during optimization and accounts for the resulting approximation error in its analysis. They prove convergence to a stationary point for smooth losses, and to a global minimizer when a Polyak-Lojasiewicz-type condition also holds. In experiments on regression, numerical solution of PDEs, and computer vision, the method outperforms both FGD with fixed approximations and neural network baselines in efficiency and accuracy.

Functional Gradient Descent, Nonconvex optimization, Infinite-dimensional space, Polyak-Lojasiewicz condition, Convergence analysis, Neural networks, Numerical PDEs, Regression, Function space, Approximation error

Authors

Daniel Csillag, Rodrigo Schuller, Pedro Dall'Antonia, Leonidas Guibas, Luiz Velho, Tiago Novello

Abstract

Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, resulting in highly nonconvex losses that complicate both training and theoretical analysis. An interesting alternative is functional gradient descent (FGD), that is, gradient descent directly in function space, which benefits from strong convergence results and admits a clean theory. However, FGD is difficult to implement in practice because functional gradients are infinite-dimensional, and thus can be neither fully computed nor stored in memory. Existing implementations therefore rely on fixed approximations, which introduce approximation error. We propose a new, theoretically grounded FGD algorithm that adapts the representation of the functional gradients over the course of optimization. By explicitly incorporating this approximation into the analysis, we establish convergence to a stationary point (for smooth losses) and to a global minimizer (under smoothness + a Polyak-Lojasiewicz-type condition) regardless of our approximations. To the best of our knowledge, this is the first implementable FGD method with such guarantees in a general setting. We demonstrate the effectiveness of our method on regression, numerical solution of PDEs, and modern computer vision. Across settings, our method consistently outperforms both FGD with fixed approximations and neural network baselines in efficiency and accuracy.
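To make the idea of gradient descent in function space with an approximated gradient concrete, here is a minimal sketch for least-squares regression. It is not the authors' algorithm: it is a boosting-style stand-in in which the functional gradient (the residual at the sample points) is projected at every step onto a freshly drawn random cosine basis, so the representation changes between steps. The function name `fgd_regression`, the random-feature basis, and all hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def fgd_regression(x, y, steps=100, lr=0.5, n_features=8, seed=0):
    """Illustrative FGD for the loss 0.5 * mean((f(x) - y)**2).

    Hypothetical sketch: each step approximates the functional gradient
    with a new random cosine basis (an assumption, not the paper's scheme).
    """
    rng = np.random.default_rng(seed)
    components = []              # stored (freqs, phases, scaled coefficients)
    pred = np.zeros_like(y)      # f starts as the zero function
    losses = []
    for _ in range(steps):
        # Functional gradient of the squared loss, evaluated at the samples
        # (up to the constant factor 1/n).
        grad = pred - y
        # Draw a new finite basis for this step.
        freqs = rng.normal(scale=3.0, size=n_features)
        phases = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        phi = np.cos(np.outer(x, freqs) + phases)
        # Least-squares projection of the gradient onto the basis.
        coeffs, *_ = np.linalg.lstsq(phi, grad, rcond=None)
        # Gradient step in function space: f <- f - lr * approx_grad.
        components.append((freqs, phases, lr * coeffs))
        pred = pred - lr * (phi @ coeffs)
        losses.append(0.5 * np.mean((pred - y) ** 2))

    def f(x_new):
        # Evaluate the learned function as the sum of all stored steps.
        out = np.zeros(len(x_new))
        for fr, ph, c in components:
            out -= np.cos(np.outer(x_new, fr) + ph) @ c
        return out

    return f, losses


# Toy usage: fit a noisy sine curve.
_rng = np.random.default_rng(1)
x_train = np.linspace(-1.0, 1.0, 200)
y_train = np.sin(3.0 * x_train) + 0.05 * _rng.normal(size=x_train.shape)
f_hat, loss_history = fgd_regression(x_train, y_train)
```

Because each update subtracts a scaled orthogonal projection of the residual, the training loss in this toy setting cannot increase for step sizes between 0 and 2. That property belongs to this simple quadratic example only and should not be read as a restatement of the paper's convergence guarantees.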
