MAdam: Metric-Aware Multi-Objective Adam

2026-06-02 | Machine Learning

Machine Learning, Computer Vision and Pattern Recognition
AI summary

The authors look at how multi-objective optimization (MOO) methods, which try to balance several goals at once, usually rely on a popular optimizer called Adam. They found that using Adam directly causes two problems: it mixes up the importance of each goal over time and changes the geometric properties the MOO methods expect. To fix this, they created MAdam, a wrapper that adjusts the way directions are given to Adam without changing the solver or Adam itself. Their experiments show that MAdam improves performance in various tasks involving multiple objectives.

multi-objective optimization, Adam optimizer, gradient balancing, Pareto trade-offs, curvature, scalarized objective, preconditioning, multi-task learning, adaptive metrics, Pareto front
Authors
Fengbei Liu, Rachit Saluja, Sunwoo Kwak, Ruibo Wang, Ruining Deng, Heejong Kim, Johannes C. Paetzold, Mert R. Sabuncu
Abstract
Multi-objective optimization (MOO) underlies many machine learning problems, yet MOO solvers across the loss-balancing, gradient-balancing, and Pareto-based families almost universally hand their reconciled directions to Adam (Kingma and Ba, 2015). We show this coupling introduces two systematic gaps between the solver's intent and the optimizer's execution. The first is a weighting mismatch: Adam's second-moment denominator entangles the time-varying preference vector with gradient statistics, marginalizing the preference into a history average and collapsing distinct Pareto trade-offs toward a near-uniform mixture. The second is a geometric mismatch: Adam's adaptive metric distorts the Euclidean geometry MOO solvers assume, turning aligned objectives into apparent conflicts. To resolve both jointly, we introduce MAdam (Metric-Aware Multi-Objective Adam), a drop-in wrapper that leaves both solver and optimizer unchanged. MAdam preconditions the reconciled direction by the preference-conditioned curvature of the scalarized objective; on this whitened input, Adam's second moment collapses to identity, so the realized update is governed by the preference-conditioned metric. Across multi-task learning, Pareto-front recovery, physics-informed neural networks, and medical imaging, MAdam consistently improves over Adam for every solver family.
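
The weighting mismatch can be illustrated with a deliberately simplified toy case. The sketch below is not the authors' method or experimental setup. It assumes two objectives whose gradients touch disjoint coordinates, constant preference weights, and the standard Adam update with default hyperparameters; the helper names `adam_direction` and `scalarized_grads` are hypothetical. Under those assumptions, Adam's per-coordinate normalization divides the preference weight back out, so two very different preferences produce nearly the same step direction.

```python
import math

def adam_direction(grads, beta1=0.9, beta2=0.999, eps=1e-8):
    """Feed a non-empty sequence of gradient vectors through standard Adam
    moment estimates and return the final bias-corrected step direction
    m_hat / (sqrt(v_hat) + eps), without the learning rate."""
    dim = len(grads[0])
    m = [0.0] * dim  # first-moment estimate
    v = [0.0] * dim  # second-moment estimate
    for t, g in enumerate(grads, start=1):
        m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
        v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    return [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]

# Toy assumption: objective 1 only affects coordinate 0,
# objective 2 only affects coordinate 1.
G1 = [1.0, 0.0]
G2 = [0.0, 1.0]

def scalarized_grads(w, n_steps=50):
    """Weighted-sum (scalarized) gradient, repeated for n_steps steps."""
    g = [w[0] * a + w[1] * b for a, b in zip(G1, G2)]
    return [g for _ in range(n_steps)]

# Two opposite preferences over the two objectives.
d_a = adam_direction(scalarized_grads([0.9, 0.1]))
d_b = adam_direction(scalarized_grads([0.1, 0.9]))

# The raw scalarized gradients differ by a factor of 9 per coordinate,
# but the Adam step directions are nearly identical.
print(d_a, d_b)
```

This covers only the constant-weight, disjoint-coordinate special case. The abstract's claim concerns time-varying preferences and general gradient statistics, which this sketch does not model.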