Everywhere Learning: Artificial Intelligence with Pointwise Constraints

2026-06-01 · Machine Learning

AI summary

The authors study everywhere learning, a training paradigm in which an AI system must satisfy loss constraints with probability one over the data distribution, rather than only performing well on average. They develop an approximate duality theory that relates solutions found on training data (the empirical problem) to solutions defined over the full data distribution (the statistical problem). The analysis shows that dual variables reweight the data toward points where the constraints are harder to satisfy, and that the generalization gap is governed by the mismatch between where the data distribution concentrates its mass and where those hard points concentrate. They also show that a sparse L1 penalty on constraint relaxations can control this gap. Finally, they illustrate the approach with an experiment in agentic classification for language model tasks.

everywhere learning, loss constraints, generalization, duality theory, empirical problem, statistical problem, L1 penalty, constraint relaxation, agentic classification, language models
Authors
Ignacio Boero, Ignacio Hounie, Luiz Chamon, Alejandro Ribeiro
Abstract
Everywhere learning is a new paradigm whereby Artificial Intelligence (AI) systems are trained to satisfy loss constraints with probability one over the data distribution. This is in contrast to the standard paradigm of training AI systems to minimize average losses. We develop an approximate duality theory to substantiate a generalization analysis that establishes the proximity between solutions of empirical and statistical everywhere learning problems. Our results show that dual variables reweigh the data distribution towards points in which loss constraints are more difficult to satisfy and that generalization is controlled by the mismatch between the concentration of mass of the data distribution and the concentration of mass on points where constraints are more difficult to satisfy. We further show that we can control generalization with a sparse L1 penalty on constraint relaxations. We illustrate the merits of everywhere learning with an experiment in agentic classification for language model tasks.
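The role of the dual variables described in the abstract can be illustrated with a toy primal-dual loop. This is a minimal sketch under stated assumptions and not the authors' algorithm: the scalar model, the squared loss, the choice of the average loss as objective, the step sizes, and the names `primal_dual_fit`, `eps`, and `cap` are all hypothetical. One dual variable is kept per sample, and it grows only while that sample violates its constraint. The cap on the dual variables is the standard dual-side effect of adding an L1 penalty with weight `cap` on nonnegative per-sample constraint relaxations.

```python
import numpy as np

def primal_dual_fit(y, eps, cap, steps=2000, lr_theta=0.1, lr_dual=0.05):
    """Toy sketch: fit a scalar theta so that (theta - y_i)^2 <= eps for every i.

    Assumptions (not from the paper): the objective is the average squared
    loss, and `cap` is the weight of an L1 penalty on per-sample relaxations,
    which bounds each dual variable in [0, cap].
    """
    theta = 0.0
    lam = np.zeros_like(y, dtype=float)  # one dual variable per sample
    for _ in range(steps):
        # Primal step: gradient of the reweighted average loss
        # mean((1 + lam_i) * (theta - y_i)^2) with respect to theta.
        grad = np.mean((1.0 + lam) * 2.0 * (theta - y))
        theta -= lr_theta * grad
        # Dual step: projected ascent on each constraint violation.
        violation = (theta - y) ** 2 - eps
        lam = np.clip(lam + lr_dual * violation, 0.0, cap)
    return theta, lam

# Three easy samples and one hard sample.
y = np.array([0.0, 0.0, 0.0, 1.0])

# Large cap: constraints are enforced on every sample.
theta, lam = primal_dual_fit(y, eps=0.3, cap=10.0)
losses = (theta - y) ** 2

# Small cap: the hard sample's constraint is relaxed instead.
theta_relaxed, lam_relaxed = primal_dual_fit(y, eps=0.3, cap=0.5)

print("theta:", theta, "dual variables:", lam)
print("relaxed theta:", theta_relaxed, "dual variables:", lam_relaxed)
```

In this toy, the average-loss minimizer `np.mean(y)` violates the constraint on the hard sample, while the primal-dual iterate shifts toward it because only that sample accumulates a positive dual variable. Lowering `cap` trades constraint satisfaction on the hard sample for a solution closer to the average-loss minimizer.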