Local Preferential Bayesian Optimization

2026-06-01 • Machine Learning

AI summary

The authors develop methods that extend Bayesian optimization to settings where feedback comes as pairwise human preferences rather than explicit scores. Instead of searching the whole space at once, their methods search locally around promising regions, which scales better to problems with many variables. One variant restricts the search to a trust region; another uses first- and second-order derivatives of the approximate model posterior to decide where to look next. In their benchmarks, the local methods can substantially reduce cumulative regret relative to global preference-based baselines, particularly on high-dimensional problems and landscapes with steep optima, which makes them suited to tasks such as tuning policies from human feedback.

Bayesian optimization, Preferential Bayesian optimization, Pairwise feedback, High-dimensional optimization, Trust-region methods, Gaussian processes, Laplace approximation, Derivative-informed search, Cumulative regret, Policy search

Authors

Johanna Menn, Miriam Kober, Paul Brunzema, David Stenger, Sebastian Trimpe

Abstract

Bayesian optimization (BO) is a popular and effective approach for tuning expensive, noisy experiments, but it requires the formulation of an explicit objective function. Preferential BO (PBO) removes this requirement by learning from pairwise human feedback, yet existing methods struggle to optimize efficiently beyond low- and medium-dimensional problems because they search globally. We address this limitation by developing a family of local PBO methods that transfer key ideas from high-dimensional BO to the preferential setting. In particular, these methods adapt trust-region and derivative-informed local search to pairwise preference feedback, where the latter exploits first- and second-order derivatives of the Laplace-approximated Gaussian process (GP) posterior. Our benchmark on GP sample paths, standard optimization benchmark functions, and policy-search tasks shows that local PBO methods are especially effective in high-dimensional and complex landscapes with steep optima. Compared with global preference-based baselines, they can substantially reduce cumulative regret, making them particularly useful for real-world preference-based optimization tasks such as policy search.
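
To make the two ingredients named in the abstract more concrete, the following is a minimal sketch of (a) a Laplace approximation to a GP posterior under pairwise preference data and (b) a trust-region size update driven by a preference outcome. It is not the authors' implementation. The abstract does not state the likelihood, kernel, or trust-region schedule, so the logistic (Bradley-Terry) likelihood, the RBF kernel, the doubling/halving rule, and all function and parameter names (`rbf_kernel`, `laplace_map`, `update_trust_region`, `grow`, `shrink`) are assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel between rows of X and Y (assumed kernel)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)


def laplace_map(K, pairs, n_iter=100, tol=1e-10, jitter=1e-8):
    """Find the MAP latent utilities f by Newton's method.

    Assumes a logistic preference likelihood:
        P(winner preferred over loser) = sigmoid(f[winner] - f[loser]).
    pairs is an integer array of shape (m, 2) with rows (winner, loser).
    Returns f and W, the negative Hessian of the log-likelihood from the
    last Newton step. The Laplace posterior covariance is (K^-1 + W)^-1.
    """
    n = K.shape[0]
    K = K + jitter * np.eye(n)
    win, lose = pairs[:, 0], pairs[:, 1]
    f = np.zeros(n)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(f[win] - f[lose])))
        # Gradient of the log-likelihood with respect to f.
        grad = np.zeros(n)
        np.add.at(grad, win, 1.0 - p)
        np.add.at(grad, lose, -(1.0 - p))
        # Negative Hessian of the log-likelihood (positive semidefinite).
        c = p * (1.0 - p)
        W = np.zeros((n, n))
        np.add.at(W, (win, win), c)
        np.add.at(W, (lose, lose), c)
        np.add.at(W, (win, lose), -c)
        np.add.at(W, (lose, win), -c)
        # Newton step: f_new = (K^-1 + W)^-1 (W f + grad),
        # rewritten as (I + K W)^-1 K (W f + grad) to avoid inverting K.
        f_new = np.linalg.solve(np.eye(n) + K @ W, K @ (W @ f + grad))
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    return f, W


def update_trust_region(length, candidate_preferred,
                        grow=2.0, shrink=0.5, l_min=1e-3, l_max=1.0):
    """Hypothetical rule: expand the trust region when the new candidate is
    preferred over the incumbent, shrink it otherwise."""
    factor = grow if candidate_preferred else shrink
    return float(np.clip(length * factor, l_min, l_max))


# Three 1-D points; the comparisons say x=2 beats x=0 and x=1, and x=1 beats x=0.
X = np.array([[0.0], [1.0], [2.0]])
pairs = np.array([[2, 0], [2, 1], [1, 0]])
f_map, W = laplace_map(rbf_kernel(X, X), pairs)
```

In this toy example the MAP utilities come out ordered consistently with the comparisons, so `f_map[2]` is the largest. A derivative-informed method of the kind the abstract describes would additionally differentiate the approximate posterior with respect to the input; that step is omitted here because the abstract gives no details on how it is done.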
