Optimal Design for Multinomial Logit Model with Applications to Best Assortment Identification

2026-05-25Machine Learning

Machine Learning
AI summary

The authors study how to best design experiments for MNL bandits, where an agent picks groups of items and gets feedback on only one chosen item. Since these choices form a large combinatorial space, standard methods are too slow or complex. They propose two efficient methods: one using a special integer programming reformulation, and another using a faster, approximate surrogate approach. They show these methods balance accuracy and speed well, and apply them to find the best item sets with theoretical guarantees on the number of samples needed.

Multinomial Logit (MNL) ModelBanditsOptimal Experimental DesignCombinatorial Action SpaceMixed-Integer Linear Program (MILP)Kiefer-Wolfowitz Equivalence TheoremG-optimalityBest Arm IdentificationSample ComplexityLinear Utilities
Authors
Joongkyu Lee, Min-hwan Oh
Abstract
We study optimal experimental design for multinomial logit (MNL) bandits, where an agent repeatedly selects a subset of $K$ items from a ground set of size $N$ and observes single-choice feedback. Unlike linear or generalized linear bandits, MNL bandits have a combinatorial action space, which makes classical optimal design approaches and naive optimization over all subsets computationally intractable. We propose a computationally efficient optimal design framework for MNL models that achieves both statistical efficiency and scalability through two complementary approaches: (i) an exact or certified-approximate reformulation of the design oracle as a $0$-$1$ mixed-integer linear program (MILP) with solver-certified early stopping, and (ii) a fully polynomial-time lifted design that replaces the nonlinear objective with a tractable surrogate. Using the Kiefer-Wolfowitz equivalence theorem, we establish near G-optimality guarantees and characterize the induced statistical-computational trade-offs. As an application, we develop a best assortment identification algorithm for MNL bandits with linear utilities and non-uniform revenues, and prove an instance-dependent sample complexity of $\tilde{O}\big(\frac{d \log N}{Δ^2}\big)$, where $d$ is the feature dimension, $N$ is the number of arms, and $Δ$ is the minimum revenue gap.