MORL-A2C: Multi-Objective Reinforcement Learning Reranker for Optimizing Healthiness in MOPI-HFRS

2026-06-22 • Machine Learning

AI summary

The authors aim to improve food recommendation systems, which usually optimize for what people like and pay too little attention to health. They extend an earlier system, MOPI-HFRS, by building each recommendation list step by step, weighing health against user preference at every step. Their new method, MORL-A2C, uses an Advantage Actor-Critic reinforcement learning algorithm to balance the two, producing substantially healthier suggestions with only a modest drop in how well the recommendations match user tastes. They also fix a bug in the earlier evaluation pipeline so that comparisons are made against a corrected baseline. Overall, the results indicate that treating recommendation as a sequence of choices can yield healthier suggestions without giving up much preference accuracy.

Food recommendation system, Multi-objective optimization, Health-aware recommendation, Sequential decision-making, Pareto optimization, Advantage Actor-Critic (A2C), Graph Neural Network (GNN) embeddings, Behavior cloning, Nutritional benchmarks, Ranking metrics

Authors

Aarya Vasantlal, Joshua Zolla

Abstract

Unhealthy dietary behavior remains a persistent public health issue in the United States, exacerbated by recommendation systems that prioritize user preference without considering nutritional health. The Multi-Objective Personalized Interpretable Health-aware Food Recommendation System (MOPI-HFRS), which this work extends, addresses this by jointly optimizing preference, health, and diversity through Pareto-based optimization. However, that approach relies on static, per-step trade-off solutions that fail to capture the sequential nature of dietary decision-making. We introduce MORL-A2C, a sequential decision-making extension to MOPI-HFRS that targets the health-preference axis. Using frozen GNN embeddings, MORL-A2C formulates recommendation as a K-step reranking problem solved with an Advantage Actor-Critic algorithm and a scalarized relevance/health reward. The policy is warm-started via behavior cloning against a dot-product ranker derived from the same frozen embeddings. We also identify and correct a non-trivial bug in the MOPI-HFRS evaluation pipeline that understated baseline performance; all results are reported against the corrected baseline. On the macro-nutrient benchmark, MORL-A2C trades a modest reduction in ranking quality (Recall@20: 25.64% to 23.61%; NDCG@20: 23.52% to 20.64%) for a substantial improvement in health alignment (H-Score@20: 46.05% to 69.57%), with consistent trends on the full-nutrient benchmark. These findings indicate that policy-driven sequential optimization can effectively navigate the health-preference trade-off in multi-objective food recommendation.
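To make the abstract's ingredients concrete, the following is a minimal sketch of a dot-product ranker over frozen embeddings, a scalarized relevance/health reward, and a K-step slate construction. It is not the authors' implementation. The function names, the `weight` parameter, the linear form of the scalarization, and the toy data are all assumptions made for illustration, and a greedy selection rule stands in for the learned A2C policy.

```python
import numpy as np


def dot_product_ranking(user_emb, item_embs, k):
    """Return the indices of the top-k items by user-item dot product."""
    scores = item_embs @ user_emb
    # Negating the scores makes argsort return descending order;
    # the stable sort keeps ties in their original index order.
    return np.argsort(-scores, kind="stable")[:k]


def scalarized_reward(relevance, health, weight):
    """Assumed linear scalarization: `weight` on health, the rest on relevance."""
    return (1.0 - weight) * relevance + weight * health


def greedy_rerank(candidates, relevance, health, weight, k):
    """Fill k slots one step at a time.

    At each step, pick the remaining candidate with the highest scalarized
    reward. This greedy rule is only a stand-in for a learned policy.
    """
    remaining = list(candidates)
    slate = []
    for _ in range(min(k, len(remaining))):
        best = max(
            remaining,
            key=lambda i: scalarized_reward(relevance[i], health[i], weight),
        )
        slate.append(best)
        remaining.remove(best)
    return slate


# Toy data (hypothetical): three items, one user, 2-dimensional embeddings.
user_emb = np.array([1.0, 0.0])
item_embs = np.array([[0.9, 0.0], [0.5, 0.0], [0.1, 0.0]])
relevance = item_embs @ user_emb      # preference scores from the embeddings
health = np.array([0.0, 1.0, 1.0])    # made-up per-item health scores

preference_only = greedy_rerank([0, 1, 2], relevance, health, weight=0.0, k=2)
health_leaning = greedy_rerank([0, 1, 2], relevance, health, weight=0.8, k=2)
```

With `weight=0.0` the slate matches the pure dot-product ranking, while a larger `weight` pushes healthier items ahead of the most preferred one. This mirrors, in miniature, the trade-off between ranking quality and health alignment that the abstract reports.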
