When Gradients Collide: Failure Modes of Multi-Objective Prompt Optimization for LLM Judges
2026-05-25 • Computation and Language
Computation and Language, Artificial Intelligence, Machine Learning, Multiagent Systems, Software Engineering
AI summary
The authors studied how to improve large language model (LLM) judgment prompts when optimizing for multiple goals at once. They found that combining feedback for different tasks can weaken the helpfulness of the gradients, meaning the model's improvement signals become less clear. Also, mixing multiple task instructions into one prompt can confuse the model and reduce performance. These two problems limit how well multi-objective optimizations can work with textual feedback prompts. The authors tested different ways of sharing information across tasks to better understand these challenges.
Large Language Models, Prompt Optimization, Multi-objective Optimization, Textual Gradient, Multi-task Learning, PCGrad, MGDA, Spearman's rho, Gradient Dilution, Instruction Interference
Authors
Parth Darshan, Abhishek Divekar
Abstract
Customizing an LLM judge to a specific task or domain often involves optimizing its prompt across multiple evaluation criteria simultaneously. Textual gradient methods automate this for a single judge criterion; however, they produce natural-language critiques, not numerical vectors. Thus, the conflict-resolution toolkit of multi-task learning (PCGrad, MGDA) does not apply to the multi-objective textual gradient setting. We test five decomposition modes of textual gradient optimizers by varying how much cross-task information the loss, gradient, and optimizer LLMs share. In 6 of 10 configurations, we observe that optimization never improves over the initial prompt. Gradient specificity drops by 59% (from 9.0 to 3.7) when the gradient LLM processes multiple criteria jointly. Separately, we observe that naively combining per-task instructions into a single prompt degrades Spearman's rho by 5.3%. These results identify two separable failure modes, optimization-time gradient dilution and inference-time instruction interference, which together constrain the design space for multi-objective judge customization using textual feedback.
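To illustrate why tools such as PCGrad presuppose numerical gradients, the following is a minimal sketch of the standard PCGrad projection step on toy vectors. It is not code from the paper: the function names (`dot`, `pcgrad_project`) and the example vectors are assumptions chosen for illustration. The step depends on dot products and norms, which have no direct counterpart for natural-language critiques.

```python
def dot(u, v):
    """Dot product of two equal-length numeric vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcgrad_project(g_i, g_j):
    """PCGrad-style step: if g_i conflicts with g_j (negative dot product),
    remove from g_i its component along g_j; otherwise return g_i unchanged."""
    d = dot(g_i, g_j)
    if d >= 0:
        return list(g_i)  # no conflict, nothing to project
    scale = d / dot(g_j, g_j)  # (g_i . g_j) / ||g_j||^2
    return [a - scale * b for a, b in zip(g_i, g_j)]


# Toy gradients for two hypothetical criteria (illustrative values only).
g_a = [1.0, 0.0]
g_b = [-1.0, 1.0]

projected = pcgrad_project(g_a, g_b)
print(projected)              # projected gradient for criterion A
print(dot(projected, g_b))    # the conflict with g_b is removed
```

Every operation here (the sign test, the scaling, the subtraction) is defined only for vectors, which is the gap the abstract points to when the "gradients" are textual critiques.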