Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

2026-06-15

Machine Learning, Computation and Language
AI summary

The authors studied how AI agents that use language models to judge their own outputs can develop biased preferences, especially when handling both text and images. They found that a single strategy (step_by_step) took a disproportionate share of the selection weight in combined text-and-image tasks, about 3.2x the collapse seen in text-only self-evaluation. They also describe a new problem called cross-modal contagion, where evaluator preferences learned on one type of task (such as text) carry over and distort how strategies are chosen for another type (such as images). Their experiments showed that the choice of evaluator and the number of rounds affect how much this bias spreads: cross-model evaluation produced the strongest contagion, while self-evaluation was almost unaffected. The authors also introduce a contagion matrix and release the MM-EPC experimental framework to measure and analyze these biases.

AI agents, language models, Evaluator Preference Collapse (EPC), multimodal evaluation, cross-modal contagion, strategy selection, self-evaluation, GPT-4o, DeepSeek-chat, contagion matrix
Authors
Zewen Liu
Abstract
When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is dramatically amplified in multimodal settings. Using GPT-4o to evaluate DeepSeek-chat across text and visual tasks, we find that a single strategy (step_by_step) absorbs 48.4% of all weight -- 3.2x the collapse observed in text-only self-evaluation -- while three visual-domain strategies receive only 9.1% combined weight. We then demonstrate a novel phenomenon we term cross-modal contagion: evaluator preferences acquired on one modality transfer to and corrupt strategy selection on another. Through a four-phase isolation training paradigm, we measure contagion coefficients and document strategy inversion -- the optimal strategy for a modality reverses after cross-modal exposure. A Phase 3 statistical validation across four evaluator configurations (N=53 total independent repetitions, 15,592 API calls) reveals a clear hierarchy: cross-model evaluation (GPT-4o, N=8) produces strong but symmetric bidirectional contagion (mean gamma_{T->V}=1.176, gamma_{V->T}=1.089, Delta=-0.088, p=0.575, Cohen's d=0.29); high round counts (DashScope, 50 rounds) cause collapse to single-strategy dominance (70% zero contagion); and self-evaluation provides near-complete immunity -- 97% of runs (N=30, DeepSeek-chat) yield exactly zero contagion (mean gamma=0.033, 95% CI [-0.031, 0.010], p=0.642, d=0.07). No evaluator condition shows statistically significant directional asymmetry. We introduce the contagion matrix indexed by evaluator identity, release the MM-EPC experimental framework, and identify cross-model evaluator architecture as the primary risk factor for preference contagion.
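The directional-asymmetry statistics quoted in the abstract (Delta, Cohen's d) can be illustrated with a minimal sketch. It assumes that Delta is the mean of the per-run differences gamma_{V->T} - gamma_{T->V}, a sign convention inferred from the reported means, and that Cohen's d is computed on paired differences; the abstract does not state which variant the author used. The function name `paired_asymmetry` and the input values are hypothetical and are not data from the paper.

```python
import math
import statistics


def paired_asymmetry(gamma_tv, gamma_vt):
    """Summarize directional asymmetry between per-run contagion coefficients.

    gamma_tv: text->visual coefficients, one per repetition (hypothetical input).
    gamma_vt: visual->text coefficients from the same repetitions.
    """
    if len(gamma_tv) != len(gamma_vt) or len(gamma_tv) < 2:
        raise ValueError("need two equal-length samples with at least 2 runs")

    # Assumed sign convention: Delta = gamma_{V->T} - gamma_{T->V}.
    diffs = [vt - tv for tv, vt in zip(gamma_tv, gamma_vt)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all paired differences are identical")

    return {
        "mean_tv": statistics.fmean(gamma_tv),
        "mean_vt": statistics.fmean(gamma_vt),
        "delta": mean_diff,
        # Paired-samples effect size (assumption: one of several d variants).
        "cohens_d": abs(mean_diff) / sd_diff,
        # Paired t statistic; a p-value would need a t-distribution CDF.
        "t_stat": mean_diff / (sd_diff / math.sqrt(n)),
    }


# Hypothetical coefficients for four repetitions, for illustration only.
tv = [1.0, 1.4, 0.9, 1.3]
vt = [1.1, 1.0, 1.2, 1.1]
summary = paired_asymmetry(tv, vt)
print(summary)
```

A small effect size together with a t statistic near zero is the pattern the abstract describes as symmetric bidirectional contagion: both directions are strong, but neither dominates.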