PIVOTSBench: Evaluating Fine-Grained Interpersonal Relationship Reasoning in Multimodal Large Language Models

2026-06-22 • Computation and Language

AI summary

The authors introduce PIVOTS, a benchmark that tests how well multimodal AI models understand fine-grained human relationships from video and conversation. Grounded in psychology research, it asks models to predict relationship dimensions in both directions, that is, how each person relates to the other. Auxiliary tasks check whether models can identify the visual cues that support these predictions. The authors evaluate proprietary and open-source models and study how visual input and explicit social role information in conversational utterances affect performance.

multimodal large language models, interpersonal relationships, Social-IQ 2.0, YouTube data, visual cues, psychology research, bidirectional prediction, conversational utterances, benchmark, ablation study

Authors

Shuxiang Zhang, Yiting Yin, Wenxuan Song, Yuhang Wu, Miao Liu

Abstract

Humans possess an innate ability to understand fine-grained interpersonal relationships, which is central to everyday social interactions. Although such reasoning is inherently multimodal, it remains largely unexplored by existing multimodal large language models (MLLMs). To address this gap, we introduce PIVOTS, the first benchmark built from Social-IQ 2.0 and YouTube data to evaluate MLLMs' ability to predict bidirectional interpersonal relationship dimensions grounded in established psychology research. In addition, PIVOTS includes auxiliary tasks that assess models' ability to identify and leverage the critical visual cues underlying such predictions. We evaluate both proprietary and open-source MLLMs and conduct detailed ablation studies to analyze the effects of visual modalities and explicit social role information in conversational utterances. We further examine how joint and pairwise prediction settings benefit MLLMs in scoring bidirectional PIVOTS dimensions. Project page and resources: https://flynnzhangsx.github.io/PIVOTSBench/
