The Robot's Inner Critic: Self-Refinement of Social Behaviors through VLM-based Replanning
2026-03-20 • Robotics
Robotics · Artificial Intelligence
AI summary
The authors created a system called CRISP that helps robots improve their social behaviors on their own by acting like a critic that can see and judge what the robot is doing. CRISP uses a special model that understands both images and language to check if the robot's movements look natural and appropriate in different situations, then fixes any mistakes it finds. This system works with many types of robots by only using their structure files, without needing extra programming for each one. In tests, robots using CRISP were liked more and fit better into their social situations than those using older methods. The authors show that this approach helps robots act more independently and flexibly in social settings.
robot social behavior · Vision-Language Model (VLM) · autonomous planning · joint control code · social appropriateness · reward-based search · robot structure file · cross-platform robotics · mobile manipulators · humanoid robots
Authors
Jiyu Lim, Youngwoo Yoon, Kwanghyun Park
Abstract
Conventional robot social behavior generation has been limited in flexibility and autonomy, relying on predefined motions or human feedback. This study proposes CRISP (Critique-and-Replan for Interactive Social Presence), an autonomous framework where a robot critiques and replans its own actions by leveraging a Vision-Language Model (VLM) as a "human-like social critic." CRISP integrates (1) extraction of movable joints and constraints by analyzing the robot's description file (e.g., MJCF), (2) generation of step-by-step behavior plans based on situational context, (3) generation of low-level joint control code by referencing visual information (joint range-of-motion visualizations), (4) VLM-based evaluation of social appropriateness and naturalness, including pinpointing erroneous steps, and (5) iterative refinement of behaviors through reward-based search. This approach is not tied to a specific robot API; it can generate subtly different, human-like motions on various platforms using only the robot's structure file. In a user study involving five different robot types and 20 scenarios, including mobile manipulators and humanoids, our proposed method achieved significantly higher preference and situational appropriateness ratings compared to previous methods. This research presents a general framework that minimizes human intervention while expanding the robot's autonomous interaction capabilities and cross-platform applicability. Detailed result videos and supplementary information regarding this work are available at: https://limjiyu99.github.io/inner-critic/
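The five-stage pipeline in the abstract can be illustrated with a minimal Python sketch. Everything here is an assumption for illustration only: the toy MJCF snippet, the function names, and the stub critic (which rewards a fixed elbow pose) stand in for the paper's actual VLM-driven planning, rendering, and critique components.

```python
import random
import xml.etree.ElementTree as ET

# Toy MJCF description (illustrative, not from the paper).
MJCF = """
<mujoco>
  <worldbody>
    <body name="arm">
      <joint name="shoulder" range="-1.57 1.57"/>
      <joint name="elbow" range="0 2.0"/>
    </body>
  </worldbody>
</mujoco>
"""

def extract_joints(mjcf_xml):
    """Step 1: read movable joints and their range constraints from MJCF."""
    root = ET.fromstring(mjcf_xml)
    return {
        j.get("name"): tuple(map(float, j.get("range").split()))
        for j in root.iter("joint")
    }

def propose_plan(joints, rng):
    """Steps 2-3 (stand-in): a 'plan' here is one target angle per joint,
    sampled within range; the real system generates joint control code."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in joints.items()}

def stub_critic(plan):
    """Step 4 (stub): a real critic renders the motion and asks a VLM to
    score social appropriateness; here we just prefer elbow near 0.5 rad."""
    return -abs(plan["elbow"] - 0.5)

def refine(mjcf_xml, iterations=50, seed=0):
    """Step 5: reward-based search keeping the best-scoring plan so far."""
    rng = random.Random(seed)
    joints = extract_joints(mjcf_xml)
    best_plan, best_score = None, float("-inf")
    for _ in range(iterations):
        plan = propose_plan(joints, rng)
        score = stub_critic(plan)
        if score > best_score:  # replan only when the critic scores higher
            best_plan, best_score = plan, score
    return best_plan, best_score

best_plan, best_score = refine(MJCF)
```

Because the search only ever consults the structure file and a score, swapping in a different robot means swapping in a different MJCF, which is the cross-platform property the abstract claims.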