Initiation of Interaction Detection Framework using a Nonverbal Cue for Human-Robot Interaction
2026-05-11 • Computer Vision and Pattern Recognition
AI summary
The authors created a system for robots to notice when a person wants to start interacting without relying on special keywords. Their robot uses both sound and video sensors to detect when someone is speaking and looking at it. If the person is quiet but looking at the robot for a while, the robot can still recognize this as a signal to interact. They tested and showed this system works on a mobile robot and built it using a common robot software framework called ROS.
Initiation of Interaction (IoI), Human-Robot Interaction (HRI), Audio-Visual Sensor Fusion, Sound Source Localization, Human Tracking, Face Detection, State Transition Model, Robot Operating System (ROS), Mobile Robot, Interaction Detection
Authors
Guhnoo Yun, Juhan Yoo, Kijung Kim, Dong Hwan Kim
Abstract
This paper describes an initiation of interaction (IoI) detection framework for human-robot interaction (HRI) that requires no keywords, based on audio and vision sensor fusion in a domestic environment. In the proposed framework, the robot has its own audio and vision sensors and can additionally employ an external vision sensor for stable human detection and tracking. When the user starts to speak while looking at the robot, the robot localizes his or her position by combining sound source localization with human tracking information. The robot then detects the IoI if it perceives that the speaker's face is oriented toward the robot. If the user does not speak, the robot can still detect the IoI when he or she looks at the robot for longer than a predefined period of time. A state transition model for the proposed IoI detection framework is designed and verified through experiments with a mobile robot. To implement and integrate the model into a robot architecture, all components are implemented and integrated in the Robot Operating System (ROS) environment.
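The state transition logic described in the abstract (speech while gazing triggers IoI immediately; silent gaze triggers IoI after a hold period) can be sketched as a small state machine. This is a minimal illustrative sketch, not the authors' implementation: the state names, the `IoIDetector` class, the 2-second gaze threshold, and the boolean inputs standing in for the fused audio-visual perception outputs are all assumptions.

```python
import time

# Illustrative sketch of an IoI state transition model. All names and
# thresholds here are assumptions, not the paper's actual parameters.
IDLE, GAZING, IOI = "IDLE", "GAZING", "IOI"


class IoIDetector:
    def __init__(self, gaze_hold=2.0):
        # gaze_hold: assumed time (seconds) a silent user must look at
        # the robot before IoI is declared.
        self.state = IDLE
        self.gaze_hold = gaze_hold
        self.gaze_start = None

    def update(self, face_toward_robot, speech_detected, now=None):
        """Feed one fused audio-visual observation; return the new state.

        face_toward_robot: stands in for the face-detection result.
        speech_detected:   stands in for sound source localization
                           confirming the tracked person is speaking.
        """
        now = time.monotonic() if now is None else now
        if self.state == IOI:
            return self.state  # latched until the interaction is handled
        if face_toward_robot and speech_detected:
            # Speaking while looking at the robot: immediate IoI.
            self.state = IOI
        elif face_toward_robot:
            # Silent gaze: start or continue a hold timer.
            if self.gaze_start is None:
                self.gaze_start = now
            if now - self.gaze_start >= self.gaze_hold:
                self.state = IOI
            else:
                self.state = GAZING
        else:
            # User looked away: reset.
            self.state, self.gaze_start = IDLE, None
        return self.state
```

In a ROS deployment, `update` would be driven by a callback subscribed to the fused perception topics, with the IoI transition publishing a trigger for the robot's interaction behavior.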