Multimodality
Generality
Computer Science
Artificial Intelligence
Representation
Human–Computer Interaction
Robotics
Graph
Encoder
Feature
Object
Computer Vision
Theoretical Computer Science
Authors
Jianhao Lv, Rong Zhang, Xinyu Li, Shimin Liu, Tianyuan Liu, Qi Zhang, Jinsong Bao
Identifier
DOI: 10.1109/TII.2023.3303964
Abstract
Human–robot collaborative assembly requires comprehensive perception of the working scenario to enable the best possible collaboration. Nevertheless, existing works have paid much attention to physical entities (e.g., object detection, pose estimation) while ignoring the importance of interactive relationships. This research gap makes it difficult to become aware of the cues needed for decision-making, especially in complicated assembly tasks. Furthermore, inadequate relative-position characteristics and hard-to-describe object influences remain quite challenging for visual relationship representation. To overcome these gaps, a multimodality scene graph generation approach is proposed to describe abstract visual relationships more robustly. A novel heat modality is presented to better represent relative spatial characteristics, and three strategies are developed to adapt different baselines in the multimodality feature encoder module. Experimental results show the generality and strong performance of the approach on multimodality scene graph generation tasks in human–robot collaborative assembly scenarios.
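To make the abstract's central idea concrete, below is a minimal, hypothetical PyTorch sketch of what a spatial "heat" modality fused into a relation-prediction head could look like. The paper does not publish code; every module name, tensor shape, and the simple rasterize-then-concatenate fusion here are illustrative assumptions, not the authors' actual architecture or their three adaptation strategies.

```python
# Hypothetical sketch: encode a 2-channel spatial "heat" map of a
# subject-object box pair and fuse it with appearance features to
# classify their relationship (predicate). Illustrative only.
import torch
import torch.nn as nn

def boxes_to_heat(sub_box, obj_box, size=64):
    """Rasterize subject/object boxes (normalized [x1, y1, x2, y2])
    into a 2-channel spatial 'heat' map of shape (2, size, size)."""
    heat = torch.zeros(2, size, size)
    for ch, (x1, y1, x2, y2) in enumerate((sub_box, obj_box)):
        xs, ys = int(x1 * size), int(y1 * size)
        xe, ye = max(xs + 1, int(x2 * size)), max(ys + 1, int(y2 * size))
        heat[ch, ys:ye, xs:xe] = 1.0
    return heat

class HeatEncoder(nn.Module):
    """Small CNN that embeds the 2-channel heat map into a vector."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, dim),
        )
    def forward(self, heat):
        return self.net(heat)

class RelationHead(nn.Module):
    """Fuses subject/object appearance features with the heat
    embedding and predicts a predicate label (fusion by concat)."""
    def __init__(self, vis_dim=512, heat_dim=256, num_predicates=20):
        super().__init__()
        self.heat_enc = HeatEncoder(heat_dim)
        self.classifier = nn.Sequential(
            nn.Linear(2 * vis_dim + heat_dim, 512), nn.ReLU(),
            nn.Linear(512, num_predicates),
        )
    def forward(self, sub_feat, obj_feat, heat):
        fused = torch.cat([sub_feat, obj_feat, self.heat_enc(heat)], dim=-1)
        return self.classifier(fused)

# Toy usage: one subject-object pair with random appearance features.
head = RelationHead()
heat = boxes_to_heat([0.1, 0.2, 0.4, 0.6], [0.5, 0.3, 0.9, 0.8]).unsqueeze(0)
logits = head(torch.randn(1, 512), torch.randn(1, 512), heat)
print(logits.shape)  # torch.Size([1, 20])
```

The design choice illustrated here, rasterizing the subject and object boxes into separate channels, gives the relation head an explicit picture of relative position that raw coordinate vectors lack, which is one plausible reading of the abstract's "relative spatial characteristic".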