Keywords
computer science, knowledge base, artificial intelligence, fusion mechanism, robot, computer vision, machine learning, human-computer interaction, multimodal fusion
Authors
Ya Hou, Zhiquan Feng, Tao Xu
Identifier
DOI: 10.1145/3379247.3379255
Abstract
To address the problem of multimodal information fusion, this paper proposes a method based on filling the main components of a scene, evaluating channel information against a knowledge base. Vision and hearing are chosen as the modal channels, as they best match how information is conveyed in real communication. First, single-mode information is recognized by a neural network; the image and audio representations are then converted into text, and the component values describing the scene are filled in through text analysis. When channel information conflicts, an evaluation model built from the knowledge base computes the confidence of each channel. Once the scene is determined, the prior knowledge base is queried again and the corresponding action instructions are sent to the robot. Experimental results show that the proposed method can correct modal fusion results under the guidance of the prior knowledge base and achieve effective human-robot cooperation in specific scenes. It reduces dependence on single-channel information during interaction, adds a fault-tolerance mechanism, and improves user-experience ratings.
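The abstract describes a pipeline in which each channel's recognition result is reduced to text, scene component slots are filled from that text, and a prior knowledge base supplies a per-channel confidence used to resolve conflicts. The following is a minimal Python sketch of that conflict-resolution step; the knowledge-base values, data classes, and function names are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of knowledge-base-guided channel fusion.
# Assumption: the knowledge base stores, per scene component, how
# reliable each channel (vision / hearing) is. Values are invented.
from dataclasses import dataclass

KNOWLEDGE_BASE = {
    "object": {"vision": 0.9, "hearing": 0.6},  # vision better at objects
    "action": {"vision": 0.5, "hearing": 0.8},  # speech better at commands
}

@dataclass
class ChannelReading:
    channel: str    # "vision" or "hearing"
    component: str  # scene component slot, e.g. "object"
    value: str      # value extracted from the channel's text description

def fuse(readings):
    """Fill scene component slots; on a modal conflict, keep the value
    from the channel with the higher knowledge-base confidence."""
    scene = {}
    for r in readings:
        current = scene.get(r.component)
        if current is None:
            scene[r.component] = r
        elif r.value != current.value:  # channels disagree on this slot
            conf = KNOWLEDGE_BASE[r.component]
            if conf[r.channel] > conf[current.channel]:
                scene[r.component] = r
    return {c: r.value for c, r in scene.items()}

readings = [
    ChannelReading("vision", "object", "cup"),
    ChannelReading("hearing", "object", "cap"),  # misheard; vision wins
    ChannelReading("hearing", "action", "pick up"),
]
print(fuse(readings))  # {'object': 'cup', 'action': 'pick up'}
```

In this sketch the fused scene (`object`, `action`) would then be used to look up the corresponding robot instruction in the prior knowledge base, matching the final step of the pipeline.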