Artificial intelligence
Computer science
Computer vision
RGB color model
3D single-object recognition
Discriminative model
Feature extraction
Cognitive neuroscience of visual object recognition
Pattern recognition (psychology)
Convolutional neural network
Authors
Ying Zhang, Maoliang Yin, Heyong Wang, Changchun Hua
Identifiers
DOI: 10.1109/tcsvt.2023.3275814
Abstract
Object recognition, one of the main goals of robot vision, is a vital prerequisite for service robots to perform domestic tasks. Thanks to the rich sensory information provided by RGB-D sensors, RGB-D-based object recognition has received increasing attention. However, existing works focus on jointly exploiting RGB and depth data for object recognition while ignoring the influence of depth-image quality on recognition performance. Moreover, in real-world scenarios, many objects look highly similar from certain observation angles, which makes it challenging for a service robot to recognize objects accurately. In this paper, we propose CNN-TransNet, a novel end-to-end Transformer-based architecture with convolutional neural networks (CNNs) for RGB-D object recognition. To cope with high inter-class similarity, discriminative multi-modal feature representations are generated by learning and relating multi-modal features at multiple levels. In addition, we employ a multi-modal fusion and projection (MMFP) module to reweight the contribution of each modality, addressing the problem of poor-quality depth images. Our approach achieves state-of-the-art performance on three datasets (the Washington RGB-D Object Dataset, JHUIT-50, and the Object Clutter Indoor Dataset), with accuracies of 95.4%, 98.1%, and 94.7%, respectively. The results demonstrate the effectiveness and superiority of the proposed model on the RGB-D object recognition task.
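The abstract does not give implementation details of the MMFP module, so the sketch below only illustrates the general idea of reweighting modality contributions during fusion: a small gating network predicts per-modality weights so that a low-quality depth feature contributes less to the fused representation. The class name GatedModalityFusion, the layer sizes, and the softmax-gate design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class GatedModalityFusion(nn.Module):
    """Hypothetical sketch of modality reweighting (not the paper's MMFP module).

    Given RGB and depth feature vectors, a gating network predicts one weight
    per modality so that a low-quality depth feature contributes less to the
    fused representation.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        # Gate maps the concatenated features to two weights that sum to 1.
        self.gate = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, 2),
            nn.Softmax(dim=-1),
        )
        # Projection into a joint embedding space for the classifier head.
        self.project = nn.Linear(feat_dim, feat_dim)

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        # rgb_feat, depth_feat: (batch, feat_dim)
        weights = self.gate(torch.cat([rgb_feat, depth_feat], dim=-1))  # (batch, 2)
        fused = weights[:, :1] * rgb_feat + weights[:, 1:] * depth_feat
        return self.project(fused)


if __name__ == "__main__":
    fusion = GatedModalityFusion(feat_dim=512)
    rgb = torch.randn(4, 512)     # e.g. pooled CNN features from the RGB stream
    depth = torch.randn(4, 512)   # e.g. pooled CNN features from the depth stream
    print(fusion(rgb, depth).shape)  # torch.Size([4, 512])
```

A softmax gate is just one common way to realize such reweighting; the paper's MMFP module may condition the weights differently, e.g. on an explicit depth-quality cue.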