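Artificial intelligence
Stereopsis
Scale (ratio)
Robot
Computer science
Feature (linguistics)
Computer vision
Cartography
Geography
Linguistics
Philosophy
Identifier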
DOI:10.1109/tim.2023.3315355
Abstract
As the global population ages and the labor force shrinks, using Artificial Intelligence (AI) technology to raise labor productivity has become a hot topic. Empty-Dish Recycling Robots have helped offset the decline in labor productivity. This paper proposes YOLO-MSA, a Multi-scale Stereoscopic Attention (MSA) network that detects postprandial dishes for Empty-Dish Recycling Robots. First, the standard convolution is replaced with a Res2Net module, which improves the multi-scale expressiveness of the network at a finer-grained level. Second, a Res2Net with different dilation rates is combined with a novel stereoscopic attention mechanism to form the MSA module, which provides coarse-grained multi-scale expression. Third, for multi-scale feature learning during dimensionality reduction, a Dimension Reduction Spatial Pyramid Pooling (DRSPP) module is proposed to fuse feature maps of different scales. Extensive experiments demonstrate the effectiveness of the proposed MSA module for multi-scale feature learning. Furthermore, YOLO-MSA achieves 98.47% mean Average Precision (mAP) on Dish-21, a dataset of postprandial dishes, which is considerably higher than that of other state-of-the-art models, and an inference speed of 33.93 frames per second (FPS), which meets the real-time detection requirements of the Empty-Dish Recycling Robot. Test results on other public datasets show that the proposed YOLO-MSA has better generalization ability. In summary, YOLO-MSA exhibits satisfactory multi-scale feature expression, demonstrates effectiveness and robustness in postprandial dish detection, and has far-reaching significance for the development of Empty-Dish Recycling Robots.
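The abstract does not spell out the internals of its Res2Net module, so the following is only a minimal sketch of the hierarchical split-and-reuse pattern generally associated with Res2Net, not the authors' implementation. It assumes the usual formulation: the channels are split into s groups, the first group passes through unchanged, and each later group is transformed after adding the previous group's output. The function name `res2net_style_mix`, the flat list standing in for a feature map, and the `double` stand-in for a 3x3 convolution are all hypothetical, chosen for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def res2net_style_mix(
    x: Sequence[float],
    scales: int,
    transforms: Sequence[Callable[[Vector], Vector]],
) -> Vector:
    """Hierarchical split-transform-concatenate pattern in the style of Res2Net.

    x          : flat list of channel values (a stand-in for a feature map)
    scales     : number of channel groups s
    transforms : s - 1 callables, stand-ins for the per-group convolutions
    """
    if len(x) % scales != 0:
        raise ValueError("channel count must be divisible by scales")
    if len(transforms) != scales - 1:
        raise ValueError("need one transform per group except the first")

    width = len(x) // scales
    groups = [list(x[i * width:(i + 1) * width]) for i in range(scales)]

    outputs = [groups[0]]  # y_1 = x_1: the first group is passed through
    prev = None
    for i in range(1, scales):
        inp = groups[i]
        if prev is not None:
            # y_i = K_i(x_i + y_{i-1}): reuse the previous group's output
            inp = [a + b for a, b in zip(inp, prev)]
        prev = transforms[i - 1](inp)
        outputs.append(prev)

    # Concatenate all group outputs back into one channel list
    return [v for group in outputs for v in group]


def double(v: Vector) -> Vector:
    """Hypothetical stand-in for a learned convolution: doubles each value."""
    return [2.0 * a for a in v]


mixed = res2net_style_mix([1.0, 2.0, 3.0, 4.0], scales=4,
                          transforms=[double, double, double])
print(mixed)  # [1.0, 4.0, 14.0, 36.0]
```

Because each group receives the output of the one before it, later groups pass through more transforms than earlier ones. With real convolutions this gives a single block several effective receptive field sizes, which is the finer-grained multi-scale behavior the abstract refers to.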