Region Pixel Voting Network (RPVNet) for 6D Pose Estimation from Monocular Image

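Artificial intelligence, Computer science, Computer vision, Pixel, RANSAC, Pose, Conditional random field, Object (grammar), Pattern recognition (psychology), Voting, Convolutional neural network, Image (mathematics), Politics, Political science, Law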

Authors

Feng Xiong, Chengju Liu, Qijun Chen

Source

Journal: Applied Sciences [MDPI AG]
Date: 2021-01-14  Volume (Issue): 11 (2): 743-743  Citations: 3

Links

mdpi.com, doi.org

Identifiers

DOI: 10.3390/app11020743

Abstract

Recent studies have shown that deep learning achieves superior results in estimating the 6D pose of a target object from an image. End-to-end techniques use deep networks to predict the pose directly from the image, which avoids the limitations of handcrafted features but relies on the training dataset to handle occlusion. Two-stage algorithms alleviate this problem by first locating keypoints in the image and then solving the Perspective-n-Point (PnP) problem, so that the network does not have to fit the transformation from image space to 6D-pose space directly. This paper proposes a novel two-stage method that uses only local features for pixel voting, called Region Pixel Voting Network (RPVNet). A front-end network detects the target object and predicts its direction maps, from which the keypoints are recovered by pixel voting with Random Sample Consensus (RANSAC). The backbone, object detection network, and mask prediction network of RPVNet are based on Mask R-CNN. A direction map is a vector field in which the vector at each pixel points toward its source keypoint. It is shown that predicting an object's keypoints depends on the object's own pixels and is independent of other pixels, which means that the influence of occlusion decreases within the object's region. Based on this observation, RPVNet feeds local features, rather than the whole feature map output by the backbone, to a carefully designed convolutional neural network (CNN) that computes the direction maps. The local features are extracted from the whole feature map with RoIAlign, using the region provided by the detection network. Experiments on the LINEMOD dataset show that RPVNet's average accuracy (86.1%) is nearly equal to the state of the art (86.4%) when no occlusion occurs. Meanwhile, results on the Occlusion LINEMOD dataset show that RPVNet outperforms the state of the art (43.7% vs. 40.8%) and is more accurate for small objects in occluded scenes.
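The abstract describes recovering keypoints from a direction map by RANSAC-based pixel voting. The following is a minimal plain-Python sketch of that idea, assuming a generic PVNet-style scheme: each hypothesis is the intersection of the lines defined by two randomly chosen pixels, a pixel votes for a hypothesis when its predicted unit vector points at it within a cosine threshold, and the winning inliers are refined by a least-squares line intersection. All function names, the 0.99 threshold, the hypothesis count, and the toy data are hypothetical illustrations, not the RPVNet authors' implementation; the CNN that would predict the direction map and the subsequent PnP step are not modeled.

```python
import math
import random

# Hypothetical helpers illustrating PVNet-style pixel voting; not the RPVNet authors' code.


def direction_map(pixels, keypoint):
    """Map each pixel (x, y) to the unit vector pointing from it toward `keypoint`."""
    kx, ky = keypoint
    field = {}
    for x, y in pixels:
        dx, dy = kx - x, ky - y
        norm = math.hypot(dx, dy)
        if norm > 0:  # skip a pixel lying exactly on the keypoint
            field[(x, y)] = (dx / norm, dy / norm)
    return field


def intersect(p1, d1, p2, d2):
    """Intersect the lines p1 + t*d1 and p2 + s*d2; return None if (nearly) parallel."""
    det = d1[0] * d2[1] - d1[1] * d2[0]  # 2D cross product d1 x d2
    if abs(det) < 1e-9:
        return None
    wx, wy = p2[0] - p1[0], p2[1] - p1[1]
    t = (wx * d2[1] - wy * d2[0]) / det  # t = ((p2 - p1) x d2) / (d1 x d2)
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


def is_inlier(pixel, direction, hypothesis, cos_thresh):
    """A pixel votes for a hypothesis if its predicted direction points at it."""
    hx, hy = hypothesis[0] - pixel[0], hypothesis[1] - pixel[1]
    norm = math.hypot(hx, hy)
    if norm == 0:
        return True
    return (hx * direction[0] + hy * direction[1]) / norm >= cos_thresh


def ransac_vote(field, num_hypotheses=64, cos_thresh=0.99, seed=0):
    """Generate hypotheses from random pixel pairs; keep the one with the most votes."""
    rng = random.Random(seed)
    items = list(field.items())
    best, best_inliers = None, []
    for _ in range(num_hypotheses):
        (p1, d1), (p2, d2) = rng.sample(items, 2)
        hyp = intersect(p1, d1, p2, d2)
        if hyp is None:
            continue
        inliers = [(p, d) for p, d in items if is_inlier(p, d, hyp, cos_thresh)]
        if len(inliers) > len(best_inliers):
            best, best_inliers = hyp, inliers
    return best, best_inliers


def refine(inliers):
    """Least-squares point closest to all inlier lines: solve (sum N) x = sum N p,
    where N = I - d d^T projects onto each line's normal."""
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, py), (dx, dy) in inliers:
        n11, n12, n22 = 1 - dx * dx, -dx * dy, 1 - dy * dy
        a11 += n11
        a12 += n12
        a22 += n22
        b1 += n11 * px + n12 * py
        b2 += n12 * px + n22 * py
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Toy example (hypothetical data): a 10x10 object region whose keypoint lies outside it.
pixels = [(x, y) for x in range(10) for y in range(10)]
true_kp = (20.5, 7.25)
field = direction_map(pixels, true_kp)
# Simulate occlusion: the three leftmost columns point at a distractor instead.
occluded = [(x, y) for x, y in pixels if x < 3]
field.update(direction_map(occluded, (-15.0, 30.0)))

hyp, inliers = ransac_vote(field)
estimate = refine(inliers)
print(len(inliers), estimate)
```

In this toy run the 30 "occluded" pixels never agree with the winning hypothesis, so only the 70 unoccluded pixels are kept as inliers and the refined estimate recovers the keypoint (20.5, 7.25) up to floating-point error. In a full two-stage pipeline, keypoints recovered this way would then be passed to a PnP solver to obtain the 6D pose.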
