Visual Servoing

Keywords: Artificial Intelligence, Computer Vision, Computer Science, Robotics, Detector, Pixel, Frame Rate, Inference, Telecommunications

Authors
Junqi Luo, Liucun Zhu, Liang Li, Peitao Hong
            
Identifier
DOI: 10.1109/tim.2023.3335521
        
                
Abstract
The "deep learning visual perception + hand-eye transformation + motion planning" paradigm for robot grasping has demonstrated viable capabilities in specific scenarios, but its further development faces challenges in complex and dynamic environments. This paper proposes a keypoint-detection-network-driven visual servoing grasping framework. First, we develop an efficient two-stage keypoint detector that performs real-time inference of sparse image-plane features of the target. A low-pass filtering algorithm then smooths the detected keypoints. The processed keypoints are fed into an image-based visual servoing controller that computes the robot joint velocities, enabling precise tracking. A specialized dataset for training and evaluation was constructed using domain randomization, comprising 11K samples across six categories. Comprehensive experiments demonstrate the detector's low latency and accurate performance, even under low lighting, overexposure, partial occlusion, and dense packing. Static and dynamic grasping experiments validate that the framework achieves localization accuracy better than 5 pixels and an overall grasping success rate exceeding 70% under unknown hand-eye calibration. The dataset is provided at github.com/hijunqi/VS_grasping_keypoint_detection_dataset.
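
To make the pipeline concrete, below is a minimal sketch of the two downstream steps the abstract names: first-order low-pass smoothing of the detected keypoints, followed by a classical image-based visual servoing (IBVS) control law for point features. This is an illustration under assumptions, not the authors' implementation: the smoothing gain alpha, the control gain lam, the feature depths, and all function names are hypothetical, and the standard point-feature interaction matrix is used as a stand-in for the paper's controller.

```python
import numpy as np

# First-order low-pass (exponential moving average) filter for keypoints.
# Hypothetical stand-in for the paper's smoothing step; alpha is assumed.
class KeypointLowPass:
    def __init__(self, alpha=0.4):
        self.alpha = alpha   # smoothing factor in (0, 1]
        self.state = None    # last filtered keypoints, shape (N, 2)

    def __call__(self, keypoints):
        kp = np.asarray(keypoints, dtype=float)
        if self.state is None:
            self.state = kp
        else:
            # EMA suppresses frame-to-frame detector jitter
            self.state = self.alpha * kp + (1.0 - self.alpha) * self.state
        return self.state

def interaction_matrix(x, y, Z):
    """Standard interaction matrix of one normalized image point at depth Z."""
    return np.array([
        [-1.0 / Z, 0.0, x / Z, x * y, -(1.0 + x * x), y],
        [0.0, -1.0 / Z, y / Z, 1.0 + y * y, -x * y, -x],
    ])

def ibvs_camera_twist(features, desired, depths, lam=0.5):
    """
    Classical IBVS law: camera velocity v = -lam * pinv(L) @ (s - s*).
    features/desired: (N, 2) normalized image coordinates; depths: (N,).
    """
    L = np.vstack([interaction_matrix(x, y, Z)
                   for (x, y), Z in zip(features, depths)])
    error = (np.asarray(features) - np.asarray(desired)).reshape(-1)
    return -lam * np.linalg.pinv(L) @ error

# Example: four keypoints driven toward their desired image positions.
filt = KeypointLowPass(alpha=0.4)
detected = np.array([[0.10, 0.05], [-0.12, 0.06], [0.11, -0.08], [-0.09, -0.07]])
desired  = np.array([[0.08, 0.04], [-0.10, 0.05], [0.09, -0.06], [-0.08, -0.05]])
smoothed = filt(detected)
v_cam = ibvs_camera_twist(smoothed, desired, depths=np.full(4, 0.5))
# Joint velocities would follow as qdot = pinv(J(q)) @ v_cam via the robot
# Jacobian, which the framework's controller handles downstream.
print(v_cam)
```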