Keywords
Grasp; Computer science; Artificial intelligence; Encoder; Key (lock); Transformer; Computer vision; Robot; Human-computer interaction; Programming language; Engineering; Computer security; Voltage; Electrical engineering; Operating system
Authors
Shaochen Wang, Zhangli Zhou, Zhen Kan
Source
Journal: IEEE Robotics and Automation Letters
Date: 2022-06-29
Volume/Issue: 7 (3): 8170-8177
Citations: 58
Identifier
DOI: 10.1109/LRA.2022.3187261
Abstract
In this letter, we present a transformer-based architecture, namely TF-Grasp, for robotic grasp detection. The developed TF-Grasp framework has two elaborate designs making it well suited for visual grasping tasks. The first key design is that we adopt local window attention to capture local contextual information and detailed features of graspable objects. We then apply cross window attention to model the long-term dependencies between distant pixels. Object knowledge, environmental configuration, and relationships between different visual entities are aggregated for subsequent grasp detection. The second key design is that we build a hierarchical encoder-decoder architecture with skip-connections, delivering shallow features from the encoder to the decoder to enable multi-scale feature fusion. Due to the powerful attention mechanism, TF-Grasp can simultaneously capture local information (i.e., the contours of objects) and model long-term connections such as the relationships between distinct visual concepts in clutter. Extensive computational experiments demonstrate that TF-Grasp achieves competitive results versus state-of-the-art grasping convolutional models, attaining higher accuracies of 97.99% and 94.6% on the Cornell and Jacquard grasping datasets, respectively. Real-world experiments using a 7-DoF Franka Emika Panda robot also demonstrate its capability of grasping unseen objects in a variety of scenarios. The code is available at https://github.com/WangShaoSUN/grasp-transformer.
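The abstract describes two architectural ingredients: self-attention restricted to local windows, and a hierarchical encoder-decoder whose skip-connections carry shallow encoder features to the decoder. The authors' actual implementation is in the linked repository; the PyTorch sketch below is only a minimal illustration of those two ideas under simplified assumptions (one attention stage, a single skip connection, a single grasp-quality output head rather than a full grasp parameterization), and every class and parameter name in it is hypothetical.

# Illustrative sketch, not the authors' code: minimal local-window
# self-attention plus a small skip-connected encoder-decoder.
import torch
import torch.nn as nn

class WindowAttention(nn.Module):
    """Self-attention computed independently within non-overlapping
    local windows of the feature map (hypothetical simplification)."""
    def __init__(self, dim, window=4, heads=4):
        super().__init__()
        self.window = window
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        w = self.window                          # assumes w divides H and W
        # Partition the map into (H//w * W//w) windows of w*w tokens each.
        t = x.view(B, C, H // w, w, W // w, w)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, w * w, C)
        t, _ = self.attn(t, t, t)                # attention inside each window
        t = t.view(B, H // w, W // w, w, w, C)
        return t.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)

class TinyGraspNet(nn.Module):
    """U-shaped encoder-decoder: the decoder concatenates upsampled deep
    features with shallow encoder features (the skip connection) and
    predicts a per-pixel grasp-quality map."""
    def __init__(self, dim=32):
        super().__init__()
        self.enc  = nn.Conv2d(1, dim, 3, padding=1)
        self.down = nn.Conv2d(dim, dim, 3, stride=2, padding=1)
        self.attn = WindowAttention(dim)
        self.up   = nn.ConvTranspose2d(dim, dim, 2, stride=2)
        self.dec  = nn.Conv2d(dim * 2, dim, 3, padding=1)
        self.head = nn.Conv2d(dim, 1, 1)         # grasp-quality map

    def forward(self, x):
        s = torch.relu(self.enc(x))              # shallow features, kept for skip
        d = torch.relu(self.down(s))             # downsampled deep features
        d = self.attn(d)                         # local-window attention
        u = self.up(d)                           # upsample back to input size
        u = torch.relu(self.dec(torch.cat([u, s], dim=1)))  # fuse via skip
        return torch.sigmoid(self.head(u))

q = TinyGraspNet()(torch.randn(1, 1, 64, 64))    # (1, 1, 64, 64) quality map

In the real model, cross-window attention and multiple encoder-decoder stages would extend this skeleton; here a single window-attention block stands in for both attention designs.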