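Computer science
Artificial intelligence
Perception
Human-computer interaction
Imitation
Multimodal learning
Robot
Tactile perception
Computer vision
Tactile sensor
Active perception
Wired glove
Robotics
Humanoid robot
Signal (programming language)
Robot learning
Visual perception
Perceptual learning
Multimodal interaction
Reinforcement learning
Deep learning
Multimodality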
Authors
Yuyang Li, Yinghan Chen, Zihang Zhao, Puhao Li, Tengyu Liu, Siyuan Huang, Yixin Zhu
Source
Journal: Cornell University - arXiv
Date: 2025-12-10
Identifier
DOI: 10.48550/arxiv.2512.09851
Abstract
Robotic manipulation requires both rich multimodal perception and effective learning frameworks to handle complex real-world tasks. See-through-skin (STS) sensors, which combine tactile and visual perception, offer promising sensing capabilities, while modern imitation learning provides powerful tools for policy acquisition. However, existing STS designs lack simultaneous multimodal perception and suffer from unreliable tactile tracking. Furthermore, integrating these rich multimodal signals into learning-based manipulation pipelines remains an open challenge. We introduce TacThru, an STS sensor enabling simultaneous visual perception and robust tactile signal extraction, and TacThru-UMI, an imitation learning framework that leverages these multimodal signals for manipulation. Our sensor features a fully transparent elastomer, persistent illumination, novel keyline markers, and efficient tracking, while our learning system integrates these signals through a Transformer-based Diffusion Policy. Experiments on five challenging real-world tasks show that TacThru-UMI achieves an average success rate of 85.5%, significantly outperforming the tactile policy (66.3%) and vision-only policy (55.4%) baselines. The system excels in critical scenarios, including contact detection with thin and soft objects and precision manipulation requiring multimodal coordination. This work demonstrates that combining simultaneous multimodal perception with modern learning frameworks enables more precise, adaptable robotic manipulation.