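Computer science
Generalizability theory
Transformer
Artificial intelligence
Margin (machine learning)
Pose
Unified Model
Machine learning
Computer vision
Pattern recognition (psychology)
Statistics
Physics
Mathematics
Quantum mechanics
Voltage
Meteorology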
Authors
Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield
Source
Journal: Cornell University - arXiv
Date: 2023-12-13
Citations: 7
Identifier
DOI: 10.48550/arxiv.2312.08344
Abstract
We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test-time to a novel object without fine-tuning, as long as its CAD model is given, or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves comparable results to instance-level methods despite the reduced assumptions. Project page: https://nvlabs.github.io/FoundationPose/