Computer science
Benchmark (surveying)
Frame (networking)
Initialization
Matching (statistics)
Margin (machine learning)
Artificial intelligence
Point (geometry)
Jaccard index
Computer vision
Tracking (education)
Sequence (biology)
Inference
Trajectory
Coding (set theory)
Pattern recognition (psychology)
Machine learning
Mathematics
Psychology
Astronomy
Geometry
Geography
Geodesy
Programming language
Set (abstract data type)
Telecommunications
Pedagogy
Physics
Genetics
Statistics
Biology
Authors
Carl Doersch,Yi Yang,Mel Vecerík,Dilara Gokay,Ankush Gupta,Yusuf Aytar,João Carreira,Andrew Zisserman
Source
Journal: Cornell University - arXiv
Date: 2023-06-14
Identifier
DOI:10.48550/arxiv.2306.08637
Abstract
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on every other frame, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation has the capacity to track points faster than real-time, and can be flexibly extended to higher-resolution videos. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found on our project webpage.
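The two-stage pipeline described in the abstract — per-frame independent matching followed by refinement from local correlations — can be illustrated with a minimal toy sketch. This is not the authors' TAPIR implementation; the function names, the use of raw dot-product feature similarity, and the window-search refinement are all simplifying assumptions for illustration only.

```python
import numpy as np

def match_stage(query_feat, frame_feats):
    """Stage 1 (toy): on every frame independently, pick the location whose
    feature correlates best with the query feature (global argmax)."""
    T, H, W, C = frame_feats.shape
    scores = frame_feats.reshape(T, H * W, C) @ query_feat      # (T, H*W)
    idx = scores.argmax(axis=1)
    return np.stack([idx // W, idx % W], axis=1).astype(float)  # (T, 2) row/col

def refine_stage(track, frame_feats, query_feat, iters=4, radius=1):
    """Stage 2 (toy): iteratively nudge each point toward the best-scoring
    location inside a small local window (local-correlation refinement)."""
    T, H, W, C = frame_feats.shape
    track = track.copy()
    for _ in range(iters):
        for t in range(T):
            r, c = int(track[t, 0]), int(track[t, 1])
            best, best_rc = -np.inf, (r, c)
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < H and 0 <= cc < W:
                        s = frame_feats[t, rr, cc] @ query_feat
                        if s > best:
                            best, best_rc = s, (rr, cc)
            track[t] = best_rc
    return track
```

The real model additionally updates the query features during refinement and runs on learned feature maps; this sketch only conveys the coarse-to-fine structure of the two stages.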