Computer science
Heuristic
Video tracking
Transformer
End-to-end principle
BitTorrent tracker
Frame (networking)
Artificial intelligence
Metric (unit)
Data mining
Object (grammar)
Real-time computing
Computer vision
Eye movement
Computer network
Operations management
Physics
Quantum mechanics
Voltage
Economics
Operating system
Authors
Fangao Zeng, Bin Dong, Yuang Zhang, Tiancai Wang, Xiangyu Zhang, Yichen Wei
Identifiers
DOI: 10.1007/978-3-031-19812-0_38
Abstract
Temporal modeling of objects is a key challenge in multiple-object tracking (MOT). Existing methods track by associating detections through motion-based and appearance-based similarity heuristics. The post-processing nature of association prevents end-to-end exploitation of temporal variations in the video sequence. In this paper, we propose MOTR, which extends DETR [6] and introduces a “track query” to model the tracked instances across the entire video. Track queries are transferred and updated frame by frame to perform iterative prediction over time. We propose tracklet-aware label assignment to train track queries and newborn object queries. We further propose a temporal aggregation network and a collective average loss to enhance temporal relation modeling. Experimental results on DanceTrack show that MOTR significantly outperforms the state-of-the-art method ByteTrack [42] by 6.5% on the HOTA metric. On MOT17, MOTR outperforms our concurrent works, TrackFormer [18] and TransTrack [29], in association performance. MOTR can serve as a stronger baseline for future research on temporal modeling and Transformer-based trackers. Code is available at https://github.com/megvii-research/MOTR.
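The abstract describes track queries that are carried from one frame to the next and updated to re-predict the same instances over time. The sketch below is a minimal, hypothetical illustration of that frame-by-frame propagation loop in PyTorch-style Python; it is not the authors' released code (see the repository linked above). All names here (TinyQueryDecoder, track_video, keep_thresh) are illustrative assumptions, and the real method's tracklet-aware label assignment, temporal aggregation network, and collective average loss are not modeled.

```python
# Minimal sketch (not the authors' implementation) of propagating track
# queries frame by frame, as described in the abstract.
import torch
import torch.nn as nn


class TinyQueryDecoder(nn.Module):
    """Toy single-layer decoder: queries attend to one frame's image features."""

    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.cls_head = nn.Linear(dim, 1)   # objectness score per query
        self.box_head = nn.Linear(dim, 4)   # normalized (cx, cy, w, h) per query

    def forward(self, queries, frame_feats):
        # queries:     (1, Nq, dim)  -- detect queries + carried-over track queries
        # frame_feats: (1, HW, dim)  -- flattened image features of the current frame
        q, _ = self.attn(queries, frame_feats, frame_feats)
        q = q + self.ffn(q)
        return q, self.cls_head(q).sigmoid(), self.box_head(q).sigmoid()


def track_video(frame_feats_list, decoder, detect_queries, keep_thresh=0.5):
    """Iterate over frames, carrying surviving (high-score) queries forward.

    Assumes batch size 1 for simplicity; keep_thresh is an illustrative cutoff.
    """
    dim = detect_queries.shape[-1]
    track_queries = detect_queries.new_zeros((1, 0, dim))  # no tracks at t=0
    outputs = []
    for feats in frame_feats_list:                       # one frame at a time
        queries = torch.cat([track_queries, detect_queries], dim=1)
        updated, scores, boxes = decoder(queries, feats)
        keep = scores.squeeze(-1) > keep_thresh          # surviving + newborn objects
        # High-score updated queries become next frame's track queries;
        # low-score track queries are dropped (object left the scene).
        track_queries = updated[keep].reshape(1, -1, dim)
        outputs.append((scores.detach(), boxes.detach()))
    return outputs


if __name__ == "__main__":
    # Dummy usage: 3 frames of random 7x7 feature maps, 10 detect queries.
    decoder = TinyQueryDecoder()
    detect_q = torch.randn(1, 10, 256)                   # learned embeddings in practice
    frames = [torch.randn(1, 49, 256) for _ in range(3)]
    preds = track_video(frames, decoder, detect_q)
    print(len(preds), "frames of (scores, boxes) predictions")
```

Because the surviving queries are reused as inputs at the next frame, identity is maintained by the query itself rather than by a post-hoc association step, which is the property the abstract contrasts with heuristic motion- or appearance-based matching.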