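Pedestrian
Modal verb
Transformer
Pedestrian crossing
Computer science
Task (project management)
Level crossing
Engineering
Transportation engineering
Simulation
Voltage
Electrical engineering
Systems engineering
Mechanical engineering
Chemistry
Polymer chemistry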
Authors
Xiaobo Chen, Shilin Zhang, Jun Li, Jian Yang
Identifier
DOI:10.1109/tits.2024.3386689
Abstract
Accurate prediction of whether pedestrians will cross the street is widely recognized as an indispensable function of autonomous driving systems, especially in urban environments. One of the major challenges is how to exploit the complementary information present in different types of data (or modalities). This paper makes the first attempt to develop a cross-modal transformer-based crossing intention prediction model that uses only bounding boxes and ego-vehicle speed as input features. The cross-modal transformer leverages self-attention and cross-modal attention to mine modality-specific and complementary correlations. A bottleneck feature fusion is presented to obtain a compressed feature representation. To facilitate network training, we further put forward a novel uncertainty-aware multi-task learning method that jointly predicts the future bounding box and the crossing action, so that the commonalities and differences across the two tasks can be exploited. To evaluate the proposed method, extensive comparative experiments and ablation studies are performed on two benchmark datasets. The results demonstrate that, using only the bounding box and ego-vehicle speed as input features, our model is on a par with other state-of-the-art approaches that rely on more inputs, and even achieves superior performance in most cases. The source code will be released at https://github.com/xbchen82/PedCMT.
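The abstract does not state the exact form of the uncertainty-aware multi-task loss. As a rough illustration only, the sketch below uses one common formulation of uncertainty weighting, in which each task loss is scaled by a learnable log-variance term. The function names, the choice of mean squared error for the box task and binary cross-entropy for the crossing task, and the sample values are all assumptions for illustration and are not taken from the paper or its repository.

```python
import math

def mse(pred, target):
    """Mean squared error over flat lists of box coordinates (assumed box loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def bce(prob, label, eps=1e-7):
    """Binary cross-entropy for one crossing probability (assumed action loss)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))

def uncertainty_weighted_loss(box_loss, cross_loss, log_var_box, log_var_cross):
    """Combine two task losses with per-task log-variance weights.

    Each task loss is multiplied by exp(-log_var), and log_var is added as a
    regularizer so that the weights cannot shrink to zero for free. In a real
    model the two log-variances would be trainable parameters; here they are
    plain floats (hypothetical values).
    """
    return (math.exp(-log_var_box) * box_loss + log_var_box
            + math.exp(-log_var_cross) * cross_loss + log_var_cross)

# Hypothetical predictions for one pedestrian: a future box and a crossing probability.
pred_box = [0.50, 0.40, 0.60, 0.80]
true_box = [0.52, 0.41, 0.63, 0.78]
box_loss = mse(pred_box, true_box)
cross_loss = bce(0.8, 1)

# With both log-variances at zero the result reduces to a plain sum of the losses.
total = uncertainty_weighted_loss(box_loss, cross_loss, 0.0, 0.0)
print(total)
```

Raising a task's log-variance lowers the weight on that task's loss, which is how this kind of scheme lets training balance the two objectives instead of relying on hand-tuned coefficients.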