Pedestrian
Transformer
Computer science
Artificial intelligence
Engineering
Electrical engineering
Voltage
Transportation engineering
Authors
Xiaobo Chen,Shilin Zhang,Wei Xu,Dapeng Cheng,Lei Yang
Identifier
DOI:10.1109/tim.2025.3575998
Abstract
Predicting pedestrian crossing behavior is increasingly important for autonomous vehicles, especially in urban traffic scenes. Most previous methods concentrate on feature-level fusion that integrates various types of input data, without considering the prediction made from each individual input. To overcome this limitation, this paper proposes an uncertainty-guided Transformer ensemble network (UTENet) that combines the merits of feature-level and decision-level fusion in a unified framework. The proposed model takes only the pedestrian bounding box and ego-vehicle velocity as input. First, for each input, we apply the self-attention mechanism to model the intra-modal correlation and aggregate the correlated features across time steps. Then, we put forward a cross-modal attention-based fusion module to capture the inter-modal relationships between the two inputs, so that a more comprehensive representation related to crossing intention can be generated. Finally, we design an uncertainty-based ensemble strategy for decision-level fusion, which remedies the drawbacks of individual predictions and enhances robustness. Experimental results on real-world benchmark data verify that our model can predict pedestrian crossing behavior using fewer input modalities while achieving performance comparable to or even better than methods relying on more inputs. Extensive ablation studies are also provided to verify the effectiveness of the model components.
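The abstract does not say how uncertainty is measured or how it weights the individual predictions. As a rough illustration of decision-level fusion only, the sketch below assumes each branch outputs a crossing probability and that a branch's confidence is the gap between the maximum binary entropy and its own predictive entropy. The function names, the entropy-based weighting, and the three example branches are all hypothetical and are not taken from UTENet.

```python
import math


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a Bernoulli distribution with P(cross) = p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def uncertainty_weighted_ensemble(probs, eps=1e-12):
    """Fuse per-branch crossing probabilities, trusting low-entropy branches more.

    Assumption (not from the paper): confidence = ln(2) - entropy, so a branch
    predicting 0.5 gets (almost) zero weight and a decisive branch gets more.
    """
    max_entropy = math.log(2.0)
    confidences = [max_entropy - binary_entropy(p) + eps for p in probs]
    total = sum(confidences)
    weights = [c / total for c in confidences]
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused, weights


# Hypothetical outputs of a bounding-box branch, a velocity branch,
# and a cross-modal fused branch for one pedestrian.
branch_probs = [0.9, 0.55, 0.8]
fused_prob, branch_weights = uncertainty_weighted_ensemble(branch_probs)
will_cross = fused_prob >= 0.5
```

With these example inputs, the near-undecided velocity branch (0.55) contributes very little to the fused probability, which is the intuition behind letting uncertainty guide the ensemble rather than averaging the branches equally.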