Robustness (evolution)
Artificial intelligence
Computer science
Sensor fusion
Deep learning
RGB color model
Leapfrog surveillance
Computer vision
Encoder
Inference
Fusion
Minimum bounding box
Feature extraction
Real-time computing
Engineering
Latency (audio)
Byte
LiDAR
Perception
Pattern recognition (psychology)
Author
Zhong Wei-min
Identifier
DOI:10.1142/s0218126626420090
Abstract
Autonomous vehicles require accurate perception for safe and effective navigation. Relying solely on RGB cameras or LiDAR is unreliable under poor lighting, occlusion, or adverse weather. Traditional multi-sensor fusion methods (early or late fusion) often ignore contextual cues, lack robustness to sensor failure, and suffer from high latency and reduced accuracy in complex traffic. To address these issues, this paper proposes a deep learning-based middle-fusion framework that uses EfficientNetV2-S and VoxelNet for feature extraction and a Transformer encoder for adaptive, attention-based fusion. The model employs environment-aware, modality-specific encoders and dynamically aligns sensor features based on context. It achieved a mean Average Precision of 91.6%, a bounding-box accuracy of 95.5%, and an inference time of 41.3 ms per frame. Performance remained strong under night (82.7%) and occlusion (92.1%) conditions, with a Fusion Robustness Index of 0.81. The approach outperformed FCN8, U-Net, and early-fusion models in accuracy, speed, and fault tolerance, offering a real-time, robust perception solution for intelligent transportation systems.
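The abstract describes middle fusion: each sensor is encoded separately and the resulting feature tokens are combined by a Transformer encoder rather than by concatenating raw inputs (early fusion) or merging final detections (late fusion). The sketch below illustrates that idea only; it is a minimal, hypothetical PyTorch example, not the authors' implementation. The EfficientNetV2-S and VoxelNet backbones are replaced by simple linear projections, and names such as MiddleFusionPerception and the tensor shapes are assumptions made for the example.

```python
# Minimal sketch of Transformer-based middle fusion for camera + LiDAR features.
# Backbones are stubbed with linear projections so the example stays runnable.
import torch
import torch.nn as nn


class MiddleFusionPerception(nn.Module):
    """Fuse per-modality feature tokens with a Transformer encoder."""

    def __init__(self, cam_dim=1280, lidar_dim=128, d_model=256,
                 n_heads=8, n_layers=2, num_classes=10):
        super().__init__()
        # Modality-specific encoders (stand-ins for EfficientNetV2-S / VoxelNet).
        self.cam_proj = nn.Linear(cam_dim, d_model)
        self.lidar_proj = nn.Linear(lidar_dim, d_model)
        # Learned modality embeddings let attention distinguish sensor origin.
        self.modality_embed = nn.Embedding(2, d_model)
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        # Simple heads: class logits and axis-aligned box regression per token.
        self.cls_head = nn.Linear(d_model, num_classes)
        self.box_head = nn.Linear(d_model, 4)

    def forward(self, cam_tokens, lidar_tokens):
        # cam_tokens: (B, Nc, cam_dim), lidar_tokens: (B, Nl, lidar_dim)
        cam = self.cam_proj(cam_tokens) + self.modality_embed.weight[0]
        lid = self.lidar_proj(lidar_tokens) + self.modality_embed.weight[1]
        tokens = torch.cat([cam, lid], dim=1)  # joint token sequence
        fused = self.fusion(tokens)            # attention-based fusion
        return self.cls_head(fused), self.box_head(fused)


if __name__ == "__main__":
    model = MiddleFusionPerception()
    cam = torch.randn(2, 49, 1280)    # e.g. flattened 7x7 camera feature map
    lidar = torch.randn(2, 64, 128)   # e.g. 64 voxel/pillar features
    logits, boxes = model(cam, lidar)
    print(logits.shape, boxes.shape)  # (2, 113, 10) (2, 113, 4)
```

Because attention weights are computed per input, tokens from a degraded modality (e.g. a camera at night) can be down-weighted at inference time, which is one way a middle-fusion design can tolerate partial sensor failure.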