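Point cloud
Computer science
Lidar
Modal verb
Artificial intelligence
Computer vision
Segmentation
Robustness (evolution)
Feature (linguistics)
Feature extraction
Fusion
Pattern recognition (psychology)
Remote sensing
Geography
Gene
Biochemistry
Philosophy
Chemistry
Polymer chemistry
Linguistics
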
Authors
Jiale Li, Hang Dai, Han Hao, Yong Ding

Identifier
DOI: 10.1109/cvpr52729.2023.02078

Abstract
            
            LiDAR and camera are two modalities available for 3D semantic segmentation in autonomous driving. The popular LiDAR-only methods severely suffer from inferior segmentation on small and distant objects due to insufficient laser points, while the robust multi-modal solution is under-explored, where we investigate three crucial inherent difficulties: modality heterogeneity, limited sensor field of view intersection, and multi-modal data augmentation. We propose a multi-modal 3D semantic segmentation model (MSeg3D) with joint intra-modal feature extraction and inter-modal feature fusion to mitigate the modality heterogeneity. The multi-modal fusion in MSeg3D consists of geometry-based feature fusion GF-Phase, cross-modal feature completion, and semantic-based feature fusion SF-Phase on all visible points. The multi-modal data augmentation is reinvigorated by applying asymmetric transformations on LiDAR point cloud and multi-camera images individually, which benefits the model training with diversified augmentation transformations. MSeg3D achieves state-of-the-art results on nuScenes, Waymo, and SemanticKITTI datasets. Under the malfunctioning multi-camera input and the multi-frame point clouds input, MSeg3D still shows robustness and improves the LiDAR-only baseline. Our code is publicly available at https://github.com/jialeli1/lidarseg3d.
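
The geometry-based association between LiDAR points and camera pixels mentioned in the abstract can be illustrated with a minimal sketch. It is not taken from the MSeg3D code base: the function names (`project_point`, `gather_pixel_features`), the pinhole intrinsics `fx`, `fy`, `cx`, `cy`, and the assumption that points are already expressed in camera coordinates are all hypothetical choices made for illustration. Points that fall outside the image receive `None`, which shows why a limited field-of-view intersection leaves some points without camera features.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Feature = List[float]


def project_point(
    p: Point, fx: float, fy: float, cx: float, cy: float
) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a point given in camera coordinates (z forward).

    Assumption for this sketch: no lens distortion, intrinsics only.
    """
    x, y, z = p
    if z <= 0.0:
        # The point lies behind the camera and cannot be observed.
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def gather_pixel_features(
    points: Sequence[Point],
    feature_map: Sequence[Sequence[Feature]],  # indexed as [row][col]
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Optional[Feature]]:
    """Return one image feature per point, or None outside the camera view."""
    height = len(feature_map)
    width = len(feature_map[0])
    gathered: List[Optional[Feature]] = []
    for p in points:
        uv = project_point(p, fx, fy, cx, cy)
        if uv is None:
            gathered.append(None)
            continue
        # Use floor so that slightly negative coordinates are not
        # rounded into the image.
        col, row = math.floor(uv[0]), math.floor(uv[1])
        if 0 <= col < width and 0 <= row < height:
            gathered.append(feature_map[row][col])
        else:
            gathered.append(None)
    return gathered
```

In this sketch the `None` entries mark exactly the points that a fusion model would have to handle without direct camera evidence; how MSeg3D fills them is described in the paper and is not reproduced here.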
         
            
 
                 
                
                    