Significant advances have been made in neural networks for 3D object detection in autonomous driving. However, autonomous vehicles often encounter small and occluded objects, which provide fewer usable features yet demand high localization accuracy. Current approaches to 3D vehicle detection frequently overlook this challenge, simply feeding features into existing detection models. This paper introduces a boosting multi-modal fusion method for 3D vehicle detection. First, we employ pre-trained 3D and 2D object detectors to generate 3D and 2D bounding boxes. Then, a fusion strategy grounded in the rotated intersection-over-union (IoU) merges the two kinds of bounding boxes. To capture information from small objects, we develop a grouping-splitting residual network enhanced with coordinate attention, which extracts more detailed features. Experiments on the KITTI dataset show that our method achieves 90.46% accuracy on hard samples in the bird's-eye view (BEV). Compared with advanced multi-modal 3D object detectors such as CLOCs, HMFI, and PointPainting, our hard-sample accuracy improves by 1.1%, 1.84%, and 3.75%, respectively.
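As a concrete illustration of the fusion criterion named above, the sketch below computes the rotated IoU between two bird's-eye-view boxes using shapely. The box parameterization (cx, cy, w, l, yaw), the helper names, and the pairing threshold are assumptions for illustration, not the paper's exact implementation.

```python
import numpy as np
from shapely.geometry import Polygon

def box_to_polygon(cx, cy, w, l, yaw):
    """Convert a rotated BEV box (center, size, heading) to a shapely Polygon.
    Parameterization (cx, cy, w, l, yaw) is an assumption for this sketch."""
    # Corner offsets in the box frame, listed counter-clockwise.
    corners = np.array([[ l / 2,  w / 2],
                        [ l / 2, -w / 2],
                        [-l / 2, -w / 2],
                        [-l / 2,  w / 2]])
    # 2D rotation by the heading angle, then translation to the box center.
    rot = np.array([[np.cos(yaw), -np.sin(yaw)],
                    [np.sin(yaw),  np.cos(yaw)]])
    return Polygon(corners @ rot.T + np.array([cx, cy]))

def rotated_iou(box_a, box_b):
    """Rotated intersection-over-union between two BEV boxes (cx, cy, w, l, yaw)."""
    pa, pb = box_to_polygon(*box_a), box_to_polygon(*box_b)
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

# Usage: pair candidate boxes from two detectors when their rotated IoU
# exceeds a threshold (0.5 here is an illustrative value, not the paper's).
box_3d = (10.0, 5.0, 1.8, 4.5, 0.1)   # hypothetical box from a 3D detector
box_2d = (10.2, 5.1, 1.7, 4.4, 0.12)  # hypothetical box lifted from a 2D detector
if rotated_iou(box_3d, box_2d) > 0.5:
    print("boxes matched; fuse their predictions")
```

In a late-fusion setup of this kind, matched pairs would then be merged (e.g., by confidence-weighted averaging of box parameters), while unmatched boxes fall back to a single modality.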