Abstract

To advance computer vision for underwater applications, this manuscript addresses the challenging task of underwater object detection in side scan sonar imagery with a novel model, the YOLOv9-Side scan Network (YOLOv9-SN). The model integrates Negative Sample Refinement, an Attention and Convolution mix strategy, and Spatial and Channel reconstruction Convolution layers, which enhance its discriminative learning and efficiency. Incorporating a Bidirectional Feature Pyramid Network and the Multi-Path Distance Intersection over Union metric further improves feature integration and object localization. Comparative analysis against established models, including Faster region-based convolutional neural network (Faster R-CNN), DEtection TRansformer (DETR), and YOLOv5, together with rigorous ablation studies, demonstrates the superiority of the proposed YOLOv9-SN model. Evaluated on the zero-shot learning-sonar submarine simulation dataset, the model achieves a mAP@0.5:0.95 of 72.1%, surpassing the YOLOv9 baseline by 3.5%. This research improves detection metrics and advances side scan sonar imaging of underwater targets, highlighting the model's high precision and accuracy in underwater target detection.