Abstract Accurate detection of steel surface defects is of great significance for ensuring product quality and production safety. However, in practical industrial scenarios characterized by complex defect morphologies, diverse scales, blurred boundaries, and strong background interference, existing detection models still suffer from insufficient accuracy and poor robustness. To address this, this paper proposes DSP-YOLO, an improved YOLO11 detection model aimed at enhancing the overall performance of steel surface defect detection. First, we design a Dynamic Aggregation Network Block, which introduces multi-scale, direction-aware dynamic depthwise convolution and a channel interaction mechanism to strengthen the flexibility and expressive power of feature extraction. Second, we propose a spatial context fusion module that integrates semantic awareness with a context-guided strategy to improve feature fusion under complex textured backgrounds. Third, we introduce a lightweight asymmetric pinwheel-shaped convolution structure that substantially enlarges the receptive field and strengthens directional modeling while keeping computational cost low. Experimental results on two typical steel defect datasets, NEU-DET and GC10-DET, show that the proposed method improves mAP@0.5 by 2.5% and 2.6%, respectively, and outperforms existing mainstream YOLO models in accuracy, efficiency, and generalization ability, demonstrating good potential for industrial applications.