ABSTRACT This study addressed three challenges of small target detection in aerial imaging, namely limited pixel coverage, weak feature representation, and complex background interference, by proposing a collaboratively optimised algorithm named HMMSC‐YOLO. Firstly, a CNN‐Transformer heterogeneous feature interaction network was constructed to mitigate the attenuation of high‐frequency small‐target information as features pass through successive network layers. Secondly, a parameter‐shared dilated convolutional chain was designed, which reuses one set of weights across multiple branches with different receptive fields to enhance sensitivity to the geometric features of minuscule targets. Thirdly, a multi‐kernel dynamic fusion mechanism guided by a differentiable affine transformation was developed, which aligns cross‐scale features geometrically through learnable deformation fields and thereby overcomes the rigid fusion of conventional feature pyramids. Fourthly, a dual‐attention‐driven feature recalibration architecture was introduced to improve the robustness of target localisation under complex background interference. Finally, a dual‐path collaborative downsampling module was implemented to suppress the feature confusion caused by traditional single‐path downsampling. Experimental evaluations on the VisDrone2019 dataset demonstrated improvements of 1.4% in mAP50 and 1% in mAP50:95 compared to baseline models, alongside reductions of 23.3% in parameter count and 2.5% in computational cost. The algorithm exhibited superior localisation accuracy and occlusion resistance in dense small target scenarios, providing a technical framework for practical applications including aerial image analysis and low‐light environmental monitoring.
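The weight‐reuse idea behind the parameter‐shared dilated convolutional chain can be illustrated with a minimal sketch. It is not the HMMSC‐YOLO implementation: it is one‐dimensional and written in plain Python, and the function names, the kernel values, and the dilation rates (1, 2, 3) are all assumptions chosen for illustration, since the abstract does not specify them. The sketch only shows that one kernel applied at several dilation rates yields branches with different receptive fields while the number of learnable weights stays that of a single kernel.

```python
from typing import List, Sequence


def dilated_conv1d(signal: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Zero-padded, same-length 1D dilated cross-correlation (odd kernel size assumed)."""
    k = len(kernel)
    centre = k // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            # Kernel taps are spaced `dilation` samples apart around position i.
            idx = i + (j - centre) * dilation
            if 0 <= idx < len(signal):
                acc += w * signal[idx]
        out.append(acc)
    return out


def shared_dilated_branches(signal: Sequence[float], kernel: Sequence[float],
                            dilations: Sequence[int] = (1, 2, 3)) -> List[List[float]]:
    """Apply the SAME kernel at every dilation rate (hypothetical rates, for illustration)."""
    return [dilated_conv1d(signal, kernel, d) for d in dilations]


def receptive_field(kernel_size: int, dilation: int) -> int:
    """Span of input samples covered by one dilated kernel."""
    return dilation * (kernel_size - 1) + 1


if __name__ == "__main__":
    kernel = [1.0, 2.0, 3.0]          # one shared set of weights
    impulse = [0, 0, 0, 1, 0, 0, 0]   # unit impulse exposes where each tap lands
    for d, branch in zip((1, 2, 3), shared_dilated_branches(impulse, kernel)):
        print(f"dilation={d} receptive_field={receptive_field(len(kernel), d)} {branch}")
    print("learnable weights across all branches:", len(kernel))
```

In this sketch the three branches cover 3, 5, and 7 input samples respectively while sharing the same three weights, whereas three independent kernels would need nine.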