Unmanned aerial vehicles (UAVs) struggle to balance accuracy and efficiency in real-time object detection, especially with intricate backgrounds, diverse target scales, and stringent onboard computational constraints. To address these difficulties, this study introduces YOLO-SRMX, a lightweight real-time object detection framework designed for infrared imagery captured by UAVs. First, the model uses ShuffleNetV2 as an efficient lightweight backbone and integrates the novel Multi-Scale Dilated Attention (MSDA) module. This design reduces the parameter count by 46.4% and, by flexibly adapting receptive fields, improves the model’s robustness and precision in multi-scale object recognition. Second, in the neck network, multi-scale feature extraction is handled by two novel composite convolutions, ConvX and MConv, built on a “split–differentiate–concatenate” paradigm, and the lightweight GhostConv is incorporated to reduce model complexity. Combining these principles, a novel composite receptive field lightweight convolution, DRFAConvP, is proposed to further improve multi-scale feature fusion efficiency and make the model lighter. Finally, the Wise-IoU loss function replaces the traditional bounding box loss. It is coupled with a dynamic non-monotonic focusing mechanism defined in terms of outlier degree: the mechanism assigns larger gradient weights to anchor boxes of moderate quality, based on their relative outlier degree, while reducing the gradient contributions of both high-quality and low-quality anchor boxes. This improves the model’s localization accuracy for small targets in complex scenes.
Experimental evaluations on the HIT-UAV dataset show that YOLO-SRMX achieves an mAP50 of 82.8%, a 7.81% improvement over the baseline YOLOv8s model; an F1 score of 80%, a 3.9% increase; and a 65.3% reduction in computational cost (GFLOPs). YOLO-SRMX thus offers a strong trade-off between detection accuracy and operational efficiency, underscoring its potential for efficient and precise object detection on resource-constrained UAV platforms.
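The non-monotonic focusing behaviour described above can be illustrated with a minimal sketch. It assumes the gain takes the form commonly associated with Wise-IoU v3, r = β / (δ · α^(β − δ)), where the outlier degree β is a box's IoU loss divided by a running mean of the IoU loss. The abstract does not state which Wise-IoU variant or hyperparameters YOLO-SRMX uses, so the function names and the values α = 1.9 and δ = 3 below are illustrative assumptions, not the authors' configuration.

```python
import math


def outlier_degree(iou_loss: float, mean_iou_loss: float) -> float:
    """Relative outlier degree beta: a box's IoU loss over the running mean.

    Small beta suggests a high-quality anchor box, large beta a low-quality one.
    """
    return iou_loss / mean_iou_loss


def focusing_gain(beta: float, alpha: float = 1.9, delta: float = 3.0) -> float:
    """Assumed non-monotonic gain r = beta / (delta * alpha ** (beta - delta)).

    alpha and delta are illustrative hyperparameters (an assumption here).
    The gain equals 1 when beta == delta and peaks at beta = 1 / ln(alpha).
    """
    return beta / (delta * alpha ** (beta - delta))


def peak_beta(alpha: float = 1.9) -> float:
    """Outlier degree at which the assumed gain is largest."""
    return 1.0 / math.log(alpha)


# Print the gain across a range of outlier degrees to show the rise and fall.
for beta in (0.1, 0.5, 1.0, 1.5, 3.0, 6.0):
    print(f"beta={beta:4.1f}  gain={focusing_gain(beta):.3f}")
```

Under these assumed settings the gain is small for very low β (high-quality boxes), largest for moderate β, and small again for high β (low-quality boxes), which matches the qualitative behaviour the abstract describes.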