Introduction
Intelligent detection and counting of maize seedlings are key components of future smart maize cultivation and breeding. However, detecting maize seedlings in field environments is challenging because the targets are small and the farmland background is complex.

Methods
This study proposed an improved detection model, CBAM-RTDETR. Building on the original feature-extraction backbone of RT-DETR, the model introduced the Convolutional Block Attention Module (CBAM) and grouped convolution.

Results
On the test dataset, CBAM-RTDETR achieved a mean Average Precision at an IoU threshold of 0.5 (mAP0.5) of 92.9%, a mean Average Recall (AR) of 64.4%, and a speed of 87 frames per second (FPS), all higher than those of the comparison models.

Discussion
The proposed model strengthened the shallow edge-detail information of the seedlings and increased feature diversity, effectively addressing the challenge of identifying maize seedlings in UAV remote sensing images both accurately and in real time.
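The Methods paragraph names CBAM and grouped convolution without showing what they compute. The following is a minimal, dependency-free sketch of the standard CBAM design: channel attention from a shared MLP applied to average- and max-pooled channel descriptors, followed by spatial attention from a k×k convolution over channel-wise mean and max maps. It also includes a helper showing why grouped convolution reduces parameters. This is not the authors' implementation: the nested-list tensor layout, the reduction ratio `r`, the omitted biases, the random weights, and the helper names (`cbam`, `grouped_conv_params`) are illustrative assumptions. Where CBAM is inserted in the RT-DETR backbone is not shown.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def channel_attention(fmap, w0, w1):
    """Channel attention: sigmoid(MLP(avgpool) + MLP(maxpool)), one scale per channel.

    fmap: C channels, each an H x W list of lists.
    w0: (C // r) x C weights; w1: C x (C // r) weights (biases omitted for brevity).
    """
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    mx = [max(max(row) for row in ch) for ch in fmap]

    def mlp(v):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in w0]  # ReLU
        return [sum(w * z for w, z in zip(row, hidden)) for row in w1]

    scale = [sigmoid(a + m) for a, m in zip(mlp(avg), mlp(mx))]
    return [[[v * s for v in row] for row in ch] for ch, s in zip(fmap, scale)]


def spatial_attention(fmap, kernel):
    """Spatial attention: sigmoid(conv_k([mean over channels; max over channels])).

    kernel: 2 x k x k weights (one k x k slice per pooled map), zero padding k // 2.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    k = len(kernel[0])
    pad = k // 2
    pooled = [
        [[sum(fmap[ci][i][j] for ci in range(c)) / c for j in range(w)] for i in range(h)],
        [[max(fmap[ci][i][j] for ci in range(c)) for j in range(w)] for i in range(h)],
    ]
    attn = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for p in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                            acc += kernel[p][di][dj] * pooled[p][y][x]
            attn[i][j] = sigmoid(acc)
    # Broadcast the single H x W attention map across every channel.
    return [[[ch[i][j] * attn[i][j] for j in range(w)] for i in range(h)] for ch in fmap]


def cbam(fmap, w0, w1, kernel):
    # CBAM applies channel attention first, then spatial attention, sequentially.
    return spatial_attention(channel_attention(fmap, w0, w1), kernel)


def grouped_conv_params(c_in, c_out, k, groups):
    """Weight count of a k x k convolution split into `groups` groups (no bias)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out


if __name__ == "__main__":
    random.seed(0)
    C, H, W, r, k = 4, 5, 5, 2, 7  # toy sizes (assumed); CBAM commonly uses a 7x7 kernel
    fmap = [[[random.uniform(-1, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
    w0 = [[random.gauss(0, 0.5) for _ in range(C)] for _ in range(C // r)]
    w1 = [[random.gauss(0, 0.5) for _ in range(C // r)] for _ in range(C)]
    kernel = [[[random.gauss(0, 0.1) for _ in range(k)] for _ in range(k)] for _ in range(2)]
    out = cbam(fmap, w0, w1, kernel)
    print(len(out), len(out[0]), len(out[0][0]))  # 4 5 5: shape is preserved
    print(grouped_conv_params(64, 64, 3, 1), grouped_conv_params(64, 64, 3, 4))  # 36864 9216
```

Because both attention maps pass through a sigmoid, CBAM only rescales features and leaves the tensor shape unchanged, so it can be dropped into a backbone stage without altering downstream layers. With `groups=4`, a 3×3 convolution from 64 to 64 channels needs a quarter of the weights of the ungrouped layer (9,216 vs. 36,864, biases excluded).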