Crop maturity status recognition is a key component of automated harvesting. Traditional manual detection methods are inefficient and costly, presenting a significant challenge for the agricultural industry. To improve crop maturity detection, we propose enhancements to the Real-Time DEtection TRansformer (RT-DETR) method. The original model's Backbone structure is refined by: HG Block Enhancement: Replacing conventional convolution with the Rep Block during feature extraction, incorporating multiple branches to improve model accuracy. Partial Convolution (PConv): Replacing traditional convolution in the Rep Block with PConv, which applies convolution to only a portion of the input channels, reducing computational redundancy. Efficient Multi-Scale Attention (EMA): Introducing EMA to ensure a uniform distribution of spatial semantic features within feature groups, improving model performance and efficiency. The refined model significantly enhances detection accuracy. Compared to the original model, the average accuracy (mAP@0.5) improves by 2.9%, while model size is reduced by 5.5% and computational complexity decreases by 9.6%. Further experiments comparing the RT-DETR model, YOLOv8, and our improved model on plant pest detection datasets show that our model outperforms others in general scenarios. The experimental results validate the efficacy of the enhanced RT-DETR model in crop maturity detection. The improvements not only enhance detection accuracy but also reduce model size and computational complexity, making it a promising solution for automated crop maturity detection.