Authors
Zhongyuan Liu, Zhuo Li, Chunwang Dong, Jiafeng Li
Abstract
Automatic Tea Bud Detection (TBD) is one of the core technologies in intelligent tea-picking systems. Because tea buds are small, dense, and heavily overlapped, and their color is close to that of the background, accurate detection is challenging. This paper proposes a tea bud detection method, named YOLO-TBD, that adopts YOLOv8 as its basic framework. First, the Path Aggregation Feature Pyramid Network (PAFPN) in YOLOv8 is improved by feeding the features from the 2nd layer into it. This modification makes better use of low-level features, such as texture and color information, thereby enhancing the network’s feature representation ability. Second, a Triple-Branch Attention Mechanism (TBAM) is designed and integrated into the output of the backbone network and the C2f module. This attention mechanism strengthens the features of tea bud objects and suppresses background noise through feature channel interactions, without increasing the number of model parameters. Finally, a Self-Correction Group Convolution (SCGC) is proposed to replace the conventional convolution in the C2f module. This convolution establishes long-range spatial and channel dependencies around each spatial position, enabling a larger receptive field and better capture of contextual information with fewer parameters, thereby reducing false and missed detections of tea bud objects. The proposed modules are integrated into the YOLOv8 architecture to construct three detection models with different parameter counts: YOLO-TBD-L, YOLO-TBD-M, and YOLO-TBD-S. Experimental results on our self-built tea bud detection dataset and the publicly available GWHD_2021 dataset show that, compared with current methods, the proposed YOLO-TBD-L achieves state-of-the-art accuracy, with mAP values of 87.04% and 94.5%, respectively.
In addition, the proposed YOLO-TBD-S model achieves detection accuracy comparable to that of the YOLOv8-L model with far fewer parameters and much lower computational complexity.
• The Path Aggregation Feature Pyramid Network (PAFPN) in YOLOv8 is improved by also feeding the 2nd layer features into the network, to fully exploit the texture and color information contained in the low-level features.
• A Triple-Branch Attention Mechanism (TBAM) is designed, in which two branches capture cross-dimensional interactions and the remaining branch computes the similarity between each pixel in the feature maps and its adjacent pixels.
• A Self-Correction Group Convolution (SCGC) is proposed, which establishes long-range spatial and channel dependencies around each spatial position.
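
The claim that SCGC works "with fewer parameters" rests partly on a general property of group convolution, which can be checked with simple arithmetic. The sketch below is not the SCGC module itself: it only counts the learnable parameters of a standard versus a grouped 2D convolution. The function name `conv2d_params` and the layer sizes (256 channels, 3×3 kernel, 4 groups) are assumptions chosen for illustration and are not taken from the paper.

```python
def conv2d_params(in_channels, out_channels, kernel_size, groups=1, bias=False):
    """Count the learnable parameters of a 2D convolution layer.

    With `groups` > 1, each output channel only sees in_channels / groups
    input channels, so the weight count shrinks by a factor of `groups`.
    """
    if in_channels % groups or out_channels % groups:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    weights = (in_channels // groups) * out_channels * kernel_size * kernel_size
    return weights + (out_channels if bias else 0)


# Hypothetical layer sizes, for illustration only (not from the paper).
standard = conv2d_params(256, 256, 3)           # standard convolution
grouped = conv2d_params(256, 256, 3, groups=4)  # group convolution, 4 groups

print(standard, grouped, standard // grouped)   # 589824 147456 4
```

With these assumed sizes, splitting the channels into 4 groups cuts the weight count by a factor of 4. How SCGC recovers cross-group and long-range context on top of this saving is described in the paper body, not in this sketch.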