ABSTRACT To address the challenge of small target recognition in traffic scenes, we propose a model based on the You Only Look Once version 8x (YOLOv8x) network, combined with a receptive field block (RFB) and multidimensional collaborative attention (MCA). First, the model employs the RFB to extract reliable and distinctive features, improving the precision of small target identification. Second, the MCA structure models attention along multiple dimensions through three parallel branches, strengthening the feature expression ability of the model. The MCA structure applies a squeeze (compression) transformation and an excitation transformation to capture discriminative feature representations. These transformations increase the expressiveness and diversity of the features and help the network locate small objects more accurately, which improves small object detection performance. In addition, data augmentation and hyperparameter optimization are employed to improve the model's generalisability. Validation results on the Argoverse 1.1 autonomous driving dataset show that the enhanced network outperforms prevailing detectors, achieving an F1 score of 78.6, an average precision of 55.1, and an average recall of 72.4. Visual analysis further illustrates the algorithm's performance on small targets, indicating its application value and potential for adoption in fields such as autonomous driving.
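As a rough illustration of the squeeze and excitation idea mentioned above, the following minimal sketch gates a feature map along the channel dimension only. It is not the MCA module of this work: the three-branch layout, the pooling operators, and the learned parameters are not specified in the abstract, so the function names (`squeeze`, `excite`, `reweight`), the identity weights, and the toy input are all assumptions made purely for illustration.

```python
import math

def squeeze(feature_map):
    """Squeeze step: global average pooling, one scalar per channel (C x H x W -> C)."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]

def excite(descriptor, weights, bias):
    """Excitation step: a linear map followed by a sigmoid, giving one gate in (0, 1) per channel."""
    gates = []
    for w_row, b in zip(weights, bias):
        z = sum(w * d for w, d in zip(w_row, descriptor)) + b
        gates.append(1.0 / (1.0 + math.exp(-z)))
    return gates

def reweight(feature_map, gates):
    """Scale every value in a channel by that channel's gate."""
    return [
        [[g * v for v in row] for row in channel]
        for channel, g in zip(feature_map, gates)
    ]

# Toy input (assumed): 2 channels, each 2 x 2.
feature_map = [
    [[0.0, 0.0], [0.0, 0.0]],   # channel 0: no activation
    [[1.0, 3.0], [5.0, 7.0]],   # channel 1: strong activation
]

# Hypothetical excitation parameters; in a trained network these would be learned.
weights = [[1.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0]

descriptor = squeeze(feature_map)            # per-channel means
gates = excite(descriptor, weights, bias)    # per-channel attention weights
reweighted = reweight(feature_map, gates)    # same shape as feature_map
```

In this toy setting the inactive channel receives a gate of 0.5 and the active channel a gate close to 1, so the more informative channel is attenuated less. A multidimensional variant would repeat the same squeeze, excite, and reweight pattern along the height and width dimensions as well.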