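Artificial intelligence
Object detection
Computer science
Scale (ratio)
Representation (politics)
Object (grammar)
Computer vision
Cognitive neuroscience of visual object recognition
Pattern recognition (psychology)
Machine learning
Political science
Quantum mechanics
Politics
Physics
Law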
Authors
Yuming Chen, Xinbin Yuan, Jiabao Wang, Ruiqi Wu, Xiang Li, Qibin Hou, Ming-Ming Cheng
Identifier
DOI: 10.1109/tpami.2025.3538473
Abstract
We aim to provide the object detection community with an efficient, high-performing object detector, termed YOLO-MS. The core design builds on a series of investigations into how the multi-branch features of the basic block and convolutions with different kernel sizes affect detection performance on objects at different scales. The outcome is a new strategy that significantly enhances the multi-scale feature representations of real-time object detectors. To verify the effectiveness of our work, we train YOLO-MS on the MS COCO dataset from scratch, without relying on other large-scale datasets such as ImageNet or on pre-trained weights. Without bells and whistles, YOLO-MS outperforms recent state-of-the-art real-time object detectors, including YOLOv7, RTMDet, and YOLOv8. Taking the XS version of YOLO-MS as an example, it achieves an AP of over 42% on MS COCO, about 2% higher than RTMDet at the same model size. Furthermore, our method can serve as a plug-and-play module for other YOLO models. For example, it raises the AP_s (small objects), AP_l (large objects), and overall AP of YOLOv8-N from just over 18%, 52%, and 37% to just over 20%, 55%, and 40%, respectively, while using even fewer parameters and MACs. Code and trained models are publicly available at https://github.com/FishAndWasabi/YOLO-MS. A Jittor version is also available at https://github.com/NK-JittorCV/nk-yolo.
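The abstract's central idea is that branches using convolutions with different kernel sizes respond to objects at different scales. The toy sketch below illustrates that idea. It is not the actual YOLO-MS block, and several parts are hypothetical simplifications:

- the even channel split across branches,
- the averaging (box) kernels standing in for learned depthwise weights,
- the helper names `depthwise_box_conv`, `multi_branch_block` and `impulse`.

The sketch shows only one property. With odd kernel sizes and zero padding, every branch keeps the spatial size of its input, while larger kernels aggregate context from a wider neighbourhood.

```python
from typing import List

Channel = List[List[float]]  # one H x W feature plane (hypothetical toy representation)


def depthwise_box_conv(plane: Channel, k: int) -> Channel:
    """Apply a k x k averaging kernel to one plane with zero padding, keeping H x W."""
    assert k % 2 == 1, "odd kernel sizes keep the output centred on the input"
    h, w = len(plane), len(plane[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # positions outside the map count as zero
                        total += plane[y][x]
            out[i][j] = total / (k * k)
    return out


def multi_branch_block(channels: List[Channel], kernel_sizes: List[int]) -> List[Channel]:
    """Split channels evenly across branches, give each branch its own kernel size, concatenate."""
    n = len(kernel_sizes)
    assert len(channels) % n == 0, "channels must divide evenly into branches"
    per_branch = len(channels) // n
    out: List[Channel] = []
    for b, k in enumerate(kernel_sizes):
        group = channels[b * per_branch:(b + 1) * per_branch]
        out.extend(depthwise_box_conv(c, k) for c in group)
    return out


def impulse(size: int) -> Channel:
    """A size x size plane that is zero everywhere except 1.0 at the centre."""
    plane = [[0.0] * size for _ in range(size)]
    plane[size // 2][size // 2] = 1.0
    return plane


if __name__ == "__main__":
    feats = [impulse(9) for _ in range(3)]  # 3 channels, one per branch
    out = multi_branch_block(feats, kernel_sizes=[3, 5, 7])
    for k, plane in zip([3, 5, 7], out):
        support = sum(v != 0.0 for row in plane for v in row)
        print(f"k={k}: response covers {support} pixels")
```

In this run, a single centred activation spreads over 9, 25 and 49 pixels in the k=3, 5 and 7 branches. This is the sense in which mixing kernel sizes gives one block several receptive-field sizes at once.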