Computer science
Object detection
Drone
Weighting
Inference
Artificial intelligence
Object (grammar)
Detector
Focus (optics)
Structuring
Computational resources
Computer vision
Machine learning
Data mining
Pattern recognition (psychology)
Computational complexity theory
Algorithm
Authors
Jun Wang, Weifeng Liu, Weishan Zhang, Baodi Liu
Identifier
DOI:10.1109/icsp56322.2022.9965217
Abstract
Object detection has a wide range of applications in drone-captured scenarios. However, deploying cumbersome object detection models on resource-constrained devices, such as cell phones and drones, is challenging, not only because of their high computational complexity but also because of the large amount of storage they require. The spatial inductive bias of CNN models enables them to learn representational information with fewer parameters. However, CNNs are spatially local and ignore the global information of the image, a serious shortcoming in dense-object scenarios. With the emergence of self-attention mechanisms, the importance of global information has come back into focus. Self-attention-based vision transformers (ViTs) are the representative pure-attention network models, but their parameter counts are huge and they are difficult to train. Light-weighting is an important goal in object detection, and to build a more lightweight self-attention detection network, we propose the LG-YOLOv5 detection framework. We conducted experiments on the VisDrone dataset, and the results show that our accuracy is 0.4% and 8.2% higher than that of YOLOv5l and CornerNet, respectively. More importantly, our model is only 36% of the size of YOLOv5l, and its inference speed is 46% faster than YOLOv5l.
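The abstract contrasts a convolution's local receptive field with the global context that self-attention captures. As a rough illustration of that mechanism only (the paper's actual LG-YOLOv5 code is not included here, and the class name, single-head design, and shapes below are all assumptions), a minimal PyTorch sketch of scaled dot-product self-attention over flattened feature-map tokens might look like this:

import torch
import torch.nn as nn

class SimpleSelfAttention(nn.Module):
    """Single-head scaled dot-product self-attention (illustrative sketch,
    not the LG-YOLOv5 module from the paper)."""
    def __init__(self, dim: int):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_qkv = nn.Linear(dim, dim * 3, bias=False)  # joint Q, K, V projection
        self.proj = nn.Linear(dim, dim)                    # output projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_tokens, dim); tokens are flattened feature-map cells
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        # Every token attends to every other token, so each output mixes
        # global context -- unlike a convolution's local neighborhood.
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        return self.proj(attn @ v)

# Toy usage: a 20x20 feature map with 64 channels flattened to 400 tokens.
x = torch.randn(1, 400, 64)
out = SimpleSelfAttention(64)(x)
print(out.shape)  # torch.Size([1, 400, 64])

The quadratic cost of the 400x400 attention map in this toy example also illustrates why pure-attention models are heavy, which is the motivation the abstract gives for a lighter-weight design.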