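Computer science
Artificial intelligence
Encoder
Feature (linguistics)
Pattern recognition (psychology)
Computer vision
Face (sociological concept)
Facial recognition system
Pyramid (geometry)
Feature extraction
Decoding methods
Data mining
Algorithm
Mathematics
Philosophy
Operating system
Sociology
Linguistics
Social science
Geometry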
Authors
Pei Wen,Cheng Sun,Shiwen Zhang,Yuansheng Luo,Haowei Huang,Jin Zhang
Identifier
DOI:10.1109/isctech58360.2022.00060
Abstract
Face detection in classroom scenes remains a challenging problem because multi-face clustering and multi-scale face variation make face features diverse and difficult to extract. The recently proposed end-to-end object detector DETR uses a transformer architecture instead of hand-designed components to obtain global interactive attention information from images, which not only simplifies the model structure but also greatly improves feature interaction. Inspired by this work, and considering the heavy computation incurred by the transformer encoder-decoder structure in DETR, we propose an encoder-only DETR for classroom face detection, dubbed EOPSA-FACE, that does not reduce detection accuracy. First, to address deficient multi-scale feature fusion, an efficient pyramid squeeze attention block is used to improve the ResNet backbone, so that the model can learn richer multi-scale feature representations. Second, considering the drop in detection accuracy caused by inadequate allocation of positive and negative samples in DETR, a pseudo intersection over union (Pseudo-IOU) design is introduced to achieve more accurate sample allocation. Extensive experiments on self-built face datasets from real classroom scenarios demonstrate the superiority of EOPSA-FACE.