Keywords
Gaze, Computer science, Transformer, Artificial intelligence, Computer vision, Computation, Human–computer interaction, Algorithm, Engineering, Voltage, Electrical engineering
Authors
Yujie Li, Xinghe Wang, Zihang Ma, Yifu Wang, Michael C. Meyer
Identifier
DOI: 10.1109/mcsoc60832.2023.00026
Abstract
Egocentric gaze estimation is a challenging and significant task with promising applications in areas such as human-computer interaction and AR/VR. In this work, we propose a novel model based on the Video Swin Transformer architecture. By introducing a localized inductive bias, our model extracts essential local features from first-person videos during the windowed self-attention computation. Additionally, we approximate global context modeling within the gaze region using a shifted-window approach. We evaluate our approach on EGTEA Gaze+, a publicly available dataset of egocentric activity videos. Experimental results demonstrate that our model achieves state-of-the-art performance.
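To make the windowed and shifted-window attention mentioned in the abstract concrete, the following is a minimal NumPy sketch of the Swin-style mechanism the model builds on. It is an illustration under simplifying assumptions, not the authors' implementation: it uses single-head attention without learned query/key/value projections, and it omits the attention masks a real Swin block applies to tokens wrapped across the boundary by the cyclic shift.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_partition(x, ws):
    # (H, W, C) feature map -> (num_windows, ws*ws, C) non-overlapping windows
    H, W, C = x.shape
    x = x.reshape(H // ws, ws, W // ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, ws * ws, C)

def window_reverse(w, ws, H, W):
    # inverse of window_partition: (num_windows, ws*ws, C) -> (H, W, C)
    C = w.shape[-1]
    x = w.reshape(H // ws, W // ws, ws, ws, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(H, W, C)

def window_attention(x, ws, shift=0):
    # Self-attention restricted to local windows; a cyclic shift lets
    # neighboring windows exchange information on alternate blocks.
    # NOTE: simplified sketch -- no QKV projections, no shift masking.
    H, W, C = x.shape
    if shift:
        x = np.roll(x, (-shift, -shift), axis=(0, 1))
    wins = window_partition(x, ws)                            # (nW, ws*ws, C)
    attn = softmax(wins @ wins.transpose(0, 2, 1) / np.sqrt(C))
    out = window_reverse(attn @ wins, ws, H, W)               # attend within windows
    if shift:
        out = np.roll(out, (shift, shift), axis=(0, 1))
    return out

x = np.random.default_rng(0).normal(size=(8, 8, 4))
y_regular = window_attention(x, ws=4, shift=0)   # regular windows
y_shifted = window_attention(x, ws=4, shift=2)   # shifted windows
```

Alternating regular and shifted partitions is what lets purely local attention approximate global context: tokens near a window boundary attend to different neighbor sets in the two configurations.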