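Computer science
Artificial intelligence
Benchmark (surveying)
Comprehension
Natural language processing
Expression (computer science)
Machine learning
Computer vision
Pattern recognition (psychology)
Feature extraction
Support vector machine
Semantics (computer science)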
Authors
Kai Wang, Guanbo Wu, Xueyang Fu, Xingbo Wang, Kean Liu, Xin Lu, Chengjie Ge, Wei Zhai, Jing Zhang
Identifier
DOI:10.1109/tpami.2026.3681112
Abstract
Unmanned aerial vehicles (UAVs) are increasingly deployed to assist humans in diverse tasks, where understanding human intentions is critical to effective collaboration. Referring expression comprehension (REC) links language to visual targets, allowing UAVs to recognize the targets a human intends and thereby supporting subsequent actions. However, existing REC research is almost exclusively confined to ground-based scenarios, leaving aerial scenarios largely unexplored. In this paper, we formally define UAV-based REC as a new research problem and highlight its unique challenges, including abundant background interference, small target size, and complex referring relations. To enable systematic study, we introduce SkyFind, a large-scale dataset with one million high-quality target-expression pairs, providing a solid foundation for research on this problem. In addition, we propose AerialREC, a baseline framework that reduces background interference in UAV imagery by searching for a potential target region before localization. We establish benchmark results on SkyFind using ten representative REC methods and validate the effectiveness of the AerialREC framework.
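The "search for a potential target region, then localize" idea can be illustrated with a small coordinate sketch. This is not the AerialREC implementation, which the abstract does not describe in detail. It only assumes that a first stage proposes a rectangular region, that a second stage predicts a box inside the cropped region, and that the result is scored with intersection-over-union (IoU), a common convention in REC work that the abstract does not confirm. The names `crop_to_image` and `iou` are hypothetical.

```python
from typing import Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def crop_to_image(box_in_crop: Box, region: Box) -> Box:
    """Shift a box predicted inside a cropped region back to full-image coordinates.

    Assumes the crop was taken at native resolution (no resizing), so only
    the region's top-left offset needs to be added.
    """
    rx, ry, _, _ = region
    x0, y0, x1, y1 = box_in_crop
    return (x0 + rx, y0 + ry, x1 + rx, y1 + ry)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 if they do not overlap."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical numbers: a 400x300 candidate region inside a large aerial frame,
# and a small target predicted within that crop.
region = (1000, 600, 1400, 900)
box_in_crop = (120, 80, 150, 110)
predicted = crop_to_image(box_in_crop, region)
ground_truth = (1120, 680, 1150, 710)
score = iou(predicted, ground_truth)
```

Restricting the second stage to the cropped region is what removes most of the background from consideration. The offset step is needed so that predictions can be compared against annotations given in full-image coordinates.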