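Computer science
Baseline (sea)
Benchmark (surveying)
Artificial intelligence
Earth observation
Bridge (graph theory)
Raw data
Resource (disambiguation)
Data mining
Spatial analysis
Geospatial analysis
Remote sensing
Semantics (computer science)
Machine learning
Spatial intelligence
Verifiable secret sharing
Data modeling
Spatial database
Support vector machine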
Authors
Roger Ferrod, Maël Lecene, Krishna Sapkota, George Leifman, Vered Silverman, Genady Beryozkin, Sylvain Lobry
Source
Journal: Cornell University - arXiv
Date: 2026-03-15
Abstract
Precise spatial understanding in Earth Observation is essential for translating raw aerial imagery into actionable insights for critical applications such as urban planning, environmental monitoring, and disaster management. However, Multimodal Large Language Models exhibit critical deficiencies in fine-grained spatial understanding within Remote Sensing (RS), primarily due to a reliance on limited or repurposed legacy datasets. To bridge this gap, we introduce a large-scale dataset grounded in verifiable cadastral vector data, comprising 3.8 million annotated objects across 510k high-resolution images with 135 granular semantic categories. We validate this resource through a comprehensive instruction-tuning benchmark spanning seven spatial reasoning tasks. Our evaluation establishes a robust baseline using a standard LLaVA architecture. We show that while current RS-specialized and commercial models (e.g., Gemini) struggle in zero-shot settings, high-fidelity supervision effectively bridges this gap, enabling standard architectures to master fine-grained spatial grounding without complex architectural modifications.
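The abstract does not say how cadastral vector records become instruction-tuning samples. The sketch below is a hypothetical illustration of one way a single vector annotation could be turned into a grounding question/answer pair. It assumes the polygon has already been projected into pixel coordinates and that the answer is a bounding box normalized to [0, 1]. The names `Polygon`, `polygon_to_bbox`, and `make_grounding_sample`, the prompt wording, and the example category are all assumptions, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Polygon:
    """Hypothetical vector annotation already projected to pixel space."""
    category: str
    vertices: list  # list of (x, y) pixel coordinates


def polygon_to_bbox(poly, width, height):
    """Axis-aligned bounding box of a polygon, clipped to the image and
    normalized to [0, 1] as (x_min, y_min, x_max, y_max)."""
    xs = [x for x, _ in poly.vertices]
    ys = [y for _, y in poly.vertices]
    # Clip to image bounds in case the parcel extends past the tile edge.
    x0 = max(0.0, min(xs))
    y0 = max(0.0, min(ys))
    x1 = min(float(width), max(xs))
    y1 = min(float(height), max(ys))
    return (x0 / width, y0 / height, x1 / width, y1 / height)


def make_grounding_sample(poly, width, height):
    """Build one hypothetical grounding Q/A pair from a single annotation."""
    bbox = polygon_to_bbox(poly, width, height)
    coords = ", ".join(f"{v:.3f}" for v in bbox)
    return {
        "question": f"Where is the {poly.category} in this image?",
        "answer": f"[{coords}]",
    }


if __name__ == "__main__":
    parcel = Polygon("parking lot", [(100, 50), (300, 50), (300, 250), (100, 250)])
    print(make_grounding_sample(parcel, width=1000, height=500))
```

Normalizing coordinates keeps answers independent of tile resolution. Clipping matters because a cadastral parcel can extend beyond the edge of an image tile.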