Cambrian-S: Towards Spatial Supersensing in Video

Authors
Shusheng Yang, Jihan Yang, Peiyan Huang, Ellis Brown, Zihao Yang, Xingxing Zhu, Shengbang Tong, Zhongqiang Zheng, Yifan Xu, Minghua Wang, D. Lu, Rob Fergus, Yann LeCun, Feifei Li, Saining Xie
Source
Journal: Cornell University - arXiv
Identifier
DOI: 10.48550/arxiv.2511.04670
Abstract

We argue that progress in true multimodal intelligence calls for a shift from reactive, task-driven systems and brute-force long context towards a broader paradigm of supersensing. We frame spatial supersensing as four stages beyond linguistic-only understanding: semantic perception (naming what is seen), streaming event cognition (maintaining memory across continuous experiences), implicit 3D spatial cognition (inferring the world behind pixels), and predictive world modeling (creating internal models that filter and organize information). Current benchmarks largely test only the early stages, offering narrow coverage of spatial cognition and rarely challenging models in ways that require true world modeling. To drive progress in spatial supersensing, we present VSI-SUPER, a two-part benchmark: VSR (long-horizon visual spatial recall) and VSC (continual visual spatial counting). These tasks require arbitrarily long video inputs yet are resistant to brute-force context expansion. We then test data scaling limits by curating VSI-590K and training Cambrian-S, achieving +30% absolute improvement on VSI-Bench without sacrificing general capabilities. Yet performance on VSI-SUPER remains limited, indicating that scale alone is insufficient for spatial supersensing. We propose predictive sensing as a path forward, presenting a proof-of-concept in which a self-supervised next-latent-frame predictor leverages surprise (prediction error) to drive memory and event segmentation. On VSI-SUPER, this approach substantially outperforms leading proprietary baselines, showing that spatial supersensing requires models that not only see but also anticipate, select, and organize experience.
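
The abstract describes surprise (prediction error) as the signal that drives memory and event segmentation. The following is a minimal sketch of that general idea only, not the authors' implementation: it assumes latent frames are plain float vectors, uses a hypothetical stand-in predictor (`identity_predictor`, which guesses that the next frame equals the previous one) in place of a learned next-latent-frame model, and picks an arbitrary threshold. All function names and values here are illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def squared_error(a: Vector, b: Vector) -> float:
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_by_surprise(
    latents: List[Vector],
    predict: Callable[[Vector], Vector],
    threshold: float,
) -> List[int]:
    """Return frame indices at which a new event is assumed to start.

    Surprise at step t is the error between the predicted latent
    (computed from frame t-1) and the observed latent at frame t.
    A boundary is placed wherever surprise exceeds the threshold.
    """
    boundaries = [0]  # the first frame always opens an event
    for t in range(1, len(latents)):
        predicted = predict(latents[t - 1])
        surprise = squared_error(predicted, latents[t])
        if surprise > threshold:
            boundaries.append(t)
    return boundaries


def identity_predictor(prev: Vector) -> Vector:
    """Hypothetical stand-in for a learned predictor: next == previous."""
    return list(prev)


# Toy latent sequence: three similar frames, then an abrupt scene change.
frames = [[0.0, 0.0], [0.1, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]]
boundaries = segment_by_surprise(frames, identity_predictor, threshold=1.0)
print(boundaries)
```

In this toy setting, low-surprise frames are grouped into the current event, while a high-surprise frame opens a new one. A real system would replace the stand-in predictor with a trained model and would likely tune or adapt the threshold.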