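Action (physics)
Grounded theory
Causal model
Causal theory of reference
Question answering
Computer science
Cognitive psychology
Psychology
Epistemology
Artificial intelligence
Sociology
Qualitative research
Mathematics
Philosophy
Statistics
Social science
Physics
Quantum mechanics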
Authors
Ting En Lam, Yuhan Chen, Elston Tan, Eric Peh, Ruirui Chen, Paritosh Parmar, Basura Fernando
Source
Journal: Cornell University - arXiv
Date: 2024-04-01
Identifier
DOI: 10.48550/arxiv.2404.01299
Abstract
Causal video question answering (QA) has garnered increasing interest, yet existing datasets often lack depth in causal reasoning analysis. To address this gap, we capitalize on the unique properties of cartoons and construct CausalChaos!, a novel, challenging causal Why-QA dataset built upon the iconic "Tom and Jerry" cartoon series. With thoughtful questions and multi-level answers, our dataset contains much longer causal chains embedded in dynamic interactions and visuals; at the same time, the principles of animation allow animators to create well-defined, unambiguous causal relationships. These factors allow models to solve more challenging, yet well-defined, causal relationships. We also introduce hard negative mining, including a CausalConfusion version. While models perform well, there is much room for improvement, especially on open-ended answers. We identify more advanced/explicit causal relationship modeling and joint modeling of vision and language as the immediate areas for future efforts to focus upon. Along with other complementary datasets, our new challenging dataset will pave the way for these developments in the field. We will release our dataset, code, and models to help future efforts in this domain.