Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models

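Modality (human-computer interaction); Question answering; Computer science; Natural language processing; Artificial intelligence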

Authors

Jean Park,Kuk Jin Jang,Basam Alasaly,Sriharsha Mopidevi,Andrew Zolensky,Eric Eaton,Inseop Lee,Kevin B. Johnson

Source

Journal: Proceedings of the ... AAAI Conference on Artificial Intelligence [Association for the Advancement of Artificial Intelligence]
Date: 2025-04-11 Volume (Issue): 39 (19): 19821-19829

Identifier

DOI: 10.1609/aaai.v39i19.34183

Abstract

Multimodal large language models (MLLMs) can simultaneously process visual, textual, and auditory data, capturing insights that complement human analysis. However, existing video question-answering (VidQA) benchmarks and datasets often exhibit a bias toward a single modality, despite the goal of requiring advanced reasoning skills that integrate diverse modalities to answer the queries. In this work, we introduce the modality importance score (MIS) to identify such bias. It is designed to assess which modality embeds the necessary information to answer the question. Additionally, we propose an innovative method using state-of-the-art MLLMs to estimate the modality importance, which can serve as a proxy for human judgments of modality perception. With this MIS, we demonstrate the presence of unimodal bias and the scarcity of genuinely multimodal questions in existing datasets. We further validate the modality importance score with multiple ablation studies to evaluate the performance of MLLMs on permuted feature sets. Our results indicate that current models do not effectively integrate information due to modality imbalance in existing datasets. Our proposed MLLM-derived MIS can guide the curation of modality-balanced datasets that advance multimodal learning and enhance MLLMs' capabilities to understand and utilize synergistic relations across modalities.
