A Causality-Aware Paradigm for Evaluating Creativity of Multimodal Large Language Models

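Creativity, Computer science, Causality (physics), Artificial intelligence, Natural language processing, Machine learning, Cognitive science, Human-computer interaction, Psychology, Social psychology, Physics, Quantum mechanics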
Authors
Zhongzhan Huang, Shanshan Zhong, Pan Zhou, Shanghua Gao, Marinka Žitnik, Liang Lin
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [IEEE Computer Society]
Volume and issue: not listed; pages: 1-17
Identifier
DOI: 10.1109/tpami.2025.3539433
Abstract

Recently, numerous benchmarks have been developed to evaluate the logical reasoning abilities of large language models (LLMs). However, assessing the equally important creative capabilities of LLMs is challenging because creativity is subjective, diverse, and data-scarce, especially in multimodal scenarios. In this paper, we consider the full pipeline for evaluating the creativity of multimodal LLMs, with a focus on suitable evaluation platforms and methodologies. First, we adopt the Oogiri game, a creativity-driven task that requires humor, associative thinking, and the ability to produce unexpected responses to text, images, or both. This game aligns well with the input-output structure of modern multimodal LLMs and benefits from a rich repository of high-quality, human-annotated creative responses, making it an ideal platform for studying LLM creativity. Next, beyond using the Oogiri game for standard evaluations such as ranking and selection, we propose LoTbench, an interactive, causality-aware evaluation framework that addresses intrinsic risks in standard evaluations, such as information leakage and limited interpretability. LoTbench not only quantifies LLM creativity more effectively but also visualizes the underlying creative thought processes. Our results show that while most LLMs exhibit constrained creativity, the performance gap between LLMs and humans is not insurmountable. Furthermore, we observe a strong correlation between results on the multimodal cognition benchmark MMMU and results on LoTbench, but only a weak connection between LoTbench and traditional creativity metrics. This suggests that LoTbench aligns better with human cognitive theories, which highlight cognition as a critical foundation in the early stages of creativity, one that enables the bridging of diverse concepts.