Computer science
Pattern
Image (mathematics)
Subjectivity
Rank (graph theory)
Internet
Stimulus modality
Artificial intelligence
Relation (database)
Modality (human–computer interaction)
Information retrieval
Sensory system
Cognitive psychology
Psychology
Data mining
Mathematics
World Wide Web
Combinatorics
Epistemology
Philosophy
Sociology
Social science
Authors
Xixi Nie,Bo Hu,Xinbo Gao,Leida Li,Xiaodan Zhang,Bin Xiao
Identifier
DOI:10.1145/3581783.3611996
Abstract
Image aesthetic assessment (IAA) has drawn wide attention in recent years as more and more users post images and texts on the Internet to share their views. The intense subjectivity and complexity of IAA make it extremely challenging. Text triggers the subjective expression of human aesthetic experience via implicit memory, so incorporating textual information and identifying its relationship with the image is of great importance for IAA. However, IAA methods that take only the image as input fail to fully capture subjectivity, while existing multimodal IAA methods ignore the interrelationships among modalities. To this end, we propose a brain-inspired multimodal interaction network (BMI-Net) that simulates how the association area of the cerebral cortex processes sensory stimuli. In particular, a knowledge integration LSTM (KI-LSTM) is proposed to learn the image-text interaction relation. The proposed scalable multimodal fusion (SMF), based on low-rank decomposition, fuses the image, text, and interaction modalities to predict the aesthetic distribution. Extensive experiments show that the proposed BMI-Net outperforms existing state-of-the-art methods on three IAA tasks.
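The abstract's SMF module fuses three modalities via low-rank decomposition. A minimal numpy sketch of generic low-rank multimodal fusion is shown below; the function name, feature dimensions, and factor shapes are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def low_rank_fusion(feats, factors):
    """Hypothetical low-rank multimodal fusion sketch (not the paper's code).

    feats:   list of 1-D modality feature vectors (e.g. image, text, interaction)
    factors: list of per-modality factor tensors, each of shape (rank, d_m + 1, d_out)
    """
    fused = None
    for z, W in zip(feats, factors):
        z1 = np.concatenate([z, [1.0]])           # bias-augmented feature
        proj = np.einsum('rdo,d->ro', W, z1)      # rank-wise linear projections
        # elementwise product across modalities keeps cost linear in rank,
        # instead of materializing the full cross-modal outer-product tensor
        fused = proj if fused is None else fused * proj
    return fused.sum(axis=0)                      # sum over rank -> (d_out,)

# Illustrative usage with assumed dimensions
rng = np.random.default_rng(0)
img, txt, inter = rng.standard_normal(8), rng.standard_normal(6), rng.standard_normal(4)
rank, d_out = 3, 10
Ws = [rng.standard_normal((rank, v.size + 1, d_out)) for v in (img, txt, inter)]
y = low_rank_fusion([img, txt, inter], Ws)
```

The scalability comes from the factorization: adding a modality adds one factor tensor and one elementwise product, rather than growing an explicit outer-product tensor exponentially in the number of modalities.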