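Automatic summarization
Computer science
Semantics (computer science)
Subject (documents)
Natural language processing
Semantic matching
Artificial intelligence
Matching (statistics)
Embedding
Information retrieval
Novelty
Semantic similarity
Usability
Focus (optics)
Key (lock)
Machine learning
Distributional semantics
Multimodal learning
Word embedding
Joint (building)
Multimodal interaction
Exploit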
Authors
Xujian Zhao,C. Q. Deng,Peiquan Jin
Identifier
DOI:10.1109/tkde.2025.3610544
Abstract
Multimodal summarization aims to use multimodal data to generate accurate and concise summaries for long sentences. While previous work has achieved promising results, it has overlooked the mismatch among multimodal semantics and has lacked subject-information guidance for adaptively referencing images. Motivated by this observation, we propose ASSM, an Adaptive Subject-focused modeling approach for multimodal summarization via Semantic Matching. The novelty of ASSM lies in two aspects. First, we propose a multimodal semantic matching module that projects multimodal inputs into a shared joint embedding space to determine whether the semantics of the modalities are mismatched. Second, we propose an adaptive subject-focused guide module, which adaptively references images to learn subject tokens based on the multimodal semantic matching results. With these subject tokens, the model can focus on the subject information, providing precise guidance for summary generation. We conduct extensive experiments on two standard benchmarks and compare ASSM with 17 existing models. The experimental results on ROUGE, BERTScore, and MoverScore show that ASSM outperforms all competitors, achieving state-of-the-art performance and supporting the effectiveness of our proposal. In addition, we provide a case study to further demonstrate the usability of ASSM.
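As a rough illustration of the semantic matching idea described in the abstract, the sketch below projects a text feature vector and an image feature vector into a shared space and flags the pair as matched or mismatched by cosine similarity. This is not the ASSM implementation: the abstract does not specify the architecture, so the function names (`make_projection`, `project`, `cosine`, `is_matched`), the random linear projections, and the `threshold` value are all assumptions made for illustration.

```python
import math
import random


def make_projection(in_dim, out_dim, seed):
    """Build a random linear projection (hypothetical stand-in for learned weights)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(in_dim)
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(matrix, vec):
    """Map a feature vector into the shared embedding space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_matched(text_feat, image_feat, w_text, w_image, threshold=0.5):
    """Return (matched, score) for one text-image pair.

    The threshold is an assumed hyperparameter, not a value from the paper.
    """
    score = cosine(project(w_text, text_feat), project(w_image, image_feat))
    return score >= threshold, score
```

In a trained system the two projections would be learned so that semantically aligned text-image pairs score high. A downstream component could then consult the returned flag to decide whether an image should be referenced when learning subject tokens, which is the role the abstract assigns to the matching results.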