Two-Stage Dynamic Fusion Framework for Multimodal Classification Tasks
Computer Science
Fusion
Artificial Intelligence
Machine Learning
Data Mining
Authors
Shoumeng Ge, Ying Chen
Source
Journal: INFORMS Journal on Computing
Date: 2025-05-29
Identifier
DOI: 10.1287/ijoc.2023.0448
Abstract
Multimodal learning has provided an opportunity to better analyze a system or phenomenon. Numerous classification studies have developed advanced dynamic fusion methods to fuse information from different modalities. However, few works have considered a reliable design of dynamic fusion methods based on theoretical insights. In this context, we address the research gaps as follows. From a theoretical perspective, we initially establish the performance range for the accuracy of a multimodal classifier. Subsequently, we derive a condition based on the upper limit of the range to indicate how to improve the accuracy of the model. From a technical perspective, we propose a two-stage dynamic fusion framework according to this condition. In the first stage, we design an uncertainty-aware dynamic fusion method. In the second stage, we propose a regression-based method to adaptively generate the learned fusion weight for each modality. In the experiment, we use seven existing models for comparisons and exploit four public data sets to examine the effectiveness of the two-stage framework. The results indicate that our proposed framework generally outperforms existing methods in terms of accuracy and robustness. Additionally, we conduct a comprehensive discussion from several aspects to further illustrate the merits of the proposed framework.

History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning.

Funding: This study was supported by the China National Key R&D Program [Grant 2022YFB3305500], the National Natural Science Foundation of China [Grants 72121001, 72101066, and 72131005], the Heilongjiang Natural Science Excellent Youth Fund [Grant YQ2022G004], and the Key Research and Development Projects of Heilongjiang Province [Grant JD22A003].

Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0448 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2023.0448 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .
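The abstract does not spell out how the first-stage "uncertainty-aware dynamic fusion" is computed; the authors' actual implementation is in the IJOC GitHub repository linked above. As a rough, hedged illustration only, the sketch below assumes that uncertainty is measured by each modality's predictive entropy and that modalities are combined with inverse-entropy weights. The function names (fuse_by_uncertainty, etc.) and the specific weighting rule are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only -- NOT the authors' implementation.
# Assumption: "uncertainty-aware dynamic fusion" is approximated here by
# weighting each modality's class probabilities with the inverse of its
# predictive entropy, then renormalizing the weights per sample.
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Predictive entropy per sample (higher means more uncertain)."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)


def fuse_by_uncertainty(modality_logits: list) -> np.ndarray:
    """Stage-1-style fusion: more uncertain modalities get smaller weights."""
    probs = [softmax(l) for l in modality_logits]
    # One inverse-entropy score per (sample, modality), then normalize.
    inv_unc = np.stack([1.0 / (entropy(p) + 1e-6) for p in probs], axis=1)
    weights = inv_unc / inv_unc.sum(axis=1, keepdims=True)   # (N, M)
    stacked = np.stack(probs, axis=1)                        # (N, M, C)
    return (weights[..., None] * stacked).sum(axis=1)        # (N, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits_image = rng.normal(size=(4, 3))  # hypothetical image modality: 4 samples, 3 classes
    logits_text = rng.normal(size=(4, 3))   # hypothetical text modality
    fused = fuse_by_uncertainty([logits_image, logits_text])
    print(fused.round(3), fused.sum(axis=1))  # each fused row sums to 1
```

In the paper's second stage, a regression-based module adaptively generates a learned fusion weight for each modality; in a sketch like the one above, such learned weights would replace the fixed inverse-entropy weights. See the repository for the authors' actual code.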