Two-Stage Dynamic Fusion Framework for Multimodal Classification Tasks
Computer Science
Fusion
Artificial Intelligence
Machine Learning
Data Mining
Authors
Shoumeng Ge, Ying Chen
Source
Journal: INFORMS Journal on Computing
Date: 2025-05-29
Identifier
DOI: 10.1287/ijoc.2023.0448
Abstract
Multimodal learning has provided an opportunity to better analyze a system or phenomenon. Numerous classification studies have developed advanced dynamic fusion methods to fuse information from different modalities. However, few works have considered a reliable design of dynamic fusion methods based on theoretical insights. In this context, we address the research gaps as follows. From a theoretical perspective, we initially establish the performance range for the accuracy of a multimodal classifier. Subsequently, we derive a condition based on the upper limit of the range to indicate how to improve the accuracy of the model. From a technical perspective, we propose a two-stage dynamic fusion framework according to this condition. In the first stage, we design an uncertainty-aware dynamic fusion method. In the second stage, we propose a regression-based method to adaptively generate the learned fusion weight for each modality. In the experiment, we use seven existing models for comparisons and exploit four public data sets to examine the effectiveness of the two-stage framework. The results indicate that our proposed framework generally outperforms existing methods in terms of accuracy and robustness. Additionally, we conduct a comprehensive discussion from several aspects to further illustrate the merits of the proposed framework.

History: Accepted by Ram Ramesh, Area Editor for Data Science and Machine Learning.

Funding: This study was supported by the China National Key R&D Program [Grant 2022YFB3305500], the National Natural Science Foundation of China [Grants 72121001, 72101066, and 72131005], the Heilongjiang Natural Science Excellent Youth Fund [Grant YQ2022G004], and the Key Research and Development Projects of Heilongjiang Province [Grant JD22A003].

Supplemental Material: The software that supports the findings of this study is available within the paper and its Supplemental Information ( https://pubsonline.informs.org/doi/suppl/10.1287/ijoc.2023.0448 ) as well as from the IJOC GitHub software repository ( https://github.com/INFORMSJoC/2023.0448 ). The complete IJOC Software and Data Repository is available at https://informsjoc.github.io/ .
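The abstract does not spell out how the first-stage "uncertainty-aware dynamic fusion" is computed; the authors' actual implementation is in the IJOC GitHub repository linked above. As a rough, hedged illustration only, the sketch below assumes that uncertainty is measured by each modality's predictive entropy and that modalities are combined with inverse-entropy weights. The function names (fuse_by_uncertainty, etc.) and the specific weighting rule are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only -- NOT the authors' implementation.
# Assumption: "uncertainty-aware dynamic fusion" is approximated here by
# weighting each modality's class probabilities with the inverse of its
# predictive entropy, then renormalizing the weights per sample.
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over class logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def entropy(probs: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Predictive entropy per sample (higher means more uncertain)."""
    return -(probs * np.log(probs + eps)).sum(axis=-1)


def fuse_by_uncertainty(modality_logits: list) -> np.ndarray:
    """Stage-1-style fusion: more uncertain modalities get smaller weights."""
    probs = [softmax(l) for l in modality_logits]
    # One inverse-entropy score per (sample, modality), then normalize.
    inv_unc = np.stack([1.0 / (entropy(p) + 1e-6) for p in probs], axis=1)
    weights = inv_unc / inv_unc.sum(axis=1, keepdims=True)   # (N, M)
    stacked = np.stack(probs, axis=1)                        # (N, M, C)
    return (weights[..., None] * stacked).sum(axis=1)        # (N, C)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits_image = rng.normal(size=(4, 3))  # hypothetical image modality: 4 samples, 3 classes
    logits_text = rng.normal(size=(4, 3))   # hypothetical text modality
    fused = fuse_by_uncertainty([logits_image, logits_text])
    print(fused.round(3), fused.sum(axis=1))  # each fused row sums to 1
```

In the paper's second stage, a regression-based module adaptively generates a learned fusion weight for each modality; in a sketch like the one above, such learned weights would replace the fixed inverse-entropy weights. See the repository for the authors' actual code.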