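Computer science
Fusion
Architecture
Artificial intelligence
Image fusion
Sensor fusion
Pattern recognition (psychology)
Machine learning
Computer vision
Image (mathematics)
Linguistics
Philosophy
Art
Visual arts
Authors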
Kaifang Long,Guoyang Xie,Lianbo Ma,Qing Li,Min Huang,Jianhui Lv,Zhichao Lu
Identifier
DOI:10.1109/tip.2025.3599673
Abstract
Designing effective multimodal feature fusion strategies is a key task in multimodal learning, one that often demands high computational cost and extensive expertise. In this paper, we seek to enhance multimodal learning via hierarchical fusion architecture search with inconsistency mitigation. Unlike previous works, our Hierarchical Fusion Multimodal Neural Architecture Search (HF-MNAS) accounts for the inconsistency between modalities and labels and exploits multi-level fusion architectures in a fine-grained way. Specifically, it decomposes the hierarchical fusion problem into two search spaces, one macro-level and one micro-level. In the macro-level search space, high-level and low-level features are extracted and then connected in a fine-grained way, and an inconsistency mitigation module is designed to minimize discrepancies between modalities and labels in cell outputs. In the micro-level search space, we find that different intermediate nodes in the cells differ in importance, and we therefore propose an importance-based node selection mechanism to form the optimal cells for feature fusion. We evaluate HF-MNAS on a series of multimodal classification tasks. Empirical evidence shows that HF-MNAS achieves a competitive trade-off among accuracy, search time, and inference speed. In particular, HF-MNAS consumes minimal computational cost compared with state-of-the-art MNAS methods. Furthermore, we verify theoretically and experimentally that modality-label inconsistency degrades the overall fusion performance of models, as measured by metrics such as accuracy and F1 score, and demonstrate that the proposed inconsistency mitigation module can effectively mitigate this phenomenon.
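The importance-based node selection mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how node importance is computed or how many nodes are kept, so everything here is an assumption made for illustration: the importance of each intermediate node is taken to be a softmax over hypothetical learned scores, and the function name select_nodes, the scores, and the parameter k are not from the paper.

```python
import math


def softmax(scores):
    """Normalize raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_nodes(node_scores, k):
    """Keep the k intermediate nodes with the highest importance.

    Assumption: importance is a softmax over per-node scores; the paper
    may define importance differently. Returns the kept node indices in
    their original order, plus the importance weights.
    """
    if not 0 < k <= len(node_scores):
        raise ValueError("k must be between 1 and the number of nodes")
    importance = softmax(node_scores)
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    return sorted(ranked[:k]), importance


# Hypothetical scores for four intermediate nodes in one fusion cell.
scores = [0.2, 1.5, -0.3, 0.9]
kept, importance = select_nodes(scores, k=2)
print(kept)  # [1, 3]
```

In this sketch the cell would be rebuilt from nodes 1 and 3 only. How the retained nodes are wired into the final fusion cell is left open, since the abstract does not describe it.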