Authors
Xikai Yang, Xilin Dang, Jinyue Cai, Jinpeng Li, Xi Wang, Pheng-Ann Heng
Abstract
BACKGROUND: As one of the most prevalent neurodegenerative disorders, Alzheimer's disease (AD) severely impairs human thinking and behavior. Early and accurate prediction of cognitive decline is crucial for timely AD intervention. However, most existing prognostic methods rarely explore the underlying associations among longitudinal data from different modalities during disease progression, so the predictive ability of current models remains limited.

PURPOSE: We propose the Multi-Modality fusion with DUal-gRanularity Alignment framework (MM-DURA), a unified framework that jointly models longitudinal correlations and cross-modality interactions for cognitive assessment forecasting. The framework takes temporal MRI scans, time-aligned clinical diagnostics, and genomic data as inputs to forecast multiple cognitive assessment scores.

METHODS: We propose a novel coarse-to-fine feature representation learning approach that enforces consistency between modalities at both the subject and the visit granularity. It aligns the multimodal data belonging to each individual subject while capturing the temporal progression of these modalities. Additionally, we design a hierarchical multimodality fusion (HMF) block that exploits the interrelationships and dependencies among modalities. Finally, an LSTM-based regression head takes the fused multimodal embedding as input and forecasts the future status of cognitive ability.

RESULTS: We validate our method on the public Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and investigate the optimal hierarchical structure for modality fusion. The dataset includes 707 subjects who participated in the ADNI1, ADNIGO, and ADNI2 studies. All subjects underwent longitudinal examinations over an average study period of approximately 14 months. The data are split at the subject level into training, validation, and test sets in a ratio of 0.75:0.05:0.20.
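Splitting at the subject level means that every visit of a given participant falls into a single partition, so no subject's longitudinal history leaks from training into evaluation. A minimal Python sketch of such a split follows; the function name `subject_level_split`, the seeded random shuffle, the rounding of partition sizes, and the synthetic visit schedule are illustrative assumptions, not details taken from the released MM-DURA code.

```python
import random

def subject_level_split(records, ratios=(0.75, 0.05, 0.20), seed=0):
    """Hypothetical helper: assign every subject to exactly one of train/val/test.

    records: iterable of (subject_id, visit) pairs; a subject may have many visits.
    Returns (assignment, splits): subject -> split name, and split name -> records.
    """
    records = list(records)  # allow generators, since we iterate twice
    subjects = sorted({sid for sid, _ in records})
    random.Random(seed).shuffle(subjects)  # assumption: seeded random permutation
    n = len(subjects)
    n_train = round(ratios[0] * n)
    n_val = round(ratios[1] * n)
    assignment = {}
    for i, sid in enumerate(subjects):
        if i < n_train:
            assignment[sid] = "train"
        elif i < n_train + n_val:
            assignment[sid] = "val"
        else:
            assignment[sid] = "test"  # the remainder goes to test
    splits = {"train": [], "val": [], "test": []}
    for sid, visit in records:
        splits[assignment[sid]].append((sid, visit))  # visits follow their subject
    return assignment, splits

# Synthetic example: 707 subjects, each with visits at months 0, 6 and 12 (made-up schedule).
records = [(f"S{i:03d}", month) for i in range(707) for month in (0, 6, 12)]
assignment, splits = subject_level_split(records)
```

With 707 subjects, this rounding yields 530 training, 35 validation, and 142 test subjects; a different rounding scheme could shift a subject between partitions.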
The proposed MM-DURA framework achieves RMSE values of 1.099 for CDRSB, 5.601 for ADAS-Cog, 2.051 for MMSE, 6.504 for RAVLT, and 3.447 for FAQ forecasting, outperforming all six comparison methods, including two state-of-the-art multimodal temporal modeling approaches. Comprehensive ablation experiments confirm the effectiveness of longitudinal modeling with temporal-multimodal alignment, highlighting its clinical potential for cognitive assessment prediction. Visualizations of key brain regions and an SNP significance analysis further support the interpretability of the model.

CONCLUSIONS: In this work, we proposed a novel framework that unifies multimodality fusion with dual-granularity alignment for cognitive assessment forecasting. Our approach uses a temporal-multimodal consistency alignment strategy that aligns the various modalities within a unified latent space. Furthermore, the HMF block exploits the inherent relationships and dependencies between modalities to improve data integration. Extensive numerical results on five cognitive assessment scores, supported by detailed visualizations, demonstrate the superior performance of our approach compared with existing methods. Our code has been released and is available at https://github.com/IcecreamArtist/MM_DURA.
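The abstract does not spell out the alignment objective. One common way to realize a coarse-to-fine (subject-level plus visit-level) alignment is a contrastive InfoNCE loss applied twice: once across subjects on visit-averaged embeddings, and once across the visits within each subject. The pure-Python sketch below illustrates that idea for two modalities. The use of InfoNCE, mean pooling over visits, the temperature `tau`, the weights `w_subject`/`w_visit`, and all function names are assumptions for illustration and may differ from what MM-DURA actually does.

```python
import math

def _normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def _mean_pool(seq):
    """Average a list of equal-length vectors element-wise."""
    return [sum(col) / len(seq) for col in zip(*seq)]

def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def info_nce(a, b, tau=0.1):
    """Symmetric InfoNCE: a[i] and b[i] are positives, all other pairs negatives."""
    a = [_normalize(v) for v in a]
    b = [_normalize(v) for v in b]
    logits = [[sum(x * y for x, y in zip(ai, bj)) / tau for bj in b] for ai in a]
    logits_t = [list(col) for col in zip(*logits)]  # b -> a direction
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))

def dual_granularity_loss(mod_a, mod_b, tau=0.1, w_subject=1.0, w_visit=1.0):
    """Hypothetical loss. mod_a, mod_b: [subject][visit][dim] embeddings from two
    modality encoders, with the same number of visits per subject (assumption)."""
    # Coarse (subject) level: match each subject's visit-averaged embeddings.
    subject_term = info_nce([_mean_pool(s) for s in mod_a],
                            [_mean_pool(s) for s in mod_b], tau)
    # Fine (visit) level: within a subject, match embeddings of the same visit.
    visit_term = sum(info_nce(va, vb, tau) for va, vb in zip(mod_a, mod_b)) / len(mod_a)
    return w_subject * subject_term + w_visit * visit_term

# Toy embeddings: 3 subjects x 2 visits x 4 dims (made-up numbers).
mri = [[[1, 0, 0, 0], [0.9, 0.1, 0, 0]],
       [[0, 1, 0, 0], [0, 0.9, 0.1, 0]],
       [[0, 0, 1, 0], [0, 0, 0.9, 0.1]]]
aligned = dual_granularity_loss(mri, mri)
shuffled = dual_granularity_loss(mri, [mri[1], mri[2], mri[0]])
```

Pairing each subject's embeddings with another subject's embeddings raises the subject-level term sharply, which is the mismatch a subject-level alignment term is meant to penalize. A subject with a single visit contributes zero to the visit-level term, since there are no negatives to contrast against.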