Artificial Intelligence
Computer Science
Pattern Recognition (Psychology)
Computer Vision
Authors
Anshul Sharma,Utkarsh Varman,Vandana Bharti,Abhinav Kumar,Amit Kumar Singh,Sanjay Kumar Singh
Identifier
DOI:10.1109/jbhi.2025.3561024
Abstract
Accurate and interpretable AI models play a critical role in medical image analysis. However, despite advancements in explainable AI (XAI), existing methods struggle with inconsistent interpretability. To overcome this limitation, we introduce the Integrated Latent and Attention Mapping (ILAM) framework, which enhances both classification accuracy and explainability by fusing local and global feature representations. ILAM integrates a custom-designed Autoencoder (AE) with a Vision Transformer (ViT), where the AE learns fine-grained local features through unsupervised patchwise image reconstruction in the latent space. These local features are then fused with global representations extracted by the ViT, creating a hybrid model that improves both performance and post hoc interpretability. To refine explainability, ILAM incorporates a modified attention rollout mechanism, which recursively aggregates latent feature representations and attention weights to produce precise and stable activation maps. We evaluate ILAM on three publicly available medical imaging datasets (BreakHis, Chest X-Ray, and Retinal), demonstrating its superior performance over transformer-based models such as ViT, DeiT, CvT, and SwinT. ILAM consistently generates detailed and reliable activation maps, providing clearer visualizations of the critical image regions influencing model decisions. By effectively combining local and global feature fusion, ILAM establishes itself as a robust and interpretable framework for medical image classification.
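The abstract's explainability component builds on attention rollout. As a point of reference, the sketch below shows the standard attention rollout procedure (recursively multiplying per-layer attention maps with residual connections folded in), not ILAM's modified variant: how the authors inject latent AE features into the aggregation is not specified here, and the array shapes, layer count, and patch grid are illustrative assumptions.

```python
import numpy as np

def attention_rollout(attn_layers, residual_alpha=0.5):
    """Standard attention rollout: recursively multiply per-layer
    attention maps, accounting for residual connections.

    attn_layers: list of arrays, one per transformer layer,
                 each shaped (num_heads, tokens, tokens).
    Returns a (tokens, tokens) map; the [CLS] row over the patch
    tokens can be reshaped into a coarse spatial activation map.
    """
    rollout = None
    for attn in attn_layers:
        a = attn.mean(axis=0)                                  # average over heads
        a = residual_alpha * a + (1 - residual_alpha) * np.eye(a.shape[0])  # residual path
        a = a / a.sum(axis=-1, keepdims=True)                  # re-normalize rows
        rollout = a if rollout is None else a @ rollout        # recursive aggregation
    return rollout

# Hypothetical ViT-Base-like setup: 12 layers, 12 heads, 1 CLS + 196 patch tokens.
layers = [np.random.rand(12, 197, 197) for _ in range(12)]
rollout = attention_rollout(layers)
cls_to_patches = rollout[0, 1:].reshape(14, 14)  # 14x14 activation map for the CLS token
```

ILAM's reported modification would replace or augment the plain head-averaged attention above with aggregated latent feature representations from the AE branch before the recursive product, which is what the abstract credits for the more precise and stable activation maps.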