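Subspace topology
Computer science
Artificial intelligence
Pattern recognition (psychology)
Class (philosophy)
Representation (politics)
Cluster (spacecraft)
Political science
Politics
Programming language
Law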
Authors
Bangyi Zhang,Yiping Zuo,Zhiqiang Dai,Song Zhu,Xuan Liu,Zhaohong Deng
Identifier
DOI:10.1109/jbhi.2025.3537284
Abstract
mRNA subcellular localization is a prevalent and essential mechanism that precisely regulates protein translation and significantly affects various cellular processes. Studying it has advanced the understanding of mRNA function, yet existing prediction methods face limitations, including imbalanced data, suboptimal model performance, and inadequate generalization, particularly in multi-label localization scenarios, where solutions are scarce. This study introduces MBSCLoc, a predictor for mRNA multi-label subcellular localization. MBSCLoc predicts mRNA locations across multiple cellular compartments simultaneously, addressing the limitations of single-location prediction, incomplete feature extraction, and imbalanced data. MBSCLoc uses the UTR-LM model for feature extraction, followed by multi-class contrastive representation learning and Clustering Balanced Subspace Partitioning to construct balanced subspaces. It then optimizes the sample distribution to tackle severe data imbalance and uses multiple XGBoost classifiers, integrated through voting, to enhance accuracy and generalization. Five-fold cross-validation and independent testing results show that MBSCLoc significantly outperforms other methods. Additionally, MBSCLoc offers superior pixel-level interpretability, strongly supporting mRNA multi-label subcellular localization research. Crucially, the importance of the 5' UTR and 3' UTR regions has been preliminarily confirmed using traditional biological analysis and Tree-SHAP, with most mRNA sequences showing significant relevance in these regions, especially the 3' UTR, where about 80% of specific sites reach peak significance. To facilitate the use of MBSCLoc by researchers, a freely accessible web server has also been developed: http://www.mbscloc.com/.
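The abstract's idea of building balanced subspaces and combining per-subspace classifiers by voting can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: random partitioning stands in for the Clustering Balanced Subspace Partitioning step, a nearest-centroid rule stands in for XGBoost, the two-dimensional toy points stand in for UTR-LM features, and all function and class names are hypothetical. The sketch handles one binary label; a multi-label setup would presumably repeat it once per compartment.

```python
import random


def centroid(rows):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def balanced_subspaces(majority, minority, seed=0):
    """Split the majority class into chunks of roughly minority size.

    Assumption: a random split replaces the paper's clustering-based
    partitioning. Each chunk is paired with all minority samples.
    """
    rng = random.Random(seed)
    shuffled = majority[:]
    rng.shuffle(shuffled)
    k = max(1, len(shuffled) // len(minority))
    return [(shuffled[i::k], minority) for i in range(k)]


class NearestCentroid:
    """Toy base learner (stand-in for an XGBoost classifier)."""

    def fit(self, neg, pos):
        self.c_neg = centroid(neg)
        self.c_pos = centroid(pos)
        return self

    def predict(self, x):
        return 1 if sq_dist(x, self.c_pos) < sq_dist(x, self.c_neg) else 0


def train_ensemble(majority, minority, seed=0):
    """Train one base learner per balanced subspace."""
    pairs = balanced_subspaces(majority, minority, seed)
    return [NearestCentroid().fit(neg, pos) for neg, pos in pairs]


def vote(models, x):
    """Majority vote over the base learners; ties resolve to 0."""
    ones = sum(m.predict(x) for m in models)
    return 1 if 2 * ones > len(models) else 0


# Toy imbalanced data: 40 negatives near (0, 0), 8 positives near (3, 3).
rng = random.Random(1)
majority = [[rng.gauss(0, 0.5), rng.gauss(0, 0.5)] for _ in range(40)]
minority = [[rng.gauss(3, 0.5), rng.gauss(3, 0.5)] for _ in range(8)]

models = train_ensemble(majority, minority)
print(len(models), vote(models, [3.0, 3.0]), vote(models, [0.0, 0.0]))
```

Each base learner sees an equal number of positive and negative samples, so no single model is dominated by the majority class, and the vote aggregates their decisions.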