Authors
Jiyue Jiang, Yunke Li, Shiwei Cao, Yuheng Shan, Yuexing Liu, Tianyi Fei, Yule Yu, Yi Feng, Yu Li, Yi-Xue Li, Jiao Yuan
Abstract
The widespread adoption of high-throughput sequencing technologies and multi-omics approaches has led to the rapid accumulation of genomic, transcriptomic, proteomic, and even single-cell multimodal datasets, resulting in exponential growth of biological data. The massive scale and inherent complexity of these datasets pose significant challenges for data management, analysis, and interpretation in bioinformatics. Concurrently, artificial intelligence (AI) techniques, particularly deep learning and reinforcement learning, have achieved groundbreaking advances in medical diagnostics, drug discovery, and genomic analysis, providing novel theoretical tools and analytical paradigms for bioinformatics research. AI techniques are now extensively applied to DNA, RNA, and protein sequence prediction and design, 3D structural elucidation, functional annotation, integrative analysis of multi-omics data, and personalized drug design for precision medicine, significantly advancing biological research. This review systematically summarizes recent research progress and representative applications of AI techniques in bioinformatics, discussing the scenarios to which traditional machine learning algorithms, deep learning models, and reinforcement learning methods are each suited, along with their respective advantages. We highlight AI’s transformative impact with quantitative metrics from landmark achievements: accurate near-atomic protein structure prediction (median 0.96 Å on CASP14), robust single-cell modeling (AvgBIO $\approx$ 0.82), high protein design success rates (up to 92%), and sensitive cancer detection (area under the curve (AUC) $\approx$ 0.93).
Furthermore, the paper provides an in-depth analysis of the latest advances of AI in specific tasks, including biomedical text mining, multimodal omics integration, and single-cell analysis, while highlighting current challenges such as data noise and sparsity, difficulties in modeling long biological sequences, complexities in multimodal data integration, insufficient model interpretability, and ethical and privacy concerns. Finally, the paper outlines promising future research directions, emphasizing large-scale data mining, cross-domain model generalization, and innovations in drug design and personalized medicine, and it advocates establishing an open and collaborative research ecosystem.