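Feature selection
Benchmark (surveying)
Computer science
Feature (linguistics)
Selection (genetic algorithm)
Data integration
Data mining
Minimum redundancy feature selection
Artificial intelligence
Linguistics
Philosophy
Geodesy
Geography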
Authors
Luke Zappia, Sabrina Richter, Ciro Ramírez-Suástegui, Raphael Kfuri-Rubens, Larsen Vornholz, W. Wang, Oliver Dietrich, Amit Frishberg, Malte D. Luecken, Fabian J. Theis
Identifier
DOI: 10.1038/s41592-025-02624-3
Abstract
The availability of single-cell transcriptomics has allowed the construction of reference cell atlases, but their usefulness depends on the quality of dataset integration and the ability to map new samples. Previous benchmarks have compared integration methods and suggest that feature selection improves performance, but they have not explored how best to select features. Here, we benchmark feature selection methods for single-cell RNA sequencing integration using metrics beyond batch correction and preservation of biological variation to assess query mapping, label transfer and the detection of unseen populations. We reinforce common practice by showing that highly variable feature selection is effective for producing high-quality integrations, and we provide further guidance on the effect of the number of features selected, batch-aware feature selection, lineage-specific feature selection and integration, and the interaction between feature selection and integration models. These results are informative for analysts working on large-scale tissue atlases, using atlases or integrating their own data to tackle specific biological questions.