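Population
Computer science
RNA
Computational biology
Artificial intelligence
Matching (statistics)
Data mining
Biology
Mathematics
Gene
Genetics
Statistics
Sociology
Demography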
Authors
Laleh Haghverdi, Aaron T. L. Lun, Michael D. Morgan, John C. Marioni
Abstract
Large-scale single-cell RNA sequencing (scRNA-seq) data sets that are produced in different laboratories and at different times contain batch effects that may compromise the integration and interpretation of the data. Existing scRNA-seq analysis methods incorrectly assume that the composition of cell populations is either known or identical across batches. We present a strategy for batch correction based on the detection of mutual nearest neighbors (MNNs) in the high-dimensional expression space. Our approach does not rely on predefined or equal population compositions across batches; instead, it requires only that a subset of the population be shared between batches. We demonstrate the superiority of our approach compared with existing methods by using both simulated and real scRNA-seq data sets. Using multiple droplet-based scRNA-seq data sets, we demonstrate that our MNN batch-effect-correction method can be scaled to large numbers of cells.
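
To illustrate the idea of mutual nearest neighbors described above, the following is a minimal sketch in Python with NumPy, not the authors' implementation. The function names `knn_indices` and `find_mnn_pairs`, the use of Euclidean distance, and the toy data are assumptions made for illustration; the abstract does not specify the distance measure or how correction vectors are derived from the pairs. A cell in one batch and a cell in the other form an MNN pair only if each is among the other's k nearest neighbors across batches.

```python
import numpy as np


def knn_indices(query, reference, k):
    """For each row of `query`, return the indices of its k nearest rows in `reference`.

    Assumption: plain Euclidean distance, computed by brute force.
    """
    diff = query[:, None, :] - reference[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)
    return np.argsort(sq_dist, axis=1)[:, :k]


def find_mnn_pairs(batch_a, batch_b, k=1):
    """Return (i, j) pairs where cell i of batch_a and cell j of batch_b
    are each within the other's k nearest neighbors across batches."""
    nn_a_to_b = knn_indices(batch_a, batch_b, k)  # neighbors in B of each A cell
    nn_b_to_a = knn_indices(batch_b, batch_a, k)  # neighbors in A of each B cell
    pairs = []
    for i, neighbors in enumerate(nn_a_to_b):
        for j in neighbors:
            if i in nn_b_to_a[j]:  # the relationship must hold in both directions
                pairs.append((i, int(j)))
    return pairs


# Hypothetical toy data: each row is a cell, each column a gene.
# The first cell of each batch belongs to a shared population, offset by a
# small batch effect; the second cell of each batch has no counterpart.
batch_a = np.array([[0.0, 0.0], [10.0, 10.0]])
batch_b = np.array([[0.5, 0.5], [100.0, 100.0]])

pairs = find_mnn_pairs(batch_a, batch_b, k=1)
print(pairs)
```

In this toy case only the shared cells are paired. The cells without a counterpart do have a nearest neighbor in the other batch, but that relationship is not reciprocated, which is why the approach needs only a subset of the population to be shared between batches.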