scBaseCamp: An AI agent-curated, uniformly processed, and continually expanding single cell data repository

Computer Science
Authors
Nicholas D Youngblut, Christopher Carpenter, Jaanak Prashar, Chiara Ricci-Tam, Rajesh Ilango, Noam Teyssier, Silvana Konermann, Patrick D. Hsu, Alexander Dobin, David P Burke, Hani Goodarzi, Yusuf Roohani
Identifier
DOI:10.1101/2025.02.27.640494
Abstract

Building a virtual model of the cell is an emerging frontier at the intersection of artificial intelligence and biology, aided by the rapid growth of single-cell RNA sequencing data. By aggregating gene expression profiles from millions of cells across hundreds of studies, single-cell atlases have provided a foundation for training AI-driven models of the cell. However, reliance on datasets with pre-processed counts limits the size and diversity of these repositories and constrains downstream model training to data curated for divergent purposes. Pre-processed counts also introduce analytical variability, because studies differ in their choice of alignment tools, genome references, and counting strategies. Here, we introduce scBaseCamp, a continuously updated single-cell RNA-seq database that leverages an AI agent-driven hierarchical workflow to automate data discovery, metadata extraction, and standardized data processing. Built by directly mining and processing all publicly accessible 10x Genomics single-cell RNA sequencing reads, scBaseCamp is currently the largest public repository of single-cell data, comprising over 230 million cells spanning 21 organisms and 72 tissues. Using studies comprising both single-cell and single-nucleus sequencing data, we demonstrate that uniform processing across datasets helps mitigate analytical artifacts introduced by inconsistent data processing choices. This standardized approach lays the groundwork for more accurate virtual cell models and serves as a foundation for a wide range of biological and biomedical applications.
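The abstract names counting strategy as one source of variability between pre-processed datasets. The following is a minimal, hypothetical illustration and not scBaseCamp's actual pipeline. It counts the same toy reads under two rules: exon-only and exon-plus-intron. Single-nucleus libraries tend to carry many intronic reads from unspliced transcripts, so these two rules can produce quite different matrices for the same cells. The reads, barcodes, gene names, and the `count_matrix` helper are all made up for illustration. Applying one rule to every dataset, as uniform processing does, removes this particular source of disagreement.

```python
from collections import Counter

# Hypothetical toy reads: (cell_barcode, gene, region), where region is the part
# of the gene the read aligned to. Real pipelines derive this from alignments
# against a genome annotation; these values are invented for illustration.
READS = [
    ("AAAC", "GENE_A", "exon"),
    ("AAAC", "GENE_A", "intron"),
    ("AAAC", "GENE_B", "intron"),
    ("AAAC", "GENE_B", "intron"),
    ("TTTG", "GENE_A", "exon"),
    ("TTTG", "GENE_B", "exon"),
]


def count_matrix(reads, regions):
    """Count reads per (cell, gene), keeping only reads whose region is in `regions`.

    This counts reads directly; real 10x pipelines collapse duplicate UMIs first,
    which this sketch skips.
    """
    return Counter((cell, gene) for cell, gene, region in reads if region in regions)


# Two counting strategies applied to the same reads.
exon_only = count_matrix(READS, {"exon"})
exon_and_intron = count_matrix(READS, {"exon", "intron"})

for key in sorted(exon_and_intron):
    # Counter returns 0 for keys it has never seen.
    print(key, "exon-only:", exon_only[key], "exon+intron:", exon_and_intron[key])
```

In this toy example, cell `AAAC` shows no `GENE_B` expression under the exon-only rule but two reads under the exon-plus-intron rule. Merging datasets that were counted under different rules would mix these two views of the same biology.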