Linear-regression-based algorithms can succeed at identifying microbial functional groups despite the nonlinearity of ecological function

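Function (biology), Computer science, Regression, Task (project management), Machine learning, Key (lock), Artificial intelligence, Ecology, Mathematics, Statistics, Biology, Engineering, Evolutionary biology, Computer security, Systems engineering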
Authors
Yuanchen Zhao, Otto X. Cordero, Mikhail Tikhonov
Identifier
DOI:10.1101/2024.01.21.576558
Abstract

Microbial communities play key roles across diverse environments. Predicting their function and dynamics is a key goal of microbial ecology, but detailed microscopic descriptions of these systems can be prohibitively complex. One approach to deal with this complexity is to resort to coarser representations. Several approaches have sought to identify useful groupings of microbial species in a data-driven way. Of these, recent work has claimed some empirical success at de novo discovery of coarse representations predictive of a given function using methods as simple as a linear regression, against multiple groups of species or even a single such group (the EQO approach of Shan et al. [25]). This success seems puzzling, since modeling community function as a linear combination of contributions of individual species appears simplistic. However, the task of identifying a predictive coarsening of an ecosystem is distinct from the task of predicting the function well, and it is conceivable that the former could be accomplished by a simpler methodology than the latter. Here, we use the resource competition framework to design a model where the “correct” grouping to be discovered is well-defined, and use synthetic data to evaluate and compare three regression-based methods, namely, two proposed previously and one we introduce. We find that regression-based methods can recover the groupings even when the function is manifestly nonlinear; that multi-group methods offer an advantage over a single-group EQO; and crucially, that simpler (linear) methods can outperform more complex ones.

Author summary

Natural microbial communities are highly complex, making predictive modeling difficult. One appealing approach is to make their description less detailed, rendering modeling more tractable while hopefully still retaining some predictive power. The Tree of Life naturally provides one possible method for building coarser descriptions (instead of thousands of strains, we could think about hundreds of species; or dozens of families). However, it is known that useful descriptions need not be taxonomically coherent, as illustrated, for example, by the so-called functional guilds. This prompted the development of computational methods seeking to propose candidate groupings in a data-driven manner. In this computational study, we examine one class of such methods, recently proposed in the microbial context. Quantitatively testing their performance can be difficult, as the answer they “should” recover is often unknown. Here, we overcome this difficulty by testing these methods on synthetic data from a model where the ground truth is known by construction. Curiously, we demonstrate that simpler approaches, rather than suffering from this simplicity, can in fact be more robust.
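The central claim, that a linear fit can identify the right group even when the function is nonlinear, can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: abundances are drawn independently and uniformly, the "function" is a saturating curve of the total abundance of a hidden group, and the search is a brute-force scan over all subsets rather than the EQO algorithm or the resource competition model the authors use. The names `simulate`, `r_squared`, and `best_single_group` are hypothetical.

```python
import itertools

import numpy as np


def simulate(n_samples=300, n_species=8, group=(0, 2, 5), K=1.0, noise=0.02, seed=0):
    """Toy synthetic data (illustrative assumption, not the paper's model)."""
    rng = np.random.default_rng(seed)
    # Independent species abundances in [0, 1).
    X = rng.uniform(0.0, 1.0, size=(n_samples, n_species))
    # The function depends only on the total abundance of the hidden group,
    # through a saturating (hence nonlinear) curve, plus measurement noise.
    total = X[:, list(group)].sum(axis=1)
    y = total / (K + total) + rng.normal(0.0, noise, size=n_samples)
    return X, y


def r_squared(x, y):
    """R^2 of the simple linear regression y ~ a + b*x (squared Pearson r)."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r


def best_single_group(X, y):
    """Scan every non-empty subset of species; keep the best linear fit."""
    n_species = X.shape[1]
    best_score, best_subset = -1.0, ()
    for size in range(1, n_species + 1):
        for subset in itertools.combinations(range(n_species), size):
            score = r_squared(X[:, list(subset)].sum(axis=1), y)
            if score > best_score:
                best_score, best_subset = score, subset
    return best_score, best_subset


if __name__ == "__main__":
    X, y = simulate()
    score, subset = best_single_group(X, y)
    print("recovered group:", subset, "R^2 of linear fit:", round(float(score), 3))
```

In this toy setting the linear fit is not a perfect predictor of the saturating function, yet the subset with the highest R^2 is the hidden group: leaving out a member loses signal, and adding an unrelated species adds only variance. Exhaustive search is feasible only for a handful of species; larger problems would need a heuristic search.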
