Predicting gene sequences with AI to study codon usage patterns

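codon usage bias, start codon, gene, genetics, computational biology, computer science, artificial intelligence, biology, genome, base sequence
Authors
Tomer Sidi, Shir Bahiri-Elitzur, Tamir Tuller, Rachel Kolodny
Identifier
DOI:10.1101/2024.02.11.579798
Abstract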

Selective pressure acts on codon use, optimizing multiple, overlapping signals that are only partially understood. We trained artificial intelligence (AI) models to predict the codons given their amino acid sequence in the eukaryotes Saccharomyces cerevisiae and Schizosaccharomyces pombe and the bacteria Escherichia coli and Bacillus subtilis, to study the extent to which we can learn patterns in naturally occurring codons to improve predictions. We trained our models on a subset of the proteins, and evaluated their predictions on large, separate sets of proteins of varying lengths and expression levels. Our models significantly outperformed naïve frequency-based approaches, demonstrating that there are dependencies between codons that can be learned to better predict evolutionary-selected codon usage. The prediction accuracy advantage of our models is greater for highly expressed genes, and it is greater in bacteria than in eukaryotes, supporting the hypothesis that there is a monotonic relationship between selective pressure for complex codon patterns and effective population size. Also, in S. cerevisiae and bacteria, our models were more accurate for longer proteins, suggesting that the AI system may have learned patterns related to co-translational folding. Gene functionality and conservation were also important determinants of the performance of our models. Finally, we showed that using information encoded in homologous proteins has only a minor effect on prediction accuracy, perhaps due to complex codon-usage codes in genes undergoing rapid evolution. In summary, our study employing contemporary AI methods offers a new perspective on codon usage patterns and a novel tool to optimize codon usage in endogenous and heterologous proteins.

Significance statement
Can one predict codon sequences used by an organism to encode a given amino acid sequence? This is difficult, because there are exponentially many codon sequences that can encode the same amino acid sequence and evolution is stochastic. Indeed, codon frequencies vary, a phenomenon known as codon bias, yet we improve upon frequency-based predictions using contemporary AI tools that learn complex patterns and capture interactions between codons. Because our predictions are tested fairly, on cases not seen during the training process, accurate predictions suggest that these learned patterns are not random, and may be related to the evolutionary process. Thus, studying where our predictions are more accurate is expected to reveal novel insights related to the way evolution shapes coding regions.
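
The two ideas the abstract leans on, the exponential number of synonymous encodings and a naïve frequency-based predictor, can be made concrete with a small sketch. This is an illustration under stated assumptions, not the authors' method: it assumes the standard genetic code (NCBI translation table 1), it assumes the baseline simply picks the most frequent training-set codon for each amino acid (the abstract does not specify the baselines used), and all function names and toy sequences are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(cds):
    """Split a coding sequence into codons, ignoring a trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def count_synonymous_encodings(protein):
    """Number of codon sequences that encode `protein` (product of degeneracies)."""
    degeneracy = Counter(CODON_TABLE.values())
    return prod(degeneracy[aa] for aa in protein)


def fit_frequency_baseline(training_cds):
    """Hypothetical baseline: the most frequent training codon per amino acid."""
    counts = defaultdict(Counter)
    for cds in training_cds:
        for codon in split_codons(cds):
            counts[CODON_TABLE[codon]][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def predict_codons(protein, best_codon):
    """Predict one codon per residue, independently of neighbouring residues."""
    return [best_codon[aa] for aa in protein]


def codon_accuracy(predicted, actual):
    """Fraction of positions where the predicted codon equals the native codon."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Toy data, invented for illustration only (not from the paper).
toy_training = ["ATGCTGCTGAAA", "ATGCTGTTAAAG", "ATGAAAAAA"]
toy_baseline = fit_frequency_baseline(toy_training)
toy_prediction = predict_codons("MLK", toy_baseline)
toy_accuracy = codon_accuracy(toy_prediction, split_codons("ATGTTAAAA"))
```

For the three-residue peptide `MLK`, `count_synonymous_encodings` returns 12 (1 × 6 × 2), and the count multiplies with every added residue, which is the exponential growth the significance statement refers to. The baseline treats each position independently, so it cannot capture the dependencies between codons that the abstract reports the AI models learn.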