Predicting gene expression levels from DNA sequences and post-transcriptional information with transformers

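Computer science, Transformer, Transcription (linguistics), Computational biology, Embedding, Artificial intelligence, Machine learning, Data mining, Biology, Engineering, Linguistics, Electrical engineering, Philosophy, Voltage
Authors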
Vittorio Pipoli,Mattia Cappelli,A. Palladini,Carlo Peluso,Marta Lovino,Elisa Ficarra
Source
Journal: Computer Methods and Programs in Biomedicine [Elsevier]
Volume/issue: 225: 107035-107035 Cited by: 4
Identifier
DOI:10.1016/j.cmpb.2022.107035
Abstract

In recent years, predicting gene expression levels has become crucial because of its potential clinical applications. In this context, Xpresso and other methods based on Convolutional Neural Networks and Transformers were first proposed for this purpose. However, all these methods embed data with a standard one-hot encoding algorithm, resulting in highly sparse matrices. In addition, post-transcriptional regulation processes, which are of utmost importance in gene expression, are not considered in these models.

This paper presents Transformer DeepLncLoc, a novel method to predict mRNA abundance (i.e., gene expression levels) by processing gene promoter sequences, treating the problem as a regression task. The model uses a transformer-based architecture and introduces the DeepLncLoc method to perform the data embedding. Since DeepLncLoc is based on the word2vec algorithm, it avoids the sparse-matrix problem.

Post-transcriptional information related to mRNA stability and transcription factors is included in the model, leading to significantly improved performance compared to state-of-the-art works. Transformer DeepLncLoc reached an R² of 0.76, compared to 0.74 for Xpresso.

The Multi-Headed Attention mechanism that characterizes the transformer methodology is suitable for modeling the interactions between DNA locations, improving on recurrent models. Finally, integrating transcription factor data into the pipeline leads to impressive gains in predictive power.
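
To make the sparsity argument and the reported metric concrete, here is a minimal sketch in plain Python. It is not the authors' code: the function names, the toy sequence, and the choice of overlapping 3-mers as "words" are assumptions for illustration only (the abstract states just that DeepLncLoc builds on word2vec). It shows that a one-hot encoded DNA sequence is 75% zeros by construction, how a sequence could be split into k-mer tokens before a word2vec-style dense embedding, and how the R² score is computed for a regression output.

```python
def one_hot(seq):
    """One-hot encode a DNA string: one 4-element row (A, C, G, T) per base."""
    alphabet = "ACGT"
    return [[1 if base == letter else 0 for letter in alphabet] for base in seq]


def kmers(seq, k=3):
    """Split a sequence into overlapping k-mers (assumed tokenization, k is illustrative)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


seq = "ACGTAC"  # toy sequence, not real promoter data

# Every one-hot row holds a single 1 and three 0s, so 3/4 of the entries are zero.
matrix = one_hot(seq)
zeros = sum(row.count(0) for row in matrix)
sparsity = zeros / (len(matrix) * 4)

# Tokens that a word2vec-style model would map to dense vectors.
tokens = kmers(seq, k=3)

print(sparsity)
print(tokens)
```

A word2vec-style model would then assign each token a dense, low-dimensional vector, which is what avoids the sparse input matrix; training such an embedding is outside the scope of this sketch.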