Explainable Transcription Factor Prediction with Protein Language Models

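Computer science, Artificial intelligence, Transcription factor, Factor (programming language), Natural language processing, Computational biology, Gene, Programming language, Biology, Genetics
Authors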
Liyuan Gao,K.-H. Shu,Jun Zhang,Victor S. Sheng
Identifier
DOI:10.1109/bibm58861.2023.10385498
Abstract

Language models have exhibited remarkable performance across diverse tasks, including biological applications such as protein language modeling. Transcription factors (TFs) are pivotal in gene regulation, influencing gene expression by binding to specific DNA sequences. While various TF prediction techniques exist, they often require extensive training datasets or suffer from limited accuracy. In this study, we propose ESM-TFpredict, a model that uses a pre-trained protein language model to encode amino acid sequences, followed by 1-D convolutional neural networks for TF prediction. To explain the model's decisions, we apply the integrated gradients method to highlight the features that drive TF identification. Comparative experiments against two existing models, DeepTFactor and TFpredict, show that ESM-TFpredict scores above 95% on all four evaluation metrics, surpassing both competitors. By using a sliding-window approach to compress the protein representation, ESM-TFpredict trains in 315.78 seconds, which is only 51% of the training time required by DeepTFactor and 12% of that required by TFpredict. We further analyze the contributions of known TF-related regions (average attribution score 0.9152) versus non-TF-related regions (average attribution score 0.0848), demonstrating that the TF-related regions have the dominant influence on TF prediction.
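The attribution analysis described in the abstract can be illustrated with a minimal sketch of integrated gradients. This is not the authors' implementation: the toy logistic scorer, its weights, the all-zero baseline, the region mask, and the helper names (`model`, `model_grad`, `integrated_gradients`, `region_shares`) are all assumptions made for illustration. The two reported scores (0.9152 and 0.0848) sum to 1, which is consistent with a normalized share of total attribution, but the abstract does not state the normalization used, so the absolute-value share below is only one plausible choice.

```python
import math


def model(x, w, bias):
    """Toy differentiable scorer (assumption): sigmoid(w . x + bias)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def model_grad(x, w, bias):
    """Analytic gradient of the toy scorer with respect to the input x."""
    p = model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, bias, steps=200):
    """Approximate integrated gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight path from baseline to x,
    then scaled by (x - baseline) per feature.
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b0 + alpha * (xi - b0) for xi, b0 in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, bias)):
            avg_grad[i] += g / steps
    return [(xi - b0) * g for xi, b0, g in zip(x, baseline, avg_grad)]


def region_shares(attributions, region_mask):
    """Share of total absolute attribution inside vs. outside the mask.

    Assumes at least one attribution is non-zero.
    """
    total = sum(abs(a) for a in attributions)
    inside = sum(abs(a) for a, m in zip(attributions, region_mask) if m)
    return inside / total, (total - inside) / total


# Hypothetical 4-feature input; the first two positions stand in for a
# "TF-related" region. All numbers here are made up for the example.
x = [1.0, 2.0, 0.5, -1.0]
baseline = [0.0, 0.0, 0.0, 0.0]
w = [0.8, 1.2, 0.1, -0.05]
bias = -1.0
tf_mask = [True, True, False, False]

attributions = integrated_gradients(x, baseline, w, bias)
tf_share, non_tf_share = region_shares(attributions, tf_mask)
```

A useful sanity check is the completeness property of integrated gradients: the attributions should sum, up to numerical error, to `model(x, w, bias) - model(baseline, w, bias)`. In a real setting the analytic gradient would be replaced by automatic differentiation through the trained network.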
