MaTPIP: A deep-learning architecture with eXplainable AI for sequence-driven, feature mixed protein-protein interaction prediction

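Computer science, Artificial intelligence, Benchmark (surveying), Convolutional neural network, Deep learning, McNemar's test, Classifier (UML), Machine learning, Pattern recognition (psychology), Geodesy, Mathematics, Statistics, Geography
Authors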
Shubhrangshu Ghosh,Pralay Mitra
Source
Journal: Computer Methods and Programs in Biomedicine [Elsevier BV]
Volume/issue: 244: 107955-107955 Citations: 5
Identifier
DOI:10.1016/j.cmpb.2023.107955
Abstract

Protein-protein interaction (PPI) is a vital process in all living cells, controlling essential cell functions such as cell cycle regulation, signal transduction, and metabolic processes, with broad applications that include antibody therapeutics, vaccines, and drug discovery. The problem of sequence-based PPI prediction has been a long-standing issue in computational biology. We introduce MaTPIP, a cutting-edge deep-learning framework for predicting PPI. MaTPIP stands out due to its innovative design, fusing pre-trained Protein Language Model (PLM)-based features with manually curated protein sequence attributes, emphasizing the part-whole relationship by incorporating two-dimensional granular part (amino-acid) level features and one-dimensional whole-level (protein) features. What sets MaTPIP apart is its ability to integrate these features seamlessly across three different input terminals. MaTPIP also includes a distinctive configuration of Convolutional Neural Network (CNN) with Transformer components for concurrent utilization of CNN and sequential characteristics in each iteration, and a one-dimensional to two-dimensional converter followed by a unified embedding. The statistical significance of this classifier is validated using McNemar's test. MaTPIP outperformed the existing methods on both the Human PPI benchmark and cross-species PPI testing datasets, demonstrating its immense generalization capability for PPI prediction. We used seven diverse datasets with varying PPI target class distributions. Notably, within the novel PPI scenario, the most challenging category of the Human PPI benchmark, MaTPIP improves the existing state-of-the-art score from 74.1% to 78.6% (measured in area under the ROC curve), from 23.2% to 32.8% (in average precision) and from 4.9% to 9.5% (in precision at 3% recall) for 50%, 10% and 0.3% target class distributions, respectively.
In cross-species PPI evaluation, hybrid MaTPIP establishes a new benchmark score (measured in area under the precision-recall curve) of 81.1% from the previous 60.9% for Mouse, 80.9% from 56.2% for Fly, 78.1% from 55.9% for Worm, 59.9% from 41.7% for Yeast, and 66.2% from 58.8% for E. coli. Our eXplainable AI-based assessment reveals an average contribution of different feature families per prediction on these datasets. MaTPIP mixes manually curated features with the features extracted from the pre-trained PLM to predict sequence-based protein-protein association. Furthermore, MaTPIP demonstrates strong generalization capabilities for cross-species PPI predictions.
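
The abstract states that the classifier's statistical significance is validated with McNemar's test but does not describe the procedure. As a rough illustration only, the sketch below shows how McNemar's test is commonly applied to two classifiers evaluated on the same labelled pairs. The function name `mcnemar_test`, the toy labels, and the choice to report both the continuity-corrected chi-square and the exact binomial p-value are assumptions made for this example, not details taken from the paper.

```python
from math import comb, erfc, sqrt


def mcnemar_test(y_true, pred_a, pred_b):
    """Compare two classifiers on the same samples with McNemar's test.

    Hypothetical helper for illustration; not the authors' code.
    Only the discordant samples (exactly one classifier correct) matter.
    """
    rows = list(zip(y_true, pred_a, pred_b))
    # b: classifier A correct, classifier B wrong
    b = sum(1 for t, a, p in rows if a == t and p != t)
    # c: classifier A wrong, classifier B correct
    c = sum(1 for t, a, p in rows if a != t and p == t)
    n = b + c
    if n == 0:
        # No discordant samples: the test cannot separate the classifiers.
        return {"b": b, "c": c, "chi2": 0.0, "p_chi2": 1.0, "p_exact": 1.0}

    # Chi-square statistic with continuity correction, 1 degree of freedom.
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-square with 1 d.o.f. is erfc(sqrt(x / 2)).
    p_chi2 = erfc(sqrt(chi2 / 2))

    # Exact two-sided binomial p-value, preferable when n is small.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    p_exact = min(1.0, 2 * tail)
    return {"b": b, "c": c, "chi2": chi2, "p_chi2": p_chi2, "p_exact": p_exact}


# Toy data (made up): 1 = interacting pair, all ten pairs are positives.
y_true = [1] * 10
pred_a = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]  # e.g. a baseline predictor
pred_b = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1]  # e.g. a newer predictor
result = mcnemar_test(y_true, pred_a, pred_b)
print(result)
```

A small p-value would suggest that the two classifiers' error rates differ on the shared test set. With few discordant pairs, as in this toy data, the exact binomial form is usually the safer choice.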
