StructGuy: Data leakage free prediction of functional effects of genetic variants

Authors
Alexander Greß, Johanna Becher, Dominique Mias‐Lucquin, Roman Joeres, Olga V. Kalinina
Identifier
DOI: 10.64898/2025.12.01.691563
Abstract

The extent to which variations in protein-coding genes affect protein function has drawn the attention of the biological machine learning community to computational variant effect prediction. Multiplexed assays of variant effect (MAVE) experiments serve as a rich data source, but they cannot deliver enough data to train truly large neural network models. Therefore, zero-shot methods, for example protein language models, have become increasingly popular. For these methods, MAVE results serve primarily for evaluation, as exemplified by the ProteinGym benchmark. In this study, we argue that the rapidly growing amount of MAVE data can be used to train efficient supervised methods, and we present our new tool StructGuy, which is based on gradient-boosted trees. In contrast to other supervised methods in the field, StructGuy can predict variant effects for proteins not seen during training, thanks to its dedicated training dataset and data-leakage-free training process. To evaluate this generalization ability, we constructed a dedicated benchmark and compared StructGuy with zero-shot methods from the ProteinGym leaderboard, where it achieved competitive performance. Further, we demonstrate that, thanks to its architecture and careful feature engineering, StructGuy provides fully interpretable predictions and direct explanations of how mutations influence the three-dimensional structure of the protein, which favourably distinguishes it from zero-shot tools.
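The abstract credits StructGuy's ability to generalize to unseen proteins to a data-leakage-free training process. One common way to get that property, shown here only as an illustrative sketch and not as StructGuy's actual procedure, is to split MAVE data by protein, so that no protein contributes variants to both the training and the test side of any fold. The record layout `(protein_id, variant, measured_effect)` and the function name `protein_grouped_folds` are hypothetical. A stricter setup would also cluster homologous proteins before splitting, because exact protein IDs alone do not prevent leakage through close sequence relatives.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (protein_id, variant, measured_effect)
Record = Tuple[str, str, float]


def protein_grouped_folds(
    records: List[Record], k: int
) -> List[Tuple[List[Record], List[Record]]]:
    """Split variant records into k (train, test) folds.

    Every protein's variants land entirely in a single test fold, so no
    protein ever appears on both sides of a split (a simple guard against
    protein-level data leakage; homology is NOT considered here).
    """
    by_protein: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:
        by_protein[rec[0]].append(rec)

    # Greedy balancing: place proteins with the most variants first,
    # each into the fold that currently holds the fewest variants.
    proteins = sorted(by_protein, key=lambda p: (-len(by_protein[p]), p))
    sizes = [0] * k
    fold_of: Dict[str, int] = {}
    for p in proteins:
        i = min(range(k), key=lambda j: (sizes[j], j))
        fold_of[p] = i
        sizes[i] += len(by_protein[p])

    folds = []
    for i in range(k):
        test = [r for r in records if fold_of[r[0]] == i]
        train = [r for r in records if fold_of[r[0]] != i]
        folds.append((train, test))
    return folds


# Toy usage with made-up proteins and scores (illustration only).
records: List[Record] = [
    ("P1", "A10V", 0.2),
    ("P1", "G12D", -1.3),
    ("P2", "L5P", -2.0),
    ("P3", "K7R", 0.1),
    ("P3", "E9K", -0.4),
]

for i, (train, test) in enumerate(protein_grouped_folds(records, k=2)):
    print(i, sorted({r[0] for r in test}))
# 0 ['P1', 'P2']
# 1 ['P3']
```

The greedy size balancing is just one reasonable design choice; scikit-learn's `GroupKFold` implements the same protein-as-group idea if a library solution is preferred.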