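Machine learning
Artificial neural network
Computer science
Artificial intelligence
Curse of dimensionality
Relevance (law)
Feature selection
Graph
Multilayer perceptron
Perceptron
Dimensionality reduction
Feature (linguistics)
Data mining
Fingerprint (computing)
Exploit
Pattern recognition (psychology)
Deep learning
Feature learning
Labeled data
Attention network
Dimension (graph theory)
Feature extraction
Deep neural network
Selection (genetic algorithm)
Support vector machine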
Authors
Qingtian Zhang,Dangxin Mao,Yusong Tu,Yuanyan Wu
Identifier
DOI:10.1021/acs.jcim.4c00586
Abstract
Machine learning plays a role in accelerating drug discovery, and the design of effective machine learning models is crucial for accurately predicting molecular properties. Molecules are typically characterized by molecular fingerprints and molecular graphs, which are input into a multilayer perceptron (MLP) and variants of graph neural networks, such as graph attention networks (GATs), respectively. Because fingerprints are diverse in type and high in dimensionality, models may contain many features that are relatively irrelevant or redundant. Meanwhile, although the GAT excels at heterogeneous graph tasks, it lacks the ability to extract collaborative information from neighboring nodes, so it cannot capture the joint influence of adjacent groups on an atom. To overcome these challenges, we introduce a hybrid model that combines an improved GAT with an MLP. In the GAT, a recurrent neural network is employed to capture collaborative information. To address the dimensionality issue, we propose a feature selection algorithm based on the principle of maximizing relevance while minimizing redundancy. In experiments on 13 public data sets and 14 breast cell lines, our model demonstrates superior performance compared to state-of-the-art deep learning and traditional machine learning algorithms. Additionally, a series of ablation experiments were conducted to demonstrate the advantages of the improved version, as well as its robustness to noise and its interpretability. These results indicate that our model holds promising prospects for practical applications.
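The abstract names the principle behind the feature selection step (maximize relevance, minimize redundancy) but not the exact criteria. The following is a minimal sketch of a greedy selection of that kind, not the authors' algorithm. It assumes absolute Pearson correlation as both the relevance and the redundancy measure, and the names `pearson` and `mrmr_select`, as well as the toy data, are hypothetical.

```python
import math
import random


def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0
    return cov / math.sqrt(var_u * var_v)


def mrmr_select(columns, target, k):
    """Greedily pick up to k feature indices with high relevance and low redundancy.

    columns: list of feature columns, each a list of floats.
    Relevance (assumed): |corr(feature, target)|.
    Redundancy (assumed): mean |corr(feature, already selected feature)|.
    """
    if k <= 0 or not columns:
        return []
    relevance = [abs(pearson(col, target)) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(len(columns)), key=lambda j: relevance[j])]
    while len(selected) < min(k, len(columns)):
        best_j, best_score = None, -math.inf
        for j in range(len(columns)):
            if j in selected:
                continue
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Toy data: column 1 nearly duplicates column 0; column 3 is pure noise.
rng = random.Random(0)
n = 500
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
columns = [
    a,
    [x + 0.01 * rng.gauss(0, 1) for x in a],
    b,
    [rng.gauss(0, 1) for _ in range(n)],
]
target = [x + 0.5 * y for x, y in zip(a, b)]
chosen = mrmr_select(columns, target, 2)
print(chosen)
```

In this toy setup the near-duplicate column is highly relevant but is penalized for redundancy once its twin has been chosen, so the second pick falls on the weaker but independent column. With real fingerprints, a measure such as mutual information may be a better fit than linear correlation for binary bit vectors.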