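Overfitting
Computer science
Feature selection
Artificial intelligence
Machine learning
Selection (genetic algorithm)
Scalability
Biological data
Artificial neural network
Feature (linguistics)
Clustering high-dimensional data
High-dimensional
Biological network
Pattern recognition (psychology)
Data mining
Bioinformatics
Linguistics
Philosophy
Database
Cluster analysis
Biology
Authors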
Dinesh Singh,Héctor Climente-González,Mathis Petrovich,Eiryo Kawakami,Makoto Yamada
Identifier
DOI:10.1109/ijcnn54540.2023.10191985
Abstract
Biological data, including gene expression data, are generally high-dimensional and require efficient, generalizable, and scalable machine-learning methods to discover their complex nonlinear patterns. Recent advances in machine learning can be attributed to deep neural networks (DNNs), which excel at various tasks in computer vision and natural language processing. However, standard DNNs are not appropriate for the high-dimensional datasets generated in biology because they have many parameters, which in turn require many samples. In this paper, we propose a DNN-based, nonlinear feature selection method, called the feature selection network (FsNet), for high-dimensional data with a small number of samples. Specifically, FsNet comprises a selection layer that selects features and a reconstruction layer that stabilizes the training. Because the large number of parameters in the selection and reconstruction layers can easily result in overfitting when samples are limited, we use two tiny networks to predict the large, virtual weight matrices of the selection and reconstruction layers. Experimental results on several real-world, high-dimensional biological datasets demonstrate the efficacy of the proposed method.
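The idea of predicting a large, virtual weight matrix with a tiny network can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a softmax-based soft selection over features, and a hypothetical per-feature embedding matrix (`embeddings`) as the tiny network's input. The function names, layer sizes, and random data are illustrative only, and training and the reconstruction layer are omitted.

```python
import numpy as np

def predict_virtual_weights(embeddings, w1, w2):
    """Tiny MLP applied to each feature embedding: (d, r) -> (d, k).

    The (d, k) output plays the role of the selection layer's virtual
    weight matrix; only w1 and w2 would be trainable (assumption).
    """
    hidden = np.tanh(embeddings @ w1)
    return hidden @ w2

def select_features(x, logits, temperature=1.0):
    """Soft selection (assumed form): each of the k units takes a
    softmax-weighted mix of the d input features."""
    z = logits / temperature
    z = z - z.max(axis=0, keepdims=True)      # numerical stability
    probs = np.exp(z)
    probs /= probs.sum(axis=0, keepdims=True)  # softmax over features
    top_feature = probs.argmax(axis=0)         # most likely feature per unit
    return x @ probs, top_feature

rng = np.random.default_rng(0)
n, d, r, h, k = 20, 5000, 8, 16, 10    # few samples, many features
x = rng.normal(size=(n, d))            # synthetic stand-in for expression data
embeddings = rng.normal(size=(d, r))   # hypothetical per-feature embeddings
w1 = rng.normal(size=(r, h)) * 0.1     # tiny network, layer 1
w2 = rng.normal(size=(h, k)) * 0.1     # tiny network, layer 2

logits = predict_virtual_weights(embeddings, w1, w2)
selected, top_feature = select_features(x, logits)

tiny_params = w1.size + w2.size        # trainable parameters in the sketch
full_params = d * k                    # parameters of an explicit (d, k) matrix
print(selected.shape, tiny_params, full_params)
```

In this sketch the trainable parameter count depends on the embedding and hidden sizes rather than on the number of input features, which is the property the abstract relies on to limit overfitting when samples are scarce.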