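Feature selection
Overfitting
Big data
Computer science
Dimensionality reduction
SPARK (programming language)
Selection (genetic algorithm)
Data science
Feature (linguistics)
Machine learning
Curse of dimensionality
Artificial intelligence
Data mining
Internet
Process (computing)
Dimension (graph theory)
Data preprocessing
World Wide Web
Artificial neural network
Philosophy
Operating system
Programming language
Pure mathematics
Mathematics
Linguistics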
Authors
Siva Sankari Subbiah, C. Jayakumar
Source
Journal: Ingénierie Des Systèmes D'information
[International Information and Engineering Technology Association]
Date: 2021-02-28
Volume (issue): 26 (1): 67-77
Citations: 6
Abstract
Nowadays, organizations collect huge volumes of data without knowing how useful the data will be. The rapid development of the Internet helps organizations capture data in many different formats through the Internet of Things (IoT), social media, and other disparate sources. The dimensionality of datasets increases day by day at an extraordinary rate, resulting in large-scale datasets with high dimensionality. This paper reviews the opportunities and challenges of feature selection for processing high-dimensional data with reduced complexity and improved accuracy. In the modern big data world, feature selection plays a significant role in reducing dimensionality and overfitting in the learning process. Researchers have proposed many feature selection methods for obtaining the most relevant features, especially from big datasets, which helps provide accurate learning results without degrading performance. This paper discusses the importance of feature selection, basic feature selection approaches, centralized and distributed big data processing using Hadoop and Spark, and the challenges of feature selection, and it summarizes related work by various researchers. It concludes that big data analysis with feature selection improves the accuracy of learning.