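Computer science
Machine learning
Artificial intelligence
Decision tree
Dimension (graph theory)
Data mining
Algorithm
Construct (Python library)
Meta-learning (computer science)
Feature (linguistics)
Knowledge extraction
Task (project management)
Mathematics
Philosophy
Economics
Management
Programming language
Pure mathematics
Linguistics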
Identifier
DOI:10.1109/icdmw.2015.43
Abstract
In performing data mining, a common task is to search for the most appropriate algorithm(s) to retrieve important information from data. With an increasing number of available data mining techniques, it may be impractical to experiment with many techniques on a specific dataset of interest to find the best algorithm(s). In this paper, we demonstrate the suitability of tree-based multi-variable linear regression in predicting algorithm performance. We take into account prior machine learning experience to construct meta-knowledge for supervised learning. The idea is to use summary knowledge about datasets, along with the past performance of algorithms on these datasets, to build this meta-knowledge. We augment pure statistical summaries with descriptive features and a misclassification cost. We also find that transformed datasets, obtained by reducing a high-dimensional feature space to a smaller dimension, still retain significant characteristic knowledge necessary to predict algorithm performance. Our approach works well for both numerical and nominal data obtained from real-world environments.
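The idea of predicting algorithm performance from dataset summaries with a tree-based multi-variable linear regression can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it fits a one-split model tree with a linear model in each leaf. The two meta-features, the synthetic performance values, and every function name here (solve, fit_linear, fit_model_tree, predict_tree, true_perf) are assumptions made up for the example. The misclassification cost and the dimensionality reduction mentioned in the abstract are not reproduced.

```python
# Sketch only: a one-split "model tree" (linear regression in each leaf),
# written in pure Python. All data below are synthetic and hypothetical.

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def fit_linear(X, y, ridge=1e-8):
    """Least-squares weights [w0, w1, ...] via the normal equations.
    The tiny ridge term only guards against a singular matrix."""
    Z = [[1.0] + [float(v) for v in row] for row in X]
    d = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (ridge if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(d)]
    return solve(A, b)

def predict_linear(w, x):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def sse(w, X, y):
    """Sum of squared errors of a linear model on (X, y)."""
    return sum((predict_linear(w, x) - t) ** 2 for x, t in zip(X, y))

def fit_model_tree(X, y, min_leaf=3):
    """Pick the single split whose two leaf regressions minimise total SSE;
    fall back to one global linear model if no split helps."""
    best = {"w": fit_linear(X, y)}
    best_err = sse(best["w"], X, y)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            XL, yL = [X[i] for i in left], [y[i] for i in left]
            XR, yR = [X[i] for i in right], [y[i] for i in right]
            wl, wr = fit_linear(XL, yL), fit_linear(XR, yR)
            err = sse(wl, XL, yL) + sse(wr, XR, yR)
            if err < best_err:
                best_err = err
                best = {"feature": f, "threshold": thr,
                        "left": wl, "right": wr}
    return best

def predict_tree(tree, x):
    if "w" in tree:
        return predict_linear(tree["w"], x)
    go_left = x[tree["feature"]] <= tree["threshold"]
    return predict_linear(tree["left"] if go_left else tree["right"], x)

def true_perf(x):
    """Hypothetical past accuracy of one algorithm, piecewise linear in
    two made-up meta-features (a size measure and a class-balance score)."""
    size, balance = x
    if size <= 4:
        return 0.5 + 0.05 * size + 0.1 * balance
    return 0.9 - 0.02 * size - 0.1 * balance

# Meta-knowledge: one row of meta-features per previously seen dataset.
X = [[1, 0.2], [2, 0.5], [3, 0.1], [4, 0.9],
     [5, 0.3], [6, 0.7], [7, 0.4], [8, 0.6]]
y = [true_perf(x) for x in X]

tree = fit_model_tree(X, y)
predicted = predict_tree(tree, [7, 0.4])  # estimate for a described dataset
```

In this toy setting, predict_tree estimates how the algorithm would perform on a dataset from its meta-features alone, without running the algorithm. A realistic meta-learner would use many more datasets, richer meta-features, and a deeper tree.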