On the induction of decision trees for multiple concept learning

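Incremental decision tree, Probabilistic logic, Discretization, Decision tree, Mathematics, ID3 algorithm, Decision tree learning, Entropy (arrow of time), Computer science, Heuristic, Backtracking, Mathematical optimization, Algorithm, Artificial intelligence, Quantum mechanics, Physics, Mathematical analysis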
Author
Usama M. Fayyad
Source
Journal: University of Michigan - Deep Blue  Cited by: 82
Abstract

We focus on developing improvements to algorithms that generate decision trees from training data. This dissertation makes four contributions to the theory and practice of the top-down non-backtracking induction of decision trees for multiple concept learning. First, we provide formal results for determining how one generated tree is better than another. We consider several performance measures on decision trees and show that the most important measure to minimize is the number of leaves. Notably, we derive a probabilistic relation between the number of leaves of the decision tree and its expected error rate. The second contribution deals with improving tree generation by avoiding problems inherent in the current popular approaches to tree induction. We formulate algorithms GID3 and GID3* that are capable of grouping irrelevant attribute values in subsets rather than branching on them individually. We empirically demonstrate that better trees are obtained. Thirdly, we present results applicable to the binary discretization of continuous-valued attributes using the information entropy minimization heuristic. The results serve to give a better understanding of the entropy measure, to point out desirable properties that justify its usage in a formal sense, and to improve the efficiency of evaluating continuous-valued attributes for cut point selection. We then proceed to extend the binary discretization algorithm to derive multiple interval quantizations. We justify our criterion for deciding the intervals using decision-theoretic principles. Empirical results demonstrate improved efficiency and that the multiple interval discretization algorithm allows GID3* to find better trees. Finally, we analyze the merits and limitations of using the entropy measure (and others from the family of impurity measures) for attribute selection. We argue that the currently used family of measures is not particularly well-suited for attribute selection. We motivate and formulate a new family of measures: C-SEP. The new algorithm, O-BTREE, that uses a selection measure from this family is empirically demonstrated to produce better trees. Ample experimental results are provided to demonstrate the utility of the above contributions by applying them to synthetic and real-world problems. Some applications come from our involvement in the automation of semiconductor manufacturing techniques.
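
As an illustration of the binary discretization step mentioned in the abstract, the following is a minimal sketch of choosing a cut point by class information entropy minimization. It is not the dissertation's implementation: the function names `entropy` and `best_binary_cut`, the use of midpoints between adjacent distinct values as candidate cuts, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def best_binary_cut(values, labels):
    """Return (cut_point, weighted_entropy) for the two-way split of a
    continuous attribute that minimizes the size-weighted class entropy.

    Candidate cuts are midpoints between adjacent distinct sorted values
    (an assumption of this sketch). Returns (None, inf) if all values
    are identical and no cut is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2, score
    return best_cut, best_score


# Hypothetical toy data: the two classes separate cleanly between 3 and 10.
cut, score = best_binary_cut([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"])
```

This sketch recomputes the entropy from scratch for every candidate, which is quadratic in the number of examples; it is meant only to make the heuristic concrete, not to reflect the efficiency improvements the abstract refers to.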