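Computer science
Feature vector
Incremental decision tree
Sequence (biology)
Decision tree
Pattern recognition (psychology)
Feature (linguistics)
Interval tree
Construct (Python library)
Decision tree learning
Tree (set theory)
Machine learning
Tree structure
Artificial intelligence
Data mining
Algorithm
Binary tree
Mathematics
Genetics
Biology
Mathematical analysis
Philosophy
Linguistics
Programming language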
Authors
Zengyou He, Ziyao Wu, Guangyao Xu, Yan Liu, Quan Zou
Identifier
DOI:10.1109/tkde.2021.3075023
Abstract
Decision trees such as C4.5 and CART are widely used across fields because of their simplicity, accuracy, and intuitive interpretation. Like other popular classifiers, these tree-based classification algorithms are designed for fixed-length vector data and are intrinsically limited in handling complex data such as sequences. For discrete sequence classification, the dominant strategy is a two-step procedure: first transform the sequential dataset into a vector dataset, then apply existing tree-based classifiers to the new vector data. However, such methods depend heavily on the feature generation procedure, and some features that are critical to tree construction may be missed. To alleviate these issues, we present a new tree-based sequence classification method that constructs a concise decision tree from the feature space composed of all subsequences present in the training sequences. Experimental results on fourteen real datasets show that our method can achieve better performance than state-of-the-art sequence classification algorithms. The source code of our method is available at: https://github.com/ZiyaoWu/SeqDT.
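The idea of choosing a split directly from the subsequences present in the training data can be illustrated with a minimal sketch. This is not the authors' SeqDT implementation (see the repository above for that). It assumes that a "subsequence" is a contiguous substring of bounded length and that a split is scored by weighted Gini impurity. Both choices are assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter


def contiguous_subsequences(seq, max_len):
    """Return all distinct contiguous subsequences of seq with length 1..max_len."""
    return {
        seq[i:i + k]
        for k in range(1, max_len + 1)
        for i in range(len(seq) - k + 1)
    }


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_subsequence_split(sequences, labels, max_len=3):
    """Pick the subsequence whose presence/absence test gives the lowest
    weighted Gini impurity. Ties are broken by lexicographic order.

    Assumption: candidates are contiguous substrings up to max_len; the
    published method may define and search the feature space differently.
    """
    candidates = set()
    for s in sequences:
        candidates |= contiguous_subsequences(s, max_len)

    n = len(sequences)
    best = None
    for pattern in sorted(candidates):
        present = [y for s, y in zip(sequences, labels) if pattern in s]
        absent = [y for s, y in zip(sequences, labels) if pattern not in s]
        score = (len(present) * gini(present) + len(absent) * gini(absent)) / n
        if best is None or score < best[1]:
            best = (pattern, score)
    return best
```

Calling `best_subsequence_split` on the root, then recursively on the "present" and "absent" partitions, would grow a tree whose internal nodes test for a subsequence. Unlike the two-step procedure, no fixed feature vector is built in advance, so no candidate subsequence is excluded before tree construction begins.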