Consistent, Balanced, and Overlapping Label Trees for Extreme Multi-label Learning
Computer Science
Artificial Intelligence
Authors
Zhiqi Ge, Yuanyuan Guan, Ximing Li, Bo Fu
Identifier
DOI:10.1145/3511808.3557261
Abstract
The emerging field of eXtreme Multi-label Learning (XML) aims to induce multi-label predictive models from big datasets with extremely large numbers of instances, features, and especially labels. To meet the efficiency challenge of XML, one flexible solution is the label tree methodology. As its name suggests, a label tree is a tree hierarchy of label subsets that partitions the original large-scale XML problem into a number of small-scale sub-problems (represented by leaf nodes), thereby reducing the complexity to logarithmic time. Notably, a good label tree should accurately route future instances to the right leaf nodes (effectiveness) and generate balanced leaf nodes (efficiency). To achieve this, we propose a novel generic label tree method, namely Consistent, Balanced, and Overlapping Label Tree (CBOLT). To enhance precision, we employ weighted clustering to partition non-leaf nodes and allow overlapping label subsets, which alleviates the inconsistent path and disjoint label subset issues. To improve efficiency, we propose a new concept of a balanced problem scale and implement it with a balanced regularization for partitioning non-leaf nodes. We conduct extensive experiments on several benchmark XML datasets. Empirical results demonstrate that CBOLT is superior to existing label tree methods, and that it can be applied to existing XML methods to achieve performance competitive with strong baselines.
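To make the idea of a balanced, overlapping label tree concrete, the following is a minimal sketch in Python. It is not the CBOLT algorithm: the abstract does not specify the weighted clustering or the balanced regularization, so this sketch substitutes a plain balanced 2-means split (sort labels by their preference for one centroid over the other, then cut at the median) and a simple overlap rule (copy the labels nearest the cut into both children). The function names, the `overlap` and `max_leaf` parameters, and the random label vectors are all hypothetical and used only for illustration.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def balanced_split(ids, vecs, overlap=0, iters=5, rng=None):
    """Split label ids into two halves of (near-)equal size.

    Assumption: a generic balanced 2-means step, not the paper's method.
    The `overlap` labels closest to the cut on each side go to both children.
    """
    rng = rng or random.Random(0)
    c0, c1 = (vecs[i] for i in rng.sample(ids, 2))
    half = len(ids) // 2
    for _ in range(iters):
        # Rank labels by how much they prefer centroid c0 over c1.
        ranked = sorted(
            ids, key=lambda i: dot(vecs[i], c0) - dot(vecs[i], c1), reverse=True
        )
        left, right = ranked[:half], ranked[half:]  # cut at the median
        c0 = mean([vecs[i] for i in left])
        c1 = mean([vecs[i] for i in right])
    # Cap the overlap so that each child is strictly smaller than its parent.
    k = min(overlap, max(half - 1, 0))
    return left + right[:k], left[half - k:] + right


def build_tree(ids, vecs, max_leaf=4, overlap=0):
    """Recursively partition labels until each leaf holds at most max_leaf labels."""
    if len(ids) <= max_leaf:
        return {"labels": sorted(ids)}
    left, right = balanced_split(ids, vecs, overlap)
    return {
        "children": [
            build_tree(left, vecs, max_leaf, overlap),
            build_tree(right, vecs, max_leaf, overlap),
        ]
    }


def leaves(node):
    """Return the label subsets stored at the leaf nodes, left to right."""
    if "labels" in node:
        return [node["labels"]]
    return [leaf for child in node["children"] for leaf in leaves(child)]


# Hypothetical data: 16 labels, each with a random 3-dimensional vector.
data_rng = random.Random(1)
label_vecs = {i: [data_rng.gauss(0, 1) for _ in range(3)] for i in range(16)}

disjoint = leaves(build_tree(list(range(16)), label_vecs, max_leaf=4, overlap=0))
print([len(leaf) for leaf in disjoint])  # four leaves of four labels each

overlapping = leaves(build_tree(list(range(16)), label_vecs, max_leaf=4, overlap=1))
print(overlapping)  # some labels appear in more than one leaf
```

With `overlap=0` the leaves form a disjoint, perfectly balanced partition, so a label that ends up on the wrong side of a cut is unreachable from the other subtree. With `overlap=1` the labels nearest each cut are kept in both children, which is one simple way to soften that disjointness at the cost of slightly larger sub-problems.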