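Categorical variable
Mathematics
Outlier
Feature (linguistics)
Statistics
Continuous variable
Consistency (knowledge base)
Linear discriminant analysis
Ranking (information retrieval)
Data mining
Pattern recognition (psychology)
Index (typography)
Computer science
Artificial intelligence
Linguistics
World Wide Web
Philosophy
Geometry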
Authors
Weidong Ma, Jingsong Xiao, Ying Yang, Fei Ye
Identifier
DOI:10.1080/00949655.2022.2062358
Abstract
For ultrahigh-dimensional data, we propose a model-free marginal feature screening procedure, which can handle continuous, categorical and discrete response variables, based on the integral Pearson chi-square (IPC) index. The IPC index can be regarded as an extension of the AD index studied by He et al. [A modified mean-variance feature-screening procedure for ultrahigh-dimensional discriminant analysis. Comput Stat Data Anal. 2019;137:155–169]. When the response variable is categorical, we extend He et al.'s work to allow a diverging number of response categories. However, the IPC index is difficult to estimate when the response is continuous. Thus, we modify it and define the fused IPC index using the slice-and-fuse technique. Our feature screening procedure, which ranks features by the IPC or fused IPC index, is robust to heavy-tailed features and outliers. The sure screening properties and the ranking consistency properties are established for both categorical and continuous responses under mild conditions. The finite-sample performance of the proposed procedure is demonstrated through various numerical simulations and two real data applications.
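To make the idea of marginal screening by an integrated chi-square index more concrete, here is a minimal Python sketch. It is not the paper's estimator: the abstract does not give the IPC formula, so the index below is an assumed stand-in that averages, over thresholds t taken at the observed feature values, the Pearson chi-square divergence between the indicator 1(X <= t) and the response category. The names `ipc_like_index`, `fused_index` and `screen`, the quantile-based slicing, and the slice counts (3, 4, 5) are all hypothetical choices made for illustration.

```python
import numpy as np


def ipc_like_index(x, y):
    """Assumed IPC-style index (illustrative, not the paper's definition).

    For each threshold t (every observed x value), build the 2 x R table of
    1(X <= t) against the R response categories, compute its Pearson
    chi-square divided by n, and average over thresholds.
    """
    x = np.asarray(x, dtype=float)
    classes, y_idx = np.unique(np.asarray(y), return_inverse=True)
    y_idx = y_idx.reshape(-1)
    n = x.shape[0]
    p_r = np.bincount(y_idx, minlength=len(classes)) / n  # class proportions

    # ind[i, j] = 1 if x[i] <= x[j]; column j corresponds to threshold x[j]
    ind = (x[:, None] <= x[None, :]).astype(float)
    cdf = ind.mean(axis=0)                     # empirical P(X <= t)
    onehot = np.eye(len(classes))[y_idx]       # shape (n, R)
    joint = onehot.T @ ind / n                 # empirical P(Y = r, X <= t)
    expected = p_r[:, None] * cdf[None, :]     # value under independence

    # Summing both cells (X <= t and X > t) of row r gives
    # (joint - expected)^2 / (p_r * F * (1 - F)).
    denom = p_r[:, None] * (cdf * (1.0 - cdf))[None, :]
    ok = denom > 0                             # skip thresholds where F is 0 or 1
    contrib = np.where(ok, (joint - expected) ** 2 / np.where(ok, denom, 1.0), 0.0)
    return contrib.sum(axis=0).mean()


def fused_index(x, y, slice_counts=(3, 4, 5)):
    """Assumed slice-and-fuse variant for a continuous response.

    The response is cut into quantile slices for several slice counts and
    the categorical index is summed over those slicings.
    """
    y = np.asarray(y, dtype=float)
    total = 0.0
    for g in slice_counts:
        cuts = np.quantile(y, np.linspace(0.0, 1.0, g + 1)[1:-1])
        labels = np.searchsorted(cuts, y, side="right")  # slice label in 0..g-1
        total += ipc_like_index(x, labels)
    return total


def screen(X, y, d, index=ipc_like_index):
    """Rank features by the marginal index and keep the d largest."""
    scores = np.array([index(X[:, j], y) for j in range(X.shape[1])])
    top = np.argsort(-scores)[:d]
    return top, scores


# Small synthetic demo with heavy-tailed features (t distribution, 2 df).
rng = np.random.default_rng(0)
n, p = 200, 50
y = rng.integers(0, 2, size=n)
X = rng.standard_t(2, size=(n, p))
X[:, 0] += 3.0 * y                              # only feature 0 depends on y
top, scores = screen(X, y, d=5)

y_cont = X[:, 0] + rng.normal(size=n)           # continuous response
fused_signal = fused_index(X[:, 0], y_cont)
fused_noise = fused_index(X[:, 1], y_cont)
```

Because the index only uses the ordering of each feature through indicators 1(X <= t), it is unchanged by monotone transformations of the feature, which is one plausible reading of why a procedure of this kind would tolerate heavy tails and outliers. The pairwise indicator matrix costs O(n^2) memory per feature, so a sorted, cumulative-count implementation would be preferable for large samples.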