Two-stage subsampling variable selection for sparse high-dimensional generalized linear models

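Akaike information criterion, Deviance (statistics), Feature selection, Model selection, Estimator, Linear regression, Partial least squares regression, Generalized linear model, Mathematics, Lasso (programming language), Data set, Information criteria, Computer science, Regression, Statistics, Artificial intelligence, World Wide Web
Authors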
Marinela Capanu,Mihai Giurcanu,Colin B. Begg,Mithat Gönen
Source
Journal: Statistical Methods in Medical Research [SAGE Publishing]
Volume (issue): 34 (7): 1504-1521
Identifier
DOI:10.1177/09622802251343597
Abstract

Although high-dimensional data analysis has received a lot of attention after the advent of omics data, model selection in this setting continues to be challenging and there is still substantial room for improvement. Through a novel combination of existing methods, we propose here a two-stage subsampling approach for variable selection in high-dimensional generalized linear regression models. In the first stage, we screen the variables using smoothly clipped absolute deviation (SCAD) penalty regularization followed by partial least squares regression on repeated subsamples of the data; we include in the second stage only those predictors that were most frequently selected over the subsamples either by SCAD or for having the top loadings in either of the first two partial least squares regression components. In the second stage, we again repeatedly subsample the data and, for each subsample, we find the best Akaike information criterion model based on an exhaustive search of all possible models on the reduced set of predictors. We then include in the final model those predictors with high selection probability across the subsamples. We prove that the proposed first-stage estimator is $n^{1/2}$-consistent and that the true predictors are included in the first stage with probability converging to 1. In an extensive simulation study, we show that this two-stage approach outperforms the competitors, yielding among the highest probability of selecting the true model while having one of the lowest numbers of false positives in the settings of logistic, Poisson, and linear regression. We illustrate the proposed method on two gene expression cancer datasets.
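
The second stage described in the abstract (repeated subsampling, an exhaustive best-AIC search on the reduced predictor set, then keeping predictors with high selection frequency) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Gaussian linear model fitted by ordinary least squares, skips the SCAD and partial least squares screening stage entirely, and uses simulated data. The function names, the subsample fraction of 0.5, the 30 subsamples, and the 0.6 frequency threshold are all hypothetical choices made for illustration.

```python
import itertools
import math
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def gaussian_aic(X, y, cols):
    """AIC, up to an additive constant, of an OLS fit with intercept on columns `cols`."""
    rows = [[1.0] + [xi[j] for j in cols] for xi in X]
    k = len(cols) + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(bj * v for bj, v in zip(beta, r))) ** 2
              for r, yi in zip(rows, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * k


def best_aic_subset(X, y, candidates):
    """Exhaustive search: return the subset of `candidates` with the smallest AIC."""
    best, best_aic = (), float("inf")
    for size in range(len(candidates) + 1):
        for cols in itertools.combinations(candidates, size):
            aic = gaussian_aic(X, y, cols)
            if aic < best_aic:
                best, best_aic = cols, aic
    return best


def selection_frequencies(X, y, candidates, n_subsamples=30, fraction=0.5, seed=0):
    """Share of subsamples in which each candidate enters the best-AIC model."""
    rng = random.Random(seed)
    n = len(y)
    m = int(fraction * n)  # hypothetical subsample size, drawn without replacement
    counts = {j: 0 for j in candidates}
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)
        chosen = best_aic_subset([X[i] for i in idx], [y[i] for i in idx], candidates)
        for j in chosen:
            counts[j] += 1
    return {j: c / n_subsamples for j, c in counts.items()}


# Simulated data (an assumption for illustration): 5 screened candidates,
# of which only columns 0 and 2 truly affect the response.
data_rng = random.Random(1)
n, p = 120, 5
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3.0 * x[0] - 2.0 * x[2] + data_rng.gauss(0, 1) for x in X]

freq = selection_frequencies(X, y, candidates=list(range(p)))
# Keep predictors whose selection frequency reaches the hypothetical 0.6 threshold.
selected = sorted(j for j, f in freq.items() if f >= 0.6)
```

The exhaustive search visits all 2^k subsets of the k candidates, so it is only practical once screening has reduced the predictor set to a small number of variables. For the logistic or Poisson settings mentioned in the abstract, `gaussian_aic` would need to be replaced by the AIC of the corresponding generalized linear model fit.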