MULTIPLE ways to correct for MULTIPLE comparisons in MULTIPLE types of studies

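Type I and type II errors; Bonferroni correction; Null hypothesis; Multiple comparisons problem; Statistics; Statistical power; Statistical hypothesis testing; Mathematics; Nominal level; Sample size determination; Null (SQL); p-value; Statistical significance; False discovery rate; Multiplicity (mathematics); Alternative hypothesis; Word error rate; Econometrics; Computer science; Artificial intelligence; Data mining; Confidence interval; Gene; Biochemistry; Mathematical analysis; Chemistry
Authors
Loes M. Hollestein, Serigne Lo, Jo Leonardi‐Bee, Saharon Rosset, Noam Shomron, Dominique‐Laurent Couturier, Sonia Gran
Source
Journal: British Journal of Dermatology [Oxford University Press]
Volume (issue): 185 (6): 1081-1083; cited by: 23
Identifier
DOI: 10.1111/bjd.20600
Abstract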

Research articles typically present the results of several hypothesis tests and often state 'all tests with P-values < 0·05 were considered statistically significant'. This ignores that multiple tests were performed, which can induce false-positive findings. Indeed, when multiple true null hypotheses are tested, the probability of rejecting at least one null hypothesis [referred to as the overall type I error rate or family-wise error rate (FWER)] increases with the number of tests. For instance, if 20 independent statistical tests are performed at the 0·05 significance level in a scenario in which all null hypotheses are true, the probability of rejecting at least one null hypothesis is about 64% (1 − 0·95^20 ≈ 0·64). This inflation of the type I error rate, known as a multiple testing problem or multiplicity, constitutes a real challenge to researchers and partly explains the lack of reproducibility of scientific findings [1].

Many procedures have been developed to overcome multiplicity [2]. Due to its simplicity, the most widely used approach is the Bonferroni procedure, where the type I error level for each test equals the target overall type I error level (usually 0·05) divided by the number of tests. This multiplicity correction leads to an FWER close to the target overall type I error level when all tests are independent, but it is known to be overly conservative when the tested hypotheses are related, leading to an unnecessary loss of power (i.e. lower probability of finding true associations). Therefore, multiplicity correction methods that take the dependence between tests into account are generally preferred in order to gain power (e.g. resampling methods such as bootstrap and permutation tests) [3, 4]. When the number of tests is very large, as in omics studies (e.g. genomics or transcriptomics), control of the false discovery rate (FDR; i.e. the expected proportion of true null hypotheses among all rejected null hypotheses) is usually preferred to control of the FWER, as it allows notable gains in power [5]. The choice regarding which method to use depends on the type of study and the hypotheses to be tested. The aim of this editorial is to briefly discuss the use of multiplicity correction in different contexts and to state the multiplicity requirements for publication in the BJD.

Sample sizes of clinical trials are based on a single endpoint or coprimary endpoints [6, 7]. A trial with coprimary endpoints is considered negative if the result related to any of the coprimary endpoints is not significant. Requiring all coprimary endpoints to be significant induces, for a given sample size, a loss of power but does not increase the type I error rate. In addition to the primary endpoint(s), a set of secondary and exploratory endpoints, for which no a priori sample size calculation was performed, is usually tested as well. In order to prevent false-positive findings among the set of secondary endpoints, a clear distinction should be made between the true secondary endpoints (which may support the primary endpoint and/or show additional effects after success of the primary endpoint) and the exploratory endpoints (hypothesis generating or endpoints with very low event rates) [7]. Hypothesis testing for exploratory endpoints is not recommended [6], but the type I error rate should be controlled for secondary endpoints, typically by means of an FWER approach. If there is no effect on the primary endpoint(s), no effect on related secondary endpoints may be expected, so that one may decide to stop statistical testing after a nonsignificant result (a fixed-sequence or serial-gatekeeping approach) [8]. Endpoints may also be grouped into families (e.g. a family of multiple effectiveness outcomes and a family of multiple quality-of-life scores).
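A fixed-sequence procedure of the kind just described can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any particular trial: the function name, the endpoint labels and the P-values are all hypothetical. It assumes the testing order was prespecified, that each endpoint is tested at the full significance level, and that testing stops at the first nonsignificant result.

```python
from typing import List, Tuple


def fixed_sequence_test(ordered_endpoints: List[Tuple[str, float]],
                        alpha: float = 0.05) -> List[str]:
    """Return the endpoints declared significant under a fixed-sequence rule.

    ordered_endpoints: (name, p_value) pairs in the PRESPECIFIED testing order.
    Each endpoint is tested at the full level alpha; testing stops at the
    first nonsignificant result, and later endpoints are not tested at all.
    """
    significant = []
    for name, p_value in ordered_endpoints:
        if p_value >= alpha:
            break  # gate closed: no further formal testing
        significant.append(name)
    return significant


# Hypothetical endpoints and P-values, for illustration only.
results = [
    ("primary", 0.012),
    ("secondary A", 0.030),
    ("secondary B", 0.210),
    ("secondary C", 0.004),
]
print(fixed_sequence_test(results))
```

In this made-up example the sequence stops at "secondary B", so "secondary C" is not declared significant even though its P-value is small. This is the price paid for testing every earlier endpoint at the full 0·05 level.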
All endpoints within a family can be tested with a correction for multiple comparisons, and one may only proceed to the next family when there is statistical success in the preceding family (a fixed-sequence approach applied to families).

Omics studies investigate the relationship between a particular type of sample molecule and a sample attribute. Examples are genome-wide association studies, in which a large set of single-nucleotide polymorphisms is tested for association with an outcome of interest (e.g. skin cancer), or RNA-Seq experiments, in which differences in gene or protein expression between conditions (e.g. treated vs. not treated) are investigated. As such studies typically involve hundreds to millions of (usually dependent) simultaneous tests, FWER control of the type I error would lead to a drastic loss of power, explaining why FDR approaches are preferred, as they control the fraction of false discoveries among the rejected hypotheses [9]. The most commonly used FDR multiplicity correction is the one introduced by Benjamini and Hochberg, which is valid for independent [10] or positively dependent test statistics [11], such as test statistics (positively) correlated due to measurement errors affecting all or some parameters of interest in a common way. As other dependence structures may be observed in practice, an FDR approach valid under more general dependence structures was later introduced by Benjamini and Yekutieli, at the price of some loss of power [11].

False-positive findings may occur in studies where subgroup analyses are performed without multiplicity adjustment (e.g. a meta-analysis stratified by timepoints of an outcome). As tests in such analyses typically involve correlated outcomes and/or comparisons repeatedly involving the same groups, a resampling-based FWER multiplicity correction would provide the greatest power.
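The Benjamini and Hochberg correction mentioned for omics studies is a step-up rule: sort the m P-values, find the largest rank k whose P-value is at most k·q/m, and reject the k smallest. The sketch below is a plain-Python illustration under stated assumptions, not a reference implementation. The function name, the `arbitrary_dependence` flag and the example P-values are hypothetical. Setting the flag applies the Benjamini and Yekutieli adjustment, which divides the target level q by the harmonic sum 1 + 1/2 + ... + 1/m.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05,
                       arbitrary_dependence: bool = False) -> List[bool]:
    """Return one rejection flag per P-value, in the input order."""
    m = len(p_values)
    if m == 0:
        return []
    if arbitrary_dependence:
        # Benjamini-Yekutieli: shrink q by the harmonic sum 1 + 1/2 + ... + 1/m.
        q = q / sum(1.0 / k for k in range(1, m + 1))
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up: find the largest rank k with p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical P-values, for illustration only.
p = [0.03, 0.6, 0.001, 0.014, 0.012]
bonferroni = [value <= 0.05 / len(p) for value in p]
print("Bonferroni:", bonferroni)
print("BH:        ", benjamini_hochberg(p))
print("BY:        ", benjamini_hochberg(p, arbitrary_dependence=True))
```

With these made-up values, Bonferroni rejects one hypothesis and Benjamini and Hochberg rejects four, while the more conservative Benjamini and Yekutieli variant again rejects one. That ordering reflects the power trade-off described above. For real analyses, an established library routine is preferable to hand-written code.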
To maintain a high power, a limited number of subgroup analyses should be prespecified in the protocol, where the subgroups chosen should be based on a clear hypothesis with a pre-existing biological rationale.

If regression models are used for causal inference, hypotheses of the association between an exposure and outcome are tested and multiplicity should be addressed, if there is more than one outcome, using the methods mentioned above. Note that in parametric models (e.g. generalized linear models and survival models), the dependence between the tests of interest can usually be obtained under standard asymptotic normality assumptions, allowing the dependence between them (e.g. middle age vs. young age, and old age vs. young age) to be taken into account when performing FWER multiplicity corrections [2]. This leads to a gain in power compared with Bonferroni-like multiplicity corrections.

When developing prediction models, the number of subjects (linear regression), cases (logistic regression) or events (survival models) determines the amount of statistical power and thus how many variables can be included in the model [12, 13]. As a rule of thumb, 10 subjects, cases or events are needed per variable. When developing a prediction model with a multiplicity of variables and too few events, there is a risk of predicting random error (i.e. overfitting) and very poor performance of the prediction model in another patient sample. In those situations, even more than 10 subjects, cases or events per variable may be required [14].

Multiple comparisons can be foreseen at the design phase of a study, when multiple hypotheses are formulated. Therefore, methods to correct for multiple comparisons should be prespecified in the protocol and/or the statistical analysis plan. The BJD requires that clinical trials and systematic reviews are preregistered and encourages that the protocols of trials are published elsewhere and submitted as a supplementary file.
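The rule of thumb quoted above for prediction models amounts to a simple division. The sketch below only makes that arithmetic explicit. The function name and the example count are hypothetical, and the default of 10 per variable is the rule of thumb from the text, which the editorial itself notes may be too low in some situations.

```python
def max_candidate_variables(n_effective: int, per_variable: int = 10) -> int:
    """Upper bound on the number of candidate variables in a prediction model.

    n_effective: subjects (linear regression), cases (logistic regression)
                 or events (survival models).
    per_variable: subjects, cases or events required per variable.
    """
    if n_effective < 0 or per_variable <= 0:
        raise ValueError("n_effective must be >= 0 and per_variable must be > 0")
    return n_effective // per_variable


# Hypothetical survival study with 120 observed events.
print(max_candidate_variables(120))       # rule of thumb: 10 events per variable
print(max_candidate_variables(120, 20))   # a stricter requirement
```

Under these assumptions, 120 events would support at most 12 candidate variables, or only 6 if 20 events per variable were required.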
We encourage authors of any type of study to consider multiple-testing strategies before the start of the study and to clearly report the strategy of choice in the methods.

Loes Maria Hollestein: Writing - original draft (lead); Writing - review & editing (lead). Serigne Lo: Writing - original draft (equal); Writing - review & editing (equal). Jo Leonardi-Bee: Writing - original draft (equal); Writing - review & editing (equal). Saharon Rosset: Writing - original draft (equal); Writing - review & editing (equal). Noam Shomron: Writing - original draft (equal); Writing - review & editing (equal). Dominique-Laurent Couturier: Writing - original draft (equal); Writing - review & editing (equal). Sonia Gran: Writing - original draft (equal); Writing - review & editing (equal).