Test statistic
Kernel (algebra)
Variable kernel density estimation
Mathematics
Statistics
Kernel embedding of distributions
Type I and type II errors
Statistical
Kernel method
Sample size determination
Sequential probability ratio test
Selection (genetic algorithm)
Statistical hypothesis testing
Computer science
Artificial intelligence
Combinatorics
Support vector machine
Authors
Arthur Gretton,Dino Sejdinović,Heiko Strathmann,Sivaraman Balakrishnan,Massimiliano Pontil,Kenji Fukumizu,Bharath K. Sriperumbudur
Abstract
Given samples from distributions p and q, a two-sample test determines whether to reject the null hypothesis that p = q, based on the value of a test statistic measuring the distance between the samples. One choice of test statistic is the maximum mean discrepancy (MMD), which is a distance between embeddings of the probability distributions in a reproducing kernel Hilbert space. The kernel used in obtaining these embeddings is critical in ensuring the test has high power, and correctly distinguishes unlike distributions with high probability. A means of parameter selection for the two-sample test based on the MMD is proposed. For a given test level (an upper bound on the probability of making a Type I error), the kernel is chosen so as to maximize the test power, and minimize the probability of making a Type II error. The test statistic, test threshold, and optimization over the kernel parameters are obtained with cost linear in the sample size. These properties make the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory. In experiments, the new kernel selection approach yields a more powerful test than earlier kernel selection heuristics.
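The linear-time MMD statistic and the power-based kernel choice described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Gaussian kernel, the function names, and the candidate bandwidth grid are assumptions made for the example, and the paper's procedure additionally separates the data used for kernel selection from the data used for testing.

```python
import numpy as np

def rbf(a, b, sigma):
    # Row-wise Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2)).
    return np.exp(-np.sum((a - b) ** 2, axis=1) / (2.0 * sigma ** 2))

def linear_time_mmd(X, Y, sigma=1.0):
    """Linear-time unbiased estimate of MMD^2 between samples X and Y.

    Consecutive sample pairs contribute one term
    h = k(x, x') + k(y, y') - k(x, y') - k(x', y),
    so the cost is O(m) in the sample size and the data can be
    consumed as a stream without storing all observations.
    Returns the statistic and its estimated standard error.
    """
    m = (min(len(X), len(Y)) // 2) * 2  # use an even number of samples
    x1, x2 = X[0:m:2], X[1:m:2]
    y1, y2 = Y[0:m:2], Y[1:m:2]
    h = (rbf(x1, x2, sigma) + rbf(y1, y2, sigma)
         - rbf(x1, y2, sigma) - rbf(x2, y1, sigma))
    return h.mean(), h.std(ddof=1) / np.sqrt(len(h))

def select_bandwidth(X, Y, sigmas):
    """Pick the bandwidth maximizing the power proxy MMD^2 / (std error),
    a linear-cost surrogate for the test-power criterion the abstract
    describes (small Type II error at a fixed test level)."""
    def ratio(s):
        stat, se = linear_time_mmd(X, Y, s)
        return stat / (se + 1e-12)  # guard against a zero denominator
    return max(sigmas, key=ratio)
```

In use, one would run `select_bandwidth` on a held-out portion of the stream and then compute `linear_time_mmd` with the chosen kernel on fresh data, comparing the statistic against a threshold set by the test level.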