kNNDM: k-fold Nearest Neighbour Distance Matching Cross-Validation for map accuracy estimation

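Leave-one-out (LOO), Cross-validation, Mathematics, Matching (statistics), Statistics, Pattern recognition (psychology), Nearest neighbour, Artificial intelligence, k-nearest neighbours algorithm, Computer science, Machine learning, Quantitative structure-activity relationship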
Authors
Jan Linnenbrink,Carles Milà,Marvin Ludwig,Hanna Meyer
Identifier
DOI:10.5194/egusphere-2023-1308
Abstract

Abstract. Random and spatial Cross-Validation (CV) methods are commonly used to evaluate machine learning-based spatial prediction models, and the obtained performance values are often interpreted as map accuracy estimates. However, the appropriateness of such approaches is currently the subject of controversy. For the common case where no probability sample for validation purposes is available, in Milà et al. (2022) we proposed the Nearest Neighbour Distance Matching (NNDM) Leave-One-Out (LOO) CV method. This method produces a distribution of geographical Nearest Neighbour Distances (NND) between test and train locations during CV that matches the distribution of NND between prediction and training locations. Hence, it creates predictive conditions during CV that are comparable to what is required when predicting a defined area. Although NNDM LOO CV produced largely reliable map accuracy estimates in our analysis, as a LOO-based method, it cannot be applied to large datasets found in many studies. Here, we propose a novel k-fold CV strategy for map accuracy estimation inspired by the concepts of NNDM LOO CV: the k-fold NNDM (kNNDM) CV. The kNNDM algorithm tries to find a k-fold configuration such that the Empirical Cumulative Distribution Function (ECDF) of NND between test and train locations during CV is matched to the ECDF of NND between prediction and training locations. We tested kNNDM CV in a simulation study with different sampling distributions and compared it to other CV methods including NNDM LOO CV. We found that kNNDM CV performed similarly to NNDM LOO CV and produced reasonably reliable map accuracy estimates across sampling patterns with strong reductions in computation time for large sample sizes. Furthermore, we found a positive linear association between the quality of the match of the two ECDFs in kNNDM and the reliability of the map accuracy estimates. 
kNNDM provided the advantages of our original NNDM LOO CV strategy while bypassing its sample size limitations.
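
The matching idea described in the abstract can be illustrated with a small sketch. It is not the authors' kNNDM algorithm: it only scores a given fold assignment by comparing the ECDF of test-to-train NND during CV with the ECDF of prediction-to-training NND. The function names, the toy coordinates, and the choice of the area between the two ECDFs as the mismatch measure are all assumptions made for this illustration; the abstract does not state which statistic is used.

```python
import math

def nn_distances(from_pts, to_pts):
    """For each point in from_pts, distance to its nearest neighbour in to_pts."""
    return [min(math.dist(p, q) for q in to_pts) for p in from_pts]

def cv_nn_distances(train_pts, folds):
    """For each training point, distance to the nearest training point in a different fold."""
    out = []
    for i, p in enumerate(train_pts):
        others = [q for j, q in enumerate(train_pts) if folds[j] != folds[i]]
        out.append(min(math.dist(p, q) for q in others))
    return out

def ecdf(sample, x):
    """Empirical cumulative distribution function of sample, evaluated at x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def ecdf_gap(a, b):
    """Area between the ECDFs of a and b (assumed mismatch measure, smaller is better)."""
    grid = sorted(set(a) | set(b))
    return sum(abs(ecdf(a, lo) - ecdf(b, lo)) * (hi - lo)
               for lo, hi in zip(grid, grid[1:]))

# Hypothetical toy data: two clusters of training points, prediction points far away.
train = [(0, 0), (1, 0), (10, 0), (11, 0)]
pred = [(20, 0), (21, 0)]
pred_nnd = nn_distances(pred, train)

# Two candidate 2-fold assignments: neighbours split across folds vs. kept together.
candidates = {"mixed": [0, 1, 0, 1], "blocked": [0, 0, 1, 1]}
gaps = {name: ecdf_gap(cv_nn_distances(train, folds), pred_nnd)
        for name, folds in candidates.items()}
best = min(gaps, key=gaps.get)
print(gaps, best)
```

In this toy case the prediction locations lie far from the training data, so the assignment that keeps neighbouring training points in the same fold reproduces the prediction-time distances more closely. A full method would additionally have to search over fold configurations, which this sketch does not attempt.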
