Importance of spatial predictor variable selection in machine learning applications – Moving from data reproduction to spatial prediction

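Overfitting, Spatial analysis, Computer science, Random forest, Spatial ecology, Spatial dependence, Machine learning, Selection (genetic algorithm), Variable (mathematics), Artificial intelligence, Geostatistics, Feature selection, Spatial variability, Data mining, Statistics, Ecology, Mathematics, Artificial neural network, Mathematical analysis, Biology, Telecommunications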
Authors
Hanna Meyer,Christoph Reudenbach,Stephan Wöllauer,Thomas Nauß
Source
Journal: Ecological Modelling [Elsevier]
Volume/Issue: 411: 108815-108815 Cited by: 175
Identifier
DOI:10.1016/j.ecolmodel.2019.108815
Abstract

Machine learning algorithms find frequent application in spatial prediction of biotic and abiotic environmental variables. However, the characteristics of spatial data, especially spatial autocorrelation, are widely ignored. We hypothesize that this is problematic and results in models that can reproduce training data but are unable to make spatial predictions beyond the locations of the training samples. We assume that not only spatial validation strategies but also spatial variable selection is essential for reliable spatial predictions. We introduce two case studies that use remote sensing to predict land cover and the leaf area index for the "Marburg Open Forest", an open research and education site of Marburg University, Germany. We use the machine learning algorithm Random Forests to train models using non-spatial and spatial cross-validation strategies to understand how spatial variable selection affects the predictions. Our findings confirm that spatial cross-validation is essential in preventing overoptimistic model performance. We further show that highly autocorrelated predictors (such as geolocation variables, e.g. latitude, longitude) can lead to considerable overfitting and result in models that can reproduce the training data but fail in making spatial predictions. The problem becomes apparent in the visual assessment of the spatial predictions that show clear artefacts that can be traced back to a misinterpretation of the spatially autocorrelated predictors by the algorithm. Spatial variable selection could automatically detect and remove such variables that lead to overfitting, resulting in reliable spatial prediction patterns and improved statistical spatial model performance. We conclude that in addition to spatial validation, a spatial variable selection must be considered in spatial predictions of ecological data to produce reliable predictions.
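The abstract does not spell out how the spatial folds were constructed, so the following is only a minimal sketch of the general idea behind spatial cross-validation: samples that lie close together are held out together, so a model is never tested on locations adjacent to its training data. The square-block grouping, the function names `spatial_block_id` and `leave_block_out_folds`, the block size, and the synthetic coordinates are all assumptions made for illustration, not the authors' procedure.

```python
import random
from collections import defaultdict


def spatial_block_id(x, y, block_size):
    """Map a coordinate to the ID of the square block that contains it."""
    return (int(x // block_size), int(y // block_size))


def leave_block_out_folds(coords, block_size, k, seed=0):
    """Yield (train_idx, test_idx) pairs that hold out whole spatial blocks.

    Unlike a random k-fold split, no block contributes samples to both
    the training and the test side of the same fold.
    """
    # Group sample indices by the block they fall into.
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(coords):
        blocks[spatial_block_id(x, y, block_size)].append(i)

    # Shuffle the blocks (not the samples) and deal them into k folds.
    block_ids = sorted(blocks)
    random.Random(seed).shuffle(block_ids)

    for fold in range(k):
        test_blocks = set(block_ids[fold::k])
        test_idx = [i for b in test_blocks for i in blocks[b]]
        train_idx = [
            i for b in block_ids if b not in test_blocks for i in blocks[b]
        ]
        yield sorted(train_idx), sorted(test_idx)


# Hypothetical sample locations in a 100 x 100 study area.
rng = random.Random(42)
coords = [(rng.uniform(0, 100), rng.uniform(0, 100)) for _ in range(200)]

folds = list(leave_block_out_folds(coords, block_size=25, k=4))
for n, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {n}: {len(train_idx)} training / {len(test_idx)} test samples")
```

Each index pair can be passed to any learner in place of a random split. Comparing the error obtained this way with the error from a random split gives a rough indication of how much a model relies on spatial proximity to its training samples.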