
Explaining anomalies in coal proximity and coal processing data with Shapley and tree-based models

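Keywords: outliers, principal component analysis, anomaly (physics), anomaly detection, data mining, computer science, identification (biology), component (thermodynamics), econometrics, statistics, mathematics, artificial intelligence, biology, thermodynamics, botany, physics, condensed matter physics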
Authors
Xiu Liu, Chris Aldrich
Source
Journal: Fuel [Elsevier BV]
Volume 335, article 126891; cited by: 12
Identifier
DOI: 10.1016/j.fuel.2022.126891
Abstract

Modelling the characteristics and composition of coal is important, as proximity data and other measurements needed to characterize coal are typically expensive or hard to acquire in real time. Understanding anomalies in these relatively small data sets is important, as removing them may result in an unnecessary loss of data or bias the data used in the model. Although anomaly detection has been studied in depth in the literature, very little work has been devoted to explaining anomalies. In this paper, a general anomaly detection and identification methodology is considered, based on three models, viz. an isolation forest (IF), a random forest (RF) and a tree SHAP explanatory model. Three case studies related to the composition of coal and coal processing are considered. In these case studies, the IF-RF-SHAP approach identified outliers or data anomalies that could not be identified with principal component analysis (PCA). The model is a new variant of several integrated approaches that have recently been proposed. A further contribution of the study lies in the empirical comparison of IF anomaly scores with the distance-based and reconstruction-based anomaly scores generated by principal component models. In the case studies considered, the IF anomaly scores identified anomalies in the data better than the scores derived from the principal component models. The methodology can therefore complement distance-based approaches, such as PCA, in explaining anomalies or outliers detected in data. In addition to the proposed IF-RF-SHAP approach, four approaches to comparing the contributions of variables in random forest models are considered: simple correlation of individual predictors with the anomaly scores of samples, random forest variable importance based on an impurity criterion, random forest variable importance based on a permutation criterion, and the tree SHAP approach. If the latter is taken as the benchmark, the impurity criterion gave the most reliable results, while simple predictor correlations gave the least reliable results.
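For principal component models, the usual distance-based and reconstruction-based anomaly scores are Hotelling's T² (a distance inside the retained principal subspace) and the Q statistic, or squared prediction error (the reconstruction error outside it). The abstract does not spell out which statistics were used, so the NumPy sketch below assumes these standard choices, autoscaled data and a user-chosen number of components. The hypothetical function `pca_anomaly_scores` and the synthetic data are illustrative only. One sample is planted so that it breaks the correlation between two variables without being extreme in any single one.

```python
import numpy as np


def pca_anomaly_scores(X, n_components):
    """Hypothetical helper: Hotelling's T^2 and Q (SPE) for every row of X, PCA fitted on X itself."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # autoscale each variable
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                             # loadings, shape (p, k)
    T = Z @ P                                           # principal component scores, shape (n, k)
    lam = s[:n_components] ** 2 / (len(X) - 1)          # variance of each retained component
    t2 = np.sum(T ** 2 / lam, axis=1)                   # distance-based score (inside the model)
    q = np.sum((Z - T @ P.T) ** 2, axis=1)              # reconstruction-based score (outside the model)
    return t2, q


# Synthetic stand-in data: variables 0/1 and 2/3 move together (two latent factors).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
X = latent @ W + 0.1 * rng.normal(size=(200, 4))
# Sample 0 is unremarkable variable by variable but breaks the x0 ~ x1 relationship.
X[0] = [1.0, -1.0, 0.0, 0.0]

t2, q = pca_anomaly_scores(X, n_components=2)
print("T^2 rank of sample 0:", int((t2 > t2[0]).sum()) + 1, "of", len(X))
print("Q   rank of sample 0:", int((q > q[0]).sum()) + 1, "of", len(X))
```

In this toy case T² stays small for sample 0 because its projection onto the retained components lies near the centre, while Q flags it because the sample cannot be reconstructed from those components. A single PCA score can therefore miss an anomaly that another score catches.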
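The four variable-contribution measures listed in the abstract can be computed side by side, roughly as in the following sketch. It again assumes scikit-learn and NumPy, synthetic data and arbitrary settings. Impurity importance is scikit-learn's `feature_importances_` (mean decrease in impurity), and permutation importance comes from `sklearn.inspection.permutation_importance`. The SHAP benchmark is taken as the mean absolute Shapley value per variable, computed by brute-force enumeration (hypothetical helper `shapley_matrix`) rather than by Tree SHAP. Agreement with the benchmark is summarised by a Spearman rank correlation (hypothetical helper `spearman`). The printed numbers depend only on the synthetic data and say nothing about the paper's coal case studies.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestRegressor
from sklearn.inspection import permutation_importance


def shapley_matrix(predict, X_explain, background):
    """Hypothetical helper: exact interventional Shapley values per row (brute force over 2**p coalitions)."""
    n, p = X_explain.shape
    m = len(background)
    values = {}
    for k in range(p + 1):
        for S in combinations(range(p), k):
            # Pair every explained row with every background row; coalition S takes the explained values.
            Z = np.tile(background, (n, 1))
            if S:
                Z[:, list(S)] = np.repeat(X_explain[:, list(S)], m, axis=0)
            values[S] = predict(Z).reshape(n, m).mean(axis=1)
    phi = np.zeros((n, p))
    for i in range(p):
        for S, v in values.items():
            if i not in S:
                weight = factorial(len(S)) * factorial(p - len(S) - 1) / factorial(p)
                phi[:, i] += weight * (values[tuple(sorted(S + (i,)))] - v)
    return phi


def spearman(a, b):
    """Hypothetical helper: Spearman rank correlation without tie handling."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])


# Synthetic stand-in data (not the paper's data): anomalies planted in variables 0 and 1.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
X[:5, 0] += 6.0
X[5:8, 1] -= 5.0

scores = -IsolationForest(random_state=0).fit(X).score_samples(X)  # larger = more anomalous
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, scores)

background = X[rng.choice(len(X), size=30, replace=False)]
phi = shapley_matrix(rf.predict, X, background)

importances = {
    "correlation": np.abs([np.corrcoef(X[:, j], scores)[0, 1] for j in range(X.shape[1])]),
    "impurity": rf.feature_importances_,
    "permutation": permutation_importance(rf, X, scores, n_repeats=10, random_state=0).importances_mean,
    "shap": np.abs(phi).mean(axis=0),
}
for name in ("correlation", "impurity", "permutation"):
    rho = spearman(importances[name], importances["shap"])
    print(f"{name:>11}: Spearman rank correlation with mean |SHAP| = {rho:+.2f}")
```

With only four variables the rank correlation is coarse; on real data with more predictors it becomes a more informative summary of how closely each cheaper measure tracks the Shapley benchmark.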