Method for Selecting a Data Imputation Model Based on Programming by Example for Data Analysts

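Keywords: Outlier, Computer science, Imputation (statistics), Data preprocessing, Missing data, Data mining, Data type, Preprocessor, Inference, Machine learning, Data modeling, Artificial intelligence, Database, Programming language
Authors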
Hiroko Nagashima,Yuka Kato
Source
Venue: International Conference on Big Data  Citations: 1
Identifier
DOI:10.1109/bigdata50022.2020.9377818
Abstract

Recent years have seen an increase in the use of data acquired by sensors and wearable devices. However, depending on the type of sensor or wearable device, the data may be irregular, with missing values, outliers, and differing units of measurement. Feeding such data directly into a machine-learning model would not produce correct results, so analysts must preprocess the data before analysis. Sensor data in particular may contain more outliers and missing values than data acquired by other means, because of network congestion and the limited life of sensor batteries. To perform such preprocessing efficiently, we previously proposed APREP-S (automatic preprocessing of sensor data), which uses Bayesian inference based on programming by example. APREP-S defines one model for each imputation method, and its workflow selects models based on the features of the imputation area. Consequently, the APREP-S model must be regenerated when data with a different periodicity are used. In other words, when the data are affected by factors such as weekday versus weekend, weather conditions, or season, an imputation model must be generated that takes these features into account. In this study, we enhanced the method for selecting the optimal imputation model in APREP-S, allowing multiple models to be defined for each imputation method. We evaluated APREP-S by the mean squared error on two types of data: 1) human activity data as short-term periodic data, and 2) temperature and humidity data as long-term periodic data. From the results, we concluded that APREP-S is an efficient imputation method.
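
The idea of choosing among imputation models by mean squared error can be illustrated with a minimal sketch. This is not the APREP-S algorithm: the abstract describes selection through Bayesian inference based on programming by example, whereas the sketch below substitutes a plain comparison on artificially masked values. All function names (`impute_mean`, `impute_linear`, `impute_periodic`, `select_imputer`) and the toy series are hypothetical and chosen only for illustration.

```python
from statistics import mean


def impute_mean(series):
    """Fill gaps (None) with the mean of all observed values."""
    fill = mean(v for v in series if v is not None)
    return [fill if v is None else v for v in series]


def impute_linear(series):
    """Fill gaps by linear interpolation between the nearest observed neighbours."""
    known = [i for i, v in enumerate(series) if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:        # gap at the start: copy the next observed value
            out[i] = series[right]
        elif right is None:     # gap at the end: copy the last observed value
            out[i] = series[left]
        else:
            w = (i - left) / (right - left)
            out[i] = series[left] + w * (series[right] - series[left])
    return out


def impute_periodic(series, period):
    """Fill gaps with the mean of observed values at the same phase of the period."""
    overall = mean(v for v in series if v is not None)
    out = list(series)
    for i, v in enumerate(series):
        if v is None:
            same_phase = [series[j] for j in range(i % period, len(series), period)
                          if series[j] is not None]
            # Fall back to the overall mean if this phase was never observed.
            out[i] = mean(same_phase) if same_phase else overall
    return out


def select_imputer(truth, masked_idx, candidates):
    """Hide the values at masked_idx, impute with each candidate, and return
    the name of the candidate with the lowest MSE plus all scores."""
    masked = [None if i in masked_idx else v for i, v in enumerate(truth)]
    scores = {}
    for name, impute in candidates.items():
        filled = impute(masked)
        scores[name] = mean((truth[i] - filled[i]) ** 2 for i in masked_idx)
    return min(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy series with period 2 (an assumption made for the example).
    truth = [0, 10, 0, 10, 0, 10, 0, 10]
    candidates = {
        "mean": impute_mean,
        "linear": impute_linear,
        "periodic": lambda s: impute_periodic(s, period=2),
    }
    best, scores = select_imputer(truth, {3, 4}, candidates)
    print(best, scores)
```

On strongly periodic data, a period-aware candidate can beat both mean filling and linear interpolation. This mirrors the abstract's point that an imputation model has to account for the periodicity of the data.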
