Effect of feature optimization on performance of machine learning models for predicting traffic incident duration

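Computer science, Outlier, Artificial intelligence, Feature selection, Support vector machine, Machine learning, Feature (linguistics), Predictive modeling, Data mining, Heteroscedasticity, Feature engineering, Principal component analysis, Artificial neural network, Skewness, Deep learning, Statistics, Philosophy, Linguistics, Mathematics
Authors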
Lubna Obaid,Khaled Hamad,Mohamad Ali Khalil,Ali Bou Nassif
Source
Journal: Engineering Applications of Artificial Intelligence [Elsevier]
Volume: 131, Article 107845
Identifier
DOI:10.1016/j.engappai.2024.107845
Abstract

Developing a high-performing traffic incident-duration prediction model is a key component in evaluating the impact of incidents on the roadway network. Various studies have developed robust incident-duration prediction models, yet they have struggled to produce accurate predictions because of numerous data modeling issues, such as complex correlations, highly skewed data distributions, heteroscedasticity, and outliers. This paper investigates the impact of feature optimization (FO), a relatively new term encompassing two established topics, feature engineering (FE) and feature selection (FS), on the performance of several machine learning (ML) models developed for predicting incident durations. The models include multivariate linear regression, decision trees, support vector regressors, K-Nearest Neighbors regression, ensembles, and artificial neural networks. For each model, various FO techniques were applied to the large traffic incident dataset and the prediction process was repeated. The results show that the proposed filter, wrapper, and embedded FS techniques can reduce the number of features without sacrificing prediction performance. FE techniques can also improve the accuracy of incident-duration estimation across the assessed ML models: log-normal transformation to address skewness in continuous data, min-max normalization to address data variability, and principal component analysis (PCA) to recast the dataset as a smaller subset of independent features. The best-performing FE technique was PCA, which improved performance across all developed ML models. The best-performing FS technique was Recursive Feature Elimination, which outperformed the other tested techniques in reducing model complexity while maintaining model accuracy.
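
The four techniques highlighted in the abstract (log transformation, min-max normalization, PCA, and Recursive Feature Elimination) can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline: the function names, the synthetic data, the choice of three principal components, and the use of ordinary least squares as the base model for elimination are all assumptions made here for illustration only.

```python
import numpy as np

def log_transform(y):
    """Log-transform a right-skewed, non-negative variable (log1p tolerates zeros)."""
    return np.log1p(y)

def min_max_normalize(X):
    """Rescale each column to [0, 1]; constant columns are mapped to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def pca(X, n_components):
    """Project centered data onto its leading principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T

def rfe_linear(X, y, n_keep):
    """Recursive feature elimination with a least-squares model (assumed base
    model): refit, drop the feature with the smallest |coefficient|, repeat."""
    keep = list(range(X.shape[1]))
    while len(keep) > n_keep:
        A = np.column_stack([X[:, keep], np.ones(len(X))])  # intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        weakest = int(np.argmin(np.abs(coef[:-1])))  # skip the intercept
        keep.pop(weakest)
    return keep

def skewness(v):
    """Sample skewness (third standardized moment)."""
    d = v - v.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5

# Synthetic stand-in data, NOT the paper's incident dataset.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(500, 6))
# Hypothetical ground truth: only features 0 and 3 influence the duration.
duration = np.exp(0.03 * X[:, 0] + 0.02 * X[:, 3] + rng.normal(0, 0.1, 500))

y = log_transform(duration)           # reduce skewness of the target
X_scaled = min_max_normalize(X)       # put features on a common scale
X_pca = pca(X_scaled, n_components=3) # smaller set of uncorrelated features
selected = rfe_linear(X_scaled, y, n_keep=2)

print("skewness before/after log:", skewness(duration), skewness(y))
print("PCA output shape:", X_pca.shape)
print("features kept by RFE:", selected)
```

Features are normalized before elimination so that coefficient magnitudes are comparable across columns. In practice, scikit-learn's `MinMaxScaler`, `PCA`, and `RFE` classes provide maintained versions of these steps.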