Effect of feature optimization on performance of machine learning models for predicting traffic incident duration

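Computer science, Outlier, Artificial intelligence, Feature selection, Support vector machine, Machine learning, Feature (linguistics), Predictive modeling, Data mining, Heteroscedasticity, Feature engineering, Principal component analysis, Artificial neural network, Skewness, Deep learning, Statistics, Philosophy, Linguistics, Mathematics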

Authors

Lubna Obaid, Khaled Hamad, Mohamad Ali Khalil, Ali Bou Nassif

Source

Journal: Engineering Applications of Artificial Intelligence [Elsevier]
Date: 2024-05-01 Volume/issue: 131: 107845-107845

Identifier

DOI: 10.1016/j.engappai.2024.107845

Abstract

A high-performing traffic incident-duration prediction model is a key component in evaluating the impact of incidents on the roadway network. Many studies have developed incident-duration prediction models, but accurate prediction remains difficult because of data modeling issues such as complex correlations, highly skewed data distributions, heteroscedasticity, and outliers. This paper investigates the impact of feature optimization (FO), a relatively new term encompassing two established topics, feature engineering (FE) and feature selection (FS), on the performance of several machine learning (ML) models developed for predicting incident durations. The models included multivariate linear regression, decision trees, support vector regressors, K-Nearest Neighbors regression, ensembles, and artificial neural networks. For each model, various FO techniques were applied to the large traffic incident dataset and the prediction process was repeated. The results show that the proposed filter, wrapper, and embedded FS techniques can reduce the number of features without sacrificing prediction performance. FE techniques can enhance the accuracy of incident-duration estimation across the assessed ML models: log-normal transformation to handle skewness in continuous data, min-max normalization to handle data variability, and principal component analysis (PCA) to reform the dataset into a smaller, independent feature subset. The best-performing FE technique was PCA, with performance improvements observed across all developed ML models. The best-performing FS technique was Recursive Feature Elimination, which outperformed the other tested techniques in reducing model complexity while maintaining model accuracy.
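
Two of the FE steps named in the abstract, a log transformation for skewed continuous data followed by min-max normalization, can be illustrated with a minimal standard-library sketch. This is not the authors' implementation: the function names, the use of log(1 + x) as the specific log transform, and the sample durations are all assumptions made for illustration.

```python
import math


def log_transform(values):
    """Apply log(1 + x) to compress a right-skewed, non-negative variable.

    Assumption: log1p stands in for the log transformation mentioned in
    the abstract; the paper may use a different variant.
    """
    return [math.log1p(v) for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical incident durations in minutes (illustrative, not from the paper).
durations = [5, 12, 20, 35, 60, 240]

# Log-transform first to reduce skew, then normalize to a common range.
scaled = min_max_scale(log_transform(durations))
```

Applying the log transform before scaling keeps the single long incident (240 minutes) from compressing all other values toward zero, which is the motivation the abstract gives for handling skewness and variability separately.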
