Authors
Yuliia Kotsyubynska, Nataliia Mykolaivna Kozan, Valeriia Chadiuk, Andrii Kostyshyn, Andrii Kotsyubynsky, V. Fentsyk
Abstract
Machine learning and deep learning methods show promise for injury severity prediction, but a comprehensive synthesis of their effectiveness, appropriate evaluation metrics, and optimal methodological approaches is lacking. This systematic review and meta-analysis of machine learning and deep learning methods for predicting traffic crash injury severity was conducted following PRISMA 2020 guidelines and TRIPOD+AI standards for prediction model reporting. Eligible studies were observational studies published between 2014 and 2025 that used neural networks for crash injury severity prediction and reported F1-score, G-mean, sensitivity, or confusion matrix data. In total, 74 studies covering 2,127,059 crash cases were analysed. The pooled macro F1-score was 78.6% (95% CI: 76.2-81.0%, I²=84%). Transformer/LLM methods (84.7%) and transfer learning (83.2%) achieved the highest performance, followed by hybrid CNN-RNN (81.8%), RNN/LSTM (81.2%), CNN (79.5%), shallow neural networks (76.8%), and conventional machine learning (73.5%). Deep learning significantly outperformed conventional ML (pooled difference 7.7 percentage points, 95% CI: 4.9-10.5, p<0.001). Sample size showed a moderate correlation with F1-score (r=0.524, p<0.001). Combined imbalance handling (SMOTE/ADASYN plus class weighting) achieved an F1-score of 81.3% versus 69.8% without handling (difference 11.5 percentage points, p<0.001), raising fatal crash sensitivity from 42.1% to 73.6%. Meta-regression explained more than half (56%) of between-study heterogeneity through sample size, imbalance handling, algorithm type, and study quality. Machine learning and deep learning predict crash injury severity effectively when evaluated with appropriate metrics and trained on adequate samples. Transfer learning and transformers represent the state of the art. Sample requirements depend on model complexity rather than on fixed thresholds. Combined imbalance handling is essential for minority class detection.
Future research should adopt TRIPOD+AI standards, emphasise minority class metrics, assess fairness, and explore multimodal approaches. Implementation should prioritise interpretability and continuous monitoring.
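The evaluation metrics named in the abstract (macro F1-score, G-mean, and class-wise sensitivity) can all be derived from a multi-class confusion matrix. The following is a minimal sketch using the standard definitions, not the review's own extraction code. The three severity classes and the counts in `cm` are hypothetical values chosen only to illustrate why overall accuracy can look acceptable while sensitivity for the rare fatal class remains low.

```python
import math

# Hypothetical confusion matrix (illustrative counts, not data from the review).
# Rows are true classes, columns are predicted classes.
# Class order: 0 = property damage only, 1 = injury, 2 = fatal.
cm = [
    [900, 80, 20],
    [100, 250, 50],
    [10, 30, 60],
]


def per_class_metrics(cm):
    """Return precision, recall (sensitivity), and F1 for each class."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # true class k, predicted otherwise
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, true class differs
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def macro_f1(cm):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = [m["f1"] for m in per_class_metrics(cm)]
    return sum(scores) / len(scores)


def g_mean(cm):
    """Geometric mean of per-class recalls; zero if any class is never detected."""
    recalls = [m["recall"] for m in per_class_metrics(cm)]
    return math.prod(recalls) ** (1.0 / len(recalls))


def accuracy(cm):
    """Share of all cases on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[k][k] for k in range(len(cm))) / total


metrics = per_class_metrics(cm)
print(f"accuracy          = {accuracy(cm):.3f}")
print(f"macro F1          = {macro_f1(cm):.3f}")
print(f"G-mean            = {g_mean(cm):.3f}")
print(f"fatal sensitivity = {metrics[2]['recall']:.3f}")
```

Because macro F1 and G-mean weight every class equally, they fall below accuracy whenever the minority class is poorly detected, which is the behaviour that makes them suitable for imbalanced severity data.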