Improving 3-day deterministic air pollution forecasts using machine learning algorithms

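Keywords: air quality index, gradient boosting, meteorology, air pollution, random forest, pollution, environmental science, pollutant, computer science, algorithm, machine learning, geography, ecology, chemistry, organic chemistry, biology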
Authors
Zhiguo Zhang, Christer Johansson, Magnuz Engardt, Massimo Stafoggia, Xiaoliang Ma
Source
Journal: Atmospheric Chemistry and Physics, Vol. 24, No. 2, pp. 807-851
Identifier
DOI:10.5194/acp-24-807-2024
Abstract

As air pollution is regarded as the single largest environmental health risk in Europe, it is important that communication to the public is up to date and accurate and provides means to avoid exposure to high air pollution levels. Long- and short-term exposure to outdoor air pollution is associated with increased risks of mortality and morbidity. Up-to-date information on present and coming days' air quality helps people avoid exposure during episodes with high levels of air pollution. Air quality forecasts can be based on deterministic dispersion modelling, but to be accurate this requires detailed information on future emissions, meteorological conditions and process-oriented dispersion modelling. In this paper, we apply three machine learning (ML) algorithms, namely random forest (RF), extreme gradient boosting (XGB), and long short-term memory (LSTM), to improve 1, 2, and 3 d deterministic forecasts of PM10, NOx, and O3 at different sites in Greater Stockholm, Sweden. It is shown that the deterministic forecasts can be significantly improved using the ML models, but that the degree of improvement depends more on the pollutant and site than on which ML algorithm is applied. Also, four feature importance methods, namely the mean decrease in impurity (MDI) method, the permutation method, the gradient-based method, and the Shapley additive explanations (SHAP) method, are used to identify significant features that are common and robust across all models and methods for a given pollutant. The ML models improve the deterministic forecasts of PM10 through the input of lagged measurements and the Julian day, the latter partly reflecting seasonal variations that are not properly parameterized in the deterministic forecasts. A systematic discrepancy in the diurnal cycle of NOx in the deterministic forecasts is removed by the ML models, which consider lagged measurements and calendar data such as hour and weekday, reflecting the influence of local traffic emissions.
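The inputs described above (the deterministic forecast, lagged measurements, the Julian day, hour and weekday) can be sketched as a feature table. This is an illustrative sketch, not the authors' code: the function name `build_features`, the dictionary inputs and the default 24 h lag are hypothetical choices, "Julian day" is interpreted here as the day of the year, and skipping hours with a missing lagged measurement is likewise an assumption. For a forecast issued d days ahead, a lag of at least 24·d hours would keep the measurement available at issue time, which is an assumption about the operational setup.

```python
from datetime import datetime, timedelta


def build_features(det_forecast, observations, lag_hours=24):
    """Build one feature row per forecast hour (hypothetical helper).

    det_forecast: dict mapping datetime -> deterministic forecast concentration
    observations: dict mapping datetime -> measured concentration
    lag_hours: assumed lag of the measurement used as an input feature
    """
    rows = []
    for t in sorted(det_forecast):
        lagged_time = t - timedelta(hours=lag_hours)
        if lagged_time not in observations:
            continue  # no lagged measurement available: skip this hour
        rows.append({
            "time": t,
            "det_forecast": det_forecast[t],
            "obs_lag": observations[lagged_time],
            "julian_day": t.timetuple().tm_yday,  # day of year, 1..366
            "hour": t.hour,                       # proxy for the diurnal cycle
            "weekday": t.weekday(),               # Monday = 0 ... Sunday = 6
            "target": observations.get(t),        # None if not yet measured
        })
    return rows


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    hours = [start + timedelta(hours=h) for h in range(48)]
    obs = {t: 20.0 + t.hour for t in hours}  # synthetic measurements
    det = {t: 15.0 for t in hours}           # synthetic deterministic forecast
    rows = build_features(det, obs, lag_hours=24)
    print(len(rows), rows[0]["julian_day"], rows[0]["weekday"])  # prints: 24 2 1
```

The abstract does not say whether the ML models predict the concentration itself or the residual of the deterministic forecast, so the `target` field here simply holds the raw measurement; either choice could be fed to RF, XGB or LSTM models.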
For O3 at the urban background site, the local photochemistry is not properly accounted for by the relatively coarse Copernicus Atmosphere Monitoring Service (CAMS) ensemble model used here for forecasting O3, but this is compensated for by the ML models, which take lagged measurements into account. Through multiple repetitions of the training process, the resulting ML models achieved improvements for all sites and pollutants. For NOx at street canyon sites, the mean squared error (MSE) decreased by up to 60 %, and seven metrics, including R² and the mean absolute percentage error (MAPE), exhibited consistent results. The prediction of PM10 is improved significantly at the urban background site, whereas at street sites the ML models have difficulty capturing additional information. The prediction accuracy for O3 also increased modestly, with differences between metrics. Further work is needed to reduce deviations between model results and measurements for short periods with relatively high concentrations (peaks) at the street canyon sites. Such peaks can be due to a combination of non-typical emissions and unfavourable meteorological conditions, which are rather difficult to forecast. Furthermore, we show that general models trained using data from selected street sites can improve the deterministic forecasts of NOx at the station not involved in model training. For PM10 this was only possible using the more complex LSTM models. An important aspect to consider when choosing ML algorithms is the computational cost of training the models when the system is deployed. Tree-based models (RF and XGB) require fewer computational resources and yield performance comparable to that of LSTM. Therefore, tree-based models are now implemented operationally in the forecasts of air pollution and health risks in Stockholm.
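The evaluation metrics mentioned above can be made concrete with a small sketch. These helpers are illustrative, not the authors' evaluation code: R² is computed here as the coefficient of determination (the paper may define it differently, for example as a squared correlation), MAPE is returned in percent and is undefined when an observation is zero, and the MSE reduction is assumed to be measured relative to the deterministic forecast.

```python
def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def mape(obs, pred):
    """Mean absolute percentage error in percent; observations must be non-zero."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def mse_reduction(obs, det_pred, ml_pred):
    """Percent decrease in MSE of the ML forecast relative to the deterministic one."""
    return 100.0 * (1.0 - mse(obs, ml_pred) / mse(obs, det_pred))


if __name__ == "__main__":
    obs = [10.0, 20.0, 30.0, 40.0]  # synthetic measurements
    det = [14.0, 24.0, 34.0, 44.0]  # deterministic forecast with a +4 bias
    ml = [12.0, 22.0, 32.0, 42.0]   # ML-corrected forecast with a +2 bias
    print(mse(obs, det), mse(obs, ml), mse_reduction(obs, det, ml))  # prints: 16.0 4.0 75.0
```

Halving a constant bias quarters the squared error, which is why this synthetic example shows a 75 % reduction; the numbers are made up and unrelated to the "up to 60 %" reported for NOx.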
Nevertheless, there is great potential to develop generic models using advanced ML that take into account not only local temporal variation but also spatial variation across different stations.
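Of the four feature importance methods named in the abstract, the permutation method is the simplest to write down: shuffle one input column, re-predict, and record how much the error grows. The sketch below assumes a generic `predict` callable and MSE as the error measure; both, along with the toy model in the usage lines, are illustrative assumptions rather than the authors' setup. For fitted scikit-learn estimators, `sklearn.inspection.permutation_importance` implements the same idea.

```python
import random


def mse(obs, pred):
    """Mean squared error."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    predict: callable mapping one feature row (list of floats) to a prediction
    X: list of feature rows; y: list of observed concentrations
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Illustrative model that uses feature 0 and ignores feature 1."""
    return 2.0 * row[0]


if __name__ == "__main__":
    X = [[float(i), float(i % 3)] for i in range(10)]
    y = [toy_predict(row) for row in X]
    print(permutation_importance(toy_predict, X, y))
```

Because the toy model never reads feature 1, shuffling that column leaves every prediction unchanged and its importance is exactly 0.0, while feature 0 receives a positive score.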