Abstract
The shape of microplastics (MPs) matters. Yet expert-based shape classification is labor intensive, time consuming, and susceptible to human bias. In this study, we investigated deep learning-based approaches for automating the shape classification of MP hyperspectral images, with the aim of a faster and more accurate classification procedure. Nine deep learning architectures (NN 1.1, NN 1.2, CNN 1.1, CNN 1.2, CNN 1.3, VGG16, ResNet50, ResNet50 V2, and MobileNet) were tested and their performance compared across four data sets (original, augmented original, refined, and augmented refined). Our sample comprises hyperspectral images of 11,042 environmental MPs (particle sizes down to 10 μm) analyzed with micro-Fourier transform infrared spectroscopy and covering seven environmental matrices (wastewater influent, effluent, sludge, marine water, stormwater, sediments from stormwater ponds, and indoor air). Nine shape categories (fiber, rod, ellipse, oval, sphere, quadrilateral, triangle, free-form, and unidentifiable) were applied as reference shapes. Based on the comparison, three main findings are outlined: (a) Model architecture influences MP shape classification significantly: CNNs outperform NNs, and transfer learning-based models outperform nontransfer learning-based models. Notably, MobileNet achieves the highest accuracy, 0.93 on the validation data set and 1.00 on the test data set. (b) Data quality matters for shape classification: complex models perform robustly across data sets, while simple models are more sensitive to changes in data quality. (c) In contrast to manual assessment, the deep learning approach automates shape classification for hyperspectral images, which reduces labor and time requirements while significantly increasing efficiency.
Yet challenges and opportunities remain, particularly regarding model architecture and data quality, highlighting the need for robust model designs and complementary high-quality data sets for optimal classification.