Abstract
Environmental Science & Technology, ASAP Article
Viewpoint, February 12, 2025

Small Data Insights for Groundwater Management

Zi Zhan - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Yaqiang Wei* - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China; State Environmental Protection Key Laboratory of Source Apportionment and Control of Aquatic Pollution, China University of Geosciences, Wuhan 430078, China; https://orcid.org/0000-0001-6317-4735; *Email: [email protected]
Tian-Chyi Jim Yeh - Department of Hydrology and Atmospheric Science, University of Arizona, Tucson, Arizona 85721, United States
Yiran Chen - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Yuling Chen - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Yu Li - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Jiao Zhang - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Yi Wen* - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China; *Email: [email protected]
Hui Li* - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China; *Email: [email protected]

Cite this: Environ. Sci. Technol. 2025, XXXX, XXX, XXX-XXX
https://doi.org/10.1021/acs.est.5c01025
Received 21 January 2025. Published online 12 February 2025.
© 2025 American Chemical Society. This publication is licensed for personal use by The American Chemical Society.
Subjects: Groundwaters; Impurities; Optimization; Physical and chemical properties; Remediation

Data scarcity poses significant challenges in environmental fields, including agricultural systems, (1) ecosystem management, (2) and water resource assessment, (3) where incomplete or fragmented data sets often hinder accurate analysis and decision-making. These challenges are particularly pronounced in groundwater systems, where the concealed nature of subsurface environments, logistical difficulties, and high monitoring costs severely limit data collection. Furthermore, the spatial heterogeneity of geological formations and the fragmented nature of available data sets amplify uncertainties in modeling groundwater flow and contaminant transport. Such complexities necessitate the development of a small data approach that can extract meaningful insights from limited data sets, enabling timely, cost-effective decision-making in groundwater management and enhancing the efficiency of environmental remediation efforts under data-constrained conditions.

Origins and Issues of Small Data

The hidden characteristics of subsurface systems create significant challenges for directly observing soil and groundwater contamination, particularly in complex underground media.
(4) Monitoring techniques, such as borehole sampling and intermittent water quality testing, provide localized data but are often fragmented by spatiotemporal discontinuities (5) (Figure 1). This results in substantial data gaps that increase uncertainties in contaminant migration studies in environments with heterogeneous geological properties. (6) Additionally, the confined nature of groundwater flow often delays contaminant detection, complicating efforts to accurately trace source information. (7) These challenges, along with the high costs and logistical difficulties of continuous monitoring, frequently result in small data scenarios. Furthermore, gathering large volumes of data quickly is rarely feasible in practical applications, especially under time constraints. (8) Despite their limitations, small data sets often serve as the only available information during critical moments. (9) Consequently, their importance increases as they offer timely and actionable insights that align with the immediate needs of real-world engineering and environmental management. Therefore, effectively leveraging these limited data sets is essential for decision-makers to respond efficiently, optimizing resource allocation and reducing environmental impact.

Figure 1. Small data generation, enhancement methods, and data analysis within groundwater systems are essential for improving remediation outcomes at contaminated sites. Constrained by high sampling costs and logistical challenges, limited monitoring data collected from boreholes can be leveraged for in-depth analysis at a site with contaminated groundwater.
Optimized methods for maximizing the utility of small data not only enhance the evaluation of subsequent remediation efforts but also support the development of efficient, high-precision, and cost-effective remediation strategies.

Necessity of Optimizing Small Data Utilization

In groundwater system characterization, researchers have varied the number of pumping tests and monitoring wells to understand how additional data points influence model accuracy. For instance, increasing the number of pumping tests from four to nine and the number of monitoring wells from 49 to 158 yields a <10% improvement in predictive accuracy, (10−12) highlighting the diminishing returns of additional data volume. This phenomenon suggests that beyond a balance point, (9) accumulating more data adds limited value, especially when weighed against the associated costs in time, equipment, and computational complexity. (8) Additionally, the expanding data volume can inadvertently introduce redundancy, saturating information to the point where the marginal utility of each new data point decreases. (13) Such redundancy not only increases the time and resources required for model training but also escalates the risk of overfitting, ultimately reducing generalizability and prediction reliability in real-world applications.

Researchers have applied Bayesian–MCMC (Markov chain Monte Carlo) methods to pinpoint critical balance points in data accumulation, (9) ensuring that model accuracy is maximized without incurring disproportionate costs or resource demands. Identifying this optimal data threshold becomes essential for guiding practical engineering decisions, particularly in resource-constrained scenarios in which efficiency in data use is paramount.
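
As a rough illustration of the diminishing-returns argument (and not the Bayesian–MCMC procedure of ref 9), the sketch below assumes a single scalar parameter, for example a log-transformed hydraulic conductivity, with a Gaussian prior and independent Gaussian measurement noise. Under those assumptions the posterior standard deviation has a closed form, and each extra observation shrinks it by a smaller fraction than the previous one. The function names, the stopping rule `min_relative_gain`, and the numerical values are hypothetical choices made for this example only.

```python
import math


def posterior_std(n_obs: int, prior_std: float, noise_std: float) -> float:
    """Posterior standard deviation of a scalar parameter after n_obs
    independent Gaussian observations (conjugate normal-normal update).

    Assumption: known noise_std and a Gaussian prior with prior_std.
    """
    # Precisions (inverse variances) add: one prior term plus one term per observation.
    precision = 1.0 / prior_std**2 + n_obs / noise_std**2
    return math.sqrt(1.0 / precision)


def balance_point(prior_std: float, noise_std: float,
                  min_relative_gain: float, max_obs: int = 10_000) -> int:
    """Smallest number of observations n at which adding one more observation
    reduces the posterior standard deviation by less than min_relative_gain
    (expressed as a fraction, e.g. 0.05 for 5%).

    This threshold rule is a hypothetical stand-in for a cost-benefit criterion.
    """
    for n in range(max_obs):
        current = posterior_std(n, prior_std, noise_std)
        following = posterior_std(n + 1, prior_std, noise_std)
        if (current - following) / current < min_relative_gain:
            return n
    return max_obs


if __name__ == "__main__":
    # Illustrative values only: unit prior spread, unit measurement noise, 5% threshold.
    n_star = balance_point(prior_std=1.0, noise_std=1.0, min_relative_gain=0.05)
    print("observations before gains drop below 5%:", n_star)
    for n in (0, 1, 4, n_star, n_star + 1):
        print(n, round(posterior_std(n, 1.0, 1.0), 4))
```

With a unit prior and unit noise, the tenth observation already reduces the posterior standard deviation by less than 5%, which is the kind of threshold a monitoring plan could weigh against sampling cost. Real aquifers involve spatially distributed, correlated parameters, so this closed form only conveys the qualitative trend.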
Consequently, small data approaches not only enhance decision accuracy but also significantly reduce costs and improve decision-making efficiency. While methods that rely on large-scale data collection require substantial time and resources for decision-making, a small data approach can extract valuable insights from limited data, enabling quicker responses and more efficient decisions. This is particularly important in resource-constrained environments, where small data approaches help decision-makers strike a better balance between data acquisition and cost investment. By optimizing resource allocation, small data approaches ensure more forward-thinking and reliable decisions in groundwater management. Ultimately, the application of small data approaches improves decision efficiency while maintaining accuracy, providing robust support for sustainable development.

Lessons from Small Data Applications across Fields

Recognizing the importance of small data research, existing studies show that small data-based prediction techniques can deliver accurate contaminant migration models even without extensive data sets, offering reliable guidance for remediation strategies. (14) For instance, in molecular science, where data acquisition can be highly constrained, machine learning models like random forests and support vector machines have been applied to predict drug–target interactions and drug toxicity with promising results. (15−17) Similarly, researchers have addressed small data challenges by utilizing data augmentation and transfer learning, combining convolutional neural networks with molecular image data to enhance quantitative structure–activity relationship modeling, enabling accurate predictions of molecular quantum structures and activity relationships in the field of chemistry.
(18,19) Ecology has similarly benefited from small sample learning, where it has played a crucial role in understanding habitat requirements and population dynamics of rare species, supporting biodiversity conservation efforts. (20,21) Meanwhile, the shift toward personalized production and small batch manufacturing has highlighted the limitations of traditional big data-based predictive models, which are often prone to overfitting in quality management. To counter this, researchers have incorporated data augmentation techniques, such as recurrent variational autoencoders and conditional generative adversarial networks, alongside optimization algorithms like proximal policy optimization and support vector regression. This approach has improved product quality predictions in small sample environments. (22,23) Additionally, in environmental science and hydrological modeling, innovative methods combining long short-term memory networks with prototypical networks have been successfully employed to address runoff prediction challenges in data-scarce river basins, effectively overcoming the limitations posed by sparse hydrological data. (24,25) With respect to hydrogeology, timely optimization of resource utilization and minimization of response times can enhance the understanding of contaminant migration in groundwater systems. 
Approaches that focus on extracting insights from limited data sets have significant potential, not only in addressing challenges in predictive medicine, ecology, and manufacturing but also in advancing the analysis of data-limited systems in fields like climate modeling, urban planning, and biodiversity monitoring.

Small Data Opportunities in Groundwater

Building on the achievements of small data applications across disciplines, promising and underutilized opportunities exist within hydrogeology, particularly in complex and concealed groundwater systems where traditional large data approaches often fall short. Small sample learning has demonstrated its value in managing data scarcity in fields such as molecular science, where predictive accuracy is achieved through limited data sets. However, similar approaches are not yet widely adopted in hydrogeological studies, where modeling often depends on well-established, data-intensive models like MODFLOW and MT3DMS or on machine learning surrogates trained with extensive data sets. (26) These models can become unreliable in data-limited situations, highlighting a gap that small data techniques can address.

Due to the concealment and heterogeneity of aquifers, as well as logistical constraints and high costs associated with data collection, groundwater systems present an effective test case for small data methods. (27,28) These factors hinder large data set acquisition, thereby increasing uncertainty in contaminant transport modeling. Furthermore, traditional hydrogeological surveys face challenges from geological heterogeneity and limited monitoring data. (29,30) Common interpolation methods, such as kriging and inverse distance weighting, assume data stationarity, which limits their ability to address data complexity and nonstationarity.
(27) While expanding the monitoring network can improve data availability, the high costs and logistical challenges restrict the potential for accuracy improvements in data-scarce situations. (31)

Given these limitations, hydraulic tomography (HT) differs from interpolation methods by directly capturing the heterogeneity and connectivity of permeable zones, enhancing parameter estimation and contaminant transport predictions under data-scarce conditions. (32−35) Despite these advantages, residual uncertainties remain a major challenge under small data conditions. To address these uncertainties, advanced optimization methods like genetic algorithms, particle swarm optimization, and Bayesian optimization are increasingly used to improve contaminant distribution characterization. (27) Among these, Bayesian inversion excels at refining posterior distributions and mitigating uncertainties, particularly in small data scenarios. (30,31) Therefore, exploring the combination of these technologies can characterize both the heterogeneity of the aquifer and the distribution of contaminants, while quantifying the associated uncertainties. This approach could establish a promising framework that plays a significant role in groundwater management, where small data phenomena are common.

Building on these advancements, small data approaches have proven to be valuable in not only groundwater modeling but also other fields of groundwater research. Data scarcity often arises in environmental monitoring due to limited sampling points, low monitoring frequency, and the infrequency of contamination events, which impacts the timely and accurate identification of sources and contaminant distribution predictions.
(36) To address this issue, small data methods such as transfer learning, generative adversarial networks (GANs), and data augmentation can extract knowledge or generate additional data from limited data sets, (37,38) filling data gaps and thereby enhancing model generalization and predictive accuracy (Figure 1).

Challenges and Prospects

Achieving model accuracy and generalizability is fraught with several distinct challenges in small data scenarios within concealed groundwater systems. First, the heterogeneity of aquifer parameters and the spatial variability of hydrogeological properties hinder model generalization, causing overfitting and reduced accuracy in broader applications. (39) Second, as the volume of data increases, the marginal benefit of additional data diminishes, making it difficult to identify the point at which further data introduce redundancy and additional costs without significantly improving model performance. (9) Lastly, determining the optimal volume of data for groundwater management is crucial to balance model accuracy and resource efficiency, enabling cost-effective practices without compromising predictive capabilities.

Addressing these challenges requires strategies to enhance model robustness, including integrating physical models with regularization techniques and data augmentation to reduce overfitting and improve generalizability. (14,40) These methods allow models to adapt more effectively to diverse hydrogeological conditions and mitigate overfitting risks. Advanced Bayesian and filtering methods optimize data use by dynamically estimating parameters and quantifying uncertainty, (9,41) allowing for efficient monitoring with limited data sets and identifying when further data collection becomes redundant. In combination, expanding data sets through augmentation and imputation, (15) alongside ensemble learning techniques, (42) further enhances model stability.
(43) In addition, semisupervised techniques enhance generalization by leveraging unlabeled data. (44)

While the existing methods offer advantages in addressing the challenges posed by small data, they still have inherent limitations. Regularization techniques designed to mitigate overfitting in data-scarce scenarios may not effectively address data gaps in regions with significant heterogeneity. (14) The extreme variability in hydraulic properties in such areas means that regularization cannot fully capture local differences, and data augmentation may not accurately replicate this complexity, potentially introducing artificial patterns that fail to reflect the true characteristics of the aquifer. (15) Similarly, Bayesian methods are valuable for uncertainty quantification but are highly dependent on the quality of prior knowledge. (45) In groundwater studies with sparse or imprecise prior information, this dependency can bias posterior distributions, compromising parameter estimates like hydraulic conductivity. When prior data are limited or unreliable, Bayesian methods may yield misleading results, complicating decision-making in groundwater management. (46)

Advances in small data methodologies are transforming groundwater management by enhancing the understanding of complex systems, enabling precise environmental control, and fostering sustainable practices.
These innovations address data-limited challenges, promote resilient ecosystems, and support informed, long-term resource conservation policies.

Author Information

Corresponding Authors
Yaqiang Wei - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China; State Environmental Protection Key Laboratory of Source Apportionment and Control of Aquatic Pollution, China University of Geosciences, Wuhan 430078, China; https://orcid.org/0000-0001-6317-4735; Email: [email protected]
Yi Wen - Technical Centre for Soil, Agriculture and Rural Ecology and Environment, Ministry of Ecology and Environment, Beijing 100012, China; Email: [email protected]
Hui Li - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China; Email: [email protected]

Authors
Zi Zhan - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Tian-Chyi Jim Yeh - Department of Hydrology and Atmospheric Science, University of Arizona, Tucson, Arizona 85721, United States
Yiran Chen - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Yuling Chen - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Yu Li - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China
Jiao Zhang - School of Environmental and Chemical Engineering, Shanghai University, Shanghai 200444, China

Author Contributions
Z.Z. and Y.W. led the conceptualization, writing, and figure drafting. Y.C., Y.C., and J.Z. assisted with writing and figures. H.L. and Y.W. conceived the ideas and designed the framework. T.-C.J.Y. and Y.L. helped to improve the writing of the paper.
All authors approved the final form for publication.

Notes
The authors declare no competing financial interest.

Biography

Zi Zhan

Dr. Yaqiang Wei is currently an Associate Professor at Shanghai University. He completed his postdoctoral research at Shanghai Jiao Tong University and obtained his Ph.D. from the University of Chinese Academy of Sciences in 2017, with a joint training program at the University of Arizona, United States. His primary research focus is on the migration, transformation, and fate of contaminants in soil and groundwater. He has led several research projects, including a project of key fund and a Youth Fund from the National Natural Science Foundation, and two subprojects under the National Key Research and Development Program for Soil Pollution Causes and Control Technologies. He also serves as a young editorial board member for journals such as Eco-Environment & Health and Agriculture Communications.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (42477004, 42330706, and 42125706).

References

This article references 46 other publications.

(1) Pradeleix, L.; Roux, P.; Bouarfa, S.; Bellon-Maurel, V. Multilevel Environmental Assessment of Regional Farming Activities with Life Cycle Assessment: Tackling Data Scarcity and Farm Diversity with Life Cycle Inventories Based on Agrarian System Diagnosis. Agricultural Systems 2022, 196, 103328, DOI: 10.1016/j.agsy.2021.103328
(2) Zuquim, G.; Stropp, J.; Moulatlet, G. M.; Van doninck, J.; Quesada, C. A.; Figueiredo, F. O. G.; Costa, F. R. C.; Ruokolainen, K.; Tuomisto, H. Making the Most of Scarce Data: Mapping Soil Gradients in Data-Poor Areas Using Species Occurrence Records. Methods Ecol. Evol. 2019, 10 (6), 788–801, DOI: 10.1111/2041-210X.13178
(3) Dutta, S.; Das, M. Remote Sensing Scene Classification under Scarcity of Labelled Samples─A Survey of the State-of-the-Arts. Comput. Geosci. 2023, 171, 105295, DOI: 10.1016/j.cageo.2022.105295
(4) Wu, Y.; Xu, M.; Liu, S. Generative Artificial Intelligence: A New Engine for Advancing Environmental Science and Engineering. Environ. Sci. Technol. 2024, 58, 17524, DOI: 10.1021/acs.est.4c07216
(5) Berg, S. J.; Illman, W. A. Capturing Aquifer Heterogeneity: Comparison of Approaches through Controlled Sandbox Experiments. Water Resour. Res. 2011, 47 (9), 1–17, DOI: 10.1029/2011WR010429
(6) Liu, X.; Illman, W. A.; Craig, A. J.; Zhu, J.; Yeh, T. C. J. Laboratory Sandbox Validation of Transient Hydraulic Tomography. Water Resour. Res. 2007, 43 (5), 1–13, DOI: 10.1029/2006WR005144
(7) Anshuman, A.; Eldho, T. I. A Parallel Workflow Framework Using Encoder-Decoder LSTMs for Uncertainty Quantification in Contaminant Source Identification in Groundwater. J. Hydrol. 2023, 619, 129296, DOI: 10.1016/j.jhydrol.2023.129296
(8) Liu, F.; Yeh, T. C. J.; Wang, Y. L.; Hao, Y.; Wen, J. C.; Wang, W. Characterization of Basin-Scale Aquifer Heterogeneity Using Transient Hydraulic Tomography with Aquifer Responses Induced by Groundwater Exploitation Reduction. J. Hydrol. 2020, 588, 125137, DOI: 10.1016/j.jhydrol.2020.125137
(9) Yang, R.; Jiang, J.; Pang, T.; Yang, Z.; Han, F.; Li, H.; Wang, H.; Zheng, Y. Crucial Time of Emergency Monitoring for Reliable Numerical Pollution Source Identification. Water Res. 2024, 265 (April), 122303, DOI: 10.1016/j.watres.2024.122303
(10) Hao, Y.; Yeh, T. C. J.; Xiang, J.; Illman, W. A.; Ando, K.; Hsu, K. C.; Lee, C. H. Hydraulic Tomography for Detecting Fracture Zone Connectivity. Ground Water 2008, 46 (2), 183–192, DOI: 10.1111/j.1745-6584.2007.00388.x
(11) Wei, Y.; Chen, J.; Li, L.; Zhu, G.; Wen, Y. Estimation of the Hydraulic Properties of a Fractured Aquifer Using Numerical Experiments with the Discrete Fracture Network Model. Hydrol. Sci. J. 2021, 66 (11), 1685–1694, DOI: 10.1080/02626667.2021.1962887
(12) Zha, Y.; Yeh, T. C. J.; Mao, D.; Yang, J.; Lu, W. Usefulness of Flux Measurements during Hydraulic Tomographic Survey for Mapping Hydraulic Conductivity Distribution in a Fractured Medium. Adv. Water Resour. 2014, 71, 162–176, DOI: 10.1016/j.advwatres.2014.06.008
(13) Aliouache, M.; Wang, X.; Fischer, P.; Massonnat, G.; Jourde, H. An Inverse Approach Integrating Flowmeter and Pumping Test Data for Three-Dimensional Aquifer Characterization. J. Hydrol. 2021, 603 (PB), 126939, DOI: 10.1016/j.jhydrol.2021.126939
(14) Xu, P.; Ji, X.; Li, M.; Lu, W. Small Data Machine Learning in Materials Science. npj Comput. Mater. 2023, 9 (1), 1–15, DOI: 10.1038/s41524-023-01000-z
(15) Dou, B.; Zhu, Z.; Merkurjev, E.; Ke, L.; Chen, L.; Jiang, J.; Zhu, Y.; Liu, J.; Zhang, B.; Wei, G. W. Machine Learning Methods for Small Data Challenges in Molecular Science. Chem. Rev. 2023, 123 (13), 8736–8780, DOI: 10.1021/acs.chemrev.3c00189