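Data collection
Computer science
Data quality
Quality assurance
Scripting language
Automation
Field (mathematics)
Data mining
Software
Data science
Statistics
Engineering
External quality assessment
Pure mathematics
Programming language
Metric (unit)
Operating system
Mechanical engineering
Mathematics
Operations management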
Authors
Albert Y. Kim, Valentine Herrmann, Ross Barreto, Brianna Calkins, Erika Gonzalez‐Akre, Daniel J. Johnson, Jennifer A. Jordan, Lukas Magee, Ian R. McGregor, Nicolle Montero, Karl Novak, Teagan Rogers, Jessica Shue, Kristina J. Anderson‐Teixeira
Identifier
DOI:10.1111/2041-210x.13982
Abstract
Accurate field data are essential to understanding ecological systems and forecasting their responses to global change. Yet, data collection errors are common, and data analysis often lags far enough behind its collection that many errors can no longer be corrected, nor can anomalous observations be revisited. What is needed is a system in which data quality assurance and control (QA/QC), along with the production of basic data summaries, can be automated immediately following data collection. Here, we implement and test a system to satisfy these needs. For two annual tree mortality censuses and a dendrometer band survey at two forest research sites, we used GitHub Actions continuous integration (CI) to automate data QA/QC and run routine data wrangling scripts to produce cleaned datasets ready for analysis. This system automation had numerous benefits, including (1) the production of near real‐time information on data collection status and errors requiring correction, resulting in final datasets free of detectable errors, (2) an apparent learning effect among field technicians, wherein original error rates in field data collection declined significantly following implementation of the system, and (3) an assurance of computational reproducibility, that is, robustness of the system to changes in code, data and software. By implementing CI, researchers can ensure that datasets are free of any errors for which a test can be coded. The result is dramatically improved data quality, increased skill among field technicians, and reduced need for expert oversight. Furthermore, we view CI implementation as a first step towards a data collection and analysis pipeline that is also more responsive to rapidly changing ecological dynamics, making it better suited to study ecological systems in the current era of rapid environmental change.
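To make "errors for which a test can be coded" concrete, the following is a minimal sketch of the kind of check a CI job might run on newly uploaded field data. It is not the authors' implementation: the abstract does not say how their scripts are written, and the column names (`tag`, `status`, `dbh_cm`), the allowed status codes, and the function names are all hypothetical, chosen only for illustration.

```python
import csv
import io
import math

# Hypothetical column names and status codes; none of these come from the paper.
ALLOWED_STATUS = {"alive", "dead"}


def find_errors(rows):
    """Return a list of (tag, message) tuples, one per failed check."""
    errors = []
    seen_tags = set()
    for row in rows:
        # Check 1: every record needs a unique, non-empty tree tag.
        tag = (row.get("tag") or "").strip()
        if not tag:
            errors.append((tag, "missing tree tag"))
        elif tag in seen_tags:
            errors.append((tag, "duplicate tree tag"))
        seen_tags.add(tag)

        # Check 2: status must be one of the allowed codes.
        status = (row.get("status") or "").strip().lower()
        if status not in ALLOWED_STATUS:
            errors.append((tag, f"unknown status {status!r}"))

        # Check 3: stem diameter must parse as a finite, positive number.
        try:
            dbh = float(row.get("dbh_cm") or "")
        except ValueError:
            errors.append((tag, "dbh_cm is not a number"))
        else:
            if not math.isfinite(dbh) or dbh <= 0:
                errors.append((tag, "dbh_cm must be a positive number"))
    return errors


def check_csv(text):
    """Parse CSV text and run every check on each record."""
    return find_errors(csv.DictReader(io.StringIO(text)))
```

A CI workflow could call `check_csv` on each pushed data file and exit with a nonzero status whenever the returned list is non-empty, so the job fails and the error list is reported while the field crew can still revisit the tree.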