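Medicine
Data quality
Consistency
Prostatectomy
Medical physics
Prostate cancer
Missing data
Cancer
Oncology
Computer science
Internal medicine
Data mining
Machine learning
Operations management
Economics
Metric (unit)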
Authors
Henry M. Spotnitz, John Giannini, Emily Clark, Yechiam Ostchega, Tamara R. Litwin, Stephanie L. Goff, Lewis E. Berman
Abstract
PURPOSE Cancer is a leading cause of morbidity and mortality in the United States. Mapping electronic health record (EHR) data to the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) may standardize data structure and allow for multi-database oncology studies. However, the number of oncology studies produced with the OMOP CDM has been low. To investigate the discrepancy between the public health impact of cancer and the output of OMOP CDM clinical cancer studies, we evaluated the EHR data quality of five surgical oncology cohorts in the All of Us Research Program: mastectomy, prostatectomy, colectomy, melanoma excision, and lung cancer resection. METHODS We selected the procedure codes that formed the basis of each phenotype. We used a data quality checklist to systematically evaluate five domains: conformance, completeness, concordance, plausibility, and temporality. RESULTS Most phenotype-defining source codes were mapped to Current Procedural Terminology 4, which is an EHR standard. All cohorts had low concept prevalence. Most bivariate correlations between concepts were weak (ρ ≤ 0.5). The small number of biomarkers available for use limited our plausibility analysis. The median time between biopsy and surgery varied across cohorts. CONCLUSION We identified multiple data completeness issues, which limited the fitness-for-use evaluation. In addition, using the OMOP CDM procedure concepts and mappings presented challenges for our study. Variable amounts of missingness in OMOP CDM surgical oncology data may affect the fitness for use of cancer data. Further research is warranted to improve the quality of these data.
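As a rough illustration of two of the checks named in the abstract, the sketch below computes a concept prevalence (completeness) and a median biopsy-to-surgery interval (temporality). It is not the authors' checklist or code: the records are made-up toy values, not All of Us data, and the field names (`biopsy`, `surgery`, `psa`) and helper functions are hypothetical.

```python
from datetime import date
from statistics import median

# Toy, made-up records (NOT All of Us data); field names are hypothetical.
cohort = [
    {"person_id": 1, "biopsy": date(2020, 1, 6), "surgery": date(2020, 2, 5), "psa": 6.1},
    {"person_id": 2, "biopsy": date(2020, 3, 2), "surgery": date(2020, 3, 30), "psa": None},
    {"person_id": 3, "biopsy": None, "surgery": date(2020, 5, 11), "psa": None},
    {"person_id": 4, "biopsy": date(2020, 6, 1), "surgery": date(2020, 7, 21), "psa": 9.4},
]


def prevalence(rows, field):
    """Completeness: share of cohort members with a non-missing value for `field`."""
    return sum(r[field] is not None for r in rows) / len(rows)


def median_days_between(rows, start, end):
    """Temporality: median days from `start` to `end`, skipping rows missing either date."""
    gaps = [(r[end] - r[start]).days for r in rows if r[start] and r[end]]
    return median(gaps) if gaps else None


psa_prevalence = prevalence(cohort, "psa")
biopsy_to_surgery = median_days_between(cohort, "biopsy", "surgery")
print(f"PSA prevalence: {psa_prevalence:.0%}")
print(f"Median biopsy-to-surgery interval: {biopsy_to_surgery} days")
```

Rows with a missing date are excluded from the interval calculation rather than imputed, so a low prevalence also shrinks the sample behind the temporality estimate.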