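Computer science
Data warehouse
Data quality
Data science
Database
Data transformation
Analytics
Data extraction
Quality (philosophy)
Data mining
Engineering
MEDLINE
Metric (unit)
Philosophy
Operations management
Epistemology
Law
Political science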
Authors

Helmut Spengler, Ingrid Gatz, Florian Kohlmayer, Klaus A. Kuhn, Fabian Praßer
         
            
    
            
Identifier

DOI: 10.1109/cbms49503.2020.00085
                                    
                                
                                 
         
        
                
Abstract
            
            Clinical and translational data warehouses are important infrastructure building blocks for modern data-driven approaches in medical research. These analytics-oriented databases have been designed to integrate heterogeneous biomedical datasets from different sources and to support use cases such as cohort selection and ad-hoc data analyses. However, the lack of clear definitions of source data and controlled data collection procedures often raises concerns about the quality of data provided in such environments and, consequently, about the evidence level of related findings. To address these problems, we present an architecture that helps to monitor data quality issues when importing data into warehousing solutions using ETL (Extraction, Transformation, Load) processes. Our approach provides software developers with an API (Application Programming Interface) for logging detailed and structured information about data quality issues encountered. This information can then be displayed in dynamic dashboards, the evolution of data quality can be monitored over time, and quality issues can be traced back to their source. Our architecture supports several well-known data quality dimensions, addressing conformance, completeness, and plausibility. We present an open-source implementation, which is compatible with common clinical and translational data warehousing platforms, such as i2b2 and tranSMART, and which can be used in conjunction with many ETL environments.
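
To make the idea of a structured logging API for ETL data quality issues more concrete, here is a minimal sketch in Python. It is not the authors' implementation or API: every name in it (`Dimension`, `QualityIssue`, `QualityLog`, `transform_row`), the record fields, and the sample rows are assumptions made up for illustration. It only shows how a transformation step might record conformance, completeness, and plausibility issues together with their source, so that they can later be counted or traced back.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from enum import Enum


class Dimension(Enum):
    """The three quality dimensions named in the abstract."""
    CONFORMANCE = "conformance"
    COMPLETENESS = "completeness"
    PLAUSIBILITY = "plausibility"


@dataclass(frozen=True)
class QualityIssue:
    """One structured log entry (hypothetical field layout)."""
    dimension: Dimension
    source: str       # assumed: name of the source system or file
    record_id: str    # assumed: identifier of the affected record
    attribute: str    # assumed: name of the affected field
    message: str
    logged_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class QualityLog:
    """In-memory stand-in for a persistent issue store."""

    def __init__(self):
        self.issues = []

    def log(self, dimension, source, record_id, attribute, message):
        issue = QualityIssue(dimension, source, record_id, attribute, message)
        self.issues.append(issue)
        return issue

    def counts_by_dimension(self):
        # Aggregate of the kind a dashboard could display.
        return Counter(issue.dimension for issue in self.issues)

    def issues_from(self, source):
        # Trace issues back to the source they came from.
        return [issue for issue in self.issues if issue.source == source]


def transform_row(row, source, log):
    """Toy transformation step: returns a cleaned row, or None if unusable."""
    record_id = row.get("patient_id", "")
    raw_year = row.get("birth_year")
    if raw_year in (None, ""):
        log.log(Dimension.COMPLETENESS, source, record_id,
                "birth_year", "missing value")
        return None
    try:
        year = int(raw_year)
    except ValueError:
        log.log(Dimension.CONFORMANCE, source, record_id,
                "birth_year", f"not an integer: {raw_year!r}")
        return None
    if year > date.today().year:
        # The row is still loaded; the issue is only recorded.
        log.log(Dimension.PLAUSIBILITY, source, record_id,
                "birth_year", f"birth year in the future: {year}")
    return {"patient_id": record_id, "birth_year": year}


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    rows = [
        {"patient_id": "p1", "birth_year": "1980"},
        {"patient_id": "p2", "birth_year": ""},
        {"patient_id": "p3", "birth_year": "19x0"},
        {"patient_id": "p4", "birth_year": "3000"},
    ]
    log = QualityLog()
    loaded = [r for r in (transform_row(row, "demo.csv", log) for row in rows) if r]
    print(len(loaded), "rows loaded")
    for dimension, count in log.counts_by_dimension().items():
        print(dimension.value, count)
```

In this sketch the log is a plain in-memory list. A setup like the one described in the abstract would presumably persist the entries so that dashboards can follow them over time, but how that is done is not stated here.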
         
            
 
                 
                
                    