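Cross-validation
Computer science
Model validation
Set (abstract data type)
Data science
Health care
Cross-platform
Data mining
Artificial intelligence
Programming language
Economic growth
Economy
Authors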
Drew Wilimitis,Colin G. Walsh
Abstract
Cross-validation remains a popular means of developing and validating artificial intelligence for health care, and numerous subtypes exist. Although tutorials on this validation strategy have been published, some with applied examples, we present here a practical tutorial comparing multiple forms of cross-validation using a widely accessible, real-world electronic health care data set: Medical Information Mart for Intensive Care-III (MIMIC-III). This tutorial explored methods such as K-fold cross-validation and nested cross-validation, highlighting their advantages and disadvantages across 2 common predictive modeling use cases: classification (mortality) and regression (length of stay). We aimed to provide readers with reproducible notebooks and best practices for modeling with electronic health care data. We also offered a set of recommendations and demonstrated that nested cross-validation reduces optimistic bias but comes with additional computational challenges. This tutorial might improve the community’s understanding of these important methods and encourage the modeling community to apply these guides directly in their work using the published code.
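To make the contrast between K-fold and nested cross-validation concrete, the following is a minimal sketch and not the authors' published notebooks. It assumes a synthetic one-dimensional regression data set in place of MIMIC-III and a toy k-nearest-neighbors regressor as a stand-in model; the helper names (kfold_indices, cv_score, nested_cv) are hypothetical. The "flat" estimate reports the best K-fold score over a hyperparameter grid, so the same folds both select and score the model. The nested estimate selects the hyperparameter inside each outer training fold only, at the cost of many more model fits.

```python
import random
from statistics import mean

def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and split it into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def knn_predict(train, x, k):
    """Toy stand-in model: mean y of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

def mse(train, test, k):
    """Mean squared error of the toy model on held-out points."""
    return mean((knn_predict(train, x, k) - y) ** 2 for x, y in test)

def split(data, held_out):
    """Return (train, test) lists given the held-out indices."""
    held = set(held_out)
    train = [p for i, p in enumerate(data) if i not in held]
    test = [data[i] for i in held_out]
    return train, test

def cv_score(data, k_neighbors, n_folds, seed=0):
    """Plain K-fold cross-validated error for one hyperparameter value."""
    folds = kfold_indices(len(data), n_folds, seed)
    return mean(mse(*split(data, f), k_neighbors) for f in folds)

def nested_cv(data, grid, outer_folds=5, inner_folds=3, seed=0):
    """Outer folds estimate error; inner folds choose the hyperparameter."""
    scores = []
    for held_out in kfold_indices(len(data), outer_folds, seed):
        train, test = split(data, held_out)
        # Selection sees only the outer training fold, never the outer test fold.
        best = min(grid, key=lambda k: cv_score(train, k, inner_folds, seed))
        scores.append(mse(train, test, best))
    return mean(scores)

# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]
data = [(x, 2 * x + rng.gauss(0, 1)) for x in xs]

grid = [1, 3, 5, 9]  # candidate numbers of neighbors
flat = min(cv_score(data, k, 5) for k in grid)  # selected and scored on the same folds
nested = nested_cv(data, grid)                  # selection kept inside each outer fold

print(f"flat K-fold MSE (best of grid): {flat:.3f}")
print(f"nested CV MSE:                  {nested:.3f}")
```

With 5 outer folds, 3 inner folds, and 4 grid values, the nested procedure fits the model 5 × (3 × 4 + 1) = 65 times, compared with 5 × 4 = 20 for the flat search, which illustrates the computational overhead the abstract mentions. On a single small synthetic sample the two estimates may differ in either direction; the bias reduction is a property that holds on average.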