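Ground truth
Computer science
Quality (philosophy)
Field (mathematics)
Work (physics)
Common ground
Need to know
Training (meteorology)
Knowledge management
Psychology
Data science
Artificial intelligence
Engineering
Mathematics
Epistemology
Social psychology
Physics
Philosophy
Meteorology
Pure mathematics
Mechanical engineering
Computer security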
Authors
Sarah Lebovitz, Natalia Levina, Hila Lifshitz-Assaf
Identifier
DOI:10.25300/misq/2021/16564
Abstract
Organizational decision-makers need to evaluate AI tools in light of increasing claims that such tools out-perform human experts. Yet, measuring the quality of knowledge work is challenging, raising the question of how to evaluate AI performance in such contexts. We investigate this question through a field study of a major U.S. hospital, observing how managers evaluated five different machine-learning (ML) based AI tools. Each tool reported high performance according to standard AI accuracy measures, which were based on ground truth labels provided by qualified experts. Trying these tools out in practice, however, revealed that none of them met expectations. Searching for explanations, managers began confronting the high uncertainty of experts’ know-what knowledge captured in ground truth labels used to train and validate ML models. In practice, experts address this uncertainty by drawing on rich know-how practices, which were not incorporated into these ML-based tools. Discovering the disconnect between AI’s know-what and experts’ know-how enabled managers to better understand the risks and benefits of each tool. This study shows dangers of treating ground truth labels used in ML models objectively when the underlying knowledge is uncertain. We outline implications of our study for developing, training, and evaluating AI for knowledge work.