可解释性
计算机科学
稳健性(进化)
机器学习
人工智能
过程(计算)
人气
数据科学
心理学
社会心理学
生物化学
化学
基因
操作系统
作者
Mengdi Huai,Jinduo Liu,Chenglin Miao,Liuyi Yao,Aidong Zhang
出处
期刊:Proceedings of the ... AAAI Conference on Artificial Intelligence
[Association for the Advancement of Artificial Intelligence (AAAI)]
日期:2022-06-28
卷期号:36 (6): 6935-6943
被引量:2
标识
DOI:10.1609/aaai.v36i6.20651
摘要
Providing model explanations has gained significant popularity recently. In contrast with the traditional feature-level model explanations, concept-based explanations can provide explanations in the form of high-level human concepts. However, existing concept-based explanation methods implicitly follow a two-step procedure that involves human intervention. Specifically, they first need the human to be involved to define (or extract) the high-level concepts, and then manually compute the importance scores of these identified concepts in a post-hoc way. This laborious process requires significant human effort and resource expenditure due to manual work, which hinders their large-scale deployability. In practice, it is challenging to automatically generate the concept-based explanations without human intervention due to the subjectivity of defining the units of concept-based interpretability. In addition, due to its data-driven nature, the interpretability itself is also potentially susceptible to malicious manipulations. Hence, our goal in this paper is to free human from this tedious process, while ensuring that the generated explanations are provably robust to adversarial perturbations. We propose a novel concept-based interpretation method, which can not only automatically provide the prototype-based concept explanations but also provide certified robustness guarantees for the generated prototype-based explanations. We also conduct extensive experiments on real-world datasets to verify the desirable properties of the proposed method.
科研通智能强力驱动
Strongly Powered by AbleSci AI