Authors
Wei Zheng, Lidan Lin, Xiaoxue Wu, Xiang Chen
Identifier
DOI: 10.1109/tse.2023.3349001
Abstract
Recently, deep neural networks (DNNs) have come into widespread use in high-stakes decision-making systems (such as fraud detection and prison sentencing). This has raised concerns about the fairness of DNNs because of the negative impact they may have on individuals and society. Fairness testing has therefore become an important research topic in DNN testing. At the same time, neural network coverage criteria (such as criteria based on neuron activation) are regarded as adequacy criteria for DNN white-box testing, on the implicit assumption that improving coverage enhances the quality of test suites. Nevertheless, the correlation between DNN fairness (a test property) and coverage criteria (a test method) has not been adequately explored. To address this issue, we conducted a systematic empirical study of seven coverage criteria, six fairness metrics, three fairness testing techniques, and five bias mitigation methods on five DNN models and nine fairness datasets to assess the correlation between coverage criteria and DNN fairness. Our study yielded the following findings: 1) as the size of the test suite increases, some coverage and fairness metrics change significantly; 2) the statistical correlation between coverage criteria and DNN fairness is limited; 3) after bias mitigation is applied to improve DNN fairness, coverage criteria follow different change patterns; and 4) models debiased by different bias mitigation methods show a lower correlation between coverage and fairness than the original models. Our findings cast doubt on the validity of coverage criteria with respect to DNN fairness (i.e., increasing coverage may even have a negative impact on the fairness of DNNs). Therefore, we warn DNN testers against blindly pursuing higher coverage at the cost of test properties of DNNs such as fairness.