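Computer science
Consistency (knowledge bases)
Contingency table
Raw data
Table (database)
Contingency
Set (abstract data type)
Data mining
Information privacy
Information retrieval
Internet privacy
Artificial intelligence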
Authors
Boaz Barak, Kamalika Chaudhuri, Cynthia Dwork, Satyen Kale, Frank McSherry, Kunal Talwar
Source
Venue: Symposium on Principles of Database Systems
Date: 2007-06-11
Citations: 449
Identifiers
DOI: 10.1145/1265530.1265569
Abstract
The contingency table is a workhorse of official statistics, the format of reported data for the US Census, Bureau of Labor Statistics, and the Internal Revenue Service. In many settings such as these, privacy is not only ethically mandated but frequently legally mandated as well. Consequently, there is an extensive and diverse literature dedicated to the problems of statistical disclosure control in contingency table release. However, all current techniques for reporting contingency tables fall short on at least one of privacy, accuracy, and consistency (among multiple released tables). We propose a solution that provides strong guarantees for all three desiderata simultaneously.
Our approach can be viewed as a special case of a more general approach for producing synthetic data: any privacy-preserving mechanism for contingency table release begins with raw data and produces a (possibly inconsistent) privacy-preserving set of marginals. From these tables alone (and hence without weakening privacy) we will find and output the nearest consistent set of marginals. Interestingly, this set is no farther than the tables of the raw data, and consequently the additional error introduced by the imposition of consistency is no more than the error introduced by the privacy mechanism itself.
The privacy mechanism of [20] gives the strongest known privacy guarantees, with very little error. Combined with the techniques of the current paper, we therefore obtain excellent privacy, accuracy, and consistency among the tables. Moreover, our techniques are surprisingly efficient. Our techniques apply equally well to the logical cousin of the contingency table, the OLAP cube.
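The "nearest consistent set of marginals" step can be illustrated with a small numerical sketch. This is not the authors' algorithm: the abstract does not state the distance measure or the procedure, so the sketch assumes Euclidean (L2) distance, ignores the non-negativity and integrality of counts, and uses Laplace noise as a stand-in for the privacy mechanism of [20]. The helper names `marginal_matrix` and `nearest_consistent` are hypothetical.

```python
import itertools
import numpy as np


def marginal_matrix(d, attrs):
    """Linear map from a full table over d binary attributes to the marginal over `attrs`.

    Columns index the 2**d cells of the full table; rows index the marginal's cells.
    """
    cells = list(itertools.product([0, 1], repeat=d))
    keys = list(itertools.product([0, 1], repeat=len(attrs)))
    M = np.zeros((len(keys), len(cells)))
    for j, cell in enumerate(cells):
        key = tuple(cell[a] for a in attrs)
        M[keys.index(key), j] = 1.0
    return M


def nearest_consistent(A, noisy):
    """L2 projection of the noisy marginals onto the column space of A.

    Assumption: "consistent" here means "equal to A @ z for some real table z";
    non-negativity and integrality of z are not enforced.
    """
    z, *_ = np.linalg.lstsq(A, noisy, rcond=None)
    return A @ z


rng = np.random.default_rng(0)
d = 3
subsets = [(0, 1), (1, 2)]  # two overlapping two-way marginals
A = np.vstack([marginal_matrix(d, s) for s in subsets])

table = rng.integers(0, 50, size=2**d).astype(float)  # toy raw data
true_marginals = A @ table

# Stand-in privacy mechanism (assumption): independent Laplace noise per cell.
noisy = true_marginals + rng.laplace(scale=2.0, size=true_marginals.shape)

consistent = nearest_consistent(A, noisy)

# Both released tables imply a one-way marginal for attribute 1.
m01 = consistent[:4].reshape(2, 2)  # indexed [attr0, attr1]
m12 = consistent[4:].reshape(2, 2)  # indexed [attr1, attr2]
attr1_from_01 = m01.sum(axis=0)
attr1_from_12 = m12.sum(axis=1)

dist_to_output = np.linalg.norm(consistent - noisy)
dist_to_truth = np.linalg.norm(true_marginals - noisy)
```

Because the true marginals are themselves consistent, the projection can never land farther from the noisy tables than the true marginals do, which is the observation behind the abstract's claim that imposing consistency adds no more error than the privacy mechanism already introduced. After projection, the two tables also agree on the shared one-way marginal for attribute 1.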