The Price of Privately Releasing Contingency Tables
and the Spectra of Random Matrices with Correlated Rows

Shiva Prasad Kasiviswanathan, Mark Rudelson, Adam Smith, Jonathan Ullman

Abstract:

Marginal (contingency) tables are the method of choice for government agencies releasing statistical summaries of categorical data. In this paper, we derive lower bounds on how much distortion (noise) is necessary in these tables to ensure the privacy of sensitive data. We extend a line of recent work on impossibility results for private data analysis to a natural and important class of functionalities. Consider a database consisting of n rows (one per individual), each row comprising d binary attributes. For any subset of T attributes of size |T| = k, the marginal table for T has 2k entries; each entry counts how many times in the database a particular setting of these attributes occurs. We provide lower bounds for releasing all d-choose-k k-attribute marginal tables under several different notions of privacy.

Versions: