The Price of Privately Releasing Contingency Tables
and the Spectra of Random Matrices with Correlated Rows
Shiva Prasad Kasiviswanathan, Mark Rudelson, Adam Smith, Jonathan Ullman
Abstract:
Marginal (contingency) tables are the method of choice for
government agencies releasing statistical summaries of categorical
data. In this paper, we derive lower bounds on how
much distortion (noise) is necessary in these tables to ensure
the privacy of sensitive data. We extend a line of recent work
on impossibility results for private data analysis to a natural and
important class of functionalities.
Consider a database consisting of n rows (one per individual),
each row comprising d binary attributes. For any
subset of T attributes of size |T| = k, the marginal table
for T has 2k entries; each entry counts how many times in
the database a particular setting of these attributes occurs.
We provide lower bounds for releasing all d-choose-k
k-attribute
marginal tables under several different notions of privacy.
- We give efficient polynomial time attacks which allow
an adversary to reconstruct sensitive information given insufficiently
perturbed marginal table releases. In particular,
for a constant k, we obtain a tight bound of \Omega(min{n^{1/2}, d^{(k-1)/2}})
on the average distortion per entry for any mechanism
that releases all k-attribute marginals while providing
"attribute" privacy (a weak notion implied by most privacy
definitions).
- Our reconstruction attacks require a new lower bound
on the least singular value of a random matrix with correlated
rows. Let M(k) be a matrix with d-choose-k
rows formed by
taking all possible k-way entry-wise products of an underlying
set of d random vectors from {0,1}^n. For constant
k, we show that the least singular value of M(k) is \Omega(d^{k/2})
with high probability (the same asymptotic bound as for
independent rows).
- We obtain stronger lower bounds for marginal tables
satisfying differential privacy. We give a lower bound of
\Omega(min{n^{1/2}, d^{k/2}), which is tight for n = \Omega(d^k).
We extend our analysis to obtain stronger results for mechanisms that
add instance-independent noise and weaker results when k
is super-constant.
Versions: