Privately Solving Linear Programs [Abstract] [arXiv]
Justin Hsu, Aaron Roth, Tim Roughgarden, Jonathan Ullman
To appear in ICALP 2014 (Track A) - International Colloquium on Automata, Languages, and Programming
Abstract:
In this paper, we initiate the systematic study of solving linear programs under differential privacy. The first step is simply to define the problem: to this end, we introduce several natural classes of private linear programs that capture different ways sensitive data can be incorporated into a linear program. For each class of linear programs we give an efficient, differentially private solver based on the multiplicative weights framework, or we give an impossibility result.

An Anti-Folk Theorem for Large Repeated Games with Imperfect Monitoring [Abstract][arXiv]
Mallesh Pai, Aaron Roth, Jonathan Ullman
Manuscript
Abstract:
We study infinitely repeated games in settings of imperfect monitoring. We first prove a family of theorems showing that when the signals observed by the players satisfy a condition known as differential privacy, the folk theorem has little bite: for sufficiently small values of the privacy parameters, for a fixed discount factor, any equilibrium of the repeated game involves players playing approximate equilibria of the stage game in every period. Next, we argue that in large games ($n$-player games in which unilateral deviations by single players have only a small impact on the utility of other players), many monitoring settings naturally lead to signals that satisfy differential privacy, with privacy parameters tending to zero as the number of players $n$ grows large. We conclude that in such settings, the set of equilibria of the repeated game collapses to the set of equilibria of the stage game.

Fingerprinting Codes and the Price of Approximate Differential Privacy [Abstract][arXiv]
Mark Bun, Jonathan Ullman, Salil Vadhan
To appear in STOC 2014 - Symposium on Theory of Computing
Abstract:
We show new lower bounds on the sample complexity of $(\varepsilon, \delta)$-differentially private algorithms that accurately answer large sets of counting queries. A counting query on a database $D \in (\{0,1\}^d)^n$ has the form "What fraction of the individual records in the database satisfy the property $q$?" We show that in order to answer an arbitrary set $\mathcal{Q}$ of $\gg nd$ counting queries on $D$ to within error $\pm \alpha$ it is necessary that
$$
n \geq \tilde{\Omega}\left( \frac{\sqrt{d} \log |\mathcal{Q}|}{\alpha^2 \varepsilon} \right).
$$
This bound is optimal up to polylogarithmic factors, as demonstrated by the Private Multiplicative Weights algorithm (Hardt and Rothblum, FOCS '10). In particular, our lower bound is the first to show that the sample complexity required for accuracy and $(\varepsilon, \delta)$-differential privacy is asymptotically larger than what is required merely for accuracy, which is $O(\log |\mathcal{Q}| / \alpha^2)$. In addition, we show that our lower bound holds for the specific case of $k$-way marginal queries (where $|\mathcal{Q}| = 2^k \binom{d}{k}$) when $\alpha$ is a constant.
Our results rely on the existence of short fingerprinting codes (Boneh and Shaw, CRYPTO '95; Tardos, STOC '03), which we show are closely connected to the sample complexity of differentially private data release. We also give a new method for combining certain types of sample complexity lower bounds into stronger lower bounds.

Faster Private Release of Marginals on Small Databases [Abstract][arXiv]
Karthekeyan Chandrasekaran, Justin Thaler, Jonathan Ullman, Andrew Wan
ITCS 2014 - Innovations in Theoretical Computer Science
Abstract:
We study the problem of answering $k$-way marginal queries on a database $D \in (\{0,1\}^d)^n$, while preserving differential privacy. The answer to a $k$-way marginal query is the fraction of the database's records $x \in \{0,1\}^d$ with a given value in each of a given set of up to $k$ columns. Marginal queries enable a rich class of statistical analyses on a dataset, and designing efficient algorithms for privately answering marginal queries has been identified as an important open problem in private data analysis. For any $k$, we give a differentially private online algorithm that runs in time $$\mathrm{poly}\left(n, 2^{o(d)} \right)$$ per query and answers any sequence of $\mathrm{poly}(n)$ many $k$-way marginal queries with error at most $\pm 0.01$ on every query, provided $n \gtrsim d^{0.51}$. To the best of our knowledge, this is the first algorithm capable of privately answering marginal queries with a nontrivial worst-case accuracy guarantee for databases containing $\mathrm{poly}(d, k)$ records in time $\exp(o(d))$. Our algorithm runs the private multiplicative weights algorithm (Hardt and Rothblum, FOCS '10) on a new approximate polynomial representation of the database.
We derive our representation for the database by approximating the OR function restricted to low Hamming weight inputs using low-degree polynomials with coefficients of bounded $L_1$-norm. In doing so, we show new upper and lower bounds on the degree of such polynomials, which may be of independent approximation-theoretic interest.
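The brute-force, non-private evaluation of these queries makes the $2^k \binom{d}{k}$ query count concrete; the following is a minimal sketch with our own function and variable names, not code from the paper:

```python
from itertools import combinations, product

def marginal(db, clause):
    """Fraction of records x in db with x[j] == b for every (j, b) in clause.

    db: list of records, each a tuple of d bits.
    clause: dict mapping up to k column indices to required bit values."""
    return sum(all(x[j] == b for j, b in clause.items()) for x in db) / len(db)

def all_kway_marginals(db, k):
    """Answer every k-way marginal query: one clause per choice of k columns
    and per setting of their bit values -- 2^k * C(d, k) queries in total."""
    d = len(db[0])
    return {
        (cols, vals): marginal(db, dict(zip(cols, vals)))
        for cols in combinations(range(d), k)
        for vals in product([0, 1], repeat=k)
    }
```

The dictionary produced has exactly $2^k \binom{d}{k}$ entries, which is the query set whose size an $\exp(o(d))$-time private algorithm must avoid paying for directly.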

Differential Privacy for the Analyst via Private Equilibrium Computation [Abstract][arXiv]
Justin Hsu, Aaron Roth, Jonathan Ullman
STOC 2013 - Symposium on Theory of Computing
Abstract:
We give new mechanisms for answering exponentially many queries from multiple analysts on a private database, while protecting differential privacy both for the individuals in the database and for the analysts. That is, our mechanism's answer to each query is nearly insensitive to changes in the queries asked by other analysts. Our mechanism is the first to offer differential privacy on the joint distribution over analysts' answers, providing privacy for data analysts even if the other data analysts collude or register multiple accounts. In some settings, we are able to achieve nearly optimal error rates (even compared to mechanisms which do not offer analyst privacy), and we are able to extend our techniques to handle nonlinear queries. Our analysis is based on a novel view of the private query-release problem as a two-player zero-sum game, which may be of independent interest.
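The zero-sum-game view admits a standard non-private illustration: run multiplicative weights for one player against a best-responding opponent, and the average payoff approximates the value of the game. The toy payoff matrices, step size, and iteration count below are our own assumptions, sketching the technique rather than the paper's mechanism:

```python
def mw_zero_sum(A, T=2000, eta=0.05):
    """Approximate the value of the zero-sum game with payoff matrix A
    (entries in [-1, 1], row player maximizes) via multiplicative weights.

    Each round the row player plays the distribution induced by its weights,
    the column player best-responds, and rows are reweighted by how well
    they did against that response. The average realized payoff converges
    to the game value up to the MW regret."""
    m, n = len(A), len(A[0])
    w = [1.0] * m                       # one weight per row (pure strategy)
    avg_payoff = 0.0
    for _ in range(T):
        s = sum(w)
        x = [wi / s for wi in w]        # current mixed strategy
        # column player best-responds: minimize the row player's payoff
        col_payoffs = [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]
        j = min(range(n), key=lambda jj: col_payoffs[jj])
        avg_payoff += col_payoffs[j] / T
        # MW update: upweight rows that score well against column j
        w = [wi * (1.0 + eta * A[i][j]) for i, wi in enumerate(w)]
    return avg_payoff
```

For matching pennies the returned value hovers just below the true value 0, reflecting the usual $O(\eta + \log m / (\eta T))$ regret of multiplicative weights.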

Answering n^{2+o(1)} Counting Queries with Differential Privacy is Hard [Abstract][arXiv]
Jonathan Ullman
STOC 2013 - Symposium on Theory of Computing
To appear in SICOMP - Special issue for selected papers from STOC 2013 (invited).
Abstract:
A central problem in differentially private data analysis is how to design efficient algorithms capable of answering large numbers of counting queries on a sensitive database. Counting queries are of the form "What fraction of individual records in the database satisfy the property $q$?" We prove that if one-way functions exist, then there is no algorithm that takes as input a database $D \in (\{0,1\}^d)^n$, and $k = \tilde{\Theta}(n^2)$ arbitrary efficiently computable counting queries, runs in time $\mathrm{poly}(d, n)$, and returns an approximate answer to each query, while satisfying differential privacy. We also consider the complexity of answering "simple" counting queries, and make some progress in this direction by showing that the above result holds even when we require that the queries are computable by constant-depth ($AC^0$) circuits.
Our result is almost tight because it is known that $\tilde{\Omega}(n^2)$ counting queries can be answered efficiently while satisfying differential privacy. Moreover, many more than $n^2$ queries (even exponential in $n$) can be answered in exponential time.
We prove our results by extending the connection between differentially private query release and cryptographic traitor-tracing schemes to the setting where the queries are given to the algorithm as input, and by constructing a traitor-tracing scheme that is secure in this setting.

Mechanism Design in Large Games: Incentives and Privacy [Abstract][arXiv]
Michael Kearns, Mallesh Pai, Aaron Roth, Jonathan Ullman
ITCS 2014 - Innovations in Theoretical Computer Science
Abstract:
We study the problem of implementing equilibria of complete information games in settings of incomplete information, and address this problem using "recommender mechanisms." A recommender mechanism is one that does not have the power to enforce outcomes or to force participation; rather, it only has the power to suggest outcomes on the basis of voluntary participation. We show that despite these restrictions, recommender mechanisms can implement equilibria of complete information games in settings of incomplete information under the condition that the game is large, i.e., that there are a large number of players, and any player's action affects any other's payoff by at most a small amount. Our result follows from a novel application of differential privacy. We show that any algorithm that computes a correlated equilibrium of a complete information game while satisfying a variant of differential privacy, which we call joint differential privacy, can be used as a recommender mechanism while satisfying our desired incentive properties. Our main technical result is an algorithm for computing a correlated equilibrium of a large game while satisfying joint differential privacy. Although our recommender mechanisms are designed to satisfy game-theoretic properties, our solution ends up satisfying a strong privacy property as well: no group of players can learn "much" about the type of any player outside the group from the recommendations of the mechanism, even if these players collude in an arbitrary way. As such, our algorithm is able to implement equilibria of complete information games without revealing information about the realized types.

Faster Algorithms for Privately Releasing Marginals [Abstract][arXiv]
Justin Thaler, Jonathan Ullman, Salil Vadhan
ICALP 2012 (Track A) - International Colloquium on Automata, Languages, and Programming
Abstract:
We study the problem of releasing $k$-way marginals of a database $D \in (\{0,1\}^d)^n$, while preserving differential privacy. The answer to a $k$-way marginal query is the fraction of $D$'s records $x \in \{0,1\}^d$ with a given value in each of a given set of up to $k$ columns. Marginal queries enable a rich class of statistical analyses of a dataset, and designing efficient algorithms for privately releasing marginal queries has been identified as an important open problem in private data analysis (cf. Barak et al., PODS '07).
We give an algorithm that runs in time $d^{O(\sqrt{k})}$ and releases a private summary capable of answering any $k$-way marginal query with at most $\pm .01$ error on every query as long as $n \geq d^{O(\sqrt{k})}$. To our knowledge, ours is the first algorithm capable of privately releasing marginal queries with nontrivial worst-case accuracy guarantees in time substantially smaller than the number of $k$-way marginal queries, which is $d^{\Theta(k)}$ (for $k \ll d$).

Iterative Constructions and Private Data Release [Abstract][arXiv]
Anupam Gupta, Aaron Roth, Jonathan Ullman
TCC 2012 - Theory of Cryptography Conference
Abstract:
In this paper we study the problem of approximately releasing the cut function of a graph while preserving differential privacy, and give new algorithms (and new analyses of existing algorithms) in both the interactive and non-interactive settings.
Our algorithms in the interactive setting are achieved by revisiting the problem of releasing differentially private, approximate answers to a large number of queries on a database. We show that several algorithms for this problem fall into the same basic framework, and are based on the existence of objects which we call iterative database construction (IDC) algorithms. We give a new generic framework in which new (efficient) IDC algorithms give rise to new (efficient) interactive private query release mechanisms. Our modular analysis simplifies and tightens the analysis of previous algorithms, leading to improved bounds. We then give a new IDC algorithm (and therefore a new private, interactive query release mechanism) based on the Frieze/Kannan low-rank matrix decomposition. This new release mechanism gives an improvement on prior work in a range of parameters where the size of the database is comparable to the size of the data universe (such as releasing all cut queries on dense graphs).
We also give a non-interactive algorithm for efficiently releasing private synthetic data for graph cuts with error $O(|V|^{1.5})$. Our algorithm is based on randomized response and a non-private implementation of the SDP-based, constant-factor approximation algorithm for cut-norm due to Alon and Naor. Finally, we give a reduction based on the IDC framework showing that an efficient, private algorithm for computing sufficiently accurate rank-1 matrix approximations would lead to an improved efficient algorithm for releasing private synthetic data for graph cuts. We leave finding such an algorithm as our main open problem.
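As a hedged illustration of the randomized-response ingredient alone (not the full Alon-Naor-based algorithm from the paper), one can perturb each potential edge independently and debias cut answers; the function names and parameters here are our own:

```python
import math
import random

def randomize_edges(adj, eps, rng=random):
    """Edge-level randomized response: keep each bit of the adjacency
    matrix with probability e^eps / (1 + e^eps), flip it otherwise.
    Each entry of the output depends only weakly on the true edge bit."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    n = len(adj)
    return [[adj[u][v] if rng.random() < p_keep else 1 - adj[u][v]
             for v in range(n)] for u in range(n)]

def cut_estimate(noisy, eps, S, T):
    """Unbiased estimate of the number of edges between vertex sets S and T.

    Each noisy bit y satisfies E[y] = p*b + (1-p)*(1-b) for the true bit b,
    so b is estimated by (y - (1-p)) / (2p - 1), summed over the pairs."""
    p_keep = math.exp(eps) / (1.0 + math.exp(eps))
    return sum((noisy[u][v] - (1.0 - p_keep)) / (2.0 * p_keep - 1.0)
               for u in S for v in T)
```

Each cut answer is unbiased, but per-pair variance grows as $\varepsilon$ shrinks; consistently rounding the noisy answers into actual synthetic data is the harder step the abstract describes.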

On the ZeroError Capacity Threshold for Deletion Channels [Abstract][arXiv]
Ian Kash, Michael Mitzenmacher, Justin Thaler, Jonathan Ullman
ITA 2011 - Information Theory and Applications Workshop
Abstract:
We consider the zero-error capacity of deletion channels. Specifically, we consider the setting where we choose a codebook $C$ consisting of strings of $n$ bits, and our model of the channel corresponds to an adversary who may delete up to $pn$ of these bits for a constant $p$. Our goal is to decode correctly without error regardless of the actions of the adversary. We consider what values of $p$ allow nonzero capacity in this setting. We suggest multiple approaches, one of which makes use of the natural connection between this problem and the problem of finding the expected length of the longest common subsequence of two random sequences.
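The longest-common-subsequence connection is easy to explore empirically; this sketch (our own illustration, not code from the paper) pairs the classic LCS dynamic program with a Monte Carlo estimate of the expected LCS fraction for random bit strings:

```python
import random

def lcs_length(a, b):
    """Classic O(len(a)*len(b)) dynamic program for the length of the
    longest common subsequence of two sequences."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                else max(dp[i][j + 1], dp[i + 1][j]))
    return dp[len(a)][len(b)]

def expected_lcs_fraction(n, trials=200, rng=random):
    """Monte Carlo estimate of E[LCS]/n for two uniformly random
    n-bit strings."""
    total = 0
    for _ in range(trials):
        a = [rng.randrange(2) for _ in range(n)]
        b = [rng.randrange(2) for _ in range(n)]
        total += lcs_length(a, b)
    return total / (trials * n)
```

For random binary strings the ratio $\mathbb{E}[\mathrm{LCS}]/n$ converges to the Chvatal-Sankoff constant, whose exact value is unknown but is empirically around $0.81$; such estimates are one handle on which deletion rates $p$ can admit nonzero capacity.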

Privately Releasing Conjunctions and the Statistical Query Barrier [Abstract][arXiv]
Anupam Gupta, Moritz Hardt, Aaron Roth, Jonathan Ullman
STOC 2011 - Symposium on Theory of Computing
SICOMP - SIAM Journal on Computing
Abstract:
Suppose we would like to know all answers to a set of statistical queries $C$ on a data set up to small error, but we can only access the data itself using statistical queries. A trivial solution is to exhaustively ask all queries in $C$. In this paper, we investigate how and when we can do better than this naive approach. We show that the number of statistical queries necessary and sufficient for this task is, up to polynomial factors, equal to the agnostic learning complexity of $C$ in Kearns' statistical-query (SQ) model. This gives a complete answer to the question when running time is not a concern. We then show that the problem can be solved efficiently (allowing arbitrary error on a small fraction of queries) whenever the answers to $C$ can be described by a submodular function. This includes many natural concept classes, such as graph cuts and Boolean disjunctions and conjunctions.
These results are interesting not only from a learning-theoretic point of view, but also from the perspective of privacy-preserving data analysis. In this context, our second result leads to an algorithm that efficiently releases differentially private answers to all Boolean conjunctions with 1% average additive error. This represents significant progress on a key open problem in privacy-preserving data analysis. Our first result, on the other hand, gives unconditional lower bounds on any differentially private algorithm that admits a (potentially non-privacy-preserving) implementation using only statistical queries. Not only our algorithms, but also most known private algorithms, can be implemented using only statistical queries, and hence are constrained by these lower bounds. Our result therefore isolates the complexity of agnostic learning in the SQ model as a new barrier in the design of differentially private algorithms.

Course Allocation by Proxy Auction [Abstract]
Scott Kominers, Mike Ruberry, Jonathan Ullman
WINE 2010 - Workshop on Internet and Network Economics
Abstract:
We propose a new proxy bidding mechanism to allocate courses to students given students' reported preferences. Our mechanism is motivated by a specific strategic downgrading manipulation observed in the course allocation mechanism currently used at Harvard Business School (HBS). The proxy bidding mechanism simplifies students' decisions by directly incorporating downgrading into the mechanism.
Our proxy bidding mechanism is Pareto efficient with respect to lexicographic preferences and can be extended to allow for a more expressive preference language which allows complementarity, substitutability, and indifference. Simulations suggest that the proxy bidding mechanism is robust to the manipulations observed in the HBS mechanism and may yield welfare improvements.

PCPs and the Hardness of Generating Private Synthetic Data [Abstract] [ECCC]
Salil Vadhan, Jonathan Ullman
TCC 2011 - Theory of Cryptography Conference
Invited to the Journal of Cryptology
Abstract:
Assuming the existence of one-way functions, we show that there is no polynomial-time, differentially private algorithm $\mathcal{A}$ that takes a database $D\in (\{0,1\}^d)^n$ and outputs a "synthetic database" $\hat{D}$ all of whose two-way marginals are approximately equal to those of $D$. (A two-way marginal is the fraction of database rows $x \in \{0,1\}^d$ with a given pair of values in a given pair of columns.) This answers a question of Barak et al. (PODS '07), who gave an algorithm running in time $\mathrm{poly}(n,2^d)$.
Our proof combines a construction of hard-to-sanitize databases based on digital signatures (by Dwork et al., STOC '09) with encodings based on the PCP Theorem.
We also present both negative and positive results for generating "relaxed" synthetic data, where the fraction of rows in $D$ satisfying a predicate $c$ is estimated by applying $c$ to each row of $\hat{D}$ and aggregating the results in some way.

The Price of Privately Releasing Contingency Tables and the Spectra of Random Matrices with Correlated Rows [Abstract]
Shiva Kasiviswanathan, Mark Rudelson, Adam Smith, Jonathan Ullman
STOC 2010 - Symposium on Theory of Computing
Abstract:
Marginal (contingency) tables are the method of choice for government agencies releasing statistical summaries of categorical data. In this paper, we derive lower bounds on how much distortion (noise) is necessary in these tables to ensure the privacy of sensitive data. We extend a line of recent work on impossibility results for private data analysis to a natural and important class of functionalities. Consider a database consisting of $n$ rows (one per individual), each row comprising $d$ binary attributes. For any subset $T$ of attributes of size $|T| = k$, the marginal table for $T$ has $2^k$ entries; each entry counts how many times a particular setting of these attributes occurs in the database. We provide lower bounds for releasing all $\binom{d}{k}$ $k$-attribute marginal tables under several different notions of privacy.
We give efficient polynomial-time attacks which allow an adversary to reconstruct sensitive information given insufficiently perturbed marginal table releases. In particular, for a constant $k$, we obtain a tight bound of $\Omega(\min\{n^{1/2}, d^{(k-1)/2}\})$ on the average distortion per entry for any mechanism that releases all $k$-attribute marginals while providing attribute privacy (a weak notion implied by most privacy definitions).
Our reconstruction attacks require a new lower bound on the least singular value of a random matrix with correlated rows. Let $M(k)$ be a matrix with $\binom{d}{k}$ rows formed by taking all possible $k$-way entrywise products of an underlying set of $d$ random vectors from $\{0,1\}^n$. For constant $k$, we show that the least singular value of $M(k)$ is $\Omega(d^{k/2})$ with high probability (the same asymptotic bound as for independent rows).
We obtain stronger lower bounds for marginal tables satisfying differential privacy. We give a lower bound of $\Omega(\min\{n^{1/2}, d^{k/2}\})$, which is tight for $n = \Omega(d^k)$. We extend our analysis to obtain stronger results for mechanisms that add instance-independent noise and weaker results when $k$ is super-constant.
