\documentclass{article}
\usepackage{amsfonts}
\usepackage{amssymb}
\usepackage{amsthm}
\setlength{\oddsidemargin}{.25in}
\setlength{\evensidemargin}{.25in}
\setlength{\textwidth}{6in}
\setlength{\topmargin}{-0.5in}
\setlength{\textheight}{8.5in}
\newcommand{\header}[5]{
\renewcommand{\thepage}{\arabic{page}}
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in {\bf {#1} \hfill {#2}}
\vspace{2mm}
\hbox to 5.78in { {\Large \hfill Lecture {#3}\hfill} }
\vspace{2mm}
\hbox to 5.78in { {\it Instructor: {#4} \hfill Scribe: {#5}} }
}
}
\end{center}
\vspace*{2mm}
}
\newtheorem{theorem}{Theorem}
\newtheorem{claim}{Claim}
\newtheorem{lemma}{Lemma}
\newtheorem{definition}{Definition}
\newcommand{\lang}[1]{{\sc #1}}
\newcommand{\anglebrackets}[1]{{< \!\! #1 \!\! >}}
\begin{document}
\header{6.841/18.405J: Advanced Complexity Theory}
{Wed, Apr 09, 2003}
{16}{Madhu Sudan}{Daniel Corson}
\subsection*{Today}
\begin{list}{-}{}
\item average-case complexity
\item the permanent
\item \lang{DNP}-completeness
\end{list}
\subsection*{Distributional complexity}
Optimization problems may be easier to solve in practice
than the usual worst-case theoretical treatment would indicate,
because:
\begin{list}{-}{}
\item near-optimal solutions may be easier to find than optimal ones, and
\item random instances may be easier to solve than worst-case instances.
\end{list}
(Solving the bin-packing problem optimally, for example,
is presumably hard, but we know that we can easily find solutions
using only a constant number of bins more than the optimal number,
and it may even be easy if we allow just one bin more than the
optimal number.)
So for the purposes of distributional complexity,
our instances are pairs $(\Pi,D)$ where
$\Pi$ is a computational problem (often a language
like \lang{SAT}) and $D$ is a distribution, which we
will treat more specifically later. It is important,
of course, that the distribution match the application.
The big question is whether it is easy to solve $\Pi$ over the
distribution $D$. We would like to allow
very rare instances to be, say, exponentially hard,
for then we would still certainly say that the problem is well-solved
for (almost?) all practical purposes;
but we also don't want to make our definition so broad that
problems with undecidable instances are considered well-solved.
In the end, we choose the following. A problem and distribution pair
$(\Pi, D) \in$ \lang{Avg-P} (informally: it is easy) means
there exists an algorithm $A(\cdot,\cdot)$, whose first input is an
instance and whose second is a confidence parameter, such that for all
instance lengths $n$ and confidences $\delta > 0$,
$$\Pr_{x \sim D_n} [\mbox{$A(x,\delta)$ solves $\Pi$}] \ge 1 - \delta,$$
and $A(x, \delta)$ runs in time polynomial in both $|x|$ and
$\frac{1}{\delta}$. (Here $D_n$ denotes $D$ conditioned on $|x| = n$.)
The distributional complexity of \lang{SAT}
on various distributions is something
interesting that we may come back to later in the course.
\subsection*{The permanent}
For $n \in \mathbb{N}$,
$[n]$ denotes $\{1, 2, \dots, n\}$ and
$S_n$ denotes $\{$bijections $\pi:[n]\rightarrow [n]\}$.
The permanent of an $n \times n$ matrix $M$ is given as
perm$(M) = \sum_{\pi \in S_n} \prod_{i \in [n]} M_{i, \pi(i)}$.
Notice when $M$ is a $\{0,1\}$-matrix that the permanent counts
the perfect matchings of the $(n,n)$-bipartite graph whose
(condensed, because it is bipartite) adjacency matrix is $M$.
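As a sanity check on the definition, the permanent can be computed
directly from the sum-over-bijections formula (in exponential time, of
course; this brute force is only feasible for tiny $n$). A minimal
Python sketch, with a small $\{0,1\}$ example whose permanent equals
its number of perfect matchings:

```python
from itertools import permutations

def permanent(M):
    # perm(M) = sum over bijections pi in S_n of prod_i M[i][pi(i)]
    n = len(M)
    total = 0
    for pi in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= M[i][pi[i]]
        total += term
    return total

# A {0,1}-matrix: the condensed adjacency matrix of a (3,3)-bipartite
# graph that forms a 6-cycle, which has exactly 2 perfect matchings.
M = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
assert permanent(M) == 2
```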
We know that the (non-distributional) problem of calculating the permanent
of a $\{0,1\}$-matrix is \lang{\#P}-hard [Valiant].
The following is a rare instance where we have been able to say
anything about implications of distributional complexity on
classical complexity class relationships. It is a worst case to
average-case reduction!
\begin{theorem}
[Lipton]
If the problem of computing permanents of
uniformly distributed $\{0,1\}$-matrices
is in \lang{Avg-P}, then \lang{P$^{\#P}$} $\subseteq$ \lang{BPP}.
\end{theorem}
We will use without proof the famous Chinese remainder theorem (CRT),
which states that given relatively prime
integers $p_1, \dots, p_n$ and the
values of some integer $a$
mod $p_i$ for each $i \in [n]$, we can determine
$a$ mod $\prod_{i\in [n]} p_i$ via a polynomial-time computation.
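The reconstruction the CRT provides can be sketched in a few lines of
Python; `crt` below is our own helper name, and `pow(x, -1, p)`
(Python 3.8+) computes a modular inverse:

```python
def crt(residues, moduli):
    # Given a mod p_i for pairwise relatively prime p_i, recover
    # a mod prod(p_i) via the standard CRT reconstruction.
    N = 1
    for p in moduli:
        N *= p
    a = 0
    for r, p in zip(residues, moduli):
        Ni = N // p
        a += r * Ni * pow(Ni, -1, p)  # pow(Ni, -1, p): inverse of Ni mod p
    return a % N

# e.g. recover 23 from its residues mod the relatively prime 3, 5, 7
assert crt([23 % 3, 23 % 5, 23 % 7], [3, 5, 7]) == 23
```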
\begin{proof}
Let $B$ be our \lang{Avg-P} algorithm. We'll construct a \lang{BPP}
one.
Given a matrix $A$, we will pick $n$ distinct primes $p$,
all with $n + 1 < p \le n^{10}$. Then we will determine
perm$(A)$ mod $p$ for each one. By CRT, this determines
perm$(A)$ itself: there are no more than $n!$ possible
values (think of the correspondence to perfect matchings
in $(n,n)$-bipartite graphs), while the product of our $n$ primes
exceeds $(n+1)^n > n!$.
But we will do the picking in a funny way, in order to maintain
the uniform distribution. We will pick uniformly distributed random
integers from $[n^{10}]$ until we have $n$ distinct ``good ones'';
those that are too small or composite we will end up ignoring.
(We will still go through the same motions with {\em all}
the randomly chosen numbers,
so it is always clear that we never bias our uniform
distribution, which is important if we wish to be allowed to use $B$.
When we cover distributional problem reductions,
we will see that this did not actually matter, which
makes intuitive sense anyway: doing computations and
throwing them away without looking is unlikely to help you solve
problems.)
So, how do we determine each perm$(A)$ mod $p$?
Pick a random matrix $R \in \mathbb{Z}_p^{n\times n}$,
then construct the $n + 1$ matrices $M_i = A + iR$ $($mod $p)$,
$i \in [n+1]$. Notice that, individually, the $M_i$ are
uniformly distributed (seeing any {\em one} of them tells you nothing
about $A$) as long as $p$ is prime and greater than $n + 1$.
Also notice that perm$(M_x)$ depends on $x$ as a polynomial $g(x)$
of degree at most $n$. So we can run $B$ to determine $g(x) =$ perm$(M_x)$
for all $x \in [n+1]$ (with high probability, of course;
we'll take care of that accounting at the end).
We have enough values of $g$ to determine its coefficients,
so we can in this way obtain $g(0) =$ perm$(M_0) =$ perm$(A$ mod $p) =$
perm$(A)$ mod $p$.
And remember, if $p$ is less than $n + 1$ or composite, we can do the
same things anyway; we just don't count on any of it to work (which we
don't, since we ignore the result).
So if we pick $k$ $p$'s total in the beginning, run $B$ $n+1$ times
for each of the $k$, and $B$ has error probability $\delta$, then by a
union bound the total success probability is at least $1 - k(n+1)\delta$,
which (taking $\delta$ polynomially small) is sufficiently
high for our purposes.
\end{proof}
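The heart of the proof, recovering perm$(A)$ mod $p$ from solutions on
the uniformly distributed matrices $M_i = A + iR$ via interpolation,
can be sketched in Python as follows. This is a toy illustration under
stated assumptions: `oracle` stands in for the \lang{Avg-P} algorithm
$B$ (here a brute-force permanent, so the check always succeeds), and
all helper names are ours, not the lecture's:

```python
import random
from itertools import permutations

def permanent_mod(M, p):
    # Brute-force permanent mod p (standing in for the average-case solver B).
    n = len(M)
    total = 0
    for pi in permutations(range(n)):
        t = 1
        for i in range(n):
            t = t * M[i][pi[i]] % p
        total = (total + t) % p
    return total

def lagrange_at_zero(xs, ys, p):
    # Interpolate the unique polynomial of degree < len(xs) through the
    # points (x_j, y_j) over Z_p and evaluate it at 0.
    total = 0
    for xj, yj in zip(xs, ys):
        num, den = 1, 1
        for xk in xs:
            if xk != xj:
                num = num * (-xk) % p
                den = den * (xj - xk) % p
        total = (total + yj * num * pow(den, -1, p)) % p
    return total

def perm_mod_p(A, p, oracle):
    # One round of the reduction: each M_x = A + xR is uniform over
    # Z_p^{n x n}, and g(x) = perm(M_x) has degree at most n in x.
    n = len(A)
    R = [[random.randrange(p) for _ in range(n)] for _ in range(n)]
    xs = list(range(1, n + 2))  # n + 1 evaluation points
    ys = [oracle([[(A[i][j] + x * R[i][j]) % p for j in range(n)]
                  for i in range(n)], p) for x in xs]
    return lagrange_at_zero(xs, ys, p)  # g(0) = perm(A) mod p

A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
p = 11  # prime with p > n + 1
assert perm_mod_p(A, p, permanent_mod) == permanent_mod(A, p)
```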
\subsection*{Distributions, reductions, and DNP-completeness}
We wish to define a distributional analogue of the notion of reduction.
Allowing arbitrary distributions as problem distributions
makes such an analogue less useful, because it admits
contrived adversarial distributions that would never come up
in practice and that trivialize the theory.
So instead we restrict our attention to a more feasible class of
distributions, but we obviously don't want to lose the uniform
distribution, nor what it can become under polynomial-time transformation.
We call a distribution \lang{P}-sampleable if it
is output by some polynomial-time algorithm that is fed
uniformly random bits. From now on, we only work with such distributions.
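A toy example of a \lang{P}-sampleable distribution, in Python: a
polynomial-time sampler that consumes uniformly random bits and outputs
a uniform $n \times n$ $\{0,1\}$-matrix (the very distribution in
Lipton's theorem above). The function name is ours:

```python
import random

def uniform_matrix_sampler(n, rand=None):
    # A P-sampleable distribution: a polynomial-time sampler that reads
    # uniformly random bits and emits a uniform n x n {0,1}-matrix.
    rand = rand or random.Random()
    return [[rand.getrandbits(1) for _ in range(n)] for _ in range(n)]

M = uniform_matrix_sampler(4, random.Random(0))
```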
Before we define reductions (which will only fully happen
next time), we need another definition.
Where $D'$ and $D$ are (\lang{P}-sampleable) distributions,
$D'$ {\em $\alpha$-dominates} $D$
(for $\alpha \ge 1$) if for all $x$,
$\Pr_{D'}[x] \ge \frac{1}{\alpha} \Pr_{D}[x]$.
Even now, before we have a fully general distributional reduction theory,
we can see that if $D'$ $\alpha$-dominates $D$ and
an algorithm $B$ solves $\Pi$ on $D'$ with probability
$1 - \delta$, then it also solves $\Pi$ on $D$ with probability
at least $1 - \alpha\delta$.
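The loss of a factor $\alpha$ in confidence follows from a one-line
calculation over the set $E$ of instances that $B$ fails to solve:

```latex
\Pr_{x \sim D}[x \in E]
  = \sum_{x \in E} \Pr_{D}[x]
  \le \sum_{x \in E} \alpha \Pr_{D'}[x]
  = \alpha \Pr_{x \sim D'}[x \in E]
  \le \alpha\delta .
```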
\subsection*{Next time}
\begin{list}{-}{}
\item use this to formulate a finer definition of reduction
\item show that a particular problem is \lang{DNP}-complete
\end{list}
\end{document}