\documentclass[10pt]{article}
\usepackage{amsfonts,amsthm,amsmath,amssymb}
\usepackage{array}
\usepackage{epsfig}
\usepackage{fullpage}
\usepackage{amssymb}
\usepackage[colorlinks = false]{hyperref}
\newcommand{\1}{\mathbbm{1}}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
\newcommand{\x}{\times}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\F}{\mathbb{F}}
\newcommand{\E}{\mathop{\mathbb{E}}}
\renewcommand{\bar}{\overline}
\renewcommand{\epsilon}{\varepsilon}
\newcommand{\eps}{\varepsilon}
\newcommand{\DTIME}{\textbf{DTIME}}
\renewcommand{\P}{\textbf{P}}
\newcommand{\SPACE}{\textbf{SPACE}}
\begin{document}
\input{preamble.tex}
\newtheorem{example}[theorem]{Example}
\theoremstyle{definition}
\newtheorem{defn}[theorem]{Definition}
\handout{CS 229r Information Theory in Computer Science}{April 25, 2019}{Instructor:
Madhu Sudan}{Scribe: Shuran Zheng}{Lecture 24: ``Barriers'' to Optimization}
\section{Overview}
Today we're going to talk about a line of work that started a little before 1985, when some people tried to show that the Traveling Salesman Problem (TSP) could be solved in polynomial time by linear programming (LP). Eventually \cite{} came up with a beautiful paper that killed the entire approach. We'll talk about how the approach was supposed to work and how it was killed. Below is the outline for today's class.
\begin{itemize}
\item LP, Max Cut, \emph{extended formulations}.
\item Lower bound for extended formulation via nondeterministic communication complexity.
\item Non-Det-CC(Unique Disj) = $\Omega(n)$.
\item Max Cut extended formulation lower bound = $2^{\Omega(n)}$.
\end{itemize}
Next time we'll talk about Quantum Information.
\section{Extended Formulations}
\subsection{Linear Programming}
We start with \emph{linear programming}. A linear program can be written as
\begin{eqnarray}
& \max_{x \in \bbR^n} & c^T x \notag\\
& \textrm{s.t.} & Ax \le b \notag
\end{eqnarray}
where $A\in \bbR^{m\times n}$, $b \in \bbR^{m}$, $c\in \bbR^n$. Geometrically, each constraint restricts the feasible solutions to a half-space, and the feasible region of an LP is the convex set defined by the intersection of these half-spaces. Our goal is to maximize a linear function over this convex set. It is known that linear programs can be solved in time polynomial in the bit length of $A$, $b$, and $c$.
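\begin{example}
As a small illustration (the numbers are chosen only for this sketch and are not from the lecture), take $n = 2$ and
\begin{align*}
\max \quad & x_1 + 2x_2 \\
\text{s.t.} \quad & x_1 + x_2 \le 4, \quad x_1 \le 3, \quad x_2 \le 2, \quad -x_1 \le 0, \quad -x_2 \le 0 .
\end{align*}
In the notation above, $c = (1,2)^T$, $b = (4,3,2,0,0)^T$, and the rows of $A$ are $(1,1)$, $(1,0)$, $(0,1)$, $(-1,0)$, $(0,-1)$. The feasible region is a pentagon with vertices $(0,0)$, $(3,0)$, $(3,1)$, $(2,2)$, $(0,2)$, at which the objective takes the values $0$, $3$, $5$, $6$, $4$ respectively. The maximum value $6$ is attained at the vertex $(2,2)$.
\end{example}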
\subsection{Cut Polytope}
We now show how an LP could possibly be used to solve Max Cut, which, like TSP, is NP-hard. Consider the complete graph on $n$ nodes, which has $n(n-1)/2$ edges. The characteristic vector $\chi_S$ of a cut $(S, \overline{S})$ is a vector whose length equals the number of edges; each coordinate indicates whether the corresponding edge is in the cut-set. More specifically,
\begin{eqnarray}
\chi_S(e) = \left\{ \begin{array}{ll} 1, & \textrm{if $e$ has exactly one endpoint in $S$}\\
0, & \textrm{otherwise} \end{array}
\right.\notag
\end{eqnarray}
The cut polytope is defined as the convex hull of all possible characteristic vectors of cuts,
\begin{eqnarray}
\mathcal{P}_{cut} = \mathrm{ConvexHull}\left(\{\chi_S \mid S\subseteq[n]\}\right).
\end{eqnarray}
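\begin{example}
As a sanity check, here is the case $n = 3$, with the coordinates ordered as $(x_{12}, x_{13}, x_{23})$ (an ordering picked only for this illustration). The cuts give
\begin{align*}
\chi_{\emptyset} = (0,0,0), \qquad \chi_{\{1\}} = (1,1,0), \qquad \chi_{\{2\}} = (1,0,1), \qquad \chi_{\{3\}} = (0,1,1),
\end{align*}
and each complement $\overline{S}$ gives the same vector as $S$. So in this case the cut polytope is a tetrahedron in $\bbR^3$, and it can be described by the four inequalities
\begin{align*}
x_{12} \le x_{13} + x_{23}, \qquad x_{13} \le x_{12} + x_{23}, \qquad x_{23} \le x_{12} + x_{13}, \qquad x_{12} + x_{13} + x_{23} \le 2 .
\end{align*}
Each inequality holds with equality at exactly three of the four vertices, so each one defines a facet.
\end{example}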
Then the Max Cut problem can be written as an LP. Given a vector $w\in \bbR^{n(n-1)/2}$ that represents the weights of the edges, the Max Cut problem is just
\begin{eqnarray}
& \max & w^T x \notag\\
&\text{s.t.} & x \in \mathcal{P}_{cut} \notag
\end{eqnarray}
But we know that the Max Cut problem is NP-hard, so we do not expect this LP to be efficiently solvable. Where does the gap come from? One guess is that the solution needs to be integral. This is actually not a problem: a linear function over a polytope is always maximized at an extreme point, and the extreme points of $\mathcal{P}_{cut}$ are characteristic vectors of cuts, hence integral. The other guess is that it may take exponentially many constraints to define $\mathcal{P}_{cut}$; in other words, $\mathcal{P}_{cut}$ may have exponentially many facets. But it turns out that this alone is also not enough for the problem to be hard.
\subsection{Definition}
We introduce a method that can possibly give a polynomial-size description of a polytope $P$ with exponentially many facets. The approach is to add some auxiliary decision variables and define constraints in a higher-dimensional space. The idea is that the complicated polytope $P$ may be a ``shadow'' of some simple polytope $Q$ in a higher-dimensional space, since projecting onto a subset of the variables can blow up the number of facets. Here is the formal definition.
\begin{definition}[Extended formulations]
For a polytope $P = \{x \mid Ax \le b \} \subseteq \bbR^n$, a polytope $Q = \{z \mid A'z \le b'\} \subseteq \bbR^{n+m}$ is an extended formulation of $P$ if $m = \mathrm{poly}(n)$ and
$$
P= \{x~|~ \exists y \in \bbR^{m} \text{ s.t. } (x,y)\in Q\}
$$
\end{definition}
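\begin{example}
A standard illustration of the definition (not taken from the lecture) is the $\ell_1$ ball
\begin{align*}
P = \Big\{ x \in \bbR^n \;\Big|\; \sum_{i=1}^n s_i x_i \le 1 \text{ for all } s \in \{-1,+1\}^n \Big\},
\end{align*}
which has $2^n$ facets. Taking $m = n$ auxiliary variables $y_1, \dots, y_n$, consider
\begin{align*}
Q = \Big\{ (x,y) \in \bbR^{2n} \;\Big|\; -y_i \le x_i \le y_i \text{ for all } i \in [n], \ \sum_{i=1}^n y_i \le 1 \Big\},
\end{align*}
which is described by only $2n+1$ inequalities. If $(x,y) \in Q$ then $\sum_i |x_i| \le \sum_i y_i \le 1$, so $x \in P$; conversely, for $x \in P$ the choice $y_i = |x_i|$ gives $(x,y) \in Q$. Hence $P$ is exactly the projection of $Q$ onto the $x$ coordinates.
\end{example}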
\begin{exercise}
Consider the polytope $P \subseteq \bbR^n$ in the variables $(x_1, \dots, x_n)$ defined by the exponentially many constraints
\begin{align*}
& \sum_{i\in S} x_i \ge B, \text{ for all } S\subseteq [n], |S| \ge k\\
& 0 \le x_i \le 1.
\end{align*}
Give a small (polynomial-size) extended formulation $Q$ for $P$.
\end{exercise}
\section{Max Cut Extended Formulation Lower Bound}
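We want to prove that there exists \emph{no} small extended formulation of the cut polytope $\mathcal{P}_{cut}$. We will define an artificial communication problem Face-Vertex($P$) such that if there exists a small extended formulation for a polytope $P$, then the associated (very esoteric) communication problem is ``easy''. Contrapositively, showing that Face-Vertex($P$) is hard rules out a small extended formulation of $P$.
We first define the \emph{faces} of a polytope. Informally, the faces are all the ``boundary'' surfaces of the polytope, of every dimension, together with the polytope itself. Formally, a face is the set of points of $P$ at which some linear inequality that is valid for all of $P$ holds with equality. For example, a cube has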
\begin{itemize}
\item eight $0$-dimensional faces, which are the vertices of the cube,
\item twelve $1$-dimensional faces, which are the edges,
\item six $2$-dimensional faces, which are the facets,
\item one $3$-dimensional face, which is the cube.
\end{itemize}
\subsection{Face-Vertex Problem}
The communication task Face-Vertex($P$) is as follows.
\begin{itemize}
\item Alice knows a vertex $v$ of $P$.
\item Bob knows a face $f$ of $P$.
\item The goal is to design a communication protocol, so that Alice and Bob output $1$ if $v\notin f$, and output $0$ if $v \in f$.
\end{itemize}
The minimum number of bits that Alice and Bob need to exchange, in the worst case over inputs, to accomplish this task is the deterministic communication complexity of the problem, denoted by Det-CC(Face-Vertex($P$)).
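\begin{example}
To make the task concrete, consider the unit square $P = [0,1]^2$ (a toy instance chosen for this illustration), and for simplicity restrict Bob's input to the four facets of $P$, i.e.\ its edges. The required output, $1$ if $v \notin f$ and $0$ if $v \in f$, is given by the following table, with rows indexed by Alice's vertex and columns by Bob's facet:
\[
\begin{array}{c|cccc}
 & x_1 = 0 & x_1 = 1 & x_2 = 0 & x_2 = 1 \\ \hline
(0,0) & 0 & 1 & 0 & 1 \\
(1,0) & 1 & 0 & 0 & 1 \\
(0,1) & 0 & 1 & 1 & 0 \\
(1,1) & 1 & 0 & 1 & 0
\end{array}
\]
The full problem Face-Vertex($P$) has one column for every face of $P$, not only for the facets.
\end{example}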
\subsection{Nondeterministic Communication Complexity}
Our proof will involve another notion from communication complexity, \emph{nondeterministic communication complexity}. Suppose now that we have a third party, Merlin, who knows both $v$ and $f$. Merlin can send a message $m$ to both Alice and Bob at the beginning. The goal is to design the protocol so that
\begin{itemize}
\item If $v \notin f$, then there exists a message $m$ such that both Alice and Bob output $1$.
\item If $v \in f$, then for every message $m$, at least one of Alice and Bob outputs $0$.
\end{itemize}
The minimum length of $m$ (in bits) for which such a protocol exists is the nondeterministic communication complexity of the problem, denoted by Non-Det-CC(Face-Vertex($P$)). In this definition, Alice and Bob are given a ``proof'' from Merlin. They do not actually communicate with each other; rather, each of them verifies the proof $m$.
A digression on nondeterministic communication complexity: a well-known result is that for any problem $\Pi$,
$$
\text{Det-CC}(\Pi) = O\big(\text{Non-Det-CC}(\Pi) \cdot \text{Non-Det-CC}(\overline{\Pi})\big),
$$
where $\overline{\Pi}$ is the complement of $\Pi$, obtained by flipping $0$s and $1$s. Many problems with high Det-CC (e.g.\ Set Disjointness, Equality) have very low Non-Det-CC for one of $\Pi$, $\overline{\Pi}$. By the above bound, the Non-Det-CC of the other one must then be high.
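\begin{example}
As an illustration of a cheap nondeterministic protocol in the model above (a standard example, sketched here and not taken from the lecture), let Alice hold $S \subseteq [n]$ and Bob hold $R \subseteq [n]$, and let the desired output be $1$ exactly when $S \cap R \neq \emptyset$. Merlin sends an index $i \in [n]$, which takes $\lceil \log_2 n \rceil$ bits. Alice outputs $1$ if and only if $i \in S$, and Bob outputs $1$ if and only if $i \in R$. If $S \cap R \neq \emptyset$, Merlin can send some $i \in S \cap R$ and both output $1$; if $S \cap R = \emptyset$, then for every $i$ at least one of them outputs $0$. So the complement of Set Disjointness has Non-Det-CC at most $\lceil \log_2 n \rceil$. Since Det-CC of Set Disjointness is $\Omega(n)$, the relation between Det-CC and Non-Det-CC stated above already forces
\begin{align*}
\text{Non-Det-CC}(\text{Set Disjointness}) = \Omega\!\left(\frac{n}{\log n}\right).
\end{align*}
\end{example}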
\subsection{Yannakakis's Lemma}
We're going to use the following lemma without a formal proof.
\begin{lemma}[Yannakakis's Lemma] \label{lem1}
If a polytope $P$ has an extended formulation $Q$ with $r$ facets, then Face-Vertex($P$) has Non-Det-CC $\le \log r$.
\end{lemma}
We outline the idea of the proof. Suppose a polytope $P$ has an extended formulation $Q$ with $r$ facets. For a vertex $v$ of $P$ and a face $f$ of $P$, we define the convex sets $v^*, f^* \subseteq Q$ by
\begin{eqnarray}
&v^* = \{ (v, y) | (v, y) \in Q \} \notag \\
&f^* = \{ (x, y) | (x, y) \in Q, x \in f \} \notag
\end{eqnarray}
So if $v \notin f$, then $v^* \cap f^* = \emptyset$. It can be shown (we omit the proof) that
$$
v \notin f \text{ if and only if there exists a facet } \widetilde{f} \text{ of } Q \text{ s.t. } f^* \subseteq \widetilde{f} \text{ and } \widetilde{f} \cap v^* = \emptyset.
$$
Therefore Merlin can just send the index of $\widetilde{f}$, which takes $\lceil \log_2 r \rceil$ bits. Alice can compute $v^*$ and Bob can compute $f^*$ on their own; Bob then tests the condition $f^* \subseteq \widetilde{f}$ and Alice tests the condition $\widetilde{f} \cap v^* = \emptyset$.
\subsection{Extended Formulation Lower Bound}
We now prove the lower bound. Lemma~\ref{lem1} reduces the task of proving lower bounds on the size of extended formulations of the cut polytope $\mathcal{P}_{cut}$ to proving lower bounds on the nondeterministic communication complexity of Face-Vertex($\mathcal{P}_{cut}$).
We will then prove a lower bound on Non-Det-CC(Face-Vertex($\mathcal{P}_{cut}$)) by a reduction from the Unique Disjointness problem, which is known to have high complexity.
\begin{exercise}
Prove that Non-Det-CC(Unique Disjointness) $= \Omega(n)$.
\end{exercise}
To reduce the Unique Disjointness problem to Face-Vertex, for each Disjointness instance Disj($S, R$), we construct an instance of Face-Vertex in which Alice has a vertex associated with $S$, denoted by $\chi_S$, and Bob has a face associated with $R$, denoted by $H_R$, so that
\begin{eqnarray}
& \chi_S \in H_R, \text{ if and only if } |S \cap R | = 1. \notag
\end{eqnarray}
To find such $\chi_S$ and $H_R$, we first show that the cut polytope $\mathcal{P}_{cut}$ can be equivalently transformed into the \emph{correlation polytope}. Recall that the vertices of $\mathcal{P}_{cut}$ are the characteristic vectors $\chi_S$ of all the cuts. Here it is more convenient to represent them by matrices. We abuse notation a little by writing the characteristic vector of the cut $(S, \overline{S})$ as $$\chi_S \cdot \chi_{\overline{S}}^T,$$ where $\chi_S \in \bbR^n$ now denotes the characteristic vector of the set $S$.
The \emph{correlation polytope} is then the polytope we get by mapping each vertex $\chi_S \cdot \chi_{\overline{S}}^T$ to $\chi_S \cdot \chi_{S}^T$. For each set $R \subseteq [n]$, we can then find an affine function $H_R$, defining a supporting hyperplane of the correlation polytope, so that
\begin{eqnarray*}
&H_R( \chi_S \cdot \chi_{S}^T) \ge 0, \text{ for all } S, R,\\
&H_R( \chi_S \cdot \chi_{S}^T) = 0, \text{ if and only if } |S \cap R | = 1.
\end{eqnarray*}
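One standard choice of such an $H_R$, given here as a sketch and not necessarily in the form used in lecture, is the following affine function on $n \times n$ matrices $X$, where $\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij}$ and $\mathrm{diag}(\chi_R)$ is the diagonal matrix with diagonal $\chi_R$:
\begin{align*}
H_R(X) = 1 - 2 \langle \mathrm{diag}(\chi_R), X \rangle + \langle \chi_R \cdot \chi_R^T, X \rangle .
\end{align*}
Since $\chi_S$ has $0/1$ entries, we have $\langle \mathrm{diag}(\chi_R), \chi_S \cdot \chi_S^T \rangle = \sum_i \chi_R(i) \chi_S(i)^2 = |S \cap R|$ and $\langle \chi_R \cdot \chi_R^T, \chi_S \cdot \chi_S^T \rangle = (\chi_R^T \chi_S)^2 = |S \cap R|^2$, so
\begin{align*}
H_R(\chi_S \cdot \chi_S^T) = 1 - 2|S \cap R| + |S \cap R|^2 = (1 - |S \cap R|)^2 .
\end{align*}
This is nonnegative for all $S, R$ and vanishes exactly when $|S \cap R| = 1$. For instance, with $n = 3$ and $R = \{2,3\}$, the set $S = \{1\}$ gives the value $1$, the set $S = \{1,2\}$ gives $0$, and the set $S = \{1,2,3\}$ gives $1$.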
\end{document}