\documentclass[10pt]{article}
\usepackage{amsfonts,amsthm,amsmath,amssymb}
\usepackage{array}
\usepackage{epsfig}
\usepackage{fullpage}
\usepackage[colorlinks = false]{hyperref}
\usepackage{bbm}
\newcommand{\1}{\mathbbm{1}}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
\newcommand{\x}{\times}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\F}{\mathbb{F}}
\newcommand{\E}{\mathop{\mathbb{E}}}
\renewcommand{\bar}{\overline}
\renewcommand{\epsilon}{\varepsilon}
\newcommand{\eps}{\varepsilon}
\newcommand{\DTIME}{\textbf{DTIME}}
\renewcommand{\P}{\textbf{P}}
\newcommand{\SPACE}{\textbf{SPACE}}
\begin{document}
\input{preamble.tex}
\newtheorem{example}[theorem]{Example}
\theoremstyle{definition}
\newtheorem{defn}[theorem]{Definition}
\handout{CS 229r Information Theory in Computer Science}{April 25, 2019}{Instructor: Madhu Sudan}{Scribe: Benjamin Edelman}{Lecture 24}
\section{Plan}
The focus of the lecture is using communication complexity to derive barriers to optimization. Specifically, we will study whether ``extended formulation linear programs'' are a viable approach to solve NP-complete problems when a naive linear program doesn't suffice.
The plan is to:
\begin{itemize}
\item Introduce linear programming (LP), the \textsc{Max Cut} problem, and extended formulations (EFs)
\item Show that lower bounds for extended formulations follow from non-deterministic communication complexity lower bounds
\end{itemize}
The main references for this lecture are:
\begin{itemize}
\item Chapter 5 of Tim Roughgarden's survey \emph{Communication Complexity (for Algorithm Designers)} \cite{rough};
\item Mihalis Yannakakis's pioneering 1991 paper \cite{ya}, which introduced the connection between extended formulations and communication complexity;
\item Fiorini et al.'s 2015 paper \cite{fi}, which showed that Yannakakis's technique yields unconditional lower bounds for interesting polytopes such as the \textsc{Traveling Salesman} and \textsc{Max Cut} polytopes.
\end{itemize}
\section{Definitions}
\subsection{Linear programming}
The Linear Programming problem (LP) is defined as follows:
\begin{definition}[LP]
Given input $A \in \R^{m \times n}$, $b \in \R^m$, and $c \in \R^n$, output $x \in \R^n$ that maximizes $c^T x$ subject to the constraints $Ax \leq b$.
\end{definition}
In other words, the inequalities $Ax \leq b$ define a convex feasible set in $\R^n$, over which the linear function $c^T x$ is maximized. If $n=2$ and the feasible set is bounded, it is a polygon. In higher dimensions, a bounded feasible set is called a \emph{polytope}.
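To make the geometry concrete for $n = 2$, here is a brute-force Python sketch. The function name and the example instance are our own. It assumes the feasible polygon is nonempty and bounded, so the optimum is attained at a vertex, that is, at the intersection of two constraint lines. It tries every such intersection and keeps the best feasible one. This is not how polynomial-time LP algorithms work: for a system of $\ell$ inequalities in $n$ dimensions, the same idea would examine up to $\binom{\ell}{n}$ candidate vertices.
\begin{verbatim}
```python
from fractions import Fraction
from itertools import combinations


def solve_2d_lp(A, b, c):
    """Maximize c.x subject to A x <= b for x in R^2 by brute force.

    Assumes the feasible polygon is nonempty and bounded, so the optimum is
    attained where two constraint lines meet. Returns (optimal value, vertex)
    with exact Fraction coordinates.
    """
    best = None
    for (a1, b1), (a2, b2) in combinations(zip(A, b), 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue  # parallel lines: no unique intersection point
        # Cramer's rule for the 2x2 system a1.p = b1, a2.p = b2.
        x = Fraction(b1 * a2[1] - a1[1] * b2, det)
        y = Fraction(a1[0] * b2 - b1 * a2[0], det)
        # Keep the intersection point only if it satisfies every constraint.
        if all(ai[0] * x + ai[1] * y <= bi for ai, bi in zip(A, b)):
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, (x, y))
    return best


# Maximize x + 2y subject to x >= 0, y >= 0, x + y <= 4, x + 3y <= 6.
A = [(-1, 0), (0, -1), (1, 1), (1, 3)]
b = [0, 0, 4, 6]
value, vertex = solve_2d_lp(A, b, (1, 2))
print(value, vertex)  # optimum 5, attained at the vertex (3, 1)
```
\end{verbatim}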
\begin{fact}[Khachiyan's ellipsoid method; made practical by Karmarkar's interior-point method]
Linear programs can be solved in time polynomial in the number of variables, the number of constraints, and the bit length of the input.
\end{fact}
\subsection{The maximum cut problem}
Let $K_n$ be the complete graph on $n$ vertices; it has $\binom{n}{2}$ edges. A cut of $K_n$ is a partition $(S, \overline{S})$ of its vertex set. The characteristic vector $\chi_S \in \{0,1\}^{\binom{n}{2}} \subseteq \R^{\binom{n}{2}}$ of $S$ is defined by $\chi_S(e) = \1[e \text{ has exactly one endpoint in } S]$, so $\chi_S(e) = 1$ if and only if $e$ crosses the cut.
We now define the \emph{cut polytope} as $P^n_\text{cut} = \text{Convex Hull}(\{\chi_S \mid S \subseteq [n]\})$.
\begin{definition}[\textsc{Max Cut}]
Given the input $w \in \R^{n\choose 2}, w \geq 0$, output $x \in \R^{n \choose 2}$ that maximizes $w^T x$ subject to the constraint that $x \in P^n_\text{cut}$.
\end{definition}
We can think of $w$ as encoding a weighted graph $G \subseteq K_n$. When $w \in \{0,1\}^{\binom{n}{2}}$ is the indicator vector of the edges of $G$, the optimal value of $w^T x$ is the maximum number of edges of $G$ that cross a cut.\footnote{Because the objective is linear, its maximum over the convex hull $P^n_\text{cut}$ is attained at a vertex, that is, at some $\chi_S$. So replacing the finite set $\{\chi_S\}$ by its convex hull doesn't change the optimum.}
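The following brute-force Python sketch illustrates these definitions; the function names are our own. It builds the vectors $\chi_S$ and checks that there are $2^{n-1}$ distinct ones, since $S$ and $\overline{S}$ give the same vector. It then evaluates a \textsc{Max Cut} instance by trying every vertex, as the footnote justifies. The sketch takes exponential time, so it illustrates the LP view rather than offering an efficient algorithm.
\begin{verbatim}
```python
from itertools import combinations, product


def cut_vector(n, S):
    """chi_S in {0,1}^(n choose 2): one coordinate per edge {i, j} of K_n with
    i < j, equal to 1 iff exactly one endpoint of the edge lies in S."""
    return tuple(int((i in S) != (j in S)) for i, j in combinations(range(n), 2))


def cut_polytope_vertices(n):
    """The distinct vectors chi_S over all S subset of [n] (chi_S = chi_complement)."""
    return {cut_vector(n, {i for i in range(n) if bits[i]})
            for bits in product((0, 1), repeat=n)}


def max_cut_value(n, w):
    """max w^T x over P_cut^n. The objective is linear, so the maximum over the
    convex hull is attained at a vertex chi_S; we simply try them all."""
    return max(sum(wi * xi for wi, xi in zip(w, x)) for x in cut_polytope_vertices(n))


n = 4
w = [1] * (n * (n - 1) // 2)          # indicator vector of all 6 edges of K_4
print(len(cut_polytope_vertices(n)))  # 8 = 2^(n-1) distinct vertices
print(max_cut_value(n, w))            # 4: a 2-2 split cuts 4 of the 6 edges
```
\end{verbatim}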
Note that because $P^n_\text{cut}$ is a polytope, it can be encoded in a system of linear constraints, so a \textsc{Max Cut} instance is a linear program. But while $\textsc{LP} \in \text{P}$, \textsc{Max Cut} is NP-complete. How can this be? The answer is that the cut polytope $P^n_\text{cut}$ may have exponentially many (in $n$) constraints, so the linear program for a \textsc{Max Cut} instance may be exponentially large. But might there be a clever way to reduce the number of constraints? That is the idea behind ``extended formulations'' of polytopes, which we now introduce.
\section{Extended formulations}
Suppose we have a polytope $P \subseteq \R^n$ that, like the cut polytope, needs $\exp(n)$ inequalities to specify. The idea of extended formulations is to find a polytope $Q \subseteq \R^{n+m}$ such that $P$ is the ``shadow'' (projection) of $Q$. In the same way, a regular hexagon is the shadow of a cube when light shines along one of the cube's long diagonals.
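Formally, suppose $P = \{x \in \R^n \mid Ax \leq b\}$ and $Q = \{(x,y) \in \R^{n+m} \mid A'\binom{x}{y} \leq b'\}$. Then $P$ is the \emph{shadow} of $Q$ if $P = \{x \mid \exists y \:\text{s.t.}\: (x,y) \in Q\}$. In that case, we say $Q$ is an \emph{extended formulation} of $P$. The \emph{size} of an extended formulation is its number of facets, which is roughly the number of inequalities needed to describe $Q$.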
Amazingly, there are polytopes that require $\exp(n)$ inequalities to specify, but are the shadows of $\poly(n)$-dimensional polytopes with $\poly(n)$ constraints.
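A standard example is the $\ell_1$ ball $P = \{x \in \R^n \mid \sum_i |x_i| \leq 1\}$; we chose it ourselves, and it is not taken from the lecture. Described by inequalities in $x$ alone, this ball needs all $2^n$ facet inequalities $\sum_i s_i x_i \leq 1$ for $s \in \{\pm 1\}^n$. Adding variables $y \in \R^n$ gives $Q = \{(x,y) \mid -y_i \leq x_i \leq y_i \text{ for all } i,\ \sum_i y_i \leq 1\}$, which has only $2n+1$ inequalities. Here $P$ is the shadow of $Q$: every feasible $y$ satisfies $y_i \geq |x_i|$, and $y = |x|$ is feasible exactly when $x \in P$. The Python sketch below checks on random points that the two descriptions agree. Its helper names are hypothetical.
\begin{verbatim}
```python
import random
from itertools import product


def in_P(x):
    """l1 ball via its 2^n facet inequalities: sum_i s_i x_i <= 1 for all s in {-1,+1}^n."""
    return all(sum(s * xi for s, xi in zip(signs, x)) <= 1
               for signs in product((-1, 1), repeat=len(x)))


def in_Q(x, y):
    """Lifted polytope Q in R^(2n): -y_i <= x_i <= y_i for all i, and sum_i y_i <= 1."""
    return all(-yi <= xi <= yi for xi, yi in zip(x, y)) and sum(y) <= 1


def in_shadow_of_Q(x):
    """x is in the projection of Q iff some y makes (x, y) feasible. Every feasible
    y has y_i >= |x_i|, so the witness y = |x| works whenever any y does."""
    return in_Q(x, [abs(xi) for xi in x])


random.seed(0)
n = 5
for _ in range(2000):
    x = [random.uniform(-0.4, 0.4) for _ in range(n)]
    assert in_P(x) == in_shadow_of_Q(x)
print(2 ** n, "inequalities for P versus", 2 * n + 1, "for Q")
```
\end{verbatim}
The gap between $2^n$ and $2n+1$ shows how a few extra variables can remove exponentially many inequalities.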
\begin{exercise}[Non-trivial but worthwhile]
Consider the polytope $P$ defined by the inequalities $\sum_{i \in S} x_i \geq B$ $\forall S \subseteq [n]$ s.t. $|S| \geq k$, and $0 \leq x_i \leq 1$. Give a polynomial-sized $Q$ that is an extended formulation for $P$.
\end{exercise}
Extended formulations are useful because maximizing $c^T x$ subject to $x \in P$ is equivalent to maximizing $c^T x$ subject to $(x,y) \in Q$. So even if $P$ can't be specified succinctly, we can solve the LP given by $(P,c)$ as long as $Q$ is small. In particular, suppose we could efficiently construct a polynomial-size extended formulation of $P^n_\text{cut}$. Then solving it as an LP would solve \textsc{Max Cut} in polynomial time, giving $\textsc{Max Cut} \leq \textsc{LP}$ and hence $\text{P} = \text{NP}$. Is there a chance that such an extended formulation exists?
\begin{theorem}[Yannakakis \cite{ya}, Fiorini et al. \cite{fi}]
No. There is no extended formulation of the cut polytope $P^n_\text{cut}$\footnote{Nor of the polytope corresponding to the traveling salesman problem.} of size $\poly(n)$. This holds unconditionally, without assuming $\text{P} \neq \text{NP}$.
\end{theorem}
We study this in an information theory course because of a result of Yannakakis: a succinct extended formulation yields an efficient \emph{nondeterministic} protocol for a certain (rather esoteric) communication problem. Fiorini et al. proved that the communication problem corresponding to the cut polytope is in fact hard. More precisely, they worked with the correlation polytope, which is linearly isomorphic to the cut polytope.
\section{Yannakakis's lemma}
\subsection{The Face-Vertex communication problem}
A \emph{face} of a polytope $P$ is, roughly speaking, a piece of the boundary of $P$ of any dimension. Consider a cube. It has eight 0-dimensional faces, which we call \emph{vertices}; twelve 1-dimensional faces, which we call \emph{edges}; and six 2-dimensional faces, which we call \emph{facets}. In general, for a full-dimensional polytope $P \subseteq \R^n$, vertices are 0-dimensional faces and facets are $(n-1)$-dimensional faces. A minimal description of such a $P$ by linear inequalities uses exactly one inequality per facet. So when we refer to the number of inequalities needed to specify $P$, we mean the number of facets of $P$.
Here are formal definitions of faces and facets. A \emph{supporting hyperplane} $h$ of $P$ is a hyperplane $\{\textbf{x} \in \R^n \mid \textbf{a}\cdot\textbf{x} = b\}$, with $\textbf{a} \neq 0$, that satisfies two conditions:
\begin{itemize}
\item all of $P$ lies on one side of $h$, that is, $\textbf{a}\cdot\textbf{x} \leq b$ for all $\textbf{x} \in P$;
\item $h$ has nonempty intersection with $P$.
\end{itemize}
Intuitively, $h$ is ``tangent'' to $P$. A face of $P$ is the intersection of $P$ with some supporting hyperplane $h$. A facet is a maximal face, that is, a face that is not strictly contained in another face. When $P$ is full-dimensional, the facets are exactly the $(n-1)$-dimensional faces, as we said earlier.
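The definition can be checked on the cube with the Python sketch below; its helper name is hypothetical. For a polytope given as the convex hull of a vertex list, the supporting hyperplane with normal $\textbf{a}$ is $\textbf{a}\cdot\textbf{x} = b$ with $b = \max_{\textbf{x} \in P} \textbf{a}\cdot\textbf{x}$. The corresponding face is the convex hull of the vertices attaining this maximum. For the cube specifically, we assume directions $\textbf{a} \in \{-1,0,1\}^3 \setminus \{0\}$ suffice to produce every nonempty proper face. The sketch recovers the counts 8, 12, and 6 quoted above.
\begin{verbatim}
```python
from collections import Counter
from itertools import product


def face_from_direction(vertices, a):
    """Vertices of the face cut out by the supporting hyperplane a.x = b,
    where b = max of a.x over P = conv(vertices). The maximum is attained at
    vertices, and the face is the convex hull of the vertices attaining it."""
    score = [sum(ai * vi for ai, vi in zip(a, v)) for v in vertices]
    b = max(score)
    return frozenset(v for v, s in zip(vertices, score) if s == b)


cube = list(product((0, 1), repeat=3))
faces = {face_from_direction(cube, a) for a in product((-1, 0, 1), repeat=3) if any(a)}
# A face of the cube with 2^k vertices is k-dimensional.
dims = Counter(len(f).bit_length() - 1 for f in faces)
print(sorted(dims.items()))  # [(0, 8), (1, 12), (2, 6)]: vertices, edges, facets
```
\end{verbatim}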
We can rephrase our guiding question now as: given $P$ with $\exp(n)$ vertices and facets, is it the shadow of a higher-dimensional polytope with $\poly(n)$ facets?
\begin{definition}[The $\textsc{Face-Vertex}(P)$ problem]
Alice is given a vertex $v$ of $P$. Bob is given a face $f$ of $P$, specified by a supporting hyperplane that defines it. The face-vertex function is $FV(f,v) = \1[v \notin f]$; it outputs 1 exactly when the vertex does \emph{not} lie on the face. Alice and Bob need to compute $FV(f,v)$.\footnote{Note that both Alice and Bob know $P$ to begin with.}
\end{definition}
It is natural to ask about the deterministic communication complexity of the $\textsc{Face-Vertex}(P)$ problem, which we denote $\text{CC}(FV)$; this is the notion we have worked with so far in this class. However, we can also ask about the \emph{nondeterministic} communication complexity of the problem. In nondeterministic communication, Alice and Bob don't communicate with each other directly. Instead, they both receive the same message $m$ from a third party, Merlin, who knows both $v$ and $f$. We can think of $m$ as an advice string or proof. After receiving $m$, Alice and Bob each output a bit, and the output of the protocol is the AND of their bits. Merlin wants to convince them that the answer is 1. The protocol must be designed so that he can succeed exactly on the inputs whose answer is 1.
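Formally, $\text{NCC}(FV) \leq k$ if there is a protocol for Alice and Bob with two properties:
\begin{itemize}
\item whenever $v \notin f$, there exists a message $m$ with $|m| \leq k$ such that Alice and Bob both output 1 when given $m$;
\item whenever $v \in f$, for every message $m$ with $|m| \leq k$, Alice or Bob outputs 0.
\end{itemize}
More generally, for $F : X \times Y \rightarrow \{0,1\}$, $\text{NCC}(F)$ is the smallest $k$ for which some protocol satisfies the following: an input $(x,y)$ has a $k$-bit message accepted by both players if and only if $F(x,y) = 1$.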
Nondeterministic communication complexity is a much-studied notion, and there is a remarkable result that connects it to deterministic complexity:
\begin{fact}[Aho et al. \cite{aho}]
Let $F: X \times Y \rightarrow \{0,1\}$ be any function, and let $\overline{F}$ be $1-F$. Then
\[\text{CC}(F) = O\big(\text{NCC}(F) \cdot \text{NCC}(\overline{F})\big)\]
\end{fact}
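In other words, if a function has high deterministic complexity, then it or its complement must have high nondeterministic complexity. For many classic hard communication problems, it is easy to come up with an efficient nondeterministic protocol for the problem or for its complement, so the other one must have high nondeterministic complexity. For example, take \textsc{Disjointness} on sets $x, y \subseteq [n]$. Merlin can certify that the sets intersect by naming a common element, which takes $\lceil \log_2 n \rceil$ bits. Since $\text{CC}(\textsc{Disjointness}) = \Omega(n)$, it follows that $\text{NCC}(\textsc{Disjointness}) = \Omega(n / \log n)$.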
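The Python sketch below checks the nondeterministic definition exhaustively for small $n$; its function names are hypothetical. For \textsc{Disjointness}, the easy protocol is for the complement. To certify that Alice's and Bob's sets $x, y \subseteq [n]$ intersect, Merlin names an element $i$, and each player checks that $i$ belongs to their own set. Each player sees only the message and their own input, and the protocol accepts on exactly the intersecting pairs.
\begin{verbatim}
```python
from itertools import product
from math import ceil, log2


def all_subsets(n):
    """All subsets of {0, ..., n-1} as frozensets."""
    return [frozenset(i for i in range(n) if bits[i]) for bits in product((0, 1), repeat=n)]


def is_nondeterministic_protocol(F, X, Y, messages, alice, bob):
    """Check the definition: for every input, F(x, y) = 1 iff some message m
    makes both Alice (who sees only m and x) and Bob (only m and y) output 1."""
    return all(bool(F(x, y)) == any(alice(m, x) and bob(m, y) for m in messages)
               for x in X for y in Y)


def intersects(x, y):  # the complement of Disjointness
    return bool(x & y)


def alice(i, x):  # Alice checks that Merlin's element i is in her set
    return i in x


def bob(i, y):  # Bob checks that Merlin's element i is in his set
    return i in y


n = 4
sets = all_subsets(n)
messages = range(n)  # Merlin names a claimed common element
assert is_nondeterministic_protocol(intersects, sets, sets, messages, alice, bob)
print("message length:", ceil(log2(n)), "bits")  # 2 bits for n = 4
```
\end{verbatim}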
Now we're ready to see Yannakakis's key lemma.
\subsection{The lemma}
\begin{lemma}[Yannakakis \cite{ya}]
If the polytope $P$ has an extended formulation $Q$ with $r$ facets, then
\[\text{NCC}(\textsc{Face-Vertex}(P)) \leq \lceil \log_2 r \rceil\]
\end{lemma}
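Thus, suppose we can prove that $\textsc{Face-Vertex}(P)$ has nondeterministic communication complexity at least $k$. It immediately follows that every extended formulation of $P$ has $2^{\Omega(k)}$ facets. In particular, a lower bound of $\Omega(n)$ rules out every extended formulation of size $2^{o(n)}$, and hence every extended formulation of size $\poly(n)$.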
\begin{proof}[Proof sketch]
We are given $P$ with an extended formulation $Q$ that has $r$ facets, and we want to describe a protocol such that Alice and Bob can solve the $\textsc{Face-Vertex}(P)$ problem after receiving a message from Merlin of length $\leq \log r$.
Let $v^*$ be the ``lifting'' of Alice's vertex $v$ from $P$ to $Q$, and let $f^*$ be the lifting of Bob's face $f$:
\[v^* = \{(v,y) \mid (v,y) \in Q\}\]
\[f^* = \{(x,y) \mid x \in f, (x,y) \in Q\}\]
The basic idea of the protocol is for Merlin's message $m$ to be the index of a facet $\tilde{f}$ of $Q$. Since $Q$ has $r$ facets, this takes $\lceil \log_2 r \rceil$ bits. Remember: not all faces are facets. Merlin should, if possible, choose $\tilde{f}$ such that $f^* \subseteq \tilde{f}$ but $v^* \nsubseteq \tilde{f}$. It turns out that such an $\tilde{f}$ exists if and only if $v \notin f$. Alice knows $v$ and $Q$, so she can compute $v^*$ and output $\1[v^* \nsubseteq \tilde{f}]$. Likewise, Bob computes $f^*$ and outputs $\1[f^* \subseteq \tilde{f}]$. Both output 1 for some message exactly when $v \notin f$, so the protocol is correct.
\begin{exercise}
Prove that there exists a facet $\tilde{f}$ of $Q$ satisfying $f^* \subseteq \tilde{f}$ and $v^* \nsubseteq \tilde{f}$ if and only if $v \notin f$.
\end{exercise}
\end{proof}
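As a sanity check, the following Python sketch uses a toy instance of our own: $P$ is the cube $[0,1]^3$ with the trivial extended formulation $Q = P$. No extra variables are added, so the liftings are simply $v^* = \{v\}$ and $f^* = f$. Each face is represented by its set of vertices, and Merlin names one of the $r = 6$ facets. The sketch then confirms exhaustively that some message makes both Alice and Bob accept if and only if $v \notin f$. The helper names are hypothetical.
\begin{verbatim}
```python
from itertools import product
from math import ceil, log2

n = 3
cube = list(product((0, 1), repeat=n))


def face(a):
    """Vertex set of the face of the cube [0,1]^n on which a.x is maximized."""
    score = [sum(ai * vi for ai, vi in zip(a, v)) for v in cube]
    best = max(score)
    return frozenset(v for v, s in zip(cube, score) if s == best)


# All nonempty proper faces, and the 2n facets (faces with 2^(n-1) vertices).
faces = {face(a) for a in product((-1, 0, 1), repeat=n) if any(a)}
facets = sorted((f for f in faces if len(f) == 2 ** (n - 1)), key=sorted)


def alice_accepts(v, F):
    # Trivial lifting Q = P: v* = {v}, so "v* not inside F" means v not in F.
    return v not in F


def bob_accepts(f, F):
    # f* = f, so Bob checks that his face lies inside the facet Merlin named.
    return f <= F


for f in faces:
    for v in cube:
        certified = any(alice_accepts(v, F) and bob_accepts(f, F) for F in facets)
        assert certified == (v not in f)
print(len(facets), "facets, so Merlin sends", ceil(log2(len(facets))), "bits")
```
\end{verbatim}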
\section{Applying the lemma}
It took about two decades before Fiorini et al. showed that Yannakakis's lemma could be applied to prove lower bounds on the sizes of extended formulations for interesting polytopes like $P^n_\text{cut}$.
Fiorini et al. didn't actually work directly with the cut polytope; instead, they focused on the \emph{correlation polytope}, defined as follows:
\begin{definition}[correlation polytope]
$P^n_\text{cor} = \text{Convex Hull}(\{xx^T \in \R^{n \times n} \mid x \in \{0,1\}^n\})$
\end{definition}
Note that $P^n_\text{cor}$ lives in the space $\R^{n \times n}$ of $n \times n$ matrices. It is the convex hull of the zero matrix together with all rank-1 symmetric $0/1$ matrices. We can relate the cut polytope and the correlation polytope via a neat theorem of De~Simone:
\begin{fact}[De Simone, 1990 \cite{ds}]
$P^{n+1}_\text{cut}$ and $P^n_\text{cor}$ are linearly isomorphic.
\end{fact}
In other words, there is an invertible linear map taking $P^{n+1}_\text{cut}$ onto $P^n_\text{cor}$. Define the \emph{extension complexity} of a polytope as the minimum number of facets of any extended formulation of it. It is not too difficult to verify that extension complexity is preserved under linear isomorphism. So a lower bound on the extension complexity of the correlation polytope implies the same bound for the cut polytope.
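The Python sketch below checks one explicit linear map, the \emph{covariance map}, on the vertices for $n = 3$. We believe this is the map underlying De~Simone's theorem \cite{ds}, but treat that as an assumption. The function names are our own. Label the vertices of $K_{n+1}$ as $0, \dots, n$. Since $\chi_S = \chi_{\overline{S}}$, we may assume $0 \notin S$ and identify $S$ with its characteristic vector $x \in \{0,1\}^n$. Then $\chi_S(0i) = x_i$ and $\chi_S(ij) = x_i + x_j - 2x_ix_j$. So the entries $p_{ii} = \chi_S(0i)$ and $p_{ij} = (\chi_S(0i) + \chi_S(0j) - \chi_S(ij))/2$ recover $xx^T$. The map is invertible on the whole space, via $\chi(0i) = p_{ii}$ and $\chi(ij) = p_{ii} + p_{jj} - 2p_{ij}$.
\begin{verbatim}
```python
from itertools import combinations, product


def cut_vector(num_vertices, S):
    """chi_S as a dict over the edges (i, j), i < j, of K_{num_vertices}."""
    return {(i, j): int((i in S) != (j in S))
            for i, j in combinations(range(num_vertices), 2)}


def covariance_map(d, n):
    """Linear map from cut coordinates on K_{n+1} (vertices 0..n) to the n x n
    symmetric matrix p with p_ii = d(0,i), p_ij = (d(0,i) + d(0,j) - d(i,j)) / 2."""
    p = [[0.0] * n for _ in range(n)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            if i == j:
                p[i - 1][j - 1] = d[(0, i)]
            else:
                p[i - 1][j - 1] = (d[(0, i)] + d[(0, j)] - d[(min(i, j), max(i, j))]) / 2
    return p


n = 3
images = set()
for x in product((0, 1), repeat=n):
    S = {i + 1 for i in range(n) if x[i]}  # vertex 0 is kept outside S
    p = covariance_map(cut_vector(n + 1, S), n)
    assert p == [[x[i] * x[j] for j in range(n)] for i in range(n)]  # p = x x^T
    images.add(tuple(map(tuple, p)))
print(len(images))  # 8: the 2^n cut vertices map to the 2^n distinct matrices x x^T
```
\end{verbatim}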
The core of Fiorini et al.'s argument is the following lemma:
\begin{lemma}
For every nonempty subset $S \subseteq [n]$, $P^n_\text{cor}$ has a face $f_S$ with the following property. For every subset $R \subseteq [n]$, let $x_R \in \{0,1\}^n$ be the characteristic vector of $R$ and let $v_R = x_R {x_R}^T$. Then $v_R \in f_S$ if and only if $|S \cap R| = 1$.
\end{lemma}
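One concrete choice of faces comes from the following inequality; we believe it matches the construction of Fiorini et al. \cite{fi}. Write $\langle A, B \rangle = \sum_{i,j} A_{ij} B_{ij}$ and let $a = x_S$. Consider a vertex $v_R = x_R x_R^T$ and set $k = |S \cap R| = a^T x_R$. Using $(x_R)_i^2 = (x_R)_i$, we get
\begin{align*}
\langle 2\,\mathrm{diag}(a) - aa^T,\; x_R x_R^T \rangle
  &= 2\sum_{i=1}^n a_i (x_R)_i^2 - (a^T x_R)^2 \\
  &= 2k - k^2 \;=\; 1 - (1-k)^2 \;\leq\; 1,
\end{align*}
with equality if and only if $k = 1$. The inequality $\langle 2\,\mathrm{diag}(a) - aa^T, X \rangle \leq 1$ holds at every vertex, so it holds on all of $P^n_\text{cor}$. For nonempty $S$ it is tight at $v_{\{i\}}$ for any $i \in S$, so it defines a supporting hyperplane. The face $f_S$ that this hyperplane cuts out contains exactly the vertices $v_R$ with $|S \cap R| = 1$.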
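In other words, restricting $\textsc{Face-Vertex}(P^n_\text{cor})$ to faces of the form $f_S$ and vertices of the form $v_R$ yields something that looks like \textsc{Unique Disjointness}. This is the variant of \textsc{Disjointness} in which Alice's and Bob's sets are promised to intersect in at most one element, and we take it to output 1 on disjoint pairs. Under this promise, $v_R \notin f_S$ if and only if $S \cap R = \emptyset$. Let Alice hold $R$ and use the vertex $v_R$, and let Bob hold $S$ and use the face $f_S$. Then any nondeterministic protocol certifying $v \notin f$ also certifies that the sets are disjoint. This reduction can be made formal to show that $\text{NCC}(\textsc{Face-Vertex}(P^n_\text{cor})) \geq \text{NCC}(\textsc{Unique Disjointness})$.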
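The Python sketch below is a brute-force check for $n = 4$; the helper name is hypothetical. It assumes $f_S$ is the face cut out by $\langle 2\,\mathrm{diag}(x_S) - x_S x_S^T, X \rangle \leq 1$, which is our reading of Fiorini et al. \cite{fi}. The check confirms the lemma's tightness pattern. It also confirms that under the promise $|S \cap R| \leq 1$, the vertex $v_R$ avoids $f_S$ exactly when $S$ and $R$ are disjoint.
\begin{verbatim}
```python
from itertools import product


def face_lhs(S, R, n):
    """<2 diag(x_S) - x_S x_S^T, x_R x_R^T>: left side of the assumed face
    inequality for f_S, evaluated at the vertex v_R."""
    a = [int(i in S) for i in range(n)]
    x = [int(i in R) for i in range(n)]
    diag_term = 2 * sum(a[i] * x[i] * x[i] for i in range(n))
    rank_one_term = sum(a[i] * a[j] * x[i] * x[j] for i in range(n) for j in range(n))
    return diag_term - rank_one_term


n = 4
subsets = [frozenset(i for i in range(n) if bits[i]) for bits in product((0, 1), repeat=n)]
for S in subsets:
    for R in subsets:
        lhs, k = face_lhs(S, R, n), len(S & R)
        assert lhs <= 1                   # the inequality is valid at every vertex
        assert (lhs == 1) == (k == 1)     # v_R is on f_S iff |S & R| = 1
        if k <= 1:                        # Unique Disjointness promise
            assert (lhs < 1) == (k == 0)  # v_R is off f_S iff S and R are disjoint
print("checked all", len(subsets) ** 2, "pairs (S, R)")
```
\end{verbatim}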
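The final step of the proof is to show that $\text{NCC}(\textsc{Unique Disjointness}) = \Omega(n)$. Combined with Yannakakis's lemma, this shows that every extended formulation of $P^n_\text{cor}$, and hence of $P^{n+1}_\text{cut}$, has $2^{\Omega(n)}$ facets.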
\begin{thebibliography}{}
\bibitem{rough}
Roughgarden, T. (2016). Communication complexity (for algorithm designers). Foundations and Trends in Theoretical Computer Science, 11(3--4), 217--404.
\bibitem{ya}
Yannakakis, M. (1991). Expressing combinatorial optimization problems by linear programs. Journal of Computer and System Sciences, 43(3), 441--466.
\bibitem{fi}
Fiorini, S., Massar, S., Pokutta, S., Tiwary, H. R., \& de Wolf, R. (2015). Exponential lower bounds for polytopes in combinatorial optimization. Journal of the ACM (JACM), 62(2), Article 17.
\bibitem{aho}
Aho, A. V., Ullman, J. D., \& Yannakakis, M. (1983). On notions of information transfer in VLSI circuits. In Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing (pp. 133--139). ACM.
\bibitem{ds}
De Simone, C. (1990). The cut polytope and the Boolean quadric polytope. Discrete Mathematics, 79(1), 71--75.
\end{thebibliography}
\end{document}