\documentclass[10pt]{article}
\usepackage{amsfonts,amsthm,amsmath,amssymb}
\usepackage{array}
\usepackage{epsfig}
\usepackage{fullpage}
\newcommand{\1}{\mathbbm{1}}
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
\newcommand{\x}{\times}
\newcommand{\Z}{\mathbb{Z}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}
\newcommand{\N}{\mathbb{N}}
\newcommand{\F}{\mathbb{F}}
\newcommand{\E}{\mathop{\mathbb{E}}}
\renewcommand{\bar}{\overline}
\renewcommand{\epsilon}{\varepsilon}
\newcommand{\eps}{\varepsilon}
\newcommand{\DTIME}{\textbf{DTIME}}
\newcommand{\NTIME}{\textbf{NTIME}}
\renewcommand{\P}{\textbf{P}}
\newcommand{\SPACE}{\textbf{SPACE}}
\usepackage{xcolor}
\usepackage{wrapfig}
\begin{document}
\input{preamble.tex}
\handout{CS 221 Computational Complexity, Lecture 2}{Jan 25, 2018}{Instructor:
Madhu Sudan}{Scribe: Garrett Tanzer}{Time/Space Hierarchy Theorems}
This lecture discusses the proof technique of diagonalization, particularly as it relates to time, space, and other resource hierarchies---``more of the same resources $\rightarrow$ more power''. Similar material is covered in Arora and Barak \cite{AroraBarak} Chapters 3.1 and 3.2.
\section{Definitions}
\subsection{Algorithms}
Before we get to content, we have to clarify exactly what we mean when we say ``algorithm''.
\subsubsection{Properties}
\begin{enumerate}
\item An algorithm is a \textit{finite description}: a constant-sized set of rules that solves a problem on inputs of all lengths. This is a \textit{uniform} model of computation, as opposed to a \textit{nonuniform} model like circuits, where each circuit solves the problem for a single input length $n$.
\item An algorithm is \textit{interpretable}. This means that a second algorithm can receive the first's description as input and simulate its behavior. This gives rise to the notion of a ``universal'' Turing machine---a constant-sized machine that can simulate any other Turing machine on any input, with moderate overhead.
\item Algorithms are \textit{enumerable}. This follows from the previous two properties, and is reflected in the convention that we represent algorithms as binary strings. We can therefore conceive of a sequence $A_1, A_2, \dots, A_i, \dots$ of all possible algorithms. It is important to note that all items in this enumeration must be valid programs. We can ensure this trivially by specifying in the computational model that, for example, syntactically invalid programs output 0 on all inputs.
\end{enumerate}
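The convention in item 3, that every binary string counts as a valid program, can be sketched in Python. This is only an illustration: \texttt{compile\_program} is a hypothetical stand-in for a parser of some fixed computational model, not a real one.

```python
from itertools import count, product

def all_binary_strings():
    """Yield every binary string in order of length: '', '0', '1', '00', '01', ..."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)

def compile_program(s):
    """Hypothetical stand-in for a parser of some fixed computational model."""
    raise ValueError("no real model here")

def decode(s):
    """Map a binary string to an algorithm (a callable). Syntactically invalid
    descriptions default to the constant-0 algorithm, so every string in the
    enumeration A_1, A_2, ... is a valid program."""
    try:
        return compile_program(s)
    except ValueError:
        return lambda x: 0

# The enumeration A_1, A_2, ...: decode each binary string in turn.
algorithms = (decode(s) for s in all_binary_strings())
```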
\subsubsection{Models}
It is important to have established a rigorous computational model, like the canonical Turing machine, in order to precisely analyze the complexity of our algorithms. However, we will not focus too much on the details of these models because there is a large body of work showing their equivalence, with at most polynomial overhead. While this might be an important distinction in finer algorithmic analysis, it is negligible when comparing polynomial and exponential complexity, as we will often do.
\begin{description}
\item Turing Machines
The salient features of a Turing machine are a constant-sized machine---often called $M$---containing a set of rules for state transitions, and a series of memory tapes. We will assume one input tape, one output tape, and two independently addressable working tapes; there are many such variations that may have slight differences in complexity, but they are subsumed in the context of $\mathrm{poly}(n)$.
\item NAND++ Programs
Those who took the most recent iteration of CS 121 will be more familiar with NAND++, which can simulate and be simulated by Turing machines with polynomial overhead.
\item C Programs
Higher-level languages in the abstract are also Turing equivalent; this can be useful in order to avoid getting bogged down in the details of a particular model.
\end{description}
\subsection{Languages}
A language is a set of binary strings: $L \subseteq \{ 0, 1 \}^{*} \equiv \underset{n \geq 0}{\bigcup} \{ 0, 1 \}^n$. \\
There is a natural decision problem associated with each language: given $x \in \{ 0, 1 \}^{*}$, decide if $x \in L$?
\subsection{Computability}
Once we have formal definitions, it makes sense to ask what problems we can actually solve on a computer. The first notion of this, developed from the '30s--'50s, was computability. \\
\noindent \textit{Definition:} $L$ is computable (decidable) if there exists an algorithm $A$ such that:
\begin{itemize}
\item $\forall x$, $x \in L \iff A(x) = 1$
\item $A(x)$ halts on all inputs
\end{itemize}
\noindent Note that ``halting'' just means running in finite time for an input, even if that finite time is $O(2^{2^{2^n}})$.
\subsection{Tractability}
Because a binary yes/no answer seemed to be insufficient to describe the hardness of computational problems, in the `60s there was a movement toward recognizing $\P$, defined below, as the set of tractable or efficiently solvable decision problems. See Cobham \cite{Cobham}, Edmonds \cite{Edmonds}, and Peterson \cite{Peterson}. We introduce the notion of \textit{time} and \textit{space complexity} in order to compare the hardness of deciding languages. \\
\noindent \textit{Definition:} $\textbf{TIME}(t(n)) \equiv \{ L \mid \exists A$ solving $L$ with running time $O(t(|x|))$ for every $x \}$\\
\noindent \textit{Definition:} $\textbf{SPACE}(s(n)) \equiv \{ L \mid \exists A$ solving $L$ using space $O(s(|x|))$ for every $x \}$\\
\noindent \textit{Definition:} $\P \equiv \underset{c > 0}{\bigcup} \textbf{TIME}(n^c)$
\section{Diagonalization}
\textit{Theorem:} $\textbf{TIME}(n^2) \subsetneq \textbf{TIME}(n^3)$ \\
\noindent While it is difficult to prove relationships comparing the power of different types of resources, we \textit{are} able to prove that providing appreciably more of a single resource increases power. Today's main theorem, the Time Hierarchy Theorem, can be stated generally as $\textbf{TIME}(f(n)) \subsetneq \textbf{TIME}(\omega(f(n) \log f(n)))$ for time-constructible $f$, while the Space Hierarchy Theorem is even tighter at $\textbf{SPACE}(f(n)) \subsetneq \textbf{SPACE}(\omega(f(n)))$; however, we will focus on a more concrete version to simplify the proof. We will develop the technique of ``proof by diagonalization'' in order to do so.
\subsection{Cantor's Theorem}
Cantor's Theorem states that the number of real numbers is greater than the number of integers, or that the set of real numbers is uncountably infinite ($|\mathbb{R}| > |\mathbb{Z}|$). We can without loss of generality restrict reals to the interval $[0, 1]$ and integers to natural numbers and derive the following:\\
\noindent \textit{Theorem:} There is no injective function $f : [0, 1] \rightarrow \mathbb{N}$.\\
\noindent \textbf{Proof:} We prove this by ``diagonalization''. Assuming for the purpose of contradiction that an injective function $f$ exists, we will list, for each natural number $n$, the (possibly infinite) binary expansion of its preimage $f^{-1}(n)$. If we come across a natural number for which $f^{-1}$ is undefined (which can happen because $f$ is assumed to be injective, not necessarily surjective), we can fill the row with $0$s or some other convention. \\
\noindent Now, take the complement of each number on the diagonal and treat the sequence as a new real number. The resulting number differs from every row (for all $i$, it disagrees with $f^{-1}(i)$ in the $i$th bit), but is still in $[0, 1]$, so $f$ must map it to some $k \in \mathbb{N}$; then it would have to equal $f^{-1}(k)$, with which it disagrees in bit $k$. This is a contradiction, so there exists no injective function $f : [0, 1] \rightarrow \mathbb{N}$. $\blacksquare$ \\
\begin{tabular}{r | c c c c c c c}
& \multicolumn{6}{c}{$\mathbb{R}$} \\
\hline
$f^{-1}(1)$ & \textcolor{red}{0} & 1 & 1 & 0 & 1 & $\dots$ \\
$f^{-1}(2)$ & 1 & \textcolor{red}{0} & 1 & 1 & 0 & \\
$f^{-1}(3)$ & 1 & 0 & \textcolor{red}{1} & 1 & 0 & \\
$f^{-1}(4)$ & 0 & 0 & 0 & \textcolor{red}{0} & 0 & \\
$f^{-1}(5)$ & 1 & 1 & 0 & 0 & \textcolor{red}{1} & \\
$\vdots$ & $\vdots$ & & & & & $\ddots$ \\
& \textcolor{red}{1} & \textcolor{red}{1} & \textcolor{red}{0} & \textcolor{red}{1} & \textcolor{red}{0} & \textcolor{red}{\dots}
\end{tabular} \\
\noindent A concern was brought up about dyadic rationals having two binary expansions (e.g.\ $0.1000\dots = 0.0111\dots$), but we can avoid this problem by accepting only one such representation, or by periodically adding a row of all $1$s so that a $0$ is introduced into the diagonal complement.
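The diagonal construction in the table above can be checked mechanically; a small Python sketch over the finite prefix shown:

```python
# Finite prefix of the table from the proof: row i holds the first bits of
# the binary expansion of f^{-1}(i+1) (0-indexed here).
table = [
    [0, 1, 1, 0, 1],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 0, 0, 1],
]

# The new real number: complement each bit on the diagonal.
diagonal_complement = [1 - table[i][i] for i in range(len(table))]

# It differs from row i in position i, so it equals no row in the table.
for i, row in enumerate(table):
    assert diagonal_complement[i] != row[i]
```

Running this on the table above reproduces the red row $1, 1, 0, 1, 0, \dots$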
\subsection{Weak Turing's Theorem}
We can apply an analogous version of this argument to the weak version of Turing's theorem.\\
\noindent \textit{Theorem:} $\exists L \subseteq \{ 0, 1 \}^{*} \mid L$ is not decidable.\\
\noindent \textit{Proof:} We will repeat the same process above, letting languages play the role of the reals and algorithms the role of the natural numbers. Relabeling rows as algorithms and columns as inputs, with entry $(i, j)$ being $A_i(x_j)$, the complement of the diagonal defines a language decided by no $A_i$. $\blacksquare$ \\
\begin{tabular}{r | c c c c c c c}
& $x_1$ & $x_2$ & $x_3$ & $x_4$ & $x_5$ & $\dots$ \\
\hline
$A_1$ & \textcolor{red}{0} & 1 & 1 & 0 & 1 & $\dots$ \\
$A_2$ & 1 & \textcolor{red}{0} & 1 & 1 & 0 & \\
$A_3$ & 1 & 0 & \textcolor{red}{1} & 1 & 0 & \\
$A_4$ & 0 & 0 & 0 & \textcolor{red}{0} & 0 & \\
$A_5$ & 1 & 1 & 0 & 0 & \textcolor{red}{1} & \\
$\vdots$ & $\vdots$ & & & & & $\ddots$ \\
& \textcolor{red}{1} & \textcolor{red}{1} & \textcolor{red}{0} & \textcolor{red}{1} & \textcolor{red}{0} & \textcolor{red}{\dots}
\end{tabular}
\subsection{Turing's Theorem}
Now we will prove the strong version of Turing's Theorem using the weak version.\\
\noindent \textit{Theorem:} $HALT = \{ (A, x) \mid A$ halts on input $x \}$ is undecidable.\\
\noindent We also define the diagonal halting problem and its complement. \\
\noindent $D$-$HALT = \{ A \mid (A, A) \in HALT\}$.\\
\noindent $\overline{D\text{-}HALT} = \{ A \mid (A, A) \not\in HALT\}$. \\
\begin{tabular}{r | c c c c c c c}
& $A_1$ & $A_2$ & $A_3$ & $A_4$ & $A_5$ & $\dots$ \\
\hline
$A_1$ & \textcolor{red}{0} & 1 & 1 & 0 & 1 & $\dots$ \\
$A_2$ & 1 & \textcolor{red}{0} & 1 & 1 & 0 & \\
$A_3$ & 1 & 0 & \textcolor{red}{1} & 1 & 0 & \\
$A_4$ & 0 & 0 & 0 & \textcolor{red}{0} & 0 & \\
$A_5$ & 1 & 1 & 0 & 0 & \textcolor{red}{1} & \\
$\vdots$ & $\vdots$ & & & & & $\ddots$ \\
& \textcolor{red}{1} & \textcolor{red}{1} & \textcolor{red}{0} & \textcolor{red}{1} & \textcolor{red}{0} & \textcolor{red}{\dots}
\end{tabular}\\
\noindent \textbf{Proof:} Assume for the sake of contradiction that $HALT$ is decidable. Then $D$-$HALT$ is decidable, because it is a special case of $HALT$, and $\overline{D\text{-}HALT}$ is decidable by complementing the output. But a decider for $\overline{D\text{-}HALT}$ can be turned into an algorithm $B$ that halts on input $A$ if and only if $A$ does not halt on $A$; running $B$ on its own description, $B$ halts on $B$ if and only if $B$ does not halt on $B$. This is a contradiction, so $HALT$ is undecidable. $\blacksquare$
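The contradiction can be sketched in Python, with the hypothetical \texttt{halts} standing in for the assumed decider for $HALT$ (programs are represented by name; no such decider actually exists, so the stub only marks the assumption):

```python
def halts(program, x):
    """Assumed-for-contradiction decider for HALT: would return True iff
    the program named by `program` halts on input x. Turing's theorem
    says no such total algorithm exists; here it is only a stub."""
    raise NotImplementedError("no decider for HALT exists")

def paradox(program):
    """Built from `halts`: halts on input A exactly when A does NOT halt on A."""
    if halts(program, program):
        while True:      # (A, A) in HALT  ->  run forever
            pass
    return 1             # (A, A) not in HALT  ->  halt and output 1

# Running paradox on its own description is contradictory: it halts on
# itself if and only if it does not halt on itself.
```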
\subsection{Time Hierarchy Theorem}
Now we want to modify this argument to prove $\mathbf{TIME}(n^2) \neq \mathbf{TIME}(n^3)$.
\subsubsection{Enumerating $TIME(n^2)$}
First, we need to figure out how to enumerate all algorithms that run in time $O(n^2)$. It is not obvious how to do this, particularly because the language $\{A \mid A \in \mathbf{TIME}(n^2)\}$ is undecidable by Rice's Theorem, but it can be done. We will enumerate over all triples $(A, n_0, c)$, where the triple $(A, n_0, c)$ represents the algorithm that runs $A$ for $cn^2 + n_0$ time steps: if $A(x)$ halts in that time, it outputs $A(x)$; otherwise it outputs $0$. \\
\noindent We know that this enumeration scheme is exhaustive by the definition of Big O notation:\\
\noindent \textit{Claim:} $\forall A, n_0, c$, $(A, n_0, c) \in \mathbf{TIME}(n^2)$\\
\noindent \textit{Claim:} $\forall A \in \mathbf{TIME}(n^2)$, $\exists n_0, c \mid A \equiv (A, n_0, c)$
\begin{tabular}{r | c c c c c c c}
& $x_1$ & $x_2$ & $x_3$ & $x_4$ & $x_5$ & $\dots$ \\
\hline
$(A_1, 3, 0)$ & \textcolor{red}{0} & 1 & 1 & 0 & 1 & $\dots$ \\
$\vdots$ & $\vdots$ \\
$(A_1, 5, 10)$ & 1 & \textcolor{red}{0} & 1 & 1 & 0 \\
$\vdots$ & $\vdots$ \\
$(A_2, 7, 6)$ & 1 & 0 & \textcolor{red}{1} & 1 & 0 \\
$\vdots$ & $\vdots$ & & & & & $\ddots$ \\
& \textcolor{red}{1} & \textcolor{red}{1} & \textcolor{red}{0} & \textcolor{red}{\dots}
\end{tabular}\\
\noindent Therefore, we have proven that there exists a language that \textit{cannot} be decided in time $O(n^2)$. But we still need to prove that this language \textit{can} be decided in time $O(n^3)$.
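The clipping construction behind the triples $(A, n_0, c)$ can be sketched in Python. The generator model of algorithms (one \texttt{yield} per step) is an assumption of this sketch, not part of the notes:

```python
def clipped(A, n0, c):
    """The triple (A, n0, c) as an algorithm: run A(x) for at most
    c*|x|^2 + n0 steps; output A(x) if it halts in time, else 0.
    A is modeled as a generator function that yields once per step
    and returns its output."""
    def run(x):
        budget = c * len(x) ** 2 + n0
        gen = A(x)
        try:
            for _ in range(budget):
                next(gen)
        except StopIteration as done:
            return done.value      # A halted within the budget
        return 0                   # out of time: output 0 by convention
    return run

def parity(x):
    """Example algorithm: parity of the 1s in x; halts within |x| + 1 steps."""
    ones = 0
    for ch in x:
        ones ^= (ch == "1")
        yield
    return int(ones)

def forever(x):
    """Example algorithm that never halts."""
    while True:
        yield
```

Here \texttt{clipped(parity, 4, 1)} decides the same language as \texttt{parity}, while \texttt{clipped(forever, 4, 1)} outputs $0$ everywhere; both run within an explicit $O(n^2)$ step bound, matching the first claim above.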
\subsubsection{Simulating in $n^3$ Time}
Our strategy to compute this $L \not\in \mathbf{TIME}(n^2)$ we constructed by diagonalization is to simulate the algorithm $A_i$ on a universal interpreter, then negate the output. There are established results showing the existence of a universal interpreter that takes $O(t(n)\log t(n))$ time and $O(s(n))$ space to run a program that takes $t(n)$ time and $s(n)$ space, for ``nice'' values of $t(n)$ and $s(n)$. Therefore, we can interpret a machine in $\mathbf{TIME}(n^2)$ in $O(n^2 \log n) = o(n^3)$ time.\\
\begin{wrapfigure}{R}{0.24\textwidth}
\centering
\vspace{48pt}
\includegraphics[width=0.2\textwidth]{enumeration}
\caption{An example of an enumeration where each algorithm appears infinitely often.}
\end{wrapfigure}
\noindent The problem is that while this $A_i$ will take time $O(n^2)$, the hidden constant behind the $O$ may be $2^i$, $2^{2^i}$, etc., while the input length is only $n = |A_i|$---so we can't necessarily simulate $A_i(x_i)$ in time $n^3$. We can resolve this issue by using an enumeration $B_1, \dots, B_j, \dots$ in which each algorithm $A_i$ appears infinitely often, and for which there exists an efficiently computable function $i(j)$ that transforms an index $j$ in the series of algorithms $B_j$ into the index of the equivalent $A_i$. That is to say, $B_j = A_{i(j)}$. This way, each $A_i$ recurs at arbitrarily large input lengths $n$, where its constant factor is negligible compared to $n$.\\
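One concrete choice of such an enumeration (my choice; the notes leave it open): write $j = 2^a \cdot m$ with $m$ odd and set $i(j) = a + 1$, so that every index $i$ appears infinitely often (once per odd $m$) and $i(j)$ is computable in $O(\log j)$ arithmetic steps.

```python
def i_of_j(j):
    """Index map for the padded enumeration B_1, B_2, ...: writing
    j = 2**a * m with m odd, return i(j) = a + 1. Then B_j = A_{i(j)},
    and each index i recurs infinitely often (once for each odd m)."""
    a = 0
    while j % 2 == 0:
        j //= 2
        a += 1
    return a + 1
```

For example, index $1$ recurs at every odd $j$, index $2$ at $j = 2, 6, 10, \dots$, and so on.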
\noindent \textit{Claim:} $L \not\in \mathbf{TIME}(n^2), L \in \mathbf{TIME}(n^3)$\\
\noindent \textit{Proof:} We culminate in the following algorithm describing $L$:
\begin{itemize}
\item Find $i(j)$, and get the original (unpadded) algorithm $A_i = B_j$ in time $O(n^3)$.
\item Try to compute $A_i(B_j)$ in time $O(n^3)$.
\item If $A_i(B_j)$ halts within that time, output $\overline{A_i(B_j)}$.
\item Else output 0.
\end{itemize}
\noindent We see that this algorithm runs in $O(n^3)$ time, since we explicitly bound the computation. We also see that for sufficiently large inputs $B_j$, the constant factor in $A_i$'s running time is small enough that the simulation finishes within the $O(n^3)$ budget, so we successfully complement the output. Since each algorithm in $\mathbf{TIME}(n^2)$ repeats an infinite number of times in the enumeration, this proves by diagonalization that $L \not\in \mathbf{TIME}(n^2)$. Therefore, $\mathbf{TIME}(n^2) \subsetneq \mathbf{TIME}(n^3)$. $\blacksquare$
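A toy end-to-end run of the four steps above, modeling algorithms as generator functions that yield once per step (all names and the encoding of $B_j$ here are illustrative assumptions, not from the notes):

```python
def i_of_j(j):
    """Padded index map: j = 2**a * (odd) -> i = a + 1, so each index recurs."""
    a = 0
    while j % 2 == 0:
        j //= 2
        a += 1
    return a + 1

def simulate(A, x, budget):
    """Universal-interpreter stand-in: run generator-style algorithm A on x
    for at most `budget` steps; return (halted, output)."""
    gen = A(x)
    try:
        for _ in range(budget):
            next(gen)
    except StopIteration as done:
        return True, done.value
    return False, None

def always0(x):
    """Toy algorithm: one step, then output 0."""
    yield
    return 0

def forever(x):
    """Toy algorithm that never halts."""
    while True:
        yield

def decide_L(j, algorithms):
    """The diagonalizer: recover A_i, simulate it on (a stand-in for) the
    description of B_j within an n^3 step budget, and complement the answer."""
    x = bin(j)[2:]                                     # toy encoding of B_j
    A = algorithms[(i_of_j(j) - 1) % len(algorithms)]  # toy finite "enumeration"
    halted, out = simulate(A, x, len(x) ** 3)
    return 1 - out if halted else 0                    # complement the diagonal
```

With the toy enumeration \texttt{[always0, forever]}, \texttt{decide\_L(3, ...)} returns $1$, complementing \texttt{always0}'s output, while \texttt{decide\_L(2, ...)} returns $0$ because the simulation of \texttt{forever} times out.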
\subsubsection{Looking Ahead}
There are some other interesting results related to the hierarchy theorems that we either aren't yet equipped to engage with or will not cover in this course.
Ladner's Theorem, often invoked in the context of the class $\textbf{NPI}$ ($\NP$-intermediate), shows that if $\P \neq \NP$, then there exist problems in $\NP$ that are neither in $\P$ nor $\NP$-complete. More generally, under suitable conditions it gives, for complexity classes $C_1 \subsetneq C_2$, an intermediate class $C_{1.5}$ with $C_1 \subsetneq C_{1.5} \subsetneq C_2$, and we can apply it repeatedly to find as many strictly contained classes as desired.
Another problem amenable to diagonalization is $\NTIME(n) \overset{?}{=} \NTIME(n^2)$. Because the complement becomes more complicated once we add nondeterminism, we can't invert the diagonal as easily as in the proofs for deterministic algorithms. However, using a technique called ``lazy diagonalization'', it has been proven that:\\
$\NTIME(t(n)) \not\subseteq$ co-$\NTIME(t(n)^2)$\\
$\NTIME(t(n)) \subsetneq \NTIME(\omega(t(n+1)))$
%%%%%
\begin{thebibliography}{1}
\bibitem{AroraBarak} Sanjeev Arora and Boaz Barak, \textit{Computational Complexity: A Modern Approach} (2009)
\bibitem{Cobham} Alan Cobham, ``The intrinsic computational difficulty of functions'' (1965)
\bibitem{Edmonds} Jack Edmonds, ``Paths, trees, and flowers'' (1965)
\bibitem{Peterson} Peterson ???
\end{thebibliography}
\end{document}