\documentclass[11pt]{article}
\usepackage{amsmath,amssymb,amsthm,color,graphicx}
\DeclareMathOperator*{\E}{\mathbb{E}}
\let\Pr\relax
\DeclareMathOperator*{\Pr}{\mathbb{P}}
\newcommand{\eps}{\varepsilon}
\newcommand{\inprod}[1]{\left\langle #1 \right\rangle}
\newcommand{\R}{\mathbb{R}}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { {\bf CS 229r: Algorithms for Big Data } \hfill #2 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{2mm}
\hbox to 5.78in { {\em #3 \hfill #4} }
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}
% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in
\parindent 0in
\parskip 1.5ex
\newcommand{\todo}[1]{\textcolor{red}{TODO: #1}}
\begin{document}
\lecture{25 --- November 26, 2013}{Fall 2013}{Prof.\ Jelani Nelson}{Thomas Steinke}
\section{Overview}
Today is the last lecture. We will finish our discussion of MapReduce by covering ``Solve-and-Sketch'' \cite{SaS}, which is very recent work. We will also say a few words about $k$-means \cite{kmpp} and other theoretical ``big data'' work that we did not cover.
\section{Solve-and-Sketch}
We will discuss the Solve-and-Sketch approach towards approximation algorithms for low-dimensional geometric graph problems in the MapReduce framework \cite{SaS}. In particular, this approach yields efficient parallel algorithms for
\begin{itemize}
\item[(a)] approximate minimum spanning tree,
\item[(b)] approximate minimum cost bipartite matching, and
\item[(c)] earthmover distance.
\end{itemize}
Our input is a set of points $T$ in $\mathbb{R}^d$. We associate these with the complete graph where the edges are weighted by the Euclidean distance (or any $\ell_p$ distance).
Today we will focus on (a).
\subsection{Earthmover Distance}
We will not say anything about earthmover distance beyond defining it.
Consider a set $A$ of points. Each point $x \in A$ has an initial mass $\mu(x) \in \mathbb{R}_+$ associated with it. Each point also has a final mass $\nu(x)$ associated with it. The cost of moving $\eta$ units of mass from point $x$ to point $y$ is $\eta \cdot ||{x-y}||$. The \emph{earthmover distance between $\mu$ and $\nu$} is the minimum total cost of a series of operations to move the mass from the initial configuration to the final configuration.
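To make the definition concrete, here is a small illustrative sketch of our own (not from the lecture): on the line, when the total initial and final masses agree, the earthmover distance reduces to summing, over each gap between consecutive points, the absolute net mass that must cross that gap.

```python
def emd_1d(points, mu, nu):
    """Earthmover distance between mass assignments mu and nu on the
    same 1-D point set (total masses are assumed equal)."""
    order = sorted(range(len(points)), key=lambda i: points[i])
    total, flow = 0.0, 0.0  # `flow` = net mass that must cross to the right
    for a, b in zip(order, order[1:]):
        flow += mu[a] - nu[a]  # surplus accumulated so far must cross this gap
        total += abs(flow) * (points[b] - points[a])
    return total

# one unit of mass moves from x = 0 to x = 3: cost 3
print(emd_1d([0.0, 1.0, 3.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]))  # -> 3.0
```

In higher dimensions no such closed form exists and one must solve a transportation linear program.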
Earthmover distance defines a metric on distributions. It is used in computer graphics. Here the points are the pixels and the initial and final configurations are the brightness values of two images. The earthmover distance is empirically a good measure of similarity between images.
\subsection{Solve-and-Sketch Approach}
The approach is to hierarchically partition the input set $T$. The root node contains all points $T$ and the leaves have one point each. There are $L$ levels and the branching factor is $c$.
For example, a quadtree gives such a hierarchical partition by partitioning $\mathbb{R}^d$ into $c=2^d$ orthants.
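As a toy illustration (our own, assuming points lie in $[0,1)^d$), the level-$\ell$ quadtree cell of a point can be read off from the first $\ell$ bits of each coordinate; each refinement splits a cell into $c = 2^d$ orthants.

```python
def quadtree_cell(point, level):
    """Cell of `point` in [0,1)^d at the given quadtree level, identified
    by its tuple of integer grid coordinates at resolution 2^level."""
    return tuple(int(x * (1 << level)) for x in point)

p, q = (0.30, 0.80), (0.26, 0.77)
print(quadtree_cell(p, 1))                           # -> (0, 1): one of 2^d = 4 orthants
print(quadtree_cell(p, 3) == quadtree_cell(q, 3))    # nearby points share a level-3 cell
```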
Our goal is to recursively compute a minimum spanning tree going up the hierarchical partition. In particular, we want to be able to contract edges and thereby reduce the number of points we need to keep track of at higher levels. However, we don't want to ``regret'' choosing an edge later.
\paragraph{Problem}
A problematic input for a plain quadtree partition of the plane is the following.
\begin{center}
\begin{tabular}{|ccc|ccc|}
\hline
&&\textbullet&\textbullet&&\\
&&&&&\\
&&\textbullet&\textbullet&&\\
\hline
&&&&&\\
&&&&&\\
&&&&&\\
\hline
\end{tabular}
\end{center}
The optimal solution joins the upper and lower pair of points with short edges and uses one long edge to join both pairs together. The hierarchical solution uses two long edges to join the points within quadrants. Thus we see that using a plain quadtree gives poor results.
\paragraph{Solution}
Instead we will use a randomly shifted and rotated quadtree. Intuitively, the problem above is very brittle: if the partition is shifted slightly, the problem disappears. A randomized quadtree ensures that nearby points are likely to end up in the same subtree.
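A quick empirical check of this intuition (our own sketch; it models only the random shift, not the rotation): under a uniform random shift of an axis-aligned grid with cells of side $\Delta$, two points $x,y$ are separated with probability at most $\sum_i |x_i - y_i|/\Delta$, i.e.\ proportional to their distance.

```python
import random

def cell(point, shift, delta):
    """Grid cell of `point` under the given shift, cells of side `delta`."""
    return tuple(int((x + s) // delta) for x, s in zip(point, shift))

def separation_rate(x, y, delta, trials=20000, seed=0):
    """Empirical probability that a uniformly shifted grid separates x and y."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        shift = [rng.uniform(0, delta) for _ in x]
        hits += cell(x, shift, delta) != cell(y, shift, delta)
    return hits / trials

x, y = (0.0, 0.0), (0.1, 0.1)
# exact separation probability here is 1 - (1 - 0.1)^2 = 0.19
print(separation_rate(x, y, delta=1.0))
```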
The overall Solve-and-Sketch algorithm locally computes a partial solution as well as some extra information, based solely on the information passed up from its $c$ children. This local computation is called the ``unit step'': on input of size $n_u$ it produces output of size $p_u(n_u)$ and runs in time $t_u(n_u)$ using space $s_u(n_u)$.
\begin{theorem}
Fix $s = (\log n)^{\Omega(d)}$. Suppose there is a unit step with $s_u(p_u(s)) \leq s^{1/3}$ and $p_u(s) \leq s^{1/3}$. Then, for $c=s^a$ ($a < 1$) and $L=O(d \log_s n)$ we can implement Solve-and-Sketch in MapReduce in $\text{poly}( \log_s n)$ rounds where each node's runtime is $s \cdot t_u(s) \text{poly}(\log n)$.
\end{theorem}
\subsection{Randomized Hierarchical Partitions}
\begin{definition}[{\cite{talwar}}] \label{RHP}
A randomized hierarchical partition $B$ is \emph{$(a,b)$-distance preserving with approximation $\gamma > 1$} ($0 < a < 1$) if, for $\Delta_\ell = \gamma \cdot a^{L - \ell} \cdot \text{diam}(T)$ and every partition $P=(P_0 \cdots P_L)$ in the support of the distribution, we have
\begin{itemize}
\item $\forall \ell \geq 0 ~~ \text{diam}(P_\ell) \leq \Delta_\ell$ and
\item $\forall x,y \in T, \ell \geq 0 ~~ \Pr[\text{$x$ and $y$ are separated at level $\ell$}] \leq b ||x-y||_2 / \Delta_\ell$.
\end{itemize}
\end{definition}
\begin{theorem}[{\cite[\S 6]{aks}}]
A randomly shifted and rotated $c$-ary quadtree is $(c^{-1/d},O(d))$-distance preserving with approximation $O(1)$.
\end{theorem}
\subsection{Unit Step for Minimum Spanning Tree}
A node is responsible for the set $C$ of points in its subtree. It computes a `spanning forest'.
\begin{itemize}
\item[Input.] $V(C) \subset C$ and a partition $\{ Q_1 \cdots Q_k \}$ of $V(C)$ corresponding to the connected components of $V(C)$ based on edges that were contracted at lower levels.
\item[1.] While $\exists i, j \in [k]$ such that $\ell_2(Q_i,Q_j) \leq \varepsilon \Delta_\ell$:
\item[1a.] Pick $u \in Q_i$ and $v \in Q_j$ such that $||u - v||_2$ is minimal.
\item[1b.] Output edge $(u,v)$ as an edge in the final spanning tree.
\item[1c.] Merge $Q_i$ and $Q_j$.
\item[2.] Output $V' \subset V(C)$ that is an $\varepsilon^2 \Delta_\ell$-cover of $V(C)$ and the induced partition $Q'$ of $V'$.
\end{itemize}
Recall that, if $(X, d)$ is a metric space, $X' \subset X$ is an $\varepsilon$-cover of $X$ if, for all $x \in X$, there exists $x' \in X'$ such that $d(x,x') \leq \varepsilon$.
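Steps 1--2 above can be sketched as follows (a simplified quadratic-time illustration in plain Python; the function name and data layout are ours, not from the paper):

```python
from itertools import combinations
from math import dist  # Euclidean distance (Python 3.8+)

def unit_step(components, eps, delta):
    """Sketch of the MST unit step on the components Q_1 .. Q_k
    (each a list of points). While two components are within
    eps*delta, output the closest cross pair as a tree edge and
    merge them; then return a greedy (eps^2*delta)-cover and the
    cover's induced partition."""
    comps = [list(c) for c in components]
    edges = []
    while True:
        best = None  # ((u, v), i, j) for the closest eligible cross pair
        for i, j in combinations(range(len(comps)), 2):
            u, v = min(((u, v) for u in comps[i] for v in comps[j]),
                       key=lambda e: dist(*e))
            if dist(u, v) <= eps * delta and (
                    best is None or dist(u, v) < dist(*best[0])):
                best = ((u, v), i, j)
        if best is None:
            break
        (u, v), i, j = best
        edges.append((u, v))   # step 1b: an edge of the final spanning tree
        comps[i] += comps[j]   # step 1c: merge Q_i and Q_j
        del comps[j]
    cover, partition = [], []  # step 2: greedy (eps^2*delta)-cover
    for comp in comps:
        reps = []
        for p in comp:
            if all(dist(p, r) > eps * eps * delta for r in reps):
                reps.append(p)
        cover += reps
        partition.append(reps)
    return edges, cover, partition

edges, cover, part = unit_step([[(0.0, 0.0)], [(0.1, 0.0)], [(5.0, 5.0)]],
                               eps=0.5, delta=1.0)
print(edges)  # -> [((0.0, 0.0), (0.1, 0.0))]
print(cover)  # -> [(0.0, 0.0), (5.0, 5.0)]
```

The two nearby points get joined and then collapsed to a single representative in the cover, which is exactly how the point set shrinks as we move up the hierarchy.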
For input points $u$ and $v$, we define $\ell_\text{cut}(u,v)$ to be the highest level in the hierarchy where $u$ and $v$ are not in the same node.
Each node does the following.
\begin{itemize}
\item[] It gets $V_1' \cdots V_c'$ and partitions $Q_1' \cdots Q_c'$ from its children.
\item[] It sets $V = \bigcup_i V_i'$ and $Q = \bigcup_i Q_i'$. (At leaves $V$ is all the points and $Q$ a partition into singletons.)
\item[] It runs the unit step with input $V$ and $Q$ to obtain output $V'$ and $Q'$.
\item[] It passes $V'$ and $Q'$ to its parent.
\end{itemize}
\subsection{Analysis Sketch}
\begin{itemize}
\item[(i)] It's clear that the output is a forest. It turns out that we get a spanning tree.
\item[(ii)] Show that there exist edge weights such that the algorithm outputs a minimum spanning tree with respect to those weights (rather than the original distance weights).
\item[(iii)] There is a way to define these weights in terms of the tree such that $$\forall u, v ~~~ ||u-v||_2 \leq w(u,v) \leq (1 + \beta)||u-v||_2 + \alpha \Delta_{\ell_\text{cut}(u,v)},$$ where $\beta = O(\varepsilon)$ and $w(u,v)$ is the weight of edge $(u,v)$. Thus $\E[w(u,v)] \leq (1 + O(\varepsilon)) ||u-v||_2 + \alpha \E[\Delta_{\ell_\text{cut}(u,v)}]$.
\item[(iv)] Because we used a randomized hierarchical partition (Definition \ref{RHP}), we have $$\E[\Delta_{\ell_\text{cut}(u,v)}] = \sum_{\ell = 0}^L \Pr[\ell_\text{cut}(u,v)=\ell] \cdot \Delta_\ell \leq \sum_{\ell = 0}^L b ||u - v||_2/\Delta_\ell \cdot \Delta_\ell = b ||u - v||_2 L.$$
\end{itemize}
\section{Other MapReduce Topics}
We were not able to cover all the work on MapReduce. Here is an incomplete list of topics.
\subsection{Sorting}
We can perform sorting in MapReduce in $O(\log_M N)$ rounds with $O(N \log_M N)$ total communication, where $M$ is the memory bound on individual mappers and reducers; see \cite{gsz}. This sorting algorithm is used as a subroutine in Solve-and-Sketch.
\subsection{$k$-means Clustering}
Given $x_1 \cdots x_n$ we wish to find $k$ centers $c_1 \cdots c_k$ that minimize $\sum_{i \in [n]} \min_{j \in [k]} ||x_i - c_j||_2^2$.
A popular method is Lloyd's algorithm:
\begin{itemize}
\item[1.] Start with initial centers $c_1 \cdots c_k$.
\item[2.] Assign each point to its nearest center, partitioning the points into clusters.
\item[3.] Move each center to the average of the points in its cluster.
\item[4.] Repeat.
\end{itemize}
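Lloyd's iteration can be sketched in a few lines (our own minimal illustration; the stopping rule, a simple fixed-point check, is our choice):

```python
from math import dist
from statistics import mean

def lloyd(points, centers, rounds=100):
    """Lloyd's algorithm: alternate cluster assignment and center updates."""
    centers = list(centers)
    for _ in range(rounds):
        # step 2: assign each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            clusters[j].append(p)
        # step 3: move each center to the mean of its cluster
        new = [tuple(map(mean, zip(*c))) if c else centers[j]
               for j, c in enumerate(clusters)]
        if new == centers:  # fixed point: no center moved, so stop
            break
        centers = new
    return centers

pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(lloyd(pts, [(0.0, 0.0), (10.0, 0.0)]))  # -> [(0.0, 0.5), (10.0, 0.5)]
```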
Lloyd's algorithm can only improve the objective function (see problem set 4 problem 2a). But it remains to choose good initial centers.
\paragraph{$k$-means$++$} The initial centers can be chosen as follows \cite{av}.
\begin{itemize}
\item[1.] Let $C$ be one random data point.
\item[2.] While $|C|