\documentclass[10pt]{article}
\usepackage{amsfonts}
\newtheorem{define}{Definition}
%\newcommand{\Z}{{\bf Z}}
%\usepackage{psfig}
\oddsidemargin=0.15in \evensidemargin=0.15in \topmargin=-.5in
\textheight=9in \textwidth=6.25in
\begin{document}
\input{preamble.tex}
\lecture{11}{October 18, 2004}{Madhu Sudan}{Elena Grigorescu}
\section{Overview}
Today we will relate Shannon's capacity to encoding and
decoding algorithms that can achieve this capacity. We will be
mainly concerned with correcting random and adversarial errors in
binary codes. We deal with the following two cases:
\begin{enumerate}
\item decoding from a fraction $p$ of random errors,
\item decoding from a fraction ${{1}\over{2}}- \epsilon$ of adversarial
errors.
\end{enumerate}
\section{Goals}
\begin{itemize}
\item Recall the random error channel $BSC_p$, in which each bit is flipped independently at random with probability $p$. Shannon's result states that
information transmission is feasible at a rate of $R=1-H(p)-\epsilon$. We will show that this is
achievable with efficient encoding and decoding.
\item In the adversarial error model we cannot obtain an optimal bound as above, so we focus
on a $( {{1}\over{2}}- \epsilon)$ fraction of flipped bits and aim to produce efficient encoding
and {\it{list}}-decoding algorithms with positive
rate ($R(\epsilon)>0$ for every $\epsilon >0$).
\end{itemize}
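For concreteness, the random error channel is easy to simulate. Below is a minimal Python sketch (the block length, seed, and value of $p$ are illustrative choices, not part of the lecture): each bit is flipped independently with probability $p$, and the empirical flip fraction concentrates around $p$.

```python
import random

# Minimal BSC_p simulation: each transmitted bit is flipped independently
# with probability p (parameters below are illustrative choices).
def bsc(codeword, p, rng):
    return [b ^ (rng.random() < p) for b in codeword]

rng = random.Random(0)               # fixed seed for reproducibility
n, p = 100000, 0.1
received = bsc([0] * n, p, rng)
flips = sum(received) / n            # empirical fraction of flipped bits
```

With the all-zeros word transmitted, \texttt{flips} is exactly the fraction of errors, and for large $n$ it is close to $p$; this concentration is what makes rates approaching $1-H(p)$ attainable.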
At this point we have already seen a coding scheme that will be
helpful in achieving our goals. As we will show here, the Forney
concatenation method will be enough for our purposes.
\section{The random error case}
Recall Forney's concatenation coding scheme. A message is first
encoded using an outer RS code, and each outer symbol is then
encoded block-wise using an inner code. From an outer RS code $[N, K, N-K]_N$ and an
inner code $[n, k, d]_2$, with $k=\log_2{N}$, we obtain an $[Nn, Kk,
\geq (N-K)d]_2$ code. The decoding is a block-by-block inner decoding followed
by an RS decoding.
Using the above as our framework, we would like to produce polynomial-time
encoding and decoding algorithms that allow us to correct
a $p$ fraction of errors.
Shannon's theorem stated:
\begin{theorem}
$\exists E:\{0, 1\}^k \rightarrow \{0, 1\}^n$ and $D:\{0, 1\}^n
\rightarrow \{0, 1\}^k$ with $k=(1-H(p)-\epsilon) n$ s.t.
$$Pr[\mbox{decoding error given } BSC_p ]< \exp(-\Omega(n)).$$
\end{theorem}
The encoding and decoding algorithms in this theorem may be assumed to
run in (at most) exponential time: by picking a linear encoding
function the encoding is efficient, while decoding is by brute-force search.
We obtain the following corollary to Shannon's theorem.
\begin{corollary}
$\exists E':\{0, 1\}^K \rightarrow \{0, 1\}^N$ and $D':\{0, 1\}^N
\rightarrow \{0, 1\}^K$ with similar parameters s.t.
$$Pr[\mbox{decoding error}]< {{1}\over{N}},$$ and $E'$, $D'$ are poly time algorithms.
\end{corollary}
\begin{proof}
To encode a message $m\in \{0, 1\}^K$, divide it into
${{K}\over{k}}$ blocks of size $k$, where $k=c\log K$ for a constant $c$ chosen below. Then use the
Shannon encoding function $E$ to encode each of these blocks into
words of size $n$, and concatenate the results. We have thus
obtained $N={{K}\over{k}}n$. Therefore ${{K}\over {N}}={{k}\over
{n}}$ so the rate is preserved by this new encoding.
We have then
$$Pr[\mbox{decoding failure of } (E', D')_K ]\leq {{K}\over{k}} Pr[ \mbox{decoding failure of } (E,D)_k ].$$
Then, by picking a large enough constant $c$ in $k=c \log K$, the
above probability can be made $\leq {{1}\over{KN}}\leq {{1}\over{N}}$.
Moreover, since each block has length $k=O(\log K)$, even
exponential-time inner encoding and decoding take $2^{O(k)}=poly(K)$
time per block, so the new $E'$ and $D'$ run in $poly(N)$ time, which is what we
hoped for.
\end{proof}
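The union-bound arithmetic in the proof can be sanity-checked numerically; the values of $p$, $\epsilon$, $c$, and $K$ below are arbitrary illustrative choices:

```python
import math

# Union bound from the proof: the failure probability of (E', D') is at most
# (K/k) * exp(-n), with k = c*log K and inner block length n = k/(1-H(p)-eps).
def H(p):  # binary entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, eps, c = 0.1, 0.05, 40      # illustrative channel/code parameters
K = 2 ** 20                    # message length
k = c * math.log2(K)           # block size, = 800 here
n = k / (1 - H(p) - eps)       # inner block length implied by the rate
N = (K / k) * n                # total encoded length
union_bound = (K / k) * math.exp(-n)
```

For these values \texttt{union\_bound} is essentially $0$ (the exponential underflows), comfortably below the $1/(KN)$ target.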
Although this result is nice, we would like to be able to decode
correctly with higher probability, namely with error exponentially small
in the number of blocks, without changing the rate by much.
\subsection{Implementation using Forney's code}
Consider a Forney code given by:
\begin{itemize}
\item outer RS-code: $[N, (1-\epsilon)N, \epsilon N]_N$
\item inner code (assuming $BSC_p$): $[n, k=(1-H(p)-\epsilon)n, d]_2$, which corrects a $p$ fraction of random errors with $\exp(-\Omega(n))$ probability of decoding failure. These requirements impose $k\geq \log_2 N$.
\end{itemize}
We have obtained a code with the following parameters:
\begin{itemize}
\item $R=(1-H(p)-\epsilon)(1-\epsilon)\geq 1-H(p)-2\epsilon$, so we had to compromise only a little in the rate.
\item $Pr[\mbox{decoding failure}]\leq Pr[\mbox{at least } {{\epsilon N}\over{2}} \mbox{ inner blocks are decoded
incorrectly}]$
$\leq {{N}\choose{{{\epsilon N}\over{2}}}} (\exp(-\Omega(n)))^{{\epsilon N}\over{2}} \leq \exp(-\Omega(nN))$ (which is what we wanted!)
\item $\mbox{Running time}= \exp(k)\, poly(N)$. In later lectures we will see that we can get running time $\exp(k)$ times linear in $N$.
\end{itemize}
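The rate inequality above holds because expanding the product leaves a positive remainder: $(1-H(p)-\epsilon)(1-\epsilon) = 1-H(p)-2\epsilon + \epsilon(H(p)+\epsilon)$. A quick numeric spot check (sample values chosen arbitrarily):

```python
import math

def H(p):  # binary entropy
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Check R = (1 - H(p) - eps)(1 - eps) >= 1 - H(p) - 2*eps on sample values.
checks = []
for p in (0.01, 0.05, 0.1, 0.2):
    for eps in (0.01, 0.05, 0.1):
        R = (1 - H(p) - eps) * (1 - eps)
        checks.append(R >= 1 - H(p) - 2 * eps)
```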
Since $k$ is a parameter intrinsic to our particular solution, it would
be nice to substitute it with something that is intrinsic to
the channel. This is one of the challenges of current research in
this area. The formal problem can be stated as follows:
\begin{theorem}
Given $BSC_p$, $\epsilon$ and $n$, design an encoding scheme $E$ and a
decoding scheme $D$ of rate $R=1-H(p)-\epsilon$ and block length
$n$, such that the encoding and decoding algorithms run in
$poly({{1}\over{\epsilon}}, n)$ time and the probability of decoding
failure is exponentially small in $n$.
\end{theorem}
This completes our discussion of the random error model.
\section{Adversarial error model}
We will be presenting partial progress towards solving the
question of correcting ${{1}\over{2}}-\epsilon$ fraction of
errors.
\subsection{List decoding from $({{1}\over{2}}-\epsilon)$ fraction of errors}
The first question to ask is for what $p$ we need list
decoding. We know that we can uniquely decode from up to a ${{d}\over {2n}}$
fraction of errors. In a previous lecture we alluded to the
following fact, known as the Plotkin bound, which gives us
insight into a bound on $p$. We will later provide two proofs of
the Plotkin bound.
\begin{theorem}
If $C=[n , k, d]_2$ such that ${{d}\over{n}}\geq {{1}\over{2}}$,
then $k\leq \log_2(2n)$.
\end{theorem}
The above bound is tight, since Hadamard codes achieve exactly
these parameters; however, their rate is very poor.
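Concretely, the Hadamard code maps $m\in\{0,1\}^k$ to the evaluations of $\langle m, x\rangle \bmod 2$ over all $x\in\{0,1\}^k$, an $[n=2^k,\, k,\, n/2]_2$ code. The sketch below (with $k=4$ chosen for illustration) verifies that every pair of distinct codewords is at distance exactly $n/2$:

```python
from itertools import product

# Hadamard encoding: the codeword of m lists <m, x> mod 2 for all x in {0,1}^k.
def hadamard(m):
    return [sum(mi * xi for mi, xi in zip(m, x)) % 2
            for x in product((0, 1), repeat=len(m))]

k = 4
n = 2 ** k
words = [hadamard(m) for m in product((0, 1), repeat=k)]
# Collect all pairwise Hamming distances between distinct codewords.
pair_dists = {sum(a != b for a, b in zip(u, v))
              for i, u in enumerate(words) for v in words[i + 1:]}
```

Here $k=\log_2 n \leq \log_2(2n)$, matching the Plotkin regime ${{d}\over{n}}\geq {{1}\over{2}}$.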
\begin{corollary}
If $E:\{0,1\}^k\rightarrow \{0,1\}^n$ and $D:\{0,1\}^n\rightarrow
\{0,1\}^k$ correct a $p\geq {{1}\over{4}}$ fraction of
adversarial errors with unique decoding, then $k\leq \log_2(2n)$.
\end{corollary}
Since we aim to correct a fraction of errors arbitrarily close to ${{1}\over{2}}$, unique
decoding is not possible, and we therefore need list
decoding.
We can now state the same goal as we did for random errors,
namely:
\begin{theorem}
Given $\epsilon>0$, design $E:\{0,1\}^k\rightarrow \{0,1\}^n$ and
$D:\{0,1\}^n\rightarrow \{\{0,1\}^k\}^l$ (with list size $l\leq poly(n)$) s.t.
\begin{itemize}
\item $D$ corrects a $({{1}\over{2}}-\epsilon)$ fraction of adversarial errors, i.e., whenever $y$ differs from $E(m)$ in at most $({{1}\over{2}}-\epsilon)n$ positions, $m$ appears in the list $D(y)$,
\item rate $R(\epsilon)>0$, and
\item $poly(n)$ running time.
\end{itemize}
\end{theorem}
Before we can construct such $E$ and $D$, we need to obtain some
helpful combinatorial results.
\subsection{Combinatorial results}
Suppose we are not interested in the running time of the encoding
and decoding functions. Can we achieve the
above requirements at all? The answer is yes, and there are two ways to
show this:
\begin{enumerate}
\item non-constructive: $\forall p\in[0, {{1}\over{2}}]$, there exists a
code of rate $1-H(p)$ that corrects a $p$ fraction of adversarial errors
with $poly(n)$-size lists. This is tight in some sense: we cannot have
rate $R=1-H(p)+\epsilon'$ while correcting a $p$ fraction of errors
with $poly(n)$-size lists, since outputting a uniformly random element of
the list would then decode correctly with probability ${{1}\over{poly(n)}}$,
contradicting Shannon's theorem (above capacity the probability of
correct decoding must be exponentially small).
\item distance vs.\ list decoding connection: take a code
of relative distance ${{d}\over{n}}={{1}\over{2}}-\epsilon^2$. Then it can be list
decoded from a ${{1}\over{2}}-\epsilon$ fraction of errors with poly-size
lists.
\end{enumerate}
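The second fact is an instance of the binary Johnson bound, under which a binary code of relative distance $\delta$ is list decodable with small lists up to radius $J(\delta)={{1-\sqrt{1-2\delta}}\over{2}}$ (this closed form is a standard fact assumed here, not derived in this lecture). Plugging in $\delta={{1}\over{2}}-\epsilon^2$ gives $J(\delta)={{1}\over{2}}-{{\epsilon}\over{\sqrt{2}}}\geq {{1}\over{2}}-\epsilon$, as a quick check confirms:

```python
import math

# Binary Johnson radius J(delta) = (1 - sqrt(1 - 2*delta)) / 2, evaluated at
# relative distance delta = 1/2 - eps^2 (standard formula, assumed here).
ok = []
for eps in (0.01, 0.05, 0.1, 0.2, 0.3):
    delta = 0.5 - eps ** 2
    J = (1 - math.sqrt(1 - 2 * delta)) / 2    # equals 1/2 - eps/sqrt(2)
    ok.append(J >= 0.5 - eps)
```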
In order to make these results algorithmic, we start with some
impossibility results.
\begin{theorem}[Plotkin]
Let $c_1, \ldots, c_m\in \{0, 1\}^n$ be codewords s.t. $\forall
i\not=j$ we have $\Delta(c_i, c_j)\geq {{n}\over{2}}$. Then $m\leq
2n$.
\end{theorem}
One can restate the above as follows:
\begin{theorem}
Let $c_1', \ldots, c_m'\in \{-1, 1\}^n$ be vectors s.t. $\forall
i\not=j$ we have $\langle c_i', c_j'\rangle \leq 0$. Then $m\leq 2n$.
\end{theorem}
Note that this conversion can be done using the encoding
$0\rightarrow 1$ and $1\rightarrow -1$ of the bits of a codeword
$c_i$ into the entries of the vector $c_i'$. Also, $\langle c_i', c_j'\rangle=\sum_{k=1}^{n}{ c_{i,k}' c_{j,k}'}=n-2\Delta(c_i, c_j)\leq
0$ if $\Delta(c_i, c_j)\geq {{n}\over{2}}$.
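The identity $\langle c_i', c_j'\rangle = n - 2\Delta(c_i, c_j)$ behind this conversion is easy to verify numerically (a minimal check on random bit vectors):

```python
import random

# Check <c_i', c_j'> = n - 2*Delta(c_i, c_j) under the map 0 -> +1, 1 -> -1:
# matching positions contribute +1, differing positions contribute -1.
rng = random.Random(1)
n = 64
ci = [rng.randint(0, 1) for _ in range(n)]
cj = [rng.randint(0, 1) for _ in range(n)]
to_pm = lambda c: [1 - 2 * b for b in c]          # 0 -> +1, 1 -> -1
inner = sum(a * b for a, b in zip(to_pm(ci), to_pm(cj)))
dist = sum(a != b for a, b in zip(ci, cj))        # Hamming distance
```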
\begin{proof}
Geometric approach:
The proof is inductive. Start with the vector $c_m'$ and consider its
normal hyperplane $H$. Project the other $m-1$ vectors onto $H$, and
say $v_1, \ldots, v_{m-1}$ are the projections. Note that at most
one of them can be $0$: two vectors projecting to $0$ would both be
negative multiples of $c_m'$ (their inner products with $c_m'$ are $\leq 0$),
and hence would have positive inner product with each other, a contradiction. Also,
note that the $v_i$s satisfy the same property as the $c_i'$s,
namely the inner product of any two is $\leq 0$.
Therefore we have obtained at least $m-2$ nonzero vectors of dimension $n-1$
satisfying the same conditions we started with. Inductively,
$m-2\leq 2(n-1)$, and so $m\leq 2n$.
\\\\
Linear Algebraic approach (to be completed in the next lecture):
\\\\
If $m\geq n+1$ then $\exists \lambda_1, \ldots, \lambda_{n+1}\in
\mathbb{R}$, not all zero, s.t. $\sum_{i=1}^{n+1} \lambda_i c_i'=0$. After
reordering, suppose $\lambda_j>0$ for $j\leq i$ and $\lambda_j\leq 0$ for $j>i$. Then, let
$v=\sum_{j\leq i}\lambda_j c_j'=-\sum_{j>i}\lambda_j c_j'$.
If $v\not=0$, then
$$0< \langle v, v\rangle =-\sum_{j\leq i,\, j'>i} \lambda_j \lambda_{j'} \langle c_j', c_{j'}'\rangle
\leq 0,$$
a contradiction: each summand has $\lambda_j>0$, $\lambda_{j'}\leq 0$, and
$\langle c_j', c_{j'}'\rangle\leq 0$.
It remains to deal with the $v=0$ case.
\end{proof}
\end{document}