\documentclass[11pt]{article}
\usepackage{latexsym}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{amsthm}
\usepackage{epsfig}
\usepackage{graphicx}
\usepackage[fleqn,tbtags]{mathtools}
\newcommand{\handout}[5]{
\noindent
\begin{center}
\framebox{
\vbox{
\hbox to 5.78in { {\bf 6.440: Essential Coding Theory } \hfill #2 }
\vspace{4mm}
\hbox to 5.78in { {\Large \hfill #5 \hfill} }
\vspace{2mm}
\hbox to 5.78in { {\em #3 \hfill #4} }
}
}
\end{center}
\vspace*{4mm}
}
\newcommand{\lecture}[4]{\handout{#1}{#2}{#3}{Scribe: #4}{Lecture #1}}
\newtheorem{theorem}{Theorem}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{observation}[theorem]{Observation}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{fact}[theorem]{Fact}
\newtheorem{assumption}[theorem]{Assumption}
\newtheorem{note}[theorem]{Note}
\newtheorem{notation}[theorem]{Notation}
% 1-inch margins, from fullpage.sty by H.Partl, Version 2, Dec. 15, 1988.
\topmargin 0pt
\advance \topmargin by -\headheight
\advance \topmargin by -\headsep
\textheight 8.9in
\oddsidemargin 0pt
\evensidemargin \oddsidemargin
\marginparwidth 0.5in
\textwidth 6.5in
\parindent 0in
\parskip 1.5ex
%\renewcommand{\baselinestretch}{1.25}
\begin{document}
\lecture{1 --- February 6, 2013}{Spring 2013}{Prof.\ Madhu Sudan}{Badih Ghazi}
\section{Administrative}
\begin{enumerate}
\item Instructor: Prof. Madhu Sudan; Email: madhu@mit.edu
\item Course website: http://people.csail.mit.edu/madhu/ST13/
\item Get added to the mailing list.
\item Course Requirements:
\begin{enumerate}
\item Hand in $3$-$4$ problem sets.
\item Scribe at least one lecture.
\item Complete a course project.
\end{enumerate}
\end{enumerate}
\section{History}
Two seminal papers in the field of coding theory are those of Claude Shannon and Richard Hamming, namely:
\begin{enumerate}
\item The 1948 paper ``A mathematical theory of communication'' \cite{shannon1948} of Shannon where he defined the notion of information.
\item The 1950 paper ``Error detecting and error correcting codes'' \cite{hamming1950error} of Hamming.
\end{enumerate}
The two papers are closely interrelated, but they take different perspectives on what an error is and how to correct it. In the Shannon model, the errors are drawn from a probability distribution, whereas in the Hamming model they are chosen adversarially. In the rest of this lecture, we will introduce the Hamming model. First, we give an example with a concrete setting of parameters before giving the formal definitions.
\section{Example}
Assume that we have a storage medium that can store a block of $63$ bits and that in the course of one week, one arbitrary (adversarially chosen) bit gets flipped. We would like to have a mechanism that determines which bit got flipped and figures out what the original piece of information was.
\subsection{The repetition code}
In order to encode a message, the repetition code repeats each symbol three times. For instance, the message $[a ~ b ~ c ~ d]$ is encoded by $[a ~ a ~ a ~ b ~ b ~ b ~ c ~ c ~ c ~ d ~ d ~ d]$. Now, assume that an arbitrary bit of the codeword gets flipped. Then, by taking a majority vote in each of the $4$ triplets, we can recover the original message. Note that for every bit of information, we store $3$ bits on the storage device, so the rate of this code is $1/3$. Note that the repetition code can recover some patterns of $2$ bit flips (in fact, it can recover all patterns that have at most one bit flipped in each block) but it cannot recover \emph{all} patterns of $2$ bit flips. Before the works of Hamming and Shannon, the common belief was that the repetition code was roughly the best code that one can construct. In his 1950 paper, Hamming gave $2$ schemes that improve on the repetition code and that we describe next.
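The repetition code above can be sketched in a few lines of Python (a minimal illustration, not from the lecture; function names are ours):

```python
# 3-fold repetition code over bits: encode by repeating each bit three
# times, decode by taking a majority vote within each triplet.
def rep_encode(msg):
    """[a, b] -> [a, a, a, b, b, b]."""
    return [bit for bit in msg for _ in range(3)]

def rep_decode(word):
    """Majority vote per triplet; recovers the message as long as at
    most one bit per triplet was flipped."""
    return [int(sum(word[i:i + 3]) >= 2) for i in range(0, len(word), 3)]

msg = [1, 0, 1, 1]
cw = rep_encode(msg)      # 12 stored bits for 4 message bits: rate 1/3
cw[5] ^= 1                # an adversary flips one arbitrary bit
assert rep_decode(cw) == msg
```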
\subsection{A better code due to Hamming}
We divide the storage space of $63$ bits into $9$ blocks of $7$ bits each. Each block of $7$ bits will encode a separate $4$-bit message as follows:
\begin{equation*}
[a ~ b ~ c ~ d] \begin{bmatrix} 1 ~ 0 ~ 0 ~ 0 ~ 1 ~ 1 ~ 0\\ 0 ~ 1 ~ 0 ~ 0 ~ 1 ~ 0 ~ 1\\ 0 ~ 0 ~ 1 ~ 0 ~ 0 ~ 1 ~ 1\\ 0 ~ 0 ~ 0 ~ 1 ~ 1 ~ 1 ~ 1\end{bmatrix} = [a ~ b ~ c ~ d ~ (a \oplus b \oplus d) ~ (a \oplus c \oplus d) ~ (b \oplus c \oplus d)]
\end{equation*}
Note that if we take $2$ different bit patterns $(a,b,c,d)$ and $(a',b',c',d')$, the corresponding $7$-bit blocks will differ in at least $3$ bits. So this code can correct one bit flip. Moreover, it uses $7$ bits of storage for every $4$ bits of information, so its rate is $4/7$, which is an improvement over the repetition code. Is this the highest rate code that can correct one bit flip? It turns out that there is a better code, which we describe next.
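The claim that distinct codewords differ in at least $3$ bits can be checked by brute force over all $2^4$ messages (a sketch in Python, not from the lecture; names are ours):

```python
# Encode with the 4x7 generator matrix from the text, then verify by
# brute force that distinct codewords differ in at least 3 positions.
from itertools import product

G = [[1, 0, 0, 0, 1, 1, 0],
     [0, 1, 0, 0, 1, 0, 1],
     [0, 0, 1, 0, 0, 1, 1],
     [0, 0, 0, 1, 1, 1, 1]]

def encode(msg):
    """Return msg * G over F_2 for a length-4 bit list msg."""
    return [sum(msg[i] * G[i][j] for i in range(4)) % 2 for j in range(7)]

codewords = [encode(list(m)) for m in product([0, 1], repeat=4)]
min_dist = min(sum(a != b for a, b in zip(x, y))
               for x in codewords for y in codewords if x != y)
assert min_dist == 3   # distance 3, so the code corrects one bit flip
```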
\subsection{The Hamming code with block length $63$}
Hamming constructed a matrix $G \in \mathbb{F}_{2}^{57 \times 63}$ s.t. for all distinct $x,y \in \mathbb{F}_{2}^{57}$, the distance between $xG$ and $yG$ is at least $3$. Thus, this code can correct one bit flip. Moreover, its rate is $57/63$, which is an improvement over both previous codes. Remarkably, Hamming proved that this code is optimal in the sense that we cannot encode more than $2^{57}$ messages using $63$ bits of space while requiring that every $2$ of the codewords differ in at least $3$ bits.\\
Moreover, Hamming constructed a matrix $H \in \mathbb{F}_{2}^{63 \times 6}$ s.t. for all $y \in \mathbb{F}_{2}^{63}$, $yH = 0 $ if and only if $y$ is a codeword. It turns out that the matrix $H$ has a special form; its $63$ rows consist of all non-zero binary strings of length $6$. Thus, the matrix $H$ has the following two nice features:
\begin{enumerate}
\item $H$ can be used to determine whether a given element $y$ of $\mathbb{F}_{2}^{63}$ is a codeword or not. We just multiply $y$ by $H$ and check whether the product is $0$ or not.
\item If only the $i$th bit of a codeword is flipped, then the product $yH$ will equal the $i$th row of $H$, which is the binary representation of the integer $i$, so the flipped position can be read off directly.
\end{enumerate}
The matrix $G$ is usually called the generator matrix and the matrix $H$ is called the parity check matrix.
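The decoding procedure described above can be sketched for the length-$63$ code (a Python illustration, not from the lecture; the least-significant-bit-first index convention is our choice):

```python
# Syndrome decoding for the length-63 Hamming code: row i of H
# (1-indexed) is the 6-bit binary representation of i, so flipping
# bit i of a codeword makes the syndrome y*H spell out i in binary.
m = 6
n = 2**m - 1                  # block length 63
H = [[(i >> j) & 1 for j in range(m)] for i in range(1, n + 1)]

def syndrome(y):
    """Return y * H over F_2 for y in F_2^63."""
    return [sum(y[i] * H[i][j] for i in range(n)) % 2 for j in range(m)]

y = [0] * n                   # the all-zero codeword
y[41] ^= 1                    # flip the bit at 0-indexed position 41
s = syndrome(y)
flipped = sum(bit << j for j, bit in enumerate(s))
assert flipped == 42          # the 1-indexed position of the flip
```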
\section{Formal definitions}
We now formalize the notions introduced in the previous section.
\subsection{Codes, distance and error correction}
\begin{definition}(Error correcting code)\\
Let $n$ be a positive integer and $\Sigma$ a finite set. An error correcting code $C$ of block length $n$ over the alphabet $\Sigma$ is a subset of $\Sigma^n$. $\Sigma^n$ is usually called the ``ambient space''.
\end{definition}
\begin{definition}(Hamming distance)\\
Given $x,y \in \{0,1\}^n$, the Hamming distance between $x$ and $y$ is given by $\Delta(x,y) = |\{i \in [n]: x_{i} \neq y_{i}\}|$.
\end{definition}
\begin{proposition}
The Hamming distance $\Delta$ is a metric, namely it satisfies:
\begin{enumerate}
\item Non-negativity: $\Delta(x,y) \geq 0$ for all $x,y \in \{0,1\}^n$.
\item Symmetry: $\Delta(x,y) = \Delta(y,x)$ for all $x,y \in \{0,1\}^n$.
\item Triangle inequality: $\Delta(x,y) + \Delta(y,z) \geq \Delta(x,z)$ for all $x,y,z \in \{0,1\}^n$.
\item $\Delta(x,y) = 0$ if and only if $x =y$.
\end{enumerate}
\end{proposition}
Thus, the ambient space $\Sigma^n$ together with the Hamming distance form a metric space. This observation allows us to think about codes not only as combinatorial objects but also as geometric objects. This geometric picture will be very useful when we derive bounds on codes. We next define the notion of distance of a code.
\begin{definition}(Distance of a code)\\
The distance of a code $C$ is defined by $\Delta(C) = \underset{x,y \in C, \atop x \neq y}{\operatorname{min}}{~ \Delta(x,y)}$.
\end{definition}
\begin{definition}$($$(n,k,d)_{q}$-code$)$\\
Let $n,k,d,q$ be positive integers. An $(n,k,d)_{q}$-code $C$ is a subset of $\Sigma^n$ where
\begin{enumerate}
\item $\Sigma$ is a finite set called the alphabet of the code and $|\Sigma|=q$.
\item $\Delta(C) \geq d$.
\item $|C| \geq q^k$.
\end{enumerate}
\end{definition}
\begin{note}
We would like an $(n,k,d)_{q}$-code to have small $n$, large $k$ and large $d$. Strictly speaking, there is no requirement on $q$. However, given a code with a certain alphabet size, we are usually able to construct codes with the same performance but with larger alphabet sizes. This leads us to try to decrease $q$.
\end{note}
\begin{notation}
For every $x \in \{0,1\}^n$ and $t \in \{0,1,\dots,n\}$, the Hamming ball $B(x,t)$ centered at $x$ and of radius $t$ is given by
\begin{equation*}
B(x,t) = \{y \in \{0,1\}^n: \Delta(x,y) \le t\}
\end{equation*}
\end{notation}
\begin{definition}(Error correction)\\
Let $e \in \{0,1,\dots,n\}$. A code $C$ is said to correct $e$ errors if for all $y \in \{0,1\}^n$, $|B(y,e) \cap C| \le 1$.
\end{definition}
\begin{proposition}
A code $C$ of distance $d$ corrects at least $\lfloor (d-1)/2 \rfloor$ errors. Conversely, a code that corrects $e$ errors has distance at least $2e+1$.
\end{proposition}
\begin{proof}
Since $C$ has distance $d$, the Hamming balls $B(x,\lfloor (d-1)/2 \rfloor)$ and $B(y,\lfloor (d-1)/2 \rfloor)$ are disjoint for any distinct $x,y \in C$. Thus, for all $y \in \{0,1\}^n$, $|B(y,\lfloor (d-1)/2 \rfloor) \cap C| \le 1$. Conversely, if a code corrects $e$ errors, then for any distinct $x,y \in C$, $B(x,e)$ and $B(y,e)$ are disjoint, which implies that $\Delta(x,y) \geq 2e+1$.
\end{proof}
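The definition and proposition can be checked exhaustively on a tiny example (a sketch, not from the lecture):

```python
# C = {000, 111} has distance 3, so by the proposition it corrects
# floor((3-1)/2) = 1 error: every Hamming ball of radius 1 in the
# ambient space {0,1}^3 contains at most one codeword.
from itertools import product

C = [(0, 0, 0), (1, 1, 1)]

def dist(x, y):
    """Hamming distance between equal-length tuples."""
    return sum(a != b for a, b in zip(x, y))

for y in product([0, 1], repeat=3):
    assert sum(dist(y, c) <= 1 for c in C) <= 1
```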
Most of the codes that we will study in this course will be linear codes, which we now define.
\begin{definition}(Linear codes)\\
A code $C \subseteq \Sigma^n$ is said to be linear if $\Sigma = \mathbb{F}_{q}$ and $C$ is an $\mathbb{F}_{q}$-vector space.
\end{definition}
\begin{notation}
We say that $C$ is an $[n,k,d]_{q}$ code to mean that $C$ is an $(n,k,d)_{q}$ linear code.
\end{notation}
\begin{proposition}
If $C$ is an $[n,k,d]_{q}$ code, then
\begin{enumerate}
\item\label{le:first_eq} There exist linearly independent vectors $b_1,\dots,b_k \in \mathbb{F}_{q}^{n}$ s.t. $C = \{ xG ~ | ~ x \in \mathbb{F}_{q}^{k}\}$ where
\begin{equation*}
G = \begin{bmatrix}
b_1^{T}\\
b_2^{T} \\
\vdots\\
b_k^{T}
\end{bmatrix}\in \mathbb{F}_{q}^{k \times n}
\end{equation*} is said to be a generator matrix of $C$.
\item\label{le:sec_eq} $\Delta(C) = \underset{x \in C \atop x \neq 0}{\operatorname{min}}{~ wt(x)}$ where $wt(x)$ is the number of non-zero coordinates of $x$.
\item\label{le:third_eq} There exists a matrix $H \in \mathbb{F}_{q}^{n \times (n-k)}$, called a parity check matrix of $C$, s.t. for all $x \in C$, $xH = 0$.
\end{enumerate}
\end{proposition}
\begin{proof}
The first part follows from the fact that $C$ is a $k$-dimensional $\mathbb{F}_{q}$-subspace of $\mathbb{F}_{q}^{n}$. Since $C$ is a linear code, $0 \in C$ and if $x,y \in C$, then $x-y \in C$. Moreover, $\Delta(x,y) = wt(x-y)$. This proves the second part. To prove the third part, note that since $G \in \mathbb{F}_{q}^{k \times n}$ has rank $k$, there exists $H \in \mathbb{F}_{q}^{n \times (n-k)}$ of rank $n-k$ s.t. $GH = 0$; the columns of $H$ form a basis of the right kernel of $G$.
\end{proof}
\subsection{The Hamming bound}
Hamming proved the following relation between the parameters of any code.
\begin{theorem}(The Hamming bound)\\
Any $(n,k,d)_{q}$ code must satisfy \begin{equation}\label{le:Hamming_bound}
q^{k}Vol(n,\lfloor (d-1)/2 \rfloor) \le q^n
\end{equation}
where $Vol(n,\lfloor (d-1)/2 \rfloor)$ is the volume of a ball of radius
$\lfloor (d-1)/2 \rfloor$ in $\Sigma^n$.\end{theorem}
\begin{proof}
Consider the $q^k$ balls of radius $\lfloor (d-1)/2 \rfloor$ centered at the $q^k$ codewords of the code in the ambient space $\Sigma^n$ (where $|\Sigma|=q$). Since the code has distance $d$, these balls are pairwise disjoint. Their total volume is therefore at most the volume $q^n$ of the ambient space, and the theorem follows.
\end{proof}
\subsection{The general Hamming code}
Hamming codes are $[2^{m}-1,2^{m}-m-1,3]_{2}$ codes that are constructed by setting the rows of the parity check matrix $H$ to all non-zero binary strings of length $m$.
\begin{note} ~
\begin{enumerate}
\item The Hamming code is an optimal code of distance $3$ in the sense that it satisfies the Hamming bound with equality. This follows from equation (\ref{le:Hamming_bound}) with $q=2$ and by noting that the volume of any ball in $\Sigma^n$ of radius $1$ is $n+1$.
\item Hamming's construction can be used to obtain codes of distance $4$ (e.g., by appending an overall parity-check bit, which yields the extended Hamming code).
\end{enumerate}
\end{note}
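The optimality claim is easy to confirm numerically (a sketch, not from the lecture): since a radius-$1$ ball in $\{0,1\}^n$ has volume $n+1$, the Hamming bound holds with equality for every $m$.

```python
# Verify that the [2^m - 1, 2^m - m - 1, 3] Hamming codes meet the
# Hamming bound with equality: 2^k * Vol(n, 1) = 2^n, where
# Vol(n, 1) = n + 1 for binary codes.
for m in range(2, 10):
    n = 2**m - 1
    k = n - m
    assert 2**k * (n + 1) == 2**n
```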
\subsection{Asymptotics of codes}
\begin{definition}(Rate and relative distance of a family of codes)\\
The rate $R$ and relative distance $\delta$ of a family of $(n,k,d)_{q}$-codes are given by
\begin{equation*}
R = \liminf_{n \to \infty} \frac{k}{n}
\end{equation*}
\begin{equation*}
\delta = \liminf_{n \to \infty} \frac{d}{n}
\end{equation*}
\end{definition}
Note that $\delta,R \in [0,1]$. A main component of the course will be concerned with determining the best tradeoffs between $R$ and $\delta$. Note that we can represent any family of codes by the point $(\delta,R)$ in the plane. We are interested in the set of all points $(\delta,R)$ for which there exists a family of codes with rate $R$ and relative distance $\delta$. In later lectures, we will prove upper bounds on $R$ in terms of $\delta$, which will rule out certain regions of the plane as ``forbidden''. For instance, we will show that for any family of codes, $R+\delta \le 1$. Moreover, we will prove existential results which assert that certain points of the plane are ``achievable''.
\begin{note} ~
\begin{enumerate}
\item Our knowledge of the achievable region is not complete. Closing the gap between the achievable and forbidden regions is a main open problem in coding theory.
\item We also study families that are not on the boundary of the achievable region (in the sense that there exists a family with both a larger rate and a larger relative distance), since they may offer algorithmic advantages in terms of encoding and decoding.
\end{enumerate}
\end{note}
\bibliography{lect01}
\bibliographystyle{alpha}
\end{document}