\documentclass[11pt]{article}
\usepackage{fullpage}
\usepackage{xcolor,amsmath,amssymb,amsthm,graphicx,float}
\input{preamble.tex}
% Commands
\newcommand{\vocab}[1]{\textbf{\color{blue} #1}}
\newtheorem{note}[theorem]{Note}
\newtheorem{example}[theorem]{Example}
\newcommand{\eps}{\varepsilon}
\newcommand{\CC}{\mathbb C}
\newcommand{\FF}{\mathbb F}
\newcommand{\NN}{\mathbb N}
\newcommand{\QQ}{\mathbb Q}
\newcommand{\RR}{\mathbb R}
\newcommand{\ZZ}{\mathbb Z}
% Alec's Commands
% Symbols
\newcommand{\subeq}{\subseteq}
\newcommand{\supeq}{\supseteq}
\newcommand{\subneq}{\subsetneq}
\newcommand{\supneq}{\supsetneq}
\newcommand{\tri}{\bigtriangleup}
\newcommand{\ltri}{\triangleleft}
\newcommand{\ltrieq}{\trianglelefteq}
\newcommand{\rtri}{\triangleright}
\newcommand{\rtrieq}{\trianglerighteq}
\newcommand{\bvert}{\bigg\rvert}
\newcommand{\nab}{\nabla}
\newcommand{\asc}{\nearrow}
\newcommand{\sr}{\stackrel}
\newcommand{\ob}{\overbrace}
\newcommand{\ub}{\underbrace}
\newcommand{\ua}{\uparrow}
\newcommand{\da}{\downarrow}
\newcommand{\la}{\leftarrow}
\newcommand{\lra}{\leftrightarrow}
\newcommand{\lla}{\longleftarrow}
\newcommand{\llra}{\longleftrightarrow}
\newcommand{\es}{\emptyset}
\newcommand{\shuf}{\shuffle}
\newcommand{\ang}{\angle}
\newcommand{\ot}{\otimes}
\newcommand{\te}{\text}
\newcommand{\for}{\forall}
% Operators
\DeclareMathOperator{\D}{d}
\DeclareMathOperator{\sgn}{sgn}
\DeclareMathOperator{\cont}{cont}
\DeclareMathOperator{\type}{type}
\DeclareMathOperator{\Mod}{\bmod}
\DeclareMathOperator{\sh}{shape}
\DeclareMathOperator{\ch}{ch}
\DeclareMathOperator{\cha}{char}
\DeclareMathOperator{\gr}{gr}
\DeclareMathOperator{\Gr}{Gr}
\DeclareMathOperator{\Fl}{Fl}
\DeclareMathOperator{\spa}{span}
\DeclareMathOperator{\sn}{sn}
\DeclareMathOperator{\cn}{cn}
\DeclareMathOperator{\dn}{dn}
\DeclareMathOperator{\am}{am}
\DeclareMathOperator{\Av}{Av}
\DeclareMathOperator{\Des}{Des}
\DeclareMathOperator{\des}{des}
\DeclareMathOperator{\SYT}{SYT}
\DeclareMathOperator{\Mat}{Mat}
\DeclareMathOperator{\supp}{supp}
\DeclareMathOperator{\Supp}{Supp}
\DeclareMathOperator{\ann}{ann}
\DeclareMathOperator{\Ann}{Ann}
\DeclareMathOperator{\Ad}{Ad}
\DeclareMathOperator{\Vol}{Vol}
\DeclareMathOperator{\Int}{Int}
\DeclareMathOperator{\codim}{codim}
% Hats
\newcommand{\til}{\Tilde}
% Letters
\newcommand{\alp}{\alpha}
\newcommand{\bet}{\beta}
\newcommand{\gam}{\gamma}
\newcommand{\Gam}{\Gamma}
\newcommand{\iot}{\iota}
\newcommand{\kap}{\kappa}
\newcommand{\Kap}{\Kappa}
\newcommand{\lam}{\lambda}
\newcommand{\Lam}{\Lambda}
\newcommand{\vheta}{\vartheta}
\newcommand{\ome}{\omega}
\newcommand{\Ome}{\Omega}
\newcommand{\sig}{\sigma}
\newcommand{\vhi}{\varphi}
\newcommand{\zet}{\zeta}
% Blackboard
\newcommand{\DD}{\mathbb D}
\newcommand{\EE}{\mathbb E}
\newcommand{\GG}{\mathbb G}
\newcommand{\HH}{\mathbb H}
\newcommand{\PP}{\mathbb P}
\newcommand{\TT}{\mathbb T}
\newcommand{\one}{\mathbbm 1}
% Fraktor
\newcommand{\fa}{\mathfrak a}
\newcommand{\fb}{\mathfrak b}
\newcommand{\fc}{\mathfrak c}
\newcommand{\fg}{\mathfrak g}
\newcommand{\fm}{\mathfrak m}
\newcommand{\fo}{\mathfrak o}
\newcommand{\fp}{\mathfrak p}
\newcommand{\fq}{\mathfrak q}
\newcommand{\fr}{\mathfrak r}
\newcommand{\fR}{\mathfrak R}
\newcommand{\fs}{\mathfrak s}
\newcommand{\fS}{\mathfrak S}
\newcommand{\fu}{\mathfrak u}
% Text Numbers
\newcommand{\negone}{\text{-1}}
\newcommand{\negtwo}{\text{-2}}
\newcommand{\negthree}{\text{-3}}
\newcommand{\negfour}{\text{-4}}
\newcommand{\negfive}{\text{-5}}
\newcommand{\ten}{10}
\newcommand{\eleven}{11}
% Script
\newcommand{\sA}{\mathscr A}
\newcommand{\sB}{\mathscr B}
\newcommand{\sC}{\mathscr C}
\newcommand{\sH}{\mathscr H}
\newcommand{\sR}{\mathscr R}
\newcommand{\sS}{\mathscr S}
% Brackets
\newcommand{\bigp}[1]{\left( #1 \right)} % (x)
\newcommand{\bigb}[1]{\left[ #1 \right]} % [x]
\newcommand{\bigc}[1]{\left\{ #1 \right\}} % {x}
\newcommand{\biga}[1]{\left\langle #1 \right\rangle} %
% Probability
\newcommand{\Var}[1]{\mathrm{Var}\left[ #1 \right]}
\newcommand{\Cov}[1]{\mathrm{Cov}\left[ #1 \right]}
% Calculus
\newcommand{\p}[1]{\partial #1}
\newcommand{\pd}[2]{\frac{\partial #1}{\partial #2}}
\newcommand{\pdi}[2]{\partial #1 / \partial #2}
% Complexity
\DeclareMathOperator{\Unif}{Uniform}
% Combinatorics
\DeclareMathOperator{\row}{row}
\DeclareMathOperator{\Par}{Par}
% Pictures
\newcommand{\emptybox}[2][\textwidth]{%
\begingroup
\vspace{0.1in}
\setlength{\fboxsep}{-\fboxrule}%
\noindent\framebox[#1]{\rule{0pt}{#2}}%
\vspace{0.1in}
\endgroup
}
\begin{document}
\handout{CS 229r Essential Coding Theory}{January 29, 2020}{Instructor: Madhu Sudan}{Scribe: Alec Sun}{Lecture 2}
\section{Introduction}
The zero-th problem set will be due on Friday. It is worth zero credit, but you are expected to work on the problems. Feedback on solution write-ups will likely be provided.
Today we will talk about the following topics:
\begin{itemize}
\item We will talk about Hamming's contributions, in particular the notion of distance in a code, as well as thinking of such a code as a set of balls. We will also talk about bounds on rate for these codes. All of these things came out of a single paper by Hamming.
\item Then we will talk about Shannon's contributions. We will introduce the concept of Shannon capacity and also prove a converse to Shannon's Theorem.
\end{itemize}
\section{Distances in Hamming Codes}
Last class we represented a coding function as $$E:\Sigma^k \to \Sigma^n,$$ where $\Sigma$ is the underlying finite set, also known as the \vocab{alphabet}. Denote the size $\abs{\Sigma}$ of the alphabet by $q.$ We also considered a decoding function $$D:\Sigma^n \to \Sigma^k.$$ We will not stress today whether $D$ serves to detect errors or to correct them. Recall the following definitions.
\begin{definition}
A \vocab{code} is defined as a set $$C = \bigc{ E(m)\mid m\in \Sigma^k}.$$
\end{definition}
\begin{definition}
An element of $\Sigma^k$ is called a \vocab{message}. The message length $k$ is always relative to the size of the alphabet $\Sigma.$
\end{definition}
\begin{definition}
We denote by $n$ the \vocab{block length}, or simply \vocab{length}.
\end{definition}
\begin{definition}
Define the \vocab{Hamming distance} $$\Delta(x,y) = \#\{\text{coordinates where }x\text{ and }y\text{ differ}\}.$$ This is a valid metric; in particular, it satisfies the triangle inequality.
\end{definition}
\begin{definition}
Define the \vocab{distance} of a code $C$ as $$d = \Delta(C) = \min_{\substack{ x,y\in C \\ x\neq y}} \Delta(x,y).$$
\end{definition}
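To make these definitions concrete, here is a small Python sketch (the function names are my own, not from the lecture) that computes the Hamming distance and, by brute force over all pairs, the distance of a code:

```python
from itertools import combinations

def hamming_distance(x, y):
    """Delta(x, y): number of coordinates where x and y differ."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y))

def code_distance(C):
    """Delta(C): minimum distance over all pairs of distinct codewords."""
    return min(hamming_distance(x, y) for x, y in combinations(C, 2))

# The binary repetition code {000, 111} has distance 3.
print(code_distance(["000", "111"]))  # 3
```

The brute-force computation takes $\binom{\abs{C}}{2}$ comparisons, so it is only feasible for small codes.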
The four parameters $n,k,d,q$ in the definition of a code above define a $(n,k,d)_q$-code. If $q$ is suppressed, it is assumed that the code is binary.
\begin{note}
Here is a special case of a $(n,k,d)_q$-code. If $\Sigma$ is a field and the code $C$ is linear, namely the encoding function $E$ is linear, then we use square brackets and call $C$ a $[n,k,d]_q$-code.
\end{note}
One combinatorial question we will ask is the following: Which $(n,k,d)_q$ codes are achievable?
\begin{remark}
Here are some easy observations:
\begin{enumerate}
\item A $(n,k,d)_q$-code can be extended to a $(n+1,k,d)_q$-code.
\item A $(n,k,d)_q$-code can be modified to a $(n,k-1,d)_q$-code.
\item A $(n,k,d)_q$-code can be modified to a $(n,k,d-1)_q$-code.
\end{enumerate}
There is no general monotonicity in the achievability of $q.$ In general, one might expect that a larger $q$ makes a code easier to achieve; hence, one desires to construct codes with $q$ small. As for the other parameters, the general goal will be to minimize $n,$ maximize $k,$ maximize $d,$ and minimize $q.$
\end{remark}
The Hamming code we constructed last class produced a family of codes depending on a parameter $\ell$ with $n =2^\ell - 1, k = 2^\ell - \ell - 1.$ In particular, we showed the existence of a $[n,k,3]_2$-code. Put another way, for infinitely many $n,$ there exists a $[n, n-\log(n+1),3]_2$-code. The size of the code is $$\abs{C} = 2^{n-\log (n+1)} = \frac{2^n}{n+1}.$$
\begin{remark}
Why is the constant 3 here? Last time we stated that a $t$-error correcting code is equivalent to a $2t$-error detecting code, and detecting $2t$ errors requires a distance of at least $2t+1.$ Hence for the 1-error correcting code we presented last time, the distance is at least 3.
\end{remark}
\section{Balls in Hamming Codes}
We will define the concept of a \vocab{ball} in order to prove an upper bound on the size of a code $C\subeq \FF_2^n.$
\begin{definition}
Define a ball of radius $r$ around $x$ to be $$\text{Ball}(x,r) = \bigc{ y\in \Sigma^n \mid \Delta(x,y)\le r}.$$ Define the \vocab{volume} of the ball as the number of points in the ball, namely $$\text{Vol}_q (n,r) = \abs{\text{Ball}(x,r)}.$$ In this definition we have implicitly used the fact that all balls with the same radius have the same volume, hence we can remove the dependence of $\Vol$ on the center of the ball $x.$
\end{definition}
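The volume has an explicit formula: a point at distance exactly $i$ from the center is obtained by choosing which $i$ coordinates differ and, for each such coordinate, one of the $q-1$ other symbols. A small Python sketch of this count (function name mine):

```python
from math import comb

def vol(q, n, r):
    """Vol_q(n, r) = sum over i <= r of C(n, i) * (q - 1)^i."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))

print(vol(2, 7, 1))  # 8: the center plus 7 single-bit flips
print(vol(2, 5, 5))  # 32 = 2^5: the whole space
```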
\begin{figure}[H]
\centering
\includegraphics[scale=0.7]{hamming-balls.PNG}
\caption{In the Hamming code context, the general picture to keep in mind is that of a set of disjoint balls in $\Sigma^n.$ Balls of radius $t$ around the codewords must be disjoint in order to correct $t$ errors.}
\end{figure}
\begin{lemma}[Hamming Bound]
Let $C$ be a code of distance $d$ in $\Sigma^n,$ where $\abs{\Sigma} = q.$ Then $$\abs{C} \le \frac{q^n}{\text{Vol}_q\bigp{n,\floor{\frac{d-1}{2}}}}.$$
\end{lemma}
\begin{example}
Consider the specific case of distance $d\ge 3.$ Note that $\text{Vol}_2(n,1) = n+1$ because there are $n$ bits that could be switched, and we also have to count the center of the ball. Hence the lemma tells us that if $C\subeq \FF_2^n$ is a code of distance at least 3, then $$\abs{C}\le \frac{2^n}{n+1}.$$ In other words, the Hamming code we constructed last class is optimal with respect to size of the code.
\end{example}
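As a numerical sanity check, for lengths $n = 2^\ell - 1$ the Hamming bound for $d=3$ exactly matches the size $2^{n-\ell}$ of the Hamming code (codes meeting the bound with equality are called perfect). A quick sketch:

```python
from math import comb

def vol2(n, r):
    """Volume of a radius-r Hamming ball in {0,1}^n."""
    return sum(comb(n, i) for i in range(r + 1))

def hamming_bound(n, d):
    """Upper bound on |C| for a binary code of length n and distance d."""
    return 2 ** n // vol2(n, (d - 1) // 2)

for l in [2, 3, 4]:
    n = 2 ** l - 1
    assert hamming_bound(n, 3) == 2 ** (n - l)
    print(n, hamming_bound(n, 3))  # (3, 2), (7, 16), (15, 2048)
```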
So far we have constructed the following explicit examples of $[n,k,d]_q$-codes:
\begin{itemize}
\item A $[n,n,1]_2$-code exists by simply considering the identity function $E:\{0,1\}^n\to \{0,1\}^n,$ and this is the best possible.
\item A $[n,n-1,2]_2$-code exists by defining \begin{align*}
E:\{0,1\}^{n-1} &\to \{0,1\}^n \\
E(m_1\circ \cdots \circ m_{n-1}) &= m_1\circ \cdots \circ m_{n-1} \circ \bigp{\oplus_{i=1}^{n-1} m_i}
\end{align*}
and this is the best possible.
\item A $[n,n-\log(n+1),3]_2$-code exists, and this is the best possible.
\end{itemize}
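The parity-bit construction in the second item above can be written out directly. The sketch below also verifies by brute force that the resulting $[4,3,2]_2$-code indeed has distance 2:

```python
from itertools import product

def parity_encode(m):
    """Append the XOR (sum mod 2) of all message bits to the message."""
    return list(m) + [sum(m) % 2]

# all 8 codewords of the [4, 3, 2] parity code
code = [tuple(parity_encode(list(m))) for m in product([0, 1], repeat=3)]
dist = min(sum(a != b for a, b in zip(x, y))
           for i, x in enumerate(code) for y in code[i + 1:])
print(dist)  # 2: flipping a single bit always changes the parity
```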
One initially surprising fact is that a $[n,n-\log(n+1)-1,4]_2$-code exists by simply adding a parity check bit to the distance 3 code. The paradigm is that the appearance of the expression $$\floor{\frac{d-1}{2}}$$ in the Hamming bound $$\abs{C} \le \frac{q^n}{\text{Vol}_q\bigp{n,\floor{\frac{d-1}{2}}}}$$ means the bound weakens only every other increment of $d$; correspondingly, $k$ needs to decrease by a significant amount in terms of $n$ only after every pair of distances $d.$
In general, there is the following lemma.
\begin{lemma}\label{d->d+1-code}
If $d$ is odd, then a $(n,k,d)_2$-code can be modified to create a $(n+1,k,d+1)_2$-code.
\end{lemma}
\begin{proofof}{Lemma \ref{d->d+1-code}}
See the exercises.
\end{proofof}
\begin{note}
Later in the course, we will prove that $\bigb{n, n-\frac{d}{2}\log n, d}_2$-codes exist. We will also spend a lot of time later in the course talking about efficiency in the encoding and the decoding functions.
\end{note}
\section{Binary Symmetric Channel}
We now turn to a very different foundational paper by Shannon. Shannon's theory differs from Hamming's in that Hamming explored adversarial, worst-case errors while Shannon explored probabilistic errors. In the Hamming model, every two encodings must differ by more than $2t$ in order to correct $t$ errors: if not, in the worst case two encodings could degenerate into the same string after $t$ errors are applied to each. However, if two encodings differ by, say, exactly $2t,$ then in the Shannon model this degeneration event happens with so little probability that it is negligible. When we use the word ``negligible,'' it will mean ``exponentially small.''
Shannon considers a much more benign model of errors. Shannon introduces the concept of an \vocab{error channel}. See below for a beautiful artist's rendition of the binary symmetric channel.
\begin{figure}[H]
\centering
\includegraphics[scale=0.7]{bsc.PNG}
\end{figure}
\begin{definition}[Binary Symmetric Channel]
To define the binary symmetric channel $\text{BSC}(p),$ we consider someone who first transmits either the bit 0 or the bit 1. With probability $p$ the bit gets flipped, and with probability $1-p$ the bit remains the same.
\end{definition}
\begin{note}
It is natural to consider $p\in (0,1/2),$ since otherwise each transmitted bit is flipped more often than not, although technically the results extend to all $p\in [0,1].$ Furthermore, the binary symmetric channel acts independently on each bit.
\end{note}
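The channel is easy to simulate. A minimal Python sketch (function name mine) flips each bit independently with probability $p$:

```python
import random

def bsc(bits, p, rng=random):
    """Flip each bit independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

random.seed(0)
received = bsc([0] * 10000, 0.1)
print(sum(received) / len(received))  # close to 0.1
```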
We informally defined the \vocab{rate} of a code in the Hamming context last class. We define it here for the probabilistic Shannon context.
\begin{definition}
For all $n,$ consider an encoding function $E_n:\{0,1\}^{k_n} \to \{0,1\}^n$ and a decoding function $D_n:\{0,1\}^n\to \{0,1\}^{k_n}.$ The goal in Shannon coding theory is to decode the message with high probability. In particular, for a binary symmetric channel acting on a message space $M,$ we desire $$\Pr_{m\sim \Unif(M)} \bigb{ D_n\bigp{\text{BSC}\bigp{E_n(m)}}\neq m} = o_n(1).$$ Given this condition, we then want to maximize the \vocab{rate} $$\lim_{n\to \infty} \frac{k_n}{n}$$ of the family of codes; the supremum of achievable rates is the \vocab{capacity} of the channel.
\end{definition}
Shannon proved the following theorem in his paper.
\begin{theorem}[Shannon]\label{shannon-theorem}
The capacity of the binary symmetric channel $\text{BSC}(p)$ is $1-H(p),$ where $$H(p) = p\log_2 \frac{1}{p} + (1-p)\log_2 \frac{1}{1-p}$$ is the binary entropy function. In this lecture we prove the achievability direction: rates approaching $1-H(p)$ are attainable.
\end{theorem}
We will explain some elements of the proof. One can derive using Stirling's approximation that the volume of a radius-$pn$ ball in $\FF_2^n$ can be approximated as $$\Vol_2 (n,pn) \approx 2^{(H(p) + o(1))n}.$$ In other words, this volume is governed by the entropy $H(p)$ of the binary symmetric channel $\text{BSC}(p).$
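This approximation is easy to check numerically: the normalized log-volume $\frac{1}{n}\log_2 \Vol_2(n,pn)$ converges to $H(p)$ as $n$ grows. A small sketch:

```python
from math import comb, log2

def H(p):
    """Binary entropy function."""
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

def vol2(n, r):
    """Volume of a radius-r Hamming ball in {0,1}^n."""
    return sum(comb(n, i) for i in range(r + 1))

p = 0.1
for n in [100, 1000, 10000]:
    # normalized log-volume; approaches H(0.1) ~ 0.469 as n grows
    print(n, log2(vol2(n, int(p * n))) / n)
```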
How do we define an error correcting code in this model? One of the key ideas is the \vocab{probabilistic method}. It is debated who invented this method, even though Erd\H{o}s published a paper on it; one piece of evidence in the debate is Shannon's paper, which is an early and excellent application of the method. We now highlight the idea in Shannon's proof of Shannon's Theorem. We note that the following proof of Theorem \ref{shannon-theorem} is non-constructive, as is common with proofs using the probabilistic method.
\begin{proofof}{Theorem \ref{shannon-theorem}}
First we pick a random encoding function $E:\{0,1\}^k\to \{0,1\}^n.$ That is, for every message $m\in \{0,1\}^k,$ we map $m$ uniformly and independently into $\{0,1\}^n.$ Recall that the code associated to this function is $$C= \bigc{E(m)\mid m\in \{0,1\}^k}.$$ We define the decoding function $D:\{0,1\}^n\to \{0,1\}^k,$ which takes an input $y = \text{BSC}(E(m)),$ as follows. On input $y\in \{0,1\}^n$:
\begin{itemize}
\item If $\text{Ball}\bigp{y, (p+\eps)n} \cap C = \es,$ output ``error.''
\item If $\text{Ball}\bigp{y, (p+\eps)n} \cap C = \{E(m)\},$ output $m.$
\item Otherwise, if $\abs{\text{Ball}\bigp{y, (p+\eps)n} \cap C} > 1,$ output ``error.''
\end{itemize}
In other words, we output $m$ if the ball of radius slightly greater than $pn$ around the received word contains a unique codeword; otherwise we output an error.
Now we analyze the construction. For a fixed message $m,$ what is the probability, over the random choice of $E$ and the channel noise, that decoding fails? Shannon proved the following.
\begin{theorem}
We have $$\Pr_{m,E,\text{BSC}}\bigb{ D\bigp{ \text{BSC}\bigp{E(m)}}\neq m} \le \exp(-\Omega(n)).$$
\end{theorem}
By the probabilistic method paradigm, the above probability is an average over all $E,$ so in particular, proving the above theorem shows that there is at least one $E$ such that $$\Pr_{m,\text{BSC}}\bigb{ D\bigp{ \text{BSC}\bigp{E(m)}}\neq m} \le \exp(-\Omega(n)).$$
To prove the theorem, we will show that the probability of an error, namely that the decoder falls into the first or the third case above, is exponentially small.
\begin{figure}
\centering
\includegraphics[scale=0.7]{case-1.PNG}
\caption{Error in the first case.}
\end{figure}
\begin{figure}
\centering
\includegraphics[scale=0.7]{case-2.PNG}
\caption{No error in the second case.}
\end{figure}
\begin{figure}
\centering
\includegraphics[scale=0.7]{case-3.PNG}
\caption{Error in the third case.}
\end{figure}
We analyze the first case. By the Chernoff bound, the event $\text{Ball}\bigp{y, (p+\eps)n} \cap C = \es,$ which happens only if there are more than $(p+\eps)n$ errors, occurs with exponentially small probability.
Now we analyze the third case. Recall that we have fixed a message $m,$ which also fixes $E(m).$ We can view the channel acting on $E(m)$ and the choice of the random encoding function $E$ on the other messages $m'\neq m$ as two separate steps:
\begin{itemize}
\item Consider the errors generated in the process $E(m)\mapsto \text{BSC}\bigp{E(m)}.$ Note that $\text{BSC}\bigp{E(m)}$ lies within the ball $\text{Ball}(E(m),(p+\eps)n)$ with overwhelming probability by the Chernoff bound, so we can condition on this event. More formally, if the error probability is exponentially small conditioned on the event $$\text{BSC}\bigp{E(m)}\in \text{Ball}(E(m),(p+\eps)n),$$ then it is also exponentially small without the conditioning.
\item Finally, randomly choose all $\{E(m'), m'\ne m\}$ uniformly and independently.
\end{itemize}
For any particular $m',$ the codeword $E(m')$ is uniform in $\{0,1\}^n,$ so the probability that $E(m')$ lands in $\text{Ball}\bigp{\text{BSC}(E(m)), (p+\eps)n}$ is exactly $2^{-n}\cdot \Vol_2(n,(p+\eps)n).$ Since there are at most $2^k$ messages $m'\neq m,$ a union bound shows that the probability that some $E(m')$ lies in this ball is at most $$2^k \cdot 2^{-n} \cdot \Vol_2(n,(p+\eps)n).$$ The last step in showing that the probability of the third case is exponentially small is to show that
\begin{align*}
2^k \cdot 2^{-n} \cdot \Vol_2(n,(p+\eps)n)
&\le 2^k \cdot 2^{-n} \cdot 2^{(H(p)+\eps'')n}
\\&\le 2^{-\eps' n}
\end{align*}
for some $\eps'>0.$ In other words, we need
\begin{align*}
k-n+(H(p) + \eps'')n &\le -\eps' n \\
k&\le (1-H(p) - (\eps' + \eps''))n.
\end{align*}
Letting $\eps', \eps''\to 0,$ this shows that the Shannon capacity of $\text{BSC}(p),$ which recall is defined as the limit of $k/n,$ is at least $1-H(p)-\eps$ for every $\eps>0.$
\end{proofof}
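The proof can be illustrated with a toy simulation: pick a random encoding, push codewords through $\text{BSC}(p),$ and decode. The sketch below is not Shannon's construction verbatim: it uses nearest-codeword decoding as a stand-in for the ball decoder and tiny parameters, so it only illustrates the flavor of the argument. The rate $4/15$ is well below $1-H(0.05)\approx 0.71,$ so decoding usually succeeds:

```python
import random
from itertools import product

random.seed(1)
n, k, p = 15, 4, 0.05  # rate 4/15, well below 1 - H(0.05)

# random encoding: each message mapped to an independent uniform codeword
E = {m: tuple(random.randint(0, 1) for _ in range(n))
     for m in product([0, 1], repeat=k)}

def bsc(x):
    """Flip each bit independently with probability p."""
    return tuple(b ^ (random.random() < p) for b in x)

def decode(y):
    # nearest codeword, standing in for the ball decoder of the proof
    return min(E, key=lambda m: sum(a != b for a, b in zip(E[m], y)))

trials = 2000
errors = sum(decode(bsc(E[m])) != m
             for m in random.choices(list(E), k=trials))
print(errors / trials)  # typically a small fraction
```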
Next class, we will prove the other direction of Shannon capacity of a binary symmetric channel, namely that $$\text{Capacity}(\text{BSC}(p)) \le 1-H(p) + \eps$$ for all $\eps>0.$ In particular, we show that if $$\text{Capacity} > 1-H(p) + \eps$$ for some $\eps>0,$ then failure in decoding in fact happens with overwhelming probability.
Then we will return to studying general $[n,k,d]_q$-codes. The two parameters we will study are the rate $0\le R \le 1,$ namely the limit of $k/n$ as $n\to \infty,$ as well as the \vocab{normalized distance} $0\le \delta\le 1,$ defined as the limit of $d/n$ as $n\to \infty.$ There happen to be trade-offs between these two parameters, which make determining which ordered pairs $(R,\delta)$ in the region $[0,1]\times [0,1]$ are achievable an interesting question.
\newpage
\section{Exercises}
Below are two exercises related to the class material that should be straightforward with the hints provided. We have also included a tricky third exercise in the form of a puzzle that relates to creating an error correcting code using Hamming distance. Full details of the solutions will be posted in a later version of the scribe notes.
\begin{enumerate}
\item Prove Lemma \ref{d->d+1-code}: If $d$ is odd, then a $(n,k,d)_2$-code can be modified to create a $(n+1,k,d+1)_2$-code.
As a hint, consider adding a parity-type bit to the end of the encoding similar to the $d=1$ to $d=2$ case discussed last class. After you have proven that this works, explain why the proof fails when $d$ is even.
\item In our goal for the binary symmetric channel, the instructor mentioned that he desired $$\Pr_{m\sim \Unif(M)} \bigb{ D\bigp{\text{BSC}\bigp{E(m)}}\neq m} = o_n(1).$$ However, we can also desire the stronger condition that $$\Pr \bigb{ D\bigp{\text{BSC}\bigp{E(m)}}\neq m} = o_n(1)$$ for every message $m\in M.$ This would correspond with the idea that every message should be decodable with high probability, as opposed to the average decoding probability across messages being high. The construction we gave using the probabilistic method can be modified slightly to achieve this goal as well while maintaining the same Shannon capacity of $1-H(p) - \eps$ for $\eps>0.$ In particular, consider the following idea.
Fix a constant $0<\kappa<1.$ Consider dropping a fraction $\kappa$ of the messages with the greatest error probabilities from the message space $M = \{0,1\}^k.$ For example, if $\kappa=1/2$ then we would purge the $2^{k-1}$ messages for which $$\Pr \bigb{ D\bigp{\text{BSC}\bigp{E(m)}}\neq m}$$ is the highest. Prove that in the new message space $M'$ with $\abs{M'} = (1-\kappa) \abs{M},$ $$\Pr \bigb{ D\bigp{\text{BSC}\bigp{E(m)}}\neq m} = o_n(1)$$ for every message $m\in M'.$
\item Fix an integer $k\ge 2,$ and suppose that $2^k - 1$ wombats are sitting at a table. For each of them, Madhu randomly draws either 0 or 1 on that wombat's head uniformly and independently. This is done so that each wombat can see all other wombats' numbers but not their own number. No communication between wombats is allowed.
Now, each wombat must either privately write down what they think their number is, or abstain from guessing. Each wombat guesses or abstains simultaneously with every other wombat, so the decision of any wombat is only a function of the numbers of the other wombats. The wombats win if at least one wombat guesses their number correctly, and no wombat guesses their number incorrectly. The goal is for the wombats to devise a strategy that wins with probability at least $$\frac{2^k-1}{2^k}.$$
\begin{enumerate}
\item Prove that the wombats can achieve the goal when $k=2.$
As a hint, prove that there exists a strategy that is guaranteed to succeed as long as the wombats are not assigned either $000$ or $111.$
\item Prove that the wombats can achieve the goal for all $k \ge 2$ by creating a strategy modeled off of an error correcting code in which there is some guaranteed success set $S\subeq \{0,1\}^{2^k-1}$ of Madhu's assignments where the wombats win, as well as some guaranteed failure set $F = S^c$ of Madhu's assignments where the wombats lose. By definition of $S$ and $F,$ given an element $s\in S,$ what can you say about the minimum Hamming distance between $s$ and an element of $F$?
\end{enumerate}
\end{enumerate}
%%%%%%%%
\bibliographystyle{alpha}
\bibliography{bib}
\end{document}