\documentclass[10pt]{article}
\newtheorem{define}{Definition}
\usepackage{amsmath}
\usepackage{amsfonts}
%\usepackage{amssymb}
\newcommand{\Res}{\mathrm{Res}}
\newcommand{\emod}[1]{\, (\bmod\ #1)}
%\usepackage{epsfig}
\begin{document}
\input{preamble.tex}
\lecture{8}{Mar 2, 2015}{Madhu Sudan}{Badih Ghazi}
%%%% body goes in here %%%%
\paragraph{Announcement}
Problem Set $1$ is posted on the course website. Try the problems by Friday.
\section{Introduction}
We saw in previous lectures how to factor univariate polynomials. In this lecture, we are interested in factoring bivariate polynomials. The algorithm that we will describe shortly applies to factoring bivariate polynomials of the form $f(x,y) \in \mathbb{F}[x,y]$ where $\mathbb{F}$ is a field, but some of the tools that we will develop are more general. To this end, we will consider polynomials $f(x) \in R[x]$ where $R$ is any commutative ring. Note that setting $R = \mathbb{F}[y]$ gives us bivariate polynomials as a particular case. Furthermore, we will assume throughout the lecture that the polynomials are monic.
One tool that we will use in the analysis of bivariate factoring is the \emph{resultant} $\Res(f,g)$ of polynomials $f(x), g(x) \in R[x]$, which has the property that $\Res(f,g) = 0$ if and only if $f(x)$ and $g(x)$ share a common non-constant factor. The resultant has many applications beyond factoring polynomials, and we will see at the end of this lecture one of these applications, namely, proving one direction of B\'{e}zout's theorem in the plane.
\section{Overview of Approach}
Let $f(x) \in R[x]$ be a monic polynomial over a commutative ring $R$. As we mentioned above, we will be interested in this lecture in the case where $R = \mathbb{F}[y]$ for some field $\mathbb{F}$. Some of the ideas presented in this lecture will also be used in a later lecture, when we talk about factoring polynomials over the integers, in which case $R = \mathbb{Z}$. At a high level, the algorithm for factoring $f$ has the following four steps:
\begin{enumerate}
\item Find an ideal $I \subseteq R$. In the case where $R = \mathbb{F}[y]$, we will take $I = (y) := \{ \alpha \cdot y ~ | ~ \alpha \in \mathbb{F}[y]\}$. Recall that an ideal of a ring $R$ is a subset that is closed under addition (i.e., for all $a,b \in I$, $a+b \in I$) and closed under multiplication with arbitrary elements of $R$ (i.e., for all $a \in R$ and all $b \in I$, $ab \in I$).
\item Factor $f$ modulo the ideal $I$, i.e., write $f$ as $f = f_1 \cdots f_k \emod{I}$ where $f_i$ is irreducible for every $i \in [k]$. The hope is that our treatment in previous lectures implies that this step is ``easy''. Indeed, in the case where $R = \mathbb{F}[y]$, taking any $f \in R[x]$ modulo the ideal $(y)$ yields a univariate polynomial in $x$, which we know how to factor (from previous lectures).
\item ``Lift'' the factors $f_i$ to polynomials $\tilde{f}_i$ s.t. $f = \tilde{f}_1 \tilde{f}_2 \cdots \tilde{f}_k \emod{I^t}$ for some sufficiently large value of $t$. This uses an important technique, called Hensel lifting, that will be covered in a later lecture and that will also be useful later in the course. Note that here $I^t$ is the additive closure of $\{\alpha_1 \alpha_2 \cdots \alpha_t \beta ~ | ~ \alpha_i \in I, \beta \in R\}$.
\item Go from the lifted factor $\tilde{f}_1$ to a polynomial $g$ that divides $f$ in $R[x]$. This step is the main focus of today's lecture, and we will see that it essentially reduces to solving a linear system over $\mathbb{F}$ when $R = \mathbb{F}[y]$.
\end{enumerate}
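To make the reductions in steps 2 and 3 concrete when $R = \mathbb{F}[y]$ and $I = (y)$, here is a minimal sketch in Python. The representation is our own choice, not part of the lecture: a polynomial in $\mathbb{F}[y][x]$ is stored as a list of $x$-coefficients, each itself a list of ascending $y$-coefficients, so reducing modulo $(y)^t$ is just truncation.

```python
def reduce_mod_y_power(f, t):
    """Reduce f in F[y][x] modulo the ideal (y)^t.

    f is a list of x-coefficients, each a list of ascending y-coefficients;
    reduction modulo (y)^t just truncates every coefficient to y-degree < t.
    """
    return [coeff[:t] for coeff in f]

# f(x,y) = x^2 + (1 + y)x + y, stored as [[0, 1], [1, 1], [1]]
f = [[0, 1], [1, 1], [1]]

# Modulo (y) (i.e., t = 1): f becomes the univariate polynomial x^2 + x,
# which the univariate algorithms of earlier lectures can factor.
assert reduce_mod_y_power(f, 1) == [[0], [1], [1]]

# Modulo (y)^2 nothing is truncated here, since deg_y(f) < 2.
assert reduce_mod_y_power(f, 2) == f
```

Reducing modulo $(y)$ is exactly what turns step 2 into a univariate factoring problem.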
Already, one might be a bit skeptical about this approach. Consider the polynomial $f = x^p - x + y \in \F_p[x,y]$, which is irreducible over $\F_p$. Taking $f$ modulo the ideal $I = (y)$ yields the polynomial $x^p - x$, and we know from previous lectures that this polynomial splits into $p$ distinct linear factors over $\F_p$. However, all factors of $f$ modulo $I^t$ for any $t \geq 2$ must be trivial (i.e., either 1 or $f$).
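The splitting of $x^p - x$ can be verified concretely for $p = 5$: the sketch below (representation ours) multiplies out $\prod_{a \in \mathbb{F}_5}(x - a)$ over $\mathbb{F}_5$ and recovers exactly the coefficients of $x^5 - x$.

```python
p = 5  # x^p - x + y modulo (y) is x^p - x; check that it splits into p linear factors

# Multiply out prod_{a in F_p} (x - a) over F_p, coefficients in ascending order.
prod = [1]
for a in range(p):
    new = [0] * (len(prod) + 1)
    for i, c in enumerate(prod):
        new[i + 1] = (new[i + 1] + c) % p   # contribution of the x term
        new[i] = (new[i] - a * c) % p       # contribution of the -a term
    prod = new

# x^5 - x = x^5 + 4x over F_5, i.e. ascending coefficients [0, 4, 0, 0, 0, 1]
assert prod == [0, 4, 0, 0, 0, 1]
```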
More generally, suppose that $f$ has the factorization $g_1 \cdot \ldots \cdot g_\ell$ in $R[x]$. Can $f$ have fewer factors when it is taken modulo $I$? Can it have more factors? The answer to both questions is, in fact, yes. To see that it can have fewer factors, consider the case when $g_i = \alpha + y \cdot h(x)$ for some $h \in \F[x]$ and $\alpha \in \F$. Then $g_i \emod{I}$ is a constant, so $f$ will have fewer (non-trivial) factors modulo $I$. However, this is actually a ``rare'' event, and we can circumvent this case by instead using a different ideal $I = (y + \beta)$ for an appropriately chosen $\beta \in \F$.
To see that $f$ can have more factors modulo $I$, consider the irreducible polynomial $f = x^p - x + y$ above which has $p$ factors modulo $I$. Unlike the case where $f \emod{I}$ has fewer factors, we cannot simply circumvent this possibility, and there is a potentially one-to-many correspondence between the factors of $f$ and the factors of $f \emod{I}$:
\[
\begin{array}{rcccccc}
f(x,y) &= &g_1(x,y)&\cdot&g_2(x,y)&\cdot\ \ldots\ \cdot&g_\ell(x,y)\\
f \emod{I} &= &\overbrace{f_{1}(x)\cdot\ \ldots\ \cdot f_{i_1}(x)}&\cdot&\overbrace{f_{i_1+1}(x)\cdot\ \ldots\ \cdot f_{i_2}(x)}&\cdot\ \ldots\ \cdot&\overbrace{f_{i_{\ell-1}+1}(x)\cdot\ \ldots\ \cdot f_{k}(x)}
\end{array}
\]
However when we cover Hensel lifting we will see that this state of affairs is acceptable, and that repeatedly lifting $f_1$ will give an $\widetilde{f_1}$ that has enough information to recover $g_1$.
\section{The Jump Step}
We now explain Step 4 of the above algorithm.
Suppose that the polynomial $f$ splits into factors $g_1 \cdot \ldots \cdot g_\ell$ (unknown to us), and that we have the factorization $f = f_1 \cdot \ldots \cdot f_k \emod{I}$. Then, this is also a factorization of $\prod_i g_i \emod{I}$, and so $f_1$ is a factor of one of the $g_i \emod{I}$; say without loss of generality that it is a factor of $g_1 \emod{I}$. In step 3 of the algorithm, we obtain via Hensel lifting the factors $\widetilde{f_1}, \ldots, \widetilde{f_k}$, with the guarantee that $\widetilde{f_1}$ is a factor of $g_1 \emod{I^t}$ under our assumption that $f_1$ is a factor of $g_1 \emod{I}$. (Note that we have not yet specified an appropriate value of $t$.) Define $d := \deg_x(\widetilde{f_1})$ to be the $x$-degree of $\widetilde{f_1}$. Note that $d < \deg_x(f)$ (unless $f$ is irreducible modulo $I^t$), but $\deg_y(\widetilde{f_1})$ might be very large. Given this setup, we can now state the Jump Problem that we are interested in.
\paragraph{The Jump Problem.} Find two polynomials $g,h \in \F[x,y]$ that satisfy the following conditions:
\begin{enumerate}
\item $\deg_x(g) \leq d$ and $\deg_y(g) \leq d$.
\item $g = \widetilde{f_1} \cdot h \emod{I^t}$.
\item $g$ has minimal $x$-degree.
\end{enumerate}
\medskip
Later on, we will come to the reason why such polynomials might be useful for us, but let's first focus on solving this problem. One thing to notice is that if $(g,h)$ and $(g',h')$ are two pairs that satisfy condition 2, then their sum $(g+g',h+h')$ also satisfies condition 2. In fact, it turns out that the Jump Problem reduces to simply solving a system of linear equations determined by $\widetilde{f_1}$ and $I^t$, where the unknowns are the coefficients of $g$ and $h$. (That this is indeed a linear system relies on the fact that multiplying by $\widetilde{f_1}$ and reducing modulo $I^t$ are both linear operations.) Solving such a system can be done efficiently using basic linear algebra.
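The linearity claim can be checked directly. The sketch below works over $\mathbb{F}_7$ with $I = (y)$ and $t = 3$; all names and the chosen $\widetilde{f_1}$ are illustrative, not from the lecture. It represents truncated bivariate polynomials as dictionaries and verifies that $h \mapsto \widetilde{f_1} \cdot h \emod{I^t}$ is a linear map over the base field.

```python
from itertools import product
import random

P = 7   # work over F_7
T = 3   # reduce modulo (y)^T, i.e. keep only monomials of y-degree < T

def mul_mod(f, g):
    """Multiply two polynomials in F_P[x,y] (dicts (i,j) -> coeff),
    then reduce modulo (y)^T by dropping monomials of y-degree >= T."""
    out = {}
    for (i, j), a in f.items():
        for (k, l), b in g.items():
            if j + l < T:
                key = (i + k, j + l)
                out[key] = (out.get(key, 0) + a * b) % P
    return {k: v for k, v in out.items() if v}

def add_scaled(f, c, g):
    """Return f + c*g over F_P."""
    out = dict(f)
    for k, v in g.items():
        out[k] = (out.get(k, 0) + c * v) % P
    return {k: v for k, v in out.items() if v}

f1 = {(2, 0): 1, (0, 1): 3}          # x^2 + 3y, an arbitrary stand-in for f1~
rng = random.Random(0)
def rand_poly():
    return {(i, j): rng.randrange(P) for i, j in product(range(3), range(T))}

h1, h2 = rand_poly(), rand_poly()
c = 5
# The map h |-> f1 * h mod (y)^T is linear: f1*(h1 + c*h2) = f1*h1 + c*(f1*h2).
lhs = mul_mod(f1, add_scaled(h1, c, h2))
rhs = add_scaled(mul_mod(f1, h1), c, mul_mod(f1, h2))
assert lhs == rhs
```

Since the map is linear, equating coefficients in condition 2 yields a linear system over $\F$ in the coefficients of $g$ and $h$, as claimed.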
We now explain why a solution to the Jump Problem is useful for us. Recall that we are hoping to find the irreducible polynomial $g_1$ such that $f = g_1 \cdot h_1$, where here $h_1 := g_2 \cdot \ldots \cdot g_\ell$. The following lemma shows that, given any solution $(g,h)$ to the Jump Problem, we can find a non-trivial factor of $f$ containing $g_1$ by computing $\gcd(f,g)$. (If $f$ is irreducible, this gcd will give $f$.)
\begin{lemma}
\label{main-lem}
If $(g_1,h_1)$ is a solution to the Jump Problem with $g_1$ irreducible, and $(g_2,h_2)$ is any other solution, and if $t > d^2$, then $g_1 | g_2$.
\end{lemma}
(Note that this also specifies the value of $t$ we need to choose in step 3.) We will not prove this lemma today. Instead we will introduce the {\em resultant}, which is a generally useful tool that in particular will help us prove this lemma.
\section{The Resultant}
In this section, we introduce the resultant, an algebraic tool that will aid in the proof of Lemma \ref{main-lem}. To start, consider the following problem:
\begin{quote}
Given two polynomials $\displaystyle A = \sum_{i=0}^k a_ix^i$ and $\displaystyle B = \sum_{i=0}^\ell b_ix^i$ in $R[x]$,\\decide if $A$ and $B$ have a common non-constant factor.
\end{quote}
\medskip
The resultant $\Res_x(A,B)$ solves this problem. It can be shown to satisfy the following properties.
\begin{enumerate}
\item $\Res_x(A,B) \in R$.
\item $\Res_x(A,B)$ is a polynomial in the coefficients $\{a_i\}_{i \leq k}$, $\{b_i\}_{i \leq \ell}$.
\item $\Res_x(A,B)$ is contained in the ideal generated by $(A,B)$.
\item $\Res_x(A,B) = 0$ if and only if $A$ and $B$ have a common non-constant factor.
\end{enumerate}
Note that we use the subscript $x$ to indicate the variable under consideration. If we are working in the ring $R = \F[y]$, as we will below, then the $a_i$ and $b_i$ coefficients are actually polynomials in $y$.
So, how is the resultant defined? $\Res_x(A,B)$ is the determinant of the following $(k+\ell) \times (k+\ell)$ matrix, known as the {\em Sylvester matrix} associated with $A$ and $B$.
$$M(A,B) = \begin{bmatrix}
a_0 & 0 & \cdots & 0 & b_0 & 0 & \ldots & 0 \\
a_1 & a_0 & & 0 & b_1 & b_0 & & 0 \\
a_2 & a_1 & & \vdots & \vdots & b_1 & & \vdots \\
\vdots & \vdots & \ddots & a_0 & b_\ell & \vdots & \ddots & 0\\
a_k & a_{k-1} & & a_1 & 0 & b_\ell & & b_0\\
0 & a_k & & a_2 & 0 & 0 & & b_1\\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & a_k & 0 & 0 & \cdots & b_\ell
\end{bmatrix} \qquad \qquad \Res_x(A,B) := \det(M(A,B))$$
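As a sanity check of property 4, the following self-contained sketch (naming ours) builds $M(A,B)$ in the column layout above for two integer polynomials and computes its determinant exactly with rational arithmetic.

```python
from fractions import Fraction

def sylvester(a, b):
    """Sylvester matrix of A = sum a[i] x^i (deg k) and B = sum b[i] x^i (deg l),
    laid out column-wise as in the lecture: l shifted copies of the coefficients
    of A, followed by k shifted copies of the coefficients of B."""
    k, l = len(a) - 1, len(b) - 1
    n = k + l
    M = [[0] * n for _ in range(n)]
    for j in range(l):                  # columns of A-coefficients
        for i, c in enumerate(a):
            M[i + j][j] = c
    for j in range(k):                  # columns of B-coefficients
        for i, c in enumerate(b):
            M[i + j][l + j] = c
    return M

def det(M):
    """Exact determinant via Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    n, sign, prod = len(M), 1, Fraction(1)
    for col in range(n):
        piv = next((r for r in range(col, n) if M[r][col] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            sign = -sign
        prod *= M[col][col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= f * M[col][c]
    return sign * prod

# A = x^2 - 1 = (x-1)(x+1) and B = x^2 + 1 share no factor: Res != 0.
assert det(sylvester([-1, 0, 1], [1, 0, 1])) == 4
# A = x^2 - 1 and B = x^2 - 3x + 2 = (x-1)(x-2) share (x-1): Res = 0.
assert det(sylvester([-1, 0, 1], [2, -3, 1])) == 0
```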
This already establishes properties 1 and 2 above, though we will look more closely at the second in a moment. But first, a natural question: where does $M(A,B)$ come from? Its motivation can be found in the proof of the following lemma, which establishes property 4.
\begin{lemma}
Let $G := \gcd(A,B)$. Then, $G$ is non-constant if and only if $\det(M(A,B)) = 0$.
\end{lemma}
\begin{proof}
Assume that $G$ is non-constant. Then, $A \cdot (B/G) + B \cdot (-A/G) = 0$, and thus there exist two non-zero polynomials $C := \sum_ic_ix^i$ and $D := \sum_id_ix^i$ such that $AC + BD = 0$, $\deg(C) < \deg(B)$, and $\deg(D) < \deg(A)$. Then, defining the (column) vector $v = (c_0,\ldots,c_{\ell-1},d_0,\ldots,d_{k-1}) \neq 0$, we see that $M(A,B) \cdot v$ is the coefficient vector of $AC + BD = 0$, so $M(A,B) \cdot v = 0$ and thus $\det(M(A,B)) = 0$. This argument also holds in the other direction, i.e.\ if $\det(M(A,B)) \neq 0$ then there is no such $v \neq 0$ and so $G$ must be constant.
\end{proof}
We now note a few other facts about the resultant. Applying the following general lemma to our matrix shows that the vector $(\Res_x(A,B),0,\ldots,0)$ is in the column span of $M(A,B)$, which establishes property 3.
\begin{lemma}
For all $M \in R^{n \times n}$, the vector $(\det(M),0,\ldots,0)$ is in the column span of $M$.
\end{lemma}
\begin{proof}
$M$ can be put in lower-triangular form by performing only column operations. Letting $\overline{M}$ denote the triangularized matrix, we have $\det(M) = \prod_{i \leq n} \overline{M}_{ii}$.
Finally, observe that the vector $\left(\prod_{i \leq n} \overline{M}_{ii},0,\ldots,0\right)$ is in the column span of any triangular matrix $\overline{M}$.
\end{proof}
In the case when $R = \F[y]$, the following lemma bounds the $y$-degree of $\Res_x(A,B)$.
\begin{lemma}
\label{degree-lem}
If $A,B \in \F[x,y]$ have total degree $k$ and $\ell$ respectively, then $\Res_x(A,B) \in \F[y]$ has degree at most $k\ell$.
\end{lemma}
\begin{proof}
This is essentially a counting argument. Consider the degree of the $(i,j)$th element of $M(A,B)$:
$$\deg(M(A,B)_{ij}) \leq \begin{cases} k - i + j, & \mbox{if } j \leq \ell \\ j - i, & \mbox{if } j > \ell. \end{cases}$$
Therefore, for every permutation $\sigma : [k+\ell] \to [k+\ell]$, $\deg\left(\prod_j M(A,B)_{\sigma(j),j}\right) \leq k\ell + \sum_j j - \sum_j \sigma(j) = k\ell$ (since $\sigma$ is a permutation, $\sum_j \sigma(j) = \sum_j j$), and so $\Res_x(A,B) = \det(M(A,B))$ is a sum of polynomials of degree $\leq k\ell$.
\end{proof}
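The degree bound can be checked on a small example. The sketch below (representation ours) forms the Sylvester matrix of $A = x^2 - y$ and $B = x^2 + y^2$, both of total degree $2$, with entries in $\mathbb{Z}[y]$, cofactor-expands the determinant, and confirms that $\deg_y(\Res_x(A,B)) \leq k\ell = 4$.

```python
def padd(p, q):
    """Add two polynomials in y (ascending coefficient lists)."""
    n = max(len(p), len(q))
    r = [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]
    while r and r[-1] == 0:
        r.pop()
    return r

def pmul(p, q):
    """Multiply two polynomials in y (ascending coefficient lists)."""
    if not p or not q:
        return []
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    while r and r[-1] == 0:
        r.pop()
    return r

def pdet(M):
    """Determinant by cofactor expansion; entries are polynomials in y."""
    if len(M) == 1:
        return M[0][0]
    total = []
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = pmul(entry, pdet(minor))
        total = padd(total, term if j % 2 == 0 else [-c for c in term])
    return total

# x-coefficients of A and B, each a polynomial in y (ascending coefficients):
a = [[0, -1], [], [1]]      # A = x^2 - y:   a_0 = -y,  a_1 = 0, a_2 = 1
b = [[0, 0, 1], [], [1]]    # B = x^2 + y^2: b_0 = y^2, b_1 = 0, b_2 = 1
k, l = 2, 2
n = k + l
M = [[[] for _ in range(n)] for _ in range(n)]
for j in range(l):                  # columns of A-coefficients
    for i, c in enumerate(a):
        M[i + j][j] = c
for j in range(k):                  # columns of B-coefficients
    for i, c in enumerate(b):
        M[i + j][l + j] = c

res = pdet(M)                       # Res_x(A, B) as a polynomial in y
assert len(res) - 1 <= k * l        # deg_y(Res) <= k*l, as the lemma predicts
assert res[0] == 0                  # at y = 0, A = B = x^2 share a factor
assert sum(res) == 4                # at y = 1 this is Res(x^2 - 1, x^2 + 1) = 4
```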
We conclude by showing how the resultant can be used to prove one direction of B\'{e}zout's Theorem in the plane.
\begin{theorem}
If $A,B \in \F[x,y]$ have total degree at most $k$ and $\ell$ respectively, and they share more than $k\ell$ common zeros, then they have a common non-constant factor.
\end{theorem}
\begin{proof}
We will show that if $A$ and $B$ have $> k\ell$ common zeros, then $\Res_x(A,B) = 0$. Suppose that $(\alpha_1,\beta_1), \ldots, (\alpha_{k\ell+1},\beta_{k\ell+1})$ are the common zeros. We know that $\Res_x(A,B)$ is in the ideal generated by $A$ and $B$, so it must vanish at each of the $\beta_i$. Because $\Res_x(A,B)$ has $y$-degree $\leq k\ell$ by Lemma \ref{degree-lem}, if the $\beta_i$ are all distinct then $\Res_x(A,B)$ must be identically zero. Of course, the assumption that the $\beta_i$ are distinct is not justified. However, if we work over a large enough extension field $K \supseteq \F$, and perform the following linear transformation for a random $\theta \in K$ $$ (\alpha_i, \beta_i) \longmapsto (\alpha_i,\ \beta_i + \theta \cdot \alpha_i) $$ then with non-zero probability the new $\beta_i$ will all be distinct, which again gives $k\ell + 1$ distinct points on which $\Res_x(A,B)$ vanishes.
\end{proof}
\begin{note}
In the case where $R = \mathbb{F}[y]$, one can alternatively define the resultant as follows. Given polynomials $f(x,y), g(x,y) \in \mathbb{F}[x,y]$, the resultant $\Res_{y}(f,g)$ of $(f,g)$ is the non-zero polynomial of minimal degree in $x$ (and, among those, of minimal degree in $y$) s.t. $\Res_y(f,g) \in (f,g) := \{af + bg ~ | ~ a,b \in \mathbb{F}[x,y]\}$.
\end{note}
\paragraph{Acknowledgments}
Some of the material in this lecture is taken from the very nice scribe notes of Eric Miles in the 2012 iteration of the course.
\end{document}