Introduction to Ring Theory (7)

Polynomial Rings

A polynomial over a ring R is an expression of the form:

$f(x) = a_0 + a_1 x + a_2 x^2 +\ldots$ , where $a_i \in R$ , and $\exists n, 0 = a_{n+1}=a_{n+2}=\ldots$ .

Let’s get some standard terminology out of the way. The element a_i is called the coefficient of xⁱ. The largest n for which a_n≠ 0 is called the degree of f. [For the zero polynomial, it’s convenient to denote deg=-∞. ] If deg(f)=n, we call the coefficient of xⁿ the leading coefficient of f(x). If the leading coefficient is 1, we say the polynomial is monic.

Important: we should think of f(x) as a symbolic expression and not a function, for now, because there’re some dangers lurking underneath the functional interpretation of f.

For starters, we have the following result:

Theorem. The set of polynomials over R, denoted R[x], is a ring. The addition and product operations are the standard polynomial addition and product: let $f=\sum_i a_i x^i$ and $g=\sum_i b_i x^i$ ; then

$f+g = \sum_i c_i x^i$ , where $c_i = a_i+b_i$ ;

$fg = \sum_i d_i x^i$ , where $d_i = \sum_{k=0}^i a_k b_{i-k}$ .

The unity 1 is just, well, 1 (i.e. 1 + 0x + 0x² + … ). The proof is easy but a bit tedious, so we’re leaving it to the reader as an exercise. Note that R[x] is a commutative ring if and only if R is commutative.

The first danger of the day is that deg(fg) ≠ deg(f)deg(g) if R has zero-divisors. Indeed, if deg(f)=m and deg(g)=n, then the coefficient of x^m+n is a_mb_n which can be zero even if a_m and b_n are not.

Proposition. If R has no non-zero zero-divisors, then deg(fg) = deg(f) + deg(g) for any polynomials f, g.

Our next definition is a natural extension from the case of real polynomials.

Definition. The derivative of f(x), denoted ${df}{dx}$ or just f'(x), is given by:

$f(x) = a_0 + a_1 x + a_2 x^2 + \ldots \implies \frac{df}{dx} = a_1 + 2a_2 x + 3a_3 x^2 + \ldots.$

Once again, the derivatives are formal derivatives, in the sense that we’re not thinking in terms of calculus and limits. Our derivative is merely a function which takes a polynomial and outputs another. The usual properties regarding derivatives of sums and products still hold.

$\frac{d}{dx}(f+g) = \frac{df}{dx} + \frac{dg}{dx}$ ;
$\frac{d}{dx}(fg) = f\frac{dg}{dx} + \frac{df}{dx}g$ ;
$\frac{d}{dx}(f^n) = n\cdot f^{n-1} \frac{df}{dx}.$

Do take note of the order of multiplication in the product law since the ring may well be non-commutative. Once again, the proof is left to the reader, but we’ll leave a hint to make his/her job easier. For the product law, note that the maps

$(f,g)\mapsto \frac{d}{dx}(fg)$ and $(f,g) \mapsto f\frac{dg}{dx} + \frac{df}{dx}g$

are both bilinear in f and g. In other words, if we fix f (resp. g), then the resulting map is linear in g (resp. f). This reduces to the case of monomials $f(x) = x^m$ , $g(x) = x^n$ .

Now, the next bombshell is: $\deg(\frac{df}{dx}) \ne \deg(f)-1$ in general, even for commutative rings, in fact, even for fields. Note that if the leading coefficient of f is $a_n x^n$ , then its derivative is $na_n x^{n-1}$ , which can be zero even if a_n ≠ 0.

E.g. let R = Z/3 which is a finite field. Then the polynomial $f(x) = x^3 + x^2 + 1$ has derivative $3x^2 + 2x = 2x$ since 3 = 3·1 = 0 in the ring R.

In general, we merely have $\deg(\frac{df}{dx}) \le \deg(f)-1$ .

Polynomials as Functions

Any student of elementary algebra knows that we can substitute values in x to evaluate a polynomial. This holds for $f(x) \in R[x]$ too.

Problem 1: product is not preserved if R is non-commutative.

We prefer not to do it when R is not commutative, because if we substitute x=r, then it’s not true in general that (fg)(r) = f(r)g(r).

E.g. consider the quaternion ring H. Suppose f(x) = ix and g(x) = jx. Then fg = (ij)x² = kx². This gives:

f(i) = -1, g(i) = –k but (fg)(i) = –k ≠ f(i)g(i).

The problem occurs because x=r does not commute with the coefficients of f(x), so we won’t attempt to interpret f as a function when the coefficients don’t commute.

Problem 2 : a non-zero polynomial can be a zero function.

This is best given by the example of R = Z/3. Take the polynomial f(x) = x³–x. The only elements of R are {0, 1, 2} mod 3, and all these satisfy f(0) = f(1) = f(2) = 0. More generally, if p is a prime, then the polynomial f(x) = x^p–x over the ring Z/p becomes the zero function by Fermat’s little theorem.

With those caveats out of the way, let’s state the main theorem of this section.

Theorem. Let R be a commutative ring. The evaluation map gives a ring homomorphism:

$R[x] \to \text{Func}(R, R), \ f(x) \mapsto (r \mapsto f(r)),$

where Func(R, R) is the set of functions R → R with addition and product given by: for all $f, g \in \text{Func}(R,R)$ ,

$(f+g)(r) = f(r)+g(r)$ ;

$(f\times g)(r) = f(r)g(r)$ .

The unity 1 is the constant function which takes all r to 1.

We’ve already seen that the evaluation map is not injective in general. Clearly, it’s also far from surjective.

Remainder Theorem

Throughout this section, let’s assume R is commutative.

Theorem. Let $f(x), g(x)\in R[x]$ , where g(x) is monic (recall: this means leading coeff = 1). Then we can write:

$f(x) = g(x)q(x) + r(x)$ ,

for some $q(x), r(x)\in R[x]$ such that deg(r) < deg(g) (this includes g=0, whose degree is -∞).

Furthermore, q(x) and r(x) are unique, and we them the quotient and remainder of f(x) ÷ g(x) respectively.

We’ll sketch the proof: which is almost identical to the case of secondary school algebra.

Case 1: deg(f) < deg(g). The only possibility is q(x)=0, r(x)=f(x), because if q≠0, then deg(gq) = deg(g)+deg(q) ≥ deg(g) > deg(r) (important: the first equality holds because g is monic!) which gives deg(f) = deg(gq+r) = deg(gq). But deg(gq) ≥ deg(g) so this contradicts the case assumption.
Case 2: deg(f) ≥ deg(g). Let $f(x) = a_n x^n + \ldots + a_0$ , where n = deg(f). Show that the leading coefficient of q(x) must be $g(x) = a_n x^{n-m} + \ldots$ , where m = deg(g). Then apply induction on deg(f).

In particular, suppose g(x) = x – c for some c in R. Then f(x) = q(x)(x – c) + r, where r is a constant. Substitute x=c to obtain f(c) = r, so we have:

Remainder Theorem. If $f(x)\in R[x]$ and $c\in R$ , then we have:

$f(x) = (x-c)q(x) + f(c)$ ,

for some $q(x)\in R[x]$ . As a corollary, we get the Factor Theorem : if f(c) = 0, then f(x) = (x-c)q(x) for some polynomial q(x).

[ An element c of R is called a root of f(x) if f(c) = 0. ]

Here’s another way of looking at it. Fixing c, we get a surjective ring homomorphism R[x] → R which takes f(x) → f(c). Then the factor theorem says the kernel of this map is precisely <x–c>.

Example 1

Consider the ring R[x, y, z] of all polynomials with real coefficients in x, y, z. This is (canonically isomorphic to) the ring of polynomials R[z], where R = R[x, y]. Now let (a, b, c) be an ordered triplet of real numbers.

The remainder theorem for R = R[x, y] gives us f(x, y, z) = (z–c)q(x, y, z) + r(x, y).
Applying remainder theorem to R = R[x] gives us r(x, y) = (y–b)q₂(x, y) + r₂(x).
Finally, applying it to R = R gives r₂(x) = (x–a)q₃(x) + r₃.

In conclusion:

$f(x,y,z) = (z-c)q(x,y,z) + (y-b)q_2(x,y) + (x-a)q_3(x) + r_3,$

and upon substituting x=a, y=b, z=c, we get r₃= f(a, b, c). In short, the ring homomorphism R[x, y, z] → R which takes f(x, y, z) to a real number by evaluating it at (a, b, c) has ideal I = <x–a, y–b, z–c>. Since the quotient R[x, y, z]/I is isomorphic to R by the first isomorphism theorem, I is maximal.

Example 2

Suppose R is an integral domain and $f(x)\in R[x]$ is of degree n. Denote the distinct roots of f by $r_1, r_2, \ldots, r_m$ . We shall prove that m ≤ n.

Indeed, since $f(r_m) = 0$ , by the Factor Theorem, $f(x) = g(x)(x - r_m)$ . Now for every other root, $r_i, 1 \le i \le m-1$ , we have $0 = f(r_i) = g(r_i)(r_i - r_m)$ . Since the r_i‘s are distinct, r_i–r_m ≠ 0. Since R is an integral domain, this implies $r_1, \ldots, r_{m-1}$ are also roots of g(x). Since deg(g) = deg(f) – 1, apply induction.

If R has zero-divisors, things can go very wrong. E.g. in R × R, the innocuous looking quadratic equation x²=x has 4 solutions: (0, 0), (0, 1), (1, 0) and (1, 1).

Omake: Shamir’s Secret Sharing Scheme

Polynomials in abstract ring theory are a key ingredient in Shamir’s secret sharing protocol. The problem is as follows: you have a secret code (e.g. 5-digit number) to a safe containing some highly sensitive documents. You’d like to distribute the code among 10 scientists so that:

any 3 scientists can cooperate and obtain the code, but
if any 2 scientists cooperate, they get absolutely no information on the code.

The second factor is critical: not only do the 2 scientists not know the full code, but the amount of knowledge they have pertaining to the code is zilch! In short, cutting up the code into binary bits and distributing them is out of the question, since each scientists would then know something about the code (each bit cuts the number of possibilities by half).

Shamir’s ingenious solution is as follows: pick a reasonably large prime p and take the finite field R = Z/p. Now:

The computer randomly picks a polynomial $f(x) = ax^2 + bx + c$ such that f(0) = c = secret code. The case of a=0 is not excluded.
For i = 1, 2, …, 10, the computer then calculates f(i) and sends it to scientist #i.
If any three scientists cooperate, they would have f(i), f(j), f(k) for three distinct positive integers i, j, k, and we know that every quadratic equation is uniquely determined by 3 points.
If two scientists cooperate to get f(i) and f(j), the amount of information they have on the code is still zilch since for every value of f(0), there is a unique solution for f.

Step 3 can be performed by the Lagrange interpolation formula if one so desires: to interpolate (i, f(i)), (j, f(j)) and (k, f(k)), compute:

$f(x) = f(i)\cdot\frac{(x-j)(x-k)}{(i-j)(i-k)} + f(j)\cdot\frac{(x-i)(x-k)}{(j-i)(j-k)} + f(k)\cdot\frac{(x-i)(x-j)}{(k-i)(k-j)},$

which is well-defined if i, j and k are distinct. Or we can solve simultaneous equations: e.g. to solve for f(1) = 2, f(2) = 3, f(3) = 9 for mod p=11, we have:

$\begin{matrix}a + b + c = 2\\ 4a + 2b + c = 3\\ 9a+3b+c = 9\end{matrix}\ \implies a = 8, b = 10, c=6\implies f(x) = 8x^2 + 10x+6$

so the secret code is 6.

In the general case, say, you wish to design a secret sharing system whereby any (n+1) scientists can collaborate to obtain the code, and any n scientists cannot figure out anything about the code. To that end, a polynomial of degree n is required. Once again, Lagrange’s interpolation formula allows us to recover the entire polynomial given (n+1) distinct points, and hence the secret code.

Leave a comment Cancel reply

Recent Posts

Archives

Categories

Meta

Pages