Plans for the Blog

No, this blog is not dead. But I’m contemplating how things will proceed from here. It started off when I wanted a centralised location for all SIMO notes, accessible by all students, even those who’re not in the training team. It turns out I’m the only contributor of articles here, so it’s probably no longer appropriate to name it SIMO. Furthermore, the new articles are drifting further and further away from Olympiad-type mathematics, and there’re certainly many aspects of tertiary-level mathematics that I’d like to talk about.

So where shall we go from here?

From here on, I’ll probably be posting various tidbits about mathematics of all shapes and forms, irrespective of level. However, as a basic consideration for readers, the prerequisites will be stated, and hopefully even readers without the assumed prerequisites will catch a glimpse of the beauty in the underlying mathematics.

Prior to that, I will of course rename this site. Since I’ve been the sole author of articles so far, I assume there’ll be no vehement objections. PS. If you’d still like to contribute an article, please don’t hesitate to do so!

That’s all. Happy Lunar New Year. 🙂


Symmetric Polynomials (III)

Now we generalise this to n variables: x_1, x_2, \dots, x_n. It’s clear what the corresponding building blocks of symmetric polynomials would be:

  • e_1 = \sum_{i=1}^n x_i = x_1 + x_2 + \ldots + x_n;
  • e_2 = \sum_{1\le i < j \le n} x_i x_j = x_1 x_2 + x_1 x_3 + \ldots + x_{n-1} x_n;
  • e_3 = \sum_{1 \le i < j < k \le n} x_i x_j x_k = x_1 x_2 x_3 + \ldots;
  • e_n = \prod_{i=1}^n x_i.

We call these e_i‘s the elementary symmetric polynomials in the x_i‘s. Note that (-1)^i e_i is the coefficient of T^{n-i} in the expansion of the following polynomial:

(T - x_1) (T - x_2) \dots (T - x_n) = T^n - e_1 T^{n-1} + e_2 T^{n-2} -\dots +(-1)^n e_n.

It’s easy to generalise the results in the earlier posts to derive a recurrence relation for the power sum S_k = x_1^k + x_2^k + \ldots + x_n^k:

S_{k+n} = e_1 S_{k+n-1} - e_2 S_{k+n-2} + \dots + (-1)^{n-1} e_n S_k. (*)

But there’s a slight hitch in using this recurrence relation: we must know the first n sums before applying it! Thus, we must already have the relations for S_0, S_1, \dots, S_{n-1}. Some of these are quite easy:

  • S_0 = 1 + 1 + \dots + 1 = n;
  • S_1 = \sum_i x_i = e_1;
  • S_2 = \sum_i x_i^2 = (\sum_i x_i)^2 - 2 \sum_{i<j} x_i x_j = e_1^2 - 2e_2.

But from the third powers onwards, it gets much trickier. Turns out there’s a good trick to this:

Example 1. Express the polynomial a^3 + b^3 + c^3 + d^3 in terms of the elementary symmetric polynomials P = a+b+c+d, Q = ab+ac+ad+bc+bd+cd, R = abc + abd + acd + bcd.

Solution. First, we reduce the number of variables via the substitution d = 0. Then P, Q and R are just the usual elementary symmetric polynomials in a, b, c. And our desired target polynomial is just a^3 + b^3 + c^3. We already know how to express this in terms of P, Q and R :

a^3 + b^3 + c^3 = P(a,b,c)^3 - 3P(a,b,c)Q(a,b,c) + 3R(a,b,c).

Now we lift this back to the 4-variable case and consider the difference:

\Delta = a^3 + b^3 + c^3 + d^3 - P^3 + 3PQ - 3R.

Since Δ is zero when d=0, the polynomial Δ must be a multiple of d. By symmetry it must be a multiple of a, b and c. Thus Δ is a multiple of abcd, which can only mean Δ = 0 since it has degree 3. Thus a^3 + \ldots + d^3 = P^3 - 3PQ + 3R. ♦

And why stop here? Using the same reasoning, we can show that the equality S_3 = P^3 - 3PQ + 3R holds in the 5-variable case. And so on. This allows us to generalise the recurrence (*) to include all S_k.

Newton’s Identities. Let S_k = \sum_{i=1}^n x_i^k be the sum of the k-th powers of the variables x_i. Then we have:

  • S_1 = e_1;
  • S_2 = e_1 S_1 - 2e_2;
  • S_3 = e_1 S_2 - e_2 S_1 + 3e_3;
  • S_k = e_1 S_{k-1} - e_2 S_{k-2} + \ldots + (-1)^{k-2} e_{k-1} S_1 + (-1)^{k-1} ke_k.

These allow us to recursively calculate all S_k.
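These identities are easy to test by machine. Here’s a Python sketch (the function names are my own) which computes e_1, \dots, e_n from given values, runs Newton’s identities together with the recurrence (*), and compares against the power sums computed directly:

```python
from itertools import combinations
from math import prod

def elementary_symmetric(xs):
    """[e_1, ..., e_n] for the given values of x_1, ..., x_n."""
    return [sum(prod(c) for c in combinations(xs, k)) for k in range(1, len(xs) + 1)]

def power_sums_newton(e, n, kmax):
    """S_1..S_kmax from e_1..e_n: Newton's identities for k <= n, recurrence (*) after."""
    S = []
    for k in range(1, kmax + 1):
        # common part: e_1 S_{k-1} - e_2 S_{k-2} + ... (only terms with k - i >= 1)
        s = sum((-1) ** (i - 1) * e[i - 1] * S[k - 1 - i]
                for i in range(1, min(k, n) + 1) if k - i >= 1)
        if k <= n:
            s += (-1) ** (k - 1) * k * e[k - 1]   # the extra (+/-) k e_k term
        S.append(s)
    return S

xs = [2, -1, 3, 5]
direct = [sum(x ** k for x in xs) for k in range(1, 9)]
assert power_sums_newton(elementary_symmetric(xs), len(xs), 8) == direct
```

Everything here is exact integer arithmetic, so the assertion is a genuine algebraic check at these values.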

Now we’re ready to solve tougher problems.

Example 2. (APMO 2003 Q1) The following polynomial has eight real and positive roots. Find all possible values of f.

x^8 - 4x^7 + 7x^6 + ax^5 + bx^4 + cx^3 + dx^2 + ex + f = 0.

Solution. Denote the roots by x_i, where i = 1, \ldots, 8. Then \sum_{i=1}^8 x_i = 4 and \sum_{1\le i<j\le 8} x_i x_j = 7. Hence \sum_{i=1}^8 x_i^2 = 4^2 - 2\times 7 = 2. Next, we use the root-mean-square & arithmetic-mean inequality (a special case of the Cauchy-Schwarz inequality):

\sqrt{\frac{x_1^2 + x_2^2 + \dots + x_8^2} 8} \ge \frac{x_1 + x_2 + \dots + x_8} 8,

with equality if and only if all the x_i are equal. But LHS = 1/2 = RHS, so equality indeed holds; hence each x_i = 1/2 and f = \prod_{i=1}^8 x_i = 1/256. ♦
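As a sanity check: if all eight roots equal 1/2, the polynomial is (x - \frac 1 2)^8. We can expand it with exact rational arithmetic in Python and confirm the given coefficients -4 and 7 as well as f = 1/256:

```python
from fractions import Fraction

# coeffs[k] = coefficient of x^k; start from the constant polynomial 1
coeffs = [Fraction(1)]
for _ in range(8):
    new = [Fraction(0)] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        new[k + 1] += c                    # contribution of the x term
        new[k] -= c * Fraction(1, 2)       # contribution of the -1/2 term
    coeffs = new

assert coeffs[8] == 1 and coeffs[7] == -4 and coeffs[6] == 7
assert coeffs[0] == Fraction(1, 256)       # the constant term, i.e. f
```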

Example 3. Given the following set of linear simultaneous equations in x, y, z, w, find the sum x+y+z+w.

\frac {x}{2^2-1^2} + \frac{y}{2^2-3^2} + \frac{z}{2^2-5^2} + \frac{w}{2^2-7^2}=1,

\frac {x}{4^2-1^2} + \frac{y}{4^2-3^2} + \frac{z}{4^2-5^2} + \frac{w}{4^2-7^2}=1,

\frac {x}{6^2-1^2} + \frac{y}{6^2-3^2} + \frac{z}{6^2-5^2} + \frac{w}{6^2-7^2}=1,

\frac {x}{8^2-1^2} + \frac{y}{8^2-3^2} + \frac{z}{8^2-5^2} + \frac{w}{8^2-7^2}=1.

Solution. Let x, y, z, w be fixed values. Consider the following equation in T:

\frac{x}{T-1^2} + \frac{y}{T-3^2} + \frac{z}{T-5^2} + \frac{w}{T-7^2}=1.

The conditions of the problem imply that T = 2^2, 4^2, 6^2, 8^2 are solutions. Upon clearing denominators and simplifying, we get a quartic equation of the form (T-1)(T-9)(T-25)(T-49) - x(T-9)(T-25)(T-49) - \ldots = 0. Since the degree of the polynomial is 4, its roots are precisely 4, 16, 36, 64. On the other hand, the sum of the roots is the negative of the coefficient of T^3, which is 84 + (x+y+z+w). Since 4 + 16 + 36 + 64 = 120, we get x+y+z+w = 36. ♦
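If you’d like to verify the answer by brute force, the 4×4 system can be solved exactly with rational arithmetic. A Python sketch (the elimination code is mine, nothing fancy):

```python
from fractions import Fraction

# Row m corresponds to T = (2m)^2; columns to denominators T - 1^2, ..., T - 7^2.
odds2 = [1, 9, 25, 49]
A = [[Fraction(1, (2 * m) ** 2 - o) for o in odds2] for m in range(1, 5)]
b = [Fraction(1)] * 4

# Gaussian elimination with exact rationals (the pivots are nonzero for this system).
n = 4
for col in range(n):
    for row in range(col + 1, n):
        f = A[row][col] / A[col][col]
        A[row] = [a - f * c for a, c in zip(A[row], A[col])]
        b[row] -= f * b[col]

sol = [Fraction(0)] * n
for row in reversed(range(n)):
    sol[row] = (b[row] - sum(A[row][c] * sol[c] for c in range(row + 1, n))) / A[row][row]

assert sum(sol) == 36
```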

Finally, some exercises for you to hone your skills.

Exercise 1. Solve the following simultaneous equations:

\begin{array}{c} x-y+z-w = -2,\\ x^2+y^2+z^2+w^2 = 22, \\ x^3 - y^3 + z^3 - w^3 = -56, \\ x^4+y^4+z^4+w^4 = 274.\end{array}

Exercise 2. Factor the following polynomials:

  • x^4 + y^4 + z^4 - 2x^2 y^2 - 2 y^2 z^2 - 2 z^2 x^2,
  • x^3 + y^3 + z^3 + w^3 - 3(xyz + yzw + zwx + wxy).

Exercise 3. For a given complex a, prove that for any solution to the following simultaneous equations:

\begin{array}{c} x_1(x_2 + x_3 + x_4) = a, \\ x_2(x_3 + x_4 + x_1) = a, \\ x_3(x_4 + x_1 + x_2) = a,\end{array}

two of the x_i‘s are equal. [ No, it’s not a typo; we really mean 3 equations in 4 variables. ]

Exercise 4. (USAMO 1977 Q3) (*) Prove that if x, y, z, w are roots of the quartic polynomial T^4 + T^3 - 1, then xy, xz, xw, yz, yw, zw are roots of the sextic polynomial T^6 + T^4 + T^3 - T^2 - 1.

Exercise 5. Let n be a positive integer. Prove that the only complex solutions to the following simultaneous equations:

\begin{array}{c}x_1 + x_2 + \ldots + x_n = n,\\ x_1^2 + x_2^2 + \ldots + x_n^2 = n, \\ \vdots \\ x_1^n + x_2^n + \ldots + x_n^n = n,\end{array}

are x_1 = x_2 = \dots = x_n = 1.


Symmetric Polynomials (II)

When we move on to n=3 variables, we now have, as basic building blocks,

P(x,y,z) = x+y+z,\quad Q(x,y,z) = xy+yz+zx, \quad R(x,y,z) = xyz.

Up to sign, these are just the coefficients of T^i in the expansion of (T-x)(T-y)(T-z) = T^3 - PT^2 + QT - R. Once again, any symmetric polynomial in x, y, z with integer coefficients can be expressed as a polynomial in P, Q and R with integer coefficients.

Example. If we let S_n = x^n + y^n + z^n, then using the technique described in the previous post, we obtain the recurrence relation:

S_{n+3} = P\cdot S_{n+2} - Q\cdot S_{n+1} + R\cdot S_n.

Starting from S_0 = 3, S_1 = P, S_2 = (x+y+z)^2 - 2(xy+yz+zx) = P^2 - 2Q, we obtain:

  • S_3 = P\cdot S_2 - Q\cdot S_1 + R\cdot S_0 = P^3 - 3PQ + 3R;
  • S_4 = P\cdot S_3 - Q\cdot S_2 + R\cdot S_1 = P^4 - 4P^2 Q + 2Q^2 + 4PR.

Let’s solve some problems now.

Example 1. Factor the polynomial x^3 + y^3 + z^3 - 3xyz.

Solution. Since that’s a symmetric polynomial, we can express it as a polynomial in P, Q and R. From the above, we have x^3 + y^3 + z^3 = P^3 - 3PQ + 3R, so:

x^3 + y^3 + z^3 -3xyz= P^3 - 3PQ = P(P^2 - 3Q).

So x^3 + y^3 + z^3 - 3xyz = (x+y+z)(x^2 + y^2 + z^2 - xy - yz - zx). ♦

Note. To be fair, this method worked only because the individual factors were also symmetric. Sometimes one gets something like the following.

Example 2. Let U = xy^2 + yz^2 + zx^2 and V = x^2 y + y^2 z + z^2 x. Express the polynomial A = U^2 + UV + V^2 as a polynomial in P, Q, R.

Solution. No, we’re not going to expand the monster. Instead, we note that it is homogeneous of degree 6. Since P, Q and R are homogeneous of degrees 1, 2, and 3 respectively, we must have:

A = a_1 P^6 + a_2 P^4 Q + a_3 P^3 R + a_4 P^2 Q^2 + a_5 PQR + a_6 Q^3 + a_7 R^2,

for some constants a_1, a_2, \ldots. That’s too much to solve linearly, so we prune down the cases further by singling out one variable: x. Indeed, the degree of x in A is 4, with leading term x^4(y^2 + yz + z^2). So:

  • P^6 and P^4 Q can’t appear since they contain higher powers of x, i.e. a_1 = a_2 = 0;
  • also, x^4 only appears in P^3 R and P^2 Q^2, with respective leading terms x^4 yz and x^4 (y+z)^2. So a_3 = -1, a_4 = 1.
  • with only three unknowns left, we can substitute explicit values into x, y, z to solve for them, or we can substitute x=y to first weed out a_6 from the coefficient of x^6.

It turns out a_5 = a_7 = 0, so we get A = -P^3 R + P^2 Q^2 - Q^3. ♦
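Since both sides are polynomials of degree at most 6 in each variable, verifying the identity at 7 integer values per variable actually proves it. Here’s a Python check along those lines:

```python
def check(x, y, z):
    U = x * y**2 + y * z**2 + z * x**2
    V = x**2 * y + y**2 * z + z**2 * x
    P, Q, R = x + y + z, x*y + y*z + z*x, x*y*z
    return U*U + U*V + V*V == -P**3 * R + P**2 * Q**2 - Q**3

# 7 values per variable suffice: the difference has degree <= 6 in each variable
assert all(check(x, y, z)
           for x in range(-3, 4) for y in range(-3, 4) for z in range(-3, 4))
```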

Exercise 1. Factor the polynomial A. [Hint (highlight to read): deduce from the above that the factors aren’t symmetric, so given any factor, we can obtain new ones by permuting the variables. Such a factor has degree at most 2, let z=0 and conclude the degree is in fact exactly 2. Guess the factor and prove it works. ]

Example 3. Solve the system of simultaneous equations:

(x+y)^3 = z, \quad (y+z)^3 = x, \quad (z+x)^3 = y,

where x, y and z are distinct complex numbers.

Solution. Let s = x+y+z and rewrite the above equations as:

(s-z)^3 = z, \quad (s-x)^3 = x, \quad (s-y)^3 = y.

Hence, taking s as a constant, we see that x, y and z are all roots of the cubic equation (s-T)^3 = T, where T is the unknown. Since x, y and z are distinct, these are precisely all the roots. Expanding the polynomial gives T^3 - 3s\cdot T^2 + (3s^2 + 1)T - s^3 = 0, so the sum of the roots is 3s. Yet the sum of the roots is also x+y+z = s. Thus s=0 and x, y and z are roots of T^3 + T = 0. So (x,y,z) = (-i, 0, i) and its permutations. ♦
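A quick Python check of the answer (complex arithmetic with these particular values is exact):

```python
# the claimed solution of the cyclic system (x+y)^3 = z, (y+z)^3 = x, (z+x)^3 = y
x, y, z = -1j, 0, 1j
assert (x + y) ** 3 == z and (y + z) ** 3 == x and (z + x) ** 3 == y
# and all three are indeed roots of T^3 + T = 0
assert all(t ** 3 + t == 0 for t in (x, y, z))
```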

Exercise 2. Solve the simultaneous equations:

x+y+z = 1, \quad x^2 + y^2 + z^2 = 7, \quad x^5 + y^5 + z^5 = 81.

Exercise 3. Suppose x, y and z are non-zero real numbers satisfying:

xy + yz + zx = -1, \quad \frac 1 {xy} + \frac 1 {yz} + \frac 1 {zx} = -1.

Prove that one of x, y, z is equal to 1.

Exercise 4. Find a recurrence relation for S_n = (xy)^n + (yz)^n + (zx)^n in terms of P, Q and R.

Exercise 5. (*) Find positive integers x, y, z > 10 satisfying the Diophantine equation x^2 + y^2 + z^2 = 3xyz.

(To be concluded…)


Symmetric Polynomials (I)

[ Background required: knowledge of basic algebra and polynomial operations. ]

After a spate of posts on non-IMO related topics, we’re back on track.

Here, we shall look at polynomials in n variables, e.g. P(x, y, z) when n = 3. Such a polynomial is said to be symmetric if it remains the same after we swap any two variables, e.g. P(x, y) = xy + 3 is symmetric since P(x, y) = P(y, x). Likewise, P(x,y) = x^3 + y^3 + 3xy is symmetric, while P(x,y,z) = x^3 - y^3 + z^3 is not symmetric.

Let’s consider the case n=2. Two obvious symmetric polynomials in x,y are the sum and product:

P(x,y) = x+y, \quad Q(x,y) = xy.

Key result. Any symmetric polynomial in x and y with integer coefficients can be expressed as a polynomial in P and Q with integer coefficients.

In particular, if the sum and product of x and y are integers, then x^n + y^n is also an integer (for any natural number n) although x and y themselves may not be integral: e.g. x = 3 + \sqrt 2, y = 3 - \sqrt 2.

Example. Consider the polynomial x^3 + y^3. This can be written as (x+y)(x^2 - xy + y^2) = P\cdot (x^2 + y^2 - Q) and then we use the substitution x^2 + y^2 = (x+y)^2 - 2xy = P^2 - 2Q. So x^3 + y^3 = P^3 - 3PQ.

Better Method. Instead of tediously calculating x^n + y^n in this manner, we denote S_n = x^n + y^n for integer n and proceed to derive a recurrence relation in S_n. First, suppose we fix x and y; let T^2 - aT + b be the quadratic equation with roots x and y. Then a = x+y = P(x,y) and b = xy = Q(x,y). Now since x and y are both roots of that quadratic equation, we have:

x^2 - ax + b = 0, \quad y^2 - ay + b = 0.

Multiplying the two equations by x^n, y^n respectively, we obtain:

x^{n+2} - a x^{n+1} + b x^n = 0, \quad y^{n+2} - a y^{n+1} + b y^n = 0.

Adding them, we get: S_{n+2} - a S_{n+1} + b S_n = 0. Since this holds for any x and y, we see that S_{n+2} = P \cdot S_{n+1} - Q\cdot S_n as polynomials in x and y.

Example. Now we can express x^5 + y^5 as a polynomial in P and Q with just a bit of work. Starting with S_0 = x^0 + y^0 = 2 and S_1 = x+y = P, we have:

  • S_2 = P \cdot S_1 - Q \cdot S_0 = P^2 - 2Q;
  • S_3 = P \cdot S_2 - Q \cdot S_1 = P^3 - 3PQ;
  • S_4 = P \cdot S_3 - Q \cdot S_2 = P^4 - 4P^2 Q + 2Q^2;
  • S_5 = P \cdot S_4 - Q \cdot S_3 = P^5 - 5P^3 Q + 5 PQ^2.

Now we can use this technique to solve the following problems.

Problem 1. Solve the simultaneous equations x+y = 2, x^5 + y^5 = 82.

Solution. We consider the quadratic equation T^2 - pT + q = 0 whose roots are x and y. In other words, T^2 - pT + q \equiv (T-x)(T-y) as polynomials. From the above computations, we have p = x+y = 2 and thus

82 = x^5 + y^5 = p^5 - 5p^3 q + 5pq^2 \implies q^2 - 4q = 5.

Solving for q gives q = 5 or q = -1. The first case gives complex roots, while the second case gives (x,y) = (1+\sqrt 2, 1-\sqrt 2) or (1 - \sqrt 2, 1 + \sqrt 2). ♦

Note. An alternate solution exists, via the substitution x = 1 + t, y = 1-t. Try it!
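Both the answer and the recurrence are easy to check in Python; here q = xy = (1+\sqrt 2)(1-\sqrt 2) = -1:

```python
from math import isclose, sqrt

x, y = 1 + sqrt(2), 1 - sqrt(2)
assert isclose(x + y, 2) and isclose(x * y, -1)
assert isclose(x ** 5 + y ** 5, 82)

# the recurrence S_{n+2} = p*S_{n+1} - q*S_n with p = 2, q = -1
p, q = 2, -1
S = [2, p]                      # S_0 = 2, S_1 = p
for _ in range(4):
    S.append(p * S[-1] - q * S[-2])
assert S[5] == 82               # matches x^5 + y^5
```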

Problem 2. (AIME 1990 Q15) Let a, b, x, y be real numbers such that:

ax + by = 3, \quad ax^2 + by^2 = 7, \quad ax^3 + by^3 = 16, \quad ax^4 + by^4 = 42.

Compute ax^5 + by^5. [ Note: it may not be possible to arrive at the answer via guesswork; in fact, a, b, x, y may well be irrational, but the above sums turn out to be rational. ]

Solution. We consider the quadratic equation T^2 - pT + q = 0 whose roots are x and y. In other words, T^2 - pT + q \equiv (T-x)(T-y) as polynomials. As above, we get x^2 - px + q = 0 and y^2 - py + q = 0. Multiplying the two equations by ax^n and by^n respectively, we get:

(ax^{n+2}) - p(ax^{n+1}) + q(ax^n) = 0, \quad (by^{n+2}) - p(by^{n+1}) + q(by^n) = 0.

Now we denote S_n = ax^n + by^n. Upon adding the above two equations, we get a recurrence:

S_{n+2} = p\cdot S_{n+1} - q\cdot S_n.

We are given: S_1 = 3, S_2 =7, S_3 = 16, S_4 = 42. Substituting into the recurrence relation above, we obtain:

16 = 7p - 3q, \quad 42 = 16p - 7q

which is wholly linear! So we easily solve that to obtain p=-14, q=-38 and the final answer is:

ax^5 + by^5 = S_5 = -14 S_4 + 38 S_3 = 20. ♦
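Here’s the same computation in Python, solving the 2×2 system exactly by Cramer’s rule:

```python
from fractions import Fraction

S1, S2, S3, S4 = 3, 7, 16, 42
# S_3 = p*S_2 - q*S_1 and S_4 = p*S_3 - q*S_2, i.e. 16 = 7p - 3q, 42 = 16p - 7q
det = 7 * (-7) - (-3) * 16                     # determinant of [[7, -3], [16, -7]]
p = Fraction(16 * (-7) - (-3) * 42, det)       # Cramer's rule
q = Fraction(7 * 42 - 16 * 16, det)
assert (p, q) == (-14, -38)
assert p * S4 - q * S3 == 20                   # S_5 = p*S_4 - q*S_3
```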

Exercise 1. Suppose x and y are roots of T^2 - 3T + 1 = 0. Find the quadratic equation whose roots are x^3 + y^2 - 3x + 1 and y^3 + x^2 - 3y + 1.

[Hint : if you do it “right”, the calculation should not be too tedious.]

Exercise 2. (USSR) Suppose α and β are roots of the equation x^2 + px + 1 = 0; γ  and δ are roots of the equation x^2 + qx + 1 = 0. Prove that

(\alpha - \gamma)(\beta - \gamma)(\alpha +\delta)(\beta + \delta) = q^2 - p^2.

Exercise 3. Prove that if P(x, y) is a symmetric polynomial which is divisible by (x – y), then it is in fact divisible by (x-y)^2.

(To be continued … )


What is Curvature? (II)

[ Background required: rudimentary vector calculus. ]

The aforementioned definition of curvature is practical but a little aesthetically displeasing. Specifically, one seeks a definition which is independent of the parametrization of the curve. The advantage of such a definition is not merely conceptual clarity, but also the ease with which we can generalise to higher dimensions.

Suppose the curve is given by \mathbf x(t) = (f(t), g(t)), where all vectorial values are given in boldface. One imagines this as a particle travelling along the curve such that at time t, its position is x(t). Recall that the velocity of the particle is then given by \mathbf{x}'(t) = (f'(t), g'(t)), which points along the tangent to the curve at x(t).

Thus the speed of the particle is the magnitude of its velocity, ||\mathbf{x}'(t)|| = \sqrt{f'(t)^2 + g'(t)^2}, from which it’s clear that the distance travelled by the particle during a time interval [a,b] is given by:

s = \int_a^b || \mathbf{x}'(t) || dt = \int_a^b \sqrt{f'(t)^2 + g'(t)^2} dt.

Example: consider the parabola y = x^2 parametrized via (x, x^2). The arc length of the curve from (0,0) to (1,1) is given by s = \int_0^1 \sqrt{1 + 4x^2} dx which is \frac 1 4 (2\sqrt 5 + \sinh^{-1}(2)), i.e. around 1.479, which is only slightly longer than the straight line joining the two points (of length \sqrt 2 \approx 1.414).

Exercise: find the length of the curve y = x^{3/2} from (0,0) to (1,1).

Exercise: for the curve in polar coordinates r = f(\theta), find the formula of the arc-length.

Now we’re ready to state our definition of curvature “intrinsically” (i.e. without reference to external coordinates or parametrization). At each point of time t, take the unit tangent vector to the curve, i.e. let T(t) be the unit vector along \mathbf{x}'(t). Since regardless of parametrization, \mathbf{x}'(t) is always parallel to the tangent to the curve at x(t), we see that T(t) depends only on the point x(t) and not on the parametrization t. [ E.g. if we let t = 2u, then the new parametrization x(u) would be “twice as fast”: from u=0 to u=1, the particle would traverse the path from t=0 to t=2. ]

Next, we fix a standard parametrization of the curve: namely via the arc length s, i.e. we assume that the particle is travelling along the curve at unit speed.

Example. Consider the semi-circle in the upper-half plane y = \sqrt{25 - x^2}. Let’s calculate its parametrization by arc-length s, via the hard way. [ The easy way is via polar coordinates. ] Starting from the point (-5, 0), the arc-length to the point (a, f(a)) is given by:

\int_{-5}^a \sqrt{1 + \left(\frac{dy}{dx}\right)^2} dx = \int_{-5}^a \frac {5} {\sqrt{25-x^2}} dx = 5(\frac\pi 2 + \sin^{-1} (\frac a 5)).

This gives the parametrization x(s) = 5\sin(\frac s 5 - \frac \pi 2) = -5 \cos \frac s 5, y(s) = 5\sin \frac s 5, where 0 \le s \le 5\pi. This is consistent with our intuition that the arc-length is a linear function of the angle θ.

Exercise. Take the curve y= \frac 2 3 x^{3/2}. Parametrize the curve by arc-length, starting from the origin (0, 0).

Exercise. Take the curve defined by the polar equation r = e^\theta and parametrize it by arc-length, starting from \theta = 0.

Exercise. (*) Prove that if \mathbf{x}(s) = (f(s), g(s)) is parametrized by arc-length, then ||\frac{d\mathbf{x}}{ds}|| = 1. [ Note: if your concept is right, this should be a 2-line proof. We will need this result later. ]

Now we’re ready to unveil the new definition of curvature.

Intrinsic Definition of Curvature. Let \mathbf{T}(s) be the unit tangent vector of the curve \mathbf{x}(s), parametrized by arc-length. The curvature \kappa of a curve is defined by ||\frac{d\mathbf{T}}{ds}||.

While we’re at it, we also define the curvature vector to be \frac{d\mathbf{T}}{ds}.

Our most pressing need is to check that this definition of curvature is consistent with our last. To be specific, for a function y = f(x) we need to show that after parametrizing by arc-length that:

||\frac{d\mathbf{T}}{ds}|| = \frac{|f''(x)|}{(1+f'(x)^2)^{3/2}}.

To do that, first rotate and translate the figure so that the graph is tangent to the x-axis at the origin (i.e. f(0) = f'(0) = 0). The RHS remains unchanged thanks to the way we derived it. The LHS is intrinsically defined, so it has nothing to do with orientation. Now we have:

\frac{d\mathbf{T}}{dx} = \frac{d\mathbf{T}}{ds} \cdot \frac{ds}{dx} and \frac{ds}{dx} = \sqrt{1 + f'(x)^2}.

At x=0, we thus have \frac{ds}{dx} = 1 and it suffices to compute \frac{d\mathbf{T}}{dx}. To do that, note that the vector (1, f'(x)) is parallel to the tangent of the curve at the point (x, f(x)), so we can express the unit tangent vector in terms of x :

\mathbf{T}(x) = \frac 1 {\sqrt{f'(x)^2+1}} (1, f'(x)).

Now just differentiate both sides wrt x and put x=0: we get \left.\frac{d\mathbf{T}}{dx}\right|_{x=0} = (0, f''(0)). So our definition is consistent with the earlier one. Notice that the curvature vector (0, f''(0)) is perpendicular to the tangent vector (the x-axis). This always holds: differentiating \mathbf{T}\cdot\mathbf{T} = 1 with respect to s gives 2\,\mathbf{T}\cdot\frac{d\mathbf{T}}{ds} = 0.

Generalisation

Armed with this, we can generalise the definition of curvature to a curve in arbitrary dimensions. Let \mathbf{x} : \mathbb R \to \mathbb R^n be a curve in n-space, with components \mathbf{x}(t) = (x_1(t), x_2(t), \dots, x_n(t)). Everything follows as before:

  • The velocity vector is \mathbf{v}(t) = \mathbf{x}'(t) = (x_1'(t), x_2'(t), \dots, x_n'(t)).
  • The unit tangent vector of the curve at a point is \mathbf{T} = \frac{\mathbf{x}'}{||\mathbf{x}'||}.
  • The arc-length along a \le t \le b is s = \int_a^b ||\mathbf{v}(t)|| dt, or s = \int_a^b \sqrt{\sum_{i=1}^n x_i'(t)^2} dt.

So naturally, we also define the curvature vector to be \frac{d\mathbf{T}}{ds} and the curvature to be its length. This definition is consistent with all our earlier definitions of curvature (for n=2 case).

Example. Consider the helix given by \mathbf{x}(t) = (\cos t, \sin t, 2t).

Then the velocity vector is \mathbf{v}(t) = \mathbf{x}'(t) = (-\sin t, \cos t, 2), so the tangent vector is \mathbf{T}(t) = \frac 1 {\sqrt 5} (-\sin t, \cos t, 2). To compute the curvature vector, we apply:

\frac{d\mathbf{T}}{ds} = \frac{d\mathbf{T}}{dt} \left(\frac {ds}{dt}\right)^{-1}.

And \frac {ds}{dt} = ||\mathbf{v}(t)|| = \sqrt 5. Hence, the curvature vector is given by \frac 1 5 (-\cos t, -\sin t, 0) and the curvature is 1/5. In other words, the helix has constant curvature, which is consistent with our geometric intuition.
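We can corroborate the computation numerically: approximate dT/dt by a central difference and divide by ds/dt. A Python sketch (function names mine):

```python
from math import sin, cos, sqrt, isclose

def T(t):
    """Unit tangent of the helix x(t) = (cos t, sin t, 2t)."""
    v = (-sin(t), cos(t), 2.0)
    n = sqrt(sum(c * c for c in v))        # this is sqrt(5) for every t
    return tuple(c / n for c in v)

def curvature(t, h=1e-5):
    # dT/ds = (dT/dt) * (ds/dt)^(-1); estimate dT/dt by a central difference
    dTdt = tuple((a - b) / (2 * h) for a, b in zip(T(t + h), T(t - h)))
    dsdt = sqrt(5)                         # ||x'(t)||
    return sqrt(sum(c * c for c in dTdt)) / dsdt

assert all(isclose(curvature(t), 0.2, rel_tol=1e-6) for t in (0.0, 1.0, 2.5))
```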

Exercise. Take the curve \mathbf x(t) = (2e^t \cos t, 2e^t \sin t, e^t). Compute the curvature and curvature vector at a general point on the curve.

Application. Take a cylinder with base radius R and height H. A strake is a curved strip which bends around the cylinder, following the shape of its contour:

(Image from britannica.com)

What should the radius r be for the strip to wrap around the cylinder snugly? The answer is that the curvature of the strip must match the curvature of the helix it follows. The helix is given by: \mathbf{x}(\theta) = (R \cos\theta, R\sin\theta, h\theta), where 0 \le \theta \le 2\pi and h = \frac{H}{2\pi}. Hence:

  • \mathbf{x}'(\theta) = (-R\sin\theta, R\cos\theta, h);
  • so \frac{ds}{d\theta} = ||\mathbf{x}'(\theta)|| = \sqrt{R^2 + h^2};
  • \mathbf{T}(\theta) = \frac{\mathbf{x}'}{||\mathbf{x}'||} = \frac 1 {\sqrt{R^2 + h^2}} (-R\sin\theta, R\cos\theta, h);
  • \frac{d\mathbf{T}}{d\theta} = \frac 1 {\sqrt{R^2+h^2}} (-R\cos\theta, -R\sin\theta, 0);
  • so curvature vector is \frac{d\mathbf{T}}{ds} = \frac{d\mathbf{T}}{d\theta}\left(\frac{ds}{d\theta}\right)^{-1} = \frac 1 {R^2+h^2} (-R\cos\theta, -R\sin\theta, 0) and the scalar curvature is \frac R {R^2 + h^2}.

So the correct radius for r is \frac {R^2 + h^2} R where h = \frac{H}{2\pi}.


What is Curvature? (I)

[ Background required: calculus, specifically differentiation. ]

In this post, we will give a little background intuition for the definition of curvature. One possible approach is given on Wikipedia; ours is another. Note that this is not IMO-related (my apologies for that), but it’s an interesting supplement to your usual calculus.

Let y = f(x) be a smooth (meaning: infinitely differentiable, which we will always assume implicitly from now onwards) function, whose graph then forms a one-dimensional curve in \mathbb R^2. Recall that the first derivative f'(x) = \frac {dy}{dx} tells us the gradient of the curve at a point. To be precise, at the point x = x_0, the value f'(x_0) is exactly the gradient of the curve at (x_0, f(x_0)).

Now we can differentiate the function again: f''(x) = \frac {d^2 y}{dx^2}. In secondary school calculus, we have little use for this value, except to classify stationary points as local maxima, local minima or points of inflexion. However, it’s quite clear on an intuitive level that f''(x_0) tells us how much the curve bends at (x_0, f(x_0)), as can be seen from the graph of y = ax^2 for different values of a: as a > 0 increases, the curve bends more and more at the origin (0,0).

Curvature (First Attempt at Definition). The curvature of the graph for y = f(x) at the point (x_0, f(x_0)) is given by the value of f''(x_0).

Exercise. Let C be the circle with centre (0, r) and radius r, where r>0.  Prove that the curvature of the circle at the point (0, 0) is 1/r.

This looks appealing since a circle ought to “curve less” as its radius increases. But now we encounter a problem: under our above definition, the second derivative changes as we vary the point on the circle (try it!), whereas our intuition tells us that the curvature of a circle ought to be constant since it seems to be “curving uniformly” throughout the circumference.

Curvature (Second Attempt at Definition). Suppose y = f(x) passes through the origin and is tangent to the x-axis (i.e. f(0) = f'(0) = 0). Then the curvature at the origin is defined to be f''(0). In the general case, if we wish to define the curvature at a point, rotate and translate the curve so that the point maps to the origin and the curve is tangent to the x-axis. Now apply the previous definition.

Let’s explicitly calculate the curvature for a general point. Since translation doesn’t affect the 2nd derivative, we might as well assume the point is at origin. If the tangent at x = 0 makes an angle of θ with the x-axis, then \tan\theta = f'(0). To rotate this an angle of -\theta, we use the transformation

\begin{pmatrix} u \\ v\end{pmatrix} = \begin{pmatrix} \cos \theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \implies \begin{pmatrix} x \\ y\end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta\end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix}.

Substituting into y = f(x) and partial differentiating with respect to u (noting that θ is constant) gives:

\sin\theta + \cos\theta \frac{dv}{du} = f'(\cos\theta \cdot u - \sin\theta\cdot v) (\cos\theta - \sin\theta \frac{dv}{du}).

[ Side remark: note that \frac{dv}{du} = 0 at (u,v) = (0,0) since \tan\theta = f'(0), which is what we want. ]  Now, differentiating again and putting u=0, v=0, we get:

\cos\theta \frac{d^2 v}{du^2} = f'(0)(-\sin\theta \frac{d^2 v}{du^2}) + f''(0)(\cos\theta - \sin\theta \frac{dv}{du})^2.

Simplifying with \frac{dv}{du}=0 and \tan\theta = f'(0) gives the curvature:

\frac{d^2 v}{du^2} = f''(0) \cos^3\theta = \frac{f''(0)}{(1 + f'(0)^2)^{3/2}},

which is typically what you see in the definition of curvature. So in short, to define the curvature of the graph of y = f(x) at the point (x_0, f(x_0)), we use the formula

k = \frac{f''(x_0)}{(1 + f'(x_0)^2)^{3/2}}.

This is also known as the signed curvature. The unsigned curvature is usually denoted by \kappa = |k|.
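To see the formula in action, here’s a Python sketch which estimates f' and f'' by central differences and checks that the lower arc of a circle of radius r has curvature 1/r at several points, as our intuition demands (helper names are mine):

```python
from math import sqrt, isclose

def signed_curvature(f, x0, h=1e-5):
    """k = f''(x0) / (1 + f'(x0)^2)^(3/2), derivatives via central differences."""
    f1 = (f(x0 + h) - f(x0 - h)) / (2 * h)
    f2 = (f(x0 + h) - 2 * f(x0) + f(x0 - h)) / (h * h)
    return f2 / (1 + f1 * f1) ** 1.5

r = 3.0
lower_arc = lambda x: r - sqrt(r * r - x * x)   # lower arc of x^2 + (y - r)^2 = r^2
for x0 in (0.0, 0.5, 1.0, 2.0):
    assert isclose(signed_curvature(lower_arc, x0), 1 / r, rel_tol=1e-4)
```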

Exercises.

  1. Compute the curvature of the graph of y = x^2 at a general point.
  2. Prove that the curvature of the circle x^2 + y^2 = a^2 at any point is equal to \frac 1 a.
  3. Obtain the formula for the curvature when the curve is expressed parametrically as x = f(t), y = g(t). Use this to obtain a simpler proof of 2.
  4. Find the curvature of the logarithmic spiral whose polar equation is r = e^\theta, at various points.
  5. Use problem 3 to find the maximum and minimum curvatures of the ellipse \frac{x^2} 4 + y^2 = 1. Locate where they occur.

In the next installment, we shall examine a conceptually clearer definition of curvature. But we will need the reader to know a bit of vector calculus.


Pick’s Theorem and Some Interesting Applications

[ Background required: none. ]

A lattice point on the cartesian plane is a point (x,y) \in \mathbb R^2 where both coordinates are integers. Let P be a polygon on the cartesian plane such that every vertex is a lattice point (we call it a lattice polygon). Pick’s theorem tells us that the area of P can be computed solely by counting lattice points:

The area of P is given by i + \frac b 2 -1, where i = number of lattice points in P and b = number of lattice points on the boundary of P.

For example, the area of the polygon below is 4 + 12/2 - 1 = 9 square units.

The polygon can be of arbitrary shape and size, concave or convex, but it mustn’t have a hole in it. For example, in the diagram below, the area should be 11/2 and not 9/2. To repair the formula, each hole contributes +1: the area becomes i + \frac b 2 - 1 + h, where h is the number of holes. This means that Pick’s formula is sensitive to the topology of the diagram (which reminds us of Euler’s V-E+F formula, but that’s another story for another day).

It’s a lovely result. Unfortunately, there’s no short proof of it, so we will merely highlight the steps which constitute a possible proof:

  • An elementary triangle is a lattice polygon with i=0, b=3. Thus it contains no lattice points other than the 3 vertices. By chopping P into a union of elementary triangles, we only need to focus on this case.
  • Without loss of generality, assume 0 = (0, 0), v = (a, b), w = (c, d). It suffices to show that the parallelogram P formed by 0, v, w and v+w has area 1 if P contains no lattice points other than the 4 vertices.
  • On one hand, the area is given by |ad-bc| \ge 1.
  • On the other hand, for any (m,n)\in \mathbb Z \times \mathbb Z, m, n not both zero, P and the translate P + (m, n) have disjoint interior. [ If they share an interior point, then P must contain a vertex of P + (m, n) which is not one of 0, v, w, v+w. ]
  • Hence, if we cut the parallelogram according to the gridlines and translate them to the unit square 0 \le x, y \le 1 (see below), the pieces do not intersect. As a result, the area is at most 1.
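The theorem is also easy to test by brute force: count boundary points via gcds along the edges, count interior points with an exact crossing-number test, and compare against the shoelace formula. A Python sketch (the L-shaped test polygon is my own, standing in for the figures above):

```python
from math import gcd

def twice_area(poly):
    """Twice the shoelace area of a lattice polygon (vertices in order)."""
    n = len(poly)
    return abs(sum(poly[k][0] * poly[(k + 1) % n][1]
                   - poly[(k + 1) % n][0] * poly[k][1] for k in range(n)))

def boundary_points(poly):
    """b: each edge from u to v contributes gcd(|dx|, |dy|) lattice points."""
    n = len(poly)
    return sum(gcd(abs(poly[k][0] - poly[(k + 1) % n][0]),
                   abs(poly[k][1] - poly[(k + 1) % n][1])) for k in range(n))

def on_segment(p, a, b):
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    return (cross == 0 and min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def interior_points(poly):
    """i: brute-force count using an exact (integer-only) crossing-number test."""
    n = len(poly)
    edges = [(poly[k], poly[(k + 1) % n]) for k in range(n)]
    xs = [v[0] for v in poly]
    ys = [v[1] for v in poly]
    count = 0
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            if any(on_segment((x, y), a, b) for a, b in edges):
                continue
            crossings = 0
            for a, b in edges:
                (x1, y1), (x2, y2) = (a, b) if a[1] < b[1] else (b, a)
                # does the edge cross the horizontal ray going right from (x, y)?
                if y1 <= y < y2 and (x1 - x) * (y2 - y) + (x2 - x) * (y - y1) > 0:
                    crossings += 1
            count += crossings % 2
    return count

# an L-shaped (concave) lattice polygon with area 12
poly = [(0, 0), (4, 0), (4, 2), (2, 2), (2, 4), (0, 4)]
i, b = interior_points(poly), boundary_points(poly)
assert (i, b) == (5, 16)
assert 2 * (i + b / 2 - 1) == twice_area(poly)   # Pick: area = i + b/2 - 1
```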

Physical Intuition

Here’s a nice way of looking at Pick’s theorem: imagine the sides of the polygon forming the walls of a room, and at each lattice point, a light is placed. The sum of the angles illuminated by the lights within the room then adds up to 2\pi (i + b/2 - 1). Indeed: each light at an internal point clearly contributes 2\pi; each light at a boundary point which is not a vertex contributes \pi; and the angles at the m vertices sum to \pi(m - 2) by elementary geometry. Altogether this gives 2\pi i + \pi(b - m) + \pi(m-2) = 2\pi(i + \frac b 2 - 1).

This intuition also explains why the formula fails when the polygon contains a hole. In our example earlier (a triangle with a triangular hole), we have to take the sum of the external angles of the triangular hole, thus giving 5\pi on the inside and 6\pi on the outside. So the total angle is 11\pi which corresponds to an area of 11/2.

Application : Farey Sequence

Fix a bound N and consider all non-negative rational numbers \frac a b where b \le N. Arrange them in increasing order. E.g. for N = 4, we get:

\frac 0 1 < \frac 1 4 < \frac 1 3 < \frac 1 2 < \frac 2 3 < \frac 3 4 < \frac 1 1 < \frac 5 4 < \dots

This is called the Farey sequence of order N. For any two consecutive fractions \frac a b < \frac c d in lowest terms, we have bc - ad = 1. Although not obvious at first glance, this follows directly from Pick’s theorem. Indeed, let’s draw a ray from the origin (0, 0) through the point (b, a) for each reduced fraction \frac a b in the sequence. Note that \frac a b is precisely the gradient of this ray. Then the Farey sequence is obtained by sweeping a ray through the origin, starting at angle \theta = 0 and stopping whenever the ray hits a lattice point (b, a) with 1 \le b \le N, i.e. in the shaded region below:

Why does it follow that bc - ad = 1 for, say, points B and C? Consider the triangle OBC.

  • The open line segment OB contains no lattice points since \frac a b is reduced.
  • Likewise, open segment OC contains no lattice points.
  • Open segment BC and the interior of OBC contain no lattice points because such a point would correspond to another ray between OB and OC, thereby contradicting the fact that \frac a b and \frac c d are consecutive in the Farey sequence. This requires the fact that the shaded region is convex. [ A region R is said to be convex if whenever points P and Q lie in R, so is the entire line segment PQ. ]

Hence by Pick’s theorem, the area must be 1/2. On the other hand, by coordinate geometry the area is also \frac 1 2 (bc-ad), so the result follows.

Note that we can vastly generalise the result. E.g. suppose we take all fractions \frac a b for a^2 + b^2 \le N^2 and sort them in ascending order. Then since the resulting shaded region is also convex, we also have bc-ad=1 for any consecutive reduced fractions \frac a b < \frac c d.
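For small N the relation bc - ad = 1 is easy to verify by brute force. Here's a hedged Python sketch (helper names are mine; I restrict to fractions in [0, 1] just to keep the list finite):

```python
from fractions import Fraction

def farey(N):
    """Reduced fractions a/b with 0 <= a/b <= 1 and b <= N, in increasing order.

    Fraction automatically reduces to lowest terms, and the set removes
    duplicates such as 1/2 = 2/4."""
    return sorted({Fraction(a, b) for b in range(1, N + 1) for a in range(0, b + 1)})

def unimodular(seq):
    """Check bc - ad = 1 for every consecutive pair a/b < c/d."""
    return all(x.denominator * y.numerator - x.numerator * y.denominator == 1
               for x, y in zip(seq, seq[1:]))
```

Here farey(4) returns 0, 1/4, 1/3, 1/2, 2/3, 3/4, 1, matching the start of the sequence displayed above.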

Ford Circles

Start with two horizontal lines: y=0 and y=1. At each integer m, construct the circle with ends of the diameter at (m, 0) and (m, 1). This gives an infinite sequence of circles with radii 1/2 which are tangent to both lines. Now between two neighbouring circles, construct the circle which is tangent to both, and also to the line y=0. If we iterate this process, we eventually get something like:

The nice thing is that the point at which each circle touches the line y=0 is rational, and two circles are tangent if and only if the corresponding rational numbers are consecutive in a Farey sequence of some order.

Theorem. Let 0 < r_1 < r_2 < \dots be a Farey sequence of some order. Write r_i = \frac {a_i}{b_i} in lowest terms and construct the circle in the upper half-plane which is tangent to the line y=0 at (r_i, 0) and has radius \frac 1 {2b_i^2}. Then two consecutive circles are tangent.

The proof is quite straightforward, as it merely uses the fact that consecutive fractions \frac a b < \frac c d satisfy bc - ad = 1. [ Try it. ]
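The tangency can also be checked exactly with rational arithmetic. A small sketch (names mine), using the fact that two externally tangent circles have centre distance equal to the sum of their radii:

```python
from fractions import Fraction

def ford_tangent(r1, r2):
    """Exact tangency test for the Ford circles at r1 = a/b and r2 = c/d.

    The circle at a/b has centre (a/b, 1/(2 b^2)) and radius 1/(2 b^2);
    we compare squared centre distance with the squared sum of radii,
    so everything stays in exact Fraction arithmetic."""
    b, d = r1.denominator, r2.denominator
    rad1, rad2 = Fraction(1, 2 * b * b), Fraction(1, 2 * d * d)
    dist_sq = (r1 - r2) ** 2 + (rad1 - rad2) ** 2
    return dist_sq == (rad1 + rad2) ** 2
```

For instance, ford_tangent(Fraction(1, 3), Fraction(1, 2)) holds, consistent with 1/3 and 1/2 being consecutive in a Farey sequence, while 1/4 and 1/2 (with 2/5 between them) give circles that do not touch.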

Posted in Extra | Tagged , , , , , , | 1 Comment

Modular Arithmetic Deluxe Edition

[ Background required: standard modular arithmetic. ]

Consider the following two problems:

Problem 1. Prove that if p > 2 is prime, then when 1 + \frac 1 2 + \frac 1 3 + \dots + \frac 1 {p-1} is expressed in lowest terms \frac m n, m must be a multiple of p.

Problem 2. Prove that if p > 3 is prime, then when 1 + \frac 1 2 + \frac 1 3 + \dots + \frac 1 {p-1} is expressed in lowest terms \frac m n, m must be a multiple of p^2.

Solution to Problem 1. Since there are an even number of terms, we pair up first and last terms, second and second-to-last terms etc, thus obtaining:

\frac 1 1 + \frac 1 {p-1} = \frac p {p-1}, \quad \frac 1 2 + \frac 1 {p-2} = \frac p {2(p-2)}, … .

Hence, the overall sum is p\left(\frac 1 {1(p-1)} + \frac 1 {2(p-2)} + \dots + \frac 1 {\frac{p-1}2 \cdot \frac{p+1}2}\right). The sum in parentheses simplifies to a fraction \frac r s in lowest terms, where s is not divisible by p since each denominator is a product of integers smaller than p. Hence the sum is \frac {pr} s in lowest terms, so m = pr is a multiple of p. ♦

Pause for a moment. The natural approach to solve problems regarding divisibility is via modular arithmetic. Of course, for problem 1, we don’t need to go to that extent, but wouldn’t it be useful to have some form of modular arithmetic for rational numbers? Maybe that would help us solve the second problem, which appears far more daunting than the first.

Indeed, there is!

To be specific, we have to fix a prime p ahead and compromise a little by restricting to the following subset of \mathbb Q:

\mathbb Z_{(p)} = the set of all rational numbers of the form \frac a b, where b is not divisible by p.

The reader can easily check that this set is closed under addition, subtraction and multiplication (i.e. if x and y are in this set, so are their sum, difference and product). Equally clear is the fact that \mathbb Z \subset \mathbb Z_{(p)}.

Next we proceed to define divisibility by powers of p :

Definition. An element x \in \mathbb Z_{(p)} is said to be divisible by p^n (for positive integer n) if we can find y \in \mathbb Z_{(p)} such that x = p^n y.

Examples. Suppose p=5. Then the rational numbers \frac 1 3, \frac 2 7, \frac{25} {17} are all elements of \mathbb Z_{(p)}. Clearly, the first two numbers are not divisible by p, while the last is divisible by p^2.

For the set \mathbb Z_{(p)}, divisibility really only makes sense for powers of p. To see why, suppose p=5 and we declare an element x \in \mathbb Z_{(5)} to be divisible by q=7 if there is a y\in \mathbb Z_{(5)} such that x = 7y. What happens? Since \frac 1 7 \in \mathbb Z_{(5)}, every element x = 7\cdot \frac x 7 is then a multiple of 7, and the definition is vacuous.

Modular Arithmetic

Now we are ready to talk about modular arithmetic for elements of \mathbb Z_{(p)}. Given two elements x, y from this set, we write x\equiv y \pmod {p^n} if the difference x-y is divisible by p^n, as defined earlier.

For example, you can easily check that for p=7, \frac 1 3 \equiv -2 \pmod 7 and \frac 3 {10} \equiv 15 \pmod {7^2}.

Basic Properties I. Easy exercise: prove that the following are true for any x, y, z \in \mathbb Z_{(p)}.

  • (Reflexive) x\equiv x \pmod {p^n}.
  • (Symmetric) If x\equiv y \pmod {p^n}, then y \equiv x \pmod {p^n}.
  • (Transitive) If x\equiv y \pmod {p^n} and y \equiv z \pmod {p^n}, then x \equiv z \pmod {p^n}.

Once again, these three properties are the crux of any “equivalence” property. Next, we need to ensure that congruence modulo p^n respects the arithmetic operations.

Basic Properties II. More easy exercises: prove that the following are true for any a,b,c,d \in \mathbb Z_{(p)} such that a\equiv b \pmod {p^n} and c \equiv d \pmod {p^n}. [ The proofs for the first 3 properties are identical to the case of integers! ]

  • a + c \equiv b + d \pmod {p^n};
  • a\times c \equiv b \times d \pmod{p^n};
  • for m = 1, 2, …, we get a^m \equiv b^m \pmod {p^n};
  • if c (and hence d) is not a multiple of p, then \frac a c \equiv \frac b d \pmod {p^n}.

Important conceptual exercise for last property: if c and d are divisible by p, what goes wrong in the proof? Careful: it’s rather subtle.

Finally, we show that we haven’t really made things more complicated in extending our definition of congruence.

Basic Property III. Any element x \in \mathbb Z_{(p)} is congruent to exactly one of \{0, 1, 2, \dots, p^n-1\} modulo p^n.

Proof: It suffices to show x is congruent to some integer modulo p^n. Writing x = \frac a b, where b is not divisible by p, it suffices to show \frac 1 b is congruent to some integer mod p^n. Now Bezout’s identity helps us out: since b and p^n are coprime, write rb + sp^n = 1 for some integers r and s. Then \frac 1 b - r = \frac {1-rb} b = p^n \frac s b and our job is done. ♦

In short, we have extended our definition of modular arithmetic to the larger set \mathbb Z_{(p)} instead of the usual \mathbb Z. As a tradeoff, however, now we can only talk about congruence modulo powers of p, instead of general n.
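These definitions are concrete enough to compute with. Here's a minimal Python sketch (names mine) that tests x \equiv y \pmod{p^n} in \mathbb Z_{(p)} by extracting the power of p from the numerator of x - y:

```python
from fractions import Fraction

def congruent(x, y, p, n):
    """Test x ≡ y (mod p^n) for x, y in Z_(p), i.e. Fractions whose
    denominators (in lowest terms) are not divisible by p."""
    diff = x - y
    if diff == 0:
        return True
    assert diff.denominator % p != 0, "not an element of Z_(p)"
    # Count how many times p divides the numerator (the p-adic valuation).
    num, v = abs(diff.numerator), 0
    while num % p == 0:
        num //= p
        v += 1
    return v >= n
```

This reproduces the earlier examples: \frac 1 3 \equiv -2 \pmod 7 and \frac 3 {10} \equiv 15 \pmod {7^2}.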

Applications

Without further delay, let’s dive into the proof of problem 2.

Solution to Problem 2. Pair up the terms as earlier, thus giving:

p\left(\frac 1 {1(p-1)} + \frac 1 {2(p-2)} + \dots + \frac 1 {\frac{p-1}2 \cdot \frac{p+1}2}\right).

We need to show that the sum in parentheses is a multiple of p. But note that each summand is of the form \frac 1 {i(p-i)} \equiv \frac 1 {i(-i)} = -\frac 1 {i^2} \pmod p. So we need to show \sum_{i=1}^{(p-1)/2} \frac 1 {i^2} \equiv 0 \pmod p. This is not hard:

  • For i = 1, 2, \dots, \frac{p-1}2, the squares i^2 are all distinct mod p (if not, then i^2 \equiv j^2 \pmod p, so i^2 - j^2 = (i-j)(i+j) is a multiple of p and either i-j or i+j is a multiple of p; but 0 < i+j < p, which forces i = j).
  • This also means the reciprocals \frac 1 {i^2} for i=1,2,\dots,\frac{p-1}2 are also distinct. (*)
  • Since x^2 \equiv (p-x)^2 \pmod p, the squares 1^2, 2^2, \dots, (\frac{p-1}2)^2 cover all the possible non-zero squares mod p, and they’re all distinct.
  • By (*), we see that the same holds for \frac 1 {i^2}, i = 1,2,\dots, \frac{p-1}2.

In other words, the reciprocals 1, \frac 1 {2^2}, \frac 1 {3^2}, \dots, \frac 1 {[(p-1)/2]^2} are congruent to 1, 2^2, 3^2, \dots, (\frac{p-1}2)^2 modulo p, after some permutation. Hence, modulo p, our sum is congruent to

1^2 + 2^2 + \dots + (\frac {p-1}2)^2 = \frac{(\frac{p-1}2)(\frac{p+1}2)p}6

which is a multiple of p since the denominator is coprime to p when p>3. ♦
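If you want to see Problems 1 and 2 in action before believing them, exact rational arithmetic makes the check painless; a quick sketch:

```python
from fractions import Fraction

def harmonic_numerator(p):
    """Numerator m of 1 + 1/2 + ... + 1/(p-1) in lowest terms."""
    return sum(Fraction(1, i) for i in range(1, p)).numerator

# Problem 1 predicts p | m for primes p > 2;
# Problem 2 predicts p^2 | m for primes p > 3.
for p in [5, 7, 11, 13, 17]:
    assert harmonic_numerator(p) % (p * p) == 0
```

For p = 3 the sum is \frac 3 2, so only the weaker Problem 1 conclusion holds there, as expected.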

Problem 3. (USAMO 2010 Q5) Let p be an odd prime and q = \frac{3p-5}2. Define the sum:

S_q = \frac 1 {2\times 3\times 4} + \frac 1 {5\times 6\times 7} + \dots + \frac 1 {q(q+1)(q+2)}.

Prove that if \frac 1 p - 2 S_q = \frac m n in lowest terms, then p | (m-n).

Hint. [Highlight start] Write \frac{2}{i(i+1)(i+2)} = \frac 1 {i(i+1)} - \frac 1 {(i+1)(i+2)}. Split each \frac 1 {j(j+1)} = \frac 1 j - \frac 1 {j+1} further, thus giving (\frac 1 2 - \frac 2 3 + \frac 1 4) + (\frac 1 5 - \frac 2 6 + \frac 1 7) + \dots. [Highlight end]

Problem 4. (IMO 2005 Q4) Let a_n = 2^n + 3^n + 6^n - 1 for positive integer n. Find all positive integers k which are coprime to all a_n in the sequence.

Hint. [Highlight start] Note that \frac 1 2 + \frac 1 3 + \frac 1 6 - 1 = 0. [Highlight end]
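Before reading the hint, it's fun to probe the problem experimentally. The sketch below (names mine) checks which small k are coprime to the first fifty terms of the sequence:

```python
from math import gcd

def a(n):
    """a_n = 2^n + 3^n + 6^n - 1."""
    return 2 ** n + 3 ** n + 6 ** n - 1

def coprime_to_first(k, terms=50):
    """Is k coprime to a_1, ..., a_terms?"""
    return all(gcd(k, a(n)) == 1 for n in range(1, terms + 1))

survivors = [k for k in range(1, 30) if coprime_to_first(k)]
```

The list of survivors turns out to be strikingly short, which is a strong hint at the answer.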

Philosophical Ramblings

We’ve just seen a simple example of theory-building: we find that the existing language is inconvenient / inadequate in solving the problem, so we decide to extend it via new definitions, propositions, theorems etc. Could we have solved the problem without such machinery? Certainly, since we’ve not added any axioms, logically the set of provable theorems must remain identical. But not only is the new language more convenient, it inspires us to think about a problem in an entirely new light.

Finally, we end this post with the following quote by Alexander Grothendieck regarding theory-building, as opposed to problem-solving (the “first approach”).

“I can illustrate the second approach with the same image of a nut to be opened. The first analogy that came to my mind is of immersing the nut in some softening liquid, and why not simply water? From time to time you rub so the liquid penetrates better, and otherwise you let time pass. The shell becomes more flexible through weeks and months—when the time is ripe, hand pressure is enough, the shell opens like a perfectly ripened avocado! A different image came to me a few weeks ago. The unknown thing to be known appeared to me as some stretch of earth or hard marl, resisting penetration… the sea advances insensibly in silence, nothing seems to happen, nothing moves, the water is so far off you hardly hear it… yet it finally surrounds the resistant substance.” – Alexander Grothendieck.

Posted in Notes | Tagged , , , , , | Leave a comment

Linear Algebra: Inner Products

[ Background required: basic knowledge of linear algebra, e.g. the previous post. Updated on 6 Dec 2011: added graphs in Application 2, courtesy of wolframalpha.]

Those of you who already know inner products may roll your eyes at this point, but there’s really far more than what meets the eye. First, the definition:

Definition. We shall consider \mathbb R^3, which is the set of all triplets (x, y, z) of real numbers. The inner product (or scalar product) between \mathbf v = (x,y,z) and \mathbf v' = (x',y',z') is defined to be:

\mathbf v \cdot \mathbf v' = (x, y, z)\cdot (x',y',z') = xx' + yy' + zz' \in \mathbb R.

[ Note: everything we say will be equally applicable to \mathbb R^n, but it helps to keep things in perspective by looking at smaller cases. ]

The purpose of the inner product is made clear by the following theorem.

Theorem 1. Let A, B be represented by points \mathbf v = (x,y,z) and \mathbf v' = (x',y',z') respectively. If O is the origin, then \mathbf v\cdot \mathbf v' is the value |OA| \cdot |OB| \cos \theta, where |l| denotes the length of a line segment l and θ is the angle between OA and OB.

Proof. It’s really simpler than you might think: just follow the following baby steps.

  • Check that the dot product is symmetric (i.e. v·w = w·v for any v, w in \mathbb R^3).
  • Check that the dot product is linear in each term (v·(w + x) = (v·w) + (v·x) and v·(cw) = c(v·w) for any real c and v, w, x in \mathbb R^3).
  • From the above properties, show that 2v·w = v·v + w·w – (v – w)·(v – w).
  • By Pythagoras, the RHS is |OA|^2+|OB|^2-|AB|^2. Now use the cosine law. ♦
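To convince yourself of the identity in the third step, here is a quick numerical check on one arbitrary pair of vectors (a sanity check, not a proof):

```python
from math import acos, cos, sqrt, isclose

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

v, w = (1, 2, 2), (3, 0, 4)
diff = tuple(a - b for a, b in zip(v, w))

# The identity from the proof: 2 v.w = v.v + w.w - (v-w).(v-w).
assert 2 * dot(v, w) == dot(v, v) + dot(w, w) - dot(diff, diff)

# Recover the angle via the cosine law and confirm v.w = |OA||OB| cos(theta).
OA, OB, AB = sqrt(dot(v, v)), sqrt(dot(w, w)), sqrt(dot(diff, diff))
theta = acos((OA ** 2 + OB ** 2 - AB ** 2) / (2 * OA * OB))
assert isclose(dot(v, w), OA * OB * cos(theta))
```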

Next we wish to generalise the concept of the standard basis e1 = (1,0,0), e2 = (0,1,0), e3 = (0,0,1). The key property we shall need is that they are mutually perpendicular and of length 1. From now onward, we shall sometimes call the elements of \mathbb R^3 vectors. Don’t worry too much if you’re not familiar with this term.

Definitions. Thanks to the above theorem, the following definitions make sense.

  • The length of a vector v is denoted by |v| = √(v·v).
  • A unit vector is a vector of length 1.
  • Two vectors v and w are said to be orthogonal if their inner product v·w is 0.
  • A set of vectors is said to be orthonormal if (i) they are all unit vectors, and (ii) any two of them are orthogonal.
  • A set of three orthonormal vectors in \mathbb R^3 is called an orthonormal basis.

[ In general, any orthonormal set can be extended to an orthonormal basis, and any orthonormal basis has exactly 3 elements. We won’t prove this, but geometrically it should be obvious. Hopefully we’ll get around to abstract linear algebra, from which this will follow quite naturally. ]

Our favourite orthonormal basis is  e1 = (1,0,0), e2 = (0,1,0), e3 = (0,0,1).

In general, the nice thing about an orthonormal basis is that in order to express any arbitrary vector v as a linear combination v = c1e1 + c2e2 + c3e3, there’s no need to solve a system of linear equations. Instead we just take the dot product.

Theorem 2. Let {v1, v2, v3} be an orthonormal basis.  Every vector w is uniquely expressible as w = c1v1 + c2v2 + c3v3, where ci is given by ci = w·vi.

Proof. Suppose w is of the form w = c1v1 + c2v2 + c3v3. Then we apply linearity of the dot product (see proof of theorem 1) to get:

\mathbf w \cdot \mathbf v_i = (c_1 \mathbf v_1 + c_2\mathbf v_2 + c_3\mathbf v_3)\cdot v_i = c_1(v_1\cdot v_i) + c_2(v_2\cdot v_i) + c_3(v_3\cdot v_i).

Since the vi‘s are orthonormal, the only surviving term is c_i (v_i \cdot v_i) = c_i. This proves the last statement, as well as uniqueness. To prove existence, let ci = w·vi and x = c1v1 + c2v2 + c3v3. We see that for i=1,2,3 we have:

\mathbf x \cdot \mathbf v_i = (c_1\mathbf v_1 + c_2\mathbf v_2 + c_3\mathbf v_3)\cdot \mathbf v_i = c_i = \mathbf w\cdot \mathbf v_i,

so w – x is orthogonal to all three vectors {v1, v2, v3}. If w – x were non-zero, then normalising it would extend {v1, v2, v3} to an orthonormal set of four vectors, contradicting the fact that we cannot have more than 3 vectors in an orthonormal set in \mathbb R^3. Hence w = x.  ♦

[ Geometrically, the idea is to project w onto each of {v1, v2, v3} in turn to get the coefficients. ]

For example, consider the three vectors (1, 0, -2), (2, 2, 1), (4, -5, 2). They are mutually orthogonal but clearly not unit vectors. To fix that, we replace each vector v by an appropriate scalar multiple: v/|v|, so we get:

\mathbf v_1 = \frac 1 {\sqrt 5} (1, 0, -2), \mathbf v_2 = \frac 1 3 (2, 2, 1), \mathbf v_3 = \frac 1 {3\sqrt 5} (4, -5, 2),

which is a bona fide orthonormal set. Now if we wish to write w = (1, 2, -3) as c1v1 + c2v2 + c3v3, we get:

\mathbf w = \frac 7 {\sqrt 5} \mathbf v_1 + \mathbf v_2 - \frac 4 {\sqrt 5}\mathbf v_3.
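The whole computation can be replayed in a few lines of Python (names mine):

```python
from math import sqrt, isclose

def dot(v, w):
    return sum(a * b for a, b in zip(v, w))

def unit(v):
    """Rescale v to a unit vector v / |v|."""
    n = sqrt(dot(v, v))
    return tuple(x / n for x in v)

# The orthogonal triple from the text, rescaled to an orthonormal basis.
basis = [unit(v) for v in [(1, 0, -2), (2, 2, 1), (4, -5, 2)]]
w = (1, 2, -3)

# c_i = w . v_i -- no linear system to solve.
coeffs = [dot(w, v) for v in basis]
recon = tuple(sum(c * v[i] for c, v in zip(coeffs, basis)) for i in range(3))
```

Here coeffs comes out as (7/\sqrt 5, 1, -4/\sqrt 5) and recon reproduces w up to rounding.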

Application 1: Cauchy-Schwarz Inequality

Square both sides of theorem 1 and obtain, for any two vectors v and w:

(\mathbf v\cdot \mathbf w)^2 \le |\mathbf v|^2 |\mathbf w|^2.

Writing v = (x, y, z) and w = (a, b, c), we obtain the all-important Cauchy-Schwarz inequality:

Cauchy-Schwarz Inequality. If x, y, z, a, b, c are real numbers, then:

(a^2 + b^2 + c^2)(x^2 + y^2 + z^2) \ge (ax + by + cz)^2.

Equality holds if and only if (a, b, c) and (x, y, z) are scalar multiples of each other.

Example 1.1. If a = b = c = 1/3, then we get the (root mean square) ≥ (arithmetic mean) inequality: for positive real x, y, z, we have

\sqrt{\frac{x^2+y^2+z^2} 3} \ge \frac {x+y+z} 3.

Example 1.2. Given that a, b, c are real numbers such that a+2b+3c = 1, find the minimum possible value of a2 + 2b2 + 3c2.

Solution. Skilfully choose the right coefficients in the Cauchy-Schwarz inequality:

(a^2 + (\sqrt 2 b)^2 + (\sqrt 3 c)^2)(1^2 + ({\sqrt 2})^2 + ({\sqrt 3})^2) \ge (a + 2b + 3c)^2

to get our desired result: a^2 + 2b^2 + 3c^2 \ge \frac 1 6. Equality holds if and only if (a, \sqrt 2 b, \sqrt 3 c) is a scalar multiple of (1, \sqrt 2, \sqrt 3), i.e. a = b = c; the constraint then forces (a,b,c) = (\frac 1 6, \frac 1 6, \frac 1 6).
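As a rough sanity check (certainly not a proof), we can sample points on the constraint plane and watch the objective stay above 1/6:

```python
import random

def objective(a, b):
    c = (1 - a - 2 * b) / 3     # solve the constraint a + 2b + 3c = 1 for c
    return a * a + 2 * b * b + 3 * c * c

random.seed(0)
best = min(objective(random.uniform(-1, 1), random.uniform(-1, 1))
           for _ in range(20000))
# best stays above the true minimum 1/6, attained at a = b = c = 1/6.
```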

Example 1.3. Given that a, b, c, d are real numbers such that a+b+c+d = 7 and a2 + b2 + c2 + d2 = 13, find the maximum and minimum possible values of d.

Hint: [highlight start] Compare the sums a + b + c and a^2 + b^2 + c^2 using Cauchy-Schwarz inequality. Express it in terms of d.  [highlight end].

Application 2: Fourier Analysis

Warning: this section is lacking in rigour, since our objective is to give the intuition behind it. It’s also rated advanced, as it’s significantly harder than the preceding text, and has quite a bit of calculus involved.

A common problem in acoustic theory is to analyse auditory waveforms. We can treat such a waveform as a periodic function f:\mathbb R \to\mathbb R, and for convenience, we will denote the period by 2π. Now the most common functions with period 2π are:

  • constant function f(x) = c;
  • trigonometric functions f(x) = sin(mx) and cos(mx), m = 1, 2, … ;

It turns out any sufficiently “nice” periodic function can be approximated with these functions, i.e.

f(x) \approx (a_0 + a_1 \cos x + a_2 \cos(2x) + \dots) + (b_1 \sin x + b_2 \sin(2x) + \dots).

This is called the Fourier decomposition of f. The component of period 2\pi is called the fundamental (or base frequency) of the waveform, while the components of shorter periods 2\pi/2, 2\pi/3, \dots are its harmonics. In the Fourier decomposition, one can approximate f(x) by dropping the higher harmonics, just like we can approximate a real number by taking only a certain number of decimal places.

So how does one compute the coefficients a_i and b_i? For that, we consider the simple case where f is a linear combination of sin(x), sin(2x), sin(3x), i.e. we assume:

f(x) = a \sin(x) + b \sin(2x) + c\sin(3x), where a,b,c\in\mathbb R.

Let V be the set of all functions f:\mathbb R \to \mathbb R of this form. We can think of V as a vector space, similar to \mathbb R^3 via the following bijection:

(a,b,c)\in\mathbb R^3 \leftrightarrow a \sin(x) + b \sin(2x) + c \sin(3x) \in V.

So given just the waveform of f, how do we obtain a, b and c? The answer is surprisingly simple: if we take the inner product in V via:

f,g\in V \implies \left< f, g\right> = \int_{-\pi}^{\pi} f(x) g(x) dx,

then the functions sin(x), sin(2x), sin(3x) are orthogonal! This can be easily verified as follows: for distinct positive integers m and n, we have

\left< \sin(mx), \sin(nx)\right> = \int_{-\pi}^{\pi} \sin(mx)\sin(nx) dx

= \frac 1 2\int_{-\pi}^{\pi} \left(\cos((m-n)x) - \cos((m+n)x)\right) dx = 0.

However, they’re not quite orthonormal because they’re not unit vectors. Specifically, we have:

\int_{-\pi}^{\pi} \sin^2(x) dx = \int_{-\pi}^{\pi} \sin^2(2x) dx = \int_{-\pi}^{\pi} \sin^2(3x) dx = \pi.

In summary, we see that s_1(x)=\frac 1 {\sqrt\pi} \sin(x), s_2(x)=\frac 1 {\sqrt\pi} \sin(2x), s_3(x)=\frac 1 {\sqrt\pi} \sin(3x) form an orthonormal basis of V, under the above inner product.

Now given any function f in V, we can recover the values a, b and c by taking the inner product:

  • a = \frac 1 {\sqrt\pi}\left< f, s_1\right> = \frac 1 \pi\int_{-\pi}^{\pi} f(x) \sin(x) dx.
  • b = \frac 1 {\sqrt\pi}\left< f, s_2\right> = \frac 1 \pi\int_{-\pi}^{\pi} f(x) \sin(2x) dx.
  • c = \frac 1 {\sqrt\pi}\left< f, s_3\right> = \frac 1 \pi\int_{-\pi}^{\pi} f(x) \sin(3x) dx.
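Numerically, the recovery looks like this. The sketch below (names mine) approximates the inner product by a midpoint-rule sum and recovers the coefficient of \sin(nx) as \frac 1 \pi \int_{-\pi}^\pi f(x)\sin(nx) dx, matching the b_n formula in the main theorem of the next section:

```python
from math import sin, pi, isclose

def inner(f, g, steps=20000):
    """<f, g> = integral of f*g over [-pi, pi], by the midpoint rule."""
    h = 2 * pi / steps
    return h * sum(f(-pi + (k + 0.5) * h) * g(-pi + (k + 0.5) * h)
                   for k in range(steps))

a, b, c = 2.0, -1.0, 0.5
f = lambda x: a * sin(x) + b * sin(2 * x) + c * sin(3 * x)

# Coefficient of sin(nx) is (1/pi) * <f, sin(nx)>.
recovered = [inner(f, lambda x, n=n: sin(n * x)) / pi for n in (1, 2, 3)]
```

Here recovered comes back as (2, -1, 0.5) to high accuracy, since the midpoint rule is extremely accurate on smooth periodic integrands.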

Main Theorem of Fourier Analysis

Suppose f is a 2π-periodic function such that f and df/dx are both piecewise continuous. [ A function g is piecewise continuous if \lim_{x\to a^-} g(x) and \lim_{x\to a^+}g(x) both exist for all a\in\mathbb R. ] Then we can approximate f as a linear combination:

f(x) \sim a_0 + a_1 \cos(x) + b_1\sin(x) + a_2\cos(2x) + b_2\sin(2x) + \dots

where a_0 = \frac 1 {2\pi}\int_{-\pi}^\pi f(x) dx, and for n = 1, 2, 3, …, we have a_n = \frac 1 {\pi}\int_{-\pi}^\pi f(x)\cos(nx) dx, b_n = \frac 1 \pi\int_{-\pi}^\pi f(x)\sin(nx)dx. The above approximation means that for any real a, the RHS converges to \frac 1 2 (\lim_{x\to a^-}f(x)+\lim_{x\to a^+} f(x)). In particular, if f is continuous at x=a, then the RHS converges to f(a) for x=a.

Example 2.1. Consider the function f(x) = x for -\pi \le x < \pi, repeated through the real line with a period of 2\pi. To compute its Fourier expansion, we have:

  • a_n = 0 for any n since f(-x) = –f(x) almost everywhere (except at discrete points);
  • b_n = \frac 1 \pi\int_{-\pi}^\pi x \sin(nx) dx = (-1)^{n+1} \frac{2} n, using integration by parts.

Thus we have x \sim 2\left(\sin(x) - \frac{\sin(2x)} 2 + \frac{\sin(3x)}3 - \frac{\sin(4x)}4 + \dots\right) and equality holds for -\pi < x < \pi. Let’s see what the graphs of the partial sums look like.

[ Graph: partial sums of the Fourier series for f(x) = x. ]

If we substitute the value x = π/2, we obtain:

2(1 - \frac 1 3 + \frac 1 5 - \frac 1 7 + \dots) = \frac\pi 2.

Important: at any a\in \mathbb R, both the left and right limits of f(x) at x=a must exist. So we cannot take a function like f(x) = 1/x near x=0.

Example 2.2. Take f(x) = x^2 for -\pi \le x < \pi and repeated with period 2π. Its Fourier expansion gives:

  • b_n = 0 since f(x) = f(-x) everywhere.
  • a_0 = \frac 1 {2\pi} \int_{-\pi}^\pi x^2 dx=\frac {\pi^2} 3.
  • a_n = \frac 1 \pi\int_{-\pi}^\pi x^2 \cos(nx) dx = (-1)^n \frac{4}{n^2}, for n = 1, 2, … .

This gives x^2 \sim \frac{\pi^2} 3 + 4(-\cos(x) + \frac{\cos(2x)}{2^2} - \frac{\cos(3x)}{3^2} + \frac{\cos(4x)} {4^2} - \dots). Now equality holds on the entire interval -\pi \le x \le \pi since f(x) is continuous there. The graphs of the partial sums are as follows:

[ Graph: partial sums of the Fourier series for f(x) = x^2. ]

Substituting x=π gives:

\frac{\pi^2}3 + 4\left(1+\frac 1 {2^2} + \frac 1 {3^2}+\frac 1 {4^2} + \dots\right)=\pi^2.

Simplifying gives 1 + \frac 1 {2^2} + \frac 1 {3^2} + \frac 1 {4^2} + \dots = \frac{\pi^2}6, which was proven by Euler via an entirely different method.
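A quick numerical check of both the expansion and the value \frac{\pi^2}6 (a sanity check, not a proof):

```python
from math import cos, pi, isclose

def fourier_x2(x, terms):
    """Partial sum of the Fourier series of x^2 on [-pi, pi]."""
    return pi * pi / 3 + sum(4 * (-1) ** n * cos(n * x) / (n * n)
                             for n in range(1, terms + 1))

# The partial sums approach x^2 on [-pi, pi] ...
approx_at_1 = fourier_x2(1.0, 5000)           # should be near 1
# ... and the direct sum of 1/n^2 creeps up to pi^2/6.
zeta2 = sum(1 / (n * n) for n in range(1, 100001))
```

The series converges slowly (the tail after N terms is of size about 1/N), which is why so many terms are used.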

Example 2.3. This is a little astounding. Let f(x) = e^x for -\pi \le x < \pi, and again repeated with a period of 2\pi. The Fourier coefficients give:

  • a_0 = \frac{\sinh\pi}\pi.
  • a_n = \frac{2\sinh\pi}\pi \frac{(-1)^n}{n^2+1}.
  • b_n = \frac{2\sinh\pi}\pi \frac{(-1)^{n+1}n}{n^2+1}.

So we can write e^x \sim \frac {\sinh\pi}\pi \left[ 1 + \sum_{n\ge 1} \frac{2(-1)^n}{n^2+1}\cos(nx) + \sum_{n\ge 1}\frac{2n(-1)^{n+1}}{n^2+1}\sin(nx)\right], which holds for all -\pi < x < \pi. In particular, for x = 0, we get the rather mystifying identity:

1 = \frac{\sinh\pi}\pi \left[ 1 - \frac{2}{1^2+1} + \frac{2}{2^2+1}-\frac{2}{3^2+1}\dots\right],

which you can verify numerically to some finite precision.

Posted in Notes | Tagged , , , , , , , | Leave a comment

Thoughts on a Problem II

The following problem caught my eye:

(USAMO 1997 Q3) Prove that for any integer n, there is a unique polynomial Q(X) whose coefficients all lie in the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and Q(-2) = Q(-5) = n.

Notice that this bears a strong resemblance to decimal representation: for any non-negative integer n, there is a unique polynomial Q(X) whose coefficients are in the above set and Q(10) = n. The proof of this one is straightforward: the constant term a_0 of Q is Q(10) mod 10, i.e. n mod 10, and the coefficient a_1 of X is then given by \frac{n - a_0}{10} \pmod{10}, etc.

This does give us a hint on the uniqueness of Q in the original problem. Write the polynomial as Q(X) = a_0 + a_1 X + a_2 X^2 + \dots + a_n X^n.

  • Since n = Q(-2) \equiv a_0 \pmod 2 and n = Q(-5) \equiv a_0 \pmod 5, we have n \equiv a_0 \pmod {10}. So a_0 is unique.
  • For the next step, we need the constant term of \frac{Q(X) - a_0}{X}.
    • Now we have \frac {n - a_0}{-2} = \frac {Q(-2) - a_0}{-2} which is congruent to a_1 modulo 2.
    • By the same token, \frac{n - a_0}{-5} \equiv a_1 \pmod 5.
    • By Chinese Remainder Theorem, we thus get a unique solution for a_1.

So can we proceed by induction?

Suppose a_0, a_1, \dots, a_k are unique. Then we need to prove that a_{k+1}, the constant term of (Q(X) - a_0 - a_1 X - \dots - a_k X^k)/X^{k+1}, is unique. Substituting Q(-2) = n and Q(-5) = n, we get:

\frac{n - a_0 - a_1(-2) - \dots - a_k(-2)^k}{(-2)^{k+1}} \equiv a_{k+1} \pmod 2, and \frac{n - a_0 - a_1(-5) - \dots - a_k(-5)^k}{(-5)^{k+1}} \equiv a_{k+1} \pmod 5 (*),

which completes the induction. So uniqueness is done.

Existence seems to follow via the same proof also, which is clearly constructive. The constant term is given by a_0 \pmod {10} and subsequently every a_{k+1} is determined uniquely from the two congruences in (*). The only problem is to show that the process must terminate, i.e. that eventually we get a_i = 0 for all i > N. [ In mathematical lingo, we say “for all large enough i“. ]

To get a handle on things, let’s work on something concrete, say: 26 + 31 = 57. [ SIMO people should know the significance of these numbers. 🙂 ]

To avoid cumbersome terms, we write the differences:

b_{k} = n - a_0 - a_1(-2) - \dots - a_k(-2)^k and c_{k} = n - a_0 - a_1(-5) - \dots - a_k(-5)^k.

Thus, a_{k+1} \equiv \frac{b_{k}}{(-2)^{k+1}} \pmod 2 and a_{k+1} \equiv \frac{c_{k}}{(-5)^{k+1}} \pmod 5. We obtain the following.

  • First step: a_0 = 7. OK.
  • Step 2: b_0 = n-7 = 50 and c_0 = 50, so we have a_1 \equiv \frac{b_0}{-2} = -25 \pmod 2, a_1 \equiv \frac{c_0}{-5} = -10 \pmod 5. Thus a_1 = 5.
  • Step 3: b_1 = b_0 - 5(-2) = 60 and c_1 = c_0 - 5(-5) = 75. So a_2 \equiv \frac{b_1}{(-2)^2} = 15 \pmod 2 and a_2 \equiv \frac{c_1}{(-5)^2} = 3 \pmod 5. Thus a_2 = 3.
  • Step 4: b_2 = b_1 - 3(-2)^2 = 48 and c_2 = c_1 - 3(-5)^2 = 0. Hence a_3 \equiv \frac{b_2}{(-2)^3} \pmod 2 and a_3 \equiv 0 \pmod 5 and a_3 = 0.
  • Step 5: b_3 = b_2 = 48 and c_3 = c_2 = 0 so we have a_4 \equiv \frac{b_3}{(-2)^4} = 3 \pmod 2 and a_4 \equiv 0 \pmod 5. This gives a_4 = 5.
  • Step 6: b_4 = b_3 - 5(-2)^4 = -32 and c_4 = c_3 - 5(-5)^4 = -3125, so a_5 \equiv \frac{b_4}{(-2)^5} = 1 \pmod 2 and a_5 \equiv \frac{c_4}{(-5)^5} = 1 \pmod 5. Thus a_5 = 1.
  • Finally, b_5 = c_5 = 0. So we’re done.

The polynomial is hence Q(X) = 7 + 5X + 3X^2 + 5X^4 + X^5. Notice that the c_k series had hit zero at c_2 but bounces back at c_4, which shows that it (rather, its absolute value) is not a monotonic sequence.

But wait, we do know that 2^{k+1} | b_k and 5^{k+1} | c_k, so actually, all we need is the fact that b_k and c_k are bounded sequences. So let’s try to bound them in terms of n. In attempting to prove this, we have to be careful: if we had replaced -2 and -5 by +2 and +5 respectively, the result clearly fails for negative integers. So our proof must somehow distinguish between these two cases.

… or have we been looking at the wrong approach? Maybe let’s prove existence of Q via induction on |n|? So suppose for each |n|<N,  there exists a Q as above. If n > 0, then since a0 = n mod 10, |\frac{n - a_0}{-2}| = \frac{n-a_0}2 < |n| (and same holds for (-5)-case) so induction follows.

But if n < 0? Then |\frac{n - a_0}2| < |n| holds for sufficiently large |n|; specifically, n ≤ -10 suffices, for then we would have:

|\frac{n-a_0}2| = \frac{a_0-n}2 < \frac{10-n}2 \le \frac{-n-n}2 = |n|.

Same holds for (-5)-case. So all we need to do is to provide explicit polynomials for the case where n=-1, …, -9. But… something feels amiss: somehow the induction doesn’t quite carry through, for to get to the next stage of induction, we need to find a polynomial Q(X) such that

Q(-2) = \frac{n - a_0}{-2}, \quad Q(-5) = \frac{n - a_0}{-5},

which cannot be deduced from induction hypothesis. So let’s try to generalise the above result: maybe it’s true that for any pair of integers (a, b) there exists a polynomial Q(X) with coefficients in {0,…,9} such that Q(-2) = a, Q(-5) = b?

But that turns out to be wrong! E.g. no such polynomial exists for (a, b) = (1, 0). If we try to recursively define the polynomial coefficients, we get:

(a_0, b_0) = (1, 0),\quad (a_{i+1}, b_{i+1}) = \left( \frac{a_i - t}{-2}, \frac{b_i - t}{-5}\right),  (#)

for i = 0,1,2…, where t is the unique integer in {0,…,9} which is \equiv a_i \pmod 2 and \equiv b_i \pmod 5. The sequence of t‘s then gives the coefficients. For the starting value of (1, 0), we get (1, 0) → (2, 1) → (2, 1) → (2, 1) … . So we get an infinite power series 5 + 6X + 6X^2 + 6X^3 + \dots rather than a polynomial!

In short, if we start with an arbitrary integer pair (a, b) and recursively define (#), not every sequence ends up at (0, 0). But in particular, we’re supposed to show (n, n) does. How do we do that?

We look at all possible end-cycles for the sequence (#). Since |a_{i+1}| + |b_{i+1}| < |a_i| + |b_i| for all but finitely many (a_i, b_i), we can list them all out. The only possibilities are:

  • (0, 0) → (0, 0) → … (the nice case);
  • (2, 1) → (2, 1) → (2, 1) → … ;
  • (3, 1) → (-1, 0) → (3, 1) → (-1, 0) → … .

We have to show that the recurrence relation starting from (n, n) will never end up in the second or third case.

Embarrassingly, it took me quite a while to see the trick: modulo 3. From the recurrence relation (#), we see that a_{i+1} - b_{i+1} \equiv a_i - b_i \pmod 3. So if we start from (n, n), we will always have 3 | (a_i - b_i), while for the second and third case, we have a – b ≡ 1, 2 (mod 3) respectively.

That completes the proof. But now the task is to express it in a rigorous manner, which we’ll leave to the reader.
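In the meantime, the constructive proof translates directly into code. Here is a sketch of the recursion (#) starting from (n, n), with names of my own choosing:

```python
def base_neg2_neg5(n):
    """Digits of the unique Q(X) with coefficients in {0,...,9} and
    Q(-2) = Q(-5) = n, built by the recursion (#) starting from (n, n)."""
    a = b = n
    digits = []
    while (a, b) != (0, 0):
        # The unique t in {0,...,9} with t = a (mod 2) and t = b (mod 5).
        t = next(t for t in range(10) if (t - a) % 2 == 0 and (t - b) % 5 == 0)
        digits.append(t)
        a, b = (a - t) // -2, (b - t) // -5   # exact: t was chosen so both divide
        assert len(digits) < 1000, "should terminate by the mod-3 argument"
    return digits  # digits[k] is the coefficient of X^k

def evaluate(digits, x):
    return sum(d * x ** k for k, d in enumerate(digits))
```

For n = 57 this produces the digits 7, 5, 3, 0, 5, 1 found in the worked example above.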

Posted in Thought | Tagged , , , , | Leave a comment