Why Do We Need Eigenvalues and Eigenvectors?

[ Prerequisites : basic linear algebra, matrices and determinants. ]

Eigenvalues and eigenvectors are often confusing to students the first time they encounter them. This article attempts to demystify the concepts by giving some motivations and applications. It’s okay if you’ve never heard of eigenvectors / eigenvalues, since we will take things one step at a time here.

Note: all vectors are assumed to be column vectors here, and a matrix acts on a vector via left multiplication.

Consider the matrix M = \begin{pmatrix} 3 & 1\\ 1 & 3\end{pmatrix}. We know mathematically what it does: it takes a (column) vector v = (x, y) to w = (3x+y, x+3y). But what’s the geometry behind it? If we take the circle C with radius 1 and centre at the origin, then the matrix maps C to the ellipse C’ as follows:

(Image from wolframalpha.com)

As for how we obtained the equation (3x-y)^2 + (3y-x)^2 = 64, that’s left as an exercise. 🙂  From the diagram, it should be clear that the matrix M “stretches” vectors along the diagonals. Specifically, let’s tilt our head 45° and pick the basis comprising v = (1, 1) and w = (-1, 1). Then we get Mv = (4, 4) = 4v and Mw = (-2, 2) = 2w. Since {v, w} forms a basis of the plane, we get:

M(\alpha v + \beta w) = \alpha\cdot Mv + \beta\cdot Mw = (4\alpha)v + (2\beta)w.

Thus geometrically, M behaves very much like the diagonal matrix D=\begin{pmatrix} 4 & 0 \\ 0 & 2\end{pmatrix}. We will expound on this now: writing Q = (v|w), we get MQ = (Mv|Mw) = (4v|2w) = QD, where D is the diagonal matrix above. Since {v, w} forms a basis, the matrix Q is invertible, so:

M = QDQ^{-1}.

More generally, we say that matrices M and N are similar if there exists an invertible matrix Q such that M = QNQ^{-1}. For a given matrix M, the process of finding a similar diagonal matrix D is called diagonalisation. Generally, the fact that M is similar to a diagonal matrix can be useful, both conceptually and computation-wise.

Exercise 1 : if M and N are similar, as are M’ and N’, does that mean M+M’ and N+N’ are similar?

From M = QDQ^{-1} we get M^2 = (QDQ^{-1})(QDQ^{-1}) = QD^2Q^{-1}. More generally, one can show by induction that M^n = QD^n Q^{-1} for all n \ge 0. This spells good news for us: since D is diagonal, its powers are easily evaluated: D^n = \begin{pmatrix} 4^n & 0 \\ 0 & 2^n\end{pmatrix}. Putting it all together with Q^{-1} = \frac 1 2\begin{pmatrix} 1 & 1\\ -1 & 1\end{pmatrix}, we get a closed form:

M^n = \frac 1 2\begin{pmatrix}1 & -1\\ 1 & 1\end{pmatrix} \begin{pmatrix}4^n & 0 \\ 0 & 2^n\end{pmatrix}\begin{pmatrix}1 & 1\\ -1 & 1\end{pmatrix} = \frac 1 2\begin{pmatrix}4^n + 2^n & 4^n - 2^n\\ 4^n-2^n & 4^n + 2^n\end{pmatrix}.
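If you’d like to check this closed form on a computer, here is a quick Python sketch (the helper functions `mat_mul` and `mat_pow` are my own names, not part of the article):

```python
from fractions import Fraction

def mat_mul(A, B):
    # product of two 2x2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_pow(A, n):
    # naive n-fold product; fine for small n
    R = [[1, 0], [0, 1]]
    for _ in range(n):
        R = mat_mul(R, A)
    return R

M = [[3, 1], [1, 3]]
half = Fraction(1, 2)
for n in range(8):
    closed = [[half * (4**n + 2**n), half * (4**n - 2**n)],
              [half * (4**n - 2**n), half * (4**n + 2**n)]]
    assert mat_pow(M, n) == closed
print("closed form agrees with repeated multiplication")
```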

Example: Binet’s Formula

This is (yet) another derivation of Binet’s formula for the Fibonacci sequence: F_0 = 0, F_1 = 1, F_{n+1} = F_n + F_{n-1} for n\ge 1. If we consider the column vector v_n = (F_{n+1}, F_n), then the recurrence F_{n+1} = F_n + F_{n-1} translates to the following matrix relation:

\begin{pmatrix} 1 & 1\\ 1 & 0\end{pmatrix}\begin{pmatrix}F_n\\F_{n-1}\end{pmatrix} = \begin{pmatrix}F_{n+1}\\F_n\end{pmatrix}\implies \begin{pmatrix} 1 & 1\\ 1 & 0\end{pmatrix} v_{n-1} = v_n.

Thus, we get v_n = \begin{pmatrix} 1 & 1\\ 1 & 0\end{pmatrix}^n v_0. This strongly suggests diagonalising the matrix A=\begin{pmatrix} 1 & 1 \\ 1 & 0\end{pmatrix}.

To do so, we need to find vectors v and w such that Av = \lambda_1 v and Aw = \lambda_2 w for some scalar values \lambda_1, \lambda_2. Of course, v and w are non-zero vectors here, so this means the matrix (A - \lambda_i\cdot I) (where i = 1, 2) maps some non-zero vector to zero. In other words, \det(A - \lambda_i\cdot I) = 0. But we can expand via the determinant formula to obtain:

\det(A - \lambda I) = 0 \implies \det\begin{pmatrix} 1-\lambda & 1 \\ 1 & -\lambda\end{pmatrix} = 0 \implies (1-\lambda)(-\lambda) - 1 = 0.

Solving this quadratic equation gives \lambda_1 = \frac{1+\sqrt{5}}2 and \lambda_2 = \frac{1-\sqrt{5}}2, which explains the terms in Binet’s formula. We’ll leave the rest to the reader.
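Carrying the computation through yields Binet’s formula F_n = \frac{\lambda_1^n - \lambda_2^n}{\sqrt 5}, which is easy to test numerically; a small Python sketch (the function name `fib_binet` is mine):

```python
from math import sqrt

def fib_binet(n):
    # F_n = (phi^n - psi^n) / sqrt(5), phi and psi being the two eigenvalues
    phi = (1 + sqrt(5)) / 2
    psi = (1 - sqrt(5)) / 2
    return round((phi**n - psi**n) / sqrt(5))

# check against the recurrence F_0 = 0, F_1 = 1, F_{n+1} = F_n + F_{n-1}
fibs = [0, 1]
for _ in range(2, 30):
    fibs.append(fibs[-1] + fibs[-2])
assert all(fib_binet(n) == fibs[n] for n in range(30))
print("Binet's formula matches the recurrence up to n = 29")
```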

More generally, if p(x) is any polynomial with real coefficients, then we can write p(M) = Q\cdot p(D) Q^{-1} = Q\begin{pmatrix} p(4) & 0\\ 0 & p(2)\end{pmatrix}Q^{-1}. For example, if p(x) = x^4 + 3x^2 - x + 1, then we have:

p(M) = M^4 + 3M^2 - M + I = Q\begin{pmatrix} p(4) & 0 \\ 0 & p(2)\end{pmatrix}Q^{-1}

and evaluate accordingly.

And why stop at polynomials? Why not infinite power series? Indeed, recall the Taylor expansion for the exponential function e^x = \exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots. We can define it for matrices also: for any matrix M, let us write:

\exp(M) = I + M + \frac{M^2}{2!} + \frac{M^3}{3!} + \dots.

Infinite sums are always rather pesky since we’re not sure whether they converge, and this is even more so for matrices. Fortunately, since we have the diagonalisation M = QDQ^{-1}, we know not only that the series converges, but exactly what it converges to:

I + M + \frac{M^2}{2!} + \frac{M^3}{3!} + \dots = Q\left(I + D + \frac{D^2}{2!} + \dots\right)Q^{-1} = Q\begin{pmatrix} 1 + 4 + \frac{4^2}{2!} +\dots & 0 \\ 0 & 1 + 2 + \frac{2^2}{2!} + \dots\end{pmatrix}Q^{-1} = Q\begin{pmatrix} e^4 & 0 \\ 0 & e^2\end{pmatrix}Q^{-1}.
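Here is a small Python sketch (helper names are mine) comparing the truncated Taylor series against the diagonalised form Q\,\mathrm{diag}(e^4, e^2)\,Q^{-1}:

```python
from math import exp, factorial

def mat_mul(A, B):
    # product of two 2x2 matrices
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

M = [[3, 1], [1, 3]]

# truncated Taylor series I + M + M^2/2! + ... + M^29/29!
series = [[1.0, 0.0], [0.0, 1.0]]
term = [[1.0, 0.0], [0.0, 1.0]]
for k in range(1, 30):
    term = mat_mul(term, M)          # term is now M^k
    series = [[series[i][j] + term[i][j] / factorial(k) for j in range(2)]
              for i in range(2)]

# diagonalised form Q diag(e^4, e^2) Q^{-1}
Q = [[1, -1], [1, 1]]
Qinv = [[0.5, 0.5], [-0.5, 0.5]]
closed = mat_mul(mat_mul(Q, [[exp(4), 0], [0, exp(2)]]), Qinv)

assert all(abs(series[i][j] - closed[i][j]) < 1e-6
           for i in range(2) for j in range(2))
print("series and diagonalised exponential agree")
```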

Exercise 2. Prove that for any real numbers t and s, \exp(tM) \exp(sM) = \exp((t+s)M).

Exercise 3. Is it true that \exp(A) \exp(B) = \exp(A+B) for any two matrices?

Exercise 4. Define cos(A) and sin(A) for an arbitrary matrix A. [ Hint: if you know complex numbers, there’s no need to write out their Taylor expansions. ] Is it true that \cos^2(A) + \sin^2(A) = I for any matrix A? What about \sin(A+B) = \sin A \cos B + \cos A \sin B?

Clearly we can generalise the above discussion to n × n matrices in general. Specifically, let M be a square matrix. If there is a scalar λ and a non-zero vector v such that Mv = \lambda v, then we say λ is an eigenvalue with corresponding eigenvector v.

Next, M is diagonalisable if we can write M = QDQ^{-1} for some invertible matrix Q and diagonal matrix D. E.g. if M is a 4 × 4 matrix and we can diagonalise it as:

M = Q\begin{pmatrix} -5 & 0 & 0 & 0\\ 0 & 2 & 0 & 0\\ 0 & 0 & 3 & 0\\ 0 & 0 & 0 & 1\end{pmatrix}Q^{-1},

then we know that the general entries of M^n can be written as a fixed linear combination of the powers (-5)^n, 2^n, 3^n, 1^n = 1. In particular, when n is large, the absolute values of the entries are approximately a constant times 5^n.

In short, if we know the eigenvalue λ of largest absolute value, then we can approximate the size of the entries of M^n via a constant times |\lambda|^n. For this reason, we have a special name for this value. The spectral radius of a matrix M is defined to be the largest |λ| among all eigenvalues λ of M.
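The spectral radius can also be estimated numerically by power iteration: repeatedly apply M to a vector and track the growth rate. A minimal Python sketch (assuming M has a single dominant eigenvalue and the iterate never vanishes; the function name is mine):

```python
from math import sqrt

def spectral_radius_estimate(M, iters=100):
    # power iteration: after each step v is a unit vector, so the norm of
    # M*v estimates the largest |eigenvalue| (assumes a dominant eigenvalue
    # and that the iterate never becomes the zero vector)
    n = len(M)
    v = [1.0] * n
    r = 0.0
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        r = sqrt(sum(x * x for x in w))
        v = [x / r for x in w]
    return r

# for M = [[3, 1], [1, 3]] the spectral radius is 4
assert abs(spectral_radius_estimate([[3, 1], [1, 3]]) - 4) < 1e-9
print("estimated spectral radius: 4")
```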

As mentioned earlier, to diagonalise an n × n matrix M in general, we first write MQ = QD for some diagonal matrix D. If we break the matrix Q into column vectors (v_1 | v_2 | \dots | v_n), then this translates to:

Mv_1 = \lambda_1 v_1,\ Mv_2 = \lambda_2 v_2, \dots, Mv_n = \lambda_n v_n,

for some scalar values \lambda_1, \lambda_2, \dots, \lambda_n. But this just means each v_i is a non-zero vector satisfying (M-\lambda_i I)v_i = 0 so the matrix M - \lambda_i I is singular for each i. This means the eigenvalues of M are precisely the values of λ such that \det(M - \lambda I) = 0!
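For a 2 × 2 matrix this determinant condition is just a quadratic in λ: \det(M - \lambda I) = \lambda^2 - (\mathrm{tr}\, M)\lambda + \det M. A small Python sketch solving it (assuming real eigenvalues; names are mine):

```python
from math import sqrt

def eigenvalues_2x2(M):
    # det(M - tI) = t^2 - (trace)t + det, so the eigenvalues are the
    # roots of this quadratic (assumes they are real)
    (a, b), (c, d) = M
    tr, det = a + d, a * d - b * c
    disc = sqrt(tr * tr - 4 * det)
    return ((tr + disc) / 2, (tr - disc) / 2)

assert eigenvalues_2x2([[3, 1], [1, 3]]) == (4.0, 2.0)
l1, l2 = eigenvalues_2x2([[1, 1], [1, 0]])  # Fibonacci matrix
assert abs(l1 - (1 + sqrt(5)) / 2) < 1e-12
print("eigenvalues found by solving det(M - tI) = 0")
```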

It turns out not all matrices are diagonalisable. But most of the time they are:

Theorem. If an n × n matrix M has n distinct eigenvalues (i.e. the equation \det(M - \lambda I) = 0 has no repeated root), then M is diagonalisable.

Proof. Let the n distinct eigenvalues be \lambda_i, i = 1, 2, …, n. Each eigenvalue must have some corresponding eigenvector v_i. Let Q = (v_1 | v_2 | \dots | v_n). The relation M v_i = \lambda_i v_i then says MQ = QD, where D is the diagonal matrix with diagonal entries \lambda_1, \lambda_2, \dots.

Thus, we only need to prove Q is invertible, or equivalently, that v_1, v_2, \dots, v_n are linearly independent. To that end, suppose they are linearly dependent. We re-order the eigenvectors and eigenvalues and drop extraneous terms such that:

c_1 v_1 + c_2 v_2 + \dots + c_k v_k = 0

for some non-zero scalar values c_i, where k ≤ n is as small as possible. Now let M act on both sides. Using the fact that Mv_i = \lambda_i v_i, we get:

c_1 \lambda_1 v_1 + c_2 \lambda_2 v_2 + \dots + c_k \lambda_k v_k = 0.

Multiplying the first equation by \lambda_1 and taking the difference, the v_1 terms cancel and we get c_2(\lambda_2 - \lambda_1)v_2 + \dots + c_k(\lambda_k - \lambda_1)v_k = 0, whose coefficients are all non-zero since the eigenvalues are distinct. This is an even shorter relation, contradicting the minimality of k. ♦

Thus, if one picks some random matrix with random entries, the result is almost certainly diagonalisable.

Exercise 5. Prove that the matrix M =\begin{pmatrix} 2 & 1 \\ 0 & 2\end{pmatrix} is not diagonalisable. Write the entries of M^n in closed form.


Power Series and Generating Functions (IV) – Exponential Generating Functions

Note: this article is noticeably more difficult than the previous instalments. The reader is advised to be completely comfortable with generating functions before proceeding.

We’ve already seen how generating functions can be used to solve some combinatorial problems. The nice thing is that even if a sequence has no nice closed form, generating functions allow us to efficiently compute a large number of terms in the sequence. With modern computers, such computations are often a cakewalk.

Here, we shall look at a different type of generating functions. Given a sequence a_0, a_1, a_2, \dots, let us define:

f(x) = a_0 + \frac{a_1}{1!}x + \frac{a_2}{2!}x^2 + \frac{a_3}{3!}x^3 + \dots.

This is called the exponential generating function (EGF) of the sequence a_n. Here are some easy examples.

  • The EGF for a_n = 1 is given by 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = e^x; that’s why there’s an “exponential” attached to the name.
  • The EGF for a_n = n! is given by 1 + x + x^2 + \dots = (1-x)^{-1}.
  • The EGF for a_n = r^n is given by 1 + rx + \frac{(rx)^2}{2!} + \frac{(rx)^3}{3!} + \dots = e^{rx}.

When do we use this and not the standard generating function? Generally speaking, the EGF is useful for counting the number of combinatorial objects for a labeled set, while the standard generating function is useful for an unlabeled set. Here’s why, roughly.

  • Suppose, to get a structure A on n identical balls, we need to split the set of n balls into two disjoint subsets and impose structure B on the first subset, and structure C on the second subset. Under this assumption, if a_n (resp. b_n, c_n) denotes the number of structures of type A (resp. B, C) on the set of n balls, then we have a_n = b_0 c_n + b_1 c_{n-1} + \dots + b_{n-1} c_1 + b_n c_0, since we need to consider the case where the first subset has 0, 1, 2, …, or n balls.
  • Now consider the same problem, except that the n identical balls are replaced by the set {1, 2, …, n}. To count a_n, suppose we choose a k-subset S \subset \{1,2,\dots,n\}; there are C(n, k) = \frac{n!}{k!(n-k)!} possible choices for S. Then we impose a structure B on S; there are b_k ways of doing this. And finally, we choose a structure C on the remaining \{1,2,\dots,n\} - S; there are c_{n-k} ways of doing this. This gives a_n = \sum_{k=0}^n C(n,k) b_k c_{n-k}, or \frac{a_n}{n!} = \sum_{k=0}^n \frac{b_k}{k!} \cdot \frac{c_{n-k}}{(n-k)!}; thus the EGFs of (b_n) and (c_n) multiply to give the EGF of (a_n).

Without further delay, let’s look at some concrete examples.

Problem 1. Find the number of functions f : {1, 2, …, n} → {1, 2, …, r}.

Solution. Any beginning student of combinatorics knows the answer: r^n. But let’s derive this using EGF. Corresponding to each f, we take the inverse images f^{-1}(1), f^{-1}(2), \dots, f^{-1}(r), which are subsets of {1, 2, …, n}. Let S_i = f^{-1}(i); then we get a partition of {1, 2, …, n} into r disjoint subsets:

\{1, 2, \dots, n\} = S_1 \cup S_2 \cup \dots \cup S_r.

Hence, to find a function f, we only need to specify an ordered partition into r subsets (“ordered” here means that swapping S_1 and S_2 changes the partition). If r is fixed, and a_n denotes the number of such f, then the EGF for (a_n) is the product of r copies of the EGF of the sequence (1, 1, 1, …):

EGF(a_n) = \overbrace{e^x \times e^x \times \dots \times e^x}^{r\ \text{copies}} = e^{rx}.

But the coefficient of x^n in e^{rx} is \frac{r^n}{n!}, so we get the result a_n = r^n as expected. ♦

Problem 2. Find the number of surjections f : {1, 2, …, n} → {1, 2, …, r}, where r ≤ n.

Solution. We now have the partition \{1, 2, \dots, n\} = S_1 \cup S_2 \cup \dots \cup S_r, where each S_i is non-empty. So the corresponding EGF for each factor is now the EGF of (0, 1, 1, 1, …), which is

x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = e^x - 1.

Hence, if we fix r and let an denote the number of surjections f, then the EGF for an is (e^x - 1)^r. ♦

Let’s consider small values of r:

  • r = 2 : EGF = e^{2x} - 2e^x + 1, so a_n = 2^n - 2.
  • r = 3 : EGF = e^{3x} - 3e^{2x} + 3e^x - 1, so a_n = 3^n - 3\cdot 2^n + 3.
  • r = 4 : a_n = 4^n - 4\cdot 3^n + 6 \cdot 2^n - 4.

Now the astute reader can easily write down the general formula.
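For the record, expanding (e^x - 1)^r binomially gives the general formula a_n = \sum_{k=0}^r (-1)^k \binom r k (r-k)^n. A quick brute-force check in Python (helper names are mine):

```python
from itertools import product
from math import comb

def surjections(n, r):
    # inclusion-exclusion: expand (e^x - 1)^r and read off n! [x^n]
    return sum((-1)**k * comb(r, k) * (r - k)**n for k in range(r + 1))

def surjections_brute(n, r):
    # count maps {1..n} -> {1..r} hitting every value
    return sum(1 for f in product(range(r), repeat=n) if len(set(f)) == r)

for n in range(1, 7):
    for r in range(1, n + 1):
        assert surjections(n, r) == surjections_brute(n, r)
print("inclusion-exclusion count matches brute force")
```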

Problem 3. Consider all possible functions f : {1, 2, …, n} → {1, 2, …, s}. What is the expected size of the image of f ?

Solution. Fix n and s. Let b_r be the number of f whose image has size r. If we consider the case im(f) = {1, 2, …, r}, then the number of such f is n! times the coefficient of x^n in (e^x - 1)^r. Hence, b_r is exactly {s\choose r} times this number. So we need to look at:

b_1 + 2b_2 + 3b_3 + \dots = n! \times \text{coeff. of } x^n \text{ in } \sum_{r=0}^s r{s\choose r} (e^x - 1)^r. (*)

Thus it really suffices to simplify \sum_r r {s\choose r} y^r. This can be achieved by differentiating the binomial expansion:

{s\choose 0} + {s\choose 1} y + {s\choose 2}y^2 + \dots + {s\choose s}y^s = (1+y)^s

\frac{d}{dy} \implies {s\choose 1} + 2{s\choose 2}y + \dots + s{s\choose s}y^{s-1} = s(1+y)^{s-1}.

So our sum simplifies to \sum_r r {s\choose r} y^r = sy(1+y)^{s-1}. Upon substituting y = e^x - 1, the sum (*) becomes n! times the coefficient of x^n in s(e^x - 1)e^{(s-1)x} = s(e^{sx} - e^{(s-1)x}), which equals s^{n+1} - s(s-1)^n. Dividing by the total number of possible functions (s^n), the mean image size is s - s\left(\frac{s-1}s\right)^n, which is very close to s when n is large. ♦
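The value s - s\left(\frac{s-1}s\right)^n for the mean image size can be sanity-checked by brute force over all s^n functions; a Python sketch (helper names are mine):

```python
from itertools import product
from fractions import Fraction

def mean_image_size_brute(n, s):
    # average |im(f)| over all s^n functions f : {1..n} -> {1..s}
    total = sum(len(set(f)) for f in product(range(s), repeat=n))
    return Fraction(total, s**n)

def mean_image_size_formula(n, s):
    # s - s((s-1)/s)^n, in exact arithmetic
    return s - s * Fraction(s - 1, s)**n

for n in range(1, 6):
    for s in range(1, 5):
        assert mean_image_size_brute(n, s) == mean_image_size_formula(n, s)
print("mean image size formula matches brute force")
```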

Exercise 1. Find the number of r-letter words you can form with the letters ‘a’, ‘b’ and ‘c’, where ‘a’ occurs an odd number of times and ‘b’ occurs an even number of times.

Exercise 2. The Stirling number of the second kind S^{(r)}_n is the number of partitions of {1, 2, …, n} into r disjoint non-empty subsets. Write down an expression for S^{(r)}_n by computing its EGF or otherwise. Also write down an expression for the Bell number B_n, which is the number of partitions of {1, 2, …, n}. [ Note: the partitions here are unordered, i.e. \{1,2\} \cup \{3\} and \{3\} \cup \{1,2\} are considered identical partitions. ]

Exercise 3. (Comtet’s square). Let a_n be the number of ways to partition {1, 2, …, n} such that (i) the number of parts is X; and (ii) the size of each part is Y, where X, Y are elements of {“unrestrained”, “even”, “odd”}. Find the EGF for a_n. Since there are 9 possible ordered pairs for (X, Y), you should get 9 answers in all.

Throughout this section, we shall require the reader to be somewhat acquainted with the concept of permutations. A permutation of {1, 2, …, n} is a bijective map from {1, 2, …, n} to itself. For example, we can express a permutation of {1, 2, …, 9} via the following diagram:

Such a permutation p would map 1 to 3, 2 to 5, 3 to 9, … . Or one can also express the same permutation in the form of cycles:

which is clearer, and one immediately sees how the successive powers p^2, p^3, \dots act on the numbers 1, …, 9. For example, since the three cycles have lengths 1, 3 and 5, the smallest n for which p^n = id is n = lcm(1, 3, 5) = 15. We say that the order of the permutation is 15.

An element i for which p(i) = i is called a fixed point of the permutation p. In the above example, the only fixed point is 4.

Problem 4. Let a_n be the number of permutations p of {1, 2, …, n} such that p^2 = id. [ Such a permutation is called an involution. ] Find the exponential generating function of a_n.

Solution. If we write p as a product of cycles as above, then p is an involution iff all the cycles have length 1 or 2. If there are exactly r cycles (where r is fixed), then the number of such permutations has exponential generating function \left(x + \frac{x^2} 2\right)^r, right?

No! Since swapping the cycles has no discernible effect on the permutation (e.g. (1, 2)(4, 5)3 = (1, 2)3(4, 5)), we should divide it by r!. Hence, we should look at \frac 1 {r!} \left(x + \frac{x^2} 2\right)^r instead. Summing over all r, we get:

\sum_{r=0}^\infty \frac 1 {r!} \left(x + \frac{x^2} 2\right)^r = \exp\left(x + \frac{x^2}2\right) = \exp(x)\exp(x^2/2).

This allows us to express a_n as a sum:

a_n = \sum_{k=0}^{[n/2]} \frac{n!}{2^k k! (n-2k)!}. ♦
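This sum is easy to check against a brute-force count of involutions; a Python sketch (function names are mine):

```python
from itertools import permutations
from math import factorial

def involutions_formula(n):
    # k runs over the number of 2-cycles in the involution
    return sum(factorial(n) // (2**k * factorial(k) * factorial(n - 2 * k))
               for k in range(n // 2 + 1))

def involutions_brute(n):
    # count permutations p with p o p = id
    return sum(1 for p in permutations(range(n))
               if all(p[p[i]] == i for i in range(n)))

assert [involutions_formula(n) for n in range(8)] == \
       [involutions_brute(n) for n in range(8)]
print([involutions_formula(n) for n in range(8)])
```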

Problem 5. Find the number a_n of permutations p of {1, 2, …, n}, where p is a product of an even number of disjoint cycles.

Solution. As in the previous problem, we consider the case of r cycles. Given the set {1, 2, …, s}, how many cycles p of length s can we form? To answer that, for any such cycle we can start from 1 and obtain a unique sequence of s-1 numbers via p(1), p(p(1)), p(p(p(1))), \dots which is a permutation of {2, 3, …, s}. Hence the number is (s-1)!, which gives the EGF:

x + \frac{1!x^2}{2!} + \frac{2!x^3}{3!} + \frac{3!x^4}{4!} + \dots= \sum_{r=1}^\infty \frac{x^r}{r} = -\log(1-x).

To find the number of p where p is a product of r cycles, we have to take n! times the coefficient of x^n in \frac 1 {r!}(-\log(1-x))^r. [ As before, the factor of \frac 1 {r!} is included since swapping cycles does not change the permutation. ] Hence to find the EGF for a_n, we sum over all even r:

\sum_{i=0}^\infty \frac 1 {(2i)!} (-\log(1-x))^{2i} = \cosh(-\log(1-x)),

where \cosh z = \frac{e^z + e^{-z}} 2 is the hyperbolic cosine function. Hence the above simplifies to \frac 1 2((1-x) + (1-x)^{-1}) = 1 + \frac 1 2 (x^2 + x^3 + \dots). In conclusion:

  • When n = 0, a_n = 1.
  • When n = 1, a_n = 0 (obviously!).
  • When n ≥ 2, a_n = \frac{n!}2.  ♦
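A brute-force confirmation in Python that permutations with an even number of cycles number exactly n!/2 (helper names are mine):

```python
from itertools import permutations
from math import factorial

def cycle_count(p):
    # number of cycles of a permutation given as a tuple on {0, ..., n-1}
    seen, cycles = set(), 0
    for i in range(len(p)):
        if i not in seen:
            cycles += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = p[j]
    return cycles

def even_cycle_perms(n):
    # permutations of {0, ..., n-1} with an even number of cycles
    return sum(1 for p in permutations(range(n)) if cycle_count(p) % 2 == 0)

for n in range(2, 8):
    assert even_cycle_perms(n) == factorial(n) // 2
print("verified a_n = n!/2 for 2 <= n < 8")
```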

Note : considering how nice the answer is, one naturally wonders if there is a nice combinatorial argument for this. There is, actually. If p is any permutation of {1, 2, …, n}, where n\ge 2, we compose it with (1, 2) to give q = p\circ (1,2). If 1 and 2 belong to the same cycle of p, the composed permutation q splits that cycle into two; conversely, if 1 and 2 belong to disjoint cycles, then q merges the two cycles into one. Thus p \mapsto q pairs up the permutations with an even number of cycles with those with an odd number, so each kind accounts for exactly half of the n! permutations. The details are left as an exercise.

But that’s how things are; one often discovers such patterns through heavy machinery before finding a simpler combinatorial proof.

Exercise 4. A derangement is a permutation without a fixed point (i.e. for all i, p(i) ≠ i). Prove that the number of derangements of {1, 2, …, n} is n!(1 - \frac 1 {1!} + \frac 1 {2!} - \dots + (-1)^n \frac 1 {n!}).

Exercise 5. Prove that the number of permutations p of {1, 2, …, 2n} having only even cycle lengths, is given by 1^2 \times 3^2 \times \dots \times (2n-1)^2.

Exercise 6. Generalise the above two problems as follows. Let A, B \subseteq \mathbb{N} be two subsets. We wish to enumerate the number a_n of permutations of {1, 2, …, n} for which

  • the number of cycles is an element of B; and
  • the length of each cycle is an element of A.

Prove that the EGF of the sequence a_n can be expressed in the form \beta(\alpha(x)), where α(x) (resp. β(x)) is a power series which depends solely on A (resp. B). Find these power series.

Exercise 7. Prove that the number of permutations of {1, 2, …, n} containing an even number of even-length cycles is exactly \frac{n!}2 when n ≥ 2. For extra challenge, go for two proofs: one combinatorial and one via EGF.

Exercise 8. Prove that the expected number of cycles of a random permutation of {1, 2, …, n} is given by 1 + \frac 1 2 + \frac 1 3 + \dots + \frac 1 n, which is very close to log(n).

Exercise 9. (New: bonus question) A permutation p of {1, 2, …, n} is said to be alternating if p(1) < p(2), p(2) > p(3), p(3) < p(4), p(4) > p(5), etc. If f(x) denotes the EGF of the number of alternating permutations of {1, 2, …, n}, find a relation between f(x) and its derivative f’(x). Hence prove that f(x) = tan(x).

This concludes our discussion of EGF. Though the article is rather long, we’ve barely scratched the surface of what generating functions are capable of. In particular, one can look at combinatorial species, where an equality not only establishes that a_n = b_n, but also provides an actual bijection between the associated combinatorial structures. The best way to understand this is via category theory, but that’s really another story for another day.


Power Series and Generating Functions (III) – Partitions

One particularly fruitful application of generating functions is in partition numbers.

Let n be a positive integer. A partition of n is an expression of n as a sum of positive integers, where two expressions are identical if they can be obtained from each other via a permutation. For example, 7 = 3 + 2 + 1 + 1 and 7 = 3 + 1 + 1 + 2 are identical partitions. Or one can also write a partition as an expression n = a_1 + a_2 + \dots + a_k, where a_1 \ge a_2 \ge \dots \ge a_k are positive integers. We are interested in counting the number of partitions of n, which is denoted by p(n).

Example: p(5) = 7 since we can write

5 = 4+1 = 3+2 = 3+1+1 = 2+2+1 = 2+1+1+1 = 1+1+1+1+1

The partition numbers were a pet favourite of the late Ramanujan, who found various congruence relations involving partition numbers and even a rapidly converging formula, which he later used to compute p(100) and p(200) exactly (the latter has 13 digits, and those were the days without electronic computation tools!).

One effective way of representing a partition is by means of a Young diagram. Corresponding to the partition n = a_1 + a_2 + \dots + a_k, where a_1 \ge a_2 \ge a_3 \ge \dots, we place a_1 dots on a row, evenly spaced apart and  starting from a left margin; then a_2 dots on the second row, again evenly spaced apart and starting from the same left margin, etc. For example, for the two partitions 18 = 6+5+3+3+1 = 4+4+4+3+1+1+1, we have the corresponding Young diagrams.

Problem 1. Prove that the number of partitions of n into k parts is equal to the number of partitions of n into parts, the largest of which is k. E.g. if n = 7, k = 3, then we have:

7 = 5+1+1 = 4+2+1 = 3+3+1 = 3+2+2, (exactly 3 parts)

7 = 3+3+1 = 3+2+2 = 3+2+1+1 = 3+1+1+1+1, (the largest part is 3).

Sketch of Proof. Construct a bijection between the two sets of partitions, by taking the transpose (i.e. reflection about the main diagonal line). This is also called the conjugate. The following diagram shows an example.

What is truly of interest to us here is the generating function for the partition numbers f(x) = p(0) + p(1)x + p(2)x^2 + \dots, where p(0) is taken to be 1 (there’s just one way to partition 0: i.e. don’t do anything). Now, instead of looking at the individual parts in a partition, we can count the number of 1’s, the number of 2’s etc and represent this by a tuple (r_1, r_2, \dots) of non-negative integers such that \sum_i (ir_i) = n. For example, the above two Young diagrams correspond to tuples (1, 0, 2, 0, 1, 1) and (3, 0, 1, 3) respectively. This gives:

f(x) = (1 + x + x^2 + x^3 + \dots)(1 + x^2 + x^4 + \dots)(1 + x^3 + x^6 + \dots)\dots = \prod_{m=1}^\infty \frac 1 {1 - x^m}.
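This product can be expanded by computer: multiply in the factors \frac 1 {1-x^m} one at a time, truncating at degree N. A Python sketch (the function name is mine):

```python
def partition_numbers(N):
    # expand prod_{m=1}^{N} 1/(1 - x^m) up to degree N, one factor at a time;
    # multiplying by 1/(1 - x^m) means p[n] += p[n - m] in increasing n
    p = [1] + [0] * N
    for m in range(1, N + 1):
        for n in range(m, N + 1):
            p[n] += p[n - m]
    return p

p = partition_numbers(10)
assert p[5] == 7  # matches the example p(5) = 7
print(p)
```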

Problem 2. Let q(n) be the number of partitions of n into parts which are greater than 1. Prove that q(n) = p(n) – p(n-1).

Solution. From the above reasoning, we see that the generating function for q(n), i.e. the power series f(x) = q(0) + q(1)x + q(2)x^2 + \dots, is given by:

(1 + x^2 + x^4 + \dots)(1 + x^3 + x^6 + \dots)(1 + x^4 + x^8 + \dots)\dots = \prod_{m=2}^\infty \frac 1 {1 - x^m}.

This shows that if g(x) is the generating function for p(n) then g(x) = (1-x)^{-1} f(x) or f(x) = (1-x)g(x). Comparing the coefficient of x^n on both sides then gives q(n) = p(n) – p(n-1). ♦

Problem 3. Prove that the number of partitions of n into distinct parts is equal to the number of partitions of n into odd parts. E.g. for n = 6, we have:

6,  5+1, 4+2, 3+2+1   and   5+1, 3+3, 3+1+1+1, 1+1+1+1+1+1.

Proof. Let q(n) be the number of partitions of n into odd parts. Its generating function q(0) + q(1)x + q(2)x^2 + \dots is

\left(\sum_{i=0}^\infty x^i\right)\left(\sum_{i=0}^\infty x^{3i}\right)\left(\sum_{i=0}^\infty x^{5i}\right) \dots = \frac 1 {(1-x)(1-x^3)(1-x^5)(1-x^7)\dots}.

Multiplying the numerator and the denominator by the “missing even terms” (1-x^2)(1-x^4)(1-x^6)\dots we get:

\frac{(1-x^2)(1-x^4)(1-x^6)\dots}{(1-x)(1-x^2)(1-x^3)\dots} = \frac{1-x^2}{1-x} \cdot\frac{1-x^4}{1-x^2}\cdot \frac{1-x^6}{1-x^3}\cdots = (1+x)(1+x^2)(1+x^3)\dots

and we see that the final power series is the generating function for the number of partitions of n into distinct parts. ♦

Problem 4. (AMM 11464) Let a(n) be the number of ways to place n identical balls into urns U_1, U_2, \dots such that (i) U_1 gets at least one ball, and (ii) while any balls remain, each successive urn receives at least as many balls as all previous urns combined. On the other hand, let b(n) be the number of partitions of n into powers of 2, possibly with repeated terms. Prove that a(n) = b(n). E.g. if n = 6, then:

a(6) : 1+1+4, 1+2+3, 1+5, 2+4, 3+3, 6,       b(6) : 1+1+1+1+1+1, 1+1+1+1+2, 1+1+2+2, 1+1+4, 2+2+2, 2+4.

Proof. Let g(x) = b(0) + b(1)x + b(2)x^2 +\dots be the generating function of the sequence b(n). From the above argument, it’s clear that:

g(x) = (1-x)^{-1}(1-x^2)^{-1} (1-x^4)^{-1}\dots = \prod_{i=0}^\infty \left(1 - x^{2^i}\right)^{-1}.

Thus, algebra gives us

g(x^2) = (1-x^2)^{-1}(1-x^4)^{-1}(1-x^8)^{-1}\dots = (1-x)g(x).

Comparing the coefficients of x^m on both sides then gives us:

  • b(2n+1) = b(2n);
  • b(2n) = b(2n-1) + b(n).
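These two recurrences (and the value b(6) = 6 from the example) are easy to confirm by computing b(n) directly; a Python sketch (names are mine):

```python
def binary_partitions(N):
    # b(n) = number of partitions of n into powers of 2, via the
    # coin-style expansion of prod_i (1 - x^(2^i))^{-1}
    cnt = [1] + [0] * N
    m = 1
    while m <= N:
        for n in range(m, N + 1):
            cnt[n] += cnt[n - m]
        m *= 2
    return cnt

cnt = binary_partitions(40)
assert cnt[6] == 6  # matches the worked example b(6) = 6
for n in range(1, 20):
    assert cnt[2 * n + 1] == cnt[2 * n]
    assert cnt[2 * n] == cnt[2 * n - 1] + cnt[n]
print("both recurrences hold on the computed range")
```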

We claim that the sequence a(n) satisfies the same two recurrence relations. The result would then follow from a(1) = b(1) = 1. To show that a(2n+1) = a(2n), suppose we distribute 2n+1 balls among the urns. Look at the last urn: the number of balls in this urn must be more than the number of balls in all prior urns; for if it were equal, then the total number of balls would be even (contradiction). By removing 1 ball from the last urn, we get a bijection between the distributions of 2n+1 balls and those of 2n balls.

To prove a(2n) = a(2n-1) + a(n), consider a distribution of 2n balls. If the last urn has as many balls as all prior urns combined, then removing the last urn gives us a distribution of n balls. If it has more, then removing one ball gives a distribution of 2n-1 balls. You can easily check that we obtain bijections in both cases. This completes the proof. ♦

Exercises.

  1. Let m, n be positive integers with n > \frac 1 2 m(m+1). Prove that the number of partitions of n into m distinct parts is equal to the number of partitions of n - \frac 1 2 m(m+1) into at most m parts.
  2. Solve Problem 2 via a combinatorial argument.
  3. Prove that the number of partitions of n in which no integer occurs more than k-1 times, equals the number of partitions of n into parts which are not divisible by k.
  4. A partition is said to be self-conjugate if its Young diagram is identical to its conjugate (or transpose). Prove that the number of self-conjugate partitions of n is equal to the number of partitions of n into distinct odd parts.
  5. Find a closed form for the number of partitions of n into 1, 2 and 3, i.e. find the number of ways to obtain $n if we have only $1, $2 and $3 notes. [ We’ll give the answer here: \frac 1 8 (-1)^n + \frac{47}{72} + \frac n 2 + \frac {n^2} {12} + \frac 1 9 (\omega^n + \omega^{2n}), where \omega = \frac{-1+\sqrt{-3}}2 is a 3rd root of unity, satisfying \omega^2 + \omega + 1 = 0. If you’re not familiar with complex numbers, just note that \omega^n + \omega^{2n} is 2 (resp. -1) if n is divisible (resp. not divisible) by 3. ]
  6. Let c(n) be the number of partitions n = a+b+c+d where a\ge 0, b\ge 2a, c \ge 2b, d \ge 2c. Prove that the generating function of c(n) is (1-x)^{-1}(1-x^3)^{-1}(1-x^7)^{-1}(1-x^{15})^{-1}.
  7. Let c(n) be the number of partitions of n into k parts, each of which is at most l. Prove that the generating function of c(n) is given by: \frac{(1-x)(1-x^2)\dots(1-x^{k+l})}{(1-x)(1-x^2)\dots(1-x^k)\times(1-x)(1-x^2)\dots(1-x^l)}.

Extra Stuff. What’s a good way of computing p(n)? There doesn’t seem to be a closed form although there are asymptotic expressions which converge rather fast. One rather practical way of obtaining p(n) for modestly large values of n is via a recurrence relation.

Pentagonal Number Theorem. The power series (1-x)(1-x^2)(1-x^3)\dots = \prod_{i=1}^\infty (1-x^i), when expanded, has very few non-zero coefficients. To be specific:

\prod_{i=1}^\infty (1-x^i) = \sum_{k=-\infty}^\infty (-1)^k x^{k(3k-1)/2}.

The first few terms are 1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + \dots.

Since this power series is precisely the inverse of the power series f of p(n), we thus get:

f(x)(1 - x - x^2 + x^5 + x^7 - x^{12} - x^{15} + \dots) = 1.

Taking the coefficient of x^n for n > 0, we get

p(n) - p(n-1) - p(n-2) + p(n-5) + p(n-7) - p(n-12) - p(n-15) + \dots = 0,

which is rather efficient since we only need to worry about O(\sqrt n) terms above. If you know programming, you can use Python to calculate up to p(1000) easily. 🙂
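Here is one way to implement that recurrence in Python (the function name is mine); it computes p(0), …, p(1000) near-instantly:

```python
def partitions_pentagonal(N):
    # p(n) = p(n-1) + p(n-2) - p(n-5) - p(n-7) + p(n-12) + p(n-15) - ...,
    # where the shifts are the generalised pentagonal numbers k(3k±1)/2
    p = [1] + [0] * N
    for n in range(1, N + 1):
        total, k = 0, 1
        while k * (3 * k - 1) // 2 <= n:
            sign = 1 if k % 2 == 1 else -1
            total += sign * p[n - k * (3 * k - 1) // 2]
            if k * (3 * k + 1) // 2 <= n:
                total += sign * p[n - k * (3 * k + 1) // 2]
            k += 1
        p[n] = total
    return p

p = partitions_pentagonal(1000)
assert p[5] == 7
assert p[100] == 190569292  # the famous value of p(100)
print("p(1000) =", p[1000])
```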


Kinetic Theory, Entropy and Information Theory

This is really a continuation of the series “Thermodynamics for Mathematicians”. Our discussion there wasn’t quite complete without some justification of the facts we used from the kinetic theory of gases; in particular, we will work out the constant c in the formula U = c·PV. At the end of this post, we will also relate the thermodynamic definition of entropy to the statistical entropy, which is the log of the number of configurations.

The fundamental assumption of kinetic theory is that an ideal gas consists of molecules. Let’s not assume they’re identical for now, and instead label them 1, 2, 3, … . Suppose they’re contained in a box with height L and base area A. Let:

  • m_i be the mass of particle i;
  • v_{x,i}, v_{y,i}, v_{z,i} be respectively the x-, y- and z-components of the velocity of i.

If we assume the internal energy U is given by the kinetic energy of these particles, then we get:

U = \sum_i \frac 1 2 m_i (v_{x,i}^2 + v_{y,i}^2 + v_{z,i}^2) = \sum_i \frac 1 2 m_i v_{x,i}^2 + \sum_i \frac 1 2 m_i v_{y,i}^2 + \sum_i \frac 1 2 m_i v_{z,i}^2.

Assuming there’s no gravity effect, the three terms are identical by symmetry since nature has no preferred direction. We thus get: U =\frac 3 2 \sum_i m_i v_{z,i}^2. Next, since the gas is homogeneous, the pressure on the top/bottom wall is given by P. To compute P in terms of the microscopic state, consider a single particle hitting the wall and having an elastic collision. 

Hence, the change in momentum of a single particle per collision with the top wall is 2m_i v_{z,i}. Since the height of the container is L, the particle takes time 2L/v_{z,i} to return to the top wall. So on average, the change in momentum per unit time is given by:

\frac{2m_i v_{z,i}} {2L/v_{z,i}} = m_i v_{z,i}^2/L.

And that’s just one particle. The total force exerted on the top wall is the sum of all such terms. Hence, the pressure (force exerted per unit area) is given by

P = \frac 1 A\sum_i m_i v_{z,i}^2/L = \frac 1 V \sum_i m_i v_{z,i}^2,

where V is the volume of the container. Equating this formula with the earlier one, we get U = \frac 3 2 PV.

[ Note: this section requires multivariate calculus and integration, of a very large number of dimensions! ]

Thus, with c = 3/2 (so that (c+1)/c = 5/3), the formula for an adiabatic transformation of an ideal gas becomes:

\left( \frac{P'}P\right) = \left(\frac{V'}V\right)^{-5/3}.

And the formula for entropy of an ideal gas is:

S(P, V, N) = N(\frac 3 2 \log P + \frac 5 2 \log \frac V N) + kN,

where k is a constant we can’t know without delving into quantum mechanics (we’ll explain why later). Finally, we’ll explain how this ties in with statistical entropy. For this, we’ll need to define:

The statistical entropy (S) for a system subjected to certain constraints is the logarithm of the number of configurations (Ω) subjected to these constraints, assuming each configuration occurs with equal probability.

To fix ideas, one may imagine a finite number of possible configurations. For a concrete example, suppose we have a huge table top filled with N coins, each showing either heads or tails. If the coins are fair, then there are \Omega(N) = 2^N possible configurations occurring with equal probability, and the statistical entropy is thus S(N) = \log(2^N) = N\log 2, which is linear in N.

Now suppose we add another constraint: that the number of heads is exactly U. The number of configurations is then \Omega(N, U) = C(N, U), so the entropy is S(N, U) = \log C(N,U). Applying Stirling's approximation for factorials, one can show that if the ratio U/N = p is kept constant, then S is asymptotically linear in N; but since we won't need this fact, there's no need to dwell on it further.
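For the curious, a quick numerical illustration (not needed for the sequel): with p = U/N fixed, S(N, U)/N = (log C(N, U))/N approaches the binary entropy −p log p − (1−p) log(1−p), so S is indeed asymptotically linear in N. We use math.lgamma to compute log-factorials:

```python
import math

p = 0.3   # fixed fraction of heads, so U = p*N

def log_binomial(N, U):
    # log C(N, U) via log-gamma: log N! - log U! - log (N-U)!
    return math.lgamma(N + 1) - math.lgamma(U + 1) - math.lgamma(N - U + 1)

H = -p * math.log(p) - (1 - p) * math.log(1 - p)   # binary entropy

for N in (100, 1000, 10_000):
    U = int(p * N)
    print(N, log_binomial(N, U) / N, H)   # S/N converges to H
```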

And how does the statistical entropy relate to the thermodynamic one?

To answer that, we look at the case of an ideal gas: we have to “count” the number of states with a given (P, V, N) and show that its logarithm is precisely the thermodynamic entropy.

Consider the position and velocity of each particle: (x_i, y_i, z_i, v_{x,i}, v_{y,i}, v_{z,i}), where i = 1, 2, …, N. The microstate of the entire system then corresponds to a point in the 6N-dimensional space which, to put it mildly, is a staggeringly huge space. Since the total energy U = \frac 1 2 \sum_i m_i (v_{x,i}^2 + v_{y,i}^2 + v_{z,i}^2) is constant, we’re looking at a (6N-1)-dimensional hypersurface D defined by:

\sum_i m_i (v_{x,i}^2 + v_{y,i}^2 + v_{z,i}^2) = 2U. (*)

Now we partition the entire space into small blocks \prod_{i=1}^N dx_i dy_i dz_i dv_{x,i} dv_{y,i} dv_{z,i}, written as \prod_{i=1}^N d\mathbf{r}_i d\mathbf{v}_i for short, and count the “number” of accessible configurations subjected to the constraint that U is constant. Since our space is continuous, by “number” we really mean volume.

\Omega(P, V, N) = \int_D 1 \cdot \prod_{i=1}^N d\mathbf{r}_i d\mathbf{v}_i = V^N \int_{D'} 1 \cdot\prod_{i=1}^N d\mathbf{v}_i,

since each \mathbf{r}_i (i = 1, 2, … , N) can occur anywhere in a space of volume V. Here D’ is the (3N-1)-dimensional subspace defined by the equation (*) above. This is a huge hyper-ellipsoidal surface which we’ll approximate with a cube (if you feel perturbed by this, we will justify it at the end of the article: basically the approximation just changes the entropy by a constant multiple of N):

|v_{x,i}|, |v_{y,i}|, |v_{z,i}| \le \sqrt{2U/(6mN)},

for a constant m which approximates the m_i. The (hyper)volume of D' is then given by \lambda^N \sqrt{U/N}^{3N} for a constant λ which is independent of N and U. This shows that:

\Omega(P, V, N) \approx \lambda^N V^N (U/N)^{3N/2} = \lambda'^N V^N (PV/N)^{3N/2},

which gives the statistical entropy:

S(P, V, N) = \log \Omega(P, V, N) \approx N(\log V + \frac 3 2\log (V/N) + \frac 3 2 \log P) + kN.

This doesn’t quite match the thermodynamic entropy. What’s wrong?

Gibbs Paradox

The problem is that if we swap the states of two particles (\mathbf{r}_i, \mathbf{v}_i) and (\mathbf{r}_j, \mathbf{v}_j), there is no change in the state of the system even on a microscopic scale. So in counting the number of states, we really should divide \Omega by a factor of N!.

If you're still not convinced, here's a qualitative argument: consider the case where we have two identical bodies of gas with the same (P, V, N) separated by an adiabatic wall. Initially, the combined system has \Omega = \Omega_1 \times \Omega_2 possible states and thus entropy S = S_1 + S_2. Upon removing the middle wall, there should be no change in the total entropy since the two gases are identical. But if we assume all the particles are distinct, then by pairing up particles in the two chambers (there're effectively N particles in each, despite the micro-fluctuations) we gain a factor of 2^N in the total number of states, which contributes an additional N\log 2 to the entropy. This is known as Gibbs paradox. To offset it, we assume particles which are interchanged do not contribute additional states.

Thus, the correct number of states is

\Omega(P, V, N) \approx \lambda'^N \frac{V^N (PV/N)^{3N/2}}{N!}.

By Stirling’s approximation, we can take \log N! \approx N(\log N - 1) so this gives

S(P,V,N) \approx N(\frac 5 2\log(V/N) + \frac 3 2\log P) + kN,

which matches the thermodynamic entropy.
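The Stirling estimate log N! ≈ N(log N − 1) used above is easy to check numerically; the neglected term is roughly ½ log(2πN), which grows far slower than N. A quick sketch:

```python
import math

# Compare log N! (via lgamma) with Stirling's N*(log N - 1).
for N in (10, 100, 1000):
    exact = math.lgamma(N + 1)             # log N!
    approx = N * (math.log(N) - 1)
    print(N, exact, approx, (exact - approx) / exact)  # relative error shrinks
```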

To be honest, this process looks rather dubious. The fact that the coefficients (3/2 and 5/2) match is of interest, but appending a factor of -\log(N!) seems forced. Indeed, Boltzmann faced a lot of opposition in his time promoting his theory, which contributed to his depression and eventual suicide. After the repeated successes of the theory, however, the definition S = \log \Omega has become one of the founding principles of statistical physics. [ Boltzmann's story had a bittersweet ending: his theory eventually gained acceptance, and the formula S = k \log W was carved on his grave in remembrance of his great work. Here k is a scaling factor to convert our ideal gas temperature to Kelvin. ]

Finally, we take note of some choices we subtly made in computing S = S(P, V, N) and how changing them would affect S.

  • We specifically chose d\mathbf{r}_i d\mathbf{v}_i as a measure in computing the volume. What if we had multiplied this measure by a constant factor? Specifically, if say we had chosen d\mathbf{r}_i' = \alpha d\mathbf{r}_i instead, then the whole measure would be multiplied by a factor of \alpha^N. The net effect this has on the entropy is N\cdot \log \alpha which is absorbed by the kN term.
  • We had chosen to approximate the hyper-ellipsoidal surface \sum_i m_i(v_{x,i}^2 + v_{y,i}^2 + v_{z,i}^2) = 2U with a cube instead. To justify this, let m and M be the minimum and maximum of the m_i's. If M/m is not too huge, we can bound mv_x^2 \le m_i(v_x^2 + v_y^2 + v_z^2); and for the reverse inequality, if |v_x|, |v_y|, |v_z| < K then m_i(v_x^2 + v_y^2 + v_z^2) \le 3MK^2. Once again, we only affect the entropy by a constant multiple of N.

One thus sees and appreciates the amount of difficulty in computing the constant k in the formula for S. In particular, one must choose a specific measure when partitioning the large 6N-dimensional state space. What’s the right measure for us? The answer is supplied by quantum mechanics via Planck’s constant, which will give the Sackur-Tetrode equation, but that’s another story for another day.


Thermodynamics for Mathematicians (IV)

Note: for this installment, we now require multivariate calculus, specifically partial differentiation.

10. More State Parameters

For this section, let's consider the case of a homogeneous gas, whose state is parametrised by (P, V, N), where N is kept constant for now. Recall that the first law of thermodynamics says dQ = dU + P·dV, where dQ is a small amount of heat added to the system and dU is a small change in the system's internal energy. But we also know by definition that dQ/T = dS, where dS is a small change in the system's entropy. This gives:

dU = T·dS – P·dV.

This suggests that if we're looking at the state variable U, then it's a good idea to re-parametrise it as U = U(S, V, N). The above equation then tells us:

\left.\frac{\partial U}{\partial S}\right|_{V,N} = T,\quad \left.\frac{\partial U}{\partial V}\right|_{S,N} = -P.

 [ For students who’re unfamiliar with partial differentiation, this means: if we keep the parameters V and N constant and differentiate U with respect to S, then the result is T. E.g. if f(x,y,z) = (y+z)x^2 + \sin(y)x + e^z, then \left.\frac{\partial f}{\partial x}\right|_{y,z} = 2(y+z)x + \sin(y). ]

What if we wish to parametrise in terms of T, V and N? The best approach is then to define F = U – ST, the Helmholtz free energy. If we perturb the system a little, then the change in F is:

dF = dU – S·dT – T·dS = –S·dT – P·dV.

If we parametrise F via F(T, V, N), then \left.\frac{\partial F}{\partial T}\right|_{V,N} = -S and \left.\frac{\partial F}{\partial V}\right|_{T,N} = -P. But is there any physical meaning to F other than mere symbolic convenience? Well, yes. There're at least two ways of looking at it.

  • It's the maximum amount of useful work you can get out of a system in an isothermal setting. Imagine connecting the system to a heat source of temperature T (i.e. the same temperature as the system) and letting it do work by changing P and V. This process is reversible. Let Q be the amount of heat the system extracts from the heat source.
    • Since T = constant, change in entropy S_2 - S_1 = Q/T.
    • Change in internal energy is U_2 - U_1.
    • Thus the work done is Q - (U_2 - U_1) = -(U_2 - S_2 T)+(U_1 - S_1 T), which is the negation of the change in free energy. Hence one can envisage converting the free energy into work.
  • It’s the amount of energy required to form the system when it’s connected to a heat source T. Without the heat source, the amount of energy required is clearly U. However, the heat source helps out by contributing ST.
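As a sanity check, the two partial derivatives of F can be verified numerically for an ideal gas, taking U = cNT (equivalently U = c·PV with PV = NT) and the entropy formula S = N(c log P + (c+1) log(V/N)) + kN derived elsewhere in this series. The constants c, k and the state values below are illustrative:

```python
import math

c, k = 1.5, 0.0   # c from U = c*N*T; k an arbitrary entropy constant

def S(T, V, N):
    P = N * T / V                     # ideal gas law PV = NT
    return N * (c * math.log(P) + (c + 1) * math.log(V / N)) + k * N

def F(T, V, N):                       # Helmholtz free energy F = U - T*S
    return c * N * T - T * S(T, V, N)

T, V, N = 2.0, 3.0, 5.0
h = 1e-6
dFdT = (F(T + h, V, N) - F(T - h, V, N)) / (2 * h)   # central difference
dFdV = (F(T, V + h, N) - F(T, V - h, N)) / (2 * h)

print(dFdT, -S(T, V, N))              # dF/dT|_V = -S
print(dFdV, -(N * T / V))             # dF/dV|_T = -P
```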

To parametrise in terms of S, P and N, we let H = U + PV be the enthalpy of the system. Then dH = dU + P·dV + V·dP = T·dS + V·dP and we get:

\left.\frac{\partial H}{\partial S}\right|_{P,N} = T, \quad \left.\frac{\partial H}{\partial P}\right|_{S,N} = V.

This is the amount of energy required to create the system when it’s fixed at constant pressure P (called an isobaric setting): in addition to the internal energy U, the system has to do work in order to maintain its pressure. Since P is constant, the work done is PV.

Finally, as the reader might have guessed, for (T, P, N), we define the Gibbs free energy G = U + PV – ST. One obtains:

\left.\frac{\partial G}{\partial T}\right|_{P,N} = -S, \quad \left.\frac{\partial G}{\partial P}\right|_{T,N} = V.

This is the amount of energy required to create the system in an isobaric and isothermal setting (i.e. its pressure is kept constant and at the same time, it’s connected to a constant heat source).

11. Intensive and Extensive Variables

Let’s summarise the large number of state parameters we have:

  • pressure P,
  • volume V,
  • temperature T,
  • entropy S,
  • number of particles N,
  • internal energy U,
  • Helmholtz free energy F,
  • Gibbs free energy G,
  • enthalpy H.

Each of these state parameters can be classified from the following question: if we take the union of two identical systems, do we get the same parameter, or twice the parameter? If it’s the former, we say the parameter is intensive; for the latter, extensive. In this list, the only intensive state variables are P and T while the rest are extensive. Thus, for the internal energy U, we have:

U(\lambda S, \lambda V, \lambda N) = \lambda \cdot U(S, V, N)

for any positive value of λ. We also get:

  • F(T, \lambda V, \lambda N) = \lambda \cdot F(T, V, N);
  • G(T, P, \lambda N) = \lambda \cdot G(T, P, N);
  • H(\lambda S, P, \lambda N) = \lambda \cdot H(S, P, N).

The second relation is particularly interesting: if T and P are kept constant, then G is linear in N. Hence we can write G(T, P, N) = \mu(T, P)N for some function \mu(T, P). This is called the chemical potential of the system. This parameter is clearly intensive.

Example 9. Let’s go back to the problem of finding the entropy of an ideal gas. We had obtained:

S(P, V, N) = N(c\log P+(c+1)\log(V)) + t(N).

To figure out the right t(N), we use the fact that S, V, N are extensive while P is intensive, i.e. S(P, \lambda V, \lambda N) = \lambda S(P, V, N). Simplifying this relation, we obtain:

(c+1)\lambda N \log\lambda + t(\lambda N) - \lambda\cdot t(N) \equiv 0.

Replacing \lambda N and N by N and N_0 respectively, we see that t(N) is of the form t(N) = -(c+1)N\log N + k N for some constant k. Hence, the formula for entropy is now:

S(P, V, N) = N(c\log P+(c+1)\log(V/N)) + kN.

We will leave it at that, since figuring out k is rather tricky business which requires quantum mechanics. The complete version of the equation, known as the Sackur-Tetrode equation, even depends on the mass of each particle!
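It is easy to confirm numerically that the resulting formula is extensive, i.e. S(P, λV, λN) = λ·S(P, V, N); the values of c, k and the state below are arbitrary:

```python
import math

c, k = 1.5, 0.7   # c from U = c*PV; k the undetermined additive constant

def S(P, V, N):
    # Entropy of an ideal gas: S = N*(c*log P + (c+1)*log(V/N)) + k*N.
    return N * (c * math.log(P) + (c + 1) * math.log(V / N)) + k * N

P, V, N, lam = 2.0, 3.0, 5.0, 4.0
print(S(P, lam * V, lam * N), lam * S(P, V, N))   # the two values agree
```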

12. Application of Thermodynamics: Introduction to Phase Transition

Phase transition occurs when the system experiences a sudden change in its physical properties. Typical examples that we see in everyday life include boiling / condensation and freezing / melting. The theory of thermodynamics allows us to model the behaviour of matter, if not quantitatively, then at least qualitatively.

The following is extracted and rephrased from David Tong’s notes on statistical physics, but any mistake most likely belongs to me. Most of the diagrams below are also stolen / modified from his notes.

Let's write down the van der Waals equation for a not-so-ideal gas/liquid:

P = \frac{T}{v-b} - \frac a {v^2}, where v = V/N.

If we plot out the Pv graph (warning: not PV graph) for various temperatures, we obtain the following:

Note that if v is close to b, then the material is in a liquid state, since compressing it requires a disproportionately large amount of force. Conversely, if v is huge, then it is in a gaseous state.

For temperatures below a critical limit T < T_c, the graph has two turning points. The state (P, v) of the material is unstable between these two points. For if we reduce the volume a little by compressing it, then the decrease in pressure causes it to be compressed further; conversely, if we expand it a little, then the increase in pressure will continue to expand it. Indeed, within this region the material is a mixture of liquid and gaseous states (we say it is in vapour-liquid equilibrium for the pure system).

How do we find the interval where vapour-liquid equilibrium occurs?

In order for vapour and liquid to coexist, the intensive parameters of both components must be equal. The temperature is already equal; let the common pressure be P. The final parameter is the chemical potential μ, i.e. the Gibbs free energy per particle, since G(T, P, N) = μ(T, P)N. Thus we have:

\left.\frac{\partial\mu}{\partial P}\right|_{T} = \frac 1 N \left.\frac{\partial G}{\partial P}\right|_{T,N} = \frac V N.

Since our T and N are indeed fixed, we get \mu(P, T) = \mu_0 + \int \frac{V(P',T)} N dP' (here, P doesn’t increase monotonically over the integration region!). Recall that v = V/N. The result of the integral is hence the difference of the areas of the two shaded regions below. [ Note: this is non-trivial, do take a minute to figure out what the integration means over a non-monotonically increasing variable. ]

Hence, for (P, μ) of both the liquid and gaseous states to be equal, we need \int \frac V N dP' = 0, i.e. the two shaded regions have the same area. This is known as the Maxwell construction, which gives us the condition under which the liquid and gaseous states can coexist. Note that this region is wider than the interval between the two turning points. The remaining intervals correspond to meta-stable states; these states can exist if we prepare the material sufficiently slowly and carefully, but they're highly delicate and unstable.

Now let the temperature increase until T \to T_c. This approaches the critical point, beyond which there is no clear distinction between the liquid and gaseous states.

As one might imagine, this temperature is rather high (for water, it's about 374°C). From the van der Waals equation, this critical point can be determined via algebra, for it is where the P-v graph has an inflection point:

\frac{\partial P}{\partial v} = 0, \quad \frac{\partial^2 P}{\partial v^2} = 0.

Solving gives T_c = 8a/27b. The corresponding pressure and volume at the critical point are P_c = a/27b^2 and v_c = V/N = 3b.
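These critical values are easy to verify numerically: substituting T_c = 8a/27b and v_c = 3b into the van der Waals equation, both the first and second derivatives of P with respect to v should vanish. A finite-difference sketch with illustrative constants a, b:

```python
a, b = 2.0, 0.1           # illustrative van der Waals constants

Tc = 8 * a / (27 * b)     # claimed critical temperature
vc = 3 * b                # claimed critical volume per particle
Pc = a / (27 * b * b)     # claimed critical pressure

def P(v, T):
    # van der Waals equation: P = T/(v - b) - a/v^2.
    return T / (v - b) - a / (v * v)

h = 1e-4
dP = (P(vc + h, Tc) - P(vc - h, Tc)) / (2 * h)
d2P = (P(vc + h, Tc) - 2 * P(vc, Tc) + P(vc - h, Tc)) / (h * h)

print(P(vc, Tc) - Pc)     # ~0: pressure at the critical point matches
print(dP, d2P)            # both ~0: zero slope and an inflection point
```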

And this concludes our very brief introduction to phase transition. Needless to say, we’ve barely scratched the surface of the applications of thermodynamics, but we hope the reader’s interest is sufficiently piqued to seek out other more definitive sources.


Thermodynamics for Mathematicians (III)

Note: calculus, specifically integration, is essential in this article, though students can possibly just substitute it with summation to achieve the same result.

In the previous installment, we defined a temperature scale by looking at the efficiency of a heat engine. However, recall we could also define a temperature scale in the case of an ideal gas via f(P, V, N) = PV/N. How do these two compare?

7. Optional Interlude: Temperature for an Ideal Gas

We prepare a homogeneous ideal gas with parameters (P_1, V_1), constant N, whose ideal gas temperature is T_1 = P_1 V_1 / N. Next we subject it to the following cycle:

  1. Connect the gas to a heat source at temperature T_1. Expand it to (P_2, V_2).
  2. Seal the gas adiabatically and expand it further to (P_3, V_3). Measure the temperature and write it as T_2 = P_3 V_3 / N.
  3. Connect the gas to a heat source at temperature T_2. Contract it to (P_4, V_4).
  4. Seal the gas adiabatically and contract it till we obtain the original parameters (P_1, V_1).

We call the above sequence a Carnot cycle. Recall that two states of an ideal gas at the same temperature satisfy PV = P'V' (since N is constant), and during an adiabatic transformation, we have (P'/P) = (V'/V)^{-(c+1)/c}. If we track the state of the gas throughout the entire transformation, we get the following PV graph:

Next, observe that each of the four steps is reversible since each process is either adiabatic or performed at the same temperature as the heat source (a process which occurs at constant temperature is said to be isothermal). Since the initial and final states of the gas are identical, dU = 0, so the net result is an extraction of heat Q_1 from the hotter source during the first step, and a deposit of heat Q_2 into the colder source during the third step, i.e. it is a reversible heat engine! Let’s calculate Q_2 / Q_1 by writing out the equations relating the state parameters.

  1. P_1 V_1 = P_2 V_2 \implies P_2/P_1 = V_1/V_2 since this is isothermal;
  2. (P_3/P_2) = (V_3/V_2)^{-(c+1)/c} since this is adiabatic;
  3. P_3 V_3 = P_4 V_4 \implies P_4/P_3 = V_3/V_4 since this is isothermal;
  4. (P_1/P_4) = (V_1/V_4)^{-(c+1)/c} since this is adiabatic.

Multiplying the four relations then gives us V_1 V_3 = V_2 V_4, or V_2 / V_1 = V_3 / V_4.

To compute Q_1, since step 1 occurs isothermally, there’s no change in the internal energy c·PV of the gas (reminder: this is an ideal gas; for a non-ideal gas the internal energy can depend on its volume/pressure even if the temperature is kept constant). So we only need to find the amount of work done:

Q_1 = \int_{V_1}^{V_2} P\cdot dV = T_1 N \int_{V_1}^{V_2} \frac 1 V dV = T_1 N \log(V_2/V_1).

Likewise Q_2 = T_2 N \log(V_3/V_4). Hence the ratios do match: Q_1 / Q_2 = T_1 / T_2, so the ideal gas temperature scale is consistent with the thermodynamic temperature up to a scaling factor.
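One can also trace through the Carnot cycle numerically and watch the two ratios agree. In the sketch below we pick c = 3/2 (a monatomic ideal gas) and arbitrary values for T_1, V_1, V_2, V_3, then use the relations above to compute the remaining state parameters:

```python
import math

c = 1.5                       # ideal monatomic gas: U = c*P*V with c = 3/2
gamma = (c + 1) / c
N = 1.0
T1, V1, V2 = 4.0, 1.0, 2.0    # hot isotherm expands from V1 to V2
V3 = 3.0                      # adiabatic expansion from V2 to V3

P2 = T1 * N / V2              # state after the isothermal expansion
P3 = P2 * (V3 / V2) ** (-gamma)   # adiabatic relation (P'/P) = (V'/V)^(-gamma)
T2 = P3 * V3 / N              # ideal-gas temperature after the expansion

V4 = V3 * V1 / V2             # chosen so that V2/V1 = V3/V4 closes the cycle

Q1 = T1 * N * math.log(V2 / V1)   # heat absorbed on the hot isotherm
Q2 = T2 * N * math.log(V3 / V4)   # heat released on the cold isotherm

print(Q1 / Q2, T1 / T2)           # the two ratios coincide
```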

8. Defining Entropy

From here onwards, we shall assume that our temperature is the thermodynamic temperature.

Suppose we have a system which is connected to heat sources T_1, T_2, \dots, T_r sequentially, and takes in heat Q_i from the heat source T_i. Here, Q_i can be negative, in which case the system actually releases heat into the heat source. We shall prove the following.

If the initial and final states of the system are identical, then \sum_{i=1}^r Q_i/T_i \le 0. Equality holds if and only if the entire process is reversible.

[ Conceptual exercise: since the initial and final states of the system are identical, does that mean the internal energy is unchanged? If so, is the net heat \sum_{i=1}^r Q_i extracted equal to zero? ]

Proof. Create a new heat source at some arbitrary temperature T. For i = 1, 2, …, r we connect a reversible heat engine between T and T_i such that

  • the heat engine takes in heat \frac T {T_i}\times Q_i from T;
  • it then releases heat Q_i into T_i;
  • and converts the balance into work.

When this is performed together with the earlier process, we see that the net transfer of heat into each of T_1, T_2, \dots, T_r is zero. On the other hand, the transfer of heat from source T is given by:

\sum_{i=1}^r \frac T {T_i} Q_i = T \times \sum_{i=1}^r \frac {Q_i}{T_i},

and all of it is converted into work. Kelvin Law then stipulates that this value must be non-positive. So the result follows. To prove the second statement, if the sum is zero, then the two processes are mutual reverses of each other; conversely, if the process is reversible then so is the concatenation of the two processes, which is the conversion of work into heat. By Kelvin Law again, the only way this can be reversible is when the amount of work converted is zero. ♦

Let’s consider the continuous variant of the above result.

During a small time step, suppose a system absorbs heat dQ from a heat source at temperature T. If the process is reversible and the initial and final states of the system are identical, then

\int \frac 1 T dQ = 0.

This means for any two states A and B of a system, if we take a reversible process which brings the system from A to B, then the integral \int \frac 1 T dQ along the path is independent of the process we pick, as long as it is reversible. Thus we can define a state parameter S such that S(B) - S(A) = \int \frac 1 T dQ, or in differential format, dS = \frac 1 T {dQ}. This is the entropy of the system, which is well-defined up to an additive constant.

Example 7. We meet our old friend, the ideal gas with state (P, V, N). What is its entropy? For that, we need to find a reversible process which takes the gas from (P_1, V_1, N) to (P_2, V_2, N), assuming N is fixed for now. What better choice than to pick parts of the Carnot cycle?

  • For an isothermal process, S_2 - S_1 = \int \frac 1 T dQ = \frac Q T, where Q is the total amount of heat pumped in. Since the temperature of the system remains constant, so does the internal energy (remember: ideal gas!), and thus all heat is used to do work, i.e. S_2 - S_1 = \int_{V_1}^{V_2} P/T\cdot dV = N \log(V_2/V_1) since P = TN/V. This happens when (P_2/P_1) = (V_2/V_1)^{-1}.
  • For an adiabatic process, S_2 - S_1 = 0 since dQ = 0 throughout (the very definition of adiabatic). This happens when (P_2/P_1) = (V_2/V_1)^{-(c+1)/c}.

 Putting these together, we can take:

 S(P, V, N) = N(c\log P+(c+1)\log(V)) + t(N)

where t(N) is an additive constant dependent on N. We will figure out a more complete expression for this later.

9. More About Entropy

One important consequence is that the entropy of an adiabatically sealed system must never decrease. More generally, suppose there’s a process C which brings a system from state A to state B (i.e. the system is now interacting with its environment). Now we construct a reversible process D which brings the system from state B back to state A. If we perform D followed by C, then we get:

0 \ge \int_{C\cup D} \frac 1 T dQ = \int_C \frac 1 T dQ + \int_D \frac 1 T dQ.

Since D is reversible, the second summand is just S(A) – S(B). So we have

\int_C \frac 1 T dQ \le S(B) - S(A).

In the case of an adiabatically sealed system, the left-hand-side is 0, so S(B) \ge S(A). This gives the more commonly known variant of the second law of thermodynamics: the entropy of an adiabatically sealed system tends to increase. However, we now see that this is actually a consequence – in fact, without Clausius or Kelvin Law, we wouldn’t be able to define a total ordering on the set of temperatures, let alone talk about heat engines or define entropy.

Example 8. This is the quintessential example of an increase in entropy of an isolated system. Suppose heat Q flows from an area of higher temperature T_1 to an area of lower temperature T_2. The hotter area thus loses entropy Q/T_1 while the colder area gains entropy Q/T_2. Thus, if we only look at the union of these two heat sources, the change in entropy is Q(T_2^{-1} - T_1^{-1}) > 0.
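In numbers (illustrative values, not tied to any physical units):

```python
T1, T2, Q = 400.0, 300.0, 10.0   # hot source, cold source, heat transferred
dS = Q / T2 - Q / T1             # entropy gained by the cold area minus
                                 # entropy lost by the hot area
print(dS)                        # positive, as the second law demands
```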

Finally, the Third Law of Thermodynamics states that in general, the entropy of a system is non-negative and as T \to 0, the entropy of the system also tends to 0. In particular, the third law enables us to find the “additive constant” for the formula of entropy.


Thermodynamics for Mathematicians (II)

We continue our discussion of thermodynamics. Warning: some of the examples will involve calculus and a bit of (ordinary) differential equations.

3. Work Done by/on the System

A system can do work, e.g. when a gas expands and pushes against a piston, the work done can be used to power an engine. When energy is supplied to a system in the form of heat, it's found experimentally that not all of the heat is converted to work by the system. One then postulates that the system also absorbs a certain amount of the energy.

Specifically, one postulates that there is a state parameter U of every system, which tells us the amount of internal energy in it. The first law then follows from the law of conservation of energy.

First Law of Thermodynamics. Consider a small infinitesimal time step. Let dU be the change in internal energy in the system, dW be the work done by the system and dQ be the amount of energy supplied to the system in the form of heat. Then dQ = dW + dU.

[ Note that we don’t really care about the absolute internal energy of a system since the first law really deals with the change in energy. This is a recurring theme in classical physics, where the relative change in energy is relevant while the absolute energy is only well-defined up to an additive constant. ]

One must be cautious about the differentials dQ, dW and dU: specifically, dU is an exact differential while dQ and dW are inexact differentials. What do we mean by that? It means that over a longer time step, if we only have access to the system at the initial and final states, we can measure the change in U (since U is a state variable), but there's no way to determine the work done, as the following example illustrates.

Example 5. Consider the case of a homogeneous gas (P, V, N) with fixed N. By the kinetic theory of gases, one deduces that the small amount of work done is given by dW = P·dV. Hence, over time, the total amount of work done is the integral W = \int_{V}^{V'} P\cdot dV. If we only know the initial and final states (P, V, N) and (P', V', N), we can't evaluate the integral without knowing the entire path.

Hence for a homogeneous gas, we can write the First Law as: dQ = dU + P\cdot dV, which is the version normally taught in JCs (in Singapore).

Example 6. In the case of an ideal gas, kinetic theory of gases tells us that U = c·PV for some constant c. Hence, we have:

 dQ = dU + P\cdot dV = c (P\cdot dV + V \cdot dP) + P\cdot dV = (c+1) P\cdot dV + c V \cdot dP.

In the case of an adiabatic setting, we have dQ = 0 which gives us a differential equation for the adiabatic expansion / compression of an ideal gas: P^{-1} dP = -\frac{c+1} c V^{-1} dV. This is not hard to solve:

\int_P^{P'} P^{-1} dP = -\frac{c+1}c\int_V^{V'} V^{-1} dV \implies \left(\frac{P'}{P}\right) = \left(\frac{V'}{V}\right)^{-(c+1)/c}.

This gives the equation for an adiabatic transformation of an ideal gas.
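The closed form can be double-checked by integrating the differential equation dP/dV = −((c+1)/c)·P/V numerically (a simple explicit Euler scheme; the value c = 3/2 below is illustrative):

```python
c = 1.5                      # U = c*P*V; adiabatic exponent gamma = (c+1)/c
gamma = (c + 1) / c

# Explicit Euler integration of dP/dV = -gamma * P / V from (P, V) = (1, 1)
# up to V' = 2, compared against the closed form P' = P * (V'/V)^(-gamma).
P, V = 1.0, 1.0
steps = 100_000
dV = (2.0 - V) / steps
for _ in range(steps):
    P += -gamma * P / V * dV
    V += dV

print(P, 2.0 ** (-gamma))    # numerical result vs closed form
```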

4. No Free Lunch

Recall our definition of temperature as an equivalence class under thermal equilibrium. In fact, a moment of thought reveals that two diabatically separated systems A and B behave in exactly one of the following ways:

  • there is no net transfer of heat between A and B (i.e. A and B are in T.E.);
  • there is net transfer of heat from A to B;
  • there is net transfer of heat from B to A.

For the 2nd case we shall say that A is hotter than B, or that B is colder than A. Now assume A is hotter than B, which is in turn hotter than C. What happens when we put A and C together (separated by a diabatic wall, of course)? In the cases where A and C are in T.E. or C is hotter than A, by connecting the three systems in a loop, we get a never-ending flow of energy from A to B to C and back to A!

The Second Law of Thermodynamics says this is impossible, so we must have A hotter than C. In other words, the set of temperatures forms a totally ordered set!

There’re two variants of this law.

Clausius Law: there is no process whose only net result is the transfer of heat from a body of lower temperature to a body of higher temperature.

Kelvin Law: there is no process whose only net result is the absorption of heat from a source and 100% conversion to (positive) work.

How does one prove the equivalence of Clausius Law and Kelvin Law? For that, we’ll need to rely on some physical intuition. E.g. suppose Kelvin Law fails; then there is a method to extract heat from a source to convert it to 100% work, but we can convert the work to friction to heat up any source, even a colder one, and so Clausius Law fails too. Conversely, suppose Clausius Law fails. Then we can form a never-ending cycle of energy flow as mentioned above and tap the energy to give work, thereby violating Kelvin Law.

5. Heat Engines

Let's say we now have two huge heat sources at temperatures T1 and T2 respectively, where T1 > T2. This just means T1 is hotter than T2, i.e. there's a net flow of heat from the first to the second heat source; it does not mean we've assigned a real-valued temperature scale. In fact, the very purpose of this section is to do just that: define a temperature scale in terms of non-negative real numbers.

Definition. A heat engine is a process whose net result is to extract heat (energy) Q1 from the first source (at temperature T1), deposit heat Q2 to the second source (at temperature T2), and convert the balance to positive work.

We’ll assume that the heat sources are so huge that the heat engine has little effect on their temperatures.

The value \eta = 1 - Q_2/Q_1 is called the efficiency of the heat engine. To motivate this definition, imagine a mechanical engine which taps into a hot pool of gas, performs work, and loses the remaining energy to the environment (at temperature T2). The work done by the engine is useful since it can be converted to other forms of mechanical work, while the energy lost to the environment is wasted.

The key result we are going to prove is the following. Suppose we connect heat engines E and F to the heat sources T1 and T2 in turn. The first heat engine extracts heat Q1 from the first source and deposits Q2 to the second. The second heat engine extracts heat Q3 from the first source and deposits heat Q4 to the second.

Then the following result holds.

Key result. If E is reversible, then Q_2/Q_1 \le Q_4/Q_3. Equality holds if and only if F is also reversible. [ In particular, this implies that the efficiency of a reversible heat engine is as good as it gets. ]

Proof. Let’s prove Q_3/Q_1 \le Q_4/Q_2 which is equivalent. Approximate Q_4/Q_2 by a rational number M/N, where M and N are positive integers. Thus it suffices to show M/N \ge Q_3/Q_1, or NQ_3 - MQ_1 \le 0. Let’s perform some iterations of the above heat engines: N times of F, followed by M times of the reverse of E.

What’s the net outcome of this exercise? The heat deposited in the cold source is N Q_4 - M Q_2 = 0 while the heat extracted from the hot source is N Q_3 - M Q_1, all of which is converted to work! By Kelvin Law, the work done cannot be positive, so we have N Q_3 - M Q_1 \le 0 as desired.

To prove the second part, if F is reversible, then we can swap E and F to obtain the reverse inequality. Conversely, if equality holds, then M cycles of F form the reverse of N cycles of E so E is reversible. ♦


6. Definition of Temperature

In particular, the efficiency of a reversible heat engine depends only on T1 and T2. So let us write f(T_1, T_2) = Q_1/Q_2, which is the reciprocal of the rate of wastage. It follows that if T_1 > T_2 > T_3, then from the following diagram:

the overall net result is an extraction of heat Q_1 from the temperature source T_1 and a deposit of heat Q_3 to the temperature source T_3, with the difference converted to work. But this is precisely the work of a heat engine for T_1 > T_3. [ Note: one subtle assumption is that we can tune F to take in an exact amount of heat Q_2 from source T_2. ]  So, all in all, we have shown:

f(T_1, T_2) f(T_2, T_3) = f(T_1, T_3), when T_1 > T_2 > T_3.

Next, we’ve only defined f(T_1, T_2) for T_1 > T_2. More generally, we shall define f(T_1, T_2) as 1 if T_1 = T_2 and f(T_2, T_1)^{-1} if T_1 < T_2. Under this new definition, we see that for any temperatures T_1, T_2, the reversible heat engine will extract heat Q_1 from the source T_1 and deposit heat Q_2 to the source T_2 and convert the balance to work, where the work done can be negative!

For any three temperatures T_1, T_2, T_3, we now have (regardless of the temperature ordering):

  • f(T_1, T_1) = 1;
  • f(T_1, T_2) = f(T_2, T_1)^{-1};
  • f(T_1, T_2) f(T_2, T_3) = f(T_1, T_3).

Finally, let us fix some base temperature T_0 and define g(T) := f(T, T_0). The above 3 relations then tell us f(T_1, T_2) = g(T_1)/g(T_2) for any two temperatures. So the only function that really matters for reversible heat engines is a single parameter g. We shall define this as our temperature scale.
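As a sanity check, one can verify these relations numerically. The sketch below assumes (for illustration only, not something we have derived yet) the eventual thermodynamic result f(T_1, T_2) = T_1/T_2, and confirms the three relations and the factorisation f(T_1, T_2) = g(T_1)/g(T_2).

```python
import itertools
import math

# Assumed form of f for illustration: the eventual thermodynamic scale.
def f(T1, T2):
    return T1 / T2

T0 = 273.15  # arbitrary base temperature

def g(T):
    return f(T, T0)

temps = [100.0, 273.15, 300.0, 500.0]
for T1, T2, T3 in itertools.product(temps, repeat=3):
    assert math.isclose(f(T1, T1), 1.0)                 # relation 1
    assert math.isclose(f(T1, T2), 1 / f(T2, T1))       # relation 2
    assert math.isclose(f(T1, T2) * f(T2, T3), f(T1, T3))  # relation 3
    assert math.isclose(f(T1, T2), g(T1) / g(T2))       # factorisation
print("all relations hold")
```

Any other choice of base temperature T_0 would simply rescale g by a constant, as noted in the definition below.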

Definition. For any temperature T, we shall associate to it a real number g(T), which gives us a temperature scale. This is called the thermodynamic temperature of the system. Note that if we had chosen some other base temperature T_0, we would have obtained a constant multiple of this scale.

 (To be continued…)


Thermodynamics for Mathematicians (I)

“Every mathematician knows it is impossible to understand an elementary course in thermodynamics.” – V. I. Arnold.

The following is the culmination of at least three attempts to understand thermodynamics at various times during the last two years. Why’s it so hard, you might wonder? In hindsight, I’m not entirely sure either, but I do know that thermodynamics has broken the record for my number of iterations of the confusion-clarification cycle which one must inevitably encounter when learning a new topic.

You might also wonder why there’s a post about physics in this blog. Not only is this non-Olympiad, but it doesn’t even appear related to mathematics. However, classical thermodynamics is based on a deductive system which is particularly appealing and satisfying to a mathematician’s sensibilities. One can think of the four (not three!) laws of thermodynamics as the axioms, and obtain the theory via a mixture of deductions and physical intuition. The level of rigour of course is not as airtight as that of a mathematical theory, but coupled with some basic physical assumptions it should suffice.

Finally, I must confess I’m a bit jittery about writing a post on physics here, since there will no doubt be errors in the post. For this, I beg the readers to be patient, and any corrections will be appreciated. If you wish for a proper treatment of thermodynamics which is similar in spirit to what I have here, there are:

  • Enrico Fermi, “Thermodynamics”, Dover Publications, 1936.
  • A. B. Pippard, “The Elements of Classical Thermodynamics”, Cambridge University Press, 1966.

1. Basic Concepts

Ok let’s begin with the definitions. To fix ideas, we will assume that our system comprises one or more types of liquids/gases. The system is said to be in equilibrium if there is no discernible macroscopic change over time, although on a microscopic level things are certainly far from tranquil. We will assume all systems to be in equilibrium unless otherwise stated. When a system is in equilibrium, we can then define the state of the system by a list of values, known as state variables, which can be measured.

Example 1. If the system comprises a single homogeneous gas, then we may measure the volume (V), the pressure (P) and the number of particles (N). These are the only variables we consider for now. What about the temperature, you might ask? Problem is, we don’t know how to define the temperature. Indeed, both volume and pressure make sense from the theory of classical mechanics, but not temperature. In fact, part of the purpose of thermodynamics is to introduce, define and develop the concept of temperature.

Example 2. If the system comprises two gases separated by a wall, then we need to provide the parameters for each gas, i.e. (P_1, V_1, N_1) and (P_2, V_2, N_2).

Now, a stand-alone system may exchange energy with the environment in the form of heat. A system is said to be adiabatically sealed if no such exchange takes place. A wall separating two systems is said to be diabatic if heat exchange can take place – however, the presence of the wall means material exchange does not occur. In short: diabatic = possible heat exchange; adiabatic = no heat exchange. In both cases, particles don’t cross the wall.

The next definition is extremely important.

Given two systems, we say they are in thermal equilibrium (T.E.) if, when we separate them by a diabatic wall (and seal the whole union adiabatically), no net exchange of heat takes place.

Clearly, the concept of T.E. satisfies the following:

  • reflexive (every system is in T.E. with itself, by sheer definition);
  • symmetric (if system A is in T.E. with system B, then so is system B with system A).

The Zero-th Law of Thermodynamics then states that T.E. is transitive, which means it is an equivalence relation!

Zero-th Law of Thermodynamics. If system A is in T.E. with system B, and system B is in T.E. with system C, then system A is in T.E. with system C.

Now we’re ready to define temperature. We shall define the abstract temperature to be the equivalence class of all systems, under thermal equilibrium. [ For those who don’t know equivalence classes, we take all systems at T.E. with each other and assign a unique label to each one, calling it its temperature. ]  Unfortunately, that’s an abstract definition, and we can’t make head or tail out of it in practice, let alone perform computations.

Example 3. Consider two separate homogeneous gases, parametrised by (P, V, N) and (P’, V’, N’) as in example 1 above (adiabatically sealed of course). If we bring them together, there will be a real-valued function f(P, V, N) such that the two gases are in T.E. if and only if:

f(P, V, N) = f(P’, V’, N’).

In the case of an ideal gas, the kinetic theory of gases (which we shan’t go into here) gives us Boyle’s law, so we can take f(P, V, N) = PV/N. A more realistic model, however, is the van der Waals equation f(P,V,N) = (P + \frac a {v^2})(v - b), where v = V/N. Notice that the ideal gas equation is the limiting case of the van der Waals equation where a = b = 0. [ The van der Waals equation assumes that particles have positive size and exert mutually attractive / repulsive forces, hence the difference. But we digress. ]
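To make this concrete, here is a small sketch computing both candidate functions f. The van der Waals constants a and b below are hypothetical values, purely for illustration; setting a = b = 0 recovers the ideal-gas value exactly.

```python
# Two candidate equilibrium functions f(P, V, N).
def f_ideal(P, V, N):
    return P * V / N

def f_vdw(P, V, N, a, b):
    # a, b are hypothetical van der Waals constants, for illustration only
    v = V / N                       # volume per particle
    return (P + a / v**2) * (v - b)

P, V, N = 2.0, 10.0, 5.0
print(f_ideal(P, V, N))             # 4.0
print(f_vdw(P, V, N, a=0.1, b=0.05))

# The ideal gas is the limiting case a = b = 0:
assert f_vdw(P, V, N, 0.0, 0.0) == f_ideal(P, V, N)
```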

Anyway, in the case of an ideal gas, we could take the temperature to be PV/N. But let’s not do that, since it requires us to peg the temperature to a fixed type of system which does not even exist in reality, namely that of a homogeneous ideal gas.


2. When the State Changes

A system may not remain at the same state forever. Hence, we will look at processes whereby a system may change its state from A to B (e.g. when a gas is compressed). At each time step during the process, we may take instant snapshots and consider the state of the system. This forms a curve in a multi-dimensional space, where the number of dimensions is precisely the number of state variables.

But there’s a problem: notice we only defined states for systems at equilibrium, whereas if a system changes its state too quickly, the snapshots may not be at equilibrium. Hence, we will assume the ideal case of quasi-static processes, i.e. processes which occur so slowly that the instant snapshots that we take are always in equilibrium. From now onwards, we will assume all processes are quasi-static unless otherwise stated. Thus it makes sense to talk about the path of the system in the state space.

Example 4. In the case of a homogeneous gas sealed within an elastic chamber, the number of particles N remains unchanged, so a quasi-static process is given by a path on the PV plane.

The next definition is key to the whole theory of thermodynamics.

A process is said to be a reversible cycle if the states of the entire universe at the initial and final points are identical. More generally, a process is said to be reversible if it is part of a reversible cycle.

 Important! A reversible process always has to be considered from a global scale. Later on, we will talk about a certain system undergoing a reversible process. By that, we mean that it is possible to reverse the entire process on a global scale, i.e. it is possible to bring the whole universe back to the initial state. In particular, if we have a homogeneous gas and we only look at the path on the PV plane, we cannot tell if the process is reversible without knowing what the exact process is!

We repeat: reversible processes must be global.

Or, instead of looking at the whole universe, we could just seal a system adiabatically and look at it in isolation. Since we’re only looking at heat transfer, this cuts off all interactions between a system and its environment; specifically, we could imagine the process taking place inside the system while the rest of the universe stands still. It wouldn’t make an iota of difference to the system anyway because it’s adiabatically sealed from it.

Of course, if all processes are reversible, this definition would be vacuous. The whole point of the Second Law of Thermodynamics is that this is not true. One usually speaks of two variants – Clausius Law and Kelvin Law – but to make sense of Clausius Law, one needs to consider the work done by the system. And for that, one requires the First Law….

(To be continued…)


Power Series and Generating Functions (II): Formal Power Series

For today’s post, we shall first talk about the Catalan numbers. The question we seek to answer is the following.

Let n be a positive integer. Given a non-associative operation *, find the number of ways to bracket the expression x_1 * x_2 * x_3 * \dots * x_n while preserving the order of the variables. For example, if n = 4, we have 5 possible expressions:

w*(x*(y*z)),  w*((x*y)*z),  (w*x)*(y*z),  (w*(x*y))*z,  ((w*x)*y)*z.

Write C_{n-1} for the number of such expressions; the slightly shifted index is so that we get a nice expression in C_n. Using the technique of divide-and-conquer, it is not hard to obtain a recurrence relation in the sequence. Indeed, if we pick an expression with n+1 variables, it must be of the form: (expr1) * (expr2), where the two expressions have (say) i and n+1-i variables respectively, where i = 1, 2, …, n. Hence we have:

 C_n = C_0 C_{n-1} + C_1 C_{n-2} + \dots + C_{n-1} C_0,  C_0 = 1. (*)

This enables us to easily compute C_n up to pretty large values of n, so in practice it’s already quite good. But in this case, we can go even further by obtaining an explicit formula for C_n. Indeed, let P(x) = C_0 + C_1 x + C_2 x^2 + \dots . Then the coefficient of x^{n-1} in P(x)^2 is precisely the expression on the right of (*). Hence we have:

x\cdot P(x)^2 + 1 = P(x) \implies P(x) = \frac{1 \pm \sqrt{1 - 4x}} {2x}.

Now we apply the generalised binomial expansion on \sqrt{1 - 4x} = (1-4x)^{1/2}. For n > 1, the coefficient of x^n in its Taylor expansion is:

\frac {(-4)^n} {n!} \left(\frac 1 2 \cdot \left(-\frac 1 2\right) \cdot \dots \cdot \left(\frac 1 2 - (n-1)\right)\right) = -\frac{2^n \cdot 1 \cdot 3 \cdot \dots \cdot (2n-3)}{n!}.

Multiplying the numerator and denominator by (n-1)!, a bit of algebraic manipulation gives us the final answer: -\frac 2 n {2n-2 \choose n-1}. Since the C_n are all positive, we have:

C_n = \frac 1 {n+1} {2n \choose n}.

Note: the above computations all hold since the power series has a positive radius of convergence, and the generalised binomial theorem holds for all |x| < 1:

(1 + x)^r = 1 + rx + \frac{r(r-1)}{2!}x^2 + \frac{r(r-1)(r-2)}{3!}x^3 + \dots
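As a quick computational check of the derivation, the recurrence (*) and the closed formula C_n = \frac 1 {n+1} {2n \choose n} agree (a small Python sketch):

```python
from math import comb

def catalan_recurrence(n_max):
    # C_n = C_0 C_{n-1} + C_1 C_{n-2} + ... + C_{n-1} C_0, with C_0 = 1  -- (*)
    C = [1]
    for n in range(1, n_max + 1):
        C.append(sum(C[i] * C[n - 1 - i] for i in range(n)))
    return C

C = catalan_recurrence(15)
for n, c in enumerate(C):
    assert c == comb(2 * n, n) // (n + 1)   # closed formula

print(C[:6])   # [1, 1, 2, 5, 14, 42] -- C_3 = 5 matches the bracketings above
```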

More Appearances of Catalan Numbers.

1. Consider an n \times n grid. We wish to count the number of paths from the southwest corner to the northeast along the edges of the grid which are (i) shortest, and (ii) remain below the line joining the southwest and northeast corner. E.g. when n=3, we have 5 possible paths:

The number of such paths is precisely C_n.

2. Fix a convex (n+2)-gon on a plane. The number of triangulations of the polygon is exactly C_n. E.g. for n=3, we have 5 possibilities.

Formal Power Series

[ Important note: only the main definitions and properties are listed here, hopefully enough to give the reader a gist of the underlying theory. Loads of details have been omitted as including them would have spanned about ten pages or so. ]

As mentioned previously, we will also look at the case where we have power series of the form:

P(x) = a_0 + a_1 x + a_2 x^2 + \dots

without explicitly substituting values in x. For instance, if a_n = n^n, then the series doesn’t converge except at x = 0, but it still makes sense to talk about the series purely in notational form. We shall call such a power series a formal power series.

Definition. A formal power series is an expression of the form f(x) = a_0 + a_1 x + a_2 x^2 + \dots, where each a_i is a real (or complex) number.

At the risk of appearing repetitive, we re-iterate that f(x) is merely a notation and we have no intention of letting x take any value. In fact, we could have just as easily expressed it as an infinite-tuple: f = (a_0, a_1, a_2, \dots) without any loss of data. However, the power series notation appeals more to our intuition due to the definitions which are about to follow.

Definitions. Let f(x) = a_0 + a_1 x + a_2 x^2 + \dots, g(x) = b_0 + b_1 x + b_2 x^2 + \dots be any two formal power series. Then their sum and product are power series defined respectively as follows:

s(x) = c_0 + c_1 x + c_2 x^2 + \dots, p(x) = d_0 + d_1 x + d_2 x^2 + \dots,

where c_n = a_n + b_n and d_n = a_n b_0 + a_{n-1} b_1 + \dots + a_0 b_n.

Let’s go back to the example of f(x) = 1^1 x + 2^2 x^2 + 3^3 x^3 + \dots which fails to converge for any non-zero x. Despite this, we can consider f(x) as a formal power series and take its square: f(x)^2 = x^2 + 8x^3 + 70x^4 + \dots. Note that we can calculate any coefficient as we please, but there doesn’t seem to be a nice closed formula for the general coefficient.
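These definitions are easy to implement: represent a formal power series by its list of coefficients, truncated at some degree. The sketch below squares the divergent series above and recovers the coefficients 1, 8, 70 for x^2, x^3, x^4.

```python
def ps_add(f, g):
    # coefficient-wise sum: c_n = a_n + b_n
    return [a + b for a, b in zip(f, g)]

def ps_mul(f, g):
    # Cauchy product: d_n = a_n b_0 + a_{n-1} b_1 + ... + a_0 b_n
    N = min(len(f), len(g))
    return [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(N)]

# f(x) = 1^1 x + 2^2 x^2 + 3^3 x^3 + ...  (truncated at degree 7)
f = [0] + [n**n for n in range(1, 8)]
sq = ps_mul(f, f)
print(sq[:5])   # [0, 0, 1, 8, 70]
```

Convergence never enters the picture: each coefficient of the product is a finite sum.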

It’s not hard to show that the standard arithmetical properties hold under sum and product: for any formal power series f, g and h, the following hold:

  • commutativity: f+g = g+f, f \times g = g\times f;
  • associativity: (f+g)+h = f+(g+h), (f\times g)\times h = f \times(g\times h);
  • distributivity: f\times(g + h) = (f\times g) + (f\times h);
  • identity: the ‘0’ polynomial satisfies 0 + f = f, the ‘1’ polynomial satisfies 1 \times f = f;
  • no non-zero zero divisors: (#) if f \times g = 0, then f or g is zero (this will prove to be important later).

Also, addition has an inverse: namely, for any formal power series f, there is a unique power series g such that f+g = 0 (the zero-polynomial). We will denote this g by (-f). It’s easy to see that each coefficient of g is the negation of the corresponding coefficient of f. Subtraction of formal power series is then defined by f - g \equiv f + (-g).

To recap, we have just defined addition, multiplication and subtraction on the set of all power series. [ For those who know abstract algebra, we’ve just defined a ring. ]

The next question is of interest:

When is there a multiplicative inverse of f(x)? In other words, for which f can we find a g for which f \times g = 1?

The answer may surprise you: a formal power series has a multiplicative inverse if and only if its constant term is non-zero. We’ll quickly sketch the proof: in one direction, this is easy: if f \times g = 1 then the constant terms of f and g must multiply to 1, so they are non-zero.

Conversely, suppose f(x) = a_0 + a_1 x + a_2 x^2 + \dots, where a_0 \ne 0. If g(x) = b_0 + b_1 x + b_2 x^2 + \dots is its inverse, what would it look like? For starters, clearly a_0 b_0 = 1 \implies b_0 = a_0^{-1}. Now suppose we have already found b_0, b_1, \dots, b_n; in order to compute the next coefficient, we need a_0 b_{n+1} + a_1 b_n + \dots + a_{n+1} b_0 = 0, which gives:

b_{n+1} = -\frac 1 {a_0} (a_1 b_n + a_2 b_{n-1} + \dots + a_{n+1} b_0).

Hence, we can define the b_n‘s inductively from b_0 onwards. It follows quite easily from our definition that the resulting power series g satisfies f(x) g(x) = 1.
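The inductive recipe above translates directly into code. A short sketch (using Fractions to keep the arithmetic exact) inverts 1 - x and checks that the product with the result is 1 up to the truncation degree:

```python
from fractions import Fraction

def ps_inverse(f, N):
    # Invert a truncated formal power series (constant term must be non-zero),
    # via b_{n+1} = -(1/a_0)(a_1 b_n + a_2 b_{n-1} + ... + a_{n+1} b_0).
    a = [Fraction(c) for c in f] + [Fraction(0)] * N
    assert a[0] != 0, "no inverse: constant term is zero"
    b = [1 / a[0]]
    for n in range(1, N):
        b.append(-sum(a[i] * b[n - i] for i in range(1, n + 1)) / a[0])
    return b

g = ps_inverse([1, -1], 8)   # inverse of 1 - x
print(g)                     # all coefficients equal 1: the geometric series

# check f * g = 1 up to degree 7
f = [Fraction(1), Fraction(-1)] + [Fraction(0)] * 6
prod = [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(8)]
assert prod == [1, 0, 0, 0, 0, 0, 0, 0]
```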

Ok, we’re done with the definition of arithmetic operations on formal power series. Our next definition is the derivative.

Definition. The formal derivative of a formal power series f(x) = a_0 + a_1 x + a_2 x^2 + \dots is defined to be the formal power series:

\frac{d}{dx} f(x) \equiv a_1 + 2a_2 x + 3a_3 x^2 + \dots.

The relationship between the arithmetic operations and the formal derivative is precisely what we expect: if f and g are formal power series, then

  • \frac{d}{dx} (f+g) = \frac {d}{dx} f + \frac {d}{dx} g;
  • \frac{d}{dx} (f\times g) = f \times \frac{d}{dx} g + \frac{d}{dx} f \times g;
  • if f is invertible, then \frac{d}{dx}(f^{-1}) = - \frac{d}{dx}f \times (f^{-1})^2 (exercise: prove it, try to make use of the first two properties and avoid dealing with the explicit coefficients).

Finally, the following definition is quite important.

Let f(x) and g(x) be formal power series, where g has constant term zero and f(x) = a_0 + a_1 x + a_2 x^2 + \dots. The composition f(g(x)) is defined to be:

f(g(x)) = (f\circ g)(x) = a_0 + a_1 g(x) + a_2 g(x)^2 + \dots.

Note that to calculate the coefficient of x^n in f\circ g, one only needs the first n+1 terms a_0 + a_1 g(x) + \dots + a_n g(x)^n.
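Composition is also mechanical on truncated coefficient lists: accumulate a_k \cdot g(x)^k term by term. As a check, composing the geometric series 1 + x + x^2 + \dots with g(x) = 2x gives coefficients 2^n, i.e. the series of 1/(1-2x):

```python
def ps_mul(f, g):
    # Cauchy product of truncated coefficient lists
    N = min(len(f), len(g))
    return [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(N)]

def ps_compose(f, g, N):
    # f(g(x)) truncated at degree N-1; g must have zero constant term.
    assert g[0] == 0, "composition needs g(0) = 0"
    f = f[:N] + [0] * (N - len(f))
    g = g[:N] + [0] * (N - len(g))
    result = [0] * N
    g_pow = [1] + [0] * (N - 1)           # g(x)^0
    for k in range(N):                    # terms beyond a_{N-1} g^{N-1} don't matter
        result = [r + f[k] * c for r, c in zip(result, g_pow)]
        g_pow = ps_mul(g_pow, g)
    return result

geom = [1] * 8                            # 1 + x + x^2 + ...
print(ps_compose(geom, [0, 2], 8))        # [1, 2, 4, 8, 16, 32, 64, 128]
```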

Quick exercise : why do we require g to have no constant term? The important properties for power series composition are:

  • (f+g)\circ h = (f\circ h) + (g\circ h);
  • (f\times g)\circ h = (f\circ h) \times (g\circ h);
  • (f\circ g)\circ h = f\circ (g\circ h);
  • \frac d {dx} (f\circ g) = \frac{df}{dx}\circ g \times \frac{dg}{dx} (chain law of derivative).

Now, armed with the theory of formal power series, we can define the usual power series formally without worrying about convergence:

  • \sin(x) \equiv x - \frac {x^3} {3!} + \frac {x^5} {5!} - \frac{x^7}{7!} + \dots;
  • \cos(x) \equiv 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots;
  • \exp(x) \equiv 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots;
  • \log(1+x) \equiv x - \frac{x^2} 2 + \frac{x^3} 3 - \frac{x^4} 4 + \dots;
  • for real r, (1+x)^r \equiv 1 + rx + \frac{r(r-1)}{2!}x^2 + \frac{r(r-1)(r-2)}{3!}x^3 + \dots.

Upon this definition, we find that our usual trigonometric / exponential identities hold. So our intuition carries over:

Example 1 : for any real r and s,  \exp(rx) \times \exp(sx) = \exp((r+s)x).

Sketch of proof : First, show that any power series satisfying \frac{df}{dx} = f is a constant multiple of exp(x) (left as an exercise; hint: find a recurrence relation among the coefficients of f). Now let f(x) = \exp(rx) \exp(sx) - \exp((r+s)x). Then its derivative satisfies, by the sum and product formulae:

\frac{df}{dx} = r \exp(rx)\exp(sx) + s \exp(rx)\exp(sx) - (r+s)\exp((r+s)x) = (r+s) f(x).

Hence, if r+s \ne 0, then g(x) = f((r+s)^{-1}x) satisfies \frac {dg}{dx} = g, so g is a multiple of exp(x). Since g has no constant term, it must be 0, and hence f = 0. (If r+s = 0, then \frac{df}{dx} = 0, so f equals its constant term, which is 0.) ♦

Example 2 : \sin(2x) = 2 \sin(x) \cos(x).

Sketch of proof : (using complex numbers). Write \cos(x) = \frac 1 2 (\exp(ix) + \exp(-ix)) and \sin(x) = \frac 1 {2i} (\exp(ix) - \exp(-ix)), where i = \sqrt{-1}. Use example 1. ♦
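One can also verify such identities directly on the coefficients, without complex numbers. The sketch below compares the first few coefficients of sin(2x) with those of 2 sin(x) cos(x), using exact rational arithmetic:

```python
from fractions import Fraction
from math import factorial

N = 12  # compare coefficients up to degree N-1

def ps_mul(f, g):
    # Cauchy product of truncated coefficient lists of length N
    return [sum(f[i] * g[n - i] for i in range(n + 1)) for n in range(N)]

# formal power series coefficients of sin and cos
sin_c = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 1 else Fraction(0)
         for n in range(N)]
cos_c = [Fraction((-1) ** (n // 2), factorial(n)) if n % 2 == 0 else Fraction(0)
         for n in range(N)]

# sin(2x): substituting 2x scales the n-th coefficient by 2^n
sin_2x = [c * 2 ** n for n, c in enumerate(sin_c)]

rhs = [2 * c for c in ps_mul(sin_c, cos_c)]
assert sin_2x == rhs
print("sin(2x) = 2 sin(x) cos(x) up to degree", N - 1)
```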

Example 3 : for any real r and s, (1+x)^r \times (1+x)^s = (1+x)^{r+s}.

Sketch of proof : let f(x) = (1+x)^r \times (1+x)^s - (1+x)^{r+s}. It’s easy to show that any power series satisfying (1+x)\frac{df}{dx} = r f is a constant multiple of (1+x)^r (left as an exercise). Now f satisfies:

(1+x)\frac{df}{dx} = r(1+x)^r \times (1+x)^s + (1+x)^r \times s(1+x)^s - (r+s)(1+x)^{r+s} = (r+s) f,

so f is a constant multiple of (1+x)^{r+s}. Since the constant term of f is zero, we have f = 0.

Exercise: prove that composition gives \exp(\log(1+x)) = 1+x.

Having established all this background, one can now work out the arguments for the derivation of Catalan numbers by looking at formal power series alone. However, there’s one slight caveat we need to address: we found the quadratic equation x\cdot P(x)^2 + 1 = P(x) satisfied by P and concluded that P can be obtained by the binomial expansion. While it’s true that the resulting binomial expansion does give Q such that x\cdot Q(x)^2 + 1 = Q(x), how do we conclude that P = \pm Q?

Answer: upon completing the squares, we find that this boils down to the following deduction: R(x)^2 = S(x)^2 \implies R(x) = \pm S(x). In order to derive this, we factor R(x)^2 - S(x)^2 = (R(x) + S(x))(R(x) - S(x)). The fact that there are no non-zero zero divisors (#) then says R+S or R-S is zero.

Exercises

1. Obtain Binet’s formula for the Fibonacci sequence F_0 = 0, F_1 = 1, F_{n+1} = F_n + F_{n-1} for n \ge 1 as follows.

  • Define the formal power series f(x) = F_0 + F_1 x + F_2 x^2 + \dots.
  • Prove that we have (1 - x - x^2) f(x) = x, as an equality of formal power series. Hence we get f(x) = x (1 - x - x^2)^{-1}.
  • Use partial fractions to write \frac x {1-x-x^2} = \frac a {1-rx} + \frac b {1-sx} for some real constants a, b, r, s.
  • As power series, we thus obtain f(x) = a(1 + rx + r^2 x^2 + \dots) + b(1 + sx + s^2 x^2 + \dots). Obtain Binet’s formula.

2. Let f be the formal power series for the sequence a_0, a_1, \dots.

  • Prove that the formal power series for the partial sums s_n = a_0 + a_1 + \dots + a_n is given by f(x)(1-x)^{-1}.
  • What is the formal power series for the sequence s_n = a_1 + 2a_2 + \dots + na_n, s_0 = 0?
  • Express F_1^2 + 2F_2^2 + \dots + nF_n^2 in closed form.

Power Series and Generating Functions (I): Basics

[ Background required: basic combinatorics, including combinations and permutations. Thus, you should know the formulae {n\choose r} = C^n_r = \frac{n!}{r!(n-r)!} and P^n_r = \frac{n!}{(n-r)!} and what they mean. Also, some examples / problems may require calculus. ]

Note: this post is still highly relevant to competition-mathematics. 🙂

To begin, for a given finite sequence a_0, a_1, \dots, a_n, one can write the polynomial P_a(x) = a_0 + a_1 x + \dots + a_n x^n in x. We shall call this polynomial the generating function for the sequence. This is useful when one needs to evaluate certain identities based on the a_i‘s.

For example, for a fixed positive integer n, consider the sequence:

a_0 = {n\choose 0}, a_1 = {n\choose 1}, \dots, a_n = {n\choose n} \implies P_a(x) = \sum_{i=0}^n {n\choose i} x^i = (1+x)^n.

The fact that there’s a nice closed form for the generating function allows us to evaluate a large number of identities.

Example 1. Let x = 1/2. We obtain the equality:

{n\choose 0} + \frac 1 2 {n\choose 1} + \frac 1 {2^2} {n\choose 2} + \dots + \frac 1 {2^n} {n\choose n} = (\frac 3 2)^n.

Example 2. We can differentiate the polynomial identity (with respect to x) to obtain:

1 {n\choose 1} + 2 {n\choose 2}x + 3 {n\choose 3} x^2 + \dots + n {n\choose n} x^{n-1} = n(1+x)^{n-1}.

Since we also know the RHS is n( {n-1 \choose 0} + {n-1\choose 1}x + \dots + {n-1\choose n-1}x^{n-1} ) we get the following combinatorial identity by looking at the coefficient of x^{r-1}:

r {n\choose r} = n {n-1 \choose r-1}.

Not entirely impressive I know, but read on.

Example 3. From the identity (1+x)^m \times (1+x)^n = (1+x)^{m+n}, we thus obtain, by looking at the coefficient of x^r on both sides:

{m \choose 0}{n\choose r} + {m\choose 1}{n \choose r-1} + \dots + {m\choose r}{n\choose 0} = {m+n\choose r},

 where {n\choose i} is defined to be zero whenever i is not in the range 0, 1, …, n. In particular, if m = n = r, then using the equality {n\choose i} = {n \choose n-i} we obtain:

{n\choose 0}^2 + {n\choose 1}^2 + \dots + {n\choose n}^2 = {2n \choose n}. 
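All three identities are easy to confirm by brute force with `math.comb` (a quick numerical sketch):

```python
from math import comb

# Example 2: r * C(n, r) = n * C(n-1, r-1)
for n in range(1, 12):
    for r in range(1, n + 1):
        assert r * comb(n, r) == n * comb(n - 1, r - 1)

# Example 3 (Vandermonde): sum_i C(m, i) C(n, r-i) = C(m+n, r)
# (comb(m, i) is 0 when i > m, matching the convention in the text)
for m in range(8):
    for n in range(8):
        for r in range(m + n + 1):
            lhs = sum(comb(m, i) * comb(n, r - i) for i in range(r + 1))
            assert lhs == comb(m + n, r)

# special case m = n = r: sum of squared binomial coefficients
n = 10
assert sum(comb(n, i) ** 2 for i in range(n + 1)) == comb(2 * n, n)
print("all identities verified")
```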

Exercises.

  1. Find the value of 1^2 {n\choose 1} + 2^2 {n\choose 2} + \dots + n^2 {n\choose n}.
  2. Find the value of
    • {n\choose 0} + {n\choose 2} + {n\choose 4} + \dots;
    • {n\choose 0} + {n\choose 3} + {n\choose 6} + \dots, if you know complex numbers.
  3. Find the value of {n\choose 0} + \frac 1 2 {n\choose 1} + \frac 1 3 {n\choose 2} + \dots + \frac 1 {n+1} {n\choose n}.
  4. For a given 0 \le r \le n, simplify the sum:

{n\choose 0}{n\choose r} - {n\choose 1}{n\choose r-1} + {n\choose 2}{n\choose r-2} - \dots + (-1)^r {n\choose r}{n\choose 0}.

But why stop at finite sequences? Can we not do this for an infinite sequence a_0, a_1, a_2, \dots? The answer is yes! We can, at least, define the power series:

P_a(x) = a_0 + a_1 x + a_2 x^2 + \dots

Once again, this is called the generating function of the infinite sequence a_i. Whether this series is meaningful, however, is an entirely different story. There’re two ways to approach this:

  • we explicitly refrain from substituting values into x, and just manipulate the power series;
  • if we insist on substituting values, however, we’d better make sure the series converges.

Let’s look at the second approach first. The following theorem is helpful:

Theorem. For any power series P(x) = \sum_{n=0}^\infty a_n x^n, there is a unique non-negative real number R (or possibly +∞) such that:

  • if |x| < R, then the series converges for x;
  • if |x| > R, then the series does not converge for x.

We call R the radius of convergence for the power series.

The proof of this is another story for another day, so we’ll skip it for now. The theorem, in fact, works for the complex domain as well, i.e. if the a_i‘s and x are complex. Two questions immediately spring to mind:

  • What happens at the boundary case where |x|=R? Answer: it may or may not converge. One can only decide on a case-by-case basis.
  • How do we calculate R? Answer: there’s a nice formula for this:

R = \left(\lim_{n\to\infty} |a_n|^{1/n}\right)^{-1}

if the limit exists. Actually, there’s a nicer formula which replaces lim with limsup – the advantage of this new formula is that the limsup of a sequence is guaranteed to exist. But since that’s beyond the syllabus, let’s stick to the above formula for now.
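The formula is easy to experiment with numerically: just evaluate |a_n|^{1/n} for a large n. Two sketches below, for a_n = n (limit 1, so R = 1) and for the Fibonacci numbers (limit the golden ratio, as we'll see in Example 4):

```python
def estimate_limit(a, n):
    # crude estimate of lim |a_n|^{1/n} using a single large n
    return abs(a(n)) ** (1.0 / n)

# a_n = n: the limit is 1, so R = 1.
print(estimate_limit(lambda n: n, 10**6))   # very close to 1

# Fibonacci numbers: the limit is the golden ratio, so R = 1/phi.
def fib(n):
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

phi = (1 + 5 ** 0.5) / 2
est = estimate_limit(fib, 300)
print(est, phi)   # est is close to phi, so R is about 1/phi, roughly 0.618
```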

Example 1. Consider the case a_n = n^n. Then by the above formula R = 0. So the only value of x for which \sum_{n=0}^\infty a_n x^n converges is x = 0.

Example 2. Let a_n = 1. By the above formula R = 1, so the series 1 + x + x^2 + \dots converges for all |x| < 1. In fact, you should know that this is \frac 1 {1-x}.

Example 3. Let a_n = n. We can apply l’Hôpital’s rule to the above formula to obtain R = 1. In fact, we can differentiate example 2 with respect to x to obtain:

1 + x + x^2 + \dots = \frac 1 {1-x} \implies 1 + 2x + 3x^2 + \dots = \frac 1 {(1-x)^2}.

[ Warning: there’s a little subtlety here involving fine points of infinite sums of infinite sums, which we’ll conveniently sweep under the rug. ]

Example 4. Let F_n be the Fibonacci sequence defined by F_0 = 0, F_1 = 1 and F_{n+1} = F_n + F_{n-1} for n \ge 1. We can calculate the radius of convergence thanks to Binet’s formula:

F_n = \frac 1 {\sqrt 5} \left( \left( \frac{1 + \sqrt 5} 2\right)^n - \left(\frac{1-\sqrt 5} 2\right)^n \right).

Thus we get \lim_{n\to\infty} |F_n|^{1/n} = \frac {1+\sqrt 5} 2. [ Exercise: prove it; hint: use (a^n + b^n)^{1/n} = a (1 + (\frac b a)^n)^{1/n}. ]  So the radius of convergence is

R = \left( \frac {1 + \sqrt 5} 2\right)^{-1} = \frac{\sqrt 5 - 1} 2.

In fact, we can have a nice closed expression for the power series P(x) = F_0 + F_1 x + F_2 x^2 + \dots. Using the identity F_{n+1} = F_n + F_{n-1}, we have:

(1 - x -x^2) P(x) = x + (F_2 - F_1 - F_0)x^2 + (F_3 - F_2 - F_1)x^3 + \dots = x.

Now, both sides are equal within the radius of convergence |x| < \frac{\sqrt 5 - 1}2. Hence, P(x) = \frac x {1 - x - x^2} for all such x. In particular, when x=1/2, we obtain:

F_0 + \frac 1 2 F_1 + \frac 1 4 F_2 + \dots + \frac 1 {2^n} F_n + \dots = 2.
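A quick numerical check of this value: the partial sums of \sum F_n / 2^n do indeed approach 2.

```python
# Partial sums of sum F_n / 2^n, which should approach x/(1-x-x^2) = 2 at x = 1/2.
def fib_terms(count):
    a, b = 0, 1
    for _ in range(count):
        yield a
        a, b = b, a + b

s = sum(F / 2.0 ** n for n, F in enumerate(fib_terms(150)))
print(s)   # very close to 2
```

Note that 1/2 lies inside the radius of convergence (\sqrt 5 - 1)/2 ≈ 0.618, which is what justifies the substitution.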

Example 5. Let a_n = \frac 1 {n!}. By bounding the value of n! (e.g. by using integration, as we covered earlier), we get \lim_{n\to\infty} (n!)^{1/n} = +\infty. Thus, the radius of convergence is +∞ and the series

P(x) = \sum_{n=0}^\infty \frac{x^n}{n!}

is convergent for all x (even complex values, if you care). Now you may or may not recognise this function, but we’ll spend a minute to derive it anyway: letting y = P(x), we have \frac{dy}{dx} = \sum_n \frac{n x^{n-1}}{n!} = y. To solve for y, we invert the derivative to obtain \frac{dx}{dy} = y^{-1} so x = \log(y)+c.  Substituting x=0, y=1 gives us c=0. So in short:

1 + x + \frac{x^2}{2!} + \dots + \frac{x^n}{n!} + \dots = e^x.

Example 6. Take the power series P(x) = x - \frac {x^3} 3 + \frac {x^5} 5 - \frac{x^7} 7 + \dots. Problem arises when we attempt to use the \lim_n |a_n|^{1/n} formula: since the even terms are zero while the odd terms are not, the limit does not exist! However, the main theorem still holds, i.e. the radius of convergence R must exist, regardless of whether the above limit does.

Writing t = x^2, we get P(x) = x (1 - \frac {t}3 + \frac{t^2}5 - \frac{t^3}7 + \dots), and we see that the bracketed term converges for |t|<1 and diverges for |t|>1. Hence, P(x) converges for |x|<1 and diverges for |x|>1, so the radius of convergence is 1.

By integrating (1 + x^2)^{-1} = 1 - x^2 + x^4 - x^6 + \dots, we see that this power series is precisely the arc-tangent:

x - \frac {x^3}3 + \frac{x^5} 5 - \frac{x^7}7 + \dots = \tan^{-1} x.

What happens at the boundary |x|=1? At x = 1, the LHS is 2 (\frac 1 {1\times 3} + \frac 1 {5\times 7} + \frac 1{9\times 11} + \dots) and the bracketed term is less than \sum_{n=1}^\infty \frac 1 {n^2} < \infty; at x = -1 the series is just the negation. So the power series does converge at x=1 and x=-1, and we get:

1 - \frac 1 3 + \frac 1 5 - \frac 1 7 + \dots = \tan^{-1} 1 = \frac\pi 4.
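Numerically, the convergence at this boundary point is real but painfully slow: the error of the N-term partial sum is roughly 1/N, as a quick sketch shows.

```python
from math import pi

# Partial sums of the Leibniz series 1 - 1/3 + 1/5 - ... -> pi/4.
def leibniz(N):
    return sum((-1) ** k / (2 * k + 1) for k in range(N))

for N in (10, 1000, 100000):
    print(N, 4 * leibniz(N) - pi)   # error shrinks roughly like 1/N
```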

Exercises

  1. Find the value of 1^2 + \frac 1 2 2^2 + \frac 1 4 3^2 + \dots + \frac 1 {2^{n-1}} n^2 + \dots.
  2. Express the following as multiples of e (the resulting sequence forms the Bell numbers and has a combinatoric interpretation(!) – we will hopefully cover that a few posts later):
    • \frac {1^2}{1!} + \frac {2^2}{2!} + \frac {3^2}{3!} + \frac {4^2}{4!} + \dots;
    • \frac {1^3}{1!} + \frac {2^3}{2!} + \frac {3^3}{3!} + \frac {4^3}{4!} + \dots;
    • \frac {1^4}{1!} + \frac {2^4}{2!} + \frac {3^4}{3!} + \frac {4^4}{4!} + \dots.
  3. Prove that \log 2 = 1 - \frac 1 2 + \frac 1 3 - \frac 1 4 + \frac 1 5 - \dots.
  4. Calculate F_0^2 + \frac 1 2 F_1^2 + \frac 1 4 F_2^2 + \dots + \frac 1 {2^n} F_n^2 + \dots.

Coming up next: formal power series, where we manipulate power series as mere expressions, without substituting any explicit values for x. In particular, we’ll see how Catalan numbers come about.
