Topology: Subspaces

First, suppose (Xd) is a metric space. If Y is a subset of X, then one can restrict the metric to d' = d|_{Y\times Y}:Y\times Y\to\mathbf{R}, i.e. for any y,y'\in Y, we set d’(yy’) := d(yy’). It’s not hard to show that the resulting function is a metric on Y. The resulting pair (Yd’) is called a metric subspace of (Xd).

The open subsets of Y are related to those of X as follows.

Proposition. A subset V\subseteq Y is open if and only if V = U\cap Y for some open U\subseteq X.

Proof.

First note that if a\in Y, then an open ball in Y is of the form:

N_{d'}(a,\epsilon) = \{y\in Y: d'(y,a)<\epsilon\} = \{x\in Y:d(y,a)<\epsilon\} \cap Y = N_d(a,\epsilon) \cap Y.

  • Suppose U is open in X and VU ∩ Y. Each a\in V is also in U, and so N_d(a,\epsilon) \subseteq U for some ε>0. Hence N_{d'}(a,\epsilon) = N_d(a,\epsilon)\cap Y \subseteq U\cap Y = V. So V is open in Y.
  • Conversely, let V be open in Y. For each a in V, we have N_{d'}(a,\epsilon)\subseteq V for some ε>0 depending on a. Take the union U of all such N_d(a,\epsilon); then U is open in X since it’s a union of open balls in X. Also

U\cap Y =\cup_{a\in V}(N_d(a,\epsilon)\cap Y) = \cup_{a\in V} N_{d'}(a, \epsilon)=V.

Exercise

If U is an open ball in X, is U ∩ Y necessarily an open ball in Y?

[ Answer (highlight to read): no, consider X=R and Y=R-{0}. Then (-1, +1) is an open ball N(0, 1) in X but (-1, +1) ∩ Y is the disjoint union of (-1, 0) and (0, +1) which is not an open ball in Y. ]

This inspires us to extend the definition of subspaces to general topological spaces.

Definition. Let (X, T) be a topological space and Y be a subset of X. The subspace topology on Y is defined by:

T_Y =\{U\cap Y : U\in T\},

i.e. V\subseteq Y is open if and only if it’s of the form V = U ∩ Y for some open subset U of X.

Equivalently, the class of closed subsets of Y is given by DY – (UY) = (XU) ∩ YC ∩ Y for some closed subset C of X.

Examples

  1. A subspace of (Xd) with the discrete metric is still discrete.
  2. Pick the half-open interval Y=[0, 1)\subset \mathbf{R}. Then [0, 1/2) = (-1, 1/2)\cap Y is open in Y but not open in R.
  3. Consider the subset Y=(-\infty, -1]\cup\{0\}\cup[1, \infty) of the real line R. The singleton set {0} is an open subset of Y since {0} = N(0, 1/2). Furthermore, it is closed since the complement is a union of two open subsets.
  4. Consider the subset Z of R under the usual metric. Then the resulting subspace is the discrete space even though the induced metric d(mn) = |mn| is not exactly the discrete metric.
  5. Let X=\mathbf{R}^2 and Y be the set of points (xy) satisfying x^2 + y^2 = 1. Geometrically, Y is a circle. Here, we’ll think of it as a topological space with the subspace topology inherited from X. The space is denoted S^1. More generally, for each positive integer n, the space S^n is the subspace of \mathbf{R}^{n+1} comprising of all points (x_1, x_2, \ldots, x_{n+1}) satisfying \sum_{i=1}^{n+1} x_i^2 = 1.
  6. Consider XN under the right order topology.
    1. If Y = {1, 2, 3}, then the subspace topology gives { (empty set), {3}, {2, 3}, Y }.
    2. If Y is the set of even numbers, then the bijection X\to Y, n\mapsto 2n preserves the structure of topological spaces. We say that the two spaces are homeomorphic. We will say more about this in a later article.

Since we’re often dealing with multiple spaces and subspaces, when describing open/closed subsets, it’s essential to qualify with “(XX) is open/closed in (YY)” instead of merely saying “(XX) is open”. E.g. in example 2 above, Y = [0, 1) is a subspace of R and the subset [0, 1/2) is open in Y but not in R.

The following diagram illustrates some open subsets of Y in example 3.

subspace_open

Exercises

  1. Consider the subspace Y = {1/nn positive integer} of R, under the usual metric. Is this space discrete? What about Y^* := Y\cup\{0\}?
  2. Example 2 gives us the concept of clopen subsets, i.e. subsets of a topological space which are both open and closed. In any topological space X, \emptyset and X are always clopen. Are there any other clopen subsets of Q, where Q inherits the subspace topology from R?

Answers

  1. Y is discrete since each {1/n} = (1/n – ε, 1/n + ε) ∩ Y for some small ε>0. On the other hand Y* is not discrete since any open subset containing 0 must also contain some 1/n.
  2. Yes, Q has infinitely (in fact, uncountably) many clopen subsets. If r is irrational, then (-∞, r) ∩ Q and (r, ∞) ∩ Q form a disjoint union of Q by open subsets.

blue-linBasic Properties of Subspaces

The following basic property is often taken for granted.

Theorem. If (X, T) is a topological space and Z\subseteq Y\subseteq X are subsets, then we can form the subspace topology on Z in two ways:

  • by taking the subspace topology T\mapsto T_Z from Z\subseteq X; and
  • by taking successive subspaces T\mapsto T_Y which is a topology on Y, then T_Y\mapsto (T_Y)_Z.

The two topologies are identical.

The proof is straightforward: in the first case, the class of all open subsets of Z is given by U ∩ Z for open subsets U of X; in the second case, the class is given by (U ∩ Y) ∩ ZUZ for open subsets U of X. They’re identical. ♦

The following properties are also surprisingly useful in practice.

Theorem. Let Y be a subspace of (X, T). If Y is open in X, then any open subset of Y is an open subset of X. If Y is closed in X, then any closed subset of Y is a closed subset of X.

Proof.

For the first statement, an open subset of Y is of the form V = U ∩ Y for some open subset U of X. Since U and Y are both open in X, so is V = U ∩ Y. The same proof holds for the second statement by replacing ‘open’ with ‘closed’. ♦

Furthermore, for bases and subbases, we have:

Theorem. Let (X, T) be a topological space with subspace (Y, T_Y).

  • If B is a basis for T, then B_Y :=\{U\cap Y : U\in B\} is a basis for Y.
  • If S is a subbasis for T, then S_Y:=\{U\cap Y: U\in S\} is a subbasis for Y.

basis_of_subspace

Proof.

For the first statement, we first verify that B_Y is indeed a basis of some topology over Y:

  • \cup_{V\in B_Y} V = \cup_{U\in B} (U\cap Y) = \left(\cup_{U\in B} U\right)\cap Y = X\cap Y = Y.
  • Any two elements of B_Y are of the form U_1\cap Y, U_2\cap Y for some basic open subsets U_1, U_2\in B. Since B is a basis, U_1 \cap U_2 = \cup_i V_i for some V_i \in B. This gives (U_1 \cap Y)\cap (U_2\cap Y) = (U_1\cap U_2)\cap Y = \left(\cup_i V_i\right)\cap Y=\cup_i (V_i\cap Y) which is a union of elements in B_Y.

Finally, we need to show B_Y generates the topology T_Y.

  • Every element U\cap Y \in B_Y (for some element U in B) is open in Y by definition. So B_Y\subseteq T_Y.
  • Conversely, any element of T_Y is of the form U\cap Y for some open subset U of X. Since B is a basis for (XT), U=\cup_i V_i for some V_i\in B. So U\cap Y = \cup_i (V_i\cap Y) which is a union of elements in B_Y.

This concludes the proof for the first statement.

For the second, suppose the intersections of finitely many elements of S form basis B. It suffices to show that the intersections of finitely many elements of S_Y form basis B_Y.

  • In one direction, note that S\subseteq B\implies S_Y \subseteq B_Y.
  • Conversely, any element of B_Y is of the form U ∩ Y for some U\in B. So U=V_1\cap V_2 \cap \ldots \cap V_n for finitely many V_i\in S. But this gives U\cap Y = \cap_i (V_i\cap Y) for finitely many V_i\cap Y\in S_Y.

And the proof is complete. ♦

Summary.

  1. Any subset of a topological space X inherits a topology from it. The inheritance is consistent across inclusion chains of topological spaces.
  2. With spaces and subspaces, one should be more careful when talking about “open sets”, i.e. mention what it’s open in.
  3. If Y is open (resp. closed) in X and Z is open (resp. closed) in Y, then Z is open (resp. closed) in X.
  4. If Y is a subspace of X, then a basis (resp. subbasis) of X restrict to give a basis (resp. subbasis) of Y.
Posted in Notes | Tagged , , , , , , , | Leave a comment

Topology: Bases and Subbases

Bases

Recall that though a subring or ideal of a ring may be rather huge, it often suffices to specify just a few elements which will generate the subring or ideal. Likewise, in a topology,  one can specify a few open sets and generate the rest via unions and finite intersections. We’ll expound upon that in what follows.

First, note that when (Xd) is a metric space, a subset U of X is open if and only if it is a union of (possibly infinitely many) open balls. Indeed, since every open ball is open, so is a union of multiple open balls. Conversely, if U is open, each element x\in U is contained in some open ball N(x,\epsilon) \subseteq U for some ε>0 which depends on x; hence U is the union of all these N(x, ε).

In other words, consider B = {N(x, ε) : x in X, ε>0}, the collection of all open balls; a subset of X is open if and only if it is a union of elements of B. Generalising this concept for topological spaces gives us the following definition.

Definition. A basis for a topology (X, T) is a collection of open sets, B\subseteq T such that every open subset U of X is a union U=\cup_i V_i of elements V_i\in B. We shall call an element V\in B a basic open set.

Examples

  1. As we saw above, the set B of open balls in a metric space (Xd) forms a basis of the induced topology.
  2. In particular, the set of open intervals (ab) in R forms a basis.
  3. In the discrete topology, the collection of singleton sets {x} forms a basis. In fact, if you take the collection of open balls {N(x, 1/2)} for the discrete metric, you get the same basis.
  4. Since the metrics dd1 and d on Rn give rise to the same topology, we can pick the set of open balls under any one metric to produce a basis.
  5. Exercise: find all possible bases of N under the right-order topology. [ Answer: there’s no non-trivial basis, i.e. any basis must include all open sets. ]

topology_bases

Our next question is: suppose B\subseteq \mathbf{P}(X) is some collection of subsets of X. Is it always the basis of some topology?

If it were, then clearly the resulting topology would be:

T=\{\cup_i V_i : \{V_i\}\subseteq B\}.

Let’s check if T satisfies the axioms of a topology.

  • Empty set and X : the empty set can be written as a union of an empty collection of V_i, so no problem there; for X, we need to specify that \cup_i V_i = X.
  • Arbitrary union : this is obvious, since the union of sets U_i := \cup_j V_{ij} (where each V_{ij} \in B) is also of the form \cup_{i, j} V_{ij} which lies in T.
  • Finite intersection : write U_1 = \cup_i V_{1i} and U_2 = \cup_j V_{2j}, with each V_{1i}, V_{2j}\in B. Since intersection is distributive over union, we get:

U_1\cap U_2 = (\cup_i V_{1i})\cap (\cup_j V_{2j}) = \cup_{i,j} (V_{1i} \cap V_{2j}).

For this to be in T, a sufficient condition is that V_1\cap V_2\in T for all V_1, V_2\in B. On the other hand, this condition is obviously necessary since if V_1, V_2\in B, we have V_1, V_2\in T\implies V_1\cap V_2 \in T. Thus we have proven:

Theorem. A subset B\subseteq \mathbf{P}(X) is a basis for some topology if and only if:

  • the union of all V\in B is the whole X; and
  • for any V_1, V_2\in B, the intersection V_1 \cap V_2 is a union of elements from B.

Application: Furstenberg’s Proof of the Infinitude of Primes

While he was an undergraduate, Hillel Furstenberg found an innovative proof that there’re infinitely many prime numbers, via the concept of topology.

Theorem. There are infinitely many prime numbers.

Proof (Furstenberg)

Define a topology on set of integers Z as follows. For integers a, b let V(a, b) be the set of all integers am+b for integer m. Then any intersection V(a,b)\cap V(c,d) corresponds to solutions of simultaneous linear congruences x\equiv b\pmod a, x\equiv d\pmod c. From elementary number theory, this intersection is either empty or V(e, f) for some integers e, f (where e = lcm(ac)). Since V(1, 0) = Z, {V(a, b)} forms a basis for some topology T on Z.

Since the complement ZV(a, b) is a union of V(a, b’) for various b’, it is open, i.e. V(a, b) is both open and closed. Now, since any integer other than ±1 is divisible by some prime p, we have:

\mathbf{Z} -\{-1, +1\} = \cup_{p \text{ prime}} V(p, 0).

If there’re finitely many primes, then the RHS is a union of finitely many closed sets and is hence closed. Thus, {-1, +1} is open, which is ridiculous: since every basic open set V(a, b) is infinite, every open set must be infinite too. ♦

blue-lin

Subbases

Let’s fix an underlying set X and consider various topologies on it. If \{T_i\} is a collection of topologies on X, one easily checks that so is T:=\cap_i T_i where a subset of X is open in T if and only if it’s open in all Ti. In short, an intersection of topologies is still a topology; so given any subset S\subseteq \mathbf{P}(X), one can consider the collection of all topologies containing S (this is a non-empty collection since it includes the discrete topology)  and take their intersection.

Definition. Under the above definition, the resulting topology is called the topology generated by S. One also says that S is a subbasis for the topology T.

Put in another way, T is the “smallest” topology on X containing S, in the following sense:

  • S\subseteq T;
  • if T’ is any topology containing S, then T\subseteq T'.

[ This is similar to case where an arbitrary subset of a group or ring can be used to generate a subgroup or subring. ]

Theorem. The topology T generated by subbasis S has, as basis,

B(S) := \{W_1 \cap W_2 \cap \ldots \cap W_k : W_i\in S\} \cup \{X\}.

[ Note: the additional term {X} may be superfluous if one interprets X as being the intersection of no Wi‘s. The more terms we intersect, the smaller the set becomes, so if we intersect no terms at all, we get the universal set X.]

Proof.

  • The intersection of any two elements of B(S) still lies in it, so B(S) is indeed a basis for some topology T’.
  • Since S\subseteq B(S)\subseteq T', we have T \subseteq T' too, since T is the smallest topology containing S.
  • Finally, if V=W_1\cap \ldots \cap W_k \in B(S), for some W_i\in S, then since W_i\in T, we have V\in T as well. Thus, T'\subseteq T. ♦

In general, bases and subbases can be useful tools in topological proofs since to verify a property, it often only suffices to do it on basic open sets, or even subbasic open sets.

Summary. We have:

\text{Subbasis S} \stackrel{\footnotesize \begin{matrix}\text{take}\\ \text{finite} \\ \text{intersections}\end{matrix}}{\implies} \text{Basis }B \stackrel{\footnotesize\begin{matrix}\text{take}\\ \text{unions}\end{matrix}}{\implies} \text{Topology }T.

Examples

  1. On R, the set of intervals of the form (-∞, b) and (a, ∞) forms a subbasis; indeed, the intersection gives (-∞, b) ∩ (a, ∞) = (ab) or the empty set.
  2. If |X|>2, then the collection of two-element subsets {ab} forms a subbasis since if abc are distinct, then {ab} ∩ {bc} = {a}.
  3. For the topology on Z in Furstenberg’s proof, the set of V(ab) for prime power a forms a subbasis by Chinese Remainder Theorem.
Posted in Notes | Tagged , , , , , , , | Leave a comment

Topology: Basic Definitions

Motivation and Definition

While studying analysis, one notices that many important concepts can be defined in terms of “open sets”. One gets the inkling that this concept is critical in forming our notions of continuity, limits etc. In this article, we shall abstractise this further by considering only open sets. At first glance, it does seem rather unlikely that such a general definition could be useful in any manner. However, it will soon become clear that abstractisation has the following advantages:

  • It brings out the essence of proofs of theorems, by highlighting specific properties of an object which are critical to the proofs.
  • The concepts can be applied to vastly different areas of mathematics, including for example, number theory, Galois theory, commuative algebra (in more ways than one!) etc, and have proven to be extraordinarily fruitful in furthering their study.

Enough talk, let’s begin. Let X be any set. As mentioned above, we wish to define a topology on it by stating what constitutes “open sets”.

Definition. Let \mathbf{P}(X) be the power set of X (i.e. set of all subsets of X). A topology on X is a subset T\subseteq \mathbf{P}(X) satisfying:

  • \emptyset, X \in T;
  • if U_1, U_2 \in T, then U_1 \cap U_2\in T;
  • if U_i\in T is a collection of elements of T, then \cup_i U_i\in T.

If U is a subset of X belonging to T, we say U is an open subset of X. If C is a subset of X whose complement X-C belongs to T, we say C is a closed subset of X.

Some observations:

  1. The definition does not preclude the possibility that a subset can be both open and closed, so the terminology is a little unfortunate. For a quick motivation of why closed subsets are of interest, read here.
  2. By repeatedly applying intersection, one notes that if U_1, U_2, \ldots, U_n are open, then so is U_1\cap U_2\cap\ldots\cap U_n. In other words, a topology is closed under an intersection of finitely many terms, but not infinitely many terms in general (see the example of R later, where the intersection of all U_n = (-\frac 1 n, +\frac 1 n) is {0}, which is not open).

Note that we could also define a topology by stating what constitutes “closed subsets” of X. Thus:

  • \emptyset, X are closed subsets of X;
  • if C_1, C_2 are closed subsets of X, then so C_1 \cup C_2 is closed;
  • if C_i is a collection of closed subsets of X, then \cap_i C_i is closed.

Definition. Suppose T_1, T_2 are both topologies on X. If T_1\subseteq T_2, then we say T_2 is a finer topology than T_1 and conversely T_1 is a coarser topology than T_2.

coarse_fine

[ Note: this diagram is meant as an illustration of finer/coarser; it does not represent actual topologies. ]

Examples of Topologies

  1. On any set X, we have the discrete topology TP(X) where every subset is open; this is the finest topology possible for X. Also, there’s the trivial topology T=\{\emptyset, X\}, which is the coarsest topology possible for X.
  2. Let X=\mathbf{R}^n. The standard topology on X was already defined earlier. Recall that U\subseteq X is open if and only if for each x in U, there is an ε>0, such that any y in X satisfying |y – x|<ε must be in U also.
  3. Let X be a set. The cofinite topology is defined via having U\subseteq X open if and only if XU is finite. In other words, the class of closed subsets is precisely the class of finite subsets, from which it’s clear that the cofinite topology satisfies the set of axioms of a topology.
  4. Let N = {1, 2, … } be the set of positive integers. The right order topology on N is where the only non-empty open subsets are U_m = \{m, m+1, m+2, \ldots\} for m=1, 2, 3, … . Note that U_m \cap U_n = U_{\max(m,n)} and \cup_i U_{m_i} = U_{\min(m_i)}; the fact that the order topology is closed under arbitrary union follows from that every subset of N has a minimum.
  5. If \mathbf{N}^* = \mathbf{N}\cup\{\infty\}, where ∞ is just a dummy symbol here, then we can define a topology on N* by decreeing that the only non-empty open subsets are U_m^* = U_m \cup\{\infty\}. Thus, any open subset containing ∞ contains all sufficiently large natural numbers.

right_order_topology_N[ Side note: more generally, one can define the order/right-order and left-order topology on any totally ordered set. Note that the right-order topology on N* differs from the one described in example 5, since {∞} is an open subset in the right-order topology. ]

blue-linMetric Spaces

One common source of topologies is via a metric function, which provides a geometry by indicating the “distance” between any two points in a set.

Definition. A metric on a set X is a function d:X\times X\to\mathbf{R} such that:

  • d(xy) ≥ 0 for any xy in X, with equality if and only if x=y;
  • d(xy) = d(yx) for any xy in X;
  • d(xz) ≤ d(xy) + d(yz) for any xyz in X (triangular inequality).

The pair (X, d) is then called a metric space.

The triangular inequality ensures that the function d behaves like a “distance function”.

Examples

  1. The discrete metric on any X is defined by d(x, y) = 0 if x=y and 1 otherwise.
  2. Let X=\mathbf{R}^n. The Euclidean metric is the standard distance function d(x,y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}.
  3. Let X=\mathbf{R}^n again. This time we take the metrics d_1(x,y)=\sum_{i=1}^n |x_i - y_i| and d_\infty(x,y)=\max_{i=1}^n |x_i - y_i|. [ See diagram below. ]
  4. Let p be a prime number and X=Z, the set of integers. Given any integer m let |m|_p = 1/p^r where p^r is the highest power of p dividing m; in the case m=0, since all powers of p divide m we have |m|_p = 0. Then d(x,y) = |x-y|_p is a metric, called the p-adic metric on Z. This was briefly alluded to in a prior post. [ Note: the triangular inequality follows from the fact that if p^r divides both xy and yz, then it divides xz. One way of looking at this metric is that two integers are near if their difference is divisible by a high power of p. ]
    • E.g. suppose p=5. Then d(5, 3)=|5-3|5 = 0, d(57, -18)=|75|5 = 1/25, and d(152, 27) = 1/125.

three_metricsThe following theorem shows that a metric space gives rise to a topology.

Theorem. Let (X, d) be a metric space. For x in X and ε>0, the open ball with radius ε>0 and centre x is defined by

N(x, \epsilon) = \{y\in X: d(x, y)<\epsilon\}.

We say that a subset U\subseteq X is open if for any x in U, there is an open ball N(x,\epsilon)\subseteq U for some ε>0. Then the collection of open sets form a topology.

[ Note: the above diagrams for dd1 and d illustrate the open balls for the respective metrics. ]

Proof.

Clearly, \emptyset and X are both open. Now, suppose U_1, U_2\subseteq X are open (in the sense of metric space). If x\in U_1\cap U_2, then x lies in both U1 and U2. Thus, we can find \epsilon_1, \epsilon_2>0 such that:

N(x, \epsilon_1) \subseteq U_1 and N(x,\epsilon_2)\subseteq U_2.

So N(x, \min(\epsilon_1, \epsilon_2)) \subseteq N(x,\epsilon_1)\cap N(x,\epsilon_2) \subseteq U_1 \cap U_2.

Finally for arbitrary union of open subsets, let U_i\subseteq X be open and U = \cup_i U_i. If x lies in U, then it must lie in some Ui. Thus, for some ε>0, N(x,\epsilon) \subseteq U_i \subseteq U. ♦

For example,

  • the above metric space (\mathbf{R}^n, d) induces the topology on Rn defined earlier;
  • the discrete metric on any set X induces the discrete topology;
  • the open ball N(x, ε) itself is an open subset of X (the proof is an exact replica of the one in an earlier post, by replacing |xy| with d(xy)).

We say that a topology (XT) is metrisable if there’s a metric d on X which induces it. But even if such a d exists, it is usually far from unique. For example, the above metrics d, d1 and d on Rn all induce the same topology as we will show a short while later.

Definition. Two metrics d and d’ on X are said to be topologically equivalent if they induce the same topology. 

Theorem. This happens if and only if:

  • for any x in X and ε>0, there exists δ>0 such that N_{d'}(x,\delta) \subseteq N_d(x, \epsilon);
  • for any x in X and ε>0, there exists δ>0 such that N_{d}(x,\delta) \subseteq N_{d'}(x, \epsilon);

[ Since there’re two metrics here, we use the subscript Nd to denote the open ball induced by d. ]

Proof

Suppose d and d’ induce topologies T and T’ respectively. It suffices to show that the first condition is equivalent to that T’ is finer than T.

For the forward direction, suppose U\in T, i.e. it is open in the metric space (Xd). Then for any x in U, N_d(x, \epsilon)\subseteq U for some ε>0. By the first condition, N_{d'}(x, \delta)\subseteq N_d(x,\epsilon) for some δ>0. Hence it follows that U is open in the metric space (Xd’) too.

For the converse, suppose let x be in U and ε>0. Then since U:=N_d(x,\epsilon) is open in T, it is also open in T’. Hence N_{d'}(x,\delta)\subseteq U for some δ>0. ♦

For example, the following diagram shows that d1 and d are topologically equivalent on Rn.

topo_equivConclusion: every metric space induces a topology and many different metrics on the same space can induce the same topology. We’ll see later that some topologies are inherently non-metrisable.

Posted in Notes | Tagged , , , , , , , | Leave a comment

Burnside’s Lemma and Polya Enumeration Theorem (2)

[ Acknowledgement: all the tedious algebraic expansions in this article were performed by wolframalpha. ]

Counting Graphs

One of the most surprising applications of Burnside’s lemma and Polya enumeration theorem is in counting the number of graphs up to isomorphism.

Problem. Find the number of simple graphs with n vertices, up to isomorphism.

Let V be a set of n vertices, and E be the set of all 2-element subsets of V. E.g. for n=5:

  • V = {abcde};
  • E = {{ab}, {ac}, {ad}, {ae}, {bc}, {bd}, {be}, {cd}, {ce}, {de}}.

The full symmetric group G=S(V) \cong S_n then acts on E. Each graph corresponds to a colouring of E by Y = {black, white} and two graphs are isomorphic if and only if they’re isotopic under G – i.e. belong to the same orbit under G. Our objective is to compute the number of unique colourings by applying Polya enumeration theorem.

Clearly, c(g) – the number of orbits of g acting on E – only depends on the equivalence class of g in G; in particular for n=5, the number of cases is reduced from 5!=120 to 7.

  • g=e : c(g) = C(5, 2) = 10;
  • [10×] g=(ab): edges ac and bc must share the same colour, as do {adbd}, {aebe}, so c(g) = 7;
  • [20×] g=(abc): edges adbd and cd must have the same colour, as do {aebece} and {abbcac}, so c(g) = 4;
  • [30×] g=(abcd): edges aebece and de have the same colour, as do {abbccdda} and {acbd}, so c(g) = 3;
  • [24×] g=(abcde): we get two orbits {abbccddeae} and {acbdceadbe} so c(g)=2;
  • [15×] g=(ab)(cd): we get {aebe}, {cede}, {acbd}, {adbc}, {ab}, {cd} so c(g)=6;
  • [20×] g=(ab)(cde): we get {ab}, {cddece} and all the remaining edges form a single orbit so c(g)=3.

Hence the number of graphs with 5 vertices, up to isomorphism, is:

\frac 1 {120} (1\times 2^{10} + 10\times 2^7 + 20\times 2^4 + 30\times 2^3 + 24\times 2^2 + 15\times 2^6 + 20\times 2^3) = 34.

The following diagram lists all isomorphism classes of simple graphs with 5 vertices and up to 5 edges.

many_graphs

Counting Graphs II

In fact, with just a bit more effort, we can compute the number of graphs with 5 vertices and e edges, for each 0 ≤ e ≤ 10. Recall that each graph (ignoring symmetry) is simply a map E → {black, white}. Let X be the set of all such graphs; the group G\cong S_5 acts on X by permuting the edges around.

Now, we assign a weight to each colour: black→1 and white→0. Each graph E → {black, white} then gets a weight which is its number of edges. Let Xi be the set of elements in X with weight i; it’s easy to see that |X_i| = C(10, i).

Our objective is to compute n_i := |G\backslash X_i|, the number of isomorphic classes of simple graphs with 5 vertices and i edges.

To that end, define the polynomial P(t) := \sum_{i=0}^{10} n_i t^i. Burnside’s lemma gives:

P(t) =\sum_i |G\backslash X_i| t^i = \frac 1 {|G|} \sum_i \sum_{g\in G} |X_i^g|t^i = \frac 1 {|G|}\sum_{g\in G}\left(\sum_i |X_i^g|t^i\right).

Thus, we need to evaluate the power series P_g(t) = \sum_i |X_i^g| t^i for each g.

  • g=e : since X_i^g = X_i, this is just P_g(t) = (1+t)^{10}.
  • g=(ab) : as worked out earlier, the non-trivial orbits are {acbc}, {adbd} and {aebe}. For each of these three orbits, the weight is either 0 or 2, so we get a factor of (1+t^2)^3. For the remaining 4 singleton orbits, we get a factor of (1+t)^4, so the summand is P_g(t) = (1+t^2)^3 (1+t)^4.
  • g=(abc): the orbits are {de}, {adbdcd}, {aebece} and {abbcac}. Each of the three 3-element sets contributes a factor of (1+t^3) while the singleton set contributes (1+t), hence the summand is (1+t^3)^3(1+t).

Eventually, we get all the seven power series.

group_table_v2Hence, P(t) equals:

\begin{aligned}\frac 1 {120}(&(1+t)^{10}+10(1+t^2)^3(1+t)^4 + 20(1+t^3)^3(1+t) + 30(1+t^4)^2(1+t^2)\\&+24(1+t^5)^2 + 15(1+t^2)^4(1+t)^2+20(1+t)(1+t^3)(1+t^6)),\end{aligned}

which gives 1+t+2t^2 + 4t^3+6t^4+6t^5+6t^6 + 4t^7+2t^8+t^9+t^{10}. Thus, there are 1 graph with 1 edge, 2 graphs with 2 edges, 4 graphs with 3 edges, … up to isomorphism.

blue-lin

Polya Enumeration Theorem Revisited

Inspired by the above example, we define:

Definition. Let G be a finite group acting on finite set X of size k. Then the cycle index is the polynomial:

Z_G(t_1, t_2, \ldots, t_k) := \frac 1{|G|}\sum_{g\in G} t_1^{j_1(g)} t_2^{j_2(g)} \ldots t_k^{j_k(g)},

where j_c(g) is the number of cycles (or orbits) in g of length c.

Examples.

  1. Consider S3 acting on {1, 2, 3}. Then each term gives:
    • g=e : t_1^3;
    • [3×] g=(a, b): t_1 t_2;
    • [2×] g=(a, b, c): t_3, so we have Z_G(t_1, t_2, t_3) = \frac 1 6(t_1^3 + 3t_1 t_2 + 2t_3).
  2. Consider G\cong S_5 acting on the set of edges E earlier. We have:

Z_G(t_1, t_2, \ldots, t_{10}) = \frac 1 {120}(t_1^{10}+10t_1^4 t_2^3 + 20 t_1 t_3^3 + 30t_2 t_4^2 + 24t_5^2 + 15t_1 t_2^4 + 20t_1 t_3 t_6).

Now we’re ready to describe a more general form of Polya enumeration theorem. As before, G acts on set X, and we consider the set of all colourings X→Y. But now each colour is endowed with a weight. If we let X_i be the set of colourings with weight i, then our objective is to compute n_i := |G\backslash X_i| via the power series

P(t) := \sum_{i\ge 0} n_i t^i.

As worked out before, to do that we need to compute:

P_g(t) := \sum_{i\ge 0} |X_i^g| t^i, for each g in G.

Now if we write g as a product of cycles: g = (cycle 1)(cycle 2) … , where there are j_c(g) cycles of length c, then each cycle must be of uniform colour! In other words, each cycle contributes a term of (f_0+f_1 t^c + f_2 t^{2c} + f_3 t^{3c}+ \ldots) where f_w is the number of colours of weight w. Hence:

\begin{aligned}P_g(t) = &(f_0+f_1 t + f_2 t^2 +\ldots)^{j_1(g)} (f_0+f_1 t^2 + f_2 t^4+\ldots)^{j_2(g)}\\ &(f_0+f_1 t^3+f_2 t^6+\ldots)^{j_3(g)} \ldots\end{aligned}

Writing f(x) = f_0 + f_1 x + f_2 x^2 + \ldots, this gives: P_g(t) = f(t)^{j_1(g)} f(t^2)^{j_2(g)} f(t^3)^{j_3(g)}\ldots. Summing up:

P(t) = \frac 1 {|G|} \sum_{g\in G} P_g(t) = Z_G(f(t), f(t^2), f(t^3), \ldots).

Let’s summarise all we’ve learnt.

Polya Enumeration Theorem B. Suppose we have f_w colours of weight w and f(x) := \sum_w f_w x^w. Then the number of colourings with a specified weight is given by the generating function:

P(t) =Z_G(f(t), f(t^2), f(t^3), \ldots).

E.g. in our earlier example of graph counting, we have f(x) = 1+x so the resulting generating function is P(t) = Z_G(1+t, 1+t^2, 1+t^3, \ldots) as we saw above.

Example 1

On a 2k × 2k chessboard, each square is to be coloured black or white. Two colourings are identical if they can be obtained from each other by rotation or reflection. Find the generating function for the number of colourings with a specified number of black squares.

Solution

Let’s consider the symmetries of a square.

colour_symmetries_2

In the first case, we get k2 cycles of length 4 each. In the second and third cases, we get 2k2 cycles of length 2 each. In the final case, we get 2k cycles of length 1 and k(2k-1) cycles of length 2. Hence:

Z_G(t_1, t_2, \ldots) = \frac 1 8 (t_1^{4k^2} + 2t_1^{2k}t_2^{k(2k-1)} + 3t_2^{2k^2} + 2t_4^{k^2}).

The final answer is thus:

\begin{aligned}f(t) &= Z_G(1+t, 1+t^2, \ldots)\\&= \frac 1 8 ((1+t)^{4k^2} + 2(1+t)^{2k}(1+t^2)^{k(2k-1)} + 3(1+t^2)^{2k^2} + 2(1+t^4)^{k^2}).\end{aligned}

Here are some small examples.

  • k=1: \begin{aligned} f(t) = t^4+t^3+2t^2+t+1\end{aligned};
  • k=2: \begin{aligned} f(t)=&t^{16}+3t^{15}+21t^{14}+77t^{13}+252t^{12} +567t^{11}+1051 t^{10}+1465 t^9\\&+1674 t^8+1465 t^7+1051 t^6+567 t^5+252 t^4+77 t^3+21 t^2+3 t+1.\end{aligned}

blue-linMore generally, we can consider colours with vectorial weights. For example, if each colour has weight (vw), then we write:

f(x,y) = \sum_{v,w} f_{v,w} x^v y^w

where f_{v,w} denotes the number of colours of weight (vw). Then Polya’s enumeration theorem can be further generalised: the number of colourings with a specified vectorial weight (vw) is given by the coefficient of t^v u^w in:

P(t, u) = Z_G(f(t,u), f(t^2, u^2), f(t^3, u^3), \ldots).

Example 2

Find the number of ways to colour the 12 edges of a cube with {red, green blue} such that each colour occurs on an even number of edges. Two colourings are considered identical if they’re obtained from each other via a rotation.

Solution

Label the 12 edges of a cube as follows, together with three axes of rotation.

cube_labels

Let’s compute the orbits of the various rotations. There should be 24 rotations in all.

  • Identity: 12 singleton orbits, {1}, {2}, …, {12}.
  • [6×] Rotation about axis 1 (90 deg): {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}.
  • [3×] Rotation about axis 1 (180 deg): {1, 3}, {2, 4}, {5, 7}, {6, 8}, {9, 11}, {10, 12}.
  • [6×] Rotation about axis 2: {5}, {7}, and 5 orbits of size 2.
  • [8×] Rotation about axis 3: 4 orbits of size 3.

Hence the cycle index is:

Z_G(t_1, t_2, \ldots) = \frac 1 {24}(t_1^{12}+6t_4^3 + 3t_2^6+6 t_1^2 t_2^5 + 8t_3^4).

[ Note: without tracing the exact orbits, we can conclude that the rotation about axis 2 has exactly five orbits of size two. Indeed, since the rotation has order 2, each cycle size is either 1 or 2. The only fixed edges are 5 and 7 so the remaining edges must be paired into five orbits of size two each. ]

We need to compute the number of colourings where green and blue each occurs an even number of times, so assign weight to each colour: red=(0, 0), green=(1, 0) and blue=(0, 1). The weight polynomial is f(xy) = 1+x+y so the resulting generating function is:

\begin{aligned} P(t, u)= &\frac 1 {24}[(1+t+u)^{12} + 6(1+t^4+u^4)^3 + 3(1+t^2+u^2)^6\\&+6(1+t+u)^2 (1+t^2+u^2)^5 + 8(1+t^3+u^3)^4].\end{aligned}

To sum up the coefficients of t^i u^j where i and j are both even, compute the sum

\frac 1 4(P(1, 1)+P(1,-1)+P(-1,1)+P(-1,-1))=5823. ♦

blue-linThe Case of Full Symmetric Group

Suppose |X|=k and |Y|=n and G is the full symmetric group S(X) \cong S_k. In the previous article, considering this group led us to the formulation of Stirling numbers. Let’s consider the case of the cycle index:

Z_G(t_1, t_2, \ldots) = \sum_{g\in S_k} t_1^{j_1(g)} t_2^{j_2(g)}\ldots.

What can we say about this power series?

To answer them, let’s assign some weights to the n colours. To distinguish all the n distinct colours, we’ll assign an (n-component) vectorial weight:

colour 1 ↔ (1, 0, …, 0),   colour 2 ↔ (0, 1, …, 0),   … …,  colour n ↔ (0, 0, …, 1).

Hence, the weight of a colouring tells us the number of occurrences of each colour. The corresponding weight polynomial is then a linear function in n variables:

f(x_1, x_2, \ldots, x_n) = x_1 + x_2 + \ldots + x_n.

Applying the multivariate version of the Polya enumeration theorem, we see that the number of colourings of a specified weight (w_1, w_2, \ldots, w_n) is given by the generating function:

Z_G(f(u_1, u_2, \ldots, u_n), f(u_1^2, u_2^2, \ldots, u_n^2), \ldots) = Z_G(p_1, p_2, p_3, \ldots),

where p_i = u_1^i + u_2^i + \ldots + u_n^i is the i-th power symmetric function. The question is: how many colourings are of weight (w_1, w_2, \ldots, w_n), up to symmetry? Since we have the full symmetric group acting on X, the answer is obvious: 1 if \sum_i w_i = k and 0 otherwise. Now if we multiply this by an auxiliary t^k and sum over all k=|X|, we get the resulting power series:

\begin{aligned}\sum_{k=0}^\infty Z_{S_k}(p_1, p_2, p_3, \ldots) t^k &= \sum_{w_1, \ldots,w_n\ge 0} u_1^{w_1} u_2^{w_2}\ldots u_n^{w_n}\cdot t^{\sum w_i}\\&= \frac 1 {1-u_1t}\cdot \frac 1{1-u_2t}\cdot\dots \cdot \frac 1 {1-u_n t}.\end{aligned}

Writing each term

(1-u_i t)^{-1} = \exp(\log(1-u_i t)) = \exp\left(u_i t + \frac {u_i^2 t^2} 2 + \frac{u_i^3 t^3} 3 + \ldots\right),

we get the final result:

\begin{aligned}\sum_{k=0}^\infty Z_{S_k}(p_1, p_2, \ldots)t^k = \exp\left(p_1 t + \frac {p_2} 2 t^2 + \frac {p_3} 3 t^3 + \ldots\right).\end{aligned}

Example

To compute the cycle index for G=S_5, let T = p_1 t + \frac {p_2} 2 t^2 + \frac{p_3}3 t^3 + \ldots. We need to compute the expansion of exp(T) until T5 and find the coefficient of t5.

  • T : p_5/5;
  • T^2 : p_1 p_4/2 + p_2 p_3/3;
  • T^3 : p_1^2 p_3 + 3p_1 p_2^2/4;
  • T^4 : 2p_1^3 p_2;
  • T^5 : p_1^5.

Hence, the final sum is

\begin{aligned}&\frac{p_5}5 + \frac 1 {2!}\left(\frac{p_1 p_4}2 + \frac{p_2 p_3}3\right) + \frac 1 {3!}\left(p_1^2 p_3 + \frac {3 p_1 p_2^2}4\right) + \frac 1 {4!}\left(2p_1^3 p_2\right) + \frac 1 {5!}p_1^5 \\= &\frac 1 {120}\left(24p_5 + 30p_1p_4 + 20p_2p_3 + 20p_1^2 p_3 + 15 p_1 p_2^2 +10p_1^3 p_2 + p_1^5\right).\end{aligned}

In conclusion, in the group S_5, there are:

  • 1 identity element,
  • 24 elements of type (abcde),
  • 30 elements of type (abcd),
  • 20 elements of type (ab)(cde),
  • 20 elements of type (abc),
  • 15 elements of type (ab)(cd), and
  • 10 elements of type (ab).

All these were obtained via generating functions.

Posted in Notes | Tagged , , , , , , , , | Leave a comment

Burnside’s Lemma and Polya Enumeration Theorem (1)

[ Note: this article assumes you know some rudimentary theory of group actions. ]

Let’s consider the following combinatorial problem.

Problem. ABC is a given equilateral triangle. We wish to colour each of the three vertices A, B and C by one out of n possible colours. Furthermore, two colourings are considered identical if we can obtain one from the other by rotating or reflecting the triangle. How many distinct colourings are there?

For example, all the following colourings are considered identical.

colour_symmetries

Note that the symmetries comprise of three rotations in the top row and three reflections in the bottom row. More generally, the group of symmetries of a regular n-gon comprises of n rotations – including the identity rotation – as well as n reflections; this is called the dihedral group of order n.

For our case, the dihedral group D3 is isomorphic to the full symmetric group S3. Relabeling the vertices AB and C as 1, 2 and 3, the six symmetries above are respectively given by 1, (1, 3, 2), (1, 2, 3), (2, 3), (1, 3) and (1, 2).

Without symmetries, we clearly have n^3 colourings since each vertex can take n colours independently. Letting X denote the set of these colourings, the group S3 acts on X by permuting the vertices. Our objective is now to compute |S_3\backslash X|, the number of orbits under this group action. To that end, the following result is phenomenally useful.

Burnside’s Lemma. Let G be a finite group acting on a finite set X. Then:

|G\backslash X| = \frac 1 {|G|}\sum_{g\in G} |X^g|,

where X^g=\{x \in X: gx = x\} is the set of all elements of X fixed by g.

Proof.

Consider the set S = \{(g,x)\in G\times X: gx = x\}. We’ll compute |S| in two ways. Summing over g, we get |S| = \sum_{g\in G} X^g, the RHS.

Summing over x, we get |S| = \sum_{x\in X} |G_x|, where G_x is the isotropy group of x and |G_x| = |G|/|Gx|, where Gx is the orbit of x. Hence,

|S| = \sum_{x\in X}\frac{|G|}{|Gx|} = |G|\sum_{x\in X} \frac 1 {|Gx|} = |G|\times |G\backslash X|,

where the last equality follows from the fact that since each element x contributes 1/(orbit size) to the sum, the grand total is just the number of orbits. Matching the two expressions, we’re done. ♦

Solution to Our Problem.

To compute |S_3\backslash X|, let’s compute |X^g| for each element g\in S_3:

  • g=e : all colourings are fixed, so |X^g| = |X| = n^3;
  • g=(1, 2): a colouring lies in Xg if and only if vertices 1 and 2 have the same colour, this gives |X^g| = n^2;
  • g=(2, 3), (1, 3): similarly, |X^g| = n^2;
  • g=(1, 2, 3), (1, 3, 2): a colouring lies in Xg if and only if all vertices have the same colour; thus |X^g| = n.

Hence, Burnside’s lemma tells us the answer to our problem is:

\frac 1 6 (n^3 + 3n^2 + 2n) = \frac {n(n+1)(n+2)}6. ♦

Exercise.

Find the number of colourings of the vertices of a square ABCD by n colours, where two colourings are identical if they can be obtained from each other by rotation or reflection.  What about a regular pentagon ABCDE? Or a solid tetrahedron ABCD? Or a solid tetrahedron excluding reflections?

Answer (highlight to read)

  • For square: n(n+1)(n2+n+2)/8.
  • For pentagon: n(n2+1)(n2+4)/10.
  • For tetrahedron with reflections: n(n+1)(n+2)(n+3)/24.
  • For tetrahedron without reflections: n2(n2+11)/12.

blue-lin

Polya Enumeration Theorem

A large class of problems can be classified under the case of “colouring problems“. Namely, suppose finite group G acts on a finite set of points X. If Y denotes a finite set of colours, the set of colourings is denoted by S = Y^X, which is the set of all functions X→Y. The action of G on X extends to an action on S as follows:

If g\in G and f:X\to Y is a colouring, then the action is given by the composition

g\cdot f = f\circ g^{-1}:X\to Y.

[ The inverse is necessary so that (g_1g_2)\cdot f = g_1\cdot(g_2\cdot f). ]

Burnside’s lemma then tells us that |G\backslash S| = \frac 1{|G|} \sum_{g\in G}|S^g|. But |S^g| is not hard to compute: if we express the action of g on X as a permutation, and c(g) denotes its number of cycles (or orbits), then |S^g| = |Y|^{c(g)} since the points in each cycle must share the same colour. This gives:

|G\backslash S| = \frac 1{|G|} \sum_{g\in G} n^{c(g)}, where n = |Y|.

In summary…

Polya Enumeration Theorem A. For a finite group G acting on a finite set X, the number of colourings X→Y up to symmetry (two colourings are identical if they can be obtained from each other via G-action) is given by:

\frac 1{|G|} \sum_{g\in G} |Y|^{c(g)},

which is a polynomial in |Y|.

For example, in the case where X = {1, 2, 3} in our first problem. Then:

  • c(e) = 3;
  • c((1,2)) = c((2,3)) = c((1,3)) = 2;
  • c((1,2,3)) = c((1,3,2)) = 1.

And we obtain \frac 1 6 (n^3 + 3n^2 + 2n) as above. Let’s consider more examples.

Example 1

On a 2k × 2k chessboard, each square is coloured with one out of n possible colours. Two colourings are identical if they can be obtained from each other by reflecting or rotating the chessboard. Find the number of colourings.

Solution

Let’s consider the 8 symmetries of the square:

colour_symmetries_2

The grey area shows the orbit representatives when we let each symmetry act on the 4n2 squares. From the above diagram and Polya enumeration theorem, we get:

number of unique colourings = \frac 1 8 (1\times n^{4k^2} + 2\times n^{k^2} + 3\times n^{2k^2} + 2\times n^{k(2k+1)}).

Notice that when k=1, this simplifies to \frac 1 8(n^4 + 2n^3 + 3n^2 + 2n) which is precisely the number of ways to colour the four vertices of a square with n colours, up to rotational/reflection symmetry. ♦

Example 2

Consider a 3 × 3 chessboard. We wish to colour each of its 9 squares with one out of n colours. Two colourings are identical if we can obtain one from the other by first permuting the rows and then permuting the columns. How many unique colourings are there?

colour_symmetries_3

Solution

First note that we have G=S3 acting on the set of 9 squares in two distinct ways. If S denotes the set of columns and T denotes the set of rows, then the whole board is simply S × T, with G × G acting on it component wise.

For each (g, h)\in G\times G, consider the number of orbits c((gh)). This only depends on the conjugancy classes of g and h. For example:

  • g=(1, 2), h=(1, 3): the orbits of (gh) are given by {(1, 1), (2, 3)}, {(1, 2), (2, 2)}{(1, 3), (2, 1)}, {(3, 1), (3, 3)}, and {(3, 2)};
  • g=(1, 2, 3), h=(1, 2, 3): the orbits of (gh) are {(1, 1), (2, 2), (3, 3)}, {(1, 2), (2, 3), (3, 1)} and {(1, 3), (2, 1), (3, 2)}.

This gives us the following table:

group_tableso the solution is \frac 1 {36} (n^9 + 6n^6 + 9n^5+8n^3 + 12n^2). ♦

blue-lin

Stirling Numbers of the First Kind

Suppose the full symmetric group GS(X) acts on X, and |X|=k, |Y|=n. Recall that Polya theorem gives the number of colourings as a polynomial in n (depending on k):

\frac 1 {k!}\sum_{g\in S_k} n^{c(g)},

where the coefficient of n^c is the number of elements g\in S_k with exactly c orbits.

On the other hand, we can simplify the expression by elementary combinatorics! Indeed, since we can permute the elements of X freely, the colouring is uniquely determined by the number of occurrences of each colour. To put it explicitly, denote the colours by 1, 2, …, n. Then we can permute the elements of X so that the colours occur in increasing order: (1, 1, …, 1, 2, 2, …, 2, 3, 3,… ).

Hence each colouring corresponds to a solution to the linear Diophantine equation:

x_1 + x_2 + x_3 + \ldots + x_n = k,

where each x_i\ge 0 is the number of occurrences of colour i. Elementary combinatorics then tell us the number of solutions is:

{n+k-1\choose k} = \frac 1 {k!} (n+k-1)(n+k-2)\ldots(n+1)n.

Equating the two expressions, we obtain:

Theorem. The number of elements of g\in S_k with exactly c orbits is given by the coefficient of x^c in the polynomial:

x(x+1)(x+2)\ldots (x+k-1).

This number is denoted by \left[\begin{matrix}k\\c\end{matrix}\right] and is called a Stirling number of the first type.

For example, for k=5, we get:

x(x+1)(x+2)(x+3)(x+4) = x^5 + 10x^4 + 35x^3 + 50x^2 + 24x,

so there are 35 elements of S_5 with exactly 3 cycles (or orbits). These are of the form:

  • (a)(b)(cde) : C(5, 3) × 2 = 20;
  • (a)(bc)(de) : C(5, 2) × C(3, 2) / 2 = 15.

Exercises

  1. Find the number of ways of colouring the 8 vertices of a cube with n colours, where two colourings are identical if one can be obtained from the other via rotation. [ Hint: there are 24 rotations of a cube. ]
  2. Repeat example 1 above for a (2k+1) × (2k+1) chessboard, with n colours.
  3. Find the number of ways to place k chairs around a table if there are n types of chairs available (each type of unlimited quantity). Two placings are identical if they can be obtained from each other via a rotation.
  4. On a 8 × 8 chessboard, each square is coloured black or white such that every row and every column have an even number of black squares. Two colourings are identical if they can be obtained from each other by reflecting / rotating the square. Find the number of unique colourings.
  5. Prove the following identities involving Stirling numbers of the first kind.
    1. \left[\begin{matrix}n+1\\k\end{matrix}\right] = n\left[\begin{matrix}n\\k\end{matrix}\right] + \left[\begin{matrix}n\\k-1\end{matrix}\right]
    2. \left[\begin{matrix}n\\n-2\end{matrix}\right] = \frac 1 4(3n-1){n\choose 3}
    3. \left[\begin{matrix}n\\2\end{matrix}\right] = (n-1)! (1 + \frac 1 2 + \frac 1 3 + \dots + \frac 1 {n-1}).

Answers (Highlight to Read)

  1. Consider the rotations of a cube through an axis, with the corresponding terms in Polya formula:
    • identity: c = 8;
    • axis passes through a face (90 deg): c = 2 for 3×2 = 6 cases;
    • axis passes through a face (180 deg): c = 4 for 3 cases;
    • axis passes through an edge: c = 4 for 6 cases;
    • axis passes through a vertex: c = 4 for 4×2 = 8 cases.
    • Thus, answer = (n8 + 17n4 + 6n2)/24.
  2. The terms in Polya formula:
    • rotation of 90 deg: = k2k + 1;
    • rotation of 180 deg: c = 2k2 + 2k + 1;
    • reflection about horizontal/vertical axis (2×): c = (k+1)(2k+1);
    • reflection about diagonal (2×): c = (k+1)(2k+1).
    • Thus, answer = (n^(4k2 +4k + 1) + 2n^(k2 + k + 1) + n^(2k2 + 2k + 1) + 4n^((k+1)(2k+1)))/8.
  3. For each rotation of t steps, t = 0, …, k-1, we have c(t) = gcd(tk), where gcd(0, k) = k. On the other hand, for each d|k, there are exactly φ(k/d) numbers whose gcd with k is exactly d, where φ is the Euler totient function. Thus, the final sum is (1/k) ∑d|k φ(k/d)nd. Note that when k is prime, we get (nk + (k-1)n)/k, so Fermat’s Little Theorem occurs as a corollary.
  4. We can’t use Polya theorem directly, so let’s revert to Burnside’s lemma. The full set of colourings, excluding symmetry, is 249, by considering linear algebra over Z/2.
    • Symmetry under ±90 deg rotation: we pick the top-left 4×4 square as a set of representatives. However, we have 3 linear relations (a priori 4, but 1 is superfluous), so total number = 213.
    • Symmetry under 180 deg rotation: pick the top-half 4×8 rectangle as a set of representatives. With 7 linear relations (a priori 8, but 1 is superfluous), this gives 225.
    • Symmetry under reflection about horizontal axis: pick top half rectangle as representatives. With 4 linear relations, this gives 228. Same with vertical.
    • Symmetry under reflection about diagonal axis: pick the 36 squares on or above the main diagonal. With 8 linear relations, this gives 228.
    • Final answer = (249 + 2 × 213 + 225 + 4 × 228)/8.
Posted in Notes | Tagged , , , , , , , | Leave a comment

Basic Analysis: Closed Subsets and Uniform Continuity

Let’s consider another question: suppose fD → R is continuous, where D is a subset of R. If (xn) is a sequence in D converging to some real L, is it true that (f(xn)) is also convergent? Now if L is in D, then we know that (f(xn)) → (f(L)). But what if it’s not?

Examples

  1. Let f(x) = 1/x, defined on (0, ∞). Take the sequence x= 1/n which converges to 0. Then the resulting sequence (f(xn)) = (n) diverges.
  2. Let f(x) = sin(1/x), defined on (0, ∞). Then the sequence x= 2/((2n-1)π) converges to 0 but the resulting (f(xn)) = (1, -1, 1, -1, … ) diverges.

Closed Subsets

Let’s first consider the case where the limit L must lie in D. To that end, we have the following result.

Proposition. For an arbitrary subset D\subseteq \mathbf{R}, the following are equivalent.

  1. If (xn) is a sequence in D converging to L, then L is in D.
  2. D contains all its points of accumulations.
  3. The complement Dc = R-D is an open subset of R.

Proof.

(1)→(2) : Let x be a point of accumulation of D. Hence for each ε = 1, 1/2, 1/3, …, there exists x_n\in D, such that 0 < |xnx| < 1/n for each n. Then (xn)→x must be in L.

(2)→(3) : Let a\in D^c. Since a is not in D it’s not a point of accumulation of D. By definition, this means that there exists ε>0 such that the set of x satisfying 0 < |xa| < ε does not intersect D. Together with the fact that a is not in D, we see that N(a,\epsilon)\subseteq D^c, so Dc is open.

(3)→(1) : Let (xn)→L. If L\in D^c, an open subset of R, there must be an open ball N(L,\epsilon)\subseteq D^c. On the other hand, since (xn)→L, some x_n\in D must lie within N(L, ε), which is a contradiction. ♦

Definition. We say that a subset U\subset D is a closed subset if D-U is an open subset of D.

Warning. A subset of D can be both open and closed. E.g. the empty set (or the entire set D) is so.

Useful note: in particular, a closed and bounded subset D of R must have a maximum and minimum. Indeed, since D is bounded, let M be its supremum; we need to show that M lies in D. Suppose it doesn’t. For any positive ε, M-ε is not an upper bound of D, so there exists an element x\in Dx > M-ε and x < M. This shows that M is a point of accumulation of D, and by the above proposition, it must lie in D, which contradicts our assumption. Thus M is a maximum of D. The case for minimum is similar.

Examples

  1. Consider the set D = [1, 2]\cup \{3\}. It is closed in R since the complement is (-\infty, 1)\cup (2, 3)\cup (3,\infty) which is a union of open intervals. Clearly D is bounded. Being both closed and bounded, it has a minimum and a maximum (in this case, 1 and 3 respectively).
  2. Consider the set D = [1, 2). It is not closed because its complement is E=(-\infty, 1)\cup [2, \infty) and the point 2 has no open neighbourhood of R which is wholly contained in E. Or, look at it another way, if it were closed,  then since it’s obviously bounded it would contain a maximum and a minimum, but 2 is the supremum of D which is not contained in D itself.
  3. Consider the set D = \{0\} \cup [1, \infty). The complement is (-\infty, 0)\cup (0, 1) which is a union of open intervals and is hence open. But it’s not upper-bounded so it doesn’t have a maximum.

From the above proposition, if D is a closed subset of R and fD → R is continuous, then any convergent sequence in D gets mapped to a convergent sequence.

Important Exercises

  1. Prove that fD → R is continuous if and only if for each closed subset C of R, the set f^{-1}(C) is closed in D.
  2. Find a continuous function f : R → R and a closed subset C of R such that f(C) is not closed in R.
  3. Prove that a union of finitely many closed subsets of D is still closed in D.
  4. Prove that an intersection of (possibly infinitely many) closed subsets of D is still closed in D.
  5. Suppose E\subseteq D.
    1. Prove that if C is closed in D, then B := C ∩ E is closed in E.
    2. Prove that conversely, if B is closed in E, then BC ∩ E for some closed subset C of D.

Answers [Highlight to read. Compare with the earlier case of open subsets. ]

  1. If C is closed in R, then D – f-1(C) = f-1(D – C) is open in D iff f-1(C) is closed in D. Then use the fact that f is continuous iff for each open subset U of Rf-1(U) is open in D.
  2. Let f be defined by f(x)=1/x for x>1 and f(x)=1 for x≤1. The set C=[1, ∞) is closed in but f(C) = (0, 1] is not closed in R.
  3. Follows from the fact that the intersection of finitely many open subsets of D is also open in D.
  4. Follows from the fact that the union of arbitrarily many open subsets of D is also open in D.
  5. For (A), if C is closed in D, then DC is open in D and EBE – (C ∩ E) = (DC) ∩ E is open in E by definition. Thus B is closed in E. For (B), since EB is open in E, we can write EBU ∩ E for some open subset U of D. Then BE – (U ∩ E) = (DU) ∩ E where DU is closed in D.

Uniform Continuity

Now let’s consider the case where D is not closed and (xn) approaches an L not in D.

Definition. The function f:D\to\mathbf{R} is uniformly continuous (on D) if:

  • for any ε>0, there is a δ>0 such that when x, y in D satisfies |x-y|<δ, we have |f(x)-f(y)|<ε.

Pictorially:

Once again we note the difference between standard continuity and uniform continuity. In the former case:

  • For every y in D and ε>0, there exists δ>0 such that for all x in D satisfying |xy|<δ, we have |f(x)-f(y)|<ε.

In the latter case:

  • For every ε>0, there exists δ>0 such that for all x, y in D satisfying |xy|<δ, we have |f(x)-f(y)|<ε.

Using logical quantifiers, these can be succinctly expressed as:

\forall \epsilon, \forall y, \exists \delta, \forall x, P(x,y,\delta,\epsilon) and \forall \epsilon,\exists\delta, \forall y, \forall x, P(x,y,\delta,\epsilon),

where P(xy, δ, ε) says “|xy|<δ implies |f(x)-f(y)|<ε”.

Examples of Uniform Continuity

  1. Let f(x)=x on any subset S of R. We claim that f is uniformly continuous. Indeed, for any ε>0, let δ=ε. Then whenever |xy| < δ and xy in S, we easily get |f(x)-f(y)| = |xy| < δ = ε, which satisfies the definition for uniform continuity.
  2. Let f(x)=1/x on (0, ∞). We shall prove that f is not uniformly continuous. Negating the definition of uniform continuity, we need to find an ε>0 such that for any δ>0, we can find xy>0 with |xy|<δ but |1/x-1/y|≥ε. So let’s take ε=1; for any δ>0, pick x=δ and = δ/(δ+1) < x. Then |xy| = xy < δ but |1/x-1/y|=1.
  3. On the other hand, f(x)=1/x on [1, 2] is uniformly continuous. Indeed, whenever 1 ≤ xy ≤ 2, we have |1/x – 1/y| = |xy|/|xy| ≤ |xy| since |xy| ≥ 1. Hence, given any ε>0, we can just let δ=ε. Now whenever xy in [1, 2] satisfy |xy| < δ, we also have |f(x)-f(y)| ≤ |xy| < δ = ε.

Examples 2 and 3 tell us that uniform continuity depends heavily on the domain of the function.

Main Theorems

The first main result in this article is:

Theorem. If (xn) is a Cauchy sequence in D and f : D → R is a uniformly continuous function, then (f(xn)) is also a Cauchy sequence.

Proof. Let ε>0.

  • Since f is uniformly continuous, pick δ>0 such that when xy in D satisfies |xy|<δ, we have |f(x)-f(y)|<ε.
  • Since (xn) is Cauchy, pick an N such that when mnN, we have |xmxn|<δ.

This implies that when mnN, |f(xm)-f(xn)|<ε. ♦

For example, consider f(x)=1/x on (0, ∞). The sequence x_n = 1/n is Cauchy but the resulting f(x_n) = n is not a Cauchy sequence. Hence, this tells us f(x) is not uniformly continuous, as we had verified earlier directly in example 2 above.

Finally, let’s combine uniform convergence and uniform continuity.

Theorem. Suppose f_n:D \to \mathbf{R} is a sequence of uniformly continuous functions on D, which converges uniformly to f:D\to\mathbf{R}. Then f:D\to\mathbf{R} is also uniformly continuous.

Proof.

Let ε>0. Then by uniformly convergence, there exists N such that whenever nN, we have ||f_n-f||<\epsilon. Fix n. Since f_n:D\to\mathbf{R} is uniformly continuous, there exists δ>0 such that whenever x,y\in D satisfies |xy|<δ, we also have |f_n(x)-f_n(y)|<\epsilon. Hence:

|f(x)-f(y)| \le |f(x)-f_n(x)|+|f_n(x)-f_n(y)|+|f_n(y)-f(y)|<\epsilon+\epsilon+\epsilon = 3\epsilon.

Now just go back and replace ε by ε/3. ♦

Posted in Notes | Tagged , , , , , , | Leave a comment

Basic Analysis: Uniform Convergence

Once again, let D\subseteq \mathbf{R} be a subset. Suppose we now have a sequence of functions

f_n : D\to\mathbf{R}, where n = 1, 2, 3, … ,

such that for each x in D, the sequence (f_n(x)) converges to some real value. We’ll denote this value by f(x), thus defining a function fD → R. We then say that the sequence of functions fn(x) converges to f(xpointwise.

The big question is: if each fn(x) is continuous at x=a, is it true that f(x) is also continuous at x=a?

It may surprise you that the answer is no. [ Historical note: Cauchy actually believed the answer is yes and provided an erroneous proof. This shouldn’t be held against Cauchy since analysis was still in its infancy in the 19th century. On the contrary, one should use this as an indication of how counter-intuitive results in analysis can be. ]

Example

Consider the sequence of functions

f_n : [0,1]\to\mathbf{R}, \quad f_n(x) = x^n.

If 0≤x<1, then clearly fn(x) → 0 as n approaches infinity. On the other hand, for x=1, the sequence is a constant sequence (1, 1, 1, …). Thus fn(x) converges to f(x) pointwise, where f takes any element in [0,1) to 0, and 1 to 1.

(Graphs provided by wolframalpha)

Uniform Convergence

In order to ensure that f is continuous, we need a further condition.

Definition. The sequence of functions fn(x) is said to converge uniformly to f(x) if:

  • for each ε>0, there exists N such that whenever n>N and x is in D, we have |fn(x)-f(x)|<ε.

Pictorially, we have:

[ For large n, the function fn(x) lies between f(x)-ε and f(x)+ε. ]

Let’s consider the difference between pointwise convergence and uniform convergence. In the former case, we have:

  • For each x in D and ε>0, there exists N such that for all n>N, we have |fn(x)-f(x)|<ε.

On the other hand, the latter says:

  • For each ε>0, there exists N such that for all x in D and n>N, we have |fn(x)-f(x)|<ε.

Using logical quantifiers, the two statements become:

\forall x,\forall\epsilon, \exists N, P(N,x,\epsilon) and \forall \epsilon, \exists N, \forall x, P(N,x,\epsilon),

where P(Nx, ε) says that whenever n>N, we have |fn(x)-f(x)|<ε. Written in this manner, it’s clear that uniform convergence implies pointwise convergence.

Main Therorem. If fn(x) converges to f(x) uniformly, and each fn(x) is continuous at x=a, then so is f(x).

Proof. Let ε>0.

  • Since fn(x) → f(x) uniformly, there exists N such that for all x in D and n>N, we have |fn(x)-f(x)|<ε/3. Fix any n>N.
  • Since fn(x) is continuous at x=a, there exists δ>0 such that for all x in D satisfying |xa|<δ, we have |fn(x)-fn(a)|<ε/3, which gives:

|f(x)-f(a)|\le |f(x)-f_n(x)|+|f_n(x)-f_n(a)|+|f_n(a)-f(a)|<\frac\epsilon 3+\frac\epsilon 3+\frac\epsilon 3=\epsilon. ♦

Alternate Definition

Before going through some examples, we consider an alternate definition of uniform convergence which is somewhat conceptually clearer.

Consider the set V of all functions D → R. This forms a real vector space via pointwise addition and scalar multiplication: i.e.

(f+g)(x) := f(x) + g(x), \quad (c\cdot f)(x) = c\cdot f(x), \ c\in\mathbf{R}.

Now we define the norm of f to be ||f|| := \sup\{|f(x)| : x\in D\} which is either a non-negative real number or +∞.  It’s not hard to show that ||f+g|| \le ||f||+||g||, ||fg|| \le ||f||\cdot||g|| and ||c\cdot f|| = |c|\cdot ||f||.

[ Indeed, to prove the first inequality, just note that for any x in D, we have |(f+g)(x)| = |f(x)+g(x)| ≤ |f(x)|+|g(x)| ≤ RHS. The same proof holds for the second inequality. ]

Alternate Definition. The sequence f_n(x) \to f(x) uniformly if:

  • for every ε>0, there exists N such that when n>N, we have ||f_n - f||<\epsilon.

Note that this definition is quite similar to that of convergent sequences, which makes it convenient to work with. [ Side note: the acute reader may suspect that one can study analysis on the above function space V and he’d be right! Indeed, this is the primary motivation behind functional analysis. ]

Examples

  1. Let’s go back to the example f_n(x) = x^n on D=[0,1]. Since the functions converge to a discontinuous function, the above theorem tells us the convergence is not uniform, but let’s verify this directly. Indeed, the function f_n-f takes xn on [0,1) and 0 at x=1. Thus, ||f_n-f||=1 for all n, which shows that the convergence is not uniform.
  2. On the other hand, if we take f_n(x) = x^n on D=[0, 1/2], then they converge to f(x) = 0 uniformly. Indeed, we have ||f_n - f|| = ||f_n|| = 1/2^n for each n, so \lim_{n\to\infty} ||f_n - f|| = 0. As expected, since each fn(x) is continuous, so is the resulting converged function f(x).
  3. If we now take f_n(x) = x^n on D=[0, 1), then they converge to f(x) = 0, albeit not uniformly. Indeed, it’s easy to see that ||f_n - f|| = 1 for each nIn other words, one can have pointwise convergence of a sequence of continuous functions to a continuous function, even if the convergence is not uniform.
  4. Consider the Riemann zeta function f(x)=\zeta(x) = 1^{-x} + 2^{-x} + 3^{-x} + \ldots on (1, ∞). We know this converges at every x>1, but is the function continuous? Indeed, it is: via the following useful trick. First, write f_n(x) = 1^{-x} + 2^{-x} + \ldots + n^{-x}, so that \lim_{n\to\infty} f_n(x) \to \zeta(x) pointwise. Suppose we wish to show that ζ(x) is continuous at x=a>1.
    • Pick any b, 1<b<a.
    • Let gn (resp. g) be the restriction of fn (resp. f) to [b, ∞).
    • Now ||g_n - g||=(n+1)^{-x} + (n+2)^{-x} + \ldots \le (n+1)^{-b} + (n+2)^{-b} + \ldots which is a convergent series since b>1.
    • Hence ||g_n-g|| \to 0, which shows that gn converges uniformly to g.
    • Thus means g is continuous, in particular f is continuous at a. Hence, f is continuous at any a>1.

blue-linBasic Properties of Uniform Convergence

Let’s figure out some basic properties, specifically suppose f_n : D\to \mathbf{R} and g_n : D\to\mathbf{R} are sequences of functions which converge uniformly to f, g:D\to\mathbf{R} respectively. What about the sum f_n+g_n, product f_n g_n and reciprocal 1/g_n?

1. The case of sum is easy.

Indeed, f_n + g_n \to f+g uniformly. To see why, write:

||(f_n+g_n)-(f+g)|| = ||(f_n-f)+(g_n-g)|| \le ||f_n-f|| + ||g_n-g||.

Now, for any \epsilon>0, ||f_n - f|| < \frac\epsilon 2 and ||g_n-g|| < \frac\epsilon 2 for large n. Thus, we also have ||(f_n+g_n)-(f+g)|| < \epsilon for large n.

2. The case of product is true with some additional conditions.

Indeed, let’s write:

||f_n g_n - fg || = ||(f_n-f)g_n + f(g_n - g)|| \le ||f_n - f||\cdot ||g_n|| + ||f||\cdot ||g_n - g||.

Now, if ||f|| and ||g|| are both finite, then in particular, we have ||g_n|| \le ||g_n - g|| + ||g|| < ||g||+1 for large n. It then follows that we can write ||f||, ||g|| < B for some positive bound B and ||f_n g_n - fg|| \le B(||f_n - f|| + ||g_n - g||). Hence, it also follows that f_n g_n \to fg converges uniformly.

What if either ||f|| or ||g|| is infinite? A counter-example is easy:

f_n, g_n:\mathbf{R} \to\mathbf{R}, \ f_n(x) = x, g_n(x) = \frac 1 n.

Then f_n(x) \to x and g_n(x) \to 0 uniformly. However, ||f_n g_n - 0|| = \sup\{\frac {|x|} n: x\in\mathbf{R}\} = \infty.

3. The case of reciprocal is true with some additional conditions.

Let’s consider when 1/g_n \to 1/g uniformly, assuming that g(x) and gn(x) are never 0. Taking the difference gives:

||\frac{1}{g_n} - \frac{1}{g}|| = ||\frac{g_n - g}{g_n g}|| \le ||g_n -g|| \cdot ||\frac{1}{g_n}||\cdot ||\frac 1 g||.

Hence, suppose there is a positive B such that ||1/g_n|| < B for all n. This means |g_n(x)| > 1/B for all n, and x\in D. For some N, we have ||g_n - g|| < 1/(2B), i.e. |g_n(x) - g(x)| < 1/(2B) for all n>N and x\in D. Thus:

|g(x)| \ge |g_n(x)| - |g_n(x)-g(x)| > 1/B - 1/(2B) = 1/(2B), when x\in D, i.e. ||1/g||\le 2B.

Thus, ||\frac 1 {g_n}-\frac 1 g|| \le ||g_n - g||\cdot B\cdot(2B) for all n and it follows that 1/g_n \to 1/g converges uniformly.

What if such a B doesn’t exist, i.e. the infimum of all ||1/g_n|| is zero? Consider the example of g_n: (0, 1]\to\mathbf{R}, g_n(x)=(1+\frac 1 n)x \ . Then g_n(x) converges to g(x)=x uniformly since ||g_n-g|| = ||\frac x n||=\frac 1 n. On the other hand,

||\frac 1 {g_n}-\frac 1 g|| = ||\frac n {(n+1)x}-\frac 1 x|| = ||\frac 1 {(n+1)x}||=+\infty,

so 1/g_n \to 1/g pointwise, but not uniformly.

Summary. Suppose f_n \to f and g_n \to g uniformly, where each f_n, g_n, f, g:D\to\mathbf{R}. Then:

  • f_n+g_n\to f+g uniformly as well;
  • if ||f||, ||g||<\infty, then f_n g_n \to fg uniformly;
  • if there is a positive B such that ||g_n||>B for all n, then 1/g_n \to 1/g uniformly.
Posted in Notes | Tagged , , , , , , | 1 Comment

Basic Analysis: Differentiation (2)

Finding Extremum Points

One of the most common applications of differentiation is in finding all local maximum and minimum points.

Definition. We say f(x) has a local maximum (resp. minimum) at x=a, if there is an open interval (b, c) containing a, such that f(x) attains a maximum (resp. minimum) in this interval at x=a.

Theorem. If f(x) has a local maximum or minimum at x=a and f'(a) exists, then f'(a) = 0.

localextreme

Proof.

We’ll only prove it for maximum (for minimum, replace f with –f). Restrict f to a small open interval about a so that f(x) takes the maximum there.

By definition, the derivative f’(a) is \lim_{x\to a}\frac{f(x)-f(a)}{x-a}. Now when x>a, we have f(x)≤f(a), so \frac{f(x)-f(a)}{x-a}\le 0 and so the limit ≤ 0. On the other hand, when x<a, the inequality f(x)≤f(a) also gives \frac{f(x)-f(a)}{x-a}\ge 0 so the limit ≥ 0. These two inequalities force the limit to be 0. ♦

Example 1

We have a piece of wire of length 2L. We wish to bend it to form an isosceles triangle so that the area is maximum. What’re the optimum dimensions?

Unlike secondary school calculus, we’ll solve this in a rigourous manner. Bear with us if the reasoning below seems a little lengthy, but we wish to have no gaps in our proof.

First, it’s not even clear a maximum exists even if the area is upper-bounded. To solve this problem: we’ll quote a theorem without proof.

Theorem. If C is a closed and bounded subset of Rn, and f:C→Rn is continuous, then f(C) is also closed and bounded.

In particular, since f(C) is bounded it has a supremum and an infimum. Since it’s closed, both these values must belong to f(C). Thus, f(C) has a maximum and minimum, which is the crucial property we need.

Here, a subset C\subseteq \mathbf{R}^n is closed if its complement Rn-C is an open subset of Rn. We’ll have more to say about closed subsets in a later article.

[ Question to think about: is it possible to find C closed but not bounded, and continuous f, such that f(C) is neither closed nor bounded? What about a bounded but not closed C? ]

Proving this requires the concept of compactness, which we’ll delay until a later article. We urge the reader to take our word for now.

Back to the problem. Bending the wire into a triangle of side lengths aab gives 2a+b = 2L. By Heron’s formula, the area is:

\sqrt{L\cdot (L-a)^2 (L-b)}

so we need to maximise f(a) := (L-a)^2 (L-b) = (L-a)^2 (2a -L). Now to form a triangle, we need L/2 ≤ a ≤ L which gives a function f : [L/2, L] → R. Since [L/2, L] is closed and bounded, the image of f attains a maximum and a minimum.

Suppose f(x) attains a maximum at x=a. Then a either lies in (L/2, L) or a=L/2 or a=L. The last two cases are out since f(L/2) = f(L) = 0. The first case is a local maximum which gives f’(a) = 0. But: 

f(x) = (L-x)^2 (2x-L) = 2x^3 - 5L\cdot x^2 + 4L^2\cdot x - L^3 \implies \frac{df}{dx} = 6x^2 - 10L\cdot x + 4L^2.

Setting \frac{df}{dx} = 0 gives us  x=L or (2/3)L. Clearly, the latter is the correct answer. So the optimum dimensions are (2L/3, 2L/3, 2L/3), an equilateral triangle.

Example 2.

Suppose we have a piece of wire of fixed length 2L and we wish to bend it to form a triangle. What is the largest possible area of the triangle?

First, label the sides of the triangle by ab, 2L-(a+b). To form a triangle, we need the following inequalities:

  • a+b \ge 2L-(a+b) \implies a+b \ge L;
  • a+2L-(a+b) \ge b \implies 0\le b \le L;
  • b+2L-(a+b) \ge a \implies 0\le a\le L.

Thus, we get a map fC → R, where C is the subset of R2 defined by the above inequalities and f denotes the area of the resulting triangle. Since each inequality defines a closed subset, the resulting intersection is also closed. Finally, the second and third inequalities clearly imply region C is bounded.

Hence, f(C) is closed and bounded, so in particular it attains a maximum and a minimum. Suppose f attains a maximum at (ab) in C.

Now fix a and vary b, where max{0, La} ≤ b ≤ L. Now either b is an edge-point in this region, or it’s a local maximum for f(ab). But the edge-points give f = 0, so it must be a local maximum. Write the area as the square root of L(L-a)(L-b)(a+b-L). Since b varies, we differentiate this with respect to b and set it to zero, which gives b = L-(a/2), i.e. the sides are a, L-(a/2) and L-(a/2) so we get an isosceles triangle, which reduces this to the previous problem.

Conclusion: the largest area occurs when the triangle is equilateral.

In summary, since the image of a closed and bounded set under a continuous map is closed and bounded, it must attain a maximum and minimum (if the map is real-valued). Both these extrema points are either in the interior, in which case the derivative is zero since it’s a local extremum, or on a boundary.

blue-linRolle’s Theorem

This says:

Rolle’s Theorem. Suppose f:[a, b] → R (a<b) is a continuous function which is differentiable in the interior (a, b). If f(a) = f(b), then there is a point a < x < b for which f'(x) = 0.

rollestheorem

Proof.

Since [ab] is closed and bounded, we know from the above that f([ab]) attains a maximum at x and minimum at y. Now if axb, then x is a local extremum so we have f’(x) = 0 as desired. Same goes for y. The only remaining possibility is that both x and y are edge-points, i.e. a or b. But that means f(a)=f(b) is both the maximum and the minimum so f(x) is constant on [ab] in which case any ax < b would do. ♦

This can be generalised as follows.

Mean Value Theorem. Suppose f:[a, b] → R (a<b) is a continuous function which is differentiable in the interior (a, b). There exists x in (a, b) such that f'(x) = \frac{f(b)-f(a)}{b-a}.

meanvalue

Proof

Apply a slight transformation to reduce it to the case of Rolle’s theorem: let g(x) = f(x) - cx, for some constant c such that g(a) = g(b). Clearly, c=\frac{f(b)-f(a)}{b-a}. Rolle’s theorem says there exists x in (ab) such that g’(x) = 0, whence we have f'(x) = g'(x) + c = c. ♦

One particularly useful corollary is as follows.

Corollary. Suppose f : (a, b) → R is differentiable and its derivative at every a < x < b is positive. Then f is strictly increasing, i.e. if x < y, then f(x) < f(y).

Proof.

If y and f(x) ≥ f(y), then by Mean Value Theorem, we can find xy such that f'(z) = \frac{f(y)-f(x)}{y-x} \le 0, which is a contradiction. ♦

Example 3.

Suppose we have the function f : (0, ∞) → R, given by f(x) = x^{1/x}. Let’s check the behaviour of f(x) as x increases. The above results suggest that the derivative is helpful: indeed, writing f(x) = \exp(\log(x)/x) gives:

f'(x) = \exp(\log(x)/x)\frac{x \cdot (1/x) - \log(x)}{x^2} = \frac{1 - \log(x)}{x^2}.

We thus see that:

  • for 0 < xef’(x) > 0 so the function is strictly increasing on (0, e);
  • for xef’(x) < 0 so the function is strictly decreasing on (e, ∞);
  • when xef’(x) = 0.

We claim that there’s a local maximum at x=e. Indeed, if e satisfies f(x) ≥ f(e), then Mean Value Theorem says there exists yxye such that f'(y) = \frac{f(x)-f(e)}{x-e}\ge 0 which contradicts the second point. Likewise, if xe satisfies f(x) ≥ f(e), then this contradicts the first point.

Exercise

Prove the Extended Mean Value Theorem : suppose f and g are continuous functions [ab] → R which are differentiable on (ab). If g’(x) ≠ 0 for all x in (ab), then there exists xaxb, such that \frac{f'(x)}{g'(x)} = \frac{f(b)-f(a)}{g(b)-g(a)}.

Answer (highlight to read) : find a constant c such that h(x) := f(x) + c·g(x) satisfies h(a) = h(b). Show that c exists. Apply Rolle’s theorem to h. ]

blue-linL’Hopital’s Rule

This theorem is useful for evaluating the limit of f(x)/g(x) when both the numerator and denominator approach 0. However, we have to be careful in stating it since there’re many traps for the unwary. First, we consider one variation.

Theorem (L’Hopital’s Rule). Suppose f and g are differentiable functions (a, b) – {c} → R. If:

  • \lim_{x\to c} f(x) = \lim_{x\to c} g(x) = 0,
  • g'(x) \ne 0 for all x in (a, b), x ≠ c,
  • \lim_{x\to c}\frac {f'(x)}{g'(x)} = L,

then \lim_{x\to c}\frac{f(x)}{g(x)} = L.

Proof.

We’ll need the Extended Mean Value Theorem in the above exercise: for every x in (ab), x≠c, there exists y between x and c such that:

\frac{f'(y)}{g'(y)} = \frac{f(x) - f(c)}{g(x) - g(c)} = \frac{f(x)}{g(x)}.

Since f’(x)/g’(x) → L, for every ε>0, |f’(x)/g’(x) – L| < ε when x is close to c. The above equality tells us that |f(x)/g(x) – L| < ε when x is close to c. This proves the theorem. ♦

Warning

Even if f’(x)/g’(x) does not converge as x → c, it may still be possible for \lim_{x\to c} f(x)/g(x) to converge. E.g. take our earlier example of f(x) = x^2 \sin(1/x) and g(x) = x, defined on x≠0. Then the first two conditions are satisfied, but now

\lim_{x\to 0}\frac{f'(x)}{g'(x)} = 2x\sin(1/x)-\cos(1/x)

which doesn’t converge. On the other hand \lim_{x\to 0} \frac{f(x)}{g(x)}\to 0 by squeeze theorem. In summary, even if f’(x)/g’(x) fails to converge as x→c, it’s still possible for f(x)/g(x) to converge.

Example 4.

Suppose f(x) = 3sin(x) – sin(3x) and g(x) = x – sin(x). The first two conditions of L’Hopital’s Rule are satisfied. So \lim_{x\to 0} \frac{f(x)}{g(x)}=L if \lim_{x\to 0} \frac{f'(x)}{g'(x)}=L. [ Note that the converse isn’t true, as the above warning tells us. ]

So, we write:

\frac{f'(x)}{g'(x)} = \frac{3\cos(x) - 3\cos(3x)}{1-\cos(x)}.

Again the first two conditions of L’Hopital’s Rule are satisfied for this expression, so let’s look at the next derivative:

\frac{f''(x)}{g''(x)} = \frac{-3\sin(x)+9\sin(3x)}{\sin(x)}.

The same thing happens, so we take the derivative again:

\frac{f'''(x)}{g'''(x)} = \frac{-3\cos(x)+27\cos(3x)}{\cos(x)}.

This time, as x→0, the expression approaches 24. Thus, the original f(x)/g(x)→24 as x→0.

Posted in Notes | Tagged , , , , , | Leave a comment

Basic Analysis: Differentiation (1)

In this article, we’ll look at differentiation more rigourously and carefully. Throughout this article, we suppose f is a real-valued function defined on an open interval (bc) containing a, i.e. f : (bc) → R with bac.

Theorem. The derivative of f(x) at a is defined by:

f'(a) := \lim_{x\to a} \frac{f(x) - f(a)}{x-a}.

If the limit exists, we also say f(x) is differentiable at x=a. If f(x) is differentiable at every point in (b, c), then we get a function (b,c) \to \mathbf{R}, x\mapsto f'(x). The resulting function is then written as \frac{df}{dx}.

Pictorially, the derivative measures the gradient of the tangent at x=a:

Example 1

Consider R → R, given by f(x) = x3. At any point x=a, we have:

f'(a) = \lim_{x\to a}\frac{x^3 - a^3}{x-a} = \lim_{x\to a}\frac{(x-a)(x^2+xa+a^2)}{x-a}

which is equal to x^2 + xa + a^2 when xa. Hence, as xa, the expression tends to a^2 + aa + a^2 = 3a^2. Conclusion: the derivative exists at every point a on the real line and \frac{df}{dx} = 3x^2.

Example 2

Consider fR-{0} → R, given by f(x) = 1/x. At any point x=a≠0, note that f(x) is defined on an open interval containing a. Then:

f'(a) = \lim_{x\to a}\frac{x^{-1} - a^{-1}}{x-a} = \lim_{x\to a} -\frac 1{ax} = -\frac 1 {a^2}.

Hence, \frac{df}{dx} = x^{-2}. Conclusion: the derivative exists for every non-zero a.

Example 3

Take the function fR → R, given by f(x) = |x|. If x=a>0, then f(x)=x on an open interval containing a so it’s easy to check that f’(a) = 1. Likewise, if x=a<0, then f’(a) = -1. Finally, we shall show that f’(0) does not exist. But f’(0) is the limit of g(x) := (|x|-|0|)/(x-0) = |x|/x as x tends to 0. Note that g(x)=1 for x>0, and g(x)=-1 for x<0. Hence, the left limit is -1 while the right limit is +1. Since the two limits don’t match, the limit doesn’t exist. Conclusion: the derivative exists for every non-zero a.

Properties of the Derivative

We have the following basic properties.

Theorem. Let f(x) and g(x) be defined on an open interval about a. If they are both differentiable at x=a, then:

  • f(x) is continuous at x=a;
  • (addition rule) the derivative of the sum (f+g)(x) at x=a is (f+g)'(a) = f'(a) + g'(a);
  • (product rule) the derivative of the product (fg)(x) at x=a is (fg)'(a) = f'(a)g(a) + f(a)g'(a).

Proof.

1. For the first property, we have f'(a) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a}, so:

\lim_{x\to a} (f(x) - f(a)) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a} \cdot \lim_{x\to a}(x-a)

if both limits on the RHS exist. But they do: the first is f’(a) while the second is 0. So the LHS tends to 0 and f(x)→f(a) as xa as desired.

2. For the second property, write:

(f+g)'(a) = \lim_{x\to a}\frac{(f+g)(x)-(f+g)(a)}{x-a} = \lim_{x\to a}\frac{f(x)-f(a)}{x-a} + \lim_{x\to a}\frac{g(x)-g(a)}{x-a}.

The two limits on the right are f’(a) and g’(a) respectively. Hence, the middle limit equals f’(a)+g’(a) and we’re done.

3. For the final property, write (fg)'(a) as:

\lim_{x\to a}\frac{f(x)g(x) - f(a)g(a)}{x-a} = \lim_{x\to a} f(x)\frac{g(x)-g(a)}{x-a} + \lim_{x\to a}\frac{f(x)-f(a)}{x-a}\cdot g(a).

Since f is differentiable at x=a, it’s continuous there also. Thus, the limits on the RHS all exist the expression approaches f(a)g’(a) + f’(a)g(a), as desired. ♦

The next property is extremely useful computationally.

Theorem (Chain Rule). Suppose f(x) (resp. g(x)) is defined on an open interval containing a (resp. f(a)). If f'(a) and g'(f(a)) both exist, then the composed function h = g°f gives h'(a) = g'(f(a))·f'(a).

Proof.

One’s inclined to write:

\frac{h(x)-h(a)}{x-a} = \frac{g(f(x))-g(f(a))}{x-a} = \frac{(g(f(x))-g(f(a))}{f(x)-f(a)} \cdot \frac{f(x)-f(a)}{x-a},

if f(x) ≠ f(a), and let xa. But the problem is that f(x)=f(a) may occur even when x is arbitrarily close to a. This is a slight technical issue, which we’ll counter by defining:

G(y) = \begin{cases} \frac{g(y) - g(f(a))}{y-a}, &\text{ if } y\ne f(a), \\ g'(f(a)), &\text{ if } y=a.\end{cases}

Then G(y) is defined on an open interval about f(a) and is continuous there. Now we can write the above equality as:

\frac{g(f(x))-g(f(a))}{x-a} = G(f(x))\cdot\frac{f(x)-f(a)}{x-a},

if x ≠ a. Now since G is continuous at f(a) and f is continuous at a, taking the limit xa gives (g°f)'(a) = g’(f(a)) f’(a) on both sides. ♦

Now we can describe the derivative of f(x)/g(x).

Corollary. If f(x) and g(x) are differentiable at x=a, and g'(a)≠0, then

  • 1/g is differentiable at x=a and (1/g)'(a) = -g'(a)/g(a)2.
  • f/g is differentiable at x=a and (f/g)'(a) = \frac{g(a)f'(a) - f(a)g'(a)}{g(a)^2}.

Proof. The first property follows from the Chain Rule and Example 2. The second property follows from the first property and the product rule.

Summary. We have derived the standard laws of differentiation in secondary school calculus.

  • (addition rule) \frac{d}{dx}(f+g) = \frac{df}{dx}+\frac{dg}{dx};
  • (product rule) \frac{d}{dx}(fg) = \frac{df}{dx}g + f\frac{dg}{dx};
  • (quotient rule) \frac{d}{dx}(f/g) = \frac{g\frac{df}{dx} - f\frac{dg}{dx}}{g^2};
  • (chain rule) \frac{dg}{dx} = \frac{dg}{dy}\cdot\frac{dy}{dx}.

More Examples

The above rules for manipulating derivatives have multiple applications.

Example 4

Let’s differentiate f_n(x) = x^n for an integer n. Clearly, for n=0, the derivative is 0 and for n=1, it’s 1. For a positive integer n>1, the derivative of fn can be obtained by recursively applying the product rule:

f_n(x) = x\cdot f_{n-1}(x) \implies f_n'(x) = f_{n-1}(x) +x\cdot f_{n-1}'(x),

which gives f_n'(x) = n\cdot x^{n-1} for n≥0. By applying the division rule, we see that in fact this equation holds for all integers n.

Example 5

Let f(x) = (x^2 + 1)^{100}. We wish to compute \frac{df}{dx} in closed form. Let u(x) = x^2+1 and v(y) = y^{100}. Since f(x) = v(u(x)), the chain rule gives us

\frac{df}{dx} = 100(x^2+1)^{99}\cdot 2x = 200x(x^2+1)^{99}.

Example 6

For a fixed positive integer n, write the binomial expansion:

(x+1)^n = C^n_n x^n + C^n_{n-1} x^{n-1} + \ldots + C^n_1 x + C^n_0.

Differentiating both sides with respect to x, the chain rule applied to the LHS gives n(1+x)n-1 while the RHS gives us \sum_{k=1}^n k\cdot C^n_{k} x^{k-1}. Substituting x=1 then gives us the combinatorial equality:

1\cdot C^n_1 + 2\cdot C^n_2 + \ldots + (n-1) C^n_{n-1} + n\cdot C^n_n = n\cdot 2^{n-1}.

Many more examples can be found in our articles on generating functions.

Example 7 (with Warning)

We know that for |x|<1, the geometric series converges to:

1 + x+x^2 +x^3 + \ldots = \frac 1 {1-x}.

One’s tempted to differentiate both sides with respect to x, which gives 1+2x+3x^2+4x^3+\ldots on the LHS and \frac 1{(1-x)^2} on the RHS by using the chain rule with u(x) = 1-x. But there’s no guarantee that the addition rule holds for infinitely many terms, although in this case it does work. The underlying theory will be covered in greater detail in a later article.

Higher Derivatives

From the above results, we see that we can differentiate a function f(x) = p(x)/q(x), where p(x) and q(x) are both polynomials, at any point x=a which is not a root of q(x). A function of this form (p(x)/q(x) for polynomials p(x) and q(x)) is called a rational function. Furthermore, from the division rule the derivative \frac{df}{dx} is also a rational function, which means we can differentiate it as many times as we wish:

\frac{d^2 f}{dx^2} := \frac{d}{dx}\left(\frac{df}{dx}\right),\ \frac{d^3 f}{dx^3} := \frac{d}{dx}\left(\frac{d^2f}{dx^2}\right),\ \ldots.

These can also be denoted by f”f”’, etc, or f(2)f(3), … .

Now, if f(x) is second-differentiable (i.e. f” exists) at x=a, then by definition f’df/dx must exist on an open interval (bc) about a, and also be differentiable at x=a. In particular, since differentiability implies continuity, df/dx is continuous at x=a:

Definition. A function f(x) is said to be continuously differentiable at x=a if it’s differentiable on an open interval about a, and the derivative is continuous at x=a.

Clearly, we have the following implications:

\begin{matrix}\text{second diff.}\\ \text{at }x=a\end{matrix} \implies\begin{matrix}\text{continuously}\\ \text{diff. at }x=a\end{matrix}\implies \begin{matrix}\text{differentiable}\\ \text{at }x=a\end{matrix}\implies \begin{matrix}\text{continuous}\\ \text{at }x=a\end{matrix}

And the implications are strict:

  • Continuous but not differentiable function : easy, take f(x) = |x|; it’s continuous and example 3 tells us it’s not differentiable at x=0.
  • Differentiable but not continuously differentiable function : take f(x)=\begin{cases}x^2 \sin(1/x), &\text{if } x\ne 0,\\ 0, &\text{if } x=0.\end{cases} When a≠0, the derivative f'(a) = 2a \sin(1/a) + a^2\cos(1/a) (-1/a^2) = 2a\sin(1/a) -\cos(1/a), which does not converge as a→0. When a=0, the derivative is \lim_{x\to 0} \frac{f(x)-f(0)}{x-0} = \lim_{x\to 0} x\sin(1/x) = 0 by squeezing the function between |x| and -|x|.
  • Continuously differentiable but not second differentiable function : take f(x) = x|x|. It’s easy to see that the derivative is f’(a) = 2a if a≥0, and f’(a) = -2a if a<0. Thus f’(x) = 2|x| which is continuous but not differentiable.

More generally, it’s easy to see that the function f(x) = xn-1|x| is (n-1)th order continuously differentiable (i.e. the (n-1)th derivative exists and is continuous) but not n-th order differentiable.

Now, we say f(x) is infinitely differentiable if the n-th order derivative exists for all n. For such a function, one’s inclined to write down its Taylor series about x=a:

f(x) = f(a) + f'(a)(x-a)+ \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \ldots

and hope that the Taylor series converges to f(x) in some open interval about a. If so, we say f(x) is analytic. Alas.

  • Infinitely differentiable but not analytic function : let f(x) = \begin{cases}\exp(-x^{-2}), &\text{if } x\ne 0,\\ 0, &\text{if }x=0.\end{cases} At the points x=a≠0, it’s easy to see the n-th derivative is of the form p(x)x^{-m} \exp(-x^{-2}) for some polynomial p(x) and positive integer m. To check its behaviour as x→0, let y=1/x; then x^{-m} \exp(-x^{-2}) = y^m/ \exp(y^2) which approaches zero since exponential increases much faster than any polynomial function. With this, one proves via induction that the n-th derivative at 0 is always 0, so the Taylor series is 0.

It should be noted that this anomaly doesn’t occur for complex analysis. Specifically, a differentiable function C → C on an open set is automatically analytic there and one immediately has all the wonderful results about Taylor series convergence etc. We hope to come around to that at a later date.

Posted in Notes | Tagged , , , , , | Leave a comment

Basic Analysis: Limits and Continuity (3)

Let’s consider multivariate functions f : D\to \mathbf{R}^m where D\subseteq \mathbf{R}^n. To that end, we need the Euclidean distance function on Rn. If x = (x1x2, …, xn) is in Rn, we define:

|\mathbf{x}| := \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}.

Note that |x| = 0 if and only if x is the zero vector 0. Now we are ready to define continuity.

[ In this and all subsequent articles, boldfaced variables axy, … denote parameters with multiple components. ]

Definition. The function f:D\to\mathbf{R}^m is said to be continuous at a if:

  • for every ε>0, there exists δ>0, such that whenever |xa|<δ and x belongs to D, we have |f(x)-f(a)|<ε.

Let D\subseteq \mathbf{R}^n. A point \mathbf{a}\in\mathbf{R}^n is said to be a point of accumulation, if for any ε>0, there exists \mathbf{x}\in \mathbf{R}^n such that

0 < |\mathbf{x}-\mathbf{a}|<\epsilon.

Theorem. If a is a point of accumulation of D, and we have continuous functions

g, h : D\cup\{\mathbf{a}\}\to \mathbf{R}^m

which agree on D-{a}, then g(a)=h(a).

We’ll skip the proof since it’s identical to the case of single-variable functions, but we’d encourage the conscientious reader to check that the logic follows through.

Basic Properties

Continuous functions satisfy the following basic properties:

Theorem. The property of continuity satisfies the following.

  1. Identity. The identity map D\to D\subseteq\mathbf{R}^n is continuous.
  2. Restriction. If D'\subseteq D and f:D\to\mathbf{R}^m is continuous at a in D’, then so is the restriction of f to D’.
  3. Composition. If f:D\to E is continuous at a and g:E\to\mathbf{R}^m is continuous at b=f(a), then g\circ f:D\to \mathbf{R}^m is continuous at a.
  4. Projection. The map \pi_i : \mathbf{R}^m \to\mathbf{R} which projects to i-th component is continuous at every point.
  5. Reduction of Codomain to R. The function f:D\to\mathbf{R}^m is continuous at a iff \pi_i\circ f is continuous for each i.
  6. Arithmetic. The addition and product functions \mathbf{R}\times\mathbf{R}\to\mathbf{R} are continuous, as is the reciprocal \mathbf{R}-\{0\}\to\mathbf{R}.

Proof.

All the properties have rather straightforward proofs. We’ll use our earlier lingo: say “when x close to a in D, P(x) holds” if there exists δ>0 such that whenever |xa|<δ and lies in D, the property P(x) holds.

  1. Identity. Easy: let δ=ε.
  2. Restriction. Again let δ=ε.
  3. Composition. Let ε>0. Since g is continuous at f(a), there exists δ>0 such that when |yf(a)|<δ and y is in E|g(y)-gf(a)|<ε. Likewise, since f is continuous at a, there exists δ’>0 such that when |xa|<δ’ and x is in D, then |f(x)-f(a)|<δ and hence |gf(x)-gf(a)|<ε.
  4. Projection. Use the inequality |\pi_i(\mathbf{x})-\pi_i(\mathbf{a})| \le |\mathbf{x}-\mathbf{a}|.
  5. Reduction of Codomain to R. If f is continuous, then \pi_i\circ f is continuous by the above two properties. Conversely, suppose each \pi_i\circ f is continuous. Let ε>0. For x close to a in D, we have |\pi_i f(\mathbf{x})-\pi_i f(\mathbf{a})|< \epsilon/\sqrt{m} for each i=1,2,…,m. Hence, when x is close to a in D, we have |f(x)-f(a)| < ε as well.
  6. Arithmetic.
    • For addition, use the inequality |(x+y)-(a+b)| ≤ |xa|+|yb| ≤ |(x,y)-(a,b)|.
    • For multiplication, |xyab| = |x(yb)+(xa)b| ≤ |x|·|yb|+|b|·|xa|. When |xa|<1, we then have |x|<|a|+1. Hence, for any ε>0, whenever |xa|<min(1, ε/(2|a|+2)) and |yb|<ε/(2|b|) we have |xyab|<ε. Hence set δ=min(1, ε/(2|a|+2), ε/(2|b|)).
    • For division, take |\frac 1 x-\frac 1 a| = \frac{|a-x|}{|a|\cdot|x|}. When |xa|<|a|/2, we have |x|>|a|/2. Hence, for any ε>0, whenever |xa|<min(|a|/2, ε|a|2/2) we get |\frac 1 x-\frac 1 a|<\epsilon.

Exercises

Prove the following using only the above properties, without using the ε-δ definition.

  1. Concatenation” of continuous functions gives a continuous function. Specifically, suppose f:D\to \mathbf{R} and f:E\to\mathbf{R} are continuous at and a’ respectively, where D\subseteq \mathbf{R} and E\subseteq\mathbf{R}. Then h=(f,g):D\times E\to\mathbf{R}^2 given by \mathbf{a}\mapsto (f(\mathbf{a}), g(\mathbf{a})) is continuous at (a, a’).
  2. Prove that if f, gD → R are continuous at a, for D\subseteq \mathbf{R}, then f+gf×g are also continuous at a. If f(a)≠0, then 1/f is continuous at a also. Hence, any polynomial function is continuous.

Answers (highlight to read).

  1. Consider D×→ D → R, where the first map is projection (and hence continuous at (aa’) ) and the second map is f. Since it’s a composition of two continuous maps, the result is continuous. Likewise, D×→ E → is continuous. On the other hand, the two maps are also given by composing (fg): D×→ R×R → R, where the 2nd map is a projection. The 4th property then tells us (fg) is continuous at (aa’).
  2. Use property 5: the composition D×D → R×R → R, where the first map is the concatenation (fg) and the second map is addition/product, is continuous.

Global Continuity

We say that a function f:D\to \mathbf{R}^m is continuous if it’s continuous at every point in the domain. From definition, it’s clear that continuity at a specific point is determined by what happens close to that point. To that end, let’s define open sets.

Definitions. Let D\subseteq \mathbf{R}^n be any subset. An open ball containing \mathbf{a}\in D is the set of points:

N(\mathbf{a}, \epsilon)=\{\mathbf{x}\in D : |\mathbf{x}-\mathbf{a}|<\epsilon\},

for some ε>0. [ Note that the definition depends on D, although the notation doesn’t reflect that. If there’s a possibility of confusion, we’ll write it as N_D(\mathbf{a},\epsilon). ]

A subset U\subseteq D is said to be an open subset of D if for each \mathbf{a}\in D, we have N(\mathbf{a}, \epsilon)\subseteq U for some ε>0.

E.g. each open ball U := N(a, ε) is open, for if x is in U, then |xa|<ε, so we let δ = ε-|xa|. We claim that N(\mathbf{x},\delta)\subseteq N(\mathbf{a},\epsilon). Indeed, any y in N(x, δ) satisfies |yx|<δ so

|ya| ≤ |yx| + |xa| < δ+|xa| = ε.

More Examples of Open Subsets

  1. Let DR. Then any open interval (bc) is an open subset of D since it’s an open ball. The punctured neighbourhood (bc) – {a} is also an open subset of D since it’s a union of two open balls. The half-open interval U = (bc] is not open in D since the point c in U isn’t contained in any open ball of R which is a subset of U. [ Any open ball containing c is of the form (c-ε, c+ε), which contains points outside U. ]
  2. Let D = [0, 1]\cup\{2\}. The subset U = [0, 1] is now an open subset of D. This may confuse the reader at first glance, but note that even if we pick the endpoint 1 (in U), we can pick the open ball N(1, 1/2) in D. By definition, this is the set \{x\in D: |x-1|<\frac 1 2\}, so it’s (1/2, 1] which is wholly contained in U. Likewise, the singleton subset V = {2} is also a subset of D, since the open ball N(2, 1/2) = {2} is wholly contained in U.
    • Which of the subsets [0, 1), (0, 1], {1}, [0, 1/2), (0, 1/2] are open in D? [ Answer: the first, second, fourth. ]

Here are further examples of open subsets U\subseteq D, where D is coloured in blue and U in red.

In the first two examples, D=\{(x,y)\in\mathbf{R}^2: x^2+y^2\le 1\} while in the other two, D=\{(x,y)\in\mathbf{R}^2:x^2 + y^2\le 1\}\cup\{(2,0)\}. The dotted boundary lines indicate that the boundaries are not included in U.

The main theorem we wish to prove is:

Theorem. Let f:D\to \mathbf{R}^m be a function, where D\subseteq \mathbf{R}^n. Then f is continuous if and only if for each open subset U\subseteq \mathbf{R}^m, f^{-1}(U) is an open subset of D.

Proof.

Suppose f is continuous. To show that f^{-1}(U) is open in D, let \mathbf{a}\in f^{-1}(U). Since f(a) is in U which is open, there exists an ε>0 such that N(f(a), ε) is a subset of U. And since f is continuous at a, there exists a δ>0 such that whenever |xa|<δ and x in D, we have |f(x)-f(a)|<ε. Hence

f(N(\mathbf{a}), \delta))\subseteq N(f(\mathbf{a}, \epsilon)\subseteq U\implies N(\mathbf{a},\delta)\subseteq f^{-1}(U)

which shows that f^{-1}(U) is open in D.

Conversely, suppose f^{-1}(U)\subseteq D is open for any open subset U\subseteq \mathbf{R}^m. To show f is continuous at a (an element of D), let ε>0. Since U:=N(f(\mathbf{a}),\epsilon)\subset \mathbf{R}^n is open, so is f^{-1}(U). But \mathbf{a}\in f^{-1}(U), so by definition of open subsets, there exists δ>0 for which

N(\mathbf{a}, \delta)\subseteq f^{-1}(U) \implies f(N(\mathbf{a},\delta))\subseteq U=N(f(\mathbf{a}),\epsilon)

so whenever |xa|<δ and x in D, we have |f(x)-f(a)|<ε. ♦

Main point. The above theorem suggests that the central concept behind analysis is that of open subsets. Indeed, the core idea behind topology is to build up an entire theory based only on the concept of open subsets. This will be elaborated in subsequent posts.

Important Exercises.

  1. Find a continuous function fR → R and an open subset U of R such that f(U) is not open in R.
  2. Prove that if U and V are open subsets of D, then so is U ∩ V. In particular, this means that finite intersections of open subsets are still open.
  3. Prove that if {Ui} is a collection of open subsets of D, then so is U:=\cup_i U_i.
  4. Suppose E\subseteq D.
    1. Prove that if U is open in D, then V := ∩ E is open in E.
    2. Prove that conversely, if V is open in E, then V∩ E for some open subset U of D.

[ Note: for Q4, be careful with the notation N(a, ε) since the ambient space can be either D or E. ]

Sketch of Answers [Highlight to read.]

  1. Let f(x)=x2 and U=(-1, 1). Then f(U) = [0, 1) is not open in R. However, in this case, it is open in the image of f, which is [0, ∞). Can you find an example where f(U) is not open in the image of f?
  2. Let a be in U ∩ V. For some ε>0 and ε’>0, N(a, ε) and N(a, ε’) are subsets of U and V respectively. Thus, N(a, min(ε, ε’)) is a subset of U ∩ V.
  3. Let a be in U. Then a lies in some Ui so for some ε>0, N(a, ε) is a subset of Ui and hence U.
  4. For (A), suppose a is in V. So a is in the open set U and for some ε>0, ND(a, ε) is a subset of U. Since NE(a, ε) = ND(a, ε) ∩ E, we also have NE(a, ε) is a subset U ∩ EV. For (B), let a be in V, which is open in E. For some ε>0, NE(a, ε) is a subset of V. Now let U be the union of all open balls ND(a, ε) in D. Each open ball is open in D and by Q3, U is also open in D. Clearly VU ∩ E.
Posted in Notes | Tagged , , , , , , , | Leave a comment