Mathematics and Such

Topology: Subspaces

Posted on January 25, 2013 by limsup

First, suppose (X, d) is a metric space. If Y is a subset of X, then one can restrict the metric to $d' = d|_{Y\times Y}:Y\times Y\to\mathbf{R}$ , i.e. for any $y,y'\in Y$ , we set d’(y, y’) := d(y, y’). It’s not hard to show that the resulting function is a metric on Y. The resulting pair (Y, d’) is called a metric subspace of (X, d).

The open subsets of Y are related to those of X as follows.

Proposition. A subset $V\subseteq Y$ is open if and only if $V = U\cap Y$ for some open $U\subseteq X$ .

Proof.

First note that if $a\in Y$ , then an open ball in Y is of the form:

$N_{d'}(a,\epsilon) = \{y\in Y: d'(y,a)<\epsilon\} = \{x\in Y:d(y,a)<\epsilon\} \cap Y = N_d(a,\epsilon) \cap Y.$

Suppose U is open in X and V = U ∩ Y. Each $a\in V$ is also in U, and so $N_d(a,\epsilon) \subseteq U$ for some ε>0. Hence $N_{d'}(a,\epsilon) = N_d(a,\epsilon)\cap Y \subseteq U\cap Y = V$ . So V is open in Y.
Conversely, let V be open in Y. For each a in V, we have $N_{d'}(a,\epsilon)\subseteq V$ for some ε>0 depending on a. Take the union U of all such $N_d(a,\epsilon)$ ; then U is open in X since it’s a union of open balls in X. Also

$U\cap Y =\cup_{a\in V}(N_d(a,\epsilon)\cap Y) = \cup_{a\in V} N_{d'}(a, \epsilon)=V.$ ♦

Exercise

If U is an open ball in X, is U ∩ Y necessarily an open ball in Y?

[ Answer (highlight to read): no, consider X=R and Y=R-{0}. Then (-1, +1) is an open ball N(0, 1) in X but (-1, +1) ∩ Y is the disjoint union of (-1, 0) and (0, +1) which is not an open ball in Y. ]

This inspires us to extend the definition of subspaces to general topological spaces.

Definition. Let (X, T) be a topological space and Y be a subset of X. The subspace topology on Y is defined by:

$T_Y =\{U\cap Y : U\in T\},$

i.e. $V\subseteq Y$ is open if and only if it’s of the form V = U ∩ Y for some open subset U of X.

Equivalently, the class of closed subsets of Y is given by D = Y – (U∩Y) = (X–U) ∩ Y = C ∩ Y for some closed subset C of X.

Examples

A subspace of (X, d) with the discrete metric is still discrete.
Pick the half-open interval $Y=[0, 1)\subset \mathbf{R}$ . Then $[0, 1/2) = (-1, 1/2)\cap Y$ is open in Y but not open in R.
Consider the subset $Y=(-\infty, -1]\cup\{0\}\cup[1, \infty)$ of the real line R. The singleton set {0} is an open subset of Y since {0} = N(0, 1/2). Furthermore, it is closed since the complement is a union of two open subsets.
Consider the subset Z of R under the usual metric. Then the resulting subspace is the discrete space even though the induced metric d(m, n) = |m–n| is not exactly the discrete metric.
Let $X=\mathbf{R}^2$ and Y be the set of points (x, y) satisfying $x^2 + y^2 = 1$ . Geometrically, Y is a circle. Here, we’ll think of it as a topological space with the subspace topology inherited from X. The space is denoted $S^1$ . More generally, for each positive integer n, the space $S^n$ is the subspace of $\mathbf{R}^{n+1}$ comprising of all points $(x_1, x_2, \ldots, x_{n+1})$ satisfying $\sum_{i=1}^{n+1} x_i^2 = 1.$
Consider X = N under the right order topology.
1. If Y = {1, 2, 3}, then the subspace topology gives { (empty set), {3}, {2, 3}, Y }.
2. If Y is the set of even numbers, then the bijection $X\to Y, n\mapsto 2n$ preserves the structure of topological spaces. We say that the two spaces are homeomorphic. We will say more about this in a later article.

Since we’re often dealing with multiple spaces and subspaces, when describing open/closed subsets, it’s essential to qualify with “(XX) is open/closed in (YY)” instead of merely saying “(XX) is open”. E.g. in example 2 above, Y = [0, 1) is a subspace of R and the subset [0, 1/2) is open in Y but not in R.

The following diagram illustrates some open subsets of Y in example 3.

Exercises

Consider the subspace Y = {1/n : n positive integer} of R, under the usual metric. Is this space discrete? What about $Y^* := Y\cup\{0\}$ ?
Example 2 gives us the concept of clopen subsets, i.e. subsets of a topological space which are both open and closed. In any topological space X, $\emptyset$ and X are always clopen. Are there any other clopen subsets of Q, where Q inherits the subspace topology from R?

Answers

Y is discrete since each {1/n} = (1/n – ε, 1/n + ε) ∩ Y for some small ε>0. On the other hand Y* is not discrete since any open subset containing 0 must also contain some 1/n.
Yes, Q has infinitely (in fact, uncountably) many clopen subsets. If r is irrational, then (-∞, r) ∩ Q and (r, ∞) ∩ Q form a disjoint union of Q by open subsets.

Basic Properties of Subspaces

The following basic property is often taken for granted.

Theorem. If (X, T) is a topological space and $Z\subseteq Y\subseteq X$ are subsets, then we can form the subspace topology on Z in two ways:

by taking the subspace topology $T\mapsto T_Z$ from $Z\subseteq X$ ; and

by taking successive subspaces $T\mapsto T_Y$ which is a topology on Y, then $T_Y\mapsto (T_Y)_Z$ .

The two topologies are identical.

The proof is straightforward: in the first case, the class of all open subsets of Z is given by U ∩ Z for open subsets U of X; in the second case, the class is given by (U ∩ Y) ∩ Z = U ∩ Z for open subsets U of X. They’re identical. ♦

The following properties are also surprisingly useful in practice.

Theorem. Let Y be a subspace of (X, T). If Y is open in X, then any open subset of Y is an open subset of X. If Y is closed in X, then any closed subset of Y is a closed subset of X.

Proof.

For the first statement, an open subset of Y is of the form V = U ∩ Y for some open subset U of X. Since U and Y are both open in X, so is V = U ∩ Y. The same proof holds for the second statement by replacing ‘open’ with ‘closed’. ♦

Furthermore, for bases and subbases, we have:

Theorem. Let $(X, T)$ be a topological space with subspace $(Y, T_Y)$ .

If B is a basis for T, then $B_Y :=\{U\cap Y : U\in B\}$ is a basis for Y.

If S is a subbasis for T, then $S_Y:=\{U\cap Y: U\in S\}$ is a subbasis for Y.

Proof.

For the first statement, we first verify that $B_Y$ is indeed a basis of some topology over Y:

$\cup_{V\in B_Y} V = \cup_{U\in B} (U\cap Y) = \left(\cup_{U\in B} U\right)\cap Y = X\cap Y = Y.$
Any two elements of $B_Y$ are of the form $U_1\cap Y, U_2\cap Y$ for some basic open subsets $U_1, U_2\in B$ . Since B is a basis, $U_1 \cap U_2 = \cup_i V_i$ for some $V_i \in B$ . This gives $(U_1 \cap Y)\cap (U_2\cap Y) = (U_1\cap U_2)\cap Y = \left(\cup_i V_i\right)\cap Y=\cup_i (V_i\cap Y)$ which is a union of elements in $B_Y$ .

Finally, we need to show $B_Y$ generates the topology $T_Y$ .

Every element $U\cap Y \in B_Y$ (for some element U in B) is open in Y by definition. So $B_Y\subseteq T_Y$ .
Conversely, any element of $T_Y$ is of the form $U\cap Y$ for some open subset U of X. Since B is a basis for (X, T), $U=\cup_i V_i$ for some $V_i\in B$ . So $U\cap Y = \cup_i (V_i\cap Y)$ which is a union of elements in $B_Y$ .

This concludes the proof for the first statement.

For the second, suppose the intersections of finitely many elements of S form basis B. It suffices to show that the intersections of finitely many elements of $S_Y$ form basis $B_Y$ .

In one direction, note that $S\subseteq B\implies S_Y \subseteq B_Y$ .
Conversely, any element of $B_Y$ is of the form U ∩ Y for some $U\in B$ . So $U=V_1\cap V_2 \cap \ldots \cap V_n$ for finitely many $V_i\in S$ . But this gives $U\cap Y = \cap_i (V_i\cap Y)$ for finitely many $V_i\cap Y\in S_Y$ .

And the proof is complete. ♦

Summary.

Any subset of a topological space X inherits a topology from it. The inheritance is consistent across inclusion chains of topological spaces.

With spaces and subspaces, one should be more careful when talking about “open sets”, i.e. mention what it’s open in.

If Y is open (resp. closed) in X and Z is open (resp. closed) in Y, then Z is open (resp. closed) in X.

If Y is a subspace of X, then a basis (resp. subbasis) of X restrict to give a basis (resp. subbasis) of Y.

Posted in Notes | Tagged advanced, bases, clopen sets, homeomorphisms, metric spaces, subbases, subspaces, topology | Leave a comment

Topology: Bases and Subbases

Posted on January 21, 2013 by limsup

Bases

Recall that though a subring or ideal of a ring may be rather huge, it often suffices to specify just a few elements which will generate the subring or ideal. Likewise, in a topology, one can specify a few open sets and generate the rest via unions and finite intersections. We’ll expound upon that in what follows.

First, note that when (X, d) is a metric space, a subset U of X is open if and only if it is a union of (possibly infinitely many) open balls. Indeed, since every open ball is open, so is a union of multiple open balls. Conversely, if U is open, each element $x\in U$ is contained in some open ball $N(x,\epsilon) \subseteq U$ for some ε>0 which depends on x; hence U is the union of all these N(x, ε).

In other words, consider B = {N(x, ε) : x in X, ε>0}, the collection of all open balls; a subset of X is open if and only if it is a union of elements of B. Generalising this concept for topological spaces gives us the following definition.

Definition. A basis for a topology (X, T) is a collection of open sets, $B\subseteq T$ such that every open subset U of X is a union $U=\cup_i V_i$ of elements $V_i\in B$ . We shall call an element $V\in B$ a basic open set.

Examples

As we saw above, the set B of open balls in a metric space (X, d) forms a basis of the induced topology.
In particular, the set of open intervals (a, b) in R forms a basis.
In the discrete topology, the collection of singleton sets {x} forms a basis. In fact, if you take the collection of open balls {N(x, 1/2)} for the discrete metric, you get the same basis.
Since the metrics d, d₁ and d_∞ on Rⁿ give rise to the same topology, we can pick the set of open balls under any one metric to produce a basis.
Exercise: find all possible bases of N under the right-order topology. [ Answer: there’s no non-trivial basis, i.e. any basis must include all open sets. ]

Our next question is: suppose $B\subseteq \mathbf{P}(X)$ is some collection of subsets of X. Is it always the basis of some topology?

If it were, then clearly the resulting topology would be:

$T=\{\cup_i V_i : \{V_i\}\subseteq B\}.$

Let’s check if T satisfies the axioms of a topology.

Empty set and X : the empty set can be written as a union of an empty collection of $V_i$ , so no problem there; for X, we need to specify that $\cup_i V_i = X$ .
Arbitrary union : this is obvious, since the union of sets $U_i := \cup_j V_{ij}$ (where each $V_{ij} \in B$ ) is also of the form $\cup_{i, j} V_{ij}$ which lies in T.
Finite intersection : write $U_1 = \cup_i V_{1i}$ and $U_2 = \cup_j V_{2j}$ , with each $V_{1i}, V_{2j}\in B$ . Since intersection is distributive over union, we get:

$U_1\cap U_2 = (\cup_i V_{1i})\cap (\cup_j V_{2j}) = \cup_{i,j} (V_{1i} \cap V_{2j}).$

For this to be in T, a sufficient condition is that $V_1\cap V_2\in T$ for all $V_1, V_2\in B$ . On the other hand, this condition is obviously necessary since if $V_1, V_2\in B$ , we have $V_1, V_2\in T\implies V_1\cap V_2 \in T$ . Thus we have proven:

Theorem. A subset $B\subseteq \mathbf{P}(X)$ is a basis for some topology if and only if:

the union of all $V\in B$ is the whole X; and

for any $V_1, V_2\in B$ , the intersection $V_1 \cap V_2$ is a union of elements from B.

Application: Furstenberg’s Proof of the Infinitude of Primes

While he was an undergraduate, Hillel Furstenberg found an innovative proof that there’re infinitely many prime numbers, via the concept of topology.

Theorem. There are infinitely many prime numbers.

Proof (Furstenberg)

Define a topology on set of integers Z as follows. For integers a, b let V(a, b) be the set of all integers am+b for integer m. Then any intersection $V(a,b)\cap V(c,d)$ corresponds to solutions of simultaneous linear congruences $x\equiv b\pmod a$ , $x\equiv d\pmod c$ . From elementary number theory, this intersection is either empty or V(e, f) for some integers e, f (where e = lcm(a, c)). Since V(1, 0) = Z, {V(a, b)} forms a basis for some topology T on Z.

Since the complement Z – V(a, b) is a union of V(a, b’) for various b’, it is open, i.e. V(a, b) is both open and closed. Now, since any integer other than ±1 is divisible by some prime p, we have:

$\mathbf{Z} -\{-1, +1\} = \cup_{p \text{ prime}} V(p, 0).$

If there’re finitely many primes, then the RHS is a union of finitely many closed sets and is hence closed. Thus, {-1, +1} is open, which is ridiculous: since every basic open set V(a, b) is infinite, every open set must be infinite too. ♦

Subbases

Let’s fix an underlying set X and consider various topologies on it. If $\{T_i\}$ is a collection of topologies on X, one easily checks that so is $T:=\cap_i T_i$ where a subset of X is open in T if and only if it’s open in all T_i. In short, an intersection of topologies is still a topology; so given any subset $S\subseteq \mathbf{P}(X)$ , one can consider the collection of all topologies containing S (this is a non-empty collection since it includes the discrete topology) and take their intersection.

Definition. Under the above definition, the resulting topology is called the topology generated by S. One also says that S is a subbasis for the topology T.

Put in another way, T is the “smallest” topology on X containing S, in the following sense:

$S\subseteq T$ ;
if T’ is any topology containing S, then $T\subseteq T'$ .

[ This is similar to case where an arbitrary subset of a group or ring can be used to generate a subgroup or subring. ]

Theorem. The topology T generated by subbasis S has, as basis,

$B(S) := \{W_1 \cap W_2 \cap \ldots \cap W_k : W_i\in S\} \cup \{X\}.$

[ Note: the additional term {X} may be superfluous if one interprets X as being the intersection of no W_i‘s. The more terms we intersect, the smaller the set becomes, so if we intersect no terms at all, we get the universal set X.]

Proof.

The intersection of any two elements of B(S) still lies in it, so B(S) is indeed a basis for some topology T’.
Since $S\subseteq B(S)\subseteq T'$ , we have $T \subseteq T'$ too, since T is the smallest topology containing S.
Finally, if $V=W_1\cap \ldots \cap W_k \in B(S)$ , for some $W_i\in S$ , then since $W_i\in T$ , we have $V\in T$ as well. Thus, $T'\subseteq T$ . ♦

In general, bases and subbases can be useful tools in topological proofs since to verify a property, it often only suffices to do it on basic open sets, or even subbasic open sets.

Summary. We have:

$\text{Subbasis S} \stackrel{\footnotesize \begin{matrix}\text{take}\\ \text{finite} \\ \text{intersections}\end{matrix}}{\implies} \text{Basis }B \stackrel{\footnotesize\begin{matrix}\text{take}\\ \text{unions}\end{matrix}}{\implies} \text{Topology }T.$

Examples

On R, the set of intervals of the form (-∞, b) and (a, ∞) forms a subbasis; indeed, the intersection gives (-∞, b) ∩ (a, ∞) = (a, b) or the empty set.
If |X|>2, then the collection of two-element subsets {a, b} forms a subbasis since if a, b, c are distinct, then {a, b} ∩ {b, c} = {a}.
For the topology on Z in Furstenberg’s proof, the set of V(a, b) for prime power a forms a subbasis by Chinese Remainder Theorem.

Posted in Notes | Tagged bases, furstenberg's proof, generated topology, infinitude of primes, open balls, open subsets, subbases, topology | Leave a comment

Topology: Basic Definitions

Posted on January 18, 2013 by limsup

Motivation and Definition

While studying analysis, one notices that many important concepts can be defined in terms of “open sets”. One gets the inkling that this concept is critical in forming our notions of continuity, limits etc. In this article, we shall abstractise this further by considering only open sets. At first glance, it does seem rather unlikely that such a general definition could be useful in any manner. However, it will soon become clear that abstractisation has the following advantages:

It brings out the essence of proofs of theorems, by highlighting specific properties of an object which are critical to the proofs.
The concepts can be applied to vastly different areas of mathematics, including for example, number theory, Galois theory, commuative algebra (in more ways than one!) etc, and have proven to be extraordinarily fruitful in furthering their study.

Enough talk, let’s begin. Let X be any set. As mentioned above, we wish to define a topology on it by stating what constitutes “open sets”.

Definition. Let $\mathbf{P}(X)$ be the power set of X (i.e. set of all subsets of X). A topology on X is a subset $T\subseteq \mathbf{P}(X)$ satisfying:

$\emptyset, X \in T$ ;

if $U_1, U_2 \in T$ , then $U_1 \cap U_2\in T$ ;

if $U_i\in T$ is a collection of elements of T, then $\cup_i U_i\in T$ .

If U is a subset of X belonging to T, we say U is an open subset of X. If C is a subset of X whose complement X-C belongs to T, we say C is a closed subset of X.

Some observations:

The definition does not preclude the possibility that a subset can be both open and closed, so the terminology is a little unfortunate. For a quick motivation of why closed subsets are of interest, read here.
By repeatedly applying intersection, one notes that if $U_1, U_2, \ldots, U_n$ are open, then so is $U_1\cap U_2\cap\ldots\cap U_n$ . In other words, a topology is closed under an intersection of finitely many terms, but not infinitely many terms in general (see the example of R later, where the intersection of all $U_n = (-\frac 1 n, +\frac 1 n)$ is {0}, which is not open).

Note that we could also define a topology by stating what constitutes “closed subsets” of X. Thus:

$\emptyset, X$ are closed subsets of X;
if $C_1, C_2$ are closed subsets of X, then so $C_1 \cup C_2$ is closed;
if $C_i$ is a collection of closed subsets of X, then $\cap_i C_i$ is closed.

Definition. Suppose $T_1, T_2$ are both topologies on X. If $T_1\subseteq T_2$ , then we say $T_2$ is a finer topology than $T_1$ and conversely $T_1$ is a coarser topology than $T_2$ .

[ Note: this diagram is meant as an illustration of finer/coarser; it does not represent actual topologies. ]

Examples of Topologies

On any set X, we have the discrete topology T = P(X) where every subset is open; this is the finest topology possible for X. Also, there’s the trivial topology $T=\{\emptyset, X\}$ , which is the coarsest topology possible for X.
Let $X=\mathbf{R}^n$ . The standard topology on X was already defined earlier. Recall that $U\subseteq X$ is open if and only if for each x in U, there is an ε>0, such that any y in X satisfying |y – x|<ε must be in U also.
Let X be a set. The cofinite topology is defined via having $U\subseteq X$ open if and only if X–U is finite. In other words, the class of closed subsets is precisely the class of finite subsets, from which it’s clear that the cofinite topology satisfies the set of axioms of a topology.
Let N = {1, 2, … } be the set of positive integers. The right order topology on N is where the only non-empty open subsets are $U_m = \{m, m+1, m+2, \ldots\}$ for m=1, 2, 3, … . Note that $U_m \cap U_n = U_{\max(m,n)}$ and $\cup_i U_{m_i} = U_{\min(m_i)}$ ; the fact that the order topology is closed under arbitrary union follows from that every subset of N has a minimum.
If $\mathbf{N}^* = \mathbf{N}\cup\{\infty\}$ , where ∞ is just a dummy symbol here, then we can define a topology on N* by decreeing that the only non-empty open subsets are $U_m^* = U_m \cup\{\infty\}.$ Thus, any open subset containing ∞ contains all sufficiently large natural numbers.

[ Side note: more generally, one can define the order/right-order and left-order topology on any totally ordered set. Note that the right-order topology on N* differs from the one described in example 5, since {∞} is an open subset in the right-order topology. ]

Metric Spaces

One common source of topologies is via a metric function, which provides a geometry by indicating the “distance” between any two points in a set.

Definition. A metric on a set X is a function $d:X\times X\to\mathbf{R}$ such that:

d(x, y) ≥ 0 for any x, y in X, with equality if and only if x=y;

d(x, y) = d(y, x) for any x, y in X;

d(x, z) ≤ d(x, y) + d(y, z) for any x, y, z in X (triangular inequality).

The pair (X, d) is then called a metric space.

The triangular inequality ensures that the function d behaves like a “distance function”.

Examples

The discrete metric on any X is defined by d(x, y) = 0 if x=y and 1 otherwise.
Let $X=\mathbf{R}^n$ . The Euclidean metric is the standard distance function $d(x,y) = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}$ .
Let $X=\mathbf{R}^n$ again. This time we take the metrics $d_1(x,y)=\sum_{i=1}^n |x_i - y_i|$ and $d_\infty(x,y)=\max_{i=1}^n |x_i - y_i|$ . [ See diagram below. ]
Let p be a prime number and X=Z, the set of integers. Given any integer m let where is the highest power of p dividing m; in the case m=0, since all powers of p divide m we have . Then is a metric, called the p-adic metric on Z. This was briefly alluded to in a prior post. [ Note: the triangular inequality follows from the fact that if divides both x–y and y–z, then it divides x–z. One way of looking at this metric is that two integers are near if their difference is divisible by a high power of p. ]
- E.g. suppose p=5. Then d(5, 3)=|5-3|₅ = 0, d(57, -18)=|75|₅ = 1/25, and d(152, 27) = 1/125.

The following theorem shows that a metric space gives rise to a topology.

Theorem. Let (X, d) be a metric space. For x in X and ε>0, the open ball with radius ε>0 and centre x is defined by

$N(x, \epsilon) = \{y\in X: d(x, y)<\epsilon\}.$

We say that a subset $U\subseteq X$ is open if for any x in U, there is an open ball $N(x,\epsilon)\subseteq U$ for some ε>0. Then the collection of open sets form a topology.

[ Note: the above diagrams for d, d₁ and d_∞ illustrate the open balls for the respective metrics. ]

Proof.

Clearly, $\emptyset$ and X are both open. Now, suppose $U_1, U_2\subseteq X$ are open (in the sense of metric space). If $x\in U_1\cap U_2$ , then x lies in both U₁ and U₂. Thus, we can find $\epsilon_1, \epsilon_2>0$ such that:

$N(x, \epsilon_1) \subseteq U_1$ and $N(x,\epsilon_2)\subseteq U_2$ .

So $N(x, \min(\epsilon_1, \epsilon_2)) \subseteq N(x,\epsilon_1)\cap N(x,\epsilon_2) \subseteq U_1 \cap U_2$ .

Finally for arbitrary union of open subsets, let $U_i\subseteq X$ be open and $U = \cup_i U_i$ . If x lies in U, then it must lie in some U_i. Thus, for some ε>0, $N(x,\epsilon) \subseteq U_i \subseteq U$ . ♦

For example,

the above metric space $(\mathbf{R}^n, d)$ induces the topology on Rⁿ defined earlier;
the discrete metric on any set X induces the discrete topology;
the open ball N(x, ε) itself is an open subset of X (the proof is an exact replica of the one in an earlier post, by replacing |x–y| with d(x, y)).

We say that a topology (X, T) is metrisable if there’s a metric d on X which induces it. But even if such a d exists, it is usually far from unique. For example, the above metrics d, d₁ and d_∞ on Rⁿ all induce the same topology as we will show a short while later.

Definition. Two metrics d and d’ on X are said to be topologically equivalent if they induce the same topology.

Theorem. This happens if and only if:

for any x in X and ε>0, there exists δ>0 such that $N_{d'}(x,\delta) \subseteq N_d(x, \epsilon)$ ;

for any x in X and ε>0, there exists δ>0 such that $N_{d}(x,\delta) \subseteq N_{d'}(x, \epsilon)$ ;

[ Since there’re two metrics here, we use the subscript N_d to denote the open ball induced by d. ]

Proof

Suppose d and d’ induce topologies T and T’ respectively. It suffices to show that the first condition is equivalent to that T’ is finer than T.

For the forward direction, suppose $U\in T$ , i.e. it is open in the metric space (X, d). Then for any x in U, $N_d(x, \epsilon)\subseteq U$ for some ε>0. By the first condition, $N_{d'}(x, \delta)\subseteq N_d(x,\epsilon)$ for some δ>0. Hence it follows that U is open in the metric space (X, d’) too.

For the converse, suppose let x be in U and ε>0. Then since $U:=N_d(x,\epsilon)$ is open in T, it is also open in T’. Hence $N_{d'}(x,\delta)\subseteq U$ for some δ>0. ♦

For example, the following diagram shows that d₁ and d are topologically equivalent on Rⁿ.

Conclusion: every metric space induces a topology and many different metrics on the same space can induce the same topology. We’ll see later that some topologies are inherently non-metrisable.

Posted in Notes | Tagged analysis, closed subsets, discrete topology, metric spaces, open balls, open subsets, topology, topoloical equivalence | Leave a comment

Burnside’s Lemma and Polya Enumeration Theorem (2)

Posted on January 15, 2013 by limsup

[ Acknowledgement: all the tedious algebraic expansions in this article were performed by wolframalpha. ]

Counting Graphs

One of the most surprising applications of Burnside’s lemma and Polya enumeration theorem is in counting the number of graphs up to isomorphism.

Problem. Find the number of simple graphs with n vertices, up to isomorphism.

Let V be a set of n vertices, and E be the set of all 2-element subsets of V. E.g. for n=5:

V = {a, b, c, d, e};
E = {{a, b}, {a, c}, {a, d}, {a, e}, {b, c}, {b, d}, {b, e}, {c, d}, {c, e}, {d, e}}.

The full symmetric group $G=S(V) \cong S_n$ then acts on E. Each graph corresponds to a colouring of E by Y = {black, white} and two graphs are isomorphic if and only if they’re isotopic under G – i.e. belong to the same orbit under G. Our objective is to compute the number of unique colourings by applying Polya enumeration theorem.

Clearly, c(g) – the number of orbits of g acting on E – only depends on the equivalence class of g in G; in particular for n=5, the number of cases is reduced from 5!=120 to 7.

g=e : c(g) = C(5, 2) = 10;
[10×] g=(a, b): edges ac and bc must share the same colour, as do {ad, bd}, {ae, be}, so c(g) = 7;
[20×] g=(a, b, c): edges ad, bd and cd must have the same colour, as do {ae, be, ce} and {ab, bc, ac}, so c(g) = 4;
[30×] g=(a, b, c, d): edges ae, be, ce and de have the same colour, as do {ab, bc, cd, da} and {ac, bd}, so c(g) = 3;
[24×] g=(a, b, c, d, e): we get two orbits {ab, bc, cd, de, ae} and {ac, bd, ce, ad, be} so c(g)=2;
[15×] g=(a, b)(c, d): we get {ae, be}, {ce, de}, {ac, bd}, {ad, bc}, {ab}, {cd} so c(g)=6;
[20×] g=(a, b)(c, d, e): we get {ab}, {cd, de, ce} and all the remaining edges form a single orbit so c(g)=3.

Hence the number of graphs with 5 vertices, up to isomorphism, is:

$\frac 1 {120} (1\times 2^{10} + 10\times 2^7 + 20\times 2^4 + 30\times 2^3 + 24\times 2^2 + 15\times 2^6 + 20\times 2^3) = 34.$

The following diagram lists all isomorphism classes of simple graphs with 5 vertices and up to 5 edges.

Counting Graphs II

In fact, with just a bit more effort, we can compute the number of graphs with 5 vertices and e edges, for each 0 ≤ e ≤ 10. Recall that each graph (ignoring symmetry) is simply a map E → {black, white}. Let X be the set of all such graphs; the group $G\cong S_5$ acts on X by permuting the edges around.

Now, we assign a weight to each colour: black→1 and white→0. Each graph E → {black, white} then gets a weight which is its number of edges. Let X_i be the set of elements in X with weight i; it’s easy to see that $|X_i| = C(10, i)$ .

Our objective is to compute $n_i := |G\backslash X_i|$ , the number of isomorphic classes of simple graphs with 5 vertices and i edges.

To that end, define the polynomial $P(t) := \sum_{i=0}^{10} n_i t^i$ . Burnside’s lemma gives:

$P(t) =\sum_i |G\backslash X_i| t^i = \frac 1 {|G|} \sum_i \sum_{g\in G} |X_i^g|t^i = \frac 1 {|G|}\sum_{g\in G}\left(\sum_i |X_i^g|t^i\right).$

Thus, we need to evaluate the power series $P_g(t) = \sum_i |X_i^g| t^i$ for each g.

g=e : since $X_i^g = X_i$ , this is just $P_g(t) = (1+t)^{10}$ .
g=(a, b) : as worked out earlier, the non-trivial orbits are {ac, bc}, {ad, bd} and {ae, be}. For each of these three orbits, the weight is either 0 or 2, so we get a factor of $(1+t^2)^3$ . For the remaining 4 singleton orbits, we get a factor of $(1+t)^4$ , so the summand is $P_g(t) = (1+t^2)^3 (1+t)^4$ .
g=(a, b, c): the orbits are {de}, {ad, bd, cd}, {ae, be, ce} and {ab, bc, ac}. Each of the three 3-element sets contributes a factor of $(1+t^3)$ while the singleton set contributes (1+t), hence the summand is $(1+t^3)^3(1+t)$ .

Eventually, we get all the seven power series.

Hence, P(t) equals:

$\begin{aligned}\frac 1 {120}(&(1+t)^{10}+10(1+t^2)^3(1+t)^4 + 20(1+t^3)^3(1+t) + 30(1+t^4)^2(1+t^2)\\&+24(1+t^5)^2 + 15(1+t^2)^4(1+t)^2+20(1+t)(1+t^3)(1+t^6)),\end{aligned}$

which gives $1+t+2t^2 + 4t^3+6t^4+6t^5+6t^6 + 4t^7+2t^8+t^9+t^{10}$ . Thus, there are 1 graph with 1 edge, 2 graphs with 2 edges, 4 graphs with 3 edges, … up to isomorphism.

Polya Enumeration Theorem Revisited

Inspired by the above example, we define:

Definition. Let G be a finite group acting on finite set X of size k. Then the cycle index is the polynomial:

$Z_G(t_1, t_2, \ldots, t_k) := \frac 1{|G|}\sum_{g\in G} t_1^{j_1(g)} t_2^{j_2(g)} \ldots t_k^{j_k(g)},$

where $j_c(g)$ is the number of cycles (or orbits) in g of length c.

Examples.

Consider S₃ acting on {1, 2, 3}. Then each term gives:
- g=e : $t_1^3$ ;
- [3×] g=(a, b): $t_1 t_2$ ;
- [2×] g=(a, b, c): $t_3$ , so we have $Z_G(t_1, t_2, t_3) = \frac 1 6(t_1^3 + 3t_1 t_2 + 2t_3)$ .
Consider $G\cong S_5$ acting on the set of edges E earlier. We have:

$Z_G(t_1, t_2, \ldots, t_{10}) = \frac 1 {120}(t_1^{10}+10t_1^4 t_2^3 + 20 t_1 t_3^3 + 30t_2 t_4^2 + 24t_5^2 + 15t_1 t_2^4 + 20t_1 t_3 t_6).$

Now we’re ready to describe a more general form of Polya enumeration theorem. As before, G acts on set X, and we consider the set of all colourings X→Y. But now each colour is endowed with a weight. If we let $X_i$ be the set of colourings with weight i, then our objective is to compute $n_i := |G\backslash X_i|$ via the power series

$P(t) := \sum_{i\ge 0} n_i t^i.$

As worked out before, to do that we need to compute:

$P_g(t) := \sum_{i\ge 0} |X_i^g| t^i$ , for each g in G.

Now if we write g as a product of cycles: g = (cycle 1)(cycle 2) … , where there are $j_c(g)$ cycles of length c, then each cycle must be of uniform colour! In other words, each cycle contributes a term of $(f_0+f_1 t^c + f_2 t^{2c} + f_3 t^{3c}+ \ldots)$ where $f_w$ is the number of colours of weight w. Hence:

$\begin{aligned}P_g(t) = &(f_0+f_1 t + f_2 t^2 +\ldots)^{j_1(g)} (f_0+f_1 t^2 + f_2 t^4+\ldots)^{j_2(g)}\\ &(f_0+f_1 t^3+f_2 t^6+\ldots)^{j_3(g)} \ldots\end{aligned}$

Writing $f(x) = f_0 + f_1 x + f_2 x^2 + \ldots$ , this gives: $P_g(t) = f(t)^{j_1(g)} f(t^2)^{j_2(g)} f(t^3)^{j_3(g)}\ldots$ . Summing up:

$P(t) = \frac 1 {|G|} \sum_{g\in G} P_g(t) = Z_G(f(t), f(t^2), f(t^3), \ldots).$

Let’s summarise all we’ve learnt.

Polya Enumeration Theorem B. Suppose we have $f_w$ colours of weight w and $f(x) := \sum_w f_w x^w$ . Then the number of colourings with a specified weight is given by the generating function:

$P(t) =Z_G(f(t), f(t^2), f(t^3), \ldots)$ .

E.g. in our earlier example of graph counting, we have f(x) = 1+x so the resulting generating function is $P(t) = Z_G(1+t, 1+t^2, 1+t^3, \ldots)$ as we saw above.

Example 1

On a 2k × 2k chessboard, each square is to be coloured black or white. Two colourings are identical if they can be obtained from each other by rotation or reflection. Find the generating function for the number of colourings with a specified number of black squares.

Solution

Let’s consider the symmetries of a square.

In the first case, we get k² cycles of length 4 each. In the second and third cases, we get 2k² cycles of length 2 each. In the final case, we get 2k cycles of length 1 and k(2k-1) cycles of length 2. Hence:

$Z_G(t_1, t_2, \ldots) = \frac 1 8 (t_1^{4k^2} + 2t_1^{2k}t_2^{k(2k-1)} + 3t_2^{2k^2} + 2t_4^{k^2}).$

The final answer is thus:

$\begin{aligned}f(t) &= Z_G(1+t, 1+t^2, \ldots)\\&= \frac 1 8 ((1+t)^{4k^2} + 2(1+t)^{2k}(1+t^2)^{k(2k-1)} + 3(1+t^2)^{2k^2} + 2(1+t^4)^{k^2}).\end{aligned}$

Here are some small examples.

k=1: $\begin{aligned} f(t) = t^4+t^3+2t^2+t+1\end{aligned}$ ;
k=2: $\begin{aligned} f(t)=&t^{16}+3t^{15}+21t^{14}+77t^{13}+252t^{12} +567t^{11}+1051 t^{10}+1465 t^9\\&+1674 t^8+1465 t^7+1051 t^6+567 t^5+252 t^4+77 t^3+21 t^2+3 t+1.\end{aligned}$

More generally, we can consider colours with vectorial weights. For example, if each colour has weight (v, w), then we write:

$f(x,y) = \sum_{v,w} f_{v,w} x^v y^w$

where $f_{v,w}$ denotes the number of colours of weight (v, w). Then Polya’s enumeration theorem can be further generalised: the number of colourings with a specified vectorial weight (v, w) is given by the coefficient of $t^v u^w$ in:

$P(t, u) = Z_G(f(t,u), f(t^2, u^2), f(t^3, u^3), \ldots).$

Example 2

Find the number of ways to colour the 12 edges of a cube with {red, green blue} such that each colour occurs on an even number of edges. Two colourings are considered identical if they’re obtained from each other via a rotation.

Solution

Label the 12 edges of a cube as follows, together with three axes of rotation.

Let’s compute the orbits of the various rotations. There should be 24 rotations in all.

Identity: 12 singleton orbits, {1}, {2}, …, {12}.
[6×] Rotation about axis 1 (90 deg): {1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}.
[3×] Rotation about axis 1 (180 deg): {1, 3}, {2, 4}, {5, 7}, {6, 8}, {9, 11}, {10, 12}.
[6×] Rotation about axis 2: {5}, {7}, and 5 orbits of size 2.
[8×] Rotation about axis 3: 4 orbits of size 3.

Hence the cycle index is:

$Z_G(t_1, t_2, \ldots) = \frac 1 {24}(t_1^{12}+6t_4^3 + 3t_2^6+6 t_1^2 t_2^5 + 8t_3^4).$

[ Note: without tracing the exact orbits, we can conclude that the rotation about axis 2 has exactly five orbits of size two. Indeed, since the rotation has order 2, each cycle size is either 1 or 2. The only fixed edges are 5 and 7 so the remaining edges must be paired into five orbits of size two each. ]

We need to compute the number of colourings where green and blue each occurs an even number of times, so assign weight to each colour: red=(0, 0), green=(1, 0) and blue=(0, 1). The weight polynomial is f(x, y) = 1+x+y so the resulting generating function is:

$\begin{aligned} P(t, u)= &\frac 1 {24}[(1+t+u)^{12} + 6(1+t^4+u^4)^3 + 3(1+t^2+u^2)^6\\&+6(1+t+u)^2 (1+t^2+u^2)^5 + 8(1+t^3+u^3)^4].\end{aligned}$

To sum up the coefficients of $t^i u^j$ where i and j are both even, compute the sum

$\frac 1 4(P(1, 1)+P(1,-1)+P(-1,1)+P(-1,-1))=5823$ . ♦

The Case of Full Symmetric Group

Suppose |X|=k and |Y|=n and G is the full symmetric group $S(X) \cong S_k$ . In the previous article, considering this group led us to the formulation of Stirling numbers. Let’s consider the case of the cycle index:

$Z_G(t_1, t_2, \ldots) = \sum_{g\in S_k} t_1^{j_1(g)} t_2^{j_2(g)}\ldots.$

What can we say about this power series?

To answer them, let’s assign some weights to the n colours. To distinguish all the n distinct colours, we’ll assign an (n-component) vectorial weight:

colour 1 ↔ (1, 0, …, 0), colour 2 ↔ (0, 1, …, 0), … …, colour n ↔ (0, 0, …, 1).

Hence, the weight of a colouring tells us the number of occurrences of each colour. The corresponding weight polynomial is then a linear function in n variables:

$f(x_1, x_2, \ldots, x_n) = x_1 + x_2 + \ldots + x_n.$

Applying the multivariate version of the Polya enumeration theorem, we see that the number of colourings of a specified weight $(w_1, w_2, \ldots, w_n)$ is given by the generating function:

$Z_G(f(u_1, u_2, \ldots, u_n), f(u_1^2, u_2^2, \ldots, u_n^2), \ldots) = Z_G(p_1, p_2, p_3, \ldots),$

where $p_i = u_1^i + u_2^i + \ldots + u_n^i$ is the i-th power symmetric function. The question is: how many colourings are of weight $(w_1, w_2, \ldots, w_n)$ , up to symmetry? Since we have the full symmetric group acting on X, the answer is obvious: 1 if $\sum_i w_i = k$ and 0 otherwise. Now if we multiply this by an auxiliary $t^k$ and sum over all k=|X|, we get the resulting power series:

$\begin{aligned}\sum_{k=0}^\infty Z_{S_k}(p_1, p_2, p_3, \ldots) t^k &= \sum_{w_1, \ldots,w_n\ge 0} u_1^{w_1} u_2^{w_2}\ldots u_n^{w_n}\cdot t^{\sum w_i}\\&= \frac 1 {1-u_1t}\cdot \frac 1{1-u_2t}\cdot\dots \cdot \frac 1 {1-u_n t}.\end{aligned}$

Writing each term

$(1-u_i t)^{-1} = \exp(\log(1-u_i t)) = \exp\left(u_i t + \frac {u_i^2 t^2} 2 + \frac{u_i^3 t^3} 3 + \ldots\right),$

we get the final result:

$\begin{aligned}\sum_{k=0}^\infty Z_{S_k}(p_1, p_2, \ldots)t^k = \exp\left(p_1 t + \frac {p_2} 2 t^2 + \frac {p_3} 3 t^3 + \ldots\right).\end{aligned}$

Example

To compute the cycle index for $G=S_5$ , let $T = p_1 t + \frac {p_2} 2 t^2 + \frac{p_3}3 t^3 + \ldots$ . We need to compute the expansion of exp(T) until T⁵ and find the coefficient of t⁵.

$T : p_5/5$ ;
$T^2 : p_1 p_4/2 + p_2 p_3/3$ ;
$T^3 : p_1^2 p_3 + 3p_1 p_2^2/4$ ;
$T^4 : 2p_1^3 p_2$ ;
$T^5 : p_1^5$ .

Hence, the final sum is

$\begin{aligned}&\frac{p_5}5 + \frac 1 {2!}\left(\frac{p_1 p_4}2 + \frac{p_2 p_3}3\right) + \frac 1 {3!}\left(p_1^2 p_3 + \frac {3 p_1 p_2^2}4\right) + \frac 1 {4!}\left(2p_1^3 p_2\right) + \frac 1 {5!}p_1^5 \\= &\frac 1 {120}\left(24p_5 + 30p_1p_4 + 20p_2p_3 + 20p_1^2 p_3 + 15 p_1 p_2^2 +10p_1^3 p_2 + p_1^5\right).\end{aligned}$

In conclusion, in the group $S_5$ , there are:

1 identity element,
24 elements of type (a, b, c, d, e),
30 elements of type (a, b, c, d),
20 elements of type (a, b)(c, d, e),
20 elements of type (a, b, c),
15 elements of type (a, b)(c, d), and
10 elements of type (a, b).

All these were obtained via generating functions.

Posted in Notes | Tagged advanced, burnside's lemma, combinatorics, generating functions, group actions, group theory, polya enumeration theorem, stirling numbers, symmetries | Leave a comment

Burnside’s Lemma and Polya Enumeration Theorem (1)

Posted on January 13, 2013 by limsup

[ Note: this article assumes you know some rudimentary theory of group actions. ]

Let’s consider the following combinatorial problem.

Problem. ABC is a given equilateral triangle. We wish to colour each of the three vertices A, B and C by one out of n possible colours. Furthermore, two colourings are considered identical if we can obtain one from the other by rotating or reflecting the triangle. How many distinct colourings are there?

For example, all the following colourings are considered identical.

Note that the symmetries comprise of three rotations in the top row and three reflections in the bottom row. More generally, the group of symmetries of a regular n-gon comprises of n rotations – including the identity rotation – as well as n reflections; this is called the dihedral group of order n.

For our case, the dihedral group D₃ is isomorphic to the full symmetric group S₃. Relabeling the vertices A, B and C as 1, 2 and 3, the six symmetries above are respectively given by 1, (1, 3, 2), (1, 2, 3), (2, 3), (1, 3) and (1, 2).

Without symmetries, we clearly have $n^3$ colourings since each vertex can take n colours independently. Letting X denote the set of these colourings, the group S₃ acts on X by permuting the vertices. Our objective is now to compute $|S_3\backslash X|$ , the number of orbits under this group action. To that end, the following result is phenomenally useful.

Burnside’s Lemma. Let G be a finite group acting on a finite set X. Then:

$|G\backslash X| = \frac 1 {|G|}\sum_{g\in G} |X^g|$ ,

where $X^g=\{x \in X: gx = x\}$ is the set of all elements of X fixed by g.

Proof.

Consider the set $S = \{(g,x)\in G\times X: gx = x\}$ . We’ll compute |S| in two ways. Summing over g, we get $|S| = \sum_{g\in G} X^g$ , the RHS.

Summing over x, we get $|S| = \sum_{x\in X} |G_x|$ , where $G_x$ is the isotropy group of x and $|G_x| = |G|/|Gx|$ , where Gx is the orbit of x. Hence,

$|S| = \sum_{x\in X}\frac{|G|}{|Gx|} = |G|\sum_{x\in X} \frac 1 {|Gx|} = |G|\times |G\backslash X|$ ,

where the last equality follows from the fact that since each element x contributes 1/(orbit size) to the sum, the grand total is just the number of orbits. Matching the two expressions, we’re done. ♦

Solution to Our Problem.

To compute $|S_3\backslash X|$ , let’s compute $|X^g|$ for each element $g\in S_3$ :

g=e : all colourings are fixed, so $|X^g| = |X| = n^3$ ;
g=(1, 2): a colouring lies in X^g if and only if vertices 1 and 2 have the same colour, this gives $|X^g| = n^2$ ;
g=(2, 3), (1, 3): similarly, $|X^g| = n^2$ ;
g=(1, 2, 3), (1, 3, 2): a colouring lies in X^g if and only if all vertices have the same colour; thus $|X^g| = n$ .

Hence, Burnside’s lemma tells us the answer to our problem is:

$\frac 1 6 (n^3 + 3n^2 + 2n) = \frac {n(n+1)(n+2)}6.$ ♦

Exercise.

Find the number of colourings of the vertices of a square ABCD by n colours, where two colourings are identical if they can be obtained from each other by rotation or reflection. What about a regular pentagon ABCDE? Or a solid tetrahedron ABCD? Or a solid tetrahedron excluding reflections?

Answer (highlight to read)

For square: n(n+1)(n²+n+2)/8.
For pentagon: n(n²+1)(n²+4)/10.
For tetrahedron with reflections: n(n+1)(n+2)(n+3)/24.
For tetrahedron without reflections: n²(n²+11)/12.

Polya Enumeration Theorem

A large class of problems can be classified under the case of “colouring problems“. Namely, suppose finite group G acts on a finite set of points X. If Y denotes a finite set of colours, the set of colourings is denoted by $S = Y^X$ , which is the set of all functions X→Y. The action of G on X extends to an action on S as follows:

If $g\in G$ and $f:X\to Y$ is a colouring, then the action is given by the composition

$g\cdot f = f\circ g^{-1}:X\to Y$ .

[ The inverse is necessary so that $(g_1g_2)\cdot f = g_1\cdot(g_2\cdot f)$ . ]

Burnside’s lemma then tells us that $|G\backslash S| = \frac 1{|G|} \sum_{g\in G}|S^g|$ . But $|S^g|$ is not hard to compute: if we express the action of g on X as a permutation, and c(g) denotes its number of cycles (or orbits), then $|S^g| = |Y|^{c(g)}$ since the points in each cycle must share the same colour. This gives:

$|G\backslash S| = \frac 1{|G|} \sum_{g\in G} n^{c(g)}$ , where n = |Y|.

In summary…

Polya Enumeration Theorem A. For a finite group G acting on a finite set X, the number of colourings X→Y up to symmetry (two colourings are identical if they can be obtained from each other via G-action) is given by:

$\frac 1{|G|} \sum_{g\in G} |Y|^{c(g)}$ ,

which is a polynomial in |Y|.

For example, in the case where X = {1, 2, 3} in our first problem. Then:

c(e) = 3;
c((1,2)) = c((2,3)) = c((1,3)) = 2;
c((1,2,3)) = c((1,3,2)) = 1.

And we obtain $\frac 1 6 (n^3 + 3n^2 + 2n)$ as above. Let’s consider more examples.

Example 1

On a 2k × 2k chessboard, each square is coloured with one out of n possible colours. Two colourings are identical if they can be obtained from each other by reflecting or rotating the chessboard. Find the number of colourings.

Solution

Let’s consider the 8 symmetries of the square:

The grey area shows the orbit representatives when we let each symmetry act on the 4n² squares. From the above diagram and Polya enumeration theorem, we get:

number of unique colourings = $\frac 1 8 (1\times n^{4k^2} + 2\times n^{k^2} + 3\times n^{2k^2} + 2\times n^{k(2k+1)})$ .

Notice that when k=1, this simplifies to $\frac 1 8(n^4 + 2n^3 + 3n^2 + 2n)$ which is precisely the number of ways to colour the four vertices of a square with n colours, up to rotational/reflection symmetry. ♦

Example 2

Consider a 3 × 3 chessboard. We wish to colour each of its 9 squares with one out of n colours. Two colourings are identical if we can obtain one from the other by first permuting the rows and then permuting the columns. How many unique colourings are there?

Solution

First note that we have G=S₃ acting on the set of 9 squares in two distinct ways. If S denotes the set of columns and T denotes the set of rows, then the whole board is simply S × T, with G × G acting on it component wise.

For each $(g, h)\in G\times G$ , consider the number of orbits c((g, h)). This only depends on the conjugancy classes of g and h. For example:

g=(1, 2), h=(1, 3): the orbits of (g, h) are given by {(1, 1), (2, 3)}, {(1, 2), (2, 2)}, {(1, 3), (2, 1)}, {(3, 1), (3, 3)}, and {(3, 2)};
g=(1, 2, 3), h=(1, 2, 3): the orbits of (g, h) are {(1, 1), (2, 2), (3, 3)}, {(1, 2), (2, 3), (3, 1)} and {(1, 3), (2, 1), (3, 2)}.

This gives us the following table:

so the solution is $\frac 1 {36} (n^9 + 6n^6 + 9n^5+8n^3 + 12n^2)$ . ♦

Stirling Numbers of the First Kind

Suppose the full symmetric group G = S(X) acts on X, and |X|=k, |Y|=n. Recall that Polya theorem gives the number of colourings as a polynomial in n (depending on k):

$\frac 1 {k!}\sum_{g\in S_k} n^{c(g)},$

where the coefficient of $n^c$ is the number of elements $g\in S_k$ with exactly c orbits.

On the other hand, we can simplify the expression by elementary combinatorics! Indeed, since we can permute the elements of X freely, the colouring is uniquely determined by the number of occurrences of each colour. To put it explicitly, denote the colours by 1, 2, …, n. Then we can permute the elements of X so that the colours occur in increasing order: (1, 1, …, 1, 2, 2, …, 2, 3, 3,… ).

Hence each colouring corresponds to a solution to the linear Diophantine equation:

$x_1 + x_2 + x_3 + \ldots + x_n = k$ ,

where each $x_i\ge 0$ is the number of occurrences of colour i. Elementary combinatorics then tell us the number of solutions is:

${n+k-1\choose k} = \frac 1 {k!} (n+k-1)(n+k-2)\ldots(n+1)n$ .

Equating the two expressions, we obtain:

Theorem. The number of elements of $g\in S_k$ with exactly c orbits is given by the coefficient of $x^c$ in the polynomial:

$x(x+1)(x+2)\ldots (x+k-1)$ .

This number is denoted by $\left[\begin{matrix}k\\c\end{matrix}\right]$ and is called a Stirling number of the first type.

For example, for k=5, we get:

$x(x+1)(x+2)(x+3)(x+4) = x^5 + 10x^4 + 35x^3 + 50x^2 + 24x,$

so there are 35 elements of $S_5$ with exactly 3 cycles (or orbits). These are of the form:

(a)(b)(c, d, e) : C(5, 3) × 2 = 20;
(a)(b, c)(d, e) : C(5, 2) × C(3, 2) / 2 = 15.

Exercises

Find the number of ways of colouring the 8 vertices of a cube with n colours, where two colourings are identical if one can be obtained from the other via rotation. [ Hint: there are 24 rotations of a cube. ]
Repeat example 1 above for a (2k+1) × (2k+1) chessboard, with n colours.
Find the number of ways to place k chairs around a table if there are n types of chairs available (each type of unlimited quantity). Two placings are identical if they can be obtained from each other via a rotation.
On a 8 × 8 chessboard, each square is coloured black or white such that every row and every column have an even number of black squares. Two colourings are identical if they can be obtained from each other by reflecting / rotating the square. Find the number of unique colourings.
Prove the following identities involving Stirling numbers of the first kind.
1. $\left[\begin{matrix}n+1\\k\end{matrix}\right] = n\left[\begin{matrix}n\\k\end{matrix}\right] + \left[\begin{matrix}n\\k-1\end{matrix}\right]$
2. $\left[\begin{matrix}n\\n-2\end{matrix}\right] = \frac 1 4(3n-1){n\choose 3}$
3. $\left[\begin{matrix}n\\2\end{matrix}\right] = (n-1)! (1 + \frac 1 2 + \frac 1 3 + \dots + \frac 1 {n-1})$ .

Answers (Highlight to Read)

Consider the rotations of a cube through an axis, with the corresponding terms in Polya formula:
- identity: c = 8;
- axis passes through a face (90 deg): c = 2 for 3×2 = 6 cases;
- axis passes through a face (180 deg): c = 4 for 3 cases;
- axis passes through an edge: c = 4 for 6 cases;
- axis passes through a vertex: c = 4 for 4×2 = 8 cases.
- Thus, answer = (n⁸ + 17n⁴ + 6n²)/24.
The terms in Polya formula:
- rotation of 90 deg: c = k² + k + 1;
- rotation of 180 deg: c = 2k² + 2k + 1;
- reflection about horizontal/vertical axis (2×): c = (k+1)(2k+1);
- reflection about diagonal (2×): c = (k+1)(2k+1).
- Thus, answer = (n^(4k² +4k + 1) + 2n^(k² + k + 1) + n^(2k² + 2k + 1) + 4n^((k+1)(2k+1)))/8.
For each rotation of t steps, t = 0, …, k-1, we have c(t) = gcd(t, k), where gcd(0, k) = k. On the other hand, for each d|k, there are exactly φ(k/d) numbers whose gcd with k is exactly d, where φ is the Euler totient function. Thus, the final sum is (1/k) ∑_d|k φ(k/d)n^d. Note that when k is prime, we get (n^k + (k-1)n)/k, so Fermat’s Little Theorem occurs as a corollary.
We can’t use Polya theorem directly, so let’s revert to Burnside’s lemma. The full set of colourings, excluding symmetry, is 2⁴⁹, by considering linear algebra over Z/2.
- Symmetry under ±90 deg rotation: we pick the top-left 4×4 square as a set of representatives. However, we have 3 linear relations (a priori 4, but 1 is superfluous), so total number = 2¹³.
- Symmetry under 180 deg rotation: pick the top-half 4×8 rectangle as a set of representatives. With 7 linear relations (a priori 8, but 1 is superfluous), this gives 2²⁵.
- Symmetry under reflection about horizontal axis: pick top half rectangle as representatives. With 4 linear relations, this gives 2²⁸. Same with vertical.
- Symmetry under reflection about diagonal axis: pick the 36 squares on or above the main diagonal. With 8 linear relations, this gives 2²⁸.
- Final answer = (2⁴⁹ + 2 × 2¹³ + 2²⁵ + 4 × 2²⁸)/8.

Posted in Notes | Tagged advanced, burnside's lemma, combinatorics, group actions, group theory, polya enumeration theorem, stirling numbers, symmetries | Leave a comment

Basic Analysis: Closed Subsets and Uniform Continuity

Posted on January 10, 2013 by limsup

Let’s consider another question: suppose f : D → R is continuous, where D is a subset of R. If (x_n) is a sequence in D converging to some real L, is it true that (f(x_n)) is also convergent? Now if L is in D, then we know that (f(x_n)) → (f(L)). But what if it’s not?

Examples

Let f(x) = 1/x, defined on (0, ∞). Take the sequence x_n= 1/n which converges to 0. Then the resulting sequence (f(x_n)) = (n) diverges.
Let f(x) = sin(1/x), defined on (0, ∞). Then the sequence x_n= 2/((2n-1)π) converges to 0 but the resulting (f(x_n)) = (1, -1, 1, -1, … ) diverges.

Closed Subsets

Let’s first consider the case where the limit L must lie in D. To that end, we have the following result.

Proposition. For an arbitrary subset $D\subseteq \mathbf{R}$ , the following are equivalent.

If (x_n) is a sequence in D converging to L, then L is in D.

D contains all its points of accumulations.

The complement D^c = R-D is an open subset of R.

Proof.

(1)→(2) : Let x be a point of accumulation of D. Hence for each ε = 1, 1/2, 1/3, …, there exists $x_n\in D$ , such that 0 < |x_n–x| < 1/n for each n. Then (x_n)→x must be in L.

(2)→(3) : Let $a\in D^c$ . Since a is not in D it’s not a point of accumulation of D. By definition, this means that there exists ε>0 such that the set of x satisfying 0 < |x–a| < ε does not intersect D. Together with the fact that a is not in D, we see that $N(a,\epsilon)\subseteq D^c$ , so D^c is open.

(3)→(1) : Let (x_n)→L. If $L\in D^c$ , an open subset of R, there must be an open ball $N(L,\epsilon)\subseteq D^c$ . On the other hand, since (x_n)→L, some $x_n\in D$ must lie within N(L, ε), which is a contradiction. ♦

Definition. We say that a subset $U\subset D$ is a closed subset if D-U is an open subset of D.

Warning. A subset of D can be both open and closed. E.g. the empty set (or the entire set D) is so.

Useful note: in particular, a closed and bounded subset D of R must have a maximum and minimum. Indeed, since D is bounded, let M be its supremum; we need to show that M lies in D. Suppose it doesn’t. For any positive ε, M-ε is not an upper bound of D, so there exists an element $x\in D$ , x > M-ε and x < M. This shows that M is a point of accumulation of D, and by the above proposition, it must lie in D, which contradicts our assumption. Thus M is a maximum of D. The case for minimum is similar.

Examples

Consider the set $D = [1, 2]\cup \{3\}$ . It is closed in R since the complement is $(-\infty, 1)\cup (2, 3)\cup (3,\infty)$ which is a union of open intervals. Clearly D is bounded. Being both closed and bounded, it has a minimum and a maximum (in this case, 1 and 3 respectively).
Consider the set $D = [1, 2)$ . It is not closed because its complement is $E=(-\infty, 1)\cup [2, \infty)$ and the point 2 has no open neighbourhood of R which is wholly contained in E. Or, look at it another way, if it were closed, then since it’s obviously bounded it would contain a maximum and a minimum, but 2 is the supremum of D which is not contained in D itself.
Consider the set $D = \{0\} \cup [1, \infty)$ . The complement is $(-\infty, 0)\cup (0, 1)$ which is a union of open intervals and is hence open. But it’s not upper-bounded so it doesn’t have a maximum.

From the above proposition, if D is a closed subset of R and f : D → R is continuous, then any convergent sequence in D gets mapped to a convergent sequence.

Important Exercises

Prove that f : D → R is continuous if and only if for each closed subset C of R, the set $f^{-1}(C)$ is closed in D.
Find a continuous function f : R → R and a closed subset C of R such that f(C) is not closed in R.
Prove that a union of finitely many closed subsets of D is still closed in D.
Prove that an intersection of (possibly infinitely many) closed subsets of D is still closed in D.
Suppose .
1. Prove that if C is closed in D, then B := C ∩ E is closed in E.
2. Prove that conversely, if B is closed in E, then B = C ∩ E for some closed subset C of D.

Answers [Highlight to read. Compare with the earlier case of open subsets. ]

If C is closed in R, then D – f^-1(C) = f^-1(D – C) is open in D iff f^-1(C) is closed in D. Then use the fact that f is continuous iff for each open subset U of R, f^-1(U) is open in D.
Let f be defined by f(x)=1/x for x>1 and f(x)=1 for x≤1. The set C=[1, ∞) is closed in R but f(C) = (0, 1] is not closed in R.
Follows from the fact that the intersection of finitely many open subsets of D is also open in D.
Follows from the fact that the union of arbitrarily many open subsets of D is also open in D.
For (A), if C is closed in D, then D–C is open in D and E–B = E – (C ∩ E) = (D–C) ∩ E is open in E by definition. Thus B is closed in E. For (B), since E–B is open in E, we can write E–B = U ∩ E for some open subset U of D. Then B = E – (U ∩ E) = (D–U) ∩ E where D–U is closed in D.

Uniform Continuity

Now let’s consider the case where D is not closed and (x_n) approaches an L not in D.

Definition. The function $f:D\to\mathbf{R}$ is uniformly continuous (on D) if:

for any ε>0, there is a δ>0 such that when x, y in D satisfies |x-y|<δ, we have |f(x)-f(y)|<ε.

Pictorially:

Once again we note the difference between standard continuity and uniform continuity. In the former case:

For every y in D and ε>0, there exists δ>0 such that for all x in D satisfying |x–y|<δ, we have |f(x)-f(y)|<ε.

In the latter case:

For every ε>0, there exists δ>0 such that for all x, y in D satisfying |x–y|<δ, we have |f(x)-f(y)|<ε.

Using logical quantifiers, these can be succinctly expressed as:

$\forall \epsilon, \forall y, \exists \delta, \forall x, P(x,y,\delta,\epsilon)$ and $\forall \epsilon,\exists\delta, \forall y, \forall x, P(x,y,\delta,\epsilon)$ ,

where P(x, y, δ, ε) says “|x–y|<δ implies |f(x)-f(y)|<ε”.

Examples of Uniform Continuity

Let f(x)=x on any subset S of R. We claim that f is uniformly continuous. Indeed, for any ε>0, let δ=ε. Then whenever |x–y| < δ and x, y in S, we easily get |f(x)-f(y)| = |x–y| < δ = ε, which satisfies the definition for uniform continuity.
Let f(x)=1/x on (0, ∞). We shall prove that f is not uniformly continuous. Negating the definition of uniform continuity, we need to find an ε>0 such that for any δ>0, we can find x, y>0 with |x–y|<δ but |1/x-1/y|≥ε. So let’s take ε=1; for any δ>0, pick x=δ and y = δ/(δ+1) < x. Then |x–y| = x–y < δ but |1/x-1/y|=1.
On the other hand, f(x)=1/x on [1, 2] is uniformly continuous. Indeed, whenever 1 ≤ x, y ≤ 2, we have |1/x – 1/y| = |x–y|/|xy| ≤ |x–y| since |xy| ≥ 1. Hence, given any ε>0, we can just let δ=ε. Now whenever x, y in [1, 2] satisfy |x–y| < δ, we also have |f(x)-f(y)| ≤ |x–y| < δ = ε.

Examples 2 and 3 tell us that uniform continuity depends heavily on the domain of the function.

Main Theorems

The first main result in this article is:

Theorem. If (x_n) is a Cauchy sequence in D and f : D → R is a uniformly continuous function, then (f(x_n)) is also a Cauchy sequence.

Proof. Let ε>0.

Since f is uniformly continuous, pick δ>0 such that when x, y in D satisfies |x–y|<δ, we have |f(x)-f(y)|<ε.
Since (x_n) is Cauchy, pick an N such that when m, n > N, we have |x_m–x_n|<δ.

This implies that when m, n > N, |f(x_m)-f(x_n)|<ε. ♦

For example, consider f(x)=1/x on (0, ∞). The sequence $x_n = 1/n$ is Cauchy but the resulting $f(x_n) = n$ is not a Cauchy sequence. Hence, this tells us f(x) is not uniformly continuous, as we had verified earlier directly in example 2 above.

Finally, let’s combine uniform convergence and uniform continuity.

Theorem. Suppose $f_n:D \to \mathbf{R}$ is a sequence of uniformly continuous functions on D, which converges uniformly to $f:D\to\mathbf{R}$ . Then $f:D\to\mathbf{R}$ is also uniformly continuous.

Proof.

Let ε>0. Then by uniformly convergence, there exists N such that whenever n > N, we have $||f_n-f||<\epsilon$ . Fix n. Since $f_n:D\to\mathbf{R}$ is uniformly continuous, there exists δ>0 such that whenever $x,y\in D$ satisfies |x–y|<δ, we also have $|f_n(x)-f_n(y)|<\epsilon$ . Hence:

$|f(x)-f(y)| \le |f(x)-f_n(x)|+|f_n(x)-f_n(y)|+|f_n(y)-f(y)|<\epsilon+\epsilon+\epsilon = 3\epsilon$ .

Now just go back and replace ε by ε/3. ♦

Posted in Notes | Tagged advanced, analysis, closed subsets, continuity, points of accumulation, uniform continuity, uniform convergence | Leave a comment

Basic Analysis: Uniform Convergence

Posted on December 23, 2012 by limsup

Once again, let $D\subseteq \mathbf{R}$ be a subset. Suppose we now have a sequence of functions

$f_n : D\to\mathbf{R}$ , where n = 1, 2, 3, … ,

such that for each x in D, the sequence $(f_n(x))$ converges to some real value. We’ll denote this value by f(x), thus defining a function f : D → R. We then say that the sequence of functions f_n(x) converges to f(x) pointwise.

The big question is: if each f_n(x) is continuous at x=a, is it true that f(x) is also continuous at x=a?

It may surprise you that the answer is no. [ Historical note: Cauchy actually believed the answer is yes and provided an erroneous proof. This shouldn’t be held against Cauchy since analysis was still in its infancy in the 19th century. On the contrary, one should use this as an indication of how counter-intuitive results in analysis can be. ]

Example

Consider the sequence of functions

$f_n : [0,1]\to\mathbf{R}, \quad f_n(x) = x^n.$

If 0≤x<1, then clearly f_n(x) → 0 as n approaches infinity. On the other hand, for x=1, the sequence is a constant sequence (1, 1, 1, …). Thus f_n(x) converges to f(x) pointwise, where f takes any element in [0,1) to 0, and 1 to 1.

(Graphs provided by wolframalpha)

Uniform Convergence

In order to ensure that f is continuous, we need a further condition.

Definition. The sequence of functions f_n(x) is said to converge uniformly to f(x) if:

for each ε>0, there exists N such that whenever n>N and x is in D, we have |f_n(x)-f(x)|<ε.

Pictorially, we have:

[ For large n, the function f_n(x) lies between f(x)-ε and f(x)+ε. ]

Let’s consider the difference between pointwise convergence and uniform convergence. In the former case, we have:

For each x in D and ε>0, there exists N such that for all n>N, we have |f_n(x)-f(x)|<ε.

On the other hand, the latter says:

For each ε>0, there exists N such that for all x in D and n>N, we have |f_n(x)-f(x)|<ε.

Using logical quantifiers, the two statements become:

$\forall x,\forall\epsilon, \exists N, P(N,x,\epsilon)$ and $\forall \epsilon, \exists N, \forall x, P(N,x,\epsilon)$ ,

where P(N, x, ε) says that whenever n>N, we have |f_n(x)-f(x)|<ε. Written in this manner, it’s clear that uniform convergence implies pointwise convergence.

Main Therorem. If f_n(x) converges to f(x) uniformly, and each f_n(x) is continuous at x=a, then so is f(x).

Proof. Let ε>0.

Since f_n(x) → f(x) uniformly, there exists N such that for all x in D and n>N, we have |f_n(x)-f(x)|<ε/3. Fix any n>N.
Since f_n(x) is continuous at x=a, there exists δ>0 such that for all x in D satisfying |x–a|<δ, we have |f_n(x)-f_n(a)|<ε/3, which gives:

$|f(x)-f(a)|\le |f(x)-f_n(x)|+|f_n(x)-f_n(a)|+|f_n(a)-f(a)|<\frac\epsilon 3+\frac\epsilon 3+\frac\epsilon 3=\epsilon.$ ♦

Alternate Definition

Before going through some examples, we consider an alternate definition of uniform convergence which is somewhat conceptually clearer.

Consider the set V of all functions f : D → R. This forms a real vector space via pointwise addition and scalar multiplication: i.e.

$(f+g)(x) := f(x) + g(x), \quad (c\cdot f)(x) = c\cdot f(x), \ c\in\mathbf{R}.$

Now we define the norm of f to be $||f|| := \sup\{|f(x)| : x\in D\}$ which is either a non-negative real number or +∞. It’s not hard to show that $||f+g|| \le ||f||+||g||$ , $||fg|| \le ||f||\cdot||g||$ and $||c\cdot f|| = |c|\cdot ||f||$ .

[ Indeed, to prove the first inequality, just note that for any x in D, we have |(f+g)(x)| = |f(x)+g(x)| ≤ |f(x)|+|g(x)| ≤ RHS. The same proof holds for the second inequality. ]

Alternate Definition. The sequence $f_n(x) \to f(x)$ uniformly if:

for every ε>0, there exists N such that when n>N, we have $||f_n - f||<\epsilon.$

Note that this definition is quite similar to that of convergent sequences, which makes it convenient to work with. [ Side note: the acute reader may suspect that one can study analysis on the above function space V and he’d be right! Indeed, this is the primary motivation behind functional analysis. ]

Examples

Let’s go back to the example $f_n(x) = x^n$ on D=[0,1]. Since the functions converge to a discontinuous function, the above theorem tells us the convergence is not uniform, but let’s verify this directly. Indeed, the function $f_n-f$ takes xⁿ on [0,1) and 0 at x=1. Thus, $||f_n-f||=1$ for all n, which shows that the convergence is not uniform.
On the other hand, if we take $f_n(x) = x^n$ on D=[0, 1/2], then they converge to f(x) = 0 uniformly. Indeed, we have $||f_n - f|| = ||f_n|| = 1/2^n$ for each n, so $\lim_{n\to\infty} ||f_n - f|| = 0$ . As expected, since each f_n(x) is continuous, so is the resulting converged function f(x).
If we now take $f_n(x) = x^n$ on D=[0, 1), then they converge to f(x) = 0, albeit not uniformly. Indeed, it’s easy to see that $||f_n - f|| = 1$ for each n. In other words, one can have pointwise convergence of a sequence of continuous functions to a continuous function, even if the convergence is not uniform.
Consider the Riemann zeta function on (1, ∞). We know this converges at every x>1, but is the function continuous? Indeed, it is: via the following useful trick. First, write , so that pointwise. Suppose we wish to show that ζ(x) is continuous at x=a>1.
- Pick any b, 1<b<a.
- Let g_n (resp. g) be the restriction of f_n (resp. f) to [b, ∞).
- Now $||g_n - g||=(n+1)^{-x} + (n+2)^{-x} + \ldots \le (n+1)^{-b} + (n+2)^{-b} + \ldots$ which is a convergent series since b>1.
- Hence $||g_n-g|| \to 0$ , which shows that g_n converges uniformly to g.
- Thus means g is continuous, in particular f is continuous at a. Hence, f is continuous at any a>1.

Basic Properties of Uniform Convergence

Let’s figure out some basic properties, specifically suppose $f_n : D\to \mathbf{R}$ and $g_n : D\to\mathbf{R}$ are sequences of functions which converge uniformly to $f, g:D\to\mathbf{R}$ respectively. What about the sum $f_n+g_n$ , product $f_n g_n$ and reciprocal $1/g_n$ ?

1. The case of sum is easy.

Indeed, $f_n + g_n \to f+g$ uniformly. To see why, write:

$||(f_n+g_n)-(f+g)|| = ||(f_n-f)+(g_n-g)|| \le ||f_n-f|| + ||g_n-g||.$

Now, for any $\epsilon>0$ , $||f_n - f|| < \frac\epsilon 2$ and $||g_n-g|| < \frac\epsilon 2$ for large n. Thus, we also have $||(f_n+g_n)-(f+g)|| < \epsilon$ for large n.

2. The case of product is true with some additional conditions.

Indeed, let’s write:

$||f_n g_n - fg || = ||(f_n-f)g_n + f(g_n - g)|| \le ||f_n - f||\cdot ||g_n|| + ||f||\cdot ||g_n - g||.$

Now, if ||f|| and ||g|| are both finite, then in particular, we have $||g_n|| \le ||g_n - g|| + ||g|| < ||g||+1$ for large n. It then follows that we can write $||f||, ||g|| < B$ for some positive bound B and $||f_n g_n - fg|| \le B(||f_n - f|| + ||g_n - g||)$ . Hence, it also follows that $f_n g_n \to fg$ converges uniformly.

What if either ||f|| or ||g|| is infinite? A counter-example is easy:

$f_n, g_n:\mathbf{R} \to\mathbf{R}, \ f_n(x) = x, g_n(x) = \frac 1 n.$

Then $f_n(x) \to x$ and $g_n(x) \to 0$ uniformly. However, $||f_n g_n - 0|| = \sup\{\frac {|x|} n: x\in\mathbf{R}\} = \infty$ .

3. The case of reciprocal is true with some additional conditions.

Let’s consider when $1/g_n \to 1/g$ uniformly, assuming that g(x) and g_n(x) are never 0. Taking the difference gives:

$||\frac{1}{g_n} - \frac{1}{g}|| = ||\frac{g_n - g}{g_n g}|| \le ||g_n -g|| \cdot ||\frac{1}{g_n}||\cdot ||\frac 1 g||.$

Hence, suppose there is a positive B such that $||1/g_n|| < B$ for all n. This means $|g_n(x)| > 1/B$ for all n, and $x\in D$ . For some N, we have $||g_n - g|| < 1/(2B)$ , i.e. $|g_n(x) - g(x)| < 1/(2B)$ for all n>N and $x\in D$ . Thus:

$|g(x)| \ge |g_n(x)| - |g_n(x)-g(x)| > 1/B - 1/(2B) = 1/(2B),$ when $x\in D$ , i.e. $||1/g||\le 2B$ .

Thus, $||\frac 1 {g_n}-\frac 1 g|| \le ||g_n - g||\cdot B\cdot(2B)$ for all n and it follows that $1/g_n \to 1/g$ converges uniformly.

What if such a B doesn’t exist, i.e. the infimum of all $||1/g_n||$ is zero? Consider the example of $g_n: (0, 1]\to\mathbf{R}, g_n(x)=(1+\frac 1 n)x$ $\$ . Then $g_n(x)$ converges to g(x)=x uniformly since $||g_n-g|| = ||\frac x n||=\frac 1 n$ . On the other hand,

$||\frac 1 {g_n}-\frac 1 g|| = ||\frac n {(n+1)x}-\frac 1 x|| = ||\frac 1 {(n+1)x}||=+\infty,$

so $1/g_n \to 1/g$ pointwise, but not uniformly.

Summary. Suppose $f_n \to f$ and $g_n \to g$ uniformly, where each $f_n, g_n, f, g:D\to\mathbf{R}$ . Then:

$f_n+g_n\to f+g$ uniformly as well;

if $||f||, ||g||<\infty$ , then $f_n g_n \to fg$ uniformly;

if there is a positive B such that $||g_n||>B$ for all n, then $1/g_n \to 1/g$ uniformly.

Posted in Notes | Tagged advanced, analysis, continuity, convergence, pointwise convergence, series, uniform convergence | 1 Comment

Basic Analysis: Differentiation (2)

Posted on December 2, 2012 by limsup

Finding Extremum Points

One of the most common applications of differentiation is in finding all local maximum and minimum points.

Definition. We say f(x) has a local maximum (resp. minimum) at x=a, if there is an open interval (b, c) containing a, such that f(x) attains a maximum (resp. minimum) in this interval at x=a.

Theorem. If f(x) has a local maximum or minimum at x=a and f'(a) exists, then f'(a) = 0.

Proof.

We’ll only prove it for maximum (for minimum, replace f with –f). Restrict f to a small open interval about a so that f(x) takes the maximum there.

By definition, the derivative f’(a) is $\lim_{x\to a}\frac{f(x)-f(a)}{x-a}$ . Now when x>a, we have f(x)≤f(a), so $\frac{f(x)-f(a)}{x-a}\le 0$ and so the limit ≤ 0. On the other hand, when x<a, the inequality f(x)≤f(a) also gives $\frac{f(x)-f(a)}{x-a}\ge 0$ so the limit ≥ 0. These two inequalities force the limit to be 0. ♦

Example 1

We have a piece of wire of length 2L. We wish to bend it to form an isosceles triangle so that the area is maximum. What’re the optimum dimensions?

Unlike secondary school calculus, we’ll solve this in a rigourous manner. Bear with us if the reasoning below seems a little lengthy, but we wish to have no gaps in our proof.

First, it’s not even clear a maximum exists even if the area is upper-bounded. To solve this problem: we’ll quote a theorem without proof.

Theorem. If C is a closed and bounded subset of Rⁿ, and f:C→Rⁿ is continuous, then f(C) is also closed and bounded.

In particular, since f(C) is bounded it has a supremum and an infimum. Since it’s closed, both these values must belong to f(C). Thus, f(C) has a maximum and minimum, which is the crucial property we need.

Here, a subset $C\subseteq \mathbf{R}^n$ is closed if its complement Rⁿ-C is an open subset of Rⁿ. We’ll have more to say about closed subsets in a later article.

[ Question to think about: is it possible to find C closed but not bounded, and continuous f, such that f(C) is neither closed nor bounded? What about a bounded but not closed C? ]

Proving this requires the concept of compactness, which we’ll delay until a later article. We urge the reader to take our word for now.

Back to the problem. Bending the wire into a triangle of side lengths a, a, b gives 2a+b = 2L. By Heron’s formula, the area is:

$\sqrt{L\cdot (L-a)^2 (L-b)}$

so we need to maximise $f(a) := (L-a)^2 (L-b) = (L-a)^2 (2a -L)$ . Now to form a triangle, we need L/2 ≤ a ≤ L which gives a function f : [L/2, L] → R. Since [L/2, L] is closed and bounded, the image of f attains a maximum and a minimum.

Suppose f(x) attains a maximum at x=a. Then a either lies in (L/2, L) or a=L/2 or a=L. The last two cases are out since f(L/2) = f(L) = 0. The first case is a local maximum which gives f’(a) = 0. But:

$f(x) = (L-x)^2 (2x-L) = 2x^3 - 5L\cdot x^2 + 4L^2\cdot x - L^3 \implies \frac{df}{dx} = 6x^2 - 10L\cdot x + 4L^2.$

Setting $\frac{df}{dx} = 0$ gives us x=L or (2/3)L. Clearly, the latter is the correct answer. So the optimum dimensions are (2L/3, 2L/3, 2L/3), an equilateral triangle.

Example 2.

Suppose we have a piece of wire of fixed length 2L and we wish to bend it to form a triangle. What is the largest possible area of the triangle?

First, label the sides of the triangle by a, b, 2L-(a+b). To form a triangle, we need the following inequalities:

$a+b \ge 2L-(a+b) \implies a+b \ge L$ ;
$a+2L-(a+b) \ge b \implies 0\le b \le L$ ;
$b+2L-(a+b) \ge a \implies 0\le a\le L$ .

Thus, we get a map f : C → R, where C is the subset of R² defined by the above inequalities and f denotes the area of the resulting triangle. Since each inequality defines a closed subset, the resulting intersection is also closed. Finally, the second and third inequalities clearly imply region C is bounded.

Hence, f(C) is closed and bounded, so in particular it attains a maximum and a minimum. Suppose f attains a maximum at (a, b) in C.

Now fix a and vary b, where max{0, L–a} ≤ b ≤ L. Now either b is an edge-point in this region, or it’s a local maximum for f(a, b). But the edge-points give f = 0, so it must be a local maximum. Write the area as the square root of $L(L-a)(L-b)(a+b-L)$ . Since b varies, we differentiate this with respect to b and set it to zero, which gives b = L-(a/2), i.e. the sides are a, L-(a/2) and L-(a/2) so we get an isosceles triangle, which reduces this to the previous problem.

Conclusion: the largest area occurs when the triangle is equilateral.

In summary, since the image of a closed and bounded set under a continuous map is closed and bounded, it must attain a maximum and minimum (if the map is real-valued). Both these extrema points are either in the interior, in which case the derivative is zero since it’s a local extremum, or on a boundary.

Rolle’s Theorem

This says:

Rolle’s Theorem. Suppose f:[a, b] → R (a<b) is a continuous function which is differentiable in the interior (a, b). If f(a) = f(b), then there is a point a < x < b for which f'(x) = 0.

Proof.

Since [a, b] is closed and bounded, we know from the above that f([a, b]) attains a maximum at x and minimum at y. Now if a < x < b, then x is a local extremum so we have f’(x) = 0 as desired. Same goes for y. The only remaining possibility is that both x and y are edge-points, i.e. a or b. But that means f(a)=f(b) is both the maximum and the minimum so f(x) is constant on [a, b] in which case any a < x < b would do. ♦

This can be generalised as follows.

Mean Value Theorem. Suppose f:[a, b] → R (a<b) is a continuous function which is differentiable in the interior (a, b). There exists x in (a, b) such that $f'(x) = \frac{f(b)-f(a)}{b-a}$ .

Proof

Apply a slight transformation to reduce it to the case of Rolle’s theorem: let $g(x) = f(x) - cx$ , for some constant c such that g(a) = g(b). Clearly, $c=\frac{f(b)-f(a)}{b-a}$ . Rolle’s theorem says there exists x in (a, b) such that g’(x) = 0, whence we have $f'(x) = g'(x) + c = c$ . ♦

One particularly useful corollary is as follows.

Corollary. Suppose f : (a, b) → R is differentiable and its derivative at every a < x < b is positive. Then f is strictly increasing, i.e. if x < y, then f(x) < f(y).

Proof.

If x < y and f(x) ≥ f(y), then by Mean Value Theorem, we can find x < z < y such that $f'(z) = \frac{f(y)-f(x)}{y-x} \le 0$ , which is a contradiction. ♦

Example 3.

Suppose we have the function f : (0, ∞) → R, given by $f(x) = x^{1/x}$ . Let’s check the behaviour of f(x) as x increases. The above results suggest that the derivative is helpful: indeed, writing $f(x) = \exp(\log(x)/x)$ gives:

$f'(x) = \exp(\log(x)/x)\frac{x \cdot (1/x) - \log(x)}{x^2} = \frac{1 - \log(x)}{x^2}.$

We thus see that:

for 0 < x < e, f’(x) > 0 so the function is strictly increasing on (0, e);
for x > e, f’(x) < 0 so the function is strictly decreasing on (e, ∞);
when x = e, f’(x) = 0.

We claim that there’s a local maximum at x=e. Indeed, if x > e satisfies f(x) ≥ f(e), then Mean Value Theorem says there exists y, x > y > e such that $f'(y) = \frac{f(x)-f(e)}{x-e}\ge 0$ which contradicts the second point. Likewise, if x < e satisfies f(x) ≥ f(e), then this contradicts the first point.

Exercise

Prove the Extended Mean Value Theorem : suppose f and g are continuous functions [a, b] → R which are differentiable on (a, b). If g’(x) ≠ 0 for all x in (a, b), then there exists x, a < x < b, such that $\frac{f'(x)}{g'(x)} = \frac{f(b)-f(a)}{g(b)-g(a)}$ .

[ Answer (highlight to read) : find a constant c such that h(x) := f(x) + c·g(x) satisfies h(a) = h(b). Show that c exists. Apply Rolle’s theorem to h. ]

L’Hopital’s Rule

This theorem is useful for evaluating the limit of f(x)/g(x) when both the numerator and denominator approach 0. However, we have to be careful in stating it since there’re many traps for the unwary. First, we consider one variation.

Theorem (L’Hopital’s Rule). Suppose f and g are differentiable functions (a, b) – {c} → R. If:

$\lim_{x\to c} f(x) = \lim_{x\to c} g(x) = 0$ ,

$g'(x) \ne 0$ for all x in (a, b), x ≠ c,

$\lim_{x\to c}\frac {f'(x)}{g'(x)} = L$ ,

then $\lim_{x\to c}\frac{f(x)}{g(x)} = L$ .

Proof.

We’ll need the Extended Mean Value Theorem in the above exercise: for every x in (a, b), x≠c, there exists y between x and c such that:

$\frac{f'(y)}{g'(y)} = \frac{f(x) - f(c)}{g(x) - g(c)} = \frac{f(x)}{g(x)}.$

Since f’(x)/g’(x) → L, for every ε>0, |f’(x)/g’(x) – L| < ε when x is close to c. The above equality tells us that |f(x)/g(x) – L| < ε when x is close to c. This proves the theorem. ♦

Warning

Even if f’(x)/g’(x) does not converge as x → c, it may still be possible for $\lim_{x\to c} f(x)/g(x)$ to converge. E.g. take our earlier example of $f(x) = x^2 \sin(1/x)$ and g(x) = x, defined on x≠0. Then the first two conditions are satisfied, but now

$\lim_{x\to 0}\frac{f'(x)}{g'(x)} = 2x\sin(1/x)-\cos(1/x)$

which doesn’t converge. On the other hand $\lim_{x\to 0} \frac{f(x)}{g(x)}\to 0$ by squeeze theorem. In summary, even if f’(x)/g’(x) fails to converge as x→c, it’s still possible for f(x)/g(x) to converge.

Example 4.

Suppose f(x) = 3sin(x) – sin(3x) and g(x) = x – sin(x). The first two conditions of L’Hopital’s Rule are satisfied. So $\lim_{x\to 0} \frac{f(x)}{g(x)}=L$ if $\lim_{x\to 0} \frac{f'(x)}{g'(x)}=L$ . [ Note that the converse isn’t true, as the above warning tells us. ]

So, we write:

$\frac{f'(x)}{g'(x)} = \frac{3\cos(x) - 3\cos(3x)}{1-\cos(x)}.$

Again the first two conditions of L’Hopital’s Rule are satisfied for this expression, so let’s look at the next derivative:

$\frac{f''(x)}{g''(x)} = \frac{-3\sin(x)+9\sin(3x)}{\sin(x)}.$

The same thing happens, so we take the derivative again:

$\frac{f'''(x)}{g'''(x)} = \frac{-3\cos(x)+27\cos(3x)}{\cos(x)}.$

This time, as x→0, the expression approaches 24. Thus, the original f(x)/g(x)→24 as x→0.

Posted in Notes | Tagged advanced, analysis, compact sets, differentiation, mean value theorem, rolle's theorem | Leave a comment

Basic Analysis: Differentiation (1)

Posted on November 28, 2012 by limsup

In this article, we’ll look at differentiation more rigourously and carefully. Throughout this article, we suppose f is a real-valued function defined on an open interval (b, c) containing a, i.e. f : (b, c) → R with b < a < c.

Theorem. The derivative of f(x) at a is defined by:

$f'(a) := \lim_{x\to a} \frac{f(x) - f(a)}{x-a}.$

If the limit exists, we also say f(x) is differentiable at x=a. If f(x) is differentiable at every point in (b, c), then we get a function $(b,c) \to \mathbf{R}$ , $x\mapsto f'(x)$ . The resulting function is then written as $\frac{df}{dx}$ .

Pictorially, the derivative measures the gradient of the tangent at x=a:

Example 1

Consider f : R → R, given by f(x) = x³. At any point x=a, we have:

$f'(a) = \lim_{x\to a}\frac{x^3 - a^3}{x-a} = \lim_{x\to a}\frac{(x-a)(x^2+xa+a^2)}{x-a}$

which is equal to $x^2 + xa + a^2$ when x≠a. Hence, as x→a, the expression tends to $a^2 + aa + a^2 = 3a^2$ . Conclusion: the derivative exists at every point a on the real line and $\frac{df}{dx} = 3x^2$ .

Example 2

Consider f : R-{0} → R, given by f(x) = 1/x. At any point x=a≠0, note that f(x) is defined on an open interval containing a. Then:

$f'(a) = \lim_{x\to a}\frac{x^{-1} - a^{-1}}{x-a} = \lim_{x\to a} -\frac 1{ax} = -\frac 1 {a^2}$ .

Hence, $\frac{df}{dx} = x^{-2}$ . Conclusion: the derivative exists for every non-zero a.

Example 3

Take the function f : R → R, given by f(x) = |x|. If x=a>0, then f(x)=x on an open interval containing a so it’s easy to check that f’(a) = 1. Likewise, if x=a<0, then f’(a) = -1. Finally, we shall show that f’(0) does not exist. But f’(0) is the limit of g(x) := (|x|-|0|)/(x-0) = |x|/x as x tends to 0. Note that g(x)=1 for x>0, and g(x)=-1 for x<0. Hence, the left limit is -1 while the right limit is +1. Since the two limits don’t match, the limit doesn’t exist. Conclusion: the derivative exists for every non-zero a.

Properties of the Derivative

We have the following basic properties.

Theorem. Let f(x) and g(x) be defined on an open interval about a. If they are both differentiable at x=a, then:

f(x) is continuous at x=a;

(addition rule) the derivative of the sum (f+g)(x) at x=a is (f+g)'(a) = f'(a) + g'(a);

(product rule) the derivative of the product (fg)(x) at x=a is (fg)'(a) = f'(a)g(a) + f(a)g'(a).

Proof.

1. For the first property, we have $f'(a) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a}$ , so:

$\lim_{x\to a} (f(x) - f(a)) = \lim_{x\to a}\frac{f(x)-f(a)}{x-a} \cdot \lim_{x\to a}(x-a)$

if both limits on the RHS exist. But they do: the first is f’(a) while the second is 0. So the LHS tends to 0 and f(x)→f(a) as x→a as desired.

2. For the second property, write:

$(f+g)'(a) = \lim_{x\to a}\frac{(f+g)(x)-(f+g)(a)}{x-a} = \lim_{x\to a}\frac{f(x)-f(a)}{x-a} + \lim_{x\to a}\frac{g(x)-g(a)}{x-a}.$

The two limits on the right are f’(a) and g’(a) respectively. Hence, the middle limit equals f’(a)+g’(a) and we’re done.

3. For the final property, write (fg)'(a) as:

$\lim_{x\to a}\frac{f(x)g(x) - f(a)g(a)}{x-a} = \lim_{x\to a} f(x)\frac{g(x)-g(a)}{x-a} + \lim_{x\to a}\frac{f(x)-f(a)}{x-a}\cdot g(a).$

Since f is differentiable at x=a, it’s continuous there also. Thus, the limits on the RHS all exist the expression approaches f(a)g’(a) + f’(a)g(a), as desired. ♦

The next property is extremely useful computationally.

Theorem (Chain Rule). Suppose f(x) (resp. g(x)) is defined on an open interval containing a (resp. f(a)). If f'(a) and g'(f(a)) both exist, then the composed function h = g°f gives h'(a) = g'(f(a))·f'(a).

Proof.

One’s inclined to write:

$\frac{h(x)-h(a)}{x-a} = \frac{g(f(x))-g(f(a))}{x-a} = \frac{(g(f(x))-g(f(a))}{f(x)-f(a)} \cdot \frac{f(x)-f(a)}{x-a}$ ,

if f(x) ≠ f(a), and let x→a. But the problem is that f(x)=f(a) may occur even when x is arbitrarily close to a. This is a slight technical issue, which we’ll counter by defining:

$G(y) = \begin{cases} \frac{g(y) - g(f(a))}{y-a}, &\text{ if } y\ne f(a), \\ g'(f(a)), &\text{ if } y=a.\end{cases}$

Then G(y) is defined on an open interval about f(a) and is continuous there. Now we can write the above equality as:

$\frac{g(f(x))-g(f(a))}{x-a} = G(f(x))\cdot\frac{f(x)-f(a)}{x-a}$ ,

if x ≠ a. Now since G is continuous at f(a) and f is continuous at a, taking the limit x→a gives (g°f)'(a) = g’(f(a)) f’(a) on both sides. ♦

Now we can describe the derivative of f(x)/g(x).

Corollary. If f(x) and g(x) are differentiable at x=a, and g'(a)≠0, then

1/g is differentiable at x=a and (1/g)'(a) = -g'(a)/g(a)².

f/g is differentiable at x=a and $(f/g)'(a) = \frac{g(a)f'(a) - f(a)g'(a)}{g(a)^2}$ .

Proof. The first property follows from the Chain Rule and Example 2. The second property follows from the first property and the product rule.

Summary. We have derived the standard laws of differentiation in secondary school calculus.

(addition rule) $\frac{d}{dx}(f+g) = \frac{df}{dx}+\frac{dg}{dx}$ ;

(product rule) $\frac{d}{dx}(fg) = \frac{df}{dx}g + f\frac{dg}{dx}$ ;

(quotient rule) $\frac{d}{dx}(f/g) = \frac{g\frac{df}{dx} - f\frac{dg}{dx}}{g^2}$ ;

(chain rule) $\frac{dg}{dx} = \frac{dg}{dy}\cdot\frac{dy}{dx}$ .

More Examples

The above rules for manipulating derivatives have multiple applications.

Example 4

Let’s differentiate $f_n(x) = x^n$ for an integer n. Clearly, for n=0, the derivative is 0 and for n=1, it’s 1. For a positive integer n>1, the derivative of f_n can be obtained by recursively applying the product rule:

$f_n(x) = x\cdot f_{n-1}(x) \implies f_n'(x) = f_{n-1}(x) +x\cdot f_{n-1}'(x),$

which gives $f_n'(x) = n\cdot x^{n-1}$ for n≥0. By applying the division rule, we see that in fact this equation holds for all integers n.

Example 5

Let $f(x) = (x^2 + 1)^{100}$ . We wish to compute $\frac{df}{dx}$ in closed form. Let $u(x) = x^2+1$ and $v(y) = y^{100}$ . Since f(x) = v(u(x)), the chain rule gives us

$\frac{df}{dx} = 100(x^2+1)^{99}\cdot 2x = 200x(x^2+1)^{99}.$

Example 6

For a fixed positive integer n, write the binomial expansion:

$(x+1)^n = C^n_n x^n + C^n_{n-1} x^{n-1} + \ldots + C^n_1 x + C^n_0.$

Differentiating both sides with respect to x, the chain rule applied to the LHS gives n(1+x)^n-1 while the RHS gives us $\sum_{k=1}^n k\cdot C^n_{k} x^{k-1}$ . Substituting x=1 then gives us the combinatorial equality:

$1\cdot C^n_1 + 2\cdot C^n_2 + \ldots + (n-1) C^n_{n-1} + n\cdot C^n_n = n\cdot 2^{n-1}.$

Many more examples can be found in our articles on generating functions.

Example 7 (with Warning)

We know that for |x|<1, the geometric series converges to:

$1 + x+x^2 +x^3 + \ldots = \frac 1 {1-x}.$

One’s tempted to differentiate both sides with respect to x, which gives $1+2x+3x^2+4x^3+\ldots$ on the LHS and $\frac 1{(1-x)^2}$ on the RHS by using the chain rule with u(x) = 1-x. But there’s no guarantee that the addition rule holds for infinitely many terms, although in this case it does work. The underlying theory will be covered in greater detail in a later article.

Higher Derivatives

From the above results, we see that we can differentiate a function f(x) = p(x)/q(x), where p(x) and q(x) are both polynomials, at any point x=a which is not a root of q(x). A function of this form (p(x)/q(x) for polynomials p(x) and q(x)) is called a rational function. Furthermore, from the division rule the derivative $\frac{df}{dx}$ is also a rational function, which means we can differentiate it as many times as we wish:

$\frac{d^2 f}{dx^2} := \frac{d}{dx}\left(\frac{df}{dx}\right),\ \frac{d^3 f}{dx^3} := \frac{d}{dx}\left(\frac{d^2f}{dx^2}\right),\ \ldots.$

These can also be denoted by f”, f”’, etc, or f⁽²⁾, f⁽³⁾, … .

Now, if f(x) is second-differentiable (i.e. f” exists) at x=a, then by definition f’ = df/dx must exist on an open interval (b, c) about a, and also be differentiable at x=a. In particular, since differentiability implies continuity, df/dx is continuous at x=a:

Definition. A function f(x) is said to be continuously differentiable at x=a if it’s differentiable on an open interval about a, and the derivative is continuous at x=a.

Clearly, we have the following implications:

$\begin{matrix}\text{second diff.}\\ \text{at }x=a\end{matrix} \implies\begin{matrix}\text{continuously}\\ \text{diff. at }x=a\end{matrix}\implies \begin{matrix}\text{differentiable}\\ \text{at }x=a\end{matrix}\implies \begin{matrix}\text{continuous}\\ \text{at }x=a\end{matrix}$

And the implications are strict:

Continuous but not differentiable function : easy, take f(x) = |x|; it’s continuous and example 3 tells us it’s not differentiable at x=0.
Differentiable but not continuously differentiable function : take $f(x)=\begin{cases}x^2 \sin(1/x), &\text{if } x\ne 0,\\ 0, &\text{if } x=0.\end{cases}$ When a≠0, the derivative $f'(a) = 2a \sin(1/a) + a^2\cos(1/a) (-1/a^2) = 2a\sin(1/a) -\cos(1/a)$ , which does not converge as a→0. When a=0, the derivative is $\lim_{x\to 0} \frac{f(x)-f(0)}{x-0} = \lim_{x\to 0} x\sin(1/x) = 0$ by squeezing the function between |x| and -|x|.
Continuously differentiable but not second differentiable function : take f(x) = x|x|. It’s easy to see that the derivative is f’(a) = 2a if a≥0, and f’(a) = -2a if a<0. Thus f’(x) = 2|x| which is continuous but not differentiable.

More generally, it’s easy to see that the function f(x) = x^n-1|x| is (n-1)th order continuously differentiable (i.e. the (n-1)th derivative exists and is continuous) but not n-th order differentiable.

Now, we say f(x) is infinitely differentiable if the n-th order derivative exists for all n. For such a function, one’s inclined to write down its Taylor series about x=a:

$f(x) = f(a) + f'(a)(x-a)+ \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \ldots$

and hope that the Taylor series converges to f(x) in some open interval about a. If so, we say f(x) is analytic. Alas.

Infinitely differentiable but not analytic function : let $f(x) = \begin{cases}\exp(-x^{-2}), &\text{if } x\ne 0,\\ 0, &\text{if }x=0.\end{cases}$ At the points x=a≠0, it’s easy to see the n-th derivative is of the form $p(x)x^{-m} \exp(-x^{-2})$ for some polynomial p(x) and positive integer m. To check its behaviour as x→0, let y=1/x; then $x^{-m} \exp(-x^{-2}) = y^m/ \exp(y^2)$ which approaches zero since exponential increases much faster than any polynomial function. With this, one proves via induction that the n-th derivative at 0 is always 0, so the Taylor series is 0.

It should be noted that this anomaly doesn’t occur for complex analysis. Specifically, a differentiable function C → C on an open set is automatically analytic there and one immediately has all the wonderful results about Taylor series convergence etc. We hope to come around to that at a later date.

Posted in Notes | Tagged advanced, analysis, analytic, chain rule, differentiation, taylor series | Leave a comment

Basic Analysis: Limits and Continuity (3)

Posted on November 25, 2012 by limsup

Let’s consider multivariate functions $f : D\to \mathbf{R}^m$ where $D\subseteq \mathbf{R}^n$ . To that end, we need the Euclidean distance function on Rⁿ. If x = (x₁, x₂, …, x_n) is in Rⁿ, we define:

$|\mathbf{x}| := \sqrt{x_1^2 + x_2^2 + \ldots + x_n^2}.$

Note that |x| = 0 if and only if x is the zero vector 0. Now we are ready to define continuity.

[ In this and all subsequent articles, boldfaced variables a, x, y, … denote parameters with multiple components. ]

Definition. The function $f:D\to\mathbf{R}^m$ is said to be continuous at a if:

for every ε>0, there exists δ>0, such that whenever |x–a|<δ and x belongs to D, we have |f(x)-f(a)|<ε.

Let $D\subseteq \mathbf{R}^n$ . A point $\mathbf{a}\in\mathbf{R}^n$ is said to be a point of accumulation, if for any ε>0, there exists $\mathbf{x}\in \mathbf{R}^n$ such that

$0 < |\mathbf{x}-\mathbf{a}|<\epsilon$ .

Theorem. If a is a point of accumulation of D, and we have continuous functions

$g, h : D\cup\{\mathbf{a}\}\to \mathbf{R}^m$

which agree on D-{a}, then g(a)=h(a).

We’ll skip the proof since it’s identical to the case of single-variable functions, but we’d encourage the conscientious reader to check that the logic follows through.

Basic Properties

Continuous functions satisfy the following basic properties:

Theorem. The property of continuity satisfies the following.

Identity. The identity map $D\to D\subseteq\mathbf{R}^n$ is continuous.

Restriction. If $D'\subseteq D$ and $f:D\to\mathbf{R}^m$ is continuous at a in D’, then so is the restriction of f to D’.

Composition. If $f:D\to E$ is continuous at a and $g:E\to\mathbf{R}^m$ is continuous at b=f(a), then $g\circ f:D\to \mathbf{R}^m$ is continuous at a.

Projection. The map $\pi_i : \mathbf{R}^m \to\mathbf{R}$ which projects to i-th component is continuous at every point.

Reduction of Codomain to R. The function $f:D\to\mathbf{R}^m$ is continuous at a iff $\pi_i\circ f$ is continuous for each i.

Arithmetic. The addition and product functions $\mathbf{R}\times\mathbf{R}\to\mathbf{R}$ are continuous, as is the reciprocal $\mathbf{R}-\{0\}\to\mathbf{R}$ .

Proof.

All the properties have rather straightforward proofs. We’ll use our earlier lingo: say “when x close to a in D, P(x) holds” if there exists δ>0 such that whenever |x–a|<δ and x lies in D, the property P(x) holds.

Identity. Easy: let δ=ε.
Restriction. Again let δ=ε.
Composition. Let ε>0. Since g is continuous at f(a), there exists δ>0 such that when |y–f(a)|<δ and y is in E, |g(y)-gf(a)|<ε. Likewise, since f is continuous at a, there exists δ’>0 such that when |x–a|<δ’ and x is in D, then |f(x)-f(a)|<δ and hence |gf(x)-gf(a)|<ε.
Projection. Use the inequality $|\pi_i(\mathbf{x})-\pi_i(\mathbf{a})| \le |\mathbf{x}-\mathbf{a}|$ .
Reduction of Codomain to R. If f is continuous, then $\pi_i\circ f$ is continuous by the above two properties. Conversely, suppose each $\pi_i\circ f$ is continuous. Let ε>0. For x close to a in D, we have $|\pi_i f(\mathbf{x})-\pi_i f(\mathbf{a})|< \epsilon/\sqrt{m}$ for each i=1,2,…,m. Hence, when x is close to a in D, we have |f(x)-f(a)| < ε as well.
Arithmetic.
- For addition, use the inequality |(x+y)-(a+b)| ≤ |x–a|+|y–b| ≤ |(x,y)-(a,b)|.
- For multiplication, |xy–ab| = |x(y–b)+(x–a)b| ≤ |x|·|y–b|+|b|·|x–a|. When |x–a|<1, we then have |x|<|a|+1. Hence, for any ε>0, whenever |x–a|<min(1, ε/(2|a|+2)) and |y–b|<ε/(2|b|) we have |xy–ab|<ε. Hence set δ=min(1, ε/(2|a|+2), ε/(2|b|)).
- For division, take $|\frac 1 x-\frac 1 a| = \frac{|a-x|}{|a|\cdot|x|}$ . When |x–a|<|a|/2, we have |x|>|a|/2. Hence, for any ε>0, whenever |x–a|<min(|a|/2, ε|a|²/2) we get $|\frac 1 x-\frac 1 a|<\epsilon$ .

Exercises

Prove the following using only the above properties, without using the ε-δ definition.

“Concatenation” of continuous functions gives a continuous function. Specifically, suppose $f:D\to \mathbf{R}$ and $f:E\to\mathbf{R}$ are continuous at a and a’ respectively, where $D\subseteq \mathbf{R}$ and $E\subseteq\mathbf{R}$ . Then $h=(f,g):D\times E\to\mathbf{R}^2$ given by $\mathbf{a}\mapsto (f(\mathbf{a}), g(\mathbf{a}))$ is continuous at (a, a’).
Prove that if f, g: D → R are continuous at a, for $D\subseteq \mathbf{R}$ , then f+g, f×g are also continuous at a. If f(a)≠0, then 1/f is continuous at a also. Hence, any polynomial function is continuous.

Answers (highlight to read).

Consider D×E → D → R, where the first map is projection (and hence continuous at (a, a’) ) and the second map is f. Since it’s a composition of two continuous maps, the result is continuous. Likewise, D×E → E → R is continuous. On the other hand, the two maps are also given by composing (f, g): D×E → R×R → R, where the 2nd map is a projection. The 4th property then tells us (f, g) is continuous at (a, a’).
Use property 5: the composition D×D → R×R → R, where the first map is the concatenation (f, g) and the second map is addition/product, is continuous.

Global Continuity

We say that a function $f:D\to \mathbf{R}^m$ is continuous if it’s continuous at every point in the domain. From definition, it’s clear that continuity at a specific point is determined by what happens close to that point. To that end, let’s define open sets.

Definitions. Let $D\subseteq \mathbf{R}^n$ be any subset. An open ball containing $\mathbf{a}\in D$ is the set of points:

$N(\mathbf{a}, \epsilon)=\{\mathbf{x}\in D : |\mathbf{x}-\mathbf{a}|<\epsilon\},$

for some ε>0. [ Note that the definition depends on D, although the notation doesn’t reflect that. If there’s a possibility of confusion, we’ll write it as $N_D(\mathbf{a},\epsilon)$ . ]

A subset $U\subseteq D$ is said to be an open subset of D if for each $\mathbf{a}\in D$ , we have $N(\mathbf{a}, \epsilon)\subseteq U$ for some ε>0.

E.g. each open ball U := N(a, ε) is open, for if x is in U, then |x–a|<ε, so we let δ = ε-|x–a|. We claim that $N(\mathbf{x},\delta)\subseteq N(\mathbf{a},\epsilon)$ . Indeed, any y in N(x, δ) satisfies |y–x|<δ so

|y–a| ≤ |y–x| + |x–a| < δ+|x–a| = ε.

More Examples of Open Subsets

Let D = R. Then any open interval (b, c) is an open subset of D since it’s an open ball. The punctured neighbourhood (b, c) – {a} is also an open subset of D since it’s a union of two open balls. The half-open interval U = (b, c] is not open in D since the point c in U isn’t contained in any open ball of R which is a subset of U. [ Any open ball containing c is of the form (c-ε, c+ε), which contains points outside U. ]
Let . The subset U = [0, 1] is now an open subset of D. This may confuse the reader at first glance, but note that even if we pick the endpoint 1 (in U), we can pick the open ball N(1, 1/2) in D. By definition, this is the set , so it’s (1/2, 1] which is wholly contained in U. Likewise, the singleton subset V = {2} is also a subset of D, since the open ball N(2, 1/2) = {2} is wholly contained in U.
- Which of the subsets [0, 1), (0, 1], {1}, [0, 1/2), (0, 1/2] are open in D? [ Answer: the first, second, fourth. ]

Here are further examples of open subsets $U\subseteq D$ , where D is coloured in blue and U in red.

In the first two examples, $D=\{(x,y)\in\mathbf{R}^2: x^2+y^2\le 1\}$ while in the other two, $D=\{(x,y)\in\mathbf{R}^2:x^2 + y^2\le 1\}\cup\{(2,0)\}$ . The dotted boundary lines indicate that the boundaries are not included in U.

The main theorem we wish to prove is:

Theorem. Let $f:D\to \mathbf{R}^m$ be a function, where $D\subseteq \mathbf{R}^n$ . Then f is continuous if and only if for each open subset $U\subseteq \mathbf{R}^m$ , $f^{-1}(U)$ is an open subset of D.

Proof.

Suppose f is continuous. To show that $f^{-1}(U)$ is open in D, let $\mathbf{a}\in f^{-1}(U)$ . Since f(a) is in U which is open, there exists an ε>0 such that N(f(a), ε) is a subset of U. And since f is continuous at a, there exists a δ>0 such that whenever |x–a|<δ and x in D, we have |f(x)-f(a)|<ε. Hence

$f(N(\mathbf{a}), \delta))\subseteq N(f(\mathbf{a}, \epsilon)\subseteq U\implies N(\mathbf{a},\delta)\subseteq f^{-1}(U)$

which shows that $f^{-1}(U)$ is open in D.

Conversely, suppose $f^{-1}(U)\subseteq D$ is open for any open subset $U\subseteq \mathbf{R}^m$ . To show f is continuous at a (an element of D), let ε>0. Since $U:=N(f(\mathbf{a}),\epsilon)\subset \mathbf{R}^n$ is open, so is $f^{-1}(U)$ . But $\mathbf{a}\in f^{-1}(U)$ , so by definition of open subsets, there exists δ>0 for which

$N(\mathbf{a}, \delta)\subseteq f^{-1}(U) \implies f(N(\mathbf{a},\delta))\subseteq U=N(f(\mathbf{a}),\epsilon)$

so whenever |x–a|<δ and x in D, we have |f(x)-f(a)|<ε. ♦

Main point. The above theorem suggests that the central concept behind analysis is that of open subsets. Indeed, the core idea behind topology is to build up an entire theory based only on the concept of open subsets. This will be elaborated in subsequent posts.

Important Exercises.

Find a continuous function f : R → R and an open subset U of R such that f(U) is not open in R.
Prove that if U and V are open subsets of D, then so is U ∩ V. In particular, this means that finite intersections of open subsets are still open.
Prove that if {U_i} is a collection of open subsets of D, then so is $U:=\cup_i U_i$ .
Suppose .
1. Prove that if U is open in D, then V := U ∩ E is open in E.
2. Prove that conversely, if V is open in E, then V = U ∩ E for some open subset U of D.

[ Note: for Q4, be careful with the notation N(a, ε) since the ambient space can be either D or E. ]

Sketch of Answers [Highlight to read.]

Let f(x)=x² and U=(-1, 1). Then f(U) = [0, 1) is not open in R. However, in this case, it is open in the image of f, which is [0, ∞). Can you find an example where f(U) is not open in the image of f?
Let a be in U ∩ V. For some ε>0 and ε’>0, N(a, ε) and N(a, ε’) are subsets of U and V respectively. Thus, N(a, min(ε, ε’)) is a subset of U ∩ V.
Let a be in U. Then a lies in some U_i so for some ε>0, N(a, ε) is a subset of U_i and hence U.
For (A), suppose a is in V. So a is in the open set U and for some ε>0, N_D(a, ε) is a subset of U. Since N_E(a, ε) = N_D(a, ε) ∩ E, we also have N_E(a, ε) is a subset U ∩ E = V. For (B), let a be in V, which is open in E. For some ε>0, N_E(a, ε) is a subset of V. Now let U be the union of all open balls N_D(a, ε) in D. Each open ball is open in D and by Q3, U is also open in D. Clearly V = U ∩ E.

Posted in Notes | Tagged advanced, analysis, continuity, limits, multivariate, open balls, open subsets, topology | Leave a comment

Mathematics and Such

Topology: Subspaces

Basic Properties of Subspaces

Topology: Bases and Subbases

Bases

Application: Furstenberg’s Proof of the Infinitude of Primes

Subbases

Topology: Basic Definitions

Motivation and Definition

Examples of Topologies

Metric Spaces

Basic Analysis: Closed Subsets and Uniform Continuity

Closed Subsets

Uniform Continuity

Main Theorems

Basic Analysis: Uniform Convergence

Uniform Convergence

Alternate Definition

Basic Properties of Uniform Convergence

Basic Analysis: Differentiation (2)

Finding Extremum Points

Rolle’s Theorem

L’Hopital’s Rule

Basic Analysis: Differentiation (1)

Properties of the Derivative

More Examples

Higher Derivatives

Basic Analysis: Limits and Continuity (3)

Basic Properties

Global Continuity

Recent Posts

Archives

Categories

Meta

Pages