Background recommended : coordinate geometry
Here I thought I’d give an outline of linear algebra and matrices starting from a more axiomatic viewpoint, instead of merely giving rules of computation – the way it’s usually taught in school. The materials here are not really Olympiad-type of mathematics, but if you’ve always wondered why matrix multiplication is defined the way it is, you’ll hopefully get it by the end of this post. Purpose of this post:
- introduction to axiomatic reasoning for students (no background is required since everything is derived from definitions, but it helps to keep the geometric picture in mind);
- explain the purpose behind matrix operations.
Note: please try to do all the exercises to get a feel of the axiomatic approach.
The Object
Our object of interest is , which is the set of all pairs (x, y), where x, y are real. The pairs (x, y) and (x’, y’) are considered equal if and only if x=x’ and y=y’. E.g. (1, -1), (2.5, -1), (√2, 0), … are examples of (distinct) elements of
. Bear in mind that the underlying idea for (x, y) is a point on the plane with the corresponding coordinates.
Next, we shall define the following operations on .
- Addition : given
, the sum of these two pairs is defined to be
. E.g. (-1, 3) + (5, 10.3) = (4, 13.3).
- Multiplication by scalar : given any real number c and
, the product between them is defined to be
. E.g. 1.5 × (-1, 2) = (-1.5, 3). Later, we will drop the × for brevity.
- We will not define multiplication between elements of
.
All in all, totally intuitive concepts. If you draw the points on a plane, you’ll immediately see the geometric interpretation: if points A and B corresponds to (x, y) and (x’, y’) on the Cartesian plane, then the sum corresponds to a point C such that OACB is a parallelogram, where O = (0,0) is the origin.
Before I forget, for convenience, we will also define subtraction in terms of addition and multiplication: for any v, w in , we define v – w = v + (-1)w. Note that each of the elements v, w is a pair of numbers. Clearly, the definition just means (x, y) – (x’, y’) = (x–x’, y–y’). You might wonder why I didn’t just say so; as a matter of fact, though it’s probably easier just to say (x, y) – (x’, y’) = (x–x’, y–y’), I was following the philosophy of having as few basic (atomic) definitions as possible.
The Functions
Next, we will describe what kind of functions are of interest to us. This will be a recurring theme throughout algebra and higher mathematics (and even object-oriented programming!) : you study the objects as well as the “nice” functions between them. Since we had defined a structure on
, clearly our f is expected to respect this structure.
A function
is said to be linear if for any v and w in
and real number c, we have
;
.
Geometrically, if we have a line l on the plane , then l gets mapped to f(l) which is also a line (can you prove it?). That’s the reason behind the name linear.
Basic Exercise : prove the following explicitly (with only the definitions/materials we’ve covered so far).
- If f is linear then we must have f(0) = 0 (where 0 = (0,0)) as well as f(v–w) = f(v)-f(w).
- f is injective if and only if we cannot a v, where v ≠ 0, such that f(v) = 0.
- The function which takes all v in
to the zero element 0 is linear.
- If f and g are linear functions
, then the new function
which is defined by h(v) := f(v) + g(v) is also linear.
- Same as 4., but now replace h with the function h(v) = f(g(v)).
- If f is linear and c is any real number, define a new function
via h(v) = c × f(v). Prove that h is linear.
Properties 4-6 suggest that we can have operations among linear functions! In other words, we have functions which take in linear functions, and output another linear function. That’s not so weird if you think about it, since in algebra we can add and multiply polynomials (which are functions themselves).
Definition : Let
be linear functions and c be a real number. Define the following:
- f+g is the ‘h’ function defined in exercise 4 above.
- f×g is the ‘h’ function defined in exercise 5.
- c×f is the ‘h’ function defined in exercise 6.
In short, we can “add” or “multiply” two linear functions to give another. And we can multiply a linear function by a real number to obtain another linear function.
Matrices
Note that we’ve yet to see a single concrete example of a linear function. This is where matrices come in handy; we will also see that multiplication of two linear functions is non-commutative! In other words f×g and g×f are different in general.
The key property we will prove is this: fix the two elements .
- If
are linear functions such that
and
, then f = g.
- On the other hand, for any two elements
, there is a linear f such that
.
[ In other words, the linear function f is uniquely determined by a pair of elements . Note that since each of v1 and v2 is a pair of real numbers, we’re really looking at 4 real numbers here. ]
Proof. Suppose f, g satisfy and
. Now any element of
is of the form v = (x, y) = x × e1 + y × e2 from which we get:
[ Make sure you understand each step thoroughly! ]
Thus f = g. For the second property, again we can write v = (x, y) and we simply define via
[ Test of concept: why are there double brackets around “x,y” ? ]
We need to show that this f is linear, e.g. to show f( (x,y) + (x’,y’) ) = f( (x,y) ) + f( (x’, y’) ) , we have:
- By definition, RHS 1st term = x × v1 + y × v2 and RHS 2nd term = x’ × v1 + y’ × v2.
- LHS = f( (x+x’, y+y’) ) = (x+x’) × v1 + (y+y’) × v2 = (x × v1) + (x’ × v1) + (y × v2) + (y’ × v2).
The remaining condition: f(c(x, y)) = c × f(x, y) is left as an exercise for the reader. ♦
Concrete Example. Suppose we pick elements v1 = (2, 3) and v2 = (-1, 4). The corresponding linear function f is given by:
f( (x, y) ) = x × (2,3) + y × (-1,4) = (2x–y, 3x+4y).
Given a linear map
, which corresponds to
and
, write
. The 2-by-2 matrix corresponding to f is then defined to be the 2-by-2 table of values:
Thus there is a one-one correspondence between linear maps and 2-by-2 matrices.
Now the following exercises will explain the definition for matrix multiplication.
Important exercises. Let f, g, h be any linear maps and M, N, P be the corresponding 2-by-2 matrices (respectively).
- We have already seen that f+g is linear. Write the matrix corresponding to it by examining the images of e1 = (1,0) and e2 = (0,1).
- Likewise do the same for the linear maps fg and c × f. Verify that the matrix corresponding to fg is precisely the matrix multiplication M × N.
- Prove the following equality of linear maps, by examining the images of various
.
- f(g + h) = fg + fh;
- f(gh) = (fg)h.
- Use exercise 3 to prove the following:
- For any matrices A, B, C, we have A(B + C) = AB + AC.
- For any matrices A, B, C, we have A(BC) = (AB)C.
Qualitatively, this also explains why multiplication of matrices don’t commute. This follows from the fact that matrix multiplication corresponds to function composition, and generally function compositions don’t commute (try putting on the shoes before putting on the socks).
Long but insightful exercise. Develop the whole theory in greater generality. Start with for general positive integer n. Consider linear functions
for positive integers m and n. We can add and compose linear functions which are of “appropriate dimensions”. E.g. if
and
, then
. Thus prove that matrix multiplication is, in general, associative, and distributive over addition.