Inner product spaces, I – Real closed fields

I’ve always felt sort of uneasy about the way linear algebra is presented: you start off doing all this stuff that makes complete sense, and works over arbitrary fields, and then suddenly you’re doing something with all these complex conjugates and conjugate transposes and real symmetric matrices and so forth and nothing makes sense any more.  So I’m going to try to say something about that here.

Everything in the theory of inner products is based on three properties that look simple enough at first glance, but appear more and more bizarre as you consider them more deeply:

  • The real numbers are an ordered field.
  • The real numbers aren’t algebraically closed, but their algebraic closure (the complex numbers) forms a degree-2 extension.
  • The norm of a nonzero complex number is a positive real number.

(By the way, what do we mean by the norm here?  Well, it’s probably exactly what you think, namely

N(z) = z \overline{z}.

But the norm is something more general: if we have a finite Galois extension L/K, then we can define a function N_{L/K} : L \to K by

N_{L/K}(a) := \prod\limits_{\sigma \in {\rm Gal}(L/K)} \sigma(a)

 Since {\rm Gal}(\mathbb{C}/\mathbb{R}) consists of the identity and complex conjugation, we recover

N_{\mathbb{C}/\mathbb{R}}(z) = z \overline{z}.)

The fact that N_{\mathbb{C}/\mathbb{R}}(z) > 0 for z \neq 0 is specific to \mathbb{C}/\mathbb{R}; for instance, if d > 1 is some squarefree integer, then the following norm is negative:

N_{\mathbb{Q}(\sqrt{d})/\mathbb{Q}}(1 + \sqrt{d}) = (1 + \sqrt{d})(1 - \sqrt{d}) = 1-d.
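Here is a quick sanity check of the two norm computations above, as a minimal Python sketch. The helper names, and the choice to represent a + b \sqrt{d} by the pair (a, b), are my own assumptions for illustration.

```python
def norm_quadratic(a, b, d):
    """Norm of a + b*sqrt(d) over Q: (a + b*sqrt(d)) * (a - b*sqrt(d)) = a^2 - d*b^2."""
    return a * a - d * b * b


def norm_complex(x, y):
    """Norm of x + i*y over R: (x + i*y) * (x - i*y) = x^2 + y^2."""
    return x * x + y * y


# N(1 + sqrt(d)) = 1 - d, which is negative for every squarefree d > 1.
for d in (2, 3, 5, 6, 7):
    assert norm_quadratic(1, 1, d) == 1 - d
    assert norm_quadratic(1, 1, d) < 0

# By contrast, the complex norm is positive away from zero.
assert norm_complex(3, -4) == 25
```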

Anyway, here’s why this is all pretty weird:

  • For almost any other field, the algebraic closure is an infinite-dimensional extension, so we have no hope of getting a norm map like this.  In fact, if F is a field which is not algebraically closed but whose algebraic closure \overline{F} is a finite-dimensional extension, then F is a real closed field, meaning that it looks very much like the real numbers, and moreover \overline{F} = F[i].  (This is the Artin-Schreier theorem.)
  • In particular, if F is an ordered field then N_{F[i]/F}(x + i y) = (x + i y)(x - i y) = x^2 + y^2 > 0 for x + i y \neq 0.
  • So, there seem to be two completely separate kinds of non-algebraically closed fields: those that behave exactly like this (such as the real algebraic numbers, reals, and the field of real Puiseux series), and those that behave nothing like this but much like one another (such as the rational numbers, number fields in general, positive characteristic fields, etc.).

The fact that we have (1) an ordered field R (2) whose algebraic closure C is a finite-degree extension such that (3) N_{C/R}(z) > 0 for nonzero z \in C allows us to extend the theory of linear algebra (over both R and C!) in some strange new directions.

Posted in Uncategorized | Tagged | Leave a comment

A more motivated proof of the Pythagorean theorem

So far every proof I’ve known of the Pythagorean theorem has adhered to a narrative along the lines of

  • Notice, purely by accident, that in known right triangles it appears that the square on the hypotenuse is always equal to the sum of the squares on the other two sides.
  • Conjecture that this holds in general.
  • Draw a right triangle and a square on each side.
  • Figure out some ingenious geometric decomposition reassembling the two smaller squares into a copy of the bigger one.

This is fairly unsatisfying, because it only tells us that the theorem is true; it doesn’t do much to tell us why it’s true, or give us much intuition for what kind of information it does or does not encode.

Today I wondered if there was a better explanation, and I came across this:

Pythagoras’s theorem | What’s New

Terry Tao writes:

it is perhaps the most intuitive proof of the theorem that I have seen yet

The proof just comes down to examining the (obviously useful) construction where a right triangle is split into two smaller right triangles, both of which are similar to the big one.


The Weyl Group of GL(n, C)

Here’s a cleaner explanation of the Weyl group of GL(n) than I’ve seen before.  I came up with this myself, but it’s straightforward enough that I’m sure I’m not the first.

Let V be an n-dimensional complex vector space, and fix a basis \beta := \{ e_1, e_2, \ldots, e_n \} for V.  Write G := GL(V) \cong GL_n(\mathbb{C}).  Let T < G denote the subgroup of matrices which are diagonal in the basis \beta; this is a maximal torus.  We know that the Weyl group is isomorphic to N(T)/T, so let’s determine N(T).

Pick a matrix D \in T all of whose eigenvalues are distinct, and suppose A \in N(T).  Then A^{-1} D A is diagonal.  This means that A represents a change of basis from \beta to some basis in which D is diagonal.  Now D is diagonal in some basis iff that basis consists of eigenvectors of D.  Since D was chosen in such a way that its eigenspaces are one-dimensional, the only eigenvectors of D are nonzero scalar multiples of the e_i.  Therefore we have A = P C, where P is a permutation matrix and C is diagonal.  Conversely, it’s easy to see that any such matrix normalizes T.  

From here it’s clear that N(T)/T \cong S_n, since the cosets correspond to permutation matrices.
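The easy direction (permutation matrices normalize T) can be checked by hand for n = 3. This is only a sketch under my own assumptions: the helper names are made up, and I use integer entries so the arithmetic is exact.

```python
from itertools import permutations


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def permutation_matrix(perm):
    """Matrix P with P e_j = e_{perm[j]}: column j has a 1 in row perm[j]."""
    n = len(perm)
    return [[1 if perm[j] == i else 0 for j in range(n)] for i in range(n)]


def is_diagonal(a):
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(n) if i != j)


D = [[2, 0, 0], [0, 3, 0], [0, 0, 5]]  # diagonal, with distinct eigenvalues

diagonals = set()
for perm in permutations(range(3)):
    P = permutation_matrix(perm)
    # The inverse of a permutation matrix is its transpose.
    conjugate = matmul(matmul(transpose(P), D), P)
    assert is_diagonal(conjugate)
    diagonals.add(tuple(conjugate[i][i] for i in range(3)))

# The six permutations rearrange the diagonal entries in six different ways.
assert len(diagonals) == 6
```

Conjugating by P just permutes the diagonal entries of D, which is the S_n action on T.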


The resultant

Like Euler products for number-theoretic functions, the resultant is one of those amazingly simple gadgets that you’d never imagine existed.  Given two polynomials f and g in a single variable, there is a number called the resultant, denoted res(f,g), such that:

  1. res(f, g) = 0 iff f and g share a common root
  2. res(f, g) is a polynomial in the coefficients of f and g.

Think about this for a second.  Given the coefficients of an arbitrary polynomial, we have in general no algebraic expression for its roots, but nonetheless we have a way of determining if two polynomials share a root by simply adding and multiplying together some of their coefficients!

First, let’s see why such a thing ought to exist.  Say that f(x) = p (x-a_1)\cdots(x-a_n) and g(x) = q (x-b_1)\cdots(x-b_m), and define

res(f, g) = p^m q^n \displaystyle \prod_{i=1}^n \prod_{j =1}^m (a_i - b_j).

This clearly satisfies condition (1) above: the product will equal zero iff a_i = b_j for some i and j.  However, it also satisfies condition (2).  Why?  Well, if we regard res(f, g) as a polynomial in the a_i’s, with coefficients which are polynomials in the b_j’s, then it’s a symmetric polynomial in the a_i’s: it’s invariant under permuting the a_i’s.  Further, each coefficient of this polynomial, regarded as a polynomial in the b_j’s, is also symmetric.

Why does this matter?  Well, every symmetric polynomial in n variables can be written as a polynomial in the elementary symmetric polynomials


x_1 + \cdots + x_n,

x_1 x_2 + x_1 x_3 + \cdots + x_{n-1} x_n,

and so forth.

But you’ll recognize that the coefficients of a (monic, univariate) polynomial are, up to sign, precisely the elementary symmetric polynomials in its roots!  That is, given a monic polynomial in one variable, any symmetric polynomial in its roots is just a polynomial in the coefficients.  (Throwing in the powers of p and q at the front handles the case when the polynomials aren’t monic.)
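To see a coefficients-only formula in action, here is a small Python sketch comparing the product over roots with the determinant of the Sylvester matrix. The Sylvester matrix is the standard way to write the resultant in terms of coefficients, but it isn't derived in this post, so treat it as an outside ingredient. I take the leading factor to be p^{\deg g} q^{\deg f}, and all helper names are my own.

```python
from math import prod


def poly_from_roots(lead, roots):
    """Coefficients (highest degree first) of lead * (x - r_1) * ... * (x - r_k)."""
    coeffs = [lead]
    for r in roots:
        # Multiply the current polynomial by (x - r).
        coeffs = [c - r * prev for c, prev in zip(coeffs + [0], [0] + coeffs)]
    return coeffs


def sylvester(f, g):
    """Sylvester matrix of f and g (coefficient lists, highest degree first)."""
    n, m = len(f) - 1, len(g) - 1
    rows = [[0] * i + f + [0] * (m - 1 - i) for i in range(m)]
    rows += [[0] * i + g + [0] * (n - 1 - i) for i in range(n)]
    return rows


def det(matrix):
    """Determinant by cofactor expansion along the first row (fine for small sizes)."""
    if not matrix:
        return 1
    return sum((-1) ** j * matrix[0][j]
               * det([row[:j] + row[j + 1:] for row in matrix[1:]])
               for j in range(len(matrix)))


def resultant_from_roots(p, a_roots, q, b_roots):
    """p^m q^n * prod (a_i - b_j), with n = deg f and m = deg g."""
    n, m = len(a_roots), len(b_roots)
    return p ** m * q ** n * prod(a - b for a in a_roots for b in b_roots)


def resultant_from_coefficients(f, g):
    return det(sylvester(f, g))


# f = 2 (x - 1)(x - 2) and g = 3 (x - 3)(x + 1)(x - 5) have no common root.
f = poly_from_roots(2, [1, 2])
g = poly_from_roots(3, [3, -1, 5])
assert resultant_from_coefficients(f, g) == resultant_from_roots(2, [1, 2], 3, [3, -1, 5])

# Sharing the root 2 forces the resultant to vanish.
h = poly_from_roots(1, [2, 7])
assert resultant_from_coefficients(f, h) == 0
```

The second function never sees the roots, only the coefficients, which is the whole point.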

I vaguely remember learning about elementary symmetric polynomials in my undergrad algebra sequence, but at the time I had no real idea what they were for.  They didn’t look that complicated, so I figured they probably didn’t matter too much.  As it turns out, though, the whole subject of invariant theory is really interesting, and symmetric polynomials are just the first nontrivial example.

As an added bonus, note that we can also determine whether a polynomial has a double root by calculating res(f, f’), where f’ is the derivative of f.  (You can define the derivative of a polynomial without using any calculus: just consider the power rule et al. as definitions instead of theorems.)  Now res(f, f’) is zero iff f and f’ share a root.  Suppose a is a root of f; then f(x) = (x - a)^k g(x) for some k \geq 1 and some g, where a is not a root of g.  Taking the derivative, we have f'(x) = k (x - a)^{k-1} g(x) + (x-a)^k g'(x), so f'(a) = k 0^{k-1} g(a) + 0^k g'(a) = 0 if k > 1, or f'(a) = g(a) + 0 \cdot g'(a) = g(a) \neq 0 if k = 1.  So a is a common root of f and f’ exactly when it is a repeated root of f.
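The double-root criterion is easy to try on an example. Below is a minimal Python sketch, with coefficient lists written highest degree first; the function names and that convention are my own choices.

```python
def derivative(coeffs):
    """Formal derivative via the power rule; coefficients are highest degree first."""
    degree = len(coeffs) - 1
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


def evaluate(coeffs, x):
    """Evaluate a polynomial at x using Horner's rule."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


# f(x) = (x - 1)^2 (x - 2) = x^3 - 4x^2 + 5x - 2
f = [1, -4, 5, -2]
df = derivative(f)  # 3x^2 - 8x + 5

# 1 is a double root of f, so it is also a root of f'.
assert evaluate(f, 1) == 0 and evaluate(df, 1) == 0

# 2 is a simple root of f, so f' does not vanish there.
assert evaluate(f, 2) == 0 and evaluate(df, 2) != 0
```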

Up to a sign and a factor of the leading coefficient, the resultant res(f, f’) is known as the discriminant, as you’ll remember from high-school algebra when they seemingly needlessly assigned this fancy name to the term b^2 - 4 a c appearing in the quadratic formula.  The form here generalizes to univariate polynomials of arbitrary degree, but in fact it can be generalized further, to arbitrary multivariate polynomials as well.
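For a quadratic f = a x^2 + b x + c we can compare res(f, f’) with b^2 - 4ac directly. The sketch below assumes the standard Sylvester-determinant formula for the resultant, which this post doesn't derive; with that convention the two agree up to a factor of -a. The function names are mine.

```python
def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def resultant_with_derivative(a, b, c):
    """res(f, f') for f = a x^2 + b x + c and f' = 2a x + b, as a Sylvester determinant."""
    return det3([[a, b, c],
                 [2 * a, b, 0],
                 [0, 2 * a, b]])


for a, b, c in [(1, 0, -1), (1, -2, 1), (2, 3, -5), (3, 1, 4)]:
    assert resultant_with_derivative(a, b, c) == -a * (b * b - 4 * a * c)
```

In particular res(f, f’) vanishes exactly when b^2 - 4ac does, as in the case (1, -2, 1), i.e. f = (x - 1)^2.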


An obvious statement which surprised me when I read it

From Baumslag’s “Topics in Combinatorial Group Theory,” chapter V, Exercises 2(3)(iv):

M \in SL_2(\mathbb{C}) is of order e > 2 if, and only if, {\rm tr} M = \omega + \omega^{-1}  for some primitive e-th root of unity \omega.

There’s really nothing to this statement — just put the matrix in Jordan Canonical Form and draw the obvious conclusion — but it really surprised me when I saw it used in an argument.  Another way to put this would be that, for a 2×2 matrix, the trace and determinant determine the eigenvalues (which is equally obvious).

(The problem in the cases e = 1 and e = 2 is that then the eigenvalues are identical, which means that the matrix isn’t necessarily diagonal when it’s in Jordan Canonical Form — it could be a 2×2 Jordan block.)
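A few integer matrices make the statement concrete. This is a sketch with examples I picked myself (they are not from Baumslag); it uses the identity \omega + \omega^{-1} = 2\cos(2\pi/e) for \omega = e^{2\pi i/e}.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def order(m, limit=50):
    """Smallest k with 1 <= k <= limit and m^k = I, or None if there is none."""
    identity = [[1, 0], [0, 1]]
    power = m
    for k in range(1, limit + 1):
        if power == identity:
            return k
        power = matmul2(power, m)
    return None


# Determinant-1 matrices with traces -1, 0 and 1.
examples = {3: [[0, -1], [1, -1]], 4: [[0, -1], [1, 0]], 6: [[0, -1], [1, 1]]}
for e, m in examples.items():
    trace = m[0][0] + m[1][1]
    assert math.isclose(trace, 2 * math.cos(2 * math.pi / e), abs_tol=1e-12)
    assert order(m) == e

# For e = 2 the trace is -2 = omega + omega^{-1} with omega = -1, but this
# Jordan block has determinant 1 and never returns to the identity.
jordan = [[-1, 1], [0, -1]]
assert order(jordan) is None
```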

From this and one other fact it follows that the elements a, b, and ab in the group \langle a, b \; | \; a^\ell = b^m = (ab)^n = 1 \rangle actually have the desired orders.


An elementary result on noncommutative rings.

Let R be a (noncommutative) ring with identity, such that there are elements u and v with u v = 1.  Then the following are equivalent:

  1. u w = 0 for some nonzero w.
  2. u is not a unit.
  3. u has other right inverses.

Proof. It’s easy to see that 1 and 3 are equivalent — if v is an inverse, then so is v + w, and vice-versa — and clearly they imply condition 2.  Showing that 2 implies the others isn’t as obvious, but it’s a nice one-liner formal trick.

In particular, note that u (1 – v u) = u – u v u = u – (u v) u = u – u = 0, so either v u = 1 and hence u is a unit, or w = 1 – vu is a nonzero element of R such that u w = 0.

In fact, by a theorem due to Kaplansky, if u has two right inverses then it has infinitely many: take v + w_n, where w_n = (1 - v u) u^n, and note that u w_n = 0.  To see these are distinct, note that if

(1 - v u) u^n = (1 - v u) u^m

for some n < m, then multiplying through by v^n on the right gives

1 - vu = (1 - vu) u^{m-n},

and rearranging gives

1 = \left[v + (1-vu) u^{m-n-1}\right] u,

i.e. u is actually a unit, contradicting the hypothesis.
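A concrete ring where all of this happens is the ring of linear maps on finitely supported sequences, with u the left shift and v the right shift. That example is my own addition, not part of the argument above. Here is a rough Python sketch, with sequences stored as tuples with trailing zeros removed.

```python
def trim(x):
    """Drop trailing zeros so that equal sequences compare equal as tuples."""
    x = list(x)
    while x and x[-1] == 0:
        x.pop()
    return tuple(x)


def add(x, y):
    """Termwise sum of two sequences of possibly different lengths."""
    length = max(len(x), len(y))
    x = tuple(x) + (0,) * (length - len(x))
    y = tuple(y) + (0,) * (length - len(y))
    return trim(a + b for a, b in zip(x, y))


def u(x):
    """Left shift: (x0, x1, x2, ...) -> (x1, x2, ...)."""
    return trim(x[1:])


def v(x):
    """Right shift: (x0, x1, ...) -> (0, x0, x1, ...)."""
    return trim((0,) + tuple(x))


def w(n):
    """The operator (1 - v u) u^n, which sends x to (x_n, 0, 0, ...)."""
    def op(x):
        for _ in range(n):
            x = u(x)
        return trim(x[:1])
    return op


def right_inverse(n):
    """The operator v + w_n."""
    return lambda x: add(v(x), w(n)(x))


x = (5, 7, 11)
assert u(v(x)) == x   # u v = 1
assert v(u(x)) != x   # v u != 1, so u is not a unit
for n in range(3):
    assert u(w(n)(x)) == ()             # u w_n = 0
    assert u(right_inverse(n)(x)) == x  # v + w_n is a right inverse of u
```

On this x the operators v + w_0, v + w_1 and v + w_2 give (5, 5, 7, 11), (7, 5, 7, 11) and (11, 5, 7, 11), so they really are different.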


The cross-ratio

Here is a nice invariant from classical geometry that I’d never heard of before today.

The action of \text{GL}_2(\mathbb{C}) on \mathbb{C}^2 induces an action on \mathbb{CP}^1; this is all that a Möbius transformation really is.  Now \text{GL}_2(\mathbb{C}) is four-dimensional, but there’s a one-dimensional subgroup of scalar matrices which fixes each point of the projective line, so we may as well quotient this out and get a \text{PGL}_2(\mathbb{C})-action.

In fact, this is the projective automorphism group of \mathbb{CP}^1, hence the name PGL; its elements are called projectivities.

The group \text{PGL}_2(\mathbb{C}) is three-dimensional, so we would expect that with our three degrees of freedom we could send any three distinct points to any three other distinct points, and indeed we can: the action is 3-transitive.  On the other hand, the dimension of \text{PGL}_2(\mathbb{C}) implies that it can’t possibly be 4-transitive.
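The 3-transitivity can be made explicit: the map x \mapsto (x-p)(q-r) / ((x-r)(q-p)) sends p, q, r to 0, 1, \infty, and composing one such map with the inverse of another sends any three distinct points to any other three. Here is a sketch in Python with exact rational arithmetic; the names and the string stand-in for \infty are my own assumptions, and for simplicity the three points are taken to be finite.

```python
from fractions import Fraction

INF = "inf"  # stand-in for the point at infinity


def apply(m, x):
    """Action of the matrix [[a, b], [c, d]] on the projective line."""
    (a, b), (c, d) = m
    if x == INF:
        return INF if c == 0 else Fraction(a) / c
    numerator, denominator = a * x + b, c * x + d
    return INF if denominator == 0 else Fraction(numerator) / denominator


def to_zero_one_infinity(p, q, r):
    """Matrix of x -> (x - p)(q - r) / ((x - r)(q - p)); p, q, r distinct and finite."""
    return [[q - r, -p * (q - r)], [q - p, -r * (q - p)]]


def inverse_up_to_scale(m):
    """Adjugate matrix; it represents the inverse projectivity."""
    (a, b), (c, d) = m
    return [[d, -b], [-c, a]]


M = to_zero_one_infinity(2, 5, -3)
assert [apply(M, x) for x in (2, 5, -3)] == [0, 1, INF]

# Send (2, 5, -3) to (1, 4, 7) by going through (0, 1, infinity).
N = inverse_up_to_scale(to_zero_one_infinity(1, 4, 7))
assert [apply(N, apply(M, x)) for x in (2, 5, -3)] == [1, 4, 7]
```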

What this says is that, up to projectivities, any three or fewer points on the complex projective line look like any other set of the same cardinality, but there are sets of four or more points which are essentially “different.”  In particular, given a set of n > 3 points in \mathbb{CP}^1, we ought to be able to find an invariant which determines whether a projectivity takes one set to the other.

Let’s consider the case n = 4.  Obviously we’ve got some latitude in determining this invariant up to a constant, so let’s just decree that R(0, 1, \infty, x) = x.  Consequently, for any element M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in PGL_2(\mathbb{C}), we have

R(M \cdot 0, M \cdot 1, M \cdot \infty, M \cdot x) = x.

Expanding the expression,

\displaystyle R\left(\frac{b}{d}, \; \frac{a+b}{c+d}, \; \frac{a}{c}, \;\frac{ax+b}{cx+d}\right) = x.

So all we need to do is, given a system of equations

\displaystyle p = \frac{b}{d}, \; q = \frac{a+b}{c+d}, \; r = \frac{a}{c}, \; s = \frac{ax+b}{cx+d},

figure out how to solve for x as a rational function of p, q, r, and s.  This isn’t too bad — note first that  ax + b = s(cx + d), or (a - cs) x = ds - b, so

x = \displaystyle \frac{ds - b}{a-cs} = \frac{ds - dp}{cr - cs} = \frac{d}{c} \frac{s-p}{r-s}

Now we just need to express d/c in terms of p, q, and r. Substituting b = dp and a = cr into q = (a+b)/(c+d) gives c(q - r) = d(p - q), so we can write d/c = (q-r)/(p-q), and altogether we get

R(p, q, r, s) = \displaystyle \frac{(q-r)(s-p)}{(p-q)(r-s)}.
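It's worth checking that this formula really does recover x from the four image points. A minimal Python sketch with exact fractions follows; the matrix and the sample values of x are arbitrary choices of mine.

```python
from fractions import Fraction


def derived_ratio(p, q, r, s):
    """R(p, q, r, s) = (q - r)(s - p) / ((p - q)(r - s))."""
    return Fraction((q - r) * (s - p)) / ((p - q) * (r - s))


def images(m, x):
    """The images of 0, 1, infinity and x under the matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    return (Fraction(b) / d,
            Fraction(a + b) / (c + d),
            Fraction(a) / c,
            Fraction(a * x + b) / (c * x + d))


M = [[2, 1], [1, 3]]
for x in (5, -2, Fraction(7, 3)):
    assert derived_ratio(*images(M, x)) == x
```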

Actually, as the negative reciprocal would provide just as good an invariant, let’s redefine R slightly to get all the variables in a nice, alphabetical order:

R(p, q, r, s) = \displaystyle \frac{(p-q)(r-s)}{(p-s)(q-r)}.

This function is the classical “cross-ratio” of the four points p, q, r, and s.  As we can see from the formula, it’s a ratio of ratios of differences between points.
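The invariance under projectivities can be spot-checked numerically. In this sketch the points and matrices are arbitrary choices of mine, picked so that no denominator vanishes and \infty never appears.

```python
from fractions import Fraction


def cross_ratio(p, q, r, s):
    """R(p, q, r, s) = (p - q)(r - s) / ((p - s)(q - r))."""
    return Fraction((p - q) * (r - s)) / ((p - s) * (q - r))


def mobius(m, x):
    """Action of the invertible matrix [[a, b], [c, d]] on a finite point x."""
    (a, b), (c, d) = m
    return Fraction(a * x + b) / (c * x + d)


points = (0, 1, 2, 5)
before = cross_ratio(*points)

for m in ([[2, 1], [1, 3]], [[1, 4], [2, 1]], [[3, -1], [1, 1]]):
    after = cross_ratio(*(mobius(m, x) for x in points))
    assert after == before
```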

Of course the point of the preceding is to provide one justification for why we should expect such an invariant to exist, and how we could determine it.  In fact, the cross-ratio and its significance to projective geometry was known already to Pappus of Alexandria around AD 300.
