A more motivated proof of the Pythagorean theorem

So far every proof I’ve known of the Pythagorean theorem has adhered to a narrative along the lines of

  • Notice, purely by accident, that in known right triangles it appears that the square on the hypotenuse is always equal to the sum of the squares on the other two sides.
  • Conjecture that this holds in general.
  • Draw a right triangle and a square on each side.
  • Figure out some ingenious geometric decomposition reassembling the two smaller squares into a copy of the bigger one.

This is fairly unsatisfying, because it only tells us that the theorem is true; it doesn’t do much to tell us why it’s true, or give us much intuition for what kind of information it does or does not encode.

Today I wondered if there was a better explanation, and I came across this:

Pythagoras’s theorem | What’s New

Terry Tao writes:

it is perhaps the most intuitive proof of the theorem that I have seen yet

The proof just comes down to examining the (obviously useful) construction where a right triangle is split into two smaller right triangles, both of which are similar to the big one.
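
For concreteness, here is a sketch of how the area version of that argument runs, as I understand it. The labels a, b, c (legs and hypotenuse), the notation T_h, and the constant k are my own and are not taken from the linked post.

```latex
% T_h: a right triangle similar to the original one, with hypotenuse h.
% Areas of similar figures scale with the square of any length, so for
% some constant k > 0 depending only on the shape of the triangle:
\mathrm{Area}(T_h) = k h^2 .
% The altitude from the right angle cuts T_c into two triangles similar
% to it, whose hypotenuses are the legs a and b. Hence:
\mathrm{Area}(T_c) = \mathrm{Area}(T_a) + \mathrm{Area}(T_b)
\;\Longrightarrow\; k c^2 = k a^2 + k b^2
\;\Longrightarrow\; c^2 = a^2 + b^2 .
```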

Posted in Uncategorized | Leave a comment

The Weyl Group of GL(n, C)

Here’s a cleaner explanation of the Weyl group of GL(n) than I’ve seen before.  I came up with this myself, but it’s straightforward enough that I’m sure I’m not the first.

Let V be an n-dimensional complex vector space, and fix a basis \beta := \{ e_1, e_2, \ldots, e_n \} for V.  Write G := GL(V) \cong GL_n(\mathbb{C}).  Let T < G denote the subgroup of matrices which are diagonal in the basis \beta; this is a maximal torus.  We know that the Weyl group is isomorphic to N(T)/T, so let’s determine N(T).

Pick a matrix D \in T all of whose eigenvalues are distinct, and suppose A \in N(T).  Then A^{-1} D A is diagonal.  This means that A represents a change of basis from \beta to some basis in which D is diagonal.  Now D is diagonal in some basis iff that basis consists of eigenvectors of D.  Since D was chosen in such a way that its eigenspaces are one-dimensional, the only eigenvectors of D are nonzero scalar multiples of the e_i.  So each column of A is a nonzero scalar multiple of some e_i, and since A is invertible, distinct columns use distinct e_i.  Therefore we have A = P C, where P is a permutation matrix and C is diagonal.  Conversely, it’s easy to see that any such matrix normalizes T.

From here it’s clear that N(T)/T \cong S_n, since the cosets correspond to permutation matrices.
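
The easy direction (every matrix of the form P C normalizes T) can be checked by brute force for n = 3. This is only a sketch: the helper names and the sample values of c and d are made up for illustration. Rather than inverting A, it checks the equivalent identity D A = A D', where D' is D with its diagonal entries permuted.

```python
from itertools import permutations

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def diag(entries):
    """Diagonal matrix with the given diagonal entries."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

def perm_matrix(sigma):
    """Permutation matrix whose j-th column is e_{sigma(j)}."""
    n = len(sigma)
    return [[1 if i == sigma[j] else 0 for j in range(n)] for i in range(n)]

def conjugates_to_permuted_diagonal(sigma, c, d):
    """Check D A = A D', where A = P C and D' has entries d[sigma[j]].

    Since A is invertible, this is the same as A^{-1} D A = D'.
    """
    A = matmul(perm_matrix(sigma), diag(c))
    D = diag(d)
    D_perm = diag([d[sigma[j]] for j in range(len(d))])
    return matmul(D, A) == matmul(A, D_perm)

c = [2, -3, 5]   # arbitrary nonzero scalars (the diagonal of C)
d = [7, 11, 13]  # distinct eigenvalues of D
all_normalize = all(conjugates_to_permuted_diagonal(s, c, d)
                    for s in permutations(range(3)))
print(all_normalize)
```

Conjugating by P C just permutes the diagonal entries of D, which is exactly the S_n action on T.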


The resultant

Like Euler products for number-theoretic functions, the resultant is one of those amazingly simple gadgets that you’d never imagine existed.  Given two polynomials f and g in a single variable, there is a number called the resultant, denoted res(f,g), such that:

  1. res(f, g) = 0 iff f and g share a common root
  2. res(f, g) is a polynomial in the coefficients of f and g.

Think about this for a second.  Given the coefficients of an arbitrary polynomial, we have in general no algebraic expression for its roots, but nonetheless we have a way of determining if two polynomials share a root by simply adding and multiplying together some of their coefficients!

First, let’s see why such a thing ought to exist.  Say that f(x) = p (x-a_1)\cdots(x-a_n) and g(x) = q (x-b_1)\cdots(x-b_m), and define

res(f, g) = p^m q^n \displaystyle \prod_{i=1}^n \prod_{j =1}^m (a_i - b_j).

This clearly satisfies condition (1) above: the product will equal zero iff a_i = b_j for some i and j.  However, it also satisfies condition (2).  Why?  Well, if we regard res(f, g) as a polynomial in the a_i’s, with coefficients which are polynomials in the b_j’s, then it’s a symmetric polynomial in the a_i’s; that is, it’s invariant under permuting the order of the a_i’s.  Further, if we regard one of the coefficients of this polynomial as a polynomial in the b_j’s, then this polynomial is also symmetric.

Why does this matter?  Well, the ring of symmetric polynomials in n variables is generated by the elementary symmetric polynomials


x_1 + \cdots + x_n,

x_1 x_2 + x_1 x_3 + \cdots + x_{n-1} x_n,

and so forth.

But you’ll recognize that the coefficients of a (monic, univariate) polynomial are, up to sign, precisely the elementary symmetric polynomials in its roots!  That is, given a monic polynomial in one variable, any symmetric polynomial in its roots can be written as a polynomial in the coefficients.  (Throwing in the powers of the leading coefficients p and q at the front handles the case when the polynomials aren’t monic.)
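
To see condition (2) in action, the sketch below computes the resultant in two ways: from the roots, using the product formula with the normalisation p^m q^n (where n = deg f and m = deg g), and from the coefficients alone, as the determinant of the Sylvester matrix. The Sylvester matrix is not discussed in this post; it is brought in here as one standard coefficient-only formula, and all function names are hypothetical.

```python
from fractions import Fraction

def poly_from_roots(lead, roots):
    """Coefficients (highest degree first) of lead * prod (x - r)."""
    coeffs = [Fraction(lead)]
    for r in roots:
        new = coeffs + [Fraction(0)]       # multiply by x
        for i, ci in enumerate(coeffs):    # subtract r * (old polynomial)
            new[i + 1] -= r * ci
        coeffs = new
    return coeffs

def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in M]
    size, result = len(M), Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result               # a row swap flips the sign
        result *= M[col][col]
        for r in range(col + 1, size):
            factor = M[r][col] / M[col][col]
            for k in range(col, size):
                M[r][k] -= factor * M[col][k]
    return result

def sylvester_resultant(f, g):
    """Resultant from coefficients only: m shifted rows of f, n of g."""
    n, m = len(f) - 1, len(g) - 1
    rows = [[0] * i + f + [0] * (m - 1 - i) for i in range(m)]
    rows += [[0] * i + g + [0] * (n - 1 - i) for i in range(n)]
    return det(rows)

def product_resultant(p, a, q, b):
    """Resultant from the roots: p^m q^n prod_i prod_j (a_i - b_j)."""
    n, m = len(a), len(b)
    result = Fraction(p) ** m * Fraction(q) ** n
    for ai in a:
        for bj in b:
            result *= (ai - bj)
    return result

f = poly_from_roots(2, [1, 2])          # 2(x - 1)(x - 2)
g = poly_from_roots(3, [3])             # 3(x - 3), no common root with f
h = poly_from_roots(3, [2, 5])          # 3(x - 2)(x - 5), shares the root 2
print(sylvester_resultant(f, g), product_resultant(2, [1, 2], 3, [3]))
print(sylvester_resultant(f, h), product_resultant(2, [1, 2], 3, [2, 5]))
```

The first pair agrees on a nonzero value and the second pair is zero, even though `sylvester_resultant` never sees the roots.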

I vaguely remember learning about elementary symmetric polynomials in my undergrad algebra sequence, but at the time I had no real idea what they were for.  They didn’t look that complicated, so I figured they probably didn’t matter too much.  As it turns out, though, the whole subject of invariant theory is really interesting, and symmetric polynomials are just the first nontrivial example.

As an added bonus, note that we can also determine whether a polynomial has a double root by calculating res(f, f'), where f' is the derivative of f.  (You can define the derivative of a polynomial without using any calculus: just consider the power rule et al. as definitions instead of theorems.)  Now res(f, f') is zero iff f and f' share a root.  Suppose a is a root of f; then f(x) = (x - a)^n g(x) for some n \geq 1 and some g, where a is not a root of g.  Taking the derivative, we have f'(x) = n (x - a)^{n-1} g(x) + (x-a)^n g'(x), so f'(a) = n 0^{n-1} g(a) + 0^n g'(a) = 0 if n > 1, or f'(a) = g(a) + 0 \cdot g'(a) = g(a) \neq 0 if n = 1.  So a is a common root of f and f' exactly when it is a multiple root of f.

Up to a sign and a factor of the leading coefficient, the resultant res(f, f') is known as the discriminant, as you’ll remember from high-school algebra when they seemingly needlessly assigned this fancy name to the term b^2 - 4 a c appearing in the quadratic formula.  The form here generalizes to univariate polynomials of arbitrary degree, but in fact it can be generalized further, to arbitrary multivariate polynomials as well.
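
As a worked check for the quadratic case, take f(x) = a x^2 + b x + c with roots r_1 and r_2 (my notation, chosen to avoid clashing with the coefficient a). The computation assumes the normalisation res(f, g) = p^m q^n \prod (a_i - b_j) with n = deg f and m = deg g.

```latex
% f(x) = a (x - r_1)(x - r_2), and since r_1 + r_2 = -b/a,
% f'(x) = 2 a x + b = 2a \left( x - \tfrac{r_1 + r_2}{2} \right).
% Here n = 2, m = 1, p = a, q = 2a:
\mathrm{res}(f, f')
  = a^{1} (2a)^{2}
    \left( r_1 - \tfrac{r_1 + r_2}{2} \right)
    \left( r_2 - \tfrac{r_1 + r_2}{2} \right)
  = 4 a^3 \cdot \tfrac{r_1 - r_2}{2} \cdot \tfrac{r_2 - r_1}{2}
  = -a^3 (r_1 - r_2)^2 .
% Using (r_1 - r_2)^2 = (r_1 + r_2)^2 - 4 r_1 r_2 = \frac{b^2 - 4ac}{a^2}:
\mathrm{res}(f, f') = -a \, (b^2 - 4 a c) .
```

So in this case the resultant differs from b^2 - 4 a c by the factor -a, and it vanishes exactly when r_1 = r_2.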


An obvious statement which surprised me when I read it

From Baumslag’s “Topics in Combinatorial Group Theory,” chapter V, Exercises 2(3)(iv):

M \in SL_2(\mathbb{C}) is of order e > 2 if, and only if, {\rm tr} M = \omega + \omega^{-1}  for some primitive e-th root of unity \omega.

There’s really nothing to this statement — just put the matrix in Jordan Canonical Form and draw the obvious conclusion — but it really surprised me when I saw it used in an argument.  Another way to put this would be that, for a 2×2 matrix, the trace and determinant determine the eigenvalues (which is equally obvious).

(The problem in the cases e = 1 and e = 2 is that then the eigenvalues are identical, which means that the matrix isn’t necessarily diagonal when it’s in Jordan Canonical Form — it could be a 2×2 Jordan block.)
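
A quick numerical illustration, as a sketch only: the three integer matrices below are examples I picked in SL_2(\mathbb{Z}) \subset SL_2(\mathbb{C}), and the function names are hypothetical. For a primitive e-th root of unity \omega = e^{2 \pi i k / e} we have \omega + \omega^{-1} = 2 \cos(2 \pi k / e), which is what the check compares the trace against.

```python
import math

def matmul2(X, Y):
    """Multiply two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

IDENTITY = ((1, 0), (0, 1))

def order(M, limit=100):
    """Smallest e >= 1 with M^e = I, or None if none is found up to limit."""
    power = M
    for e in range(1, limit + 1):
        if power == IDENTITY:
            return e
        power = matmul2(power, M)
    return None

def trace_is_primitive_sum(M, e, tol=1e-9):
    """Is tr M = w + 1/w for some primitive e-th root of unity w?"""
    tr = M[0][0] + M[1][1]
    return any(
        math.gcd(k, e) == 1 and abs(tr - 2 * math.cos(2 * math.pi * k / e)) < tol
        for k in range(1, e + 1)
    )

examples = [
    ((0, -1), (1, -1)),  # trace -1
    ((0, -1), (1, 0)),   # trace 0
    ((1, -1), (1, 0)),   # trace 1
]
orders = [order(M) for M in examples]
checks = [trace_is_primitive_sum(M, e) for M, e in zip(examples, orders)]
print(orders, checks)
```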

From this and one other fact it follows that the elements a, b, and ab in the group \langle a, b \; | \; a^\ell = b^m = (ab)^n = 1 \rangle actually have the desired orders.


The cross-ratio

Here is a nice invariant from classical geometry that I’d never heard of before today.

The action of \text{GL}_2(\mathbb{C}) on \mathbb{C}^2 descends to an action on \mathbb{CP}^1; this is all that a Möbius transformation really is.  Now \text{GL}_2(\mathbb{C}) is four-dimensional, but there’s a one-dimensional subgroup of scalar matrices which fixes each point of the projective line, so we may as well quotient this out and get a \text{PGL}_2(\mathbb{C})-action.

In fact, this is the projective automorphism group of \mathbb{CP}^1, hence the name PGL; its elements are called projectivities.

The group \text{PGL}_2(\mathbb{C}) is three-dimensional, so we would expect that with our three degrees of freedom we could send any three distinct points to any three other distinct points, and indeed we can: the action is 3-transitive.  On the other hand, the dimension of \text{PGL}_2(\mathbb{C}) implies that it can’t possibly be 4-transitive.

What this says is that, up to projectivities, any three or fewer points on the complex projective line look like any other set of the same cardinality, but there are sets of four or more points which are essentially “different.”  In particular, given a set of n > 3 points in \mathbb{CP}^1, we ought to be able to find an invariant which determines whether a projectivity takes one such set to another.

Let’s consider the case n = 4.  Obviously we’ve got some latitude in determining this invariant up to a constant, so let’s just decree that R(0, 1, \infty, x) = x.  Consequently, for any element M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in PGL_2(\mathbb{C}), we have

R(M \cdot 0, M \cdot 1, M \cdot \infty, M \cdot x) = x.

Expanding the expression,

\displaystyle R\left(\frac{b}{d}, \; \frac{a+b}{c+d}, \; \frac{a}{c}, \;\frac{ax+b}{cx+d}\right) = x.

So all we need to do is, given a system of equations

\displaystyle p = \frac{b}{d}, \; q = \frac{a+b}{c+d}, \; r = \frac{a}{c}, \; s = \frac{ax+b}{cx+d},

figure out how to solve for x as a rational function of p, q, r, and s.  This isn’t too bad — note first that  ax + b = s(cx + d), or (a - cs) x = ds - b, so

x = \displaystyle \frac{ds - b}{a-cs} = \frac{ds - dp}{cr - cs} = \frac{d}{c} \frac{s-p}{r-s}

Now we just need to express d/c in terms of p, q, and r.  Substituting a = cr and b = dp into q(c + d) = a + b gives c(q - r) = d(p - q), so d/c = (q-r)/(p-q), and altogether we get

R(p, q, r, s) = \displaystyle \frac{(q-r)(s-p)}{(p-q)(r-s)}.

Actually, as the negative reciprocal would provide just as good an invariant, let’s redefine R slightly to get all the variables in a nice, alphabetical order:

R(p, q, r, s) = \displaystyle \frac{(p-q)(r-s)}{(p-s)(q-r)}.

This function is the classical “cross-ratio” of the four points p, q, r, and s.  As we can see from the formula, it’s a ratio of ratios of differences between points.
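
Both claims are easy to test with exact rational arithmetic. In the sketch below, the matrix M, the value x = 7, and the four sample points are arbitrary choices of mine, and the function names are hypothetical; `first_form` is the expression obtained before the redefinition and `cross_ratio` is the alphabetical one.

```python
from fractions import Fraction

def mobius(M, x):
    """Apply the Möbius transformation given by M = ((a, b), (c, d)) to x."""
    (a, b), (c, d) = M
    return (a * x + b) / (c * x + d)

def first_form(p, q, r, s):
    """R as first derived: (q - r)(s - p) / ((p - q)(r - s))."""
    return ((q - r) * (s - p)) / ((p - q) * (r - s))

def cross_ratio(p, q, r, s):
    """The redefined R: (p - q)(r - s) / ((p - s)(q - r))."""
    return ((p - q) * (r - s)) / ((p - s) * (q - r))

M = ((Fraction(2), Fraction(1)), (Fraction(1), Fraction(3)))
(a, b), (c, d) = M
x = Fraction(7)

# Images of 0, 1, infinity and x under M; infinity is sent to a / c.
images_of_standard = (b / d, (a + b) / (c + d), a / c, mobius(M, x))
recovered = first_form(*images_of_standard)       # should give back x
redefined = cross_ratio(*images_of_standard)      # negative reciprocal of x

# Invariance on four finite sample points.
points = [Fraction(1), Fraction(2), Fraction(5), Fraction(11)]
images = [mobius(M, t) for t in points]
invariant = cross_ratio(*images) == cross_ratio(*points)
print(recovered, redefined, invariant)
```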

Of course the point of the preceding is to provide one justification for why we should expect such an invariant to exist, and how we could determine it.  In fact, the cross-ratio and its significance to projective geometry was known already to Pappus of Alexandria around AD 300.


Left/Right Issues, Part 1

Any time a mathematical concept comes in a “left-handed” and a “right-handed” flavor, I can almost guarantee that I’m going to have trouble remembering which is which.  I’ll either have to write it somewhere I’ll see it every day until, after a few months, I pick it up via osmosis, or I’ll have to find some way of relating it to the small number of chiral concepts that I actually understand.

Here’s one such way.  We’ll learn to tell the difference between a slice category and a coslice category.

First, recall the definitions.  Given a category \sf C, and an object A \in \mathrm{Obj}({\sf C}) in this category, the slice category {\sf C}_A is the category with

\mathrm{Obj}({\sf C}_A) := \{ (Z, f) \; | \; Z \in \mathrm{Obj}({\sf C}), f \in \mathrm{Hom}_{\sf C}(Z, A) \}

\mathrm{Hom}_{{\sf C}_A}((Z, f), (Y, g)) := \{\sigma \in \mathrm{Hom}_{\sf C}(Z, Y) \; | \; f = g \circ \sigma\}

In other words, objects of {\sf C}_A are objects of \sf C equipped with maps to A, and a map in {\sf C}_A is a map in \sf C which preserves this additional structure.  The coslice category is, of course, the categorical dual.

Now, before we move on, we need a slight generalization of this concept.  For some collection S of objects of \sf C, define the generalized slice category (nonstandard terminology!) {\sf C}_S to be

\mathrm{Obj}({\sf C}_S) := \{ (Z, (f_A)_{A \in S}) \; | \; Z \in \mathrm{Obj}({\sf C}), \forall A \in S. f_A \in \mathrm{Hom}_{\sf C}(Z, A) \}

\mathrm{Hom}_{{\sf C}_S}((Z, (f_A)), (Y, (g_A))) := \{\sigma \in \mathrm{Hom}_{\sf C}(Z, Y) \; | \; \forall A \in S. f_A = g_A \circ \sigma \}

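So, for instance, objects of {\sf C}_{A, B} are objects of \sf C equipped with maps to both A and B, and morphisms in this category are structure-respecting morphisms from \sf C. Generalized coslice categories are defined dually: their objects are objects of \sf C equipped with a map from each A \in S, and their morphisms are again the structure-respecting ones.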

Now we can introduce a pair of facts which will prevent us from ever confusing slice and coslice categories again:

  • Products are terminal objects in generalized slice categories;
  • Coproducts are initial objects in generalized coslice categories.

Bonus explanation: having trouble remembering the difference between products and coproducts?  Just think of them in the category of sets.  A product of sets is the usual cartesian product (together with projection maps) — hence the name — while a coproduct of sets is their disjoint union (together with inclusion maps), which is not something you’d generally call a product.
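
In the category of finite sets, both universal properties can be checked by brute force. This is a sketch on tiny sets of my own choosing, with hypothetical names throughout: for the product, an object (Z, f, g) of the generalized slice category admits exactly one structure-respecting map to A × B (so the product is terminal); for the coproduct, there is exactly one structure-respecting map out of the disjoint union (so the coproduct is initial).

```python
from itertools import product

A, B, Z = ["a0", "a1"], ["b0", "b1"], ["z0", "z1"]

def all_maps(src, dst):
    """Every function src -> dst, represented as a dict."""
    return [dict(zip(src, images)) for images in product(dst, repeat=len(src))]

# Product: an object Z with maps f: Z -> A and g: Z -> B.
f = {"z0": "a1", "z1": "a0"}
g = {"z0": "b0", "z1": "b0"}
AxB = list(product(A, B))  # projections are h[0] and h[1]
into_product = [
    h for h in all_maps(Z, AxB)
    if all(h[z][0] == f[z] and h[z][1] == g[z] for z in Z)
]

# Coproduct: an object Z with maps i: A -> Z and j: B -> Z.
i = {"a0": "z1", "a1": "z1"}
j = {"b0": "z0", "b1": "z1"}
A_plus_B = [("A", x) for x in A] + [("B", y) for y in B]  # tagged union
out_of_coproduct = [
    h for h in all_maps(A_plus_B, Z)
    if all(h[("A", x)] == i[x] for x in A)
    and all(h[("B", y)] == j[y] for y in B)
]

print(len(into_product), len(out_of_coproduct))
```

The direction of the unique map is the thing to remember: into the product, out of the coproduct.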


Site up

This will primarily be a private mathematical diary, as described in Krantz’s “A Mathematician’s Survival Guide,” although I may post some things publicly from time to time.
