A tangent space is a useful structure which can be extracted from the differential structure of an n-dimensional manifold M. On such a manifold, it is handy to be able to define vectors, so that we can calculate things like, say, the velocity of a given curve defined on M. However, since a general manifold doesn't have to be defined as a subspace of flat Euclidean space, we can no longer just think of a vector as some "arrow" living in the ambient space. In Euclidean space, we would just draw an arrow, and state that it is "located" at the coordinates of the non-pointy end of the arrow, and that it has a "direction" and "magnitude" given by the coordinates of the "point" of the arrow. We were able to get away with this nonsense because there is no curvature in Euclidean space, and our vectors are therefore given by straight lines. The essential core of a vector is linearity, and it is that property of vectors that we cannot lose when we generalize to curved spaces. Therefore, we can no longer define our vectors as living in the manifold itself. If we try to represent our vectors as "curved" arrows in the manifold, there is no proper way of "adding" them together; we lose linearity. The bottom line is, we need a new place for our vectors to live. So, we seek to define the tangent space to M at any given point p in the manifold. This can be done in a few different ways.
If M is an n-dimensional differential manifold, the tangent space to M at p, denoted TpM, is an n-dimensional vector space. You might like to think of it as the "closest flat approximation to M at p", but so far, it's really just a copy of Rn where vectors at p can be defined.
Curves in the Manifold
Let's begin in a familiar setting. We look at the velocity vector for a curve in R3. Define the curve as a map C: (-1,1) → R3, and let p = C(0). In local coordinates on R3 (which we can just take to be the Cartesian coordinates), we can represent C by (x1(t), x2(t), x3(t)). The tangent vector to C inside of R3 is given by just taking the derivative with respect to the curve parameter, t, and evaluating it at p: (dx1/dt|p, dx2/dt|p, dx3/dt|p). This can be interpreted as the velocity vector of the curve at the point p.
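As a rough numerical illustration of this velocity vector, the sketch below differentiates a curve component by component. The helix and the helper names curve and velocity are assumptions made up for the example, and a central difference stands in for the exact derivative.

```python
import math

def curve(t):
    """A hypothetical helix C(t) = (cos t, sin t, t) in R^3."""
    return (math.cos(t), math.sin(t), t)

def velocity(c, t0, h=1e-6):
    """Central-difference estimate of (dx1/dt, dx2/dt, dx3/dt) at t = t0."""
    forward, backward = c(t0 + h), c(t0 - h)
    return tuple((a - b) / (2 * h) for a, b in zip(forward, backward))

v = velocity(curve, 0.0)
print(v)  # close to (0, 1, 1), the exact derivative (-sin 0, cos 0, 1)
```

Here p = C(0) = (1, 0, 0), and the three numbers are the Cartesian components of the velocity at p.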
Now, there is no reason we can't do this for arbitrary manifolds; in a small enough neighborhood of a point p, we have a local coordinate representation of any curve passing through p, and thus we can compute the tangent vector to such a curve in those coordinates. We can then compare the tangent vectors to multiple curves passing through p in a natural way, since we can use the same coordinate representation to look at both curves.
All of this is useful to us, because in a way we can now associate vectors defined at a point p on M with curves in M passing through p; "vectors" are really just the velocities of curves. This definition is nice, because it is a completely coordinate-independent statement. Note, however, the relationship between vectors at p and curves passing through p is not a one-to-one mapping. For a given vector vp at p, there are many different curves passing through p which have the same velocity vp. Therefore, in order to associate the tangent space at p with the set of curves passing through p, we must first define an equivalence relation on curves.
Consider the set of curves on M passing through p. An element Cj of this set is a map Cj: (-1,1) → M such that Cj(0) = p.
Define two such curves Cj and Ck to be "equivalent" when their tangent vectors at p agree in a local coordinate representation {xi} about p. You can check that this definition is independent of the choice of coordinates {xi}.
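The check suggested above can be sketched numerically. Everything named here is an assumption chosen for illustration: two made-up curves c1 and c2 through p = (0, 0) in a chart x, and a second made-up chart chart_y. The sketch only tests this one pair of curves; it is not a proof.

```python
import math

def c1(t):
    """A straight curve through p = (0, 0), written in the chart x."""
    return (t, t)

def c2(t):
    """A different curve through p with the same velocity at t = 0."""
    return (math.sin(t), t + t * t)

def chart_y(x):
    """A second, hypothetical coordinate system y(x) around p."""
    x1, x2 = x
    return (x1 + 2 * x2, x2 + x1 * x1)

def velocity(curve, h=1e-5):
    """Central-difference estimate of the velocity components at t = 0."""
    fwd, bwd = curve(h), curve(-h)
    return tuple((a - b) / (2 * h) for a, b in zip(fwd, bwd))

def close(u, v, tol=1e-6):
    return all(abs(a - b) < tol for a, b in zip(u, v))

same_in_x = close(velocity(c1), velocity(c2))
same_in_y = close(velocity(lambda t: chart_y(c1(t))),
                  velocity(lambda t: chart_y(c2(t))))
print(same_in_x, same_in_y)  # True True
```

The two curves agree to first order at p in both charts, even though the components themselves differ between the charts: roughly (1, 1) in x and (3, 1) in y.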
Now, we can define a tangent vector to M at p to be an equivalence class of curves on M through p, as defined above. The tangent space, TpM, is therefore the space of all such equivalence classes. Notice we use the notation vp, highlighting the fact that vectors at different points on the manifold live in different tangent spaces. Before defining additional structure, there is no way of comparing two vectors vp and vq at different points on the manifold.
This is one good way of defining the tangent space. It is nice because its definition does not rely on our choice of coordinates. However, it is a bit abstract, and therefore it is common to use the coordinate-dependent version, especially when doing calculations:
Coordinate Representations of Curves
Given a point p in M, and local coordinates {xi} on a neighborhood of p, the tangent space TpM is the space of all possible velocities (dC1/dt|p, dC2/dt|p, ..., dCn/dt|p) for curves C(t) in M passing through p. The components of a vector v in TpM are given by the components of this coordinate representation for dC/dt:
vi = dCi/dt
Note that the components of v are coordinate-dependent. If we chose a different patch to represent the neighborhood of p in M, we would generally get different components vi. While v is a coordinate-independent object, its components are entirely coordinate-dependent.
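To make the coordinate dependence concrete, here is a small sketch on the unit sphere. The curve, the two charts (spherical angles and stereographic projection from the north pole), and all function names are assumptions chosen for the example. The same curve is written in both charts and differentiated numerically.

```python
import math

def angles(t):
    """Coordinate representation of a hypothetical curve in the (theta, phi) chart."""
    return (math.pi / 2 + t, 2 * t)

def stereographic(t):
    """The same curve in stereographic coordinates (X, Y) from the north pole."""
    theta, phi = angles(t)
    x = math.sin(theta) * math.cos(phi)
    y = math.sin(theta) * math.sin(phi)
    z = math.cos(theta)
    return (x / (1 - z), y / (1 - z))

def components(rep, h=1e-5):
    """Central-difference estimate of v^i = dC^i/dt at t = 0."""
    return tuple((a - b) / (2 * h) for a, b in zip(rep(h), rep(-h)))

v_angles = components(angles)
v_stereo = components(stereographic)
print(v_angles)  # approximately (1, 2)
print(v_stereo)  # approximately (-1, 2)
```

Both tuples describe the same tangent vector at the same point of the sphere; only the chart used to read off its components has changed.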
Directional Derivatives
There is another way of defining a tangent vector space, which involves looking at derivatives of functions defined on M. At a point p, given a tangent vector, vp (defined in one of the two ways above), and a function f: M → R, we ask: how quickly is f changing in the v-direction?
The notation we will use will be vp(f) = the directional derivative of f at p in the v-direction. To calculate vp(f), we choose an associated curve C in M through p (from the equivalence class of curves corresponding to v) with velocity v at p. Then, the rate of change of f in the v-direction is given by the rate of change of f ∘ C with respect to the curve parameter t, evaluated at p:
vp(f) = d/dt(f(C(t)))|p.
This definition for vp(f) is manifestly coordinate-independent, since we haven't chosen a coordinate chart yet. However, we haven't yet shown that this definition is independent of our choice of curve, C (remember, we just picked one out of the set of curves with velocity v at p). We can do this by choosing a coordinate chart, {xi}, and carrying through the t-derivative using the chain rule:
vp(f) = d/dt(f(C(t)))|p = (∂f/∂xi)(dxi/dt)|p
where xi(t) is just the coordinate representation of the curve C in M. Note we are implicitly summing over the coordinate index i, using the Einstein summation convention. Simplifying a little of the cumbersome notation, we write:
vp(f) = (dxi/dt) (∂if)
Thus, the directional derivative of f is specified by n parameters, dxi/dt, for i = 1,...,n. These parameters are exactly the components vi in the coordinate representation given above. Now let us think of vp as an operator on functions on M. By the above equation, it is clear that this is a linear operator, and that its operation on functions is completely specified by the components vi. This means we have a new representation of our tangent space, TpM:
The tangent space TpM is the space of all directional derivative operators vp, acting on smooth functions f: M → R, and returning a real number given by the equation:
vp(f) = vi (∂if)|p.
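A quick numerical sanity check of this formula, under assumptions invented for the example: a function f(x, y) = x²y + y on the plane, the point p = (1, 2), the vector v = (3, -1), and two different curves through p that share that velocity. The derivative of f along either curve should match the component expression.

```python
import math

def f(x, y):
    return x * x * y + y

p = (1.0, 2.0)
v = (3.0, -1.0)

def straight(t):
    """The obvious curve through p with velocity v."""
    return (p[0] + v[0] * t, p[1] + v[1] * t)

def wiggly(t):
    """Another curve through p with the same velocity at t = 0."""
    return (p[0] + 3.0 * math.sin(t), p[1] - t + 5.0 * t * t)

def derivative_along(curve, h=1e-5):
    """Central-difference estimate of d/dt f(C(t)) at t = 0."""
    return (f(*curve(h)) - f(*curve(-h))) / (2 * h)

# Component formula v^i (d_i f) at p, with the partial derivatives done by hand.
df_dx = 2 * p[0] * p[1]   # d f / d x = 2 x y
df_dy = p[0] ** 2 + 1     # d f / d y = x^2 + 1
by_components = v[0] * df_dx + v[1] * df_dy

print(derivative_along(straight), derivative_along(wiggly), by_components)
```

All three numbers come out at 10 (the first two up to finite-difference error), which is the statement that vp(f) depends only on the components of v and not on the curve chosen to represent it.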
So, now we can think of vectors as linear maps from functions into R. This concept is a bit abstract upon first viewing, so let's play with the algebra until you're a bit more convinced.
First of all, notice that by this definition adding two vectors together is the same thing as adding their components together, as one would expect. Secondly, multiplying a vector by a given constant just multiplies the components by a constant. Thus, directional derivative operators do form a linear space. Now, let's say we choose a given set of coordinates {xi} and choose a particular directional derivative, which we will call e1, whose components are (e1)^1 = 1 and (e1)^i = 0 for all i ≠ 1. We will show that this is one of a natural set of basis vectors for TpM. The equation above shows us how this operator acts on functions:
e1(f) = (e1)^i (∂if) = ∂1f.
In other words, e1 = ∂1. Likewise, for any j, ej = ∂j, the j'th partial derivative. Since every directional derivative is a linear combination of partial derivatives, we can always express any vector in TpM as a linear combination of the ej's. Thus, we have established that we can think of {∂j} as a basis for TpM.
Change of Basis
Now that we're a bit more comfortable with the notion of a vector as a linear map, let's look at the consistency of this definition when going from one coordinate patch to another. Remember, when we first gave the definition for a directional derivative operator, we noted that it was manifestly coordinate-independent. However, when we chose a set of coordinates {xi}, we got a set of components for v, {vi}, which were coordinate-dependent. So, a natural question to ask would be: How exactly do the components vi for a vector v transform when we make a new choice of coordinates? The intuitive picture is that the components must somehow transform in a way "opposite" that of the basis vectors, so that the overall definition is coordinate-independent. Specifically, if the coordinate transformation results in a change of basis which can be represented as a matrix acting on basis vectors,
fi = ∂/∂yi = Mij ej = Mij ∂/∂xj
then the components of a given vector v must transform via the inverse matrix:
(v')i = (M -1)ij vj.
Since we are expressing the basis vectors as partial derivatives, the matrix is just given by the chain rule,
Mij = ∂xj/∂yi
and its inverse is given by using the chain rule in the opposite direction,
(M -1)ij = ∂yi/∂xj.
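The transformation rule can be tried out on a familiar pair of charts. In the sketch below, x is taken to be Cartesian coordinates on the plane and y to be polar coordinates (r, θ); that choice, the point p, and the vector v are assumptions for illustration. The matrices are filled in by hand from the chain-rule formulas above.

```python
import math

def jacobian_inverse(x1, x2):
    """(M^-1)^i_j = dy^i/dx^j for y = (r, theta); rows are indexed by i."""
    r2 = x1 * x1 + x2 * x2
    r = math.sqrt(r2)
    return [[x1 / r, x2 / r],
            [-x2 / r2, x1 / r2]]

def jacobian(x1, x2):
    """M_i^j = dx^j/dy^i; row i holds the Cartesian components of d/dy^i."""
    r = math.hypot(x1, x2)
    theta = math.atan2(x2, x1)
    return [[math.cos(theta), math.sin(theta)],
            [-r * math.sin(theta), r * math.cos(theta)]]

p = (1.0, 1.0)
v = (1.0, 3.0)  # Cartesian components v^j

m_inv = jacobian_inverse(*p)
m = jacobian(*p)

# Components transform with the inverse matrix: (v')^i = (M^-1)^i_j v^j.
v_polar = [sum(m_inv[i][j] * v[j] for j in range(2)) for i in range(2)]

# Basis vectors transform with M, so (v')^i f_i rebuilds the original vector.
v_back = [sum(v_polar[i] * m[i][j] for i in range(2)) for j in range(2)]

print(v_polar, v_back)
```

The polar components come out near (2√2, 1), and recombining them with the transformed basis returns the Cartesian components (1, 3): the two opposite transformations cancel, which is why v itself is coordinate-independent.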
We have written down three equivalent approaches to understanding the tangent space TpM. In this third picture, we can think of TpM as all possible directional derivative operators at p on functions defined on M. In local coordinates, these are all expressible as linear combinations of partial derivatives, {∂/∂xj|p}, j = {1,...,n}. The transformation rule for a change of coordinates is just given by the chain rule on partial derivatives; the basis vectors transform like derivatives, and the components of a vector transform inversely to the basis vectors. If we wanted, we could now define vectors in a fourth way, i.e. that they are merely a set of components {vi} which transform via the inverse chain rule, as above. This definition for vectors would be entirely equivalent to the other three given above, but there is no corresponding intrinsic description of the space TpM. Vectors in this picture lose their intrinsic value as a mathematical object; they no longer "live" anywhere as an element of a topological space. This definition is the most common, however, because it's easiest for purposes of direct computation.
A Bigger Picture
Now that we've constructed the tangent space TpM from several different perspectives, it is a useful question to ask, "How is TpM related to TqM, for different points p and q in the manifold M?" We know that since the manifold is smooth, the vector spaces should somehow mesh smoothly with each other. We can also be sure about how this "meshing" should work on a global scale; this is how we define the tangent bundle, denoted simply TM. However, there is no god-given way of answering this question locally; we need to add additional structure to our manifold before we can compare tangent spaces. This leads to discussions of connections, parallel transport, covariant differentiation, and curvature. All of these notions can be defined before we add the further structure of a metric to the manifold.
The Cotangent Space
It is a mathematical fact of linear algebra that every vector space V is naturally associated to a dual space, V*. For example, in the case of column vectors in Euclidean space, the natural dual space is the space of row vectors. A more interesting example is in a Hilbert space, where the linear space of kets | ψ > is associated with the dual space of bras < ψ |. We'll now give a brief explanation of what a dual space is before restricting our attention to the special case of the dual of a tangent vector space TpM, which is what we call the cotangent space, Tp*M.
In general, given any n-dimensional+ vector space V, we look at the space of all real-valued linear maps on V. This space of linear maps forms a vector space in its own right, which we call the dual space, V*. This would appear to be an extremely abstract space, but in fact, it is not very different from the space V itself. Think of V* as the evil twin of V. Its initial definition depends on V, but once the computational machinery is put in place, it is entirely possible to reverse their roles and redefine V as the space of linear maps on V*.
A dual vector space is often defined by its basis. Given a set of basis vectors {eα} of a vector space V, we define the dual space V* to be the vector space spanned by the basis vectors {e*β}, where each e*β is a linear map on the set of eα's. Specifically,
e*β[ eα ] = δαβ.
Here δαβ is simply the Kronecker delta, equal to one when the indices agree, and zero when they don't. This gives us a basis for linear maps, in that any linear map ω: V → R can be written as ω = ωβe*β, where the ωβ's are just real coefficients. Indeed, this is definitely a linear map on V, and it is fairly easy to show that any linear map on V can be written in this form. However, our time is best spent studying the special case where V = TpM, V* = Tp*M.
As we have already discussed, TpM can be thought of as the space of directional derivative operators acting on functions defined on M. The basis for this space is the set of partial derivative operators {eα = ∂/∂xα}. We will now introduce the dual basis exactly as we did before, but using very suggestive notation: {eβ = "dxβ "}.
By our definition, dxβ[ ∂/∂xα ] = δαβ.
Noting that any vp in TpM can be written vp = vα ∂/∂xα,
and any ωp in Tp*M can be written ωp = ωβdxβ, we find a general formula for ω acting on v.
ωp[ vp ] = ωβdxβ[ vα ∂/∂xα ] = ωβ vα δαβ = ωα vα.
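The pairing formula is easy to test in a small case. The sketch below works in R² with a deliberately non-orthogonal basis; the basis, its dual (obtained by inverting the 2×2 matrix of basis vectors by hand), and the sample components are all assumptions for illustration. Vectors are stored as columns of standard components and dual vectors as rows.

```python
def pair(covector, vector):
    """Apply a row of coefficients to a column of components: w[v] = sum of w_k v^k."""
    return sum(w * x for w, x in zip(covector, vector))

# A hypothetical non-orthogonal basis of R^2, written in standard components.
e = [(1.0, 0.0), (1.0, 1.0)]
# Its dual basis: the rows of the inverse of the matrix whose columns are e[0], e[1].
e_dual = [(1.0, -1.0), (0.0, 1.0)]

# e*^b[e_a] should be the Kronecker delta.
delta = [[pair(e_dual[b], e[a]) for a in range(2)] for b in range(2)]
print(delta)  # [[1.0, 0.0], [0.0, 1.0]]

# Components in this basis: v = v^a e_a and w = w_b e*^b.
v_comp = (2.0, 3.0)
w_comp = (5.0, -1.0)
v = tuple(sum(v_comp[a] * e[a][k] for a in range(2)) for k in range(2))
w = tuple(sum(w_comp[b] * e_dual[b][k] for b in range(2)) for k in range(2))

print(pair(w, v), pair(w_comp, v_comp))  # 7.0 7.0
```

Acting with w on v directly gives the same number as contracting the components, as long as the components of w are taken with respect to the dual of the basis used for v.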
Dual vectors in Tp*M are known as one-forms. Notice that, although TpM and Tp*M have the same size and essentially the same structure, there is no natural map between the two spaces++. In other words, given an arbitrary vector vp, there is no natural way to associate it with a unique one-form, ωp. We could try identifying them component-wise, via vα ↔ ωα, but the components aren't fundamental; they change under coordinate transformations. It is possible to choose some particular identification between vectors and one-forms, and this additional structure is known as a metric. The bottom line is, you need to input some additional information before you can directly compare vectors and one-forms.
Now, we could just stop here, having defined Tp*M via the dual space, {dxβ}. However, the notation "dxβ " seems to cry out for some motivation. We've seen this notation before in calculus. To see the relationship, consider the following:
Given a function f: M → R, define a special element of Tp*M. Call it ωf. Define ωf by the following:
ωf = (∂f/∂xβ) dxβ. In other words, the coefficients ωβ = ∂f/∂xβ.
Now, let's see what happens if we act with ωf on a vector vp in TpM:
ωf[ vα∂/∂xα ] = (∂f/∂xβ) vβ = vp(f).
ωf[ v ] is the directional derivative of f in the direction of v. It gives us exactly the same result we would get if we acted on f with v as a directional derivative.
We give ωf a new name: ωf = df = (∂f/∂xβ) dxβ.
Now, df[ v ] = v(f) is the directional derivative of f in the v-direction.
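A short worked case may help here. The function, point, and vector below are made up for illustration: take f = (x^1)^2 x^2 on R², the point p = (1, 2), and the vector v_p = 3 ∂_1 - ∂_2. Written out with explicit upper indices:

```latex
% Hypothetical example: f = (x^1)^2 x^2, p = (1, 2), v_p = 3 \partial_1 - \partial_2.
\begin{align*}
df &= \frac{\partial f}{\partial x^1}\,dx^1 + \frac{\partial f}{\partial x^2}\,dx^2
    = 2 x^1 x^2\,dx^1 + (x^1)^2\,dx^2, \\
df\big|_p &= 4\,dx^1 + dx^2, \\
df[v_p] &= 4 \cdot 3 + 1 \cdot (-1) = 11, \\
v_p(f) &= 3\,\partial_1 f\big|_p - \partial_2 f\big|_p = 3 \cdot 4 - 1 = 11.
\end{align*}
```

The one-form df eats the vector and the vector eats the function, and both routes give the same number.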
Now, we can see that our notation for "df" connects with our notation for "dxβ ". We have used boldface to distinguish them thus far, but soon that will not be necessary. As a special case, let f be the coordinate function f = xβ. Then,
df = d(xβ) = (∂xβ/∂xα) dxα, where "dxα " is still our dual vector notation.
Now, ∂xβ/∂xα = δαβ is just our friendly Kronecker delta again, which is fairly easy to see, since our coordinates are independent of each other. Therefore, we have the relation
d(xβ) = dxβ. Our notation is consistent.
So, in a fairly unorthodox manner, the notation has led us to a map d from functions into forms. d is known as the exterior derivative.
You might ask, "What about our notion of dx as a small change in x?" Well, one day, you're going to have to throw that picture out of the window, because the notation "dx" does not actually mean a small change in x. "dx" isn't even really a number. It's a linear map from vectors to numbers. It can act on small vectors to produce small numbers, but it isn't a small number in itself; it's not even an element of R. It's an element of Tp*M. So what does it really mean when we see "dx" in an integral? That's a question I may have to put off until later.