I just completed the very good YouTube playlist Tensors for Beginners by eigenchris, and I want to jot down some notes before I forget everything.

An (n, m)-tensor is a multi-linear function from m vectors and n covectors to a scalar.

A tensor is a “geometrical object” in the same way that a vector is a “geometrical object” (and a vector is a tensor, so it really is in the same way). We often deal with the coordinates of a vector, which assumes a particular basis. But the exact same vector will have different coordinates if we change the basis. So, the vector itself is “invariant” under a change of basis, but the coordinates are not. However, the coordinates change in a predictable way under a change of basis. All the same is true for tensors (again, vectors are tensors).
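To make the “same vector, different coordinates” idea concrete, here’s a minimal numpy sketch (numpy isn’t part of the original material, and the basis and vector are just made-up examples):

```python
import numpy as np

# The "geometrical" vector, written in the standard basis.
v_std = np.array([3.0, 2.0])

# A new basis, whose columns are the new basis vectors (expressed in the old basis).
B = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# Coordinates of the *same* vector in the new basis: solve B @ v_new = v_std.
v_new = np.linalg.solve(B, v_std)
print(v_new)      # [1. 2.] -- different coordinates...
print(B @ v_new)  # [3. 2.] -- ...but the same underlying vector
```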

Covectors are a new “type of thing”. They’re linear functions from a vector to a scalar. One concrete way to think about them is that they’re “row vectors”: if you multiply a row vector by a (column) vector, you get a scalar.
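As a quick sanity check, here’s that picture in numpy (a minimal sketch; the particular numbers are arbitrary):

```python
import numpy as np

c = np.array([[1.0, 2.0, 3.0]])  # a covector, represented as a 1x3 row vector
v = np.array([[4.0],
              [5.0],
              [6.0]])            # a vector, represented as a 3x1 column vector

print(c @ v)  # [[32.]] -- row vector * vector = a scalar: 1*4 + 2*5 + 3*6
```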

Tensor product: So, a covector * vector = scalar. But a vector * covector = matrix. The latter is an example of a tensor product. More generally, a tensor product takes the Cartesian product of the inputs, and for each ordered pair, you multiply the elements. So in the simple case of an n-dimensional vector v and an m-dimensional covector c, the tensor product v ⊗ c has (n x m) components, i.e. it can be represented by an (n x m) matrix! Think about each element of that matrix; the (i, j)th element is the product of the ith element of v and the jth element of c. So, you can see concretely what I mean by “the tensor product takes the Cartesian product of the inputs, and for each ordered pair, you multiply the elements”.
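In numpy this is just the outer product (again, a sketch with arbitrary numbers):

```python
import numpy as np

v = np.array([1.0, 2.0])          # an n-dimensional vector (n = 2)
c = np.array([10.0, 20.0, 30.0])  # an m-dimensional covector (m = 3)

M = np.outer(v, c)                # the tensor product v ⊗ c, an (n x m) matrix
print(M)
# [[10. 20. 30.]
#  [20. 40. 60.]]
print(M[1, 2] == v[1] * c[2])     # True: the (i, j)th element is v[i] * c[j]
```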

Back to “what is a tensor”. A simple (n, m)-tensor can be constructed by the tensor product of n vectors and m covectors. Again, let’s think about a matrix. We just said that a matrix can be constructed via the tensor product of a vector and a covector. So, I guess that means a matrix is a (1, 1)-tensor! So, why did I say “simple” in “A simple (n, m)-tensor …”? Think about the set of matrices you can construct by multiplying a vector v * a row vector c. What’s their rank? Rank 1, of course! Every column is a scaled version of every other column, since all the columns are just scaled versions of v (the jth column is v * c[j]). Same goes for rows; each row is a scaled version of c (the ith row is v[i] * c). A rank 1 matrix is a very boring matrix indeed. If you think about a matrix as a function from vector -> vector (since, when you multiply a matrix by a vector you get a vector), all the output vectors lie on the same line (and that line points in the same direction as v). So, if these are 2-dimensional vectors, the rank 1 matrix maps every 2-dimensional vector onto a line. Slight tangent, but this corresponds to having a zero determinant, having a zero eigenvalue, and being non-invertible.
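A quick numerical check of those claims (a sketch using a 2x2 example so the determinant and eigenvalues make sense):

```python
import numpy as np

v = np.array([1.0, 2.0])
c = np.array([3.0, 4.0])
M = np.outer(v, c)                  # a "simple" (1, 1)-tensor: [[3, 4], [6, 8]]

print(np.linalg.matrix_rank(M))     # 1
print(np.linalg.det(M))             # 0.0 (up to floating-point error)
print(np.linalg.eigvals(M))         # one eigenvalue is (numerically) zero
print(M @ np.array([5.0, 7.0]))     # [43. 86.] -- a multiple of v = [1, 2]
```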

So, are all tensors simple and uninteresting in the same way? No. Tensors form a vector space, meaning that they can be scaled and added to each other, and the output will be another tensor. To create more interesting tensors, you can take linear combinations of simple tensors. Again, let’s make an analogy to something familiar: vectors. Any vector can be thought of as a linear combination of a set of “basis vectors” (and that’s how we get the vector’s coordinates). In 2-d space, using the standard basis, the two basis vectors are [1,0] and [0,1]. Every other vector is a linear combination of those two “simple” vectors. Tensors work the same way. In fact, if you start with an n-dimensional vector space (with n basis vectors) and an m-dimensional covector space (with m basis covectors), you can construct (n x m) basis (1, 1)-tensors by taking the tensor product of each of the n basis vectors with each of the m basis covectors.

To make that more concrete, let’s say n = 2 and m = 3 and let’s use the standard basis. You can construct the following 6 basis (1, 1)-tensors:

\[\newcommand{\vec}[2]{\left[\begin{matrix}#1\\#2\end{matrix}\right]} \newcommand{\covec}[3]{\left[\begin{matrix}#1 & #2 & #3\end{matrix}\right]} \newcommand{\mat}[6]{\left[\begin{matrix}#1 & #3 & #5 \\ #2 & #4 & #6\end{matrix}\right]} \newcommand{\VS}{V^*} \newcommand{\reals}{\mathbb{R}} \vec{1}{0} \otimes \covec{1}{0}{0} = \mat{1}{0}{0}{0}{0}{0} \\ \vec{1}{0} \otimes \covec{0}{1}{0} = \mat{0}{0}{1}{0}{0}{0} \\ \vec{1}{0} \otimes \covec{0}{0}{1} = \mat{0}{0}{0}{0}{1}{0} \\ \vec{0}{1} \otimes \covec{1}{0}{0} = \mat{0}{1}{0}{0}{0}{0} \\ \vec{0}{1} \otimes \covec{0}{1}{0} = \mat{0}{0}{0}{1}{0}{0} \\ \vec{0}{1} \otimes \covec{0}{0}{1} = \mat{0}{0}{0}{0}{0}{1} \\\]

Now it’s easy to see how those 6 “simple” (1, 1)-tensors form a basis for the space of (2 x 3) (1, 1)-tensors: every such tensor is a linear combination of them. Another thing that this example makes clear is that (1, 1) does not describe the dimensions of the matrix; it describes the number of vectors and covectors that were combined (via the tensor product) to create the tensor. What is the dimension of the space this (1, 1)-tensor lives in? In this case it’s 2 x 3 = 6, but more generally, if we take \(dim(x)\) to be the dimension of \(x\), an (n, m)-tensor lives in a space of dimension \(dim(v_1) dim(v_2) \cdots dim(v_n) dim(c_1) dim(c_2) \cdots dim(c_m)\). These things can get big, fast!
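Here’s the same idea as a numpy sketch: build the 6 basis tensors and check that an arbitrary 2x3 matrix is a linear combination of them, with the matrix’s entries as the coefficients.

```python
import numpy as np

n, m = 2, 3
basis_vectors = np.eye(n)    # standard basis for the 2-d vector space
basis_covectors = np.eye(m)  # standard basis for the 3-d covector space

# The 6 "simple" basis (1, 1)-tensors: e_i ⊗ eps_j.
basis_tensors = [np.outer(e, eps) for e in basis_vectors for eps in basis_covectors]

# An arbitrary (1, 1)-tensor, i.e. any 2x3 matrix...
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# ...is a linear combination of the basis tensors, with coefficients A[i, j].
recombined = sum(A[i, j] * basis_tensors[i * m + j] for i in range(n) for j in range(m))
print(np.allclose(A, recombined))  # True
```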

So what about these linear functions? I started the post by saying “an (n, m)-tensor is a multi-linear function from m vectors and n covectors to a scalar”, and yet we’ve barely mentioned functions at all. Well, remember when I said that covectors were functions from a vector to a scalar? We were on to something there.

Let’s denote the vector space of vectors as \(V\). Let’s denote the vector space of covectors (called the dual vector space) with the symbol \(\VS\). Another way to write this would be \(V \rightarrow \reals\), since covectors are functions from a vector to a scalar (in my examples, I’ll use the reals as the scalars, but they could come from any field, e.g. the rationals, the algebraic numbers, the reals, the complex numbers, etc.). So, what do we get when we take the tensor product of a vector and a covector? We already know this: a matrix, i.e. a (1, 1)-tensor. But what is a matrix? As I mentioned above, you can think about a matrix as a (linear) function from vectors to vectors, i.e. \(V \rightarrow V\). What if we rewrote that as \(V \rightarrow (\VS \rightarrow \reals)\)? Kind of weird at first, but if we can think about a covector as a function from a vector to a scalar, can’t we similarly think about a vector as a function from a covector to a scalar? In other words, a covector * vector is a scalar. If we have one argument (either the covector or the vector), then we can treat that argument as fixed, and we’re left with a function from the other argument to a scalar. So, to summarize: \((V \times \VS) \rightarrow \reals\), \(V \rightarrow V\), \(V \rightarrow (\VS \rightarrow \reals)\), and \(\VS \rightarrow (V \rightarrow \reals)\) are all ways of saying the same thing.

What do those statements mean in the familiar context of a matrix?

  • \((V \times \VS) \rightarrow \reals\) is saying a matrix is: A function from a vector and a row vector to a scalar. Well, a row vector * a matrix * a vector = a scalar, so yeah, that checks out.
  • \(V \rightarrow V\) is saying a matrix is: A function from a vector to a vector. Yes, a matrix * a vector = a vector.
  • \(V \rightarrow (\VS \rightarrow \reals)\) is saying a matrix is: A function from a vector to (a function from a row vector to a scalar). A little weird, but ok, since a matrix * a vector = a vector, and vectors are functions from row vectors to scalars.
  • \(\VS \rightarrow (V \rightarrow \reals)\) is saying a matrix is: A function from a row vector to (a function from a vector to a scalar). Huh, this one is a little new. What’s a (1 x n) row vector * an (n x m) matrix? Well, it’s a (1 x m) row vector. And what’s a (1 x m) row vector? We can think of it like a function from an (m x 1) vector to a scalar. Ok, checks out!

So, our (1, 1)-tensor is like a function from a vector and a covector to a scalar, i.e. \((V \times \VS) \rightarrow \reals\). Furthermore, that function can be “partially applied”, i.e. if you pass in just the vector, you get a function from a covector to a scalar: \(V \rightarrow (\VS \rightarrow \reals)\). Likewise, if you pass just the covector, you get a function from a vector to a scalar: \(\VS \rightarrow (V \rightarrow \reals)\).
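Here’s a small numpy sketch of the “partial application” idea, treating a matrix M as a machine that wants a row vector c and a vector v (the numbers are arbitrary):

```python
import numpy as np

M = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])  # a (1, 1)-tensor, i.e. a 2x3 matrix
c = np.array([7.0, 8.0])         # a covector (row vector), dimension 2
v = np.array([1.0, 0.0, 2.0])    # a vector, dimension 3

# (V x V*) -> R: feed in both arguments, get a scalar.
print(c @ M @ v)    # 177.0

# V -> V: feed in just the vector, get a vector.
print(M @ v)        # [ 7. 16.]

# V* -> (V -> R): feed in just the covector, get a row vector,
# which is itself a function from vectors to scalars.
print(c @ M)        # [39. 54. 69.]
print((c @ M) @ v)  # 177.0 again
```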

I think we’re ready to level up from (1, 1)-tensors. What about a (2, 1)-tensor? A (2, 1)-tensor is a (multilinear) function from 2 covectors and 1 vector to a scalar: \((V \times \VS \times \VS) \rightarrow \reals\). If you provide one covector, you’re left with a (1, 1)-tensor, i.e. \(\VS \rightarrow ((V \times \VS) \rightarrow \reals)\). So, with this recursive viewpoint, we can build up an understanding of an (n, m)-tensor. An (n, m)-tensor is a function from n covectors and m vectors to a scalar, i.e. \((\VS_1 \times \VS_2 \times \cdots \times \VS_n \times V_1 \times V_2 \times \cdots \times V_m) \rightarrow \reals\) (where each \(\VS_i\) is a copy of \(\VS\) and each \(V_j\) is a copy of \(V\)).
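Finally, a sketch for the (2, 1)-tensor case, using a 3-index numpy array and np.einsum. The entries are random, and the index convention (which slot eats which kind of argument) is just a choice I’m making for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((2, 2, 2))  # a (2, 1)-tensor on a 2-d space: wants 2 covectors and 1 vector
a = rng.standard_normal(2)          # a covector
b = rng.standard_normal(2)          # another covector
v = rng.standard_normal(2)          # a vector

# Feed in everything (2 covectors and 1 vector) -> a scalar.
scalar = np.einsum('ijk,i,j,k->', T, a, b, v)

# Feed in just one covector -> what's left is a (1, 1)-tensor, i.e. a matrix.
M = np.einsum('ijk,i->jk', T, a)
print(M.shape)                        # (2, 2)
print(np.isclose(scalar, b @ M @ v))  # True: the leftover matrix still wants one covector and one vector
```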