Here’s the punchline: the ratio between successive terms of the Fibonacci sequence converges to the golden ratio (\(\varphi\), 1.618…).

Why give away the punchline so early? Because my goal with this article is to give you an intuitive understanding of eigenvectors (and their corresponding eigenvalues) through explaining this phenomenon, and not the other way around. It’s a tall order, and this is a long post, so let’s get started.

Before delving into the why and how, let’s just demonstrate that it’s true.

| \(n\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|---|
| \(n\)th term of Fib | 0 | 1 | 1 | 2 | 3 | 5 | 8 | 13 | 21 |
| \(n\)th term / \((n-1)\)th term | | | 1 | 2 | 1.5 | 1.667 | 1.6 | 1.625 | 1.615 |

A graph might be better at illustrating just how quickly the ratio approaches \(\varphi\).

The fib function

Let’s imagine a function which takes two (consecutive) elements of the Fibonacci sequence, and returns the next two (with one overlapping). It’s a simple function - here it is:

\[\newcommand{\vv}[1]{\overrightarrow{#1}} \newcommand{\vec}[2]{\left[\begin{matrix}#1\\#2\end{matrix}\right]} \newcommand{\fib}[3]{fib(\vec{#1}{#2}) &= \vec{#2}{#3}}\] \[\begin{align*} \fib{x}{y}{x+y} \end{align*}\]

For example:

\[\begin{align*} \fib{0}{1}{1} \\ \fib{1}{1}{2} \\ \fib{1}{2}{3} \\ \fib{2}{3}{5} \\ \fib{3}{5}{8} \\ \end{align*}\]

You get the idea.
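If you’d rather see that in code, here’s a minimal Python sketch of the same step function (the name `fib_step` is mine, not anything standard):

```python
def fib_step(v):
    """One step of the Fibonacci map: (x, y) -> (y, x + y)."""
    x, y = v
    return (y, x + y)

# Reproduce the examples above, starting from (0, 1).
v = (0, 1)
for _ in range(5):
    print(v, "->", fib_step(v))
    v = fib_step(v)
```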

Eigenvectors

Ok, so how does this all relate to eigenvectors? Well, what even is an eigenvector? Google says it’s

a vector which when operated on by a given operator gives a scalar multiple of itself.

That’s a bit abstract (as math often is), so let’s put it in the context of our fib function.

an input vector which, when given to the fib function, produces an output vector which is a scaled version of itself.

Better, but let’s round it out with an example.

Is \(\vec{0}{1}\) an eigenvector of the fib function? \(fib(\vec{0}{1}) = \vec{1}{1}\). Is \(\vec{1}{1}\) just a scaled version of \(\vec{0}{1}\)? Nope! So, \(\vec{0}{1}\) is not an eigenvector of the fib function.

Finding an eigenvector of the fib function with guess and check could take a while, so let’s try to solve for one. Remember, we want to find an input vector \(\vec{x}{y}\) such that \(fib(\vec{x}{y})\) is a scaled version of \(\vec{x}{y}\), i.e. \(fib(\vec{x}{y}) = \alpha \vec{x}{y}\). We also know that \(fib(\vec{x}{y}) = \vec{y}{x + y}\). So, we need to solve \(\alpha \vec{x}{y} = \vec{y}{x + y}\).

\[\begin{align*} \alpha \vec{x}{y} &= \vec{y}{x + y} \\ \alpha x &= y \tag{1} \\ \alpha y &= x + y \\ \alpha^2 x &= x + \alpha x \tag{substitute $\alpha x$ for $y$ from 1} \\ \alpha^2 x - \alpha x - x &= 0 \\ \alpha^2 - \alpha - 1 &= 0 \tag{divide by $x$, assuming $x \neq 0$} \\ \alpha &= \frac{1 \pm \sqrt{5}}{2} \tag{by the quadratic formula}\\ \end{align*}\]

This gives us two values of \(\alpha\); let’s call them:

\[\begin{align*} \alpha_1 &= \frac{1 + \sqrt{5}}{2} \tag{$\varphi$, the golden ratio!} \\ \alpha_2 &= \frac{1 - \sqrt{5}}{2} \tag{$-\varphi + 1$} \end{align*}\]

We wanted to find an eigenvector \([x, y]\). This is easy now that we know \(\alpha\). Let’s arbitrarily choose \(x = 1\) and \(\alpha = \alpha_1\). From \((1)\), we know that \(y = \alpha_1 x\), so \(y = \alpha_1\).

Let’s check our work. We’re claiming that \(\vec{1}{\alpha_1}\) is an eigenvector. \(fib(\vec{1}{\alpha_1}) = \vec{\alpha_1}{1 + \alpha_1}\). Is \(\vec{\alpha_1}{1 + \alpha_1}\) a scaled version of \(\vec{1}{\alpha_1}\)?

The ratio of first elements is easy to compute: \(\frac{\alpha_1}{1} = \alpha_1\).

The ratio of second elements will take a bit of algebra:

\[\begin{align*} \frac{1 + \alpha_1}{\alpha_1} &= \frac{1}{\alpha_1} + 1 \\ &= \frac{2}{1 + \sqrt{5}} + 1 \\ &= \frac{3 + \sqrt{5}}{1 + \sqrt{5}} \\ &= \frac{(3 + \sqrt{5})(1 - \sqrt{5})}{(1 + \sqrt{5})(1 - \sqrt{5})} \\ &= \frac{3 -2 \sqrt{5} - 5}{1 - 5} \\ &= \frac{-2 -2 \sqrt{5}}{-4} \\ &= \frac{1 + \sqrt{5}}{2} \\ &= \alpha_1 \end{align*}\]

So, to summarize, yes! The output vector is a scaled version of the input vector, and it’s scaled by \(\alpha_1\). In other words:

\[fib(\vec{1}{\alpha_1}) = \alpha_1 \vec{1}{\alpha_1}\]

And what do we call the factor by which the vector is scaled (\(\alpha_1\))? That’s the eigenvalue. Each eigenvector has a corresponding eigenvalue.
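If you’d rather check this numerically than with algebra, here’s a quick Python sketch (`fib_step` is redefined here so the snippet stands alone):

```python
def fib_step(v):
    x, y = v
    return (y, x + y)

alpha_1 = (1 + 5 ** 0.5) / 2  # the golden ratio, phi

x, y = fib_step((1.0, alpha_1))
# Both component-wise ratios come out to alpha_1, up to floating-point error.
print(x / 1.0)      # 1.618033988749895
print(y / alpha_1)  # 1.618033988749895
```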

Ok… but why?

Right. Why is this helpful?

Well, for one, it’s really easy to understand what will happen when you call your function on an eigenvector. It will just be multiplied by the eigenvalue.

What happens when you call your function twice on an eigenvector? Still easy, it will be multiplied by the eigenvalue squared.

How about if you call your function 100 times? It will be multiplied by the eigenvalue to the 100th power. QED.

It’s worth stressing that this is a huge shortcut! Not only in computational cost, but in intuitive understanding. If I asked you for the 100th term of the Fibonacci series, if the Fibonacci series started with \(1\) and \(\alpha_1\), you’d probably have no idea off the top of your head. But if you know that \(\vec{1}{\alpha_1}\) is an eigenvector of the fib function with an eigenvalue of \(\alpha_1\), then you can calculate it very quickly: \(fib^{100}(\vec{1}{\alpha_1}) = \alpha_1^{100} \vec{1}{\alpha_1}\). The 100th term is the first element of the output. That’s \(\alpha_1^{100} \approx 1.618^{100} \approx 7.9 \cdot 10^{20}\).
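Here’s that shortcut in code; a sketch comparing 100 applications of fib against a single exponentiation:

```python
def fib_step(v):
    x, y = v
    return (y, x + y)

alpha_1 = (1 + 5 ** 0.5) / 2

# The slow way: apply the function 100 times.
v = (1.0, alpha_1)
for _ in range(100):
    v = fib_step(v)

# The shortcut: raise the eigenvalue to the 100th power.
print(v[0])            # ~7.92e20
print(alpha_1 ** 100)  # ~7.92e20, same up to floating-point error
```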

But what about non-eigenvectors?

Yes - the elephant in the room. You’re almost surely not lucky enough to be handed an eigenvector. Here’s where the linear in linear algebra comes in.

What does it mean for a function to be linear? It means the following two properties hold:

  1. \[f(\vv{u} + \vv{v}) = f(\vv{u}) + f(\vv{v})\]
  2. \[f(\alpha \vv{u}) = \alpha f(\vv{u}) \tag{where $\alpha$ is a scalar}\]

If those properties hold, then we can do something quite magical. We can decompose any input vector into a linear combination of eigenvectors, and then deal with them separately. This may be a bit abstract, but bear with me for a second.

Assume \(\vv{e_1}\) and \(\vv{e_2}\) are two eigenvectors of the linear function \(f\) with eigenvalues \(e_1\) and \(e_2\) respectively. We are trying to understand what happens when \(f\) is applied to an input vector \(\vv{v}\). Let’s say we can find two constants \(a\) and \(b\), such that \(a \vv{e_1} + b \vv{e_2} = \vv{v}\). Then, due to the two properties of linear functions above, we can say the following:

\[\begin{align*} f(\vv{v}) &= f(a \vv{e_1} + b \vv{e_2}) \\ &= f(a \vv{e_1}) + f(b \vv{e_2}) \tag{by property 1} \\ &= a f(\vv{e_1}) + b f(\vv{e_2}) \tag{by property 2} \\ &= a e_1 \vv{e_1} + b e_2 \vv{e_2} \tag{due to the fact that $\vv{e_1}$ and $\vv{e_2}$ are eigenvectors} \\ \end{align*}\]

Wow! So we just turned our function application into something much easier to understand (and compute) - namely scalar multiplication and vector addition. Note that in order to do this, we had to know \(a\) and \(b\) up front, which is not completely trivial.

Let’s push this a bit further. Let’s represent the vector \(\vv{v}\) as a linear combination of \(\vv{e_1}\) and \(\vv{e_2}\) using the following notation: \(\vv{v} = \vec{a}{b}\). Note that this is just shorthand for \(\vv{v} = a \vv{e_1} + b \vv{e_2}\).

Using that notation, we can say \(f(\vv{v}) = \vec{a e_1}{b e_2}\), right? And \(f(f(\vv{v})) = \vec{a e_1^2}{b e_2^2}\). And using my made-up notation for repeated function application: \(f^n(\vv{v}) = \vec{a e_1^n}{b e_2^n}\).

There’s a name for what we’ve just done. It’s called a change of basis. We took our vector \(\vv{v} = \vec{x}{y}\), where \(x\) and \(y\) were coordinates in the standard basis, i.e. \(\vv{v} = \vec{x}{y} = x \vec{1}{0} + y \vec{0}{1}\), and turned it into \(\vv{v} = \vec{a}{b}\) where \(a\) and \(b\) are coordinates in a new basis - the eigenbasis - i.e. \(\vv{v} = \vec{a}{b} = a \vv{e_1} + b \vv{e_2}\). And why did we do it? Because applying the function \(f\) (any number of times) is really easy in the eigenbasis.
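In code, the whole idea fits in one line. Here’s a sketch (the function name is mine): given the input’s coordinates \(a\) and \(b\) in the eigenbasis, applying \(f\) \(n\) times is just two scalar exponentiations.

```python
def apply_n_times(a, b, e1, e2, n):
    """Coordinates of f^n(v) in the eigenbasis, where v = a*e1_vec + b*e2_vec
    and e1, e2 are the corresponding eigenvalues."""
    return (a * e1 ** n, b * e2 ** n)
```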

Back to fib

For the love of god, give me an example! You got it. Let’s go back to our fib function. Exercise for the reader: show that the fib function is linear.

We solved for the two eigenvalues of the fib function above: \(\alpha_1\) and \(\alpha_2\). Now that we know that they are eigenvalues, I’m going to rename them to be \(e_1\) and \(e_2\) to conform to our notation in the last section:

\[\begin{align*} e_1 &= \frac{1 + \sqrt{5}}{2} \approx 1.618 \\ e_2 &= \frac{1 - \sqrt{5}}{2} \approx -0.618 \end{align*}\]

We also solved for an eigenvector that corresponded to the first eigenvalue, which we will now call \(\vv{e_1} = \vec{1}{e_1}\).

We need an eigenvector that corresponds to our second eigenvalue \(e_2\) as well. Here is one: \(\vv{e_2} = \vec{1}{e_2}\). I’ll leave it as an exercise to the reader to check that this is correct.

Now, let’s try to represent our (non-eigenvector) starting input for the real Fibonacci series, namely \(\vec{0}{1}\), in the eigenbasis. That equates to finding an \(a\) and \(b\) such that \(\vec{0}{1} = a \vv{e_1} + b \vv{e_2}\). And, tada, here they are: \(a \approx 0.447\) and \(b = -a \approx -0.447\).

I’m not going to explain how to compute \(a\) and \(b\) because, well, this is going to be long enough already. But if you want to understand how to compute the coordinates of a vector in a new basis, here’s a link.
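That said, if you just want the numbers, numpy can solve the 2×2 linear system for you; a sketch:

```python
import numpy as np

e1 = (1 + np.sqrt(5)) / 2
e2 = (1 - np.sqrt(5)) / 2

# Solve a * [1, e1] + b * [1, e2] = [0, 1]; the eigenvectors are the columns.
E = np.array([[1.0, 1.0],
              [e1,  e2]])
a, b = np.linalg.solve(E, np.array([0.0, 1.0]))
print(a, b)  # 0.4472... -0.4472... (that's 1/sqrt(5) and -1/sqrt(5))
```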

To make it clearer why these values of \(a\) and \(b\) do the trick, let’s write it out in full:

\[\vec{0}{1} = a \vv{e_1} + b \vv{e_2} \approx 0.447 \vec{1}{1.618} - 0.447 \vec{1}{-0.618}\]

And even better? A picture:

And now, we will repeatedly apply the \(fib\) function. Remember that on each application of \(fib\), the \(a\) coordinate gets multiplied by \(e_1 \approx 1.618\) and \(b\) gets multiplied by \(e_2 \approx -0.618\).

| \(n\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| \(a e_1^n\) | 0.447 | 0.724 | 1.171 | 1.894 | 3.065 | 4.96 | 8.025 | 12.985 | 21.01 | 33.994 |
| \(b e_2^n\) | -0.447 | 0.276 | -0.171 | 0.106 | -0.065 | 0.04 | -0.025 | 0.015 | -0.01 | 0.006 |
| \(a e_1^n\vv{e_1} + b e_2^n\vv{e_2}\) | \(\vec{0}{1}\) | \(\vec{1}{1}\) | \(\vec{1}{2}\) | \(\vec{2}{3}\) | \(\vec{3}{5}\) | \(\vec{5}{8}\) | \(\vec{8}{13}\) | \(\vec{13}{21}\) | \(\vec{21}{34}\) | \(\vec{34}{55}\) |
| ratio | | 1.0 | 2.0 | 1.5 | 1.667 | 1.6 | 1.625 | 1.615 | 1.619 | 1.618 |

Notice that the \(a\) coordinate grows exponentially, with a base of \(e_1 = \varphi \approx 1.618\), while the \(b\) coordinate decays exponentially in magnitude (flipping sign each step), since \(|e_2| \approx 0.618 < 1\). As the \(b\) term approaches zero, all you’re left with is some constant (\(a e_1^n\)) times the eigenvector \(\vv{e_1} = \vec{1}{e_1}\). Since the ratio of the second to the first element of \(\vv{e_1}\) is \(\frac{e_1}{1} = e_1 = \varphi\), and the output approaches a constant times \(\vv{e_1}\), it makes sense that the ratio of the output vectors (and therefore of consecutive elements of the Fibonacci sequence) also converges to \(\varphi\).
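The whole table can be reproduced in a few lines of Python; a sketch:

```python
import numpy as np

e1, e2 = (1 + np.sqrt(5)) / 2, (1 - np.sqrt(5)) / 2
a, b = 1 / np.sqrt(5), -1 / np.sqrt(5)  # coordinates of [0, 1] in the eigenbasis

for n in range(10):
    # Reconstruct the output vector from its eigenbasis coordinates;
    # both eigenvectors have 1 as their first element.
    x = a * e1 ** n + b * e2 ** n            # first element
    y = a * e1 ** n * e1 + b * e2 ** n * e2  # second element
    ratio = y / x if n > 0 else float("nan")
    print(n, round(x), round(y), round(ratio, 3))
```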

Don’t eigenvectors have something to do with matrices, though?

Yep! Matrices are the language of linear algebra because linear functions (often called linear maps) can be represented by a matrix. Why is that true? And is it always true? To be honest, I’m not the best person to explain that. Here’s a wiki article on representing linear maps with matrices, although fair warning, wiki articles on math have a tendency to be overly precise and technical at the cost of understandability.

Although I won’t prove the general case, I can show you how this particular linear map (\(fib\)) can be represented as a matrix.

Consider the matrix \(M\):

\[\newcommand{M}{\left[\begin{matrix}0 & 1\\1 & 1\end{matrix}\right]}\] \[M = \M\]

Now, let’s see the effect of multiplying a vector, I dunno… say \(\vec{0}{1}\), by \(M\):

\[M \vec{0}{1} = \M \vec{0}{1} = \vec{1}{1}\]

And if we multiply again?

\[M^2 \vec{0}{1} = \M \M \vec{0}{1} = \vec{1}{2}\]

How about 9 times?

\[M^9 \vec{0}{1} = \vec{34}{55}\]

Multiplication by \(M\) is the \(fib\) function! What are the eigenvectors and eigenvalues of the matrix? You guessed it. The same as the \(fib\) function.
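And numpy agrees; here’s a quick check (note that `np.linalg.eig` returns unit-length eigenvectors, so they’ll be scaled versions of \(\vec{1}{e_1}\) and \(\vec{1}{e_2}\)):

```python
import numpy as np

M = np.array([[0, 1],
              [1, 1]])

# Multiplying by M nine times is fib^9.
print(np.linalg.matrix_power(M, 9) @ np.array([0, 1]))  # [34 55]

vals, vecs = np.linalg.eig(M)
print(vals)  # eigenvalues: -0.618... and 1.618... (order may vary)
```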

Questions to test your understanding:

From above:

  1. Show that the \(fib\) function is linear.

  2. Show that \(\vv{e_2}\) is an eigenvector of the \(fib\) function.

In addition:

  1. Let’s say I have a linear function which stretches vectors in the x-direction. More specifically, it doubles the x values. What are the eigenvectors and eigenvalues of that function?

  2. Let’s say I have a linear function which rotates vectors by 90°. What are the eigenvectors and eigenvalues of that function?

  3. What is a fast way to compute \(M^n\) for large values of \(n\)? How expensive is it if you just repeatedly multiplied by \(M\)? How expensive is the fast version?

  4. Are there any two starting values for the Fibonacci sequence that would not eventually converge to some scaled version of \(\vv{e_1}\)? What would they converge to?

  5. How many eigenvectors does the matrix \(M\) have? How many eigenvalues?