Try your hand at this problem:

Let’s say there are only three families of operating systems in the world: Windows, Linux, and Mac. Let’s also say that, every year:

  • 20% of Windows users switch to Mac, and 1% switch to Linux.
  • 3% of Mac users switch to Windows, and 10% switch to Linux.
  • 2% of Linux users switch to Windows, and 5% switch to Mac.

In the far future, which OS will have the most users? And what percentage of users will they have?

As always, you’ll learn a lot more if you actively participate so if you want to give it a shot, now would be the time.

Given this title of this post, this may not come as a shock, but this is a perfect problem for linear algebra. However, like in the last post, let’s not jump right into matrices because I want to connection to plain ol’ algebra to be clear.

Plain ol’ Algebra

So, let’s set off to solve this world problem with normal algebra. We have three variables we need to keep track of:

  • the number of windows users (as a percentage of total users) - let’s call that \(W\)
  • the number of mac users (as a percentage) - let’s call that \(M\)
  • the number of linux users (as a percentage) - \(L\)

Now let’s convert the yearly user transition information into a system of three equations. I’m going to use a subscript \(X_n\) to represent the value of the variable \(X\) in year \(n\).

\[\begin{align*} W_{n+1} &= 0.79 W_n + 0.03 M_n + 0.02 L_n \\ M_{n+1} &= 0.20 W_n + 0.87 M_n + 0.05 L_n \\ L_{n+1} &= 0.01 W_n + 0.10 M_n + 0.93 L_n \\ \end{align*}\]

So far, all we’ve done is rote translation of our word problem into a set of equations. Now, we need to make a conceptual leap. Let’s assume there exist some values of \(W\), \(M\), and \(L\) for which \(W_{n+1} = W_n\), \(M_{n+1} = M_n\), and \(L_{n+1} = L_n\). If that were true, then we could say that \(W_{n+2} = W_{n+1}\) (to see why, replace \(X_n\) with \(X_{n+1}\) in the system of equations above). In fact, we could say that \(W_x = W_n\) for any \(x\). In other words, the values of \(W\), \(M\), and \(L\) would not change from year to year. Let’s call these hypothetical values of \(W\), \(M\), and \(L\): \(W_s\), \(M_s\), and \(L_s\), respectively (\(s\) for steady state). Now, we can re-write our system of equations as follows:

\[\begin{align*} W_s &= 0.79 W_s + 0.03 M_s + 0.02 L_s \tag{1} \\ M_s &= 0.20 W_s + 0.87 M_s + 0.05 L_s \tag{2} \\ L_s &= 0.01 W_s + 0.10 M_s + 0.93 L_s \tag{3} \\ \end{align*}\]

Three equations, three unknowns… you know the drill! Feel free to skip this next section of pure algebra. There is nothing interesting or fancy going on here.

\[\begin{align*} W_s &= 0.79 W_s + 0.03 M_s + 0.02 L_s \tag{1} \\ 0.21 W_s &= 0.03 M_s + 0.02 L_s \\ W_s &= \frac{0.03 M_s + 0.02 L_s}{0.21} \\ M_s &= 0.20 \frac{0.03 M_s + 0.02 L_s}{0.21} + 0.87 M_s + 0.05 L_s \tag{sub $W_s$ into 2} \\ M_s &\approx 0.8986 M_s + 0.069 L_s \\ 0.1014 M_s &\approx 0.069 L_s \\ M_s &\approx 0.512 L_s \\ W_s &\approx \frac{0.03 0.512 L_s + 0.02 L_s}{0.21} \\ W_s &\approx 0.1925 L_s \\ L_s &= 0.01(0.1925 L_s) + 0.1(0.512 L_s) + L_s \tag{sub $W_s$ and $M_s$ into 3} \\ L_s &= 1 L_s \\ \end{align*}\]

Huh… well that’s not super helpful. Oh yea, since they represent fractions of the total user base, they all need to add up to one!

\[\begin{align*} W_s + M_s + L_s &= 1 \\ 0.1925 L_s + 0.512 L_s + L_s &\approx 1 \\ 0.1925 L_s + 0.512 L_s + L_s &\approx 1 \\ W_s &\approx 0.1028 \\ M_s &\approx 0.3634 \\ L_s &\approx 0.5338 \\ \end{align*}\]

QED! So, the distribution of users that is stable, i.e. will not change over time, is ~54% Linux, ~36% Mac, and ~10% Windows. We haven’t really proven that we’ll actually ever get to that distribution (we won’t, but we’ll approach it), just that if we get there, we will stay there. But, I’m a bit antsy to get to the linear algebra section, given that’s the whole point of this post, so let’s move on.

Linear algebra

Now that we know how to solve the problem without linear algebra, let’s set off to solve it with linear algebra.

First things first, let’s translate our system of three equations into a matrix, since that’s one way to think about what matrices are:

\[\newcommand{T}{\left[\begin{matrix}0.79 & 0.03 & 0.02\\0.2 & 0.87 & 0.05\\0.01 & 0.1 & 0.93\end{matrix}\right]} \newcommand{\vec}[3]{\left[\begin{matrix}#1\\#2\\#3\end{matrix}\right]} \newcommand{\vv}[1]{\overrightarrow{#1}}\] \[\T \vec{W_n}{M_n}{L_n} = \vec{W_{n+1}}{M_{n+1}}{L_{n+1}}\]

As as aside, this matrix is called a transition matrix, since it represents the transition from one year to the next.

Right off the bat, this formulation (along with a numerical library like python’s numpy or sympy, which can exponentiate matrices) gives us the ability to quickly come up with a good guess of the answer. Let’s just say that, right now, everybody is a Window’s user. What will the distribution of users be in 1000 years?

\[\T^{1000} \vec{1}{0}{0} \approx \vec{0.10275689}{0.36340852}{0.53383459}\]

Look familiar? Yep, that’s basically the same solution that we came up with in the previous section (after a lot of annoying algebra). So, already linear algebra has proven useful.

However, I promised you eigenvectors, so let’s deliver on that. Note that I’m assuming that you’ve already read my previous post. If you haven’t, this will probably go much too quickly.

Unlike the last post, I’m not going to go over how to compute all the eigenvectors by hand (since the idea is the same). Instead, I’ll just use python’s sympy library to compute them for me and I’ll write the three (rounded) eigenvectors - along with their corresponding eigenvalues - below:

\[\left [ \left ( 1.0, \quad \left[\begin{matrix}0.1925\\0.6808\\1.0\end{matrix}\right]\right ), \quad \left ( 0.7489, \quad \left[\begin{matrix}0.9011\\-1.9011\\1.0\end{matrix}\right]\right ), \quad \left ( 0.8411, \quad \left[\begin{matrix}-0.1233\\-0.8767\\1.0\end{matrix}\right]\right ) \right ]\]

The first thing to notice is that the eigenvalues of the second and third eigenvectors are less than one, which - like in the fibonacci example - means that any amount of those vectors will decay to zero over time. The eigenvalue of the first eigenvector is 1, which means it will neither decay nor grow over time. It will stay exactly the same as you continually multiply by the matrix. So, with this information, we can essentially predict what will happen for any starting distribution of users.

Take an initial vector \(\vv{v}\) that represents the starting distribution of users across the three families of operating systems. Decompose \(\vv{v}\) into a linear combination of the three eigenvectors, with coefficients \(a\), \(b\), and \(c\), respectively. Over time, \(b\) and \(c\) will decay to zero as they are multiplied over an over by an eigenvalue less than one. \(a\) will stay unchanged (since the eigenvalue corresponding to the first eigenvector is one). So, eventually, you just end up with \(a\) times the first eigenvector (which we will call \(\vv{e_1}\)).

But what is \(a\)? Well, there are a few ways to answer this, but one clever way is to notice that our distribution of users must always sum to one (since… that’s what distributions do). And we know that eventually the distribution will be \(a \vv{e_1}\). So, to ensure that the elements of that long-term distribution sum to one, we can set \(a = 1 / sum(\vv{e_1})\). The actual elements of \(e_1\) are (approximately) 0.1925, 0.6808, and 1.0, so \(a \approx 0.5338\).

And with that, we have our solution:

\[0.5338 \left[\begin{matrix}0.1925\\0.6808\\1.0\end{matrix}\right] \approx \vec{0.10275689}{0.36340852}{0.53383459}\]

And (again), we have our answer! By inspecting the eigenvectors and eigenvalues of the transition matrix, we can quickly understand the long term dynamics of our system.

One last thing to point out. You might be wondering why python’s sympy said that the first eigenvector was \(\approx [0.1925, 0.6808, 1]\) instead of \(\approx [0.1, 0.36, 0.54]\), which was the solution that made sense in our context. Given what’s special about eigenvectors is that they don’t change direction, and they normally do change length (unless it’s a very special case, like in this example, where the corresponding eigenvalue is 1), the length isn’t really important. So, many numerical libraries will just normalize the length of the eigenvector to be one. sympy seems to just set the value of the third element to be one and scales the other elements accordingly. The point is, it doesn’t really matter.