Jekyll2020-02-11T00:39:20+00:00http://blog.russelldmatt.com/feed.xmlBlogDifferent infinities, and why it matters2020-01-21T00:00:00+00:002020-01-21T00:00:00+00:00http://blog.russelldmatt.com/2020/01/21/different-infinities<style> #sketch { max-width: 100%; width: 410px; height: 300px; display: block; margin: 30px auto 30px; } </style> <script src="/assets/js/p5/0.8.0/p5.js"></script> <script src=" /assets/by-post/different-infinities/sketch.js"></script> <div id="sketch"> </div> <style> table { max-width: 600px; } </style> <script type="math/tex; mode=display">\newcommand{\N}{\mathbb{N}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\R}{\mathbb{R}}</script> <p>You may already know the punchline of Georg Cantor’s work on infinity, which is that there are <em>different sizes of infinity</em>. I’ve also “known” this for a while, but it was one of the many mathematical curiosities that I could recite, but that I didn’t really understand at any deep level. Recently, while learning about Gödel’s theorems and computability, I ran head first into a practical consequence of this result that I’d like to discuss in this post.</p> <h3 id="what-does-it-mean-for-two-infinite-sets-to-have-different-sizes">What does it mean for two infinite sets to have different sizes?</h3> <p>For the record, this has always been a very counterintuitive idea to me. Growing up, I heard your typical grade-school examples about infinity: <em>infinity + 1 = infinity</em>, <em>infinity * 2 = infinity</em>, or even <em>infinity * infinity = infinity</em>. From these examples, I drew the natural conclusion that infinity was a sort of black hole from which you cannot escape. Once something was infinite, it didn’t really matter what you did to it, it just stayed infinite. A perhaps less well-founded extrapolation was that there was only one infinity. 
Given its black-hole-like nature, it seemed impossible to distinguish between two things that were both infinity - so maybe they were both the same size.</p> <p>So it struck me as very odd when I learned that, in fact, there are different sizes of infinities, with the typical examples being “countable” vs. “uncountable”. What does that even mean?</p> <p>Here is a rigorous definition of what it means for two sets (infinite or not) to be the same size:</p> <blockquote> <p>Two sets are the same size if there is an invertible function that maps between them.</p> </blockquote> <p>With that in mind, let’s work through our grade-school examples. In each case, I will pick two sets whose sizes correspond to the number on each side of the equation. For example, if the right hand side of one equation is <em>infinity</em>, I might pick the set of all natural numbers, <script type="math/tex">\N</script>, to represent that number. Then, we will check whether the sets representing each side of the equation are the same size (according to the stated definition above). If they are, then we can argue that the equals sign in the equation is justified.</p> <ul> <li><em>infinity + 1 = infinity</em> <ul> <li>Set for left hand side (<em>infinity + 1</em>): the set of all natural numbers with the addition of the single negative number (-1) (i.e. 
<script type="math/tex">\{-1\} \cup \N</script>).</li> <li>Set for right hand side (<em>infinity</em>): the set of all natural numbers, <script type="math/tex">\N</script>.</li> <li>Invertible function that maps elements of the left set to elements of the right set: <script type="math/tex">f(x) = x + 1</script>.</li> <li>So, indeed, those sets are the same size.</li> </ul> </li> <li><em>infinity * 2 = infinity</em> <ul> <li>Set for LHS (<em>infinity * 2</em>): The set of all natural numbers, <script type="math/tex">\N</script>.</li> <li>Set for RHS (<em>infinity</em>): The set of all even natural numbers.</li> <li>Invertible function that maps each natural number to a unique even number: <script type="math/tex">f(x) = 2x</script>.</li> <li>So, again, according to our definition of what it means for two sets to have the same size, the set of even natural numbers and the set of all natural numbers have the same size.</li> </ul> </li> <li><em>infinity * infinity = infinity</em>: <ul> <li>Why am I saying that the set of all rational numbers has a size of <em>infinity * infinity</em>? Many of you will have seen this before, but look at the table below that shows one way of arranging all rational numbers into a 2-D grid. Each side of the grid has all the natural numbers (except for 0 in the denominator), so the total number of squares in the grid (and therefore rational numbers) is <em>infinity * infinity</em>.</li> <li>Set for LHS (<em>infinity * infinity</em>): The set of all rational numbers (fractions), <script type="math/tex">\Q</script>.</li> <li>Set for RHS (<em>infinity</em>): The set of all natural numbers, <script type="math/tex">\N</script>.</li> <li>Invertible function that maps each rational number to a unique natural number: It turns out, this function exists, but it’s a bit more complicated than the last two. The trick is to iterate through the 2-D grid in a zig-zag fashion as opposed to the more intuitive way of counting row by row. 
I recognize that’s not nearly enough to intuit the solution if you haven’t already seen it, but rather than reproduce it in full here, I will refer you to <a href="/assets/by-post/different-infinities/recounting.pdf">this short paper</a>.</li> </ul> </li> </ul> <div style="display:flex; justify-content:center; margin: 20px 0px;"> <table> <thead> <tr> <th>Denominator ↓ \ Numerator →</th> <th>0</th> <th>1</th> <th>2</th> <th>3</th> <th>…</th> </tr> </thead> <tbody> <tr> <td>1</td> <td>0/1</td> <td>1/1</td> <td>2/1</td> <td>3/1</td> <td>…</td> </tr> <tr> <td>2</td> <td>0/2</td> <td>1/2</td> <td>2/2</td> <td>3/2</td> <td>…</td> </tr> <tr> <td>3</td> <td>0/3</td> <td>1/3</td> <td>2/3</td> <td>3/3</td> <td>…</td> </tr> <tr> <td>…</td> <td>…</td> <td>…</td> <td>…</td> <td>…</td> <td>…</td> </tr> </tbody> </table> </div> <div class="aside"> <p>One way of arranging all rational numbers in a 2-D grid.</p> </div> <p>From the three examples above, you can see how we can use our definition (of what it means for two sets to have the same size) to answer the question of whether two particular sets are the same size. It may not be easy to find the invertible function between the two sets, as in the case of the rational numbers, but at least the criterion is clear.</p> <p>Before moving on, we should point out at least one example of two infinite sets that are <em>not</em> the same size, and I will use the classic example of the real numbers <script type="math/tex">\R</script> and the natural numbers <script type="math/tex">\N</script>. Instead of actually proving that you <em>cannot</em> find an invertible mapping between these two sets (which is not an easy thing to prove), I will suggest two things:</p> <ol> <li> <p>Try it! 
Try to construct an invertible function that takes a real number and produces a natural number (or vice versa) and you will get an intuitive sense for why it’s impossible.</p> </li> <li> <p>Read <a href="https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument">Cantor’s diagonal argument</a> for a proof of why it cannot be done.</p> </li> </ol> <p>To sum up, two sets are the same size if it’s possible to construct an invertible function that maps between them. A set being <em>countably infinite</em> means that it is the same size as the natural numbers (e.g. <script type="math/tex">\Q</script>). A set being <em>uncountably infinite</em> means that it is bigger (e.g. <script type="math/tex">\R</script>). Uncountably infinite is a blanket term that encompasses infinitely many different sizes.</p> <h3 id="why-does-this-matter-in-real-life">Why does this matter in real life?</h3> <p>I’m sure there are many good answers to this question, but I’m going to use the one that I stumbled across while learning about Gödel’s theorems and computability.</p> <p>How do computers represent things? With sequences of bits. In other words, computers represent things in binary.</p> <p>This is <em>almost</em> too obvious to say, but there’s an invertible mapping between sequences of bits and the natural numbers - just consider the bits to be the base-2 representation of the number (that’s basically what <em>binary</em> means).</p> <p>So, if an arbitrary set <script type="math/tex">X</script> is countably infinite, then - by definition - there’s an invertible mapping between <script type="math/tex">X</script> and <script type="math/tex">\N</script>. 
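The zig-zag walk through the rationals mentioned earlier is easy to sketch in code. This is just one possible implementation (not the construction from the linked paper): it walks the numerator/denominator grid by anti-diagonals, skips unreduced duplicates like 2/2, and thereby assigns each non-negative rational a unique natural number - its position in the stream.

```python
from fractions import Fraction
from itertools import count, islice
from math import gcd

def rationals():
    """Enumerate the non-negative rationals by walking the grid's
    anti-diagonals (the zig-zag), skipping unreduced duplicates like
    2/2 and 2/4, so each rational appears exactly once."""
    yield Fraction(0)
    for s in count(2):               # s = numerator + denominator
        for num in range(1, s):
            den = s - num
            if gcd(num, den) == 1:   # already in lowest terms?
                yield Fraction(num, den)

print(list(islice(rationals(), 8)))  # 0, 1, 1/2, 2, 1/3, 3, 1/4, 2/3
```

Each rational shows up at exactly one index, and every index holds exactly one rational - which is precisely the invertible mapping the definition asks for.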
If you have that mapping, then it’s trivial to construct a mapping between <script type="math/tex">X</script> and <em>sequences of bits</em>, the practical consequence of which is that you can represent elements of the set <script type="math/tex">X</script> on a computer!</p> <p>So, what goes wrong when the set in question is uncountably infinite, like <script type="math/tex">\R</script>? Well, you cannot come up with a unique number - and therefore a unique sequence of bits - for each real number. At some point, multiple real numbers will need to share the same representation on the computer. And this leads to well-known issues such as:</p> <div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;&gt;&gt;</span> <span class="mf">0.1</span> <span class="o">+</span> <span class="mf">0.2</span> <span class="mf">0.30000000000000004</span> <span class="o">&gt;&gt;&gt;</span> <span class="mf">0.1</span> <span class="o">+</span> <span class="mf">0.2</span> <span class="o">-</span> <span class="mf">0.3</span> <span class="o">==</span> <span class="mi">0</span> <span class="bp">False</span> </code></pre></div></div> <div class="aside"> <p>The code snippet above was produced using python, but most programming languages will suffer the same fate. And if they don’t for this example, then they will for some other example.</p> </div> <p>In fact, infinitely many real numbers will end up sharing the same representation. For example, you can pick some countably infinite set of real numbers to represent perfectly, but that will leave an uncountably infinite number of real numbers “left over”. 
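You can see this collapse directly in Python (a quick sketch; the printed digits are standard IEEE-754 double-precision behavior):

```python
from decimal import Decimal

# 2^53 + 1 is the first integer a 64-bit float cannot represent,
# so two different real numbers collapse onto one float:
print(2.0**53 == 2.0**53 + 1)  # True

# Even the literal 0.1 is stored as a nearby representable real number:
print(Decimal(0.1))  # 0.1000000000000000055511151231257827021181583404541015625
```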
In practice, we squash an uncountably infinite number of real numbers into every unique floating point number and hope the loss of precision isn’t too important.</p> <h3 id="are-real-numbers-forever-doomed-to-be-misrepresented-on-computers">Are real numbers forever doomed to be misrepresented on computers?</h3> <p>This is <em>highly</em> speculative (in the sense that I’m not sure I know what I’m talking about), but I think that the way quantum computers represent things, it may be possible for those computers to accurately represent real numbers. Of course, even if true, that only buys us one additional level of infinity over classical computers. Any sets that are “bigger” than <script type="math/tex">\R</script> will fail to be represented even on quantum computers.</p> <p>Why do I think that quantum computers can represent real numbers? Well, as opposed to a classical bit, which is either 0 or 1, a qubit (a bit on a quantum computer) is in some superposition of 0 and 1. One way to think about that is that there is some probability <script type="math/tex">p</script> that, when measured, the qubit will be a 0 and probability <script type="math/tex">(1-p)</script> that it will be a 1. So, in some sense, I need a real number, <script type="math/tex">p</script>, for each qubit to represent the state of the computer at any given time. In fact, you need more than that due to entanglement (correlation) between qubits, but that just strengthens my argument.</p>Introducing the stats series2020-01-01T00:00:00+00:002020-01-01T00:00:00+00:00http://blog.russelldmatt.com/2020/01/01/introducing-stats<p>This is a short post just to introduce the upcoming <strong>stats</strong> series. A series is just a set of related blog posts that will likely build on each other.</p> <p>The goal of this series is to build a strong foundation for understanding some very basic statistical methods and tests (e.g. 
linear regression, t-test, etc.).</p> <p>I will try to keep the following list up to date. In addition, I will tag all posts in this series with both the <code class="language-plaintext highlighter-rouge">stats</code> tag as well as the <code class="language-plaintext highlighter-rouge">series</code> tag.</p> <ul> <li> <a href="/2020/01/01/remember-linear-regression.html">How to remember linear regression</a> <span class="post-meta">Jan 1, 2020</span> </li> <li> <a href="/2020/01/01/introducing-stats.html">Introducing the stats series</a> <span class="post-meta">Jan 1, 2020</span> </li> </ul>How to remember linear regression2020-01-01T00:00:00+00:002020-01-01T00:00:00+00:00http://blog.russelldmatt.com/2020/01/01/remember-linear-regression<p>Linear regression is one of the most useful tools in statistics, but the formula is a little hard to remember. If you’re trying to find the “best fit” <script type="math/tex">x</script> in the equation <script type="math/tex">Ax \approx b</script>, here is the solution:</p> <script type="math/tex; mode=display">x = (A^T A)^{-1} A^T b</script> <p>If you’re expecting me to be able to produce that formula from memory… don’t hold your breath.</p> <p>However, if you understand what a linear regression <em>is</em>, then re-deriving this formula is actually shockingly easy. And, importantly, remembering “what a linear regression is” is much easier than remembering some (relatively) complicated formula. I suspect this is often true - that remembering how to derive a formula from simple ideas is easier than remembering the formula itself.</p> <div class="aside"> <p>An aside: I often struggle with finding the appropriate level at which to target my explanations. Ideally, I’d like to assume no previous knowledge and explain things from scratch, but then the posts become so long as to be useless. 
But if I explain things too tersely, the only people following along are the people who already understand the explanation! So, in this case, I’m going to write two versions: a short version and a long one.</p> </div> <h3 id="what-is-a-linear-regression-the-short-version">What is a linear regression: the short version</h3> <p>With real life (noisy) data, there won’t be an exact solution to the equation <script type="math/tex">Ax = b</script>. Put another way, <script type="math/tex">b</script> does not live in the column space of <script type="math/tex">A</script>. We need to find a vector which does live in the column space of <script type="math/tex">A</script> and which minimizes the squared errors between itself and <script type="math/tex">b</script>. Let’s call this vector <script type="math/tex">b^*</script>.</p> <p>What are the errors between <script type="math/tex">b^*</script> and <script type="math/tex">b</script>? Simply <script type="math/tex">b - b^*</script>, which is itself another vector. Let’s call this <script type="math/tex">\epsilon</script> (for errors). We don’t care about the errors themselves as much as the sum of the squared errors, which is just the squared length of <script type="math/tex">\epsilon</script>.</p> <p>To recap, we want to find the <script type="math/tex">b^*</script> that lives in the column space of <script type="math/tex">A</script> and which minimizes the length of <script type="math/tex">\epsilon</script>. Note that our three vectors form a triangle, i.e. <script type="math/tex">b^* + \epsilon = b</script>. At this point, the solution might become clear. 
If we take <script type="math/tex">b^*</script> to be the projection of <script type="math/tex">b</script> onto the column space of <script type="math/tex">A</script>, then <script type="math/tex">b^*</script> and <script type="math/tex">\epsilon</script> will form a <em>right</em> triangle with hypotenuse <script type="math/tex">b</script>, and that will minimize the length of <script type="math/tex">\epsilon</script>.</p> <p>If that’s not obvious, consider a line <script type="math/tex">A</script> and a point <script type="math/tex">b</script> which is not already on the line. What’s the minimum distance from <script type="math/tex">b</script> to <script type="math/tex">A</script>? It’s the length of the segment which connects <script type="math/tex">b</script> to <script type="math/tex">A</script> and which is perpendicular to <script type="math/tex">A</script>, i.e. it’s the segment between <script type="math/tex">b</script> and <script type="math/tex">b</script>’s projection onto <script type="math/tex">A</script>.</p> <p>That’s all you have to remember in order to derive the formula for linear regression.</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} Ax &= b^* \tag{1} \\ A \bot (b - b^*) \tag{2} \\ A \bot (b - Ax) \tag{substitute 1 into 2} \\ A^T (b - Ax) &= 0 \tag{the definition of perpendicular} \\ A^T b - A^TAx &= 0 \\ A^T b &= A^TAx \\ (A^TA)^{-1} A^T b &= x \tag*{$\square$} \\ \end{align*} %]]></script> <h3 id="what-is-a-linear-regression-the-long-version">What is a linear regression: the long version</h3> <p>In general, a linear regression is trying to find coefficients to a linear equation that minimize the sum of the squared errors. 
For example, let’s say you think there’s roughly a linear relationship between the square footage of a house (sqft), the median price of all houses in that house’s neighborhood (medprice), and the price of the house (price), i.e.</p> <script type="math/tex; mode=display">\mathrm{price} = c_2 (\mathrm{sqft}) + c_1 (\mathrm{medprice}) + c_0</script> <p>Furthermore, you have some data. For each data point (house), you have the three relevant values (sqft, medprice, and price). We can organize this data into a single equation, <script type="math/tex">Ax=b</script>, using matrices where:</p> <script type="math/tex; mode=display">% <![CDATA[ \overset{A}{ \begin{bmatrix} \vert & \vert & \vert \\ \mathrm{sqft} & \mathrm{medprice} & 1 \\ \vert & \vert & \vert \\ \end{bmatrix} } \overset{x}{ \begin{bmatrix} c_2 \\ c_1 \\ c_0 \end{bmatrix} } = \overset{b}{ \begin{bmatrix} \vert \\ \mathrm{price} \\ \vert \\ \end{bmatrix} } %]]></script> <p>Each row in <script type="math/tex">A</script> will contain the two predictor values (sqft and medprice) for a given home, along with a constant 1 (to account for the <script type="math/tex">c_0</script> in our linear equation) and the corresponding row in <script type="math/tex">b</script> will have the response variable (price).</p> <p>Now, importantly, this equation will almost always have no solution. To understand why, notice that we are trying to find a linear combination of the three columns of <script type="math/tex">A</script> that equals the vector <script type="math/tex">b</script>. We haven’t yet specified how many data points we have, but for the sake of this part of the explanation let’s assume it’s 100. That means we have a vector <script type="math/tex">b</script> which lives in a 100-dimensional space. If that throws you for a loop, think about how a vector with 2 elements lives in the x-y coordinate plane - a 2-D space, while a vector with three elements lives in the x-y-z coordinate system - a 3-D space. 
So, the vector <script type="math/tex">b</script> - with 100 elements - lives in a 100-dimensional space. So do the columns of <script type="math/tex">A</script>. They are, after all, each a vector with 100 elements.</p> <p>If you consider a single column of <script type="math/tex">A</script> by itself, and take all linear combinations of it (i.e. you scale it by any value), you will end up with a single line in that 100-dimensional space. We call that a 1-D subspace of the 100-dimensional space. If you consider two columns of A, and take all linear combinations of them, you will end up with a 2-D subspace (a plane through the origin) within that 100-dimensional space. And, probably obviously now, if you consider all three column vectors of A, and take all linear combinations of them, you will end up with a 3-D subspace of the 100-dimensional space (assuming the columns are linearly independent). To throw some terminology at you, that 3-D subspace is <em>spanned</em> by the three column vectors of <script type="math/tex">A</script>, and it is called the <em>column space of A</em>.</p> <p>Understanding how linear combinations of vectors span a space is critical, so I’ll include the following gif from 3Blue1Brown to help you understand it visually. Notice how by changing the coefficients <script type="math/tex">a</script> and <script type="math/tex">b</script>, their linear combination (<script type="math/tex">av + bw</script>) can point anywhere in the plane. That means they <em>span</em> the plane.</p> <div style="display:flex; justify-content:center; margin: 20px 0px;"> <img src=" /assets/by-post/remember-linear-regression/linear-combination.gif" /> </div> <p>I previously said that it’s unlikely that our equation <script type="math/tex">Ax = b</script> has a solution. Why is that? We can now understand that our equation only has a solution if the 100-dimensional vector <script type="math/tex">b</script> happens to lie within the 3-dimensional column space of <script type="math/tex">A</script>. 
That’s kind of like hoping that a bunch of points in 3-D space happen to fall exactly on a single (1-D) line (although a much more extreme version of that). It might happen, but with real data that likely has some noise in it, it’s extremely unlikely.</p> <p>So, since there’s no solution, we can’t just solve the equation directly by computing <script type="math/tex">x = A^{-1}b</script> (besides, <script type="math/tex">A</script> isn’t square, so it doesn’t even have an inverse). In fact, why don’t we stop writing <script type="math/tex">Ax=b</script>, because that’s a little misleading given there’s no solution (it’s like writing <script type="math/tex">5x = 1</script> and <script type="math/tex">2x = 2</script>, solve for <script type="math/tex">x</script>). Instead, let’s write <script type="math/tex">Ax = b^*</script>, where we assert that this equation has a solution. In other words, the only candidates for <script type="math/tex">b^*</script> are the vectors in the column space of <script type="math/tex">A</script>.</p> <p>The next step is to figure out which <script type="math/tex">b^*</script> is “best”. Linear regression is defined as trying to minimize the squared errors, so we want the <script type="math/tex">b^*</script> that minimizes the sum of the squares of the elements of <script type="math/tex">b - b^*</script>. At this point, I’m going to <a href="#what-is-a-linear-regression-the-short-version">refer you to the short version</a>. I’ve hopefully filled in the relevant background information to make that explanation accessible.</p> <h3 id="quick-demonstration-that-it-works">Quick demonstration that it works</h3> <iframe id="notebook" style="width: 800px; max-width: 100%; border: none;" src=" /assets/by-post/remember-linear-regression/notebook.html"> </iframe> <script src="/assets/js/iframe.js"></script> <script> let notebook = document.getElementById("notebook"); autoAdjustIframeHeight(notebook); </script>
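If the embedded notebook doesn’t load for you, here is a minimal pure-Python sketch of the same check, on hypothetical, noise-free data with one predictor plus an intercept (so the normal equations reduce to a 2x2 system):

```python
# Hypothetical, noise-free data: price = 2*sqft + 5 exactly.
sqft = [10.0, 20.0, 30.0, 40.0]
price = [2 * s + 5 for s in sqft]

# A has columns [sqft, 1], so the normal equations A^T A x = A^T b
# become a 2x2 system in the slope c1 and intercept c0.
n = len(sqft)
Sx = sum(sqft)
Sy = sum(price)
Sxx = sum(s * s for s in sqft)
Sxy = sum(s * p for s, p in zip(sqft, price))

# Solve [[Sxx, Sx], [Sx, n]] @ [c1, c0] = [Sxy, Sy] by Cramer's rule,
# which is exactly x = (A^T A)^{-1} A^T b for this small case.
det = Sxx * n - Sx * Sx
c1 = (Sxy * n - Sx * Sy) / det
c0 = (Sxx * Sy - Sx * Sxy) / det
print(c1, c0)  # → 2.0 5.0
```

The formula recovers the coefficients we built the data with, as it should when the data is exactly linear.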
Fractional Derivatives2019-11-27T00:00:00+00:002019-11-27T00:00:00+00:00http://blog.russelldmatt.com/2019/11/27/fractional-derivatives<style> hr { margin: 20px 0px; } canvas, img { box-shadow: 5px 5px 5px grey; border: 1px solid grey; width: 500px; max-width: 100%; height: 500px; display: block; margin: 30px auto 30px; } </style> <p>What’s the <script type="math/tex">\frac{1}{3}</script>rd derivative of <script type="math/tex">sin(x)</script>?</p> <p>What an absurd question - does it even make sense? I think so, but in order to build up some intuition let’s take a few steps back.</p> <hr /> <p>Imagine that you lived in the Middle Ages and you were comfortable with the concepts of addition and multiplication. You even understand exponents, as a shorthand for repeated multiplication. Then someone asks you, what’s <script type="math/tex">2^{\frac{1}{2}}</script>?</p> <p>Nonsense, right? <script type="math/tex">2^3</script> means <script type="math/tex">2 \cdot 2 \cdot 2</script>. There are three twos. You can’t have half a two.</p> <p>Well, as I’m sure you know, yes - you can. But think about it for a second. What does it mean? What does it mean to multiply by <script type="math/tex">x</script> half a time?</p> <p><script type="math/tex">x^n</script> <em>is</em> the number that you get when you multiply by <script type="math/tex">x</script> <script type="math/tex">n</script> times. Thinking about it this way makes the property <script type="math/tex">x^a \cdot x^b = x^{a+b}</script> obvious. If you multiply by <script type="math/tex">x</script> <script type="math/tex">a</script> times, and then <script type="math/tex">b</script> more times, you’ve multiplied by <script type="math/tex">x</script> <script type="math/tex">(a+b)</script> times. And that property is nice, because it makes sense even when <script type="math/tex">n</script> is not an integer. 
If I do something <script type="math/tex">\frac{1}{2}</script> a time and then I do it again <script type="math/tex">\frac{1}{2}</script> a time, how many times have I done it? <script type="math/tex">1</script> time, right?</p> <p>Which brings us to the (obvious because we already learned it) answer, which is that <script type="math/tex">2^{\frac{1}{2}} \cdot 2^{\frac{1}{2}} = 2^1 = 2</script>, i.e. <script type="math/tex">2^{\frac{1}{2}} = \sqrt{2}</script>.</p> <hr /> <p>Let’s generalize a bit and talk about <em>repeated function application</em>.</p> <p>Consider the function <script type="math/tex">f(x) = x + 10</script>. What’s <script type="math/tex">f(f(x))</script>? That’s pretty easy:</p> <script type="math/tex; mode=display">f^2(x) = f(f(x)) = f(x+10) = x+20</script> <p>Ok, how about <script type="math/tex">f^{\frac{1}{2}}(x)</script>? Given the setup, I bet you can figure it out. It’s some function that, when applied twice, gives us <script type="math/tex">f(x)</script>. What might that be? <script type="math/tex">g(x) = x+5</script> seems like a good guess.</p> <p>Let’s check it:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} f^{\frac{1}{2}}(x) &= g(x) = x + 5 \\ g(g(x)) &= g(x+5) = x+10 = f(x) \end{align*} %]]></script> <p>Ok, how about this one? If <script type="math/tex">f(x) = 2x</script>, what’s <script type="math/tex">f^{\frac{1}{2}}(x)</script>? Again, you can guess it. It’s <script type="math/tex">g(x) = \sqrt{2}x</script>.</p> <p>Alright, now let’s level up. Previously we were dealing with functions from a number to a number, but functions can take other types of things too. How about a function <script type="math/tex">f</script> which takes, as input, a function <script type="math/tex">h</script> and returns a new function? What does it do to the function? 
Let’s start with something easy, like it shifts it <script type="math/tex">10</script> to the right:</p> <script type="math/tex; mode=display">f(h(x)) = h(x - 10)</script> <p>Can we guess the answer for <script type="math/tex">f^{\frac{1}{2}}(x)</script>? I’m going to go out on a limb and say yes. If you want to do something twice such that the end result is shifting <script type="math/tex">10</script> to the right, shifting <script type="math/tex">5</script> to the right each time will probably do the trick.</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} f^{\frac{1}{2}}(h(x)) &= g(h(x)) = h(x-5) \\ g(g(h(x))) &= g(h(x-5)) = h(x - 10) \end{align*} %]]></script> <hr /> <p>Ok, now for the finale. What if our function takes the derivative of the input function? In other words:</p> <script type="math/tex; mode=display">f(h) = \frac{d}{dx}h</script> <p>Eek… that is a bit harder.</p> <p>Let’s take a quick detour and draw an analogy to linear algebra, specifically eigenvectors. If you want to multiply a vector, <script type="math/tex">v</script>, by a matrix, <script type="math/tex">M</script>, <script type="math/tex">n</script> times (where <script type="math/tex">n</script> is sufficiently large), a fast way to do it is to follow these three steps:</p> <ol> <li>Compute the eigenvectors of the matrix <script type="math/tex">M</script>. 
These are the vectors that, when multiplied by <script type="math/tex">M</script>, are just scaled by a constant (the constant being the eigenvalue).</li> <li>Decompose your vector into a linear combination (weighted sum) of those eigenvectors.</li> <li>Your answer is the linear combination of those eigenvectors, where each eigenvector is first scaled by its eigenvalue to the <script type="math/tex">n</script>th power.</li> </ol> <p>I tried to explain why this works in depth <a href="/2019/03/09/golden-fibonacci.html">here</a>, but the quick summary is that we found special inputs (the eigenvectors) which were particularly easy to compute for our function (multiplication by <script type="math/tex">M</script>), and then we reformulated our answer as a weighted sum of the function applied to those special inputs (<script type="math/tex">n</script> times). In doing so, we turned our somewhat hard problem into a much easier one.</p> <p>One thing to mention is that this only works for <em>linear</em> functions, i.e. functions <script type="math/tex">f</script> which have the following two properties:</p> <ol> <li> <script type="math/tex; mode=display">f(u + v) = f(u) + f(v)</script> </li> <li> <script type="math/tex; mode=display">f(\alpha u) = \alpha f(u) \tag{where $\alpha$ is a scalar}</script> </li> </ol> <p>Does the derivative function have these properties? Actually yes:</p> <ol> <li> <script type="math/tex; mode=display">\frac{d}{dx}(f + g) = \frac{d}{dx}(f) + \frac{d}{dx}(g)</script> </li> <li> <script type="math/tex; mode=display">\frac{d}{dx}(\alpha f) = \alpha \frac{d}{dx}(f) \tag{where $\alpha$ is a scalar}</script> </li> </ol> <p>The derivative is a <em>linear function</em> (often called a <em>linear operator</em>). 
So, we can utilize the same trick.</p> <p>Can you think of any functions whose derivative is equal to the function itself (or, maybe, a scaled version of it)?</p> <p>Yep, you bet: <script type="math/tex">\frac{d}{dx}(e^x) = e^x</script>, and <script type="math/tex">\frac{d}{dx}(e^{\alpha x}) = \alpha e^{\alpha x}</script>.</p> <p><script type="math/tex">e^{\alpha x}</script> is an <em>eigenfunction</em> of the derivative function. How cool!</p> <p>So, <em>if</em> we could represent our input function <script type="math/tex">h</script> as a weighted sum of exponential functions, then we can trivially take the derivative any number of times (where that number doesn’t have to be an integer).</p> <p>Oh, what’s that you say? The Fourier transform can convert any function into an integral (read: weighted sum) of complex exponential functions (sometimes called complex sinusoids)?</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} \hat{f}(\omega) &= \int_{-\infty}^{\infty} f(x) e^{-2 \pi i x \omega } dx \\ f(x) &= \int_{-\infty}^{\infty} \hat{f}(\omega) e^{2 \pi i x \omega } d\omega \\ \end{align*} %]]></script> <p>So, we’ve rewritten our function as a weighted sum of eigenfunctions of the derivative operator. The weights are <script type="math/tex">\hat{f}(\omega)</script> and the eigenfunctions are <script type="math/tex">e^{2 \pi i \omega x}</script>. 
So, now we can trivially<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> take the <script type="math/tex">n</script>th derivative:</p> <script type="math/tex; mode=display">\frac{d^n}{dx^n} f(x) = \int_{-\infty}^{\infty} (2 \pi i \omega)^n \hat{f}(\omega) e^{2 \pi i x \omega } d\omega \\</script> <hr /> <p>At this point, we’ve solved how to take the <script type="math/tex">n</script>th derivative in the general case, but we haven’t technically answered our original question: what’s the <script type="math/tex">\frac{1}{3}</script>rd derivative of <script type="math/tex">sin(x)</script>?</p> <p>Lucky for us, the Fourier transform of <script type="math/tex">sin(x)</script> is quite simple. To get a handle on it, let’s first graph <script type="math/tex">f(t) = e^{it}</script>. Unfortunately, since <script type="math/tex">e^{it}</script> is a complex number for a given <script type="math/tex">t</script>, in order to graph the function for a range of <script type="math/tex">t</script> values I’d need 3 dimensions. So, instead, I’ll graph <script type="math/tex">e^{it}</script> as a function of time (time will be my 3rd dimension).</p> <div id="sketch1"></div> <p>So that’s a single complex exponential function. What if we add one more which rotates at exactly the same rate but in the opposite direction, and then add the two values together?</p> <div id="sketch2"></div> <p>The imaginary (vertical) components cancel each other out perfectly and all we’re left with is a real number, which is twice a <script type="math/tex">sin</script> curve.</p> <p>Analytically,</p> <script type="math/tex; mode=display">f(x) = sin(x) = \frac{1}{2} (-i e^{ix} + i e^{-ix})</script> <p>Why multiply <script type="math/tex">e^{ix}</script> by <script type="math/tex">-i</script> and <script type="math/tex">e^{-ix}</script> by <script type="math/tex">i</script>? 
Since <script type="math/tex">sin</script> starts at 0, I want the counter-clockwise complex exponential (<script type="math/tex">e^{ix}</script>) to start out pointing down (and multiplication by <script type="math/tex">-i</script> will rotate clockwise by <script type="math/tex">\pi/2</script>). Similarly, I want the clockwise one (<script type="math/tex">e^{-ix}</script>) to start out pointing up (and multiplying by <script type="math/tex">i</script> will do that).</p> <p>Let’s test our function for a few values of <script type="math/tex">x</script>:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} sin(0) &= \frac{1}{2} (-i e^{i0} + i e^{-i0}) = \frac{1}{2} (-i + i) = 0 \\ sin(\pi/2) &= \frac{1}{2} (-i e^{i\pi/2} + i e^{-i\pi/2}) = \frac{1}{2} (-i \cdot i + i \cdot -i) = \frac{1}{2} (1 + 1) = 1 \\ sin(\pi) &= \frac{1}{2} (-i e^{i\pi} + i e^{-i\pi}) = \frac{1}{2} (-i \cdot -1 + i \cdot -1) = \frac{1}{2} (i + -i) = 0 \\ sin(3\pi/2) &= \frac{1}{2} (-i e^{i3\pi/2} + i e^{-i3\pi/2}) = \frac{1}{2} (-i \cdot -i + i \cdot i) = \frac{1}{2} (-1 + -1) = -1 \\ \end{align*} %]]></script> <p>So far so good. How about its derivative?</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} \frac{d}{dx}sin(x) &= \frac{d}{dx} \big( \frac{1}{2} (-i e^{ix} + i e^{-ix}) \big) \\ &= \frac{1}{2} (e^{ix} + e^{-ix}) \end{align*} %]]></script> <p>Well, we know what it <em>should</em> come out to, <script type="math/tex">cos(x)</script>. Does it?</p> <p>Yes, and here’s one way to think about it (you could also plug in a few values of <script type="math/tex">x</script> to really convince yourself). The form of this equation looks similar to the form of our equation for <script type="math/tex">sin(x)</script>, except that the two complex exponential functions aren’t multiplied by <script type="math/tex">-i</script> and <script type="math/tex">i</script>, respectively. 
That just means they both start out pointing directly to the right, instead of one pointing down and one pointing up like in the <script type="math/tex">sin(x)</script> case. You can look at the animation above and verify for yourself that if you start watching when the red and blue components are both pointing right, the graph looks like a <script type="math/tex">cos(x)</script> curve.</p> <p>What this also makes apparent, though, is that <script type="math/tex">cos(x)</script> and <script type="math/tex">sin(x)</script> are generated by the same process; it’s just that <script type="math/tex">cos(x)</script> is <script type="math/tex">\pi/2</script> “ahead” of <script type="math/tex">sin(x)</script>. This probably sounds familiar - that <script type="math/tex">sin(x + \pi/2)</script> and <script type="math/tex">cos(x)</script> are the same thing. One easy way to prove this to yourself is to consider the fact that <script type="math/tex">cos(a) = sin(b)</script> in the (left) right triangle below and that <script type="math/tex">b = a + \pi/2</script>.</p> <div style="text-align: center;"> <img src=" /assets/by-post/fractional-derivatives/circle.jpg" style="max-width: 400px; max-height: 400px;" /> </div> <p>Ok, this is interesting and all, but let’s solve the problem.</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} \newcommand{\d}{\frac{d^{1/3}}{dx^{1/3}}} \d sin(x) &= \d \big( \frac{1}{2} (-i e^{ix} + i e^{-ix}) \big) \\ &= \frac{1}{2} (-i \cdot i^{1/3} e^{ix} + i \cdot (-i)^{1/3} e^{-ix}) \\ &= \frac{1}{2} (-i \cdot e^{i \frac{\pi/2}{3}} \cdot e^{ix} + i \cdot e^{i \frac{-\pi/2}{3}} \cdot e^{-ix}) \tag{using the fact that $i = e^{i\pi/2}$} \\ &= \frac{1}{2} (-i e^{i(x + \pi/6)} + i e^{-i(x + \pi/6)}) \\ &= sin(x + \pi/6) \\ \end{align*} %]]></script> <p>And, in general:</p> <script type="math/tex; mode=display">\frac{d^n}{dx^n}sin(x) = sin(x + n \pi/2)</script> <hr /> <p>Ok, one last thing (I promise!). 
We’ve been focusing on fractional derivatives, but how about negative ones? We have a general formula in terms of <script type="math/tex">n</script>; is there anything wrong with taking the derivative “-1” times? Nope! That should just correspond to taking the anti-derivative.</p> <p>So, in conclusion, the <script type="math/tex">\frac{-1}{\pi}</script>th derivative of <script type="math/tex">sin(x)</script> is (obviously) <script type="math/tex">sin(x - 1/2)</script>.</p> <script src="/assets/js/p5/0.8.0/p5.js"></script> <script src="/assets/js/p5/p5.clickable.js"></script> <script src=" /assets/by-post/fractional-derivatives/sketch.js"></script> <script> new p5(one, 'sketch1'); new p5(two, 'sketch2'); <!-- new p5(three, 'sketch3'); --> </script> <div class="footnotes"> <ol> <li id="fn:1"> <p>Note this is using the mathematician’s definition of trivial, i.e. “theoretically possible” <a href="#fnref:1" class="reversefootnote">&#8617;</a></p> </li> </ol> </div>Rule of 722019-11-25T00:00:00+00:002019-11-25T00:00:00+00:00http://blog.russelldmatt.com/2019/11/25/rule-of-72<p>Here’s a handy rule of thumb for calculating compound interest:</p> <div class="like-blockquote"> <p>If you want to know how many years it will take your money to double, if it grows at a yearly rate of <em>r</em>, just divide 72 by <em>r</em>.</p> </div> <p>For example, how long would it take for your money to double if it grew at a yearly rate of 5%? <script type="math/tex">72/5 = 14.4</script> years. And, sure enough, <script type="math/tex">1.05^{14.4} = 2.01</script>!</p> <p>So let’s say you’re trying to figure out how much money you’ll have saved for retirement if you save $100,000 now and it grows at an annual rate of 5% for the next 30 years. It will double every 14.4 years, so in 28.8 years it will double twice, so it will be a little more than $400,000. How’d we do? <script type="math/tex">100,000 * 1.05^{30} = 432,194</script>. 
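</p>

<p>Here’s a quick sketch (mine, not the author’s) that checks the rule against the exact doubling time <script type="math/tex">ln(2)/ln(1+r)</script> for a few rates:</p>

```python
import math

# Compare the rule-of-72 estimate with the exact doubling time,
# i.e. the y that solves (1 + r)^y = 2.
for pct in [1, 2, 5, 8, 10, 15]:
    r = pct / 100
    exact = math.log(2) / math.log(1 + r)
    estimate = 72 / pct
    print(f"{pct:>2}%  exact: {exact:6.2f}y  rule of 72: {estimate:6.2f}y")
```

<p>As the derivation below explains, the fit is best around “normal” rates like 8%; for very small rates, dividing 69.3 by the rate is closer.</p>

<p>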
Pretty good for something you can do in your head!</p> <p>Probably obvious, but this trick can also convert between a “doubling time” and an interest rate. If I tell you that your money will double in 10 years, you know the interest rate is about <script type="math/tex">72/10 = 7.2\%</script>. And, sure enough, <script type="math/tex">1.072^{10} = 2.004</script>. Very close!</p> <h3 id="how-does-it-work">How does it work?</h3> <p>What’s the equation we’re trying to solve?</p> <script type="math/tex; mode=display">(1+r)^y = 2</script> <ul> <li>r is the interest rate</li> <li>y is the doubling time (the number of years it takes to double)</li> <li>and 2 is because we want our money to double</li> </ul> <p>We need to turn an exponent into multiplication/division, which usually means taking the log of both sides. Let’s try it:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} (1+r)^y &= 2 \\ ln \big( (1+r)^y \big) &= ln(2) \\ y \cdot ln(1+r) &= ln(2) \\ \end{align*} %]]></script> <p>So far, everything we’ve done is exact. Now it’s time to make a few approximations:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} ln(2) &\approx 0.693 \\ ln(1+r) &\approx r \tag{for small r} \\ \end{align*} %]]></script> <p>The first one is trivial, you can just check it with a calculator. Why, though, is the second one true?</p> <p>You can think about it this way. <script type="math/tex">e^0 = 1</script>. And the derivative of <script type="math/tex">e^x</script> at 0 is also 1. So, if you zoom in really close around 0, it looks like a straight line with a y-intercept of 1 and a slope of 1. Which means, for small values of r, <script type="math/tex">1+r \approx e^r</script>. 
And if we take the natural log of both sides, we get <script type="math/tex">ln(1+r) \approx r</script>.</p> <p>So, using what we have so far, we can say:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} y \cdot ln(1+r) &= ln(2) \\ y \cdot r &\approx 0.693 \\ y &\approx \frac{0.693}{r} \end{align*} %]]></script> <p>This works just fine, especially for really small values of r. For example, how long would it take for your money to double at a 1% interest rate? 69.3 years, right? <script type="math/tex">1.01^{69.3} = 1.99</script>. Close!</p> <p>So where does 72 come from?</p> <p>Well, <script type="math/tex">ln(1+r) \approx r</script> gets to be a worse approximation as <script type="math/tex">r</script> gets large. In particular, <script type="math/tex">r</script> is an overestimate for <script type="math/tex">ln(1+r)</script>.</p> <p><img src=" /assets/by-post/rule-of-72/ln_one_plus_r_approx.png" /></p> <p>So, when we divide by <script type="math/tex">r</script> in <script type="math/tex">\frac{0.693}{r}</script>, we’re dividing by something that’s too large. For “normal” interest rate values - say, 8% - <script type="math/tex">r</script> is about 4% bigger than <script type="math/tex">ln(1+r)</script>. So, to adjust for that fact, we can just make the numerator 4% bigger as well. What’s <script type="math/tex">0.693 * 1.04</script>? 0.72!</p>Here’s a handy rule of thumb for calculating compound interest:N-spheres2019-11-22T00:00:00+00:002019-11-22T00:00:00+00:00http://blog.russelldmatt.com/2019/11/22/n-sphere<p>What’s the formula for the volume of a 4-dimensional sphere? If you have that one, can you come up with a formula for the volume of an n-dimensional sphere?</p> <p>Don’t look it up! It’s a good problem. I highly encourage you to work on it before looking at my solution.</p> <p>More specifically, try to come up with an equation which relates the volume of an n-dimensional sphere to an (n-1)-dimensional sphere. 
You may not be able to analytically evaluate your equation yourself (I wasn’t), but it should be something that a computer could solve.</p> <div onclick="showSolution()" style="cursor: pointer; font-weight: bold;"> Click to show solution </div> <p><br /></p> <hr /> <p><br /></p> <div id="solution" style="position: relative;"> <div class="blur-blocker" id="blocker"></div> <p>I took the calculus approach and modeled an n-dimensional sphere as an integral of (n-1)-dimensional spheres. I denoted the volume of an n-dimensional sphere as <script type="math/tex">f_n(r)</script>. For a 3-dimensional sphere, my approach corresponds to the following picture:</p> <div style="text-align: center;"> <img src=" /assets/by-post/n-sphere/n-sphere.jpg" style="max-height: 500px; margin: 15px 0px;" /> </div> <p>More generally, my recursive formula is the following system of equations:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} f_n(r) &= 2 \int_0^r f_{n-1}(x) dh \\ x &= r \sin(\theta) \\ h &= r \cos(\theta) \\ dh &= -r \sin(\theta) d\theta \end{align*} %]]></script> <p>which, if you substitute, gives you:</p> <script type="math/tex; mode=display">f_n(r) = 2 \int_{\pi/2}^0 f_{n-1}(r\sin(\theta)) (-r \sin(\theta)) d\theta</script> <p>Plugging this integral into python (using sympy), and starting with the formula for the “volume” of a “0-dimensional sphere”, i.e. a point, I was able to recursively derive the formulas I recognized for a circle and a sphere and beyond!</p> <p>Note the formula for the “volume” of a 0-dimensional sphere (point) is <script type="math/tex">f_0(r) = r^0 = 1</script>.</p> <iframe id="notebook" style="width: 800px; max-width: 100%; border: none;" src=" /assets/by-post/n-sphere/n-sphere.html"> </iframe> <p>I tried to find the pattern in these formulas to come up with the closed-form solution, but in the end I gave up and looked at wikipedia, which of course has the solution. 
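</p>

<p>Here is a small sympy sketch of that recursion (my reconstruction of the approach described above, not the actual notebook embedded above):</p>

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)

def sphere_volume(n):
    """Volume of an n-dimensional sphere of radius r, built up
    recursively from f_0(r) = 1 via the substitution x = r*sin(theta)."""
    f = sp.Integer(1)  # "volume" of a point
    for _ in range(n):
        # f_n(r) = 2 * integral from pi/2 to 0 of f_{n-1}(r sin t) * (-r sin t) dt
        integrand = f.subs(r, r * sp.sin(theta)) * (-r * sp.sin(theta))
        f = sp.simplify(2 * sp.integrate(integrand, (theta, sp.pi / 2, 0)))
    return f

print(sp.simplify(sphere_volume(2) - sp.pi * r**2) == 0)                   # True
print(sp.simplify(sphere_volume(3) - sp.Rational(4, 3) * sp.pi * r**3) == 0)  # True
```

<p>Each pass just integrates the previous formula, so the same loop keeps going to 4 dimensions and beyond.</p>

<p>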
I’m not shocked I didn’t find the pattern; it’s non-trivial.</p> <hr /> <p>Quick note: The following approach might be a bit more straightforward and also works. However, I couldn’t solve the integral by hand to obtain the (known) formula for a 3-dimensional sphere - and that was how I was checking my work - which is why I went with the approach above.</p> <script type="math/tex; mode=display">f_n(r) = 2 \int_0^r f_{n-1} \Big( \sqrt{r^2 - x^2} \Big) dx</script> <div> <script src="/assets/js/iframe.js"></script> <script> let notebook = document.getElementById("notebook"); autoAdjustIframeHeight(notebook); let isBlocked = true function showSolution() { if (isBlocked) { document.getElementById("blocker").style.display = "none"; isBlocked = false } else { document.getElementById("blocker").style.display = "block"; isBlocked = true } } </script> </div> </div>What’s the formula for the volume of a 4-dimensional sphere? If you have that one, can you come up with a formula for the volume of an n-dimensional sphere?The Metric Tensor2019-10-29T00:00:00+00:002019-10-29T00:00:00+00:00http://blog.russelldmatt.com/2019/10/29/the-metric-tensor<div style="display: none;"> <p><script type="math/tex">% <![CDATA[ \newcommand{\vec}[2]{\left[\begin{matrix}#1\\#2\end{matrix}\right]} \newcommand{\vv}[1]{\overrightarrow{#1}} \newcommand{\norm}[1]{\lVert#1\rVert} \newcommand{\mat}[4]{\left[\begin{matrix}#1 & #3\\#2 & #4\end{matrix}\right]} %]]></script></p> </div> <p>In the last post, I tried to explain what a tensor is. It’s complicated; it’s a long post. But what I didn’t tackle is the why. Why do we care about this generalization of vectors and matrices?</p> <p>To be honest, I mostly don’t know yet. My hope is to actually learn the math behind general relativity at some point, and my current understanding is that tensors are part of that math. However, I do have one interesting point to make.</p> <p>What is the dot product of a vector with itself? 
It’s the length squared, right?</p> <p>Take, for instance, the vector <script type="math/tex">\vv{v} = [3, 4]</script> (with length 5):</p> <script type="math/tex; mode=display">\vec{3}{4} \cdot \vec{3}{4} = 3 \cdot 3 + 4 \cdot 4 = 25</script> <p>Right, of course this works. We’ve just reformulated the Pythagorean theorem in a linear-algebra sort of way.</p> <p>But wait, something is odd here. In the last post, we made a big deal about how <em>covectors</em> were different from <em>vectors</em>. <em>Covectors</em> were functions from vectors to scalars, not vectors. What does it even mean, then, to multiply two vectors together? In programming terms, it’s like we’ve made a type error.</p> <p>If we wanted to construct a (multi-linear) function from 2 vectors to a scalar, as we seem to want when taking the dot product of 2 vectors, we’d need a (0, 2)-tensor. Recall that an (n, m)-tensor is a multi-linear function from m vectors and n covectors to a scalar.</p> <p>That’s actually correct, and the (0, 2)-tensor that we want is called <em>the metric tensor</em>. To see why, let’s change our basis from the standard orthonormal basis to something else.</p> <p>Let’s use a new basis of <script type="math/tex">\vv{e_1} = [4, 4]</script> and <script type="math/tex">\vv{e_2} = [-1, 0]</script>. What are the coordinates of the vector <script type="math/tex">\vv{v}</script> in the new basis? It looks like <script type="math/tex">[1, 1]</script> will do the trick. How convenient.</p> <p>Ok, so what’s the length of <script type="math/tex">\vv{v}</script> now? It’s the same! The length of a vector does not depend on the coordinate system.</p> <p>Right, right, what I meant was, how do we compute the length of the vector now? Dot product, right?</p> <script type="math/tex; mode=display">\vec{1}{1} \cdot \vec{1}{1} = 1 \cdot 1 + 1 \cdot 1 = 2</script> <p>Uh… that’s not right. No, of course that doesn’t work. The length of the vector has to depend on the length of the basis vectors. 
What I meant was to first scale each coordinate by the length of the appropriate basis vector before doing the multiplication. Something like this:</p> <script type="math/tex; mode=display">\vec{1}{1} \cdot \vec{1}{1} = (1 \cdot \norm{\vv{e_1}}) \cdot (1 \cdot \norm{\vv{e_1}}) + (1 \cdot \norm{\vv{e_2}}) \cdot (1 \cdot \norm{\vv{e_2}}) = 1 \cdot 32 + 1 \cdot 1 = 33</script> <p>Hmm, yea not that either. I guess I’m still trying to use the Pythagorean theorem, but my triangle is not a right triangle anymore. I’m making a triangle with one basis vector <script type="math/tex">\vv{e_1} = [4, 4]</script> and one basis vector <script type="math/tex">\vv{e_2} = [-1, 0]</script>, but those vectors aren’t orthogonal.</p> <p>All this would be much more clear with a picture:</p> <div style="text-align: center;"> <img src=" /assets/by-post/the-metric-tensor/v.jpg" style="width: 400px; margin-bottom: 20px;" /> </div> <p>So maybe law of cosines? <script type="math/tex">c^2 = a^2 + b^2 - 2ab\cos{C}</script>? Actually yes, that’s exactly right, but let me show you another way.</p> <p>Like I said before, what we want is called <em>the metric tensor</em>.</p> <script type="math/tex; mode=display">[[ {\vv{e_1}\cdot\vv{e_1}}, {\vv{e_2}\cdot\vv{e_1}} ], [ {\vv{e_1}\cdot\vv{e_2}}, {\vv{e_2}\cdot\vv{e_2}} ]]</script> <p>I wrote it out that way, as a row of row-vectors, on purpose. The metric tensor is a (0, 2)-tensor, meaning it’s a function from two vectors to a scalar, and a row of row vectors has the right dimensionality for that multiplication. 
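</p>

<p>Before doing it by hand, here’s the whole computation as a small numpy sketch (mine, not from the post):</p>

```python
import numpy as np

# The new basis vectors, written in standard coordinates.
e1 = np.array([4.0, 4.0])
e2 = np.array([-1.0, 0.0])
B = np.column_stack([e1, e2])

# Metric tensor: the grid of dot products g[i, j] = e_i . e_j
g = B.T @ B

# Coordinates of the vector [3, 4] in the new basis.
v = np.array([1.0, 1.0])

print(g)          # the values [[32, -4], [-4, 1]]
print(v @ g @ v)  # 25.0, the squared length of [3, 4]
```

<p>Sandwiching the new coordinates around <code>g</code> recovers the squared length, exactly as worked out by hand next.</p>

<p>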
Let’s try it out with our new basis:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} \vv{e_1} \cdot \vv{e_1} &= 32 \\ \vv{e_1} \cdot \vv{e_2} &= -4 \\ \vv{e_2} \cdot \vv{e_2} &= 1 \\ \end{align*} %]]></script> <p>So, our metric tensor is:</p> <script type="math/tex; mode=display">[[32, -4], [-4, 1]]</script> <p>Let’s multiply it by our vector <script type="math/tex">v = [1, 1]</script>:</p> <script type="math/tex; mode=display">[[32, -4], [-4, 1]] \vec{1}{1} = [28, -3]</script> <p>And again?</p> <script type="math/tex; mode=display">[28, -3] \vec{1}{1} = 25</script> <p>It works! So, why haven’t we ever heard of this thing before? Well, let’s write out the metric tensor in the standard, orthonormal basis:</p> <script type="math/tex; mode=display">% <![CDATA[ \begin{align*} \vv{b_1} &= [1, 0] \\ \vv{b_2} &= [0, 1] \\ \vv{b_1} \cdot \vv{b_1} &= 1 \\ \vv{b_1} \cdot \vv{b_2} &= 0 \\ \vv{b_2} \cdot \vv{b_2} &= 1 \\ \end{align*} %]]></script> <p>So, the metric tensor, in an orthonormal basis, is represented by the identity matrix:</p> <script type="math/tex; mode=display">[[1, 0], [0, 1]]</script> <p>which is why ignoring it, and treating vectors and covectors interchangeably, is usually fine.</p>What is a Tensor?2019-10-28T00:00:00+00:002019-10-28T00:00:00+00:00http://blog.russelldmatt.com/2019/10/28/what-is-a-tensor<p>I just completed the very good youtube playlist <a href="https://www.youtube.com/playlist?list=PLJHszsWbB6hrkmmq57lX8BV-o-YIOFsiG">Tensors for Beginners</a> by eigenchris and I want to jot down some notes before I forget everything.</p> <p><em>An (n, m)-tensor is a multi-linear function from m vectors and n covectors to a scalar.</em></p> <p>A tensor is a “geometrical object” in the same way that a vector is a “geometrical object” (and a vector is a tensor, so it really is in the same way). We often deal with the coordinates of a vector, which assumes a particular basis. 
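</p>

<p>Concretely (a small numpy sketch of mine, reusing the basis from the metric tensor example above):</p>

```python
import numpy as np

# The same geometric vector: [3, 4] in the standard basis.
v = np.array([3.0, 4.0])

# A different basis, as columns of B.
B = np.column_stack([[4.0, 4.0],     # basis vector e1
                     [-1.0, 0.0]])   # basis vector e2

# Its coordinates in the new basis solve B @ c = v.
c = np.linalg.solve(B, v)
print(c)  # [1. 1.]
```

<p>Same vector, different coordinates, and the change is entirely predictable from <code>B</code>.</p>

<p>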
But the exact same vector will have different coordinates if we change the basis. So, the vector itself is “invariant” under a change of basis, but the coordinates are not. However, the coordinates change in a predictable way under a change of basis. All the same is true for tensors (again, vectors <em>are</em> tensors).</p> <p><em>Covectors</em> are a new “type of thing”. They’re functions from a vector to a scalar. One concrete way to think about them is that they’re “row vectors”. If you multiply a row vector by a vector, you get a scalar.</p> <p><em>Tensor product</em>: So, a covector * vector = scalar. But a vector * covector = matrix. The latter is an example of a tensor product. More generally, a tensor product takes the cartesian product of the inputs, and for each ordered pair, you multiply the elements. So in the simple case of an n-dimensional vector v and an m-dimensional covector c, the tensor product v ⊗ c would have (n x m) dimensions, i.e. it can be represented by an (n x m) matrix! Think about each element of that matrix; the (i, j)th element is the product of the ith element of v and the jth element of c. So, you can see concretely what I mean by “the tensor product takes the cartesian product of the inputs, and for each ordered pair, you multiply the elements”.</p> <p>Back to “what is a tensor”. A simple (n, m)-tensor can be constructed by the tensor product of n vectors and m covectors. Again, let’s think about a matrix. We just said that a matrix can be constructed via the tensor product of a vector and a covector. So, I guess that means a matrix is a (1, 1)-tensor! So, why did I say “simple” in “A <em>simple</em> (n, m)-tensor …”. Think about the set of matrices you can construct by multiplying a vector v * a row vector c. What’s their rank? Rank 1, of course! Every column is a scaled version of every other column, since all the columns are just scaled versions of v (the jth column is v * c[j]). 
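</p>

<p>A quick numpy sketch (mine) makes that rank-1 structure easy to see:</p>

```python
import numpy as np

v = np.array([2.0, 5.0])          # a vector
c = np.array([1.0, -3.0, 4.0])    # a covector, thought of as a row vector

# The tensor product v (x) c: M[i, j] = v[i] * c[j]
M = np.outer(v, c)
print(M)                          # each column is a multiple of v
print(np.linalg.matrix_rank(M))   # 1
```

<p>However you pick <code>v</code> and <code>c</code>, the outer product always has rank 1.</p>

<p>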
The same goes for the rows: each row is a scaled version of c (the ith row is v[i] * c). A rank 1 matrix is a very boring matrix indeed. If you think about a matrix as a function from vector -&gt; vector (since, when you multiply a matrix by a vector you get a vector), all the output vectors lie on the same line (and that line points in the same direction as v). So, if these are 2-dimensional vectors, the rank 1 matrix will project all 2-dimensional vectors onto a line. Slight tangent, but this corresponds to having a zero determinant, having a zero eigenvalue, and being non-invertible.</p> <p>So, are all tensors simple and uninteresting in the same way? No, tensors form a vector space, meaning that they can be scaled and added to each other, and the output will be another tensor. To create more interesting tensors, you can take linear combinations of simple tensors. Again, let’s make an analogy to something familiar: vectors. Any vector can be thought of as a linear combination of a set of “basis vectors” (and that’s how we get the vector’s coordinates). In 2-d space, using the standard basis, the two basis vectors are [0,1] and [1,0]. Every other vector is a linear combination of those two “simple” vectors. Tensors work the same way. In fact, if you start with an n-dimensional vector space (with n basis vectors) and an m-dimensional covector space (with m basis covectors), you can construct (n x m) basis (1, 1)-tensors by taking the tensor product of each of the n basis vectors with each of the m basis covectors.</p> <p>To make that more concrete, let’s say n = 2 and m = 3 and let’s use the standard basis. 
You can construct the following 6 basis (1, 1)-tensors:</p> <script type="math/tex; mode=display">% <![CDATA[ \newcommand{\vec}[2]{\left[\begin{matrix}#1\\#2\end{matrix}\right]} \newcommand{\covec}[3]{\left[\begin{matrix}#1 & #2 & #3\end{matrix}\right]} \newcommand{\mat}[6]{\left[\begin{matrix}#1 & #3 & #5 \\ #2 & #4 & #6\end{matrix}\right]} \newcommand{\VS}{V^*} \newcommand{\reals}{\mathbb{R}} \vec{1}{0} \otimes \covec{1}{0}{0} = \mat{1}{0}{0}{0}{0}{0} \\ \vec{1}{0} \otimes \covec{0}{1}{0} = \mat{0}{0}{1}{0}{0}{0} \\ \vec{1}{0} \otimes \covec{0}{0}{1} = \mat{0}{0}{0}{0}{1}{0} \\ \vec{0}{1} \otimes \covec{1}{0}{0} = \mat{0}{1}{0}{0}{0}{0} \\ \vec{0}{1} \otimes \covec{0}{1}{0} = \mat{0}{0}{0}{1}{0}{0} \\ \vec{0}{1} \otimes \covec{0}{0}{1} = \mat{0}{0}{0}{0}{0}{1} \\ %]]></script> <p>Now it’s easy to see how those 6 “simple” (1, 1)-tensors form a basis for any (2 x 3)-dimensional (1, 1)-tensor. Another thing that this example makes clear is that (1, 1) does not describe the dimensions of the matrix; it describes the number of vectors and covectors that were combined (via the tensor product) to create the tensor. What is the dimension of the (1, 1)-tensor? In this case it’s (2 x 3), but more generally if we take <script type="math/tex">dim(x)</script> to be the dimension of <script type="math/tex">x</script>, an (n, m)-tensor has dimension <script type="math/tex">dim(v_1) dim(v_2) \cdots dim(v_n) dim(c_1) dim(c_2) \cdots dim(c_m)</script>. These things can get big, fast!</p> <p>So what about these linear functions? I started the post by saying: <em>An (n, m)-tensor is a multi-linear function from m vectors and n covectors to a scalar</em>, and yet we’ve barely mentioned functions at all. Well, remember when I said that covectors were <em>functions from a vector to a scalar</em>? We were on to something there.</p> <p>Let’s denote the vector space of vectors as <script type="math/tex">V</script>. 
Let’s denote the vector space of covectors (called the dual vector space) with the symbol <script type="math/tex">\VS</script>. Another way to write this would be <script type="math/tex">V \rightarrow \reals</script>, since covectors are functions from a vector to a scalar (in my examples, I’ll use the reals as an example of a scalar, but it could be any field, i.e. rational, algebraic, reals, complex, etc.). So, what do we get when we take the tensor product of a vector and a covector? We already know this: a matrix, i.e. a (1, 1)-tensor. But what <em>is</em> a matrix? As I mentioned above, you can think about a matrix as a (linear) function from vectors to vectors, i.e. <script type="math/tex">V \rightarrow V</script>. What if we rewrote that as <script type="math/tex">V \rightarrow (\VS \rightarrow \reals)</script>? Kind of weird at first, but if you can think about a covector as a function from a vector to a scalar, can’t we similarly think about a vector as a function from a covector to a scalar? In other words, a covector * vector is a scalar. If we have one argument (either the covector or the vector), then we can treat that argument as fixed and we’re left with a function from the other argument to a scalar. So, to summarize: <script type="math/tex">(V \times \VS) \rightarrow \reals</script>, <script type="math/tex">V \rightarrow V</script>, <script type="math/tex">V \rightarrow (\VS \rightarrow \reals)</script>, and <script type="math/tex">\VS \rightarrow (V \rightarrow \reals)</script> are all ways of saying the same thing.</p> <p>What do those statements mean in the familiar context of a matrix?</p> <ul> <li><script type="math/tex">(V \times \VS) \rightarrow \reals</script> is saying a matrix is: A function from a row vector and a vector to a scalar. Well, a row vector * a matrix * a vector = a scalar, so yea that checks out.</li> <li><script type="math/tex">V \rightarrow V</script> is saying a matrix is: A function from a vector to a vector. 
Yes, a matrix * a vector = a vector.</li> <li><script type="math/tex">V \rightarrow (\VS \rightarrow \reals)</script> is saying a matrix is: A function from a vector to (a function from a row vector to a scalar). A little weird, but ok, since a matrix * a vector = a vector, and vectors <em>are</em> functions from row vectors to scalars.</li> <li><script type="math/tex">\VS \rightarrow (V \rightarrow \reals)</script> is saying a matrix is: A function from a row vector to (a function from a vector to a scalar). Huh, this one is a little new. What’s a (1 x n) row vector * an (n x m) matrix? Well, it’s a (1 x m) row vector. And what’s a (1 x m) row vector? We can think of it like a function from an (m x 1) vector to a scalar. Ok, checks out!</li> </ul> <p>So, our (1, 1)-tensor is like a function from a vector and a covector to a scalar, i.e. <script type="math/tex">(V \times \VS) \rightarrow \reals</script>. Furthermore, that function can be “partially applied”, i.e. if you pass in just the vector, you get a function from a covector to a scalar: <script type="math/tex">V \rightarrow (\VS \rightarrow \reals)</script>. Likewise, if you pass just the covector, you get a function from a vector to a scalar: <script type="math/tex">\VS \rightarrow (V \rightarrow \reals)</script>.</p> <p>I think we’re ready to level up from (1, 1)-tensors. What about a (2, 1)-tensor? A (2, 1)-tensor is a (linear) function from 2 covectors and 1 vector to a scalar: <script type="math/tex">(V \times \VS \times \VS) \rightarrow \reals</script>. If you provide one covector, you’re left with a (1, 1)-tensor, i.e. <script type="math/tex">\VS \rightarrow ((V \times \VS) \rightarrow \reals)</script>. So, with this recursive viewpoint, we can build up an understanding of an (n, m)-tensor. An (n, m)-tensor is a function from n covectors and m vectors to a scalar, i.e. 
<script type="math/tex">(\VS_1 \times \VS_2 \times \cdots \times \VS_n \times V_1 \times V_2 \times \cdots \times V_m) \rightarrow \reals</script>.</p> <!-- Dimensionality, revisited: Remember when we previously said that the dimension of an (n, m)-tensor is $$dim(v_1) dim(v_2) \cdots dim(v_n) dim(c_1) dim(c_2) \cdots dim(c_m)$$? Let's revisit that with our new understanding of tensors as linear functions. To keep things manageable, let's say we have a one dimensional vector which repesents the size of a house, and the size can only be one of three values {small, medium, large}. In addition, we have two (linear) functions that take our one-dimensional size "vector" and produce a scalar. To keep things concrete, function A estimates the value of the house from the size, and function B estimates the number of bedrooms. How many different --> <!-- say we have a one dimensional vector space, maybe our one dimension is the number of square feet of a house. And we have a linear function from that vector space to a real number (our covector). Maybe it represents the average price of a house with that many square feet. If we take the tensor product of our vector space and covector space, we have a (1, 1)-tensor, a function from a square footage and -->I just completed the very good youtube playlist Tensors for Beginners by eigenchris and I want to jot down some notes before I forget everything.Pythagorean Proof2019-10-17T00:00:00+00:002019-10-17T00:00:00+00:00http://blog.russelldmatt.com/2019/10/17/pythagorean-proof<p>A particularly beautiful proof of the Pythagorean Theorem:</p> <video controls="" style="min-width: 300px; max-width: 100%; max-height: 800px; border: 2px solid gray;"> <source src="/assets/by-post/pythagorean-proof/pythag-proof.mp4" type="video/mp4" /> Your browser does not support the video tag. 
</video>A particularly beautiful proof of the Pythagorean Theorem:Using the Simulation Hypothesis Against Itself2019-07-12T00:00:00+00:002019-07-12T00:00:00+00:00http://blog.russelldmatt.com/2019/07/12/simulation-hypothesis-against-itself<p>Let’s formulate the <a href="https://en.wikipedia.org/wiki/Simulation_hypothesis#Simulation_hypothesis">simulation hypothesis</a>, which we will call H:</p> <ol> <li> <p>Conscious beings will eventually figure out how to simulate other conscious beings.</p> </li> <li> <p>When they do so, they will simulate <em>many</em> more of them than ever existed in their universe.</p> </li> <li> <p>Therefore, if all you know is that you are a conscious being, the probability that you exist in the first, top-level, non-simulated universe is extraordinarily small given the fact that the vast majority of conscious beings live in the lower levels of the simulations.</p> </li> </ol> <p>One interesting implication of this line of reasoning is that there are likely to be many levels of this simulation. Conscious beings in the first, top-level, non-simulated universe will simulate a universe of conscious beings in the level below them, who in turn simulate a universe of conscious beings in the level below them, and so on.</p> <p>Let’s formulate a similar hypothesis, H’, along those lines:</p> <ol> <li> <p>Conscious beings will eventually figure out how to simulate other conscious beings.</p> </li> <li> <p>Every level of the simulation will have fewer resources than the level above it.</p> </li> <li> <p>Our universe has a finite amount of resources. This is arguably a fact, not a hypothesis, but what is a fact other than a hypothesis which is extraordinarily likely to be correct, so let’s include it.</p> </li> <li> <p>Therefore, the number of levels is not infinite. 
There exists a “bottom” level, which will never successfully simulate another level below it.</p> </li> <li> <p>Each level (other than the bottom) will simulate <em>many</em> more conscious beings than ever existed in their level.</p> </li> <li> <p>Therefore, the vast majority of conscious beings will exist in the “bottom” level.</p> </li> </ol> <p>Hypothesis H’ sounds perfectly in line with the hypothesis H, since H’s main conclusion was that it’s unlikely you live in the top, non-simulated, level, which H’ agrees with. H’ just goes a bit further and states that, not only do you not live in the top level, but there is a high probability that you live in the bottom level of the simulation for mostly the same set of reasons. It’s important to note that, by definition, the bottom level will never figure out how to simulate a level below it.</p> <p>Whether or not we will ever simulate consciousness is highly disputed (in fact, we’re disputing it right now). But if you think that there is a high probability that we will eventually simulate consciousness, as many smart people do, then you must think that P(H’) is relatively low, since H’ implies that the probability that we live in the bottom level of the simulation (which will not be able to simulate a level below it) is high.</p> <p>In addition, to my eye, P(H) and P(H’) seem to be highly correlated. They mostly rely on the same argument; they just emphasize slightly different aspects of the same conclusion, which is that there is this pyramid of simulated universes in which each level has drastically more conscious beings than the level above it.</p> <p>So, it seems to me (note the hedging, as this is quite provisional), that the higher you think the probability that we will eventually simulate consciousness is, the lower you should think that P(H) is (since H’ is evidence against our ability to simulate consciousness and P(H’) and P(H) are highly correlated). 
However, if we do simulate consciousness, then point 1 of hypothesis H (H.1) is true (H.1 says that conscious beings will eventually figure out how to simulate other conscious beings). And, at least in my opinion, H.1 is the point that, a priori, had the lowest probability of being true! In fact, conditional on point 1 being true, I’d say that P(H) is almost 1 because all the other points in H seem so obviously correct.</p> <p>To summarize, <br /></p> <hr /> <p>Proof by contradiction:</p> <p>Assume statement SC: It is very likely that we will be able to simulate consciousness.</p> <ul> <li>P(H’) ≈ 0 because… hand-wavy math? Ok fine: <ol> <li>P(SC) = P(H’ &amp; SC) + P(¬ H’ &amp; SC) ≈ 1 (by assumption)</li> <li>∴ P(¬ SC) ≈ 0 (from 1)</li> <li>∴ P(H’ &amp; ¬ SC) ≈ 0 (from 2, since P(H’ &amp; ¬ SC) ≤ P(¬ SC))</li> <li>P(H’ &amp; SC) « P(H’ &amp; ¬ SC) (H’ implies that you’re likely at the bottom and can’t simulate)</li> <li>∴ P(H’ &amp; SC) ≈ 0 (from 3, 4)</li> <li>∴ P(H’) = P(H’ &amp; SC) + P(H’ &amp; ¬ SC) ≈ 0 (from 3, 5)</li> </ol> </li> <li> <p>P(H) is very small because P(H) is highly correlated with P(H’), and P(H’) ≈ 0.</p> </li> <li>Also, P(H) is very close to 1 because <ul> <li>P(H.1) ≈ 1 due to P(SC) ≈ 1 (our assumption)</li> <li>P(H) = P(H.3 | H.1 &amp; H.2) * P(H.2 | H.1) * P(H.1)</li> <li>P(H.2 | H.1) ≈ 1 and P(H.3 | H.1 &amp; H.2) ≈ 1 because they’re “obvious”</li> <li>∴ P(H) ≈ 1</li> </ul> </li> </ul> <p>Contradiction!</p> <p><strong>Therefore, it is very unlikely that we will be able to simulate consciousness.</strong></p> <hr /> <p><br /></p> <p>Obviously, the above “proof” isn’t really a proof, and we haven’t found a literal contradiction, but it does feel hand-wavily correct to me…</p> <p>“But wait!”, you may be thinking, “I thought this article was going to use the simulation hypothesis to argue against itself, which means arguing that we don’t live in a simulation - not that we won’t be able to simulate consciousness”. Ok, you got me.
This doesn’t <em>directly</em> argue against the simulation hypothesis. But, I did say that P(H) is close to one if we can simulate consciousness (SC), i.e. P(H | SC) ≈ 1. And P(H) = P(H | SC) * P(SC) + P(H | ¬ SC) * P(¬ SC). So if you now think P(SC) is lower than you previously did, then presumably your new P(H) is also lower, since the weight you’re putting on P(H | SC) - a number close to 1 - is now lower.</p> <p>One last thought. Although I don’t yet see a critical flaw in the argument, I’m not particularly convinced by it either, which is kind of odd, and I’m not really sure why. I think it’s partly that all these arguments are so counter-intuitive and abstract that I don’t really trust myself to be able to spot the logical flaw. But I’ll do my best - so with that in mind…</p> <h3 id="potential-holes-work-in-progress">Potential holes: work in progress</h3> <p>H’ states that because our universe is finite, and each level has fewer resources than the level above it, there are not infinitely many levels - but this isn’t strictly true. It only means that any chain of simulations below us is finite. The level above us could be infinite. An infinite universe could, for whatever reason, choose to run a simulation with finite resources. It could also choose to run some simulations with finite resources and others with infinite resources. So you could imagine a branching tree of simulations, some of whose chains are finite, while others are infinite. In the infinite chains, there would be no bottom, since infinity doesn’t end. So the conclusion that “most conscious beings will exist in the bottom simulation” would be patently untrue in that case, since <em>the vast majority</em> (or all? infinities are hard…) would live in the infinitely long chains.
All that said, if that picture were true, it’d be completely improbable that we’d live in one of the few universes with finite resources, which we do, so I guess it’s probably not true.</p> <p>I’m sure there are many more holes… will add to this as I think of them.</p>
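The probability bookkeeping above (the total-probability decomposition of P(H) and the six-step bound on P(H’)) can be sanity-checked numerically. Here is a minimal Python sketch; every concrete number in it (0.99, 0.01, and so on) is an assumption made up for illustration, not something the argument pins down.

```python
# Numeric sanity check of the post's probability bookkeeping.
# Every concrete number below is an illustrative assumption.

def p_h(p_sc: float, p_h_given_sc: float, p_h_given_not_sc: float) -> float:
    """Law of total probability: P(H) = P(H|SC)*P(SC) + P(H|~SC)*P(~SC)."""
    return p_h_given_sc * p_sc + p_h_given_not_sc * (1 - p_sc)

# The post argues P(H | SC) is close to 1 (granting H.1, the rest of H looks
# "obvious"), while without SC, H's first premise fails, so P(H | ~SC) is small.
P_H_GIVEN_SC = 0.99
P_H_GIVEN_NOT_SC = 0.01

# Lowering P(SC) shifts weight off the near-1 term, dragging P(H) down with it:
for p_sc in (0.9, 0.5, 0.1):
    print(f"P(SC) = {p_sc:.1f}  ->  P(H) = {p_h(p_sc, P_H_GIVEN_SC, P_H_GIVEN_NOT_SC):.3f}")

# The six-step bound on P(H'), with SC assumed very likely:
p_not_sc = 0.01                      # step 2: P(~SC) ~ 0
p_hp_and_not_sc = p_not_sc           # step 3: P(H' & ~SC) <= P(~SC)
p_hp_and_sc = p_hp_and_not_sc        # steps 4-5: P(H' & SC) << P(H' & ~SC)
p_hp_bound = p_hp_and_sc + p_hp_and_not_sc  # step 6: P(H') is at most their sum
print(f"P(H') <= {p_hp_bound:.2f}")
```

Playing with the values of P(SC) in the loop shows the claim at the end of the post: as your P(SC) drops, so does the P(H) you should report, even though P(H | SC) stays near 1.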