Blog, http://blog.russelldmatt.com/feed.xml (feed generated by Jekyll, 2020-02-11)

<h2>Different infinities, and why it matters</h2>
<p class="post-meta">2020-01-21 · http://blog.russelldmatt.com/2020/01/21/different-infinities</p>
<style>
#sketch {
max-width: 100%;
width: 410px;
height: 300px;
display: block;
margin: 30px auto 30px;
}
</style>
<script src="/assets/js/p5/0.8.0/p5.js"></script>
<script src="
/assets/by-post/different-infinities/sketch.js"></script>
<div id="sketch">
</div>
<style>
table {
max-width: 600px;
}
</style>
<script type="math/tex; mode=display">\newcommand{\N}{\mathbb{N}}
\newcommand{\Q}{\mathbb{Q}}
\newcommand{\R}{\mathbb{R}}</script>
<p>You may already know the punchline of Georg Cantor’s work on infinity,
which is that there are <em>different sizes of infinity</em>. I’ve also
“known” this for a while, but it was one of the many mathematical
curiosities I could recite without really understanding at any deep
level. Recently, while learning about Gödel’s theorems
and computability, I ran head first into a practical consequence of
this result that I’d like to discuss in this post.</p>
<h3 id="what-does-it-mean-for-two-infinite-sets-to-have-different-sizes">What does it mean for two infinite sets to have different sizes?</h3>
<p>For the record, this has always been a very counterintuitive idea
to me. Growing up, I heard your typical grade-school examples about
infinity: <em>infinity + 1 = infinity</em>, <em>infinity * 2 = infinity</em>, or
even <em>infinity * infinity = infinity</em>. From these examples, I drew
the natural conclusion that infinity was this sort of black-hole from
which you cannot escape. Once something was infinite, it didn’t
really matter what you did to it, it just stayed infinite. A perhaps
less well-founded extrapolation was that there was only one infinity.
Given its black-hole-like nature, it seemed impossible to distinguish
between two things that were both infinity - so maybe they were both
the same size.</p>
<p>So it struck me as very odd when I learned that, in fact, there are
different sizes of infinities, with the typical examples being
“countable” vs. “uncountable”. What does that even mean?</p>
<p>Here is a rigorous definition of what it means for two sets (infinite
or not) to be the same size:</p>
<blockquote>
<p>Two sets are the same size if there is an invertible function that maps between them.</p>
</blockquote>
<p>With that in mind, let’s work through our grade-school examples. In
each case, I will pick two sets whose sizes correspond to the number
on each side of the equation. For example, if the right hand side of
one equation is <em>infinity</em>, I might pick the set of all natural
numbers, <script type="math/tex">\N</script>, to represent that number. Then, we will check
whether the sets representing each side of the equation are the same
size (according to the stated definition above). If they are, then we
can argue that the equals sign in the equation is justified.</p>
<ul>
<li><em>infinity + 1 = infinity</em>
<ul>
<li>Set for left hand side (<em>infinity + 1</em>): the set of all natural
numbers together with the single negative number -1
(i.e. <script type="math/tex">\{-1\} \cup \N</script>).</li>
<li>Set for right hand side (<em>infinity</em>): the set of all natural
numbers, <script type="math/tex">\N</script>.</li>
<li>Invertible function that maps elements of the left set to elements
of the right set: <script type="math/tex">f(x) = x + 1</script>.</li>
<li>So, indeed, those sets are the same size.</li>
</ul>
</li>
<li><em>infinity * 2 = infinity</em>
<ul>
<li>Set for LHS (<em>infinity * 2</em>): The set of all natural numbers, <script type="math/tex">\N</script>.</li>
<li>Set for RHS (<em>infinity</em>): The set of all even natural numbers.</li>
<li>Invertible function that maps each natural number to a unique even number:
<script type="math/tex">f(x) = 2x</script>.</li>
<li>So, again, according to our definition of what it means for two
sets to have the same size, the set of even natural numbers and
the set of all natural numbers have the same size.</li>
</ul>
</li>
<li><em>infinity * infinity = infinity</em>:
<ul>
<li>Set for LHS (<em>infinity * infinity</em>): The set of all rational
numbers (fractions), <script type="math/tex">\Q</script>.</li>
<li>Set for RHS (<em>infinity</em>): The set of all natural numbers, <script type="math/tex">\N</script>.</li>
<li>Why am I saying that the set of all rational numbers has a size of
<em>infinity * infinity</em>? Many of you will have seen this before, but
look at the table below that shows one way of arranging all
rational numbers into a 2-D grid. Each side of the grid has all
the natural numbers (except for 0 in the denominator), so the
total number of squares in the grid (and therefore rational
numbers) is <em>infinity * infinity</em>.</li>
<li>Invertible function that maps each rational number to a unique
natural number: It turns out, this function exists, but it’s a bit
more complicated than the last two. The trick is to iterate
through the 2-D grid in a zig-zag fashion as opposed to the more
intuitive way of counting row by row. I recognize that’s not
nearly enough to intuit the solution if you haven’t already seen
it, but rather than reproduce it in full here, I will refer you to
<a href="/assets/by-post/different-infinities/recounting.pdf">this short paper</a>.</li>
</ul>
</li>
</ul>
<div style="display:flex; justify-content:center; margin: 20px 0px;">
<table>
<thead>
<tr>
<th>Denominator ↓ \ Numerator →</th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>…</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>0/1</td>
<td>1/1</td>
<td>2/1</td>
<td>3/1</td>
<td>…</td>
</tr>
<tr>
<td>2</td>
<td>0/2</td>
<td>1/2</td>
<td>2/2</td>
<td>3/2</td>
<td>…</td>
</tr>
<tr>
<td>3</td>
<td>0/3</td>
<td>1/3</td>
<td>2/3</td>
<td>3/3</td>
<td>…</td>
</tr>
<tr>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
<td>…</td>
</tr>
</tbody>
</table>
</div>
<div class="aside">
<p>One way of arranging all rational numbers in a 2-D grid.</p>
</div>
<p>From the three examples above, you can see how we can use our
definition (of what it means for two sets to have the same size) to
answer the question of whether two particular sets are the same size.
It may not be easy to find the invertible function between the two
sets, as in the case of the rational numbers, but at least it’s clear
what we’re looking for.</p>
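<p>To make the rational-number example concrete, here is a small sketch of one such enumeration. It walks the grid antidiagonal by antidiagonal (a close cousin of the zig-zag in the linked paper, differing in details), skipping fractions that are not in lowest terms so each non-negative rational gets exactly one index; the position of each pair in the enumeration is its natural number.</p>

```python
from math import gcd
from itertools import islice

def rationals():
    """Walk the grid antidiagonal by antidiagonal, skipping fractions
    that are not in lowest terms (e.g. 2/2 duplicates 1/1) so each
    non-negative rational appears exactly once."""
    d = 1
    while True:
        for num in range(d):          # all cells with num + den == d
            den = d - num
            if gcd(num, den) == 1:
                yield (num, den)
        d += 1

# each rational's index in this sequence is its unique natural number
first = list(islice(rationals(), 8))
print(first)
# [(0, 1), (1, 1), (1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (2, 3)]
```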
<p>Before moving on, we should point out at least one example of two
infinite sets that are <em>not</em> the same size, and I will use the
classic example of the real numbers <script type="math/tex">\R</script> and the natural numbers
<script type="math/tex">\N</script>. Instead of actually proving that you <em>cannot</em> find an
invertible mapping between these two sets (which is not an easy thing
to prove), I will suggest two things:</p>
<ol>
<li>
<p>Try it! Try to construct an invertible function that takes a real
number and produces a natural number (or vice versa) and you will
get an intuitive sense for why it’s impossible.</p>
</li>
<li>
<p>Read <a href="https://en.wikipedia.org/wiki/Cantor%27s_diagonal_argument">Cantor’s diagonal
argument</a>
for a proof of why it cannot be done.</p>
</li>
</ol>
<p>To sum up, two sets are the same size if it’s possible to construct an
invertible function that maps between them. A set being <em>countably
infinite</em> means that it is the same size as the natural numbers
(e.g. <script type="math/tex">\Q</script>). A set being <em>uncountably infinite</em> means that it is
bigger (e.g. <script type="math/tex">\R</script>). Uncountably infinite is a blanket term that
encompasses infinitely many different sizes.</p>
<h3 id="why-does-this-matter-in-real-life">Why does this matter in real life?</h3>
<p>I’m sure there are many good answers to this question, but I’m going
to use the one that I stumbled across while learning about Gödel’s
theorems and computability.</p>
<p>How do computers represent things? With sequences of bits. In other
words, computers represent things in binary.</p>
<p>This is <em>almost</em> too obvious to say, but there’s an invertible mapping
between sequences of bits and the natural numbers - just consider the
bits to be the base-2 representation of the number (that’s basically
what <em>binary</em> means).</p>
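<p>In Python, that mapping is two one-liners (restricting bit strings to canonical form, i.e. no leading zeros, so the mapping stays invertible):</p>

```python
def nat_to_bits(n):
    """Map a natural number to its base-2 representation (no leading zeros)."""
    return bin(n)[2:]            # bin(6) == '0b110'; strip the prefix

def bits_to_nat(bits):
    """Inverse mapping: read a bit string back as a base-2 number."""
    return int(bits, 2)

# round-tripping shows the mapping is invertible
print(nat_to_bits(6), bits_to_nat('110'))  # 110 6
```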
<p>So, if an arbitrary set <script type="math/tex">X</script> is countably infinite, then - by
definition - there’s an invertible mapping between <script type="math/tex">X</script> and <script type="math/tex">\N</script>.
If you have that mapping, then it’s trivial to construct a mapping
between <script type="math/tex">X</script> and <em>sequences of bits</em>, the practical consequence of
which is that you can represent elements of the set <script type="math/tex">X</script> on a
computer!</p>
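<p>As a tiny example of this composition, take <script type="math/tex">X</script> to be the integers. One standard choice of invertible mapping to the naturals (an illustration, not the only one) interleaves the negatives with the non-negatives, and then the base-2 mapping takes over:</p>

```python
# one standard bijection between the integers and the naturals:
# 0, -1, 1, -2, 2, ...  ->  0, 1, 2, 3, 4, ...
def int_to_nat(z):
    return 2 * z if z >= 0 else -2 * z - 1

def nat_to_int(n):
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

# compose with the base-2 mapping to represent any integer as bits
def int_to_bits(z):
    return bin(int_to_nat(z))[2:]
```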
<p>So, what goes wrong when the set in question is uncountably infinite,
like <script type="math/tex">\R</script>? Well, you cannot come up with a unique number - and
therefore a unique sequence of bits - for each real number. At some
point, multiple real numbers will need to share the same
representation on the computer. And this leads to well-known issues
such as:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">>>></span> <span class="mf">0.1</span> <span class="o">+</span> <span class="mf">0.2</span>
<span class="mf">0.30000000000000004</span>
<span class="o">>>></span> <span class="mf">0.1</span> <span class="o">+</span> <span class="mf">0.2</span> <span class="o">-</span> <span class="mf">0.3</span> <span class="o">==</span> <span class="mi">0</span>
<span class="bp">False</span>
</code></pre></div></div>
<div class="aside">
<p>The code snippet above was produced using python, but most programming languages will suffer the same fate. And if they don’t for this example, then they will for some other example.</p>
</div>
<p>In fact, infinitely many real numbers will end up sharing the same
representation. For example, you can pick some countably infinite set
of real numbers to represent perfectly, but that will leave an
uncountably infinite number of real numbers “left over”. In practice,
we squash an uncountably infinite number of real numbers into every
unique floating point number and hope the loss of precision isn’t too
important.</p>
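<p>You can see the squashing directly in Python (3.9+ for <code>math.nextafter</code>): the literal <code>0.1</code> is silently replaced by the nearest representable float, and between any two adjacent floats sits an uncountable continuum of reals that all share one representation.</p>

```python
from decimal import Decimal
import math

# the literal 0.1 is silently replaced by the nearest representable float
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# adjacent floats have a gap; every real number in between gets
# squashed onto one of them
gap = math.nextafter(1.0, 2.0) - 1.0
print(gap)   # 2.220446049250313e-16
```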
<h3 id="are-real-numbers-forever-doomed-to-be-misrepresented-on-computers">Are real numbers forever doomed to be misrepresented on computers?</h3>
<p>This is <em>highly</em> speculative (in the sense that I’m not sure I know
what I’m talking about), but I think that the way quantum computers
represent things, it may be possible for those computers to accurately
represent real numbers. Of course, even if true, that only buys us
one additional level of infinity over classical computers. Any sets
that are “bigger” than <script type="math/tex">\R</script> will fail to be represented even on
quantum computers.</p>
<p>Why do I think that quantum computers can represent real numbers?
Well, as opposed to a classical bit, which is either 0 or 1, a qubit
(a bit on a quantum computer) is in some superposition of 0 and 1.
One way to think about that is that there is some probability <script type="math/tex">p</script>
that, when measured, the qubit will be a 0 and probability <script type="math/tex">(1-p)</script>
that it will be a 1. So, in some sense, I need a real number, <script type="math/tex">p</script>,
for each qubit to represent the state of the computer at any given
time. In fact, you need more than that due to entanglement
(correlation) between qubits, but that just strengthens my argument.</p>

<h2>Introducing the stats series</h2>
<p class="post-meta">2020-01-01 · http://blog.russelldmatt.com/2020/01/01/introducing-stats</p>
<p>This is a short post just to introduce the upcoming <strong>stats</strong>
series. A series is just a set of related blog posts that will likely
build on each other.</p>
<p>The goal of this series is to build a strong foundation for
understanding some very basic statistical methods and tests
(e.g. linear regression, t-test, etc.).</p>
<p>I will try to keep the following list up to date. In addition, I will
tag all posts in this series with both the <code class="language-plaintext highlighter-rouge">stats</code> tag as well as the
<code class="language-plaintext highlighter-rouge">series</code> tag.</p>
<ul>
<li>
<a href="/2020/01/01/remember-linear-regression.html">How to remember linear regression</a>
<span class="post-meta">Jan 1, 2020</span>
</li>
<li>
<a href="/2020/01/01/introducing-stats.html">Introducing the stats series</a>
<span class="post-meta">Jan 1, 2020</span>
</li>
</ul>

<h2>How to remember linear regression</h2>
<p class="post-meta">2020-01-01 · http://blog.russelldmatt.com/2020/01/01/remember-linear-regression</p>
<p>Linear regression is one of the most useful tools in statistics, but
the formula is a little hard to remember. If you’re trying to find
the “best fit” <script type="math/tex">x</script> in the equation <script type="math/tex">Ax \approx b</script>, here is the
solution:</p>
<script type="math/tex; mode=display">x = (A^T A)^{-1} A^T b</script>
<p>If you’re expecting me to be able to produce that formula from
memory… don’t hold your breath.</p>
<p>However, if you understand what a linear regression <em>is</em>, then
re-deriving this formula is actually shockingly easy. And,
importantly, remembering “what a linear regression is” is much easier
than remembering some (relatively) complicated formula. I suspect
this is often true - that remembering how to derive a formula from
simple ideas is easier than remembering the formula itself.</p>
<div class="aside">
<p>An aside: I often struggle with finding the appropriate level at which
to target my explanations. Ideally, I’d like to assume no previous
knowledge and explain things from scratch, but then the posts become
so long as to be useless. But if I explain things too tersely, the
only people following along are the people who already understand the
explanation! So, in this case, I’m going to write two versions: a
short version and a long one.</p>
</div>
<h3 id="what-is-a-linear-regression-the-short-version">What is a linear regression: the short version</h3>
<p>With real life (noisy) data, there won’t be an exact solution to the
equation <script type="math/tex">Ax = b</script>. Put another way, <script type="math/tex">b</script> does not live in the
column space of <script type="math/tex">A</script>. We need to find a vector which does live in the
column space of <script type="math/tex">A</script> and which minimizes the squared errors between
itself and <script type="math/tex">b</script>. Let’s call this vector <script type="math/tex">b^*</script>.</p>
<p>What are the errors between <script type="math/tex">b^*</script> and <script type="math/tex">b</script>? Simply <script type="math/tex">b - b^*</script>,
which is itself another vector. Let’s call this <script type="math/tex">\epsilon</script> (for
errors). We don’t care about the errors themselves as much as the sum
of the squared errors, which is just the squared length of <script type="math/tex">\epsilon</script>.</p>
<p>To recap, we want to find the <script type="math/tex">b^*</script> that lives in the column space
of <script type="math/tex">A</script> and which minimizes the length of <script type="math/tex">\epsilon</script>. Note that
our three vectors form a triangle, i.e. <script type="math/tex">b^* + \epsilon = b</script>. At
this point, the solution might become clear. If we take <script type="math/tex">b^*</script> to be
the projection of <script type="math/tex">b</script> onto the column space of <script type="math/tex">A</script>, then <script type="math/tex">b^*</script>
and <script type="math/tex">\epsilon</script> will form a <em>right</em> triangle with hypotenuse <script type="math/tex">b</script>,
and that will minimize the length of <script type="math/tex">\epsilon</script>.</p>
<p>If that’s not obvious, consider a line <script type="math/tex">A</script> and a point <script type="math/tex">b</script> which
is not already on the line. What’s the minimum distance from <script type="math/tex">b</script> to
<script type="math/tex">A</script>? It’s the length of the segment which connects <script type="math/tex">b</script> to <script type="math/tex">A</script>
and which is perpendicular to <script type="math/tex">A</script>, i.e. it’s the segment between <script type="math/tex">b</script>
and <script type="math/tex">b</script>’s projection onto <script type="math/tex">A</script>.</p>
<p>That’s all you have to remember in order to derive the formula for
linear regression.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
Ax &= b^* \tag{1} \\
A \bot (b - b^*) \tag{2} \\
A \bot (b - Ax) \tag{substitute 1 into 2} \\
A^T (b - Ax) &= 0 \tag{the definition of perpendicular} \\
A^T b - A^TAx &= 0 \\
A^T b &= A^TAx \\
(A^TA)^{-1} A^T b &= x \tag*{$\square$} \\
\end{align*} %]]></script>
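<p>Here’s a quick numerical check of the derivation, a sketch using numpy with random data (the variable names mirror the math above): the residual really is perpendicular to the columns of <script type="math/tex">A</script>, and the formula agrees with numpy’s built-in least-squares solver.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))          # 100 observations, 3 columns
b = rng.normal(size=100)

# the formula derived above: x = (A^T A)^{-1} A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)

# the residual epsilon = b - Ax is perpendicular to every column of A,
# so A^T epsilon should be (numerically) zero
eps = b - A @ x
print(np.abs(A.T @ eps).max())

# and x matches numpy's built-in least-squares solver
x_np, *_ = np.linalg.lstsq(A, b, rcond=None)
```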
<h3 id="what-is-a-linear-regression-the-long-version">What is a linear regression: the long version</h3>
<p>In general, a linear regression is trying to find coefficients to a
linear equation that minimize the sum of the squared errors. For
example, let’s say you think there’s roughly a linear relationship
between the square footage of a house (sqft), the median price of all houses in that house’s neighborhood (medprice), and the price of the house (price), i.e.</p>
<script type="math/tex; mode=display">\mathrm{price} = c_2 (\mathrm{sqft}) + c_1 (\mathrm{medprice}) + c_0</script>
<p>Furthermore, you have some data. For each data point (house), you have the three relevant values (sqft, medprice, and price). We can organize this data into a single equation, <script type="math/tex">Ax=b</script>, using matrices where:</p>
<script type="math/tex; mode=display">% <![CDATA[
\overset{A}{
\begin{bmatrix}
\vert & \vert & \vert \\
\mathrm{sqft} & \mathrm{medprice} & 1 \\
\vert & \vert & \vert \\
\end{bmatrix}
}
\overset{x}{
\begin{bmatrix}
c_2 \\
c_1 \\
c_0
\end{bmatrix}
}
=
\overset{b}{
\begin{bmatrix}
\vert \\
\mathrm{price} \\
\vert \\
\end{bmatrix}
} %]]></script>
<p>Each row in <script type="math/tex">A</script> will contain the two predictor values (sqft and
medprice) for a given home, along with a constant 1 (to account for
the <script type="math/tex">c_0</script> in our linear equation) and the corresponding row in <script type="math/tex">b</script>
will have the response variable (price).</p>
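<p>In code, building that design matrix looks like this (the house numbers below are made up purely for illustration):</p>

```python
import numpy as np

# hypothetical data for five houses: square footage, neighborhood
# median price, and sale price (prices in thousands)
sqft     = np.array([1500., 2100., 800., 1200., 1800.])
medprice = np.array([300., 420., 250., 310., 390.])
price    = np.array([310., 450., 220., 305., 400.])

# each row of A holds one house's predictors plus a constant 1 for c0
A = np.column_stack([sqft, medprice, np.ones_like(sqft)])
b = price

# solve the normal equations for x = [c2, c1, c0]
x = np.linalg.solve(A.T @ A, A.T @ b)
```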
<p>Now, importantly, this equation will almost always have no solution. To understand why, notice that we are trying to find a linear combination of the three columns of <script type="math/tex">A</script> that equals the vector <script type="math/tex">b</script>. We haven’t yet specified how many data points we have, but for the sake of this part of the explanation let’s assume it’s 100. That means we have a vector <script type="math/tex">b</script> which lives in a 100-dimensional space. If that throws you for a loop, think about how a vector with 2 elements lives in the x-y coordinate plane - a 2-D space, while a vector with three elements lives in the x-y-z coordinate system - a 3-D space. So, the vector <script type="math/tex">b</script> - with 100 elements - lives in a 100-dimensional space. So do the columns of <script type="math/tex">A</script>. They are, after all, each vectors with 100 elements.</p>
<p>If you consider a single column of <script type="math/tex">A</script> by itself, and take all linear combinations of it (i.e. you scale it by any value), you will end up with a single line in that 100-dimensional space. We call that a 1-D subspace of the 100-dimensional space. If you consider two columns of A, and take all linear combinations of them, you will end up with a 2-D subspace (a plane through the origin) within that 100-dimensional space. And, probably obviously now, if you consider all three column vectors of A, and take all linear combinations of them, you will end up with a 3-D subspace of the 100-dimensional space. To throw some terminology at you, that 3-D subspace is <em>spanned</em> by the three column vectors of <script type="math/tex">A</script>, and it is called the <em>column space of A</em>.</p>
<p>Understanding how linear combinations of vectors span a space is critical, so I’ll include the following gif from 3Blue1Brown to help you understand it visually. Notice how by changing the coefficients <script type="math/tex">a</script> and <script type="math/tex">b</script>, their linear combination (<script type="math/tex">av + bw</script>) can point anywhere in the plane. That means they <em>span</em> the plane.</p>
<div style="display:flex; justify-content:center; margin: 20px 0px;">
<img src="
/assets/by-post/remember-linear-regression/linear-combination.gif" />
</div>
<p>I previously said that it’s unlikely that our equation <script type="math/tex">Ax = b</script> has a solution. Why is that? We can now understand that our equation only has a solution if the 100-dimensional vector <script type="math/tex">b</script> happens to lie within the 3-dimensional column space of <script type="math/tex">A</script>. That’s kind of like hoping that a bunch of points in 3-D space happen to fall exactly on a single (1-D) line (although a much more extreme version of that). It might happen, but with real data that likely has some noise in it, it’s extremely unlikely.</p>
<p>So, since there’s no solution, we can’t just solve the equation directly by computing <script type="math/tex">x = A^{-1}b</script>. In fact, why don’t we stop writing <script type="math/tex">Ax=b</script>, because that’s a little misleading given there’s no solution (it’s like writing <script type="math/tex">5x = 1</script> and <script type="math/tex">2x = 2</script>, solve for <script type="math/tex">x</script>). Instead, let’s write <script type="math/tex">Ax = b^*</script>, where we assert that this equation has a solution. In other words, the only candidates for <script type="math/tex">b^*</script> are the vectors in the column space of <script type="math/tex">A</script>.</p>
<p>The next step is to figure out which <script type="math/tex">b^*</script> is “best”. Linear
regression is defined as trying to minimize the squared errors, so we
want the <script type="math/tex">b^*</script> that minimizes the sum of the squares of the
elements of <script type="math/tex">b - b^*</script>. At this point, I’m going to <a href="#what-is-a-linear-regression-the-short-version">refer you to
the short version</a>.
I’ve hopefully filled in the relevant background information to make
that explanation accessible.</p>
<h3 id="quick-demonstration-that-it-works">Quick demonstration that it works</h3>
<iframe id="notebook" style="width: 800px; max-width: 100%; border: none;" src="
/assets/by-post/remember-linear-regression/notebook.html">
</iframe>
<script src="/assets/js/iframe.js"></script>
<script>
let notebook = document.getElementById("notebook");
autoAdjustIframeHeight(notebook);
</script>

<h2>Fractional Derivatives</h2>
<p class="post-meta">2019-11-27 · http://blog.russelldmatt.com/2019/11/27/fractional-derivatives</p>
<style>
hr {
margin: 20px 0px;
}
canvas, img {
box-shadow: 5px 5px 5px grey;
border: 1px solid grey;
width: 500px;
max-width: 100%;
height: 500px;
display: block;
margin: 30px auto 30px;
}
</style>
<p>What’s the <script type="math/tex">\frac{1}{3}</script>rd derivative of <script type="math/tex">sin(x)</script>?</p>
<p>What an absurd question - does it even make sense? I think so, but in order to build up some intuition let’s take a few steps back.</p>
<hr />
<p>Imagine that you lived in the middle ages and you were comfortable
with the concepts of addition and multiplication. You even understand
exponents, as a shorthand for repeated multiplication. Then someone
asks you, what’s <script type="math/tex">2^{\frac{1}{2}}</script>?</p>
<p>Nonsense, right? <script type="math/tex">2^3</script> means <script type="math/tex">2 \cdot 2 \cdot 2</script>. There are three twos. You
can’t have half a two.</p>
<p>Well, as I’m sure you know, yes - you can. But think about it for a
second. What does it mean? What does it mean to multiply by <script type="math/tex">x</script>
half a time?</p>
<p><script type="math/tex">x^n</script> <em>is</em> the number that you get when you multiply <script type="math/tex">x</script>
<script type="math/tex">n</script>-times. Thinking about it this way makes the property <script type="math/tex">x^a
\cdot x^b = x^{a+b}</script> obvious. If you multiply by <script type="math/tex">x</script>
<script type="math/tex">a</script> times, and then <script type="math/tex">b</script> more times, you’ve multiplied by <script type="math/tex">x</script>
<script type="math/tex">(a+b)</script> times. And that property is nice, because it makes sense
even when <script type="math/tex">n</script> is not an integer. If I do something <script type="math/tex">\frac{1}{2}</script> a
time and then I do it again <script type="math/tex">\frac{1}{2}</script> a time, how many times
have I done it? <script type="math/tex">1</script> time, right?</p>
<p>Which brings us to the (obvious because we already learned it) answer,
which is that <script type="math/tex">2^{\frac{1}{2}} \cdot 2^{\frac{1}{2}} = 2^1 = 2</script>,
i.e. <script type="math/tex">2^{\frac{1}{2}} = \sqrt{2}</script>.</p>
<hr />
<p>Let’s generalize a bit and talk about <em>repeated function application</em>.</p>
<p>Consider the function <script type="math/tex">f(x) = x + 10</script>. What’s <script type="math/tex">f(f(x))</script>? That’s
pretty easy:</p>
<script type="math/tex; mode=display">f^2(x) = f(f(x)) = f(x+10) = x+20</script>
<p>Ok, how about <script type="math/tex">f^{\frac{1}{2}}(x)</script>? Given the setup, I bet you can
figure it out. It’s some function that, when applied twice, gives us
<script type="math/tex">f(x)</script>. What might that be? <script type="math/tex">g(x) = x+5</script> seems like a good
guess.</p>
<p>Let’s check it:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
f^{\frac{1}{2}}(x) &= g(x) = x + 5 \\
g(g(x)) &= g(x+5) = x+10 = f(x)
\end{align*} %]]></script>
<p>Ok, how about another one? If <script type="math/tex">f(x) = 2x</script>, what’s
<script type="math/tex">f^{\frac{1}{2}}(x)</script>? Again, you can guess it. It’s <script type="math/tex">g(x) =
\sqrt{2}x</script>.</p>
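<p>Both guesses are easy to check in a few lines of Python: apply the candidate “half” function twice and compare with one application of the original.</p>

```python
# f(x) = 2x, with candidate half-application g(x) = sqrt(2) * x
def f(x): return 2 * x
def g(x): return (2 ** 0.5) * x

# applying g twice reproduces one application of f (up to rounding)
print(g(g(10.0)))   # close to 20

# same check for f(x) = x + 10 with g(x) = x + 5
def f1(x): return x + 10
def g1(x): return x + 5
print(g1(g1(7)))    # 17
```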
<p>Alright, now let’s level up. Previously we were dealing with
functions from a number to a number, but functions can take other
types of things too. How about a function <script type="math/tex">f</script> which takes, as
input, a function <script type="math/tex">h</script> and returns a new function? What does it do
to the function? Let’s start with something easy, like it shifts it
<script type="math/tex">10</script> to the right:</p>
<script type="math/tex; mode=display">f(h(x)) = h(x - 10)</script>
<p>Can we guess the answer for <script type="math/tex">f^{\frac{1}{2}}(x)</script>? I’m going to
go out on a limb and say yes. If you want to do something twice such
that the end result is shifting <script type="math/tex">10</script> to the right, shifting <script type="math/tex">5</script> to the
right each time will probably do the trick.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
f^{\frac{1}{2}}(h(x)) &= g(h(x)) = h(x-5) \\
g(g(h(x))) &= g(h(x-5)) = h(x - 10)
\end{align*} %]]></script>
<hr />
<p>Ok, now for the finale. What if our function takes the derivative of
the input function? In other words:</p>
<script type="math/tex; mode=display">f(h) = \frac{d}{dx}h</script>
<p>Eek… that is a bit harder.</p>
<p>Let’s take a quick detour and draw an analogy to linear algebra,
specifically eigenvectors. If you want to multiply a vector, <script type="math/tex">v</script>,
by a matrix, <script type="math/tex">M</script>, <script type="math/tex">n</script> times (where <script type="math/tex">n</script> is sufficiently large), a
fast way to do it is to follow these three steps:</p>
<ol>
<li>Compute the eigenvectors of the matrix <script type="math/tex">M</script>. These are the
vectors that, when multiplied by <script type="math/tex">M</script>, are just scaled by a
constant (the constant being the eigenvalue).</li>
<li>Decompose your vector into a linear combination (weighted sum) of those
eigenvectors.</li>
<li>Your answer is the linear combination of those eigenvectors, where
each eigenvector is first scaled by its eigenvalue to the <script type="math/tex">n</script>th
power.</li>
</ol>
<p>I tried to explain why this works in depth <a href="/2019/03/09/golden-fibonacci.html">here</a>, but the quick summary is that we
found special inputs (the eigenvectors) for which our function
(multiplication by <script type="math/tex">M</script>) was particularly easy to compute, and then we
reformulated our answer as a weighted sum of the
function applied to those special inputs (<script type="math/tex">n</script> times). In doing so,
we turned our somewhat hard problem into a much easier one.</p>
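<p>The three steps above can be sketched with numpy. As an example matrix I’ll use the 2×2 Fibonacci matrix (my choice here, purely for illustration); the eigen-decomposition route gives the same answer as multiplying by <script type="math/tex">M</script> thirty times.</p>

```python
import numpy as np

# a small example: the Fibonacci matrix
M = np.array([[1., 1.],
              [1., 0.]])
v = np.array([1., 0.])
n = 30

# step 1: eigenvalues and eigenvectors of M (eigenvectors as columns)
vals, vecs = np.linalg.eig(M)

# step 2: decompose v as a linear combination of the eigenvectors
coeffs = np.linalg.solve(vecs, v)

# step 3: scale each eigenvector's coefficient by its eigenvalue
# raised to the nth power, then recombine
fast = vecs @ (vals ** n * coeffs)

# compare against the slow way: multiplying by M thirty times
slow = np.linalg.matrix_power(M, n) @ v
```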
<p>One thing to mention is that this only works for <em>linear</em> functions,
i.e. functions <script type="math/tex">f</script> which have the following two properties:</p>
<ol>
<li>
<script type="math/tex; mode=display">f(u + v) = f(u) + f(v)</script>
</li>
<li>
<script type="math/tex; mode=display">f(\alpha u) = \alpha f(u) \tag{where $\alpha$ is a scalar}</script>
</li>
</ol>
<p>Does the derivative function have these properties? Actually yes:</p>
<ol>
<li>
<script type="math/tex; mode=display">\frac{d}{dx}(f + g) = \frac{d}{dx}(f) + \frac{d}{dx}(g)</script>
</li>
<li>
<script type="math/tex; mode=display">\frac{d}{dx}(\alpha f) = \alpha \frac{d}{dx}(f) \tag{where $\alpha$ is a scalar}</script>
</li>
</ol>
<p>The derivative is a <em>linear function</em> (often called a <em>linear
operator</em>). So, we can utilize the same trick.</p>
<p>Can you think of any functions whose derivative is equal
to the function itself (or, maybe, a scaled version of it)?</p>
<p>Yep, you bet: <script type="math/tex">\frac{d}{dx}(e^x) = e^x</script>, and
<script type="math/tex">\frac{d}{dx}(e^{\alpha x}) = \alpha e^{\alpha x}</script>.</p>
<p><script type="math/tex">e^{\alpha x}</script> is an <em>eigenfunction</em> of the derivative function.
How cool!</p>
<p>So, <em>if</em> we could represent our input function <script type="math/tex">h</script> as a weighted
sum of exponential functions, then we can trivially take the derivative
any number of times (where that number doesn’t have to be an integer).</p>
<p>Oh, what’s that you say? The Fourier transform can convert any
function into an integral (read: weighted sum) of complex exponential
functions (sometimes called complex sinusoids)?</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\hat{f}(\omega) &= \int_{-\infty}^{\infty} f(x) e^{-2 \pi i x \omega } dx \\
f(x) &= \int_{-\infty}^{\infty} \hat{f}(\omega) e^{2 \pi i x \omega } d\omega \\
\end{align*} %]]></script>
<p>So, we’ve rewritten our function as a weighted sum of eigenfunctions of the derivative operator. The weights are <script type="math/tex">\hat{f}(\omega)</script> and the eigenfunctions are <script type="math/tex">e^{2 \pi i \omega x}</script>. So, now we can trivially<sup id="fnref:1"><a href="#fn:1" class="footnote">1</a></sup> take the <script type="math/tex">n</script>th derivative:</p>
<script type="math/tex; mode=display">\frac{d^n}{dx^n} f(x) = \int_{-\infty}^{\infty} (2 \pi i \omega)^n \hat{f}(\omega) e^{2 \pi i x \omega } d\omega \\</script>
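<p>As a sanity check on this formula, here is a sketch using numpy’s FFT as a discretized stand-in for the continuous transform (the <script type="math/tex">\omega = 0</script> term is zeroed out explicitly to avoid raising zero to a fractional power): applying the half derivative twice should reproduce the ordinary first derivative.</p>

```python
import numpy as np

N = 512
x = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
f = np.sin(x) + 0.5 * np.cos(3 * x)    # any smooth periodic function

def frac_deriv(f, n):
    """nth derivative via the FFT; n need not be an integer."""
    f_hat = np.fft.fft(f)
    omega = np.fft.fftfreq(len(f), d=x[1] - x[0])  # cycles per unit
    mult = np.zeros(len(f), dtype=complex)
    mult[1:] = (2j * np.pi * omega[1:]) ** n   # skip omega = 0
    return np.fft.ifft(mult * f_hat).real

# applying the half derivative twice gives the ordinary derivative
half_twice = frac_deriv(frac_deriv(f, 0.5), 0.5)
first = np.cos(x) - 1.5 * np.sin(3 * x)
print(np.abs(half_twice - first).max())   # close to zero
```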
<hr />
<p>At this point, we’ve solved how to take the <script type="math/tex">n</script>th derivative in the
general case, but we haven’t technically answered our original
question: what’s the <script type="math/tex">\frac{1}{3}</script>rd derivative of <script type="math/tex">sin(x)</script>?</p>
<p>Lucky for us, the fourier transform of <script type="math/tex">sin(x)</script> is quite simple. To get a handle on it, let’s first graph <script type="math/tex">f(t) = e^{it}</script>. Unfortunately, since <script type="math/tex">e^{it}</script> is a complex number for a given <script type="math/tex">t</script>, in order to graph the function for a range of <script type="math/tex">t</script> values I’d need 3 dimensions. So, instead, I’ll graph <script type="math/tex">e^{it}</script> as a function of time (time will be my 3rd dimension).</p>
<div id="sketch1"></div>
<p>So that’s a single complex exponential function. What if we add one more
which rotates at exactly the same rate but in the opposite direction,
and then add the two values together?</p>
<div id="sketch2"></div>
<p>The imaginary (vertical) components cancel each other out perfectly
and all we’re left with is a real number, which is twice a <script type="math/tex">sin</script> curve.</p>
<p>Analytically,</p>
<script type="math/tex; mode=display">f(x) = sin(x) = \frac{1}{2} (-i e^{ix} + i e^{-ix})</script>
<p>Why multiply <script type="math/tex">e^{ix}</script> by <script type="math/tex">-i</script> and <script type="math/tex">e^{-ix}</script> by <script type="math/tex">i</script>? Since
<script type="math/tex">sin</script> starts at 0, I want the counter-clockwise complex exponential
(<script type="math/tex">e^{ix}</script>) to start out pointing down (and multiplication by <script type="math/tex">-i</script>
will rotate it clockwise by <script type="math/tex">\pi/2</script>). Similarly, I want the
clockwise one (<script type="math/tex">e^{-ix}</script>) to start out pointing up (and multiplying
by <script type="math/tex">i</script> will do that).</p>
<p>Let’s test our function for a few values of <script type="math/tex">x</script>:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
sin(0) &= \frac{1}{2} (-i e^{i0} + i e^{-i0}) = \frac{1}{2} (-i + i) = 0 \\
sin(\pi/2) &= \frac{1}{2} (-i e^{i\pi/2} + i e^{-i\pi/2}) = \frac{1}{2} (-i \cdot i + i \cdot -i) = \frac{1}{2} (1 + 1) = 1 \\
sin(\pi) &= \frac{1}{2} (-i e^{i\pi} + i e^{-i\pi}) = \frac{1}{2} (-i \cdot -1 + i \cdot -1) = \frac{1}{2} (i + -i) = 0 \\
sin(3\pi/2) &= \frac{1}{2} (-i e^{i3\pi/2} + i e^{-i3\pi/2}) = \frac{1}{2} (-i \cdot -i + i \cdot i) = \frac{1}{2} (-1 + -1) = -1 \\
\end{align*} %]]></script>
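<p>If you’d rather not trust four hand-picked points, here’s a quick numerical spot check (a sketch using numpy; the sample points are arbitrary):</p>

```python
import numpy as np

# Check sin(x) = (1/2)(-i e^{ix} + i e^{-ix}) at many points at once.
x = np.linspace(-10, 10, 1001)
f = 0.5 * (-1j * np.exp(1j * x) + 1j * np.exp(-1j * x))

assert np.allclose(f.real, np.sin(x))  # the real part is exactly sin(x)
assert np.allclose(f.imag, 0)          # the imaginary parts cancel
```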
<p>So far so good. How about its derivative?</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\frac{d}{dx}sin(x)
&= \frac{d}{dx} \big( \frac{1}{2} (-i e^{ix} + i e^{-ix}) \big) \\
&= \frac{1}{2} (e^{ix} + e^{-ix})
\end{align*} %]]></script>
<p>Well, we know what it <em>should</em> come out to, <script type="math/tex">cos(x)</script>. Does it?</p>
<p>Yes, and here’s one way to think about it (you could also plug in a few values of <script type="math/tex">x</script> to really convince yourself). The form of this equation looks similar to the form of our equation for <script type="math/tex">sin(x)</script>, except that the two complex exponential functions aren’t multiplied by <script type="math/tex">-i</script> and <script type="math/tex">i</script>, respectively. That just means they both start out pointing directly to the right, instead of one pointing down and one pointing up like in the <script type="math/tex">sin(x)</script> case. You can look at the animation above and verify for yourself that if you start watching when the red and blue components are both pointing right, the graph looks like a <script type="math/tex">cos(x)</script> curve.</p>
<p>What this also makes apparent, though, is that <script type="math/tex">cos(x)</script> and <script type="math/tex">sin(x)</script> are generated by the same process; <script type="math/tex">cos(x)</script> is just <script type="math/tex">\pi/2</script> “ahead” of <script type="math/tex">sin(x)</script>. This probably sounds familiar - that <script type="math/tex">sin(x + \pi/2)</script> and <script type="math/tex">cos(x)</script> are the same thing. One easy way to prove this to yourself is to consider the fact that <script type="math/tex">cos(a) = sin(b)</script> in the right triangle below (on the left) and that <script type="math/tex">b = a + \pi/2</script>.</p>
<div style="text-align: center;">
<img src="
/assets/by-post/fractional-derivatives/circle.jpg" style="max-width: 400px; max-height: 400px;" />
</div>
<p>Ok, this is all interesting, but let’s solve the problem.</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\newcommand{\d}{\frac{d^{1/3}}{dx^{1/3}}}
\d sin(x)
&= \d \big( \frac{1}{2} (-i e^{ix} + i e^{-ix}) \big) \\
&= \frac{1}{2} (-i \cdot i^{1/3} e^{ix} + i \cdot (-i)^{1/3} e^{-ix}) \\
&= \frac{1}{2} (-i \cdot e^{i \frac{\pi/2}{3}} \cdot e^{ix} + i \cdot e^{i \frac{-\pi/2}{3}} \cdot e^{-ix})
\tag{using the fact that $i = e^{i\pi/2}$} \\
&= \frac{1}{2} (-i e^{i(x + \pi/6)} + i e^{-i(x + \pi/6)}) \\
&= sin(x + \pi/6) \\
\end{align*} %]]></script>
<p>And, in general:</p>
<script type="math/tex; mode=display">\frac{d^n}{dx^n}sin(x) = sin(x + n \pi/2)</script>
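<p>As a sanity check of this general formula, here’s a sketch (using numpy) that applies the i^n and (-i)^n factors from the derivation above for a few fractional and negative values of n, including the -1/π case from the end of the post:</p>

```python
import numpy as np

# d^n/dx^n sin(x) should equal sin(x + n*pi/2) for any real n.
x = np.linspace(0, 2 * np.pi, 201)
for n in [1, 2, 1/3, -1, -1/np.pi]:
    # apply i^n to e^{ix} and (-i)^n to e^{-ix} (principal branch)
    d = 0.5 * (-1j * (1j) ** n * np.exp(1j * x)
               + 1j * (-1j) ** n * np.exp(-1j * x))
    assert np.allclose(d.real, np.sin(x + n * np.pi / 2))
    assert np.allclose(d.imag, 0)  # the result is purely real
```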
<hr />
<p>Ok, one last thing (I promise!). We’ve been focusing on fractional
derivatives, but how about negative ones? We have a general formula
in terms of <script type="math/tex">n</script>, is there anything wrong with taking the derivative “-1”
times? Nope! That should just correspond to taking the
anti-derivative.</p>
<p>So, in conclusion, the <script type="math/tex">\frac{-1}{\pi}</script>th derivative of <script type="math/tex">sin(x)</script>
is (obviously) <script type="math/tex">sin(x - 1/2)</script>.</p>
<script src="/assets/js/p5/0.8.0/p5.js"></script>
<script src="/assets/js/p5/p5.clickable.js"></script>
<script src="
/assets/by-post/fractional-derivatives/sketch.js"></script>
<script>
new p5(one, 'sketch1');
new p5(two, 'sketch2');
// new p5(three, 'sketch3');
</script>
<div class="footnotes">
<ol>
<li id="fn:1">
<p>Note this is using the mathematician’s definition of trivial, i.e. “theoretically possible” <a href="#fnref:1" class="reversefootnote">↩</a></p>
</li>
</ol>
</div>Rule of 722019-11-25T00:00:00+00:002019-11-25T00:00:00+00:00http://blog.russelldmatt.com/2019/11/25/rule-of-72<p>Here’s a handy rule of thumb for calculating compound interest:</p>
<div class="like-blockquote">
<p>If you want to know how many years it will take your money to double,
if it grows at a yearly rate of <em>r</em>, just divide 72 by <em>r</em>.</p>
</div>
<p>For example, how long would it take for your money to double if it
grew at a yearly rate of 5%? <script type="math/tex">72/5 = 14.4</script> years. And, sure
enough, <script type="math/tex">1.05^{14.4} = 2.01</script>!</p>
<p>So let’s say you’re trying to figure out how much money you’ll have
saved for retirement if you save $100,000 now and it grows at an
annual rate of 5% for the next 30 years. It will double every 14.4
years, so in 28.8 years it will have doubled twice, leaving you with a little
more than $400,000. How’d we do? <script type="math/tex">100,000 * 1.05^{30} = 432,194</script>.
Pretty good for something you can do in your head!</p>
<p>Probably obvious, but this trick can also convert between a “doubling
time” and an interest rate. If I tell you that your money will double
in 10 years, you know the interest rate is about <script type="math/tex">72/10 = 7.2\%</script>.
And, sure enough, <script type="math/tex">1.072^{10} = 2.004</script>. Very close!</p>
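<p>A quick way to see how good the rule is across a range of rates (a sketch in Python; the list of rates is an arbitrary choice):</p>

```python
import math

# Compare the rule-of-72 estimate of doubling time with the exact
# answer, ln(2) / ln(1 + r), for a range of interest rates.
for pct in [1, 2, 5, 8, 10, 15]:
    r = pct / 100
    exact = math.log(2) / math.log(1 + r)
    rule = 72 / pct
    print(f"{pct:>2}%: exact {exact:6.2f} years, rule of 72 {rule:6.2f} years")
```

The rule is nearly exact around 8% and drifts a bit at the extremes, which the derivation below explains.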
<h3 id="how-does-it-work">How does it work?</h3>
<p>What’s the equation we’re trying to solve?</p>
<script type="math/tex; mode=display">(1+r)^y = 2</script>
<ul>
<li>r is the interest rate</li>
<li>y is the doubling time (the number of years it takes to double)</li>
<li>and 2 is because we want our money to double</li>
</ul>
<p>We need to turn an exponent into multiplication/division, which
usually means taking the log of both sides. Let’s try it:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
(1+r)^y &= 2 \\
ln \big( (1+r)^y \big) &= ln(2) \\
y \cdot ln(1+r) &= ln(2) \\
\end{align*} %]]></script>
<p>So far, everything we’ve done is exact. Now it’s time to make a few approximations:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
ln(2) &\approx 0.693 \\
ln(1+r) &\approx r \tag{for small r} \\
\end{align*} %]]></script>
<p>The first one is trivial, you can just check it with a calculator. Why, though, is the second one true?</p>
<p>You can think about it this way. <script type="math/tex">e^0 = 1</script>. And the derivative of <script type="math/tex">e^x</script> at 0 is also 1. So, if you zoom in really close around 0, it looks like a straight line with a y-intercept of 1 and a slope of 1. Which means, for small values of r, <script type="math/tex">1+r \approx e^r</script>. And if we take the natural log of both sides, we get <script type="math/tex">ln(1+r) \approx r</script>.</p>
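<p>A quick numeric check of that tangent-line approximation (a sketch; the sample rates are arbitrary):</p>

```python
import math

# ln(1+r) is close to r for small r, and r slightly overestimates it.
for r in [0.01, 0.05, 0.08, 0.15]:
    print(f"r={r:.2f}: ln(1+r)={math.log(1 + r):.5f}, "
          f"r/ln(1+r)={r / math.log(1 + r):.4f}")
```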
<p>So, using what we have so far, we can say:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
y \cdot ln(1+r) &= ln(2) \\
y \cdot r &\approx 0.693 \\
y &\approx \frac{0.693}{r}
\end{align*} %]]></script>
<p>This works just fine, especially for really small values of r. For example, how long would it take for your money to double at a 1% interest rate? 69.3 years right? <script type="math/tex">1.01^{69.3} = 1.99</script>. Close!</p>
<p>So where does 72 come from?</p>
<p>Well, <script type="math/tex">ln(1+r) \approx r</script> gets to be a worse approximation as <script type="math/tex">r</script> gets large. In particular, <script type="math/tex">r</script> is an overestimate for <script type="math/tex">ln(1+r)</script>.</p>
<p><img src=" /assets/by-post/rule-of-72/ln_one_plus_r_approx.png" /></p>
<p>So, when we divide by <script type="math/tex">r</script> in <script type="math/tex">\frac{0.693}{r}</script>, we’re dividing by something that’s too large. For “normal” interest rate values - say, 8% - <script type="math/tex">r</script> is about 4% bigger than <script type="math/tex">ln(1+r)</script>. So, to adjust for that fact, we can just make the numerator 4% bigger as well. What’s <script type="math/tex">0.693 * 1.04</script>? 0.72!</p>Here’s a handy rule of thumb for calculating compound interest:N-spheres2019-11-22T00:00:00+00:002019-11-22T00:00:00+00:00http://blog.russelldmatt.com/2019/11/22/n-sphere<p>What’s the formula for the volume of a 4-dimensional sphere? If you have that one, can you come up with a formula for the volume of an n-dimensional sphere?</p>
<p>Don’t look it up! It’s a good problem. I highly encourage you to
work on it before looking at my solution.</p>
<p>More specifically, try to come up with an equation which relates the
volume of an n-dimensional sphere to an (n-1) dimensional sphere. You
may not be able to analytically evaluate your equation yourself (I
wasn’t), but it should be something that a computer could solve.</p>
<div onclick="showSolution()" style="cursor: pointer; font-weight: bold;">
Click to show solution
</div>
<p><br /></p>
<hr />
<p><br /></p>
<div id="solution" style="position: relative;">
<div class="blur-blocker" id="blocker"></div>
<p>I took the calculus approach and modeled an n-dimensional sphere as an integral of (n-1)-dimensional spheres. I denoted the volume of an n-dimensional sphere as <script type="math/tex">f_n(r)</script>. For a 3-dimensional sphere, my approach corresponds to following picture:</p>
<div style="text-align: center;">
<img src="
/assets/by-post/n-sphere/n-sphere.jpg" style="max-height: 500px; margin: 15px 0px;" />
</div>
<p>More generally, my recursive formula is the following system of equations:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
f_n(r) &= 2 \int_0^r f_{n-1}(x) dh \\
x &= r \sin(\theta) \\
h &= r \cos(\theta) \\
dh &= -r \sin(\theta) d\theta
\end{align*} %]]></script>
<p>which, if you substitute, you get:</p>
<script type="math/tex; mode=display">f_n(r) = 2 \int_{\pi/2}^0 f_{n-1}(r\sin(\theta)) (-r \sin(\theta)) d\theta</script>
<p>Plugging this integral into python (using sympy), and starting with the formula for the “volume” of a “0-dimensional sphere”, i.e. a point, I was able to recursively derive the formulas I recognized for a circle and a sphere and beyond!</p>
<p>Note the formula for the “volume” of a 0-dimensional sphere (point) is <script type="math/tex">f_0(r) = r^0 = 1</script>.</p>
<iframe id="notebook" style="width: 800px; max-width: 100%; border: none;" src="
/assets/by-post/n-sphere/n-sphere.html">
</iframe>
<p>I tried to find the pattern in these formulas to come up with the closed formula solution, but in the end I gave up and looked at wikipedia, which of course has the solution. I’m not shocked I didn’t find the pattern, it’s non-trivial.</p>
<hr />
<p>Quick note: The following approach might be a bit more straightforward and also works. However, I couldn’t solve the integral by hand to obtain the (known) formula for a 3-dimensional sphere - and that was how I was checking my work - which is why I went with the approach above.</p>
<script type="math/tex; mode=display">f_n(r) = 2 \int_0^r f_{n-1} \Big( \sqrt{r^2 - x^2} \Big) dx</script>
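<p>For reference, here’s a rough sketch (using sympy) of the kind of recursion the notebook above performs, starting from the 0-dimensional base case and using the trigonometric-substitution integral; the variable names are my own:</p>

```python
import sympy as sp

r, theta = sp.symbols('r theta', positive=True)

# base case: the "volume" of a 0-dimensional sphere (a point) is 1
f = sp.Integer(1)
for n in range(1, 5):
    # f_n(r) = 2 * integral from pi/2 to 0 of f_{n-1}(r sin t) * (-r sin t) dt
    integrand = f.subs(r, r * sp.sin(theta)) * (-r * sp.sin(theta))
    f = sp.simplify(2 * sp.integrate(integrand, (theta, sp.pi / 2, 0)))
    print(n, f)  # 2*r, pi*r**2, 4*pi*r**3/3, pi**2*r**4/2
```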
<div>
<script src="/assets/js/iframe.js"></script>
<script>
let notebook = document.getElementById("notebook");
autoAdjustIframeHeight(notebook);
let isBlocked = true
function showSolution() {
if (isBlocked) {
document.getElementById("blocker").style.display = "none";
isBlocked = false
} else {
document.getElementById("blocker").style.display = "block";
isBlocked = true
}
}
</script>
</div>
</div>What’s the formula for the volume of a 4-dimensional sphere? If you have that one, can you come up with a formula for the volume of an n-dimensional sphere?The Metric Tensor2019-10-29T00:00:00+00:002019-10-29T00:00:00+00:00http://blog.russelldmatt.com/2019/10/29/the-metric-tensor<div style="display: none;">
<p><script type="math/tex">% <![CDATA[
\newcommand{\vec}[2]{\left[\begin{matrix}#1\\#2\end{matrix}\right]}
\newcommand{\vv}[1]{\overrightarrow{#1}}
\newcommand{\norm}[1]{\lVert#1\rVert}
\newcommand{\mat}[4]{\left[\begin{matrix}#1 & #3\\#2 & #4\end{matrix}\right]} %]]></script></p>
</div>
<p>In the last post, I tried to explain what a tensor is. It’s
complicated; it’s a long post. But what I didn’t tackle is the why.
Why do we care about this generalization of vectors and matrices?</p>
<p>To be honest, I mostly don’t know yet. My hope is to actually learn
the math behind general relativity at some point, and my current
understanding is that tensors are part of that math. However, I do
have one interesting point to make.</p>
<p>What is the dot product of a vector with itself? It’s the length squared, right?</p>
<p>Take, for instance, the vector <script type="math/tex">\vv{v} = [3, 4]</script> (with length 5):</p>
<script type="math/tex; mode=display">\vec{3}{4} \cdot \vec{3}{4} = 3 \cdot 3 + 4 \cdot 4 = 25</script>
<p>Right, of course this works. We’ve just reformulated the Pythagorean
theorem in a linear-algebra sort of way.</p>
<p>But wait, something is odd here. In the last post, we made a big deal
about how <em>covectors</em> were different than <em>vectors</em>. <em>covectors</em> were
functions from vectors to scalars, not vectors. What does it even
mean, then, to multiply two vectors together? In programming terms,
it’s like we’ve made a type error.</p>
<p>If we wanted to construct a (multi-linear) function from 2 vectors to
a scalar, as we seem to want when taking the dot product of 2 vectors,
we’d need a (0, 2)-tensor. Recall that an (n, m)-tensor is a
multi-linear function from m vectors and n covectors to a scalar.</p>
<p>That’s actually correct, and the (0, 2)-tensor that we want is called
<em>the metric tensor</em>. To see why, let’s change our basis from the
standard orthonormal basis to something else.</p>
<p>Let’s use a new basis of <script type="math/tex">\vv{e_1} = [4, 4]</script> and <script type="math/tex">\vv{e_2} = [-1, 0]</script>. What are the coordinates of the
vector <script type="math/tex">\vv{v}</script> in the new basis? It looks like <script type="math/tex">[1, 1]</script> will do the trick.
How convenient.</p>
<p>Ok, so what’s the length of <script type="math/tex">\vv{v}</script> now? It’s the same! The length of a
vector does not depend on the coordinate system.</p>
<p>Right, right, what I meant was, how do we compute the length of the
vector now? Dot product right?</p>
<script type="math/tex; mode=display">\vec{1}{1} \cdot \vec{1}{1} = 1 \cdot 1 + 1 \cdot 1 = 2</script>
<p>Uh… that’s not right. No, of course that doesn’t work. The length
of the vector has to depend on the length of the basis vectors. What
I meant was to first scale each coordinate by the length of the appropriate basis vector before doing the multiplication. Something like this:</p>
<script type="math/tex; mode=display">\vec{1}{1} \cdot \vec{1}{1} =
(1 \cdot \norm{\vv{e_1}}) \cdot (1 \cdot \norm{\vv{e_1}})
+
(1 \cdot \norm{\vv{e_2}}) \cdot (1 \cdot \norm{\vv{e_2}})
= 1 \cdot 32 + 1 \cdot 1 = 33</script>
<p>Hmm, yea not that either. I guess I’m still trying to use the
Pythagorean theorem, but my triangle is not a right triangle anymore.
I’m making a triangle with one basis vector <script type="math/tex">\vv{e_1} = [4, 4]</script> and
one basis vector <script type="math/tex">\vv{e_2} = [-1, 0]</script>, but those vectors aren’t
orthogonal.</p>
<p>All this would be much more clear with a picture:</p>
<div style="text-align: center;">
<img src="
/assets/by-post/the-metric-tensor/v.jpg" style="width: 400px; margin-bottom: 20px;" />
</div>
<p>So maybe law of cosines? <script type="math/tex">c^2 = a^2 + b^2 - 2ab\cos{C}</script>? Actually
yes, that’s exactly right, but let me show you another way.</p>
<p>Like I said before, what we want is called <em>the metric tensor</em>.</p>
<script type="math/tex; mode=display">[[
{\vv{e_1}\cdot\vv{e_1}},
{\vv{e_2}\cdot\vv{e_1}}
],
[
{\vv{e_1}\cdot\vv{e_2}},
{\vv{e_2}\cdot\vv{e_2}}
]]</script>
<p>I wrote it out that way, as a row of row-vectors, on purpose. The
metric tensor is a (0, 2)-tensor, meaning it’s a function from two
vectors to a scalar, and a row of row vectors has the right
dimensionality for that multiplication. Let’s try it out with our new
basis:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\vv{e_1} \cdot \vv{e_1} &= 32 \\
\vv{e_1} \cdot \vv{e_2} &= -4 \\
\vv{e_2} \cdot \vv{e_2} &= 1 \\
\end{align*} %]]></script>
<p>So, our metric tensor is:</p>
<script type="math/tex; mode=display">[[32, -4], [-4, 1]]</script>
<p>Let’s multiply it by our vector <script type="math/tex">v = [1, 1]</script>:</p>
<script type="math/tex; mode=display">[[32, -4], [-4, 1]] \vec{1}{1} = [28, -3]</script>
<p>And again?</p>
<script type="math/tex; mode=display">[28, -3] \vec{1}{1} = 25</script>
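<p>The same two-step multiplication, sketched in numpy (the variable names are mine):</p>

```python
import numpy as np

e1, e2 = np.array([4, 4]), np.array([-1, 0])   # the new basis vectors
G = np.array([[e1 @ e1, e2 @ e1],
              [e1 @ e2, e2 @ e2]])             # the metric tensor: [[32, -4], [-4, 1]]
v = np.array([1, 1])                           # coordinates of [3, 4] in the new basis

assert (G @ v).tolist() == [28, -3]            # first multiplication
assert v @ G @ v == 25                         # second: length squared, as hoped
```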
<p>It works! So, why haven’t we ever heard of this thing before? Well, let’s write out the metric tensor in the standard, orthonormal basis:</p>
<script type="math/tex; mode=display">% <![CDATA[
\begin{align*}
\vv{b_1} &= [1, 0] \\
\vv{b_2} &= [0, 1] \\
\vv{b_1} \cdot \vv{b_1} &= 1 \\
\vv{b_1} \cdot \vv{b_2} &= 0 \\
\vv{b_2} \cdot \vv{b_2} &= 1 \\
\end{align*} %]]></script>
<p>So, the metric tensor, in an orthonormal basis, is the identity function:</p>
<script type="math/tex; mode=display">[[1, 0], [0, 1]]</script>
<p>which is why ignoring it, and treating vectors and covectors interchangeably, is usually fine.</p>What is a Tensor?2019-10-28T00:00:00+00:002019-10-28T00:00:00+00:00http://blog.russelldmatt.com/2019/10/28/what-is-a-tensor<p>I just completed the very good youtube playlist <a href="https://www.youtube.com/playlist?list=PLJHszsWbB6hrkmmq57lX8BV-o-YIOFsiG">Tensors for Beginners</a> by eigenchris and I want to jot down some notes before I forget everything.</p>
<p><em>An (n, m)-tensor is a multi-linear function from m vectors and n covectors to a scalar.</em></p>
<p>A tensor is a “geometrical object” in the same way that a vector is a “geometrical object” (and a vector is a tensor, so it really is in the same way). We often deal with the coordinates of a vector, which assumes a particular basis. But the exact same vector will have different coordinates if we change the basis. So, the vector itself is “invariant” under a change of basis, but the coordinates are not. However, the coordinates change in a predictable way under a change of basis. All the same is true for tensors (again, vectors <em>are</em> tensors).</p>
<p><em>Covectors</em> are a new “type of thing”. They’re functions from a vector to a scalar. One concrete way to think about them is that they’re “row vectors”. If you multiply a row vector by a vector, you get a scalar.</p>
<p><em>Tensor product</em>: So, a covector * vector = scalar. But a vector * covector = matrix. The latter is an example of a tensor product. More generally, a tensor product takes the cartesian product of the inputs, and for each ordered pair, you multiply the elements. So in the simple case of an n-dimensional vector v and an m-dimensional covector c, the tensor product v ⊗ c would have (n x m) dimensions, i.e. it can be represented by an (n x m) matrix! Think about each element of that matrix; the (i, j)th element is the product of the ith element of v and the jth element of c. So, you can see concretely what I mean by “the tensor product takes the cartesian product of the inputs, and for each ordered pair, you multiply the elements”.</p>
<p>Back to “what is a tensor”. A simple (n, m)-tensor can be constructed by the tensor product of n vectors and m covectors. Again, let’s think about a matrix. We just said that a matrix can be constructed via the tensor product of a vector and a covector. So, I guess that means a matrix is a (1, 1)-tensor! So, why did I say “simple” in “A <em>simple</em> (n, m)-tensor …”? Think about the set of matrices you can construct by multiplying a vector v * a row vector c. What’s their rank? Rank 1, of course! Every column is a scaled version of every other column, since all the columns are just scaled versions of v (the jth column is v * c[j]). Same goes for rows; each row is a scaled version of c (the ith row is v[i] * c). A rank 1 matrix is a very boring matrix indeed. If you think about a matrix as a function from vector -> vector (since, when you multiply a matrix by a vector you get a vector), all the output vectors lie on the same line (and that line points in the same direction as v). So, if these are 2-dimensional vectors, the rank 1 matrix will project all 2-dimensional vectors onto a line. Slight tangent, but this corresponds to having a zero determinant, having a zero eigenvalue, and being non-invertible.</p>
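<p>A tiny numpy sketch of that rank-1 claim (the example vectors are arbitrary):</p>

```python
import numpy as np

v = np.array([2, 5])        # a vector
c = np.array([3, -1, 4])    # a covector, thought of as a row vector
M = np.outer(v, c)          # the tensor product v ⊗ c, a (2 x 3) matrix

assert np.linalg.matrix_rank(M) == 1        # always rank 1
assert np.array_equal(M[:, 2], v * c[2])    # the jth column is v * c[j]
assert np.array_equal(M[0, :], v[0] * c)    # the ith row is v[i] * c
```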
<p>So, are all tensors simple and uninteresting in the same way? No, tensors form a vector space, meaning that they can be scaled and added to each other, and the output will be another tensor. To create more interesting tensors, you can take linear combinations of simple tensors. Again, let’s make an analogy to something familiar: vectors. Any vector can be thought of as a linear combination of a set of “basis vectors” (and that’s how we get the vector’s coordinates). In 2-d space, using the standard basis, the two basis vectors are [0,1] and [1,0]. Every other vector is a linear combination of those two “simple” vectors. Tensors work the same way. In fact, if you start with a n-dimensional vector space (with n basis vectors) and a m-dimensional covector space (with m basis covectors), you can construct (n x m) basis (1, 1)-tensors by taking the tensor product of each of the n basis vectors with each of the m basis covectors.</p>
<p>To make that more concrete, let’s say n = 2 and m = 3 and let’s use the standard basis. You can construct the following 6 basis (1, 1)-tensors:</p>
<script type="math/tex; mode=display">% <![CDATA[
\newcommand{\vec}[2]{\left[\begin{matrix}#1\\#2\end{matrix}\right]}
\newcommand{\covec}[3]{\left[\begin{matrix}#1 & #2 & #3\end{matrix}\right]}
\newcommand{\mat}[6]{\left[\begin{matrix}#1 & #3 & #5 \\ #2 & #4 & #6\end{matrix}\right]}
\newcommand{\VS}{V^*}
\newcommand{\reals}{\mathbb{R}}
\vec{1}{0} \otimes \covec{1}{0}{0} = \mat{1}{0}{0}{0}{0}{0} \\
\vec{1}{0} \otimes \covec{0}{1}{0} = \mat{0}{0}{1}{0}{0}{0} \\
\vec{1}{0} \otimes \covec{0}{0}{1} = \mat{0}{0}{0}{0}{1}{0} \\
\vec{0}{1} \otimes \covec{1}{0}{0} = \mat{0}{1}{0}{0}{0}{0} \\
\vec{0}{1} \otimes \covec{0}{1}{0} = \mat{0}{0}{0}{1}{0}{0} \\
\vec{0}{1} \otimes \covec{0}{0}{1} = \mat{0}{0}{0}{0}{0}{1} \\ %]]></script>
<p>Now it’s easy to see how those 6 “simple” (1, 1)-tensors form a basis for any (2 x 3)-dimensional (1, 1)-tensor. Another thing that this example makes clear is that (1, 1) does not describe the dimensions of the matrix, it describes the number of vectors and covectors that were combined (via the tensor product) to create the tensor. What is the dimension of the (1, 1)-tensor? In this case it’s (2 x 3), but more generally if we take <script type="math/tex">dim(x)</script> to be the dimension of <script type="math/tex">x</script>, an (n, m)-tensor has dimension <script type="math/tex">dim(v_1) dim(v_2) \cdots dim(v_n) dim(c_1) dim(c_2) \cdots dim(c_m)</script>. These things can get big, fast!</p>
<p>So what about these linear functions? I started the post by saying: <em>An (n, m)-tensor is a multi-linear function from m vectors and n covectors to a scalar</em>, and yet we’ve barely mentioned functions at all. Well, remember when I said that covectors were <em>functions from a vector to a scalar</em>? We were on to something there.</p>
<p>Let’s denote the vector space of vectors as <script type="math/tex">V</script>. Let’s denote the vector space of covectors (called the dual vector space) with the symbol <script type="math/tex">\VS</script>. Another way to write this would be <script type="math/tex">V \rightarrow \reals</script>, since covectors are functions from a vector to a scalar (in my examples, I’ll use the reals as an example of a scalar, but it could be any field, i.e. rational, algebraic, reals, complex, etc.). So, what do we get when we take the tensor product of a vector and a covector? We already know this: a matrix, i.e. a (1, 1)-tensor. But what <em>is</em> a matrix? As I mentioned above, you can think about a matrix as a (linear) function from vectors to vectors, i.e. <script type="math/tex">V \rightarrow V</script>. What if we rewrote that as <script type="math/tex">V \rightarrow (\VS \rightarrow \reals)</script>? Kind of weird at first, but if you can think about a covector as a function from a vector to a scalar, can’t we similarly think about a vector as a function from a covector to a scalar? In other words, a covector * vector is a scalar. If we have one argument (either the covector or the vector), then we can treat that argument as fixed and we’re left with a function from the other argument to a scalar. So, to summarize: <script type="math/tex">(V \times \VS) \rightarrow \reals</script>, <script type="math/tex">V \rightarrow V</script>, <script type="math/tex">V \rightarrow (\VS \rightarrow \reals)</script>, and <script type="math/tex">\VS \rightarrow (V \rightarrow \reals)</script> are all ways of saying the same thing.</p>
<p>What do those statements mean in the familiar context of a matrix?</p>
<ul>
<li><script type="math/tex">(V \times \VS) \rightarrow \reals</script> is saying a matrix is: A function from a row vector and a vector to a scalar. Well, a row vector * a matrix * a vector = a scalar, so yea that checks out.</li>
<li><script type="math/tex">V \rightarrow V</script> is saying a matrix is: A function from a vector to a vector. Yes, a matrix * a vector = a vector.</li>
<li><script type="math/tex">V \rightarrow (\VS \rightarrow \reals)</script> is saying a matrix is: A function from a vector to (a function from a row vector to a scalar). A little weird, but ok, since a matrix * a vector = a vector, and vectors <em>are</em> functions from row vectors to scalars.</li>
<li><script type="math/tex">\VS \rightarrow (V \rightarrow \reals)</script> is saying a matrix is: A function from a row vector to (a function from a vector to a scalar). Huh, this one is a little new. What’s a (1 x n) row vector * an (n x m) matrix? Well, it’s a (1 x m) row vector. And what’s a (1 x m) row vector? We can think of it like a function from an (m x 1) vector to a scalar. Ok, checks out!</li>
</ul>
<p>So, our (1, 1)-tensor is like a function from a vector and a covector to a scalar, i.e. <script type="math/tex">(V \times \VS) \rightarrow \reals</script>. Furthermore, that function can be “partially applied”, i.e. if you pass in just the vector, you get a function from a covector to a scalar: <script type="math/tex">V \rightarrow (\VS \rightarrow \reals)</script>. Likewise, if you pass just the covector, you get a function from a vector to a scalar: <script type="math/tex">\VS \rightarrow (V \rightarrow \reals)</script>.</p>
<p>I think we’re ready to level up from (1, 1)-tensors. What about a (2, 1)-tensor? A (2, 1)-tensor is a (linear) function from 2 covectors and 1 vector to a scalar: <script type="math/tex">(V \times \VS \times \VS) \rightarrow \reals</script>. If you provide one covector, you’re left with a (1, 1)-tensor, i.e. <script type="math/tex">\VS \rightarrow ((V \times \VS) \rightarrow \reals)</script>. So, with this recursive viewpoint, we can build up an understanding of an (n, m)-tensor. An (n, m)-tensor is a function from n covectors and m vectors to a scalar, i.e. <script type="math/tex">(\VS_1 \times \VS_2 \times \cdots \times \VS_n \times V_1 \times V_2 \times \cdots \times V_m) \rightarrow \reals</script>.</p>
<!-- Dimensionality, revisited: Remember when we previously said that the dimension of an (n, m)-tensor is $$dim(v_1) dim(v_2) \cdots dim(v_n) dim(c_1) dim(c_2) \cdots dim(c_m)$$? Let's revisit that with our new understanding of tensors as linear functions. To keep things manageable, let's say we have a one dimensional vector which repesents the size of a house, and the size can only be one of three values {small, medium, large}. In addition, we have two (linear) functions that take our one-dimensional size "vector" and produce a scalar. To keep things concrete, function A estimates the value of the house from the size, and function B estimates the number of bedrooms. How many different -->
<!-- say we have a one dimensional vector space, maybe our one dimension is the number of square feet of a house. And we have a linear function from that vector space to a real number (our covector). Maybe it represents the average price of a house with that many square feet. If we take the tensor product of our vector space and covector space, we have a (1, 1)-tensor, a function from a square footage and -->I just completed the very good youtube playlist Tensors for Beginners by eigenchris and I want to jot down some notes before I forget everything.Pythagorean Proof2019-10-17T00:00:00+00:002019-10-17T00:00:00+00:00http://blog.russelldmatt.com/2019/10/17/pythagorean-proof<p>A particularly beautiful proof of the Pythagorean Theorem:</p>
<video controls="" style="min-width: 300px; max-width: 100%; max-height: 800px; border: 2px solid gray;">
<source src="
/assets/by-post/pythagorean-proof/pythag-proof.mp4" type="video/mp4" />
Your browser does not support the video tag.
</video>A particularly beautiful proof of the Pythagorean Theorem:Using the Simulation Hypothesis Against Itself2019-07-12T00:00:00+00:002019-07-12T00:00:00+00:00http://blog.russelldmatt.com/2019/07/12/simulation-hypothesis-against-itself<p>Let’s formulate the <a href="https://en.wikipedia.org/wiki/Simulation_hypothesis#Simulation_hypothesis">simulation hypothesis</a>, which we will call H:</p>
<ol>
<li>
<p>Conscious beings will eventually figure out how to simulate other
conscious beings.</p>
</li>
<li>
<p>When they do so, they will simulate <em>many</em> more of them than ever
existed in their universe.</p>
</li>
<li>
<p>Therefore, if all you know is that you are a conscious being, the
probability that you exist in the first, top-level, non-simulated
universe is extraordinarily small given the fact that the vast
majority of conscious beings live in the lower levels of the
simulations.</p>
</li>
</ol>
<p>One interesting implication of this line of reasoning is that there
are likely to be many levels of this simulation. Conscious beings in
the first, top-level, non-simulated universe will simulate a universe
of conscious beings in the level below them, who in turn simulate a
universe of conscious beings in the level below them, and so on.</p>
<p>Let’s formulate a similar hypothesis, H’, along those lines:</p>
<ol>
<li>
<p>Conscious beings will eventually figure out how to simulate other
conscious beings.</p>
</li>
<li>
<p>Every level of the simulation will have fewer resources than the
level above it.</p>
</li>
<li>
<p>Our universe has a finite amount of resources. This is arguably a
fact, not a hypothesis, but what is a fact other than a hypothesis
which is extraordinarily likely to be correct, so let’s include it.</p>
</li>
<li>
<p>Therefore, the number of levels is not infinite. There exists a
“bottom” level, which will never successfully simulate another
level below it.</p>
</li>
<li>
<p>Each level (other than the bottom) will simulate <em>many</em> more
conscious beings than ever existed in their level.</p>
</li>
<li>
<p>Therefore, the vast majority of conscious beings will exist in the
“bottom” level.</p>
</li>
</ol>
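<p>The arithmetic behind H&rsquo;.6 is easy to check with a toy model (a
minimal sketch; the depth and per-level multiplier are invented purely
for illustration):</p>

```python
# Toy model of hypothesis H': a finite chain of simulation levels in
# which each level hosts many more conscious beings than the level
# above it. Depth and multiplier are made-up illustrative numbers.

def beings_per_level(depth, multiplier):
    """Population of each level; index 0 is the top (non-simulated) level."""
    return [multiplier ** level for level in range(depth)]

levels = beings_per_level(depth=10, multiplier=1000)
total = sum(levels)

print(f"fraction at bottom level: {levels[-1] / total:.4f}")
print(f"fraction at top level:    {levels[0] / total:.1e}")
```

<p>Even with a modest multiplier, essentially all conscious beings end
up in the bottom level, which is exactly H&rsquo;.6.</p>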
<p>Hypothesis H’ sounds perfectly in line with the hypothesis H, since
H’s main conclusion was that it’s unlikely you live in the top,
non-simulated, level, which H’ agrees with. H’ just goes a bit
further and states that, not only do you not live in the top level,
but there is a high probability that you live in the bottom level of
the simulation for mostly the same set of reasons. It’s important to
note that, by definition, the bottom level will never figure out how
to simulate a level below it.</p>
<p>Whether or not we will ever simulate consciousness is highly disputed
(in fact, we’re disputing it right now). But if you think that there
is a high probability that we will eventually simulate consciousness,
as many smart people do, then you must think that P(H’) is relatively
low, since H’ implies that the probability that we live in the bottom
level of the simulation (which will not be able to simulate a level
below it) is high.</p>
<p>In addition, to my eye, P(H) and P(H’) seem to be highly correlated. They
mostly rely on the same argument, they just emphasize slightly
different aspects of the same conclusion, which is that there is this
pyramid of simulated universes in which each level has drastically
more conscious beings than the level above it.</p>
<p>So, it seems to me (note the hedging, as this is quite provisional),
that the higher you think the probability is that we will eventually
simulate consciousness, the lower you should think that P(H) is
(since H’ is evidence against our ability to simulate consciousness
and P(H’) and P(H) are highly correlated). However, if we do simulate
consciousness, then point 1 of hypothesis H (H.1) is true (H.1 says
that conscious beings will eventually figure out how to simulate other
conscious beings). And, at least in my opinion, H.1 is the point
that, a priori, had the lowest probability of being true! In fact,
conditional on point 1 being true, I’d say that P(H) is almost 1
because all the other points in H seem so obviously correct.</p>
<p>To summarize:
<br /></p>
<hr />
<p>Proof by contradiction:</p>
<p>Assume statement SC: It is very likely that we will be able to simulate consciousness.</p>
<ul>
<li>P(H’) ≈ 0 because… hand-wavy math? Ok fine:
<ol>
<li>P(SC) = P(H’ & SC) + P(¬ H’ & SC) ≈ 1 (by assumption)</li>
<li>∴ P(¬ SC) ≈ 0 (from 1)</li>
<li>∴ P(H’ & ¬ SC) ≈ 0 (from 2)</li>
<li>P(H’ & SC) « P(H’ & ¬ SC) (H’ implies that you’re likely at bottom and can’t simulate)</li>
<li>∴ P(H’ & SC) ≈ 0 (from 3, 4)</li>
<li>∴ P(H’) = P(H’ & SC) + P(H’ & ¬SC) ≈ 0 (from 3, 5)</li>
</ol>
</li>
<li>
<p>P(H) is very small because P(H) is highly correlated with P(H’).</p>
</li>
<li>Also, P(H) is very close to 1 because
<ul>
<li>P(H.1) ≈ 1 due to P(SC) ≈ 1 (our assumption)</li>
<li>P(H) = P(H.3 | H.1 & H.2) * P(H.2 | H.1) * P(H.1)</li>
<li>P(H.2 | H.1) ≈ 1 and P(H.3 | H.1 & H.2) ≈ 1 because they’re “obvious”</li>
<li>∴ P(H) ≈ 1</li>
</ul>
</li>
</ul>
<p>Contradiction!</p>
<p><strong>Therefore, it is very unlikely that we will be able to simulate consciousness.</strong></p>
<hr />
<p><br /></p>
<p>Obviously, the above “proof” isn’t really a proof, and we haven’t
found a literal contradiction, but it does feel hand-wavily correct to
me…</p>
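<p>For what it&rsquo;s worth, the steps of the &ldquo;proof&rdquo; can be run
numerically. The specific probabilities below are invented; only the
rough magnitudes matter:</p>

```python
# Numeric sanity check of the hand-wavy proof; all numbers are invented.
p_sc = 0.99                       # assumption SC: simulation is very likely
p_not_sc = 1 - p_sc               # step 2: P(not SC) is close to 0

# Step 3: P(H' & not SC) <= P(not SC), so use that upper bound.
p_h_prime_and_not_sc = p_not_sc

# Step 4: under H' we are probably at the bottom level and can't
# simulate, so P(H' & SC) is much smaller still; model "much smaller"
# as 1% of the step-3 bound.
p_h_prime_and_sc = 0.01 * p_h_prime_and_not_sc

# Step 6: P(H') = P(H' & SC) + P(H' & not SC), which comes out tiny.
p_h_prime = p_h_prime_and_sc + p_h_prime_and_not_sc
print(f"P(H') <= {p_h_prime:.4f}")
```

<p>With these placeholder numbers the bound comes out around 0.01: the
assumption that SC is very likely forces P(H&rsquo;) down near zero,
which is the first leg of the contradiction.</p>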
<p>“But wait!”, you may be thinking, “I thought this article was going to
use the simulation hypothesis to argue against itself, which means
arguing that we don’t live in a simulation - not that we won’t be able
to simulate consciousness”. Ok, you got me. This doesn’t <em>directly</em>
argue against the simulation hypothesis. But, I did say that P(H) is
close to one if we can simulate consciousness (SC), i.e. P(H | SC)
≈ 1. And P(H) = P(H | SC) * P(SC) + P(H | ¬ SC) * P(¬ SC).
So if you previously thought that P(SC) was higher than you do now,
then presumably your new P(H) is also lower, since the weight you’re
putting on P(H | SC) - a number close to 1 - is now lower.</p>
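<p>That law-of-total-probability decomposition is easy to play with
numerically. In this sketch the conditional probabilities are made-up
placeholders: P(H | SC) is taken close to 1, as argued above, and
P(H | ¬ SC) is taken to be small, since H presupposes that simulation
happens:</p>

```python
# The decomposition P(H) = P(H|SC) * P(SC) + P(H|not SC) * P(not SC),
# with illustrative conditional probabilities.

def p_h(p_sc, p_h_given_sc=0.95, p_h_given_not_sc=0.05):
    return p_h_given_sc * p_sc + p_h_given_not_sc * (1 - p_sc)

# Lowering your credence in SC lowers P(H) almost one-for-one.
for p in (0.9, 0.5, 0.1):
    print(f"P(SC) = {p:.1f}  ->  P(H) = {p_h(p):.2f}")
```

<p>Whatever the exact numbers, P(H) moves almost one-for-one with
P(SC), which is the point of the paragraph above.</p>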
<p>One last thought. Although I don’t yet see a critical flaw in the
argument, I’m not particularly convinced by it either, which is kind
of odd, and I’m not really sure why. I think it’s partly that
all these arguments are so counter-intuitive and abstract that I
don’t really trust myself to be able to spot the logical flaw. But
I’ll do my best - so with that in mind…</p>
<h3 id="potential-holes-work-in-progress">Potential holes: work in progress</h3>
<p>H’ states that because our universe is finite, and each level has
fewer resources than the level above it, there are not infinite
levels - but this isn’t strictly true. It means that any pyramid of
simulations which includes us is finite. The level above us could be
infinite. An infinite universe could, for whatever reason, choose to
run a simulation with finite resources. It could also choose to run
some simulations with finite resources and others with infinite. So
you could imagine a branching tree of simulations, some of which are
finitely long, while others are infinite. In the infinite chains,
there would be no bottom, since infinity doesn’t end. So the
conclusion that “most conscious beings will exist in the bottom
simulation” would be patently untrue in that case, since <em>the vast
majority</em> (or all? infinities are hard…) would live in the
infinitely long chains. All that said, if that picture were true,
it’d be completely improbable that we’d live in one of the few
universes with finite resources, which we do, so I guess it’s probably
not true.</p>
<p>I’m sure there are many more holes… will add to this as I think of them.</p>Let’s formulate the simulation hypothesis, which we will call H: