<h1>The Two Envelopes Problem</h1>
<p>2021-10-22 · <a href="http://blog.russelldmatt.com/2021/10/22/two-envelopes-problem">http://blog.russelldmatt.com/2021/10/22/two-envelopes-problem</a></p>
<p>If you haven’t already encountered the famous paradoxical “two
envelopes problem”, then I highly suggest you consider the following
prompt and return to this article much later. Without struggling with
the problem yourself first, the “resolution” below won’t seem nearly
as satisfying.</p>
<p>From <a href="https://en.wikipedia.org/wiki/Two_envelopes_problem">Wikipedia</a>:</p>
<blockquote>
<p>You are given two indistinguishable envelopes, each containing money. One contains twice as much as the other. You may pick one envelope and keep the money it contains. Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?</p>
</blockquote>
<p>It seems that you can make compelling arguments for “always switch” and “it doesn’t matter whether or not you switch”.</p>
<p><strong>Always switch:</strong> Let’s denote the amount of money in the envelope that you chose as \(X\). The other envelope contains either \(2X\) or \(\frac{1}{2}X\). If we think these outcomes are equally likely, then the expected amount you hold after switching is \(\frac{1}{2}(2X) + \frac{1}{2}(\frac{1}{2}X) = \frac{5}{4}X\). In other words, if you switch then, in expectation, you end up with more than you started with. So always switch!</p>
<p><strong>It doesn’t matter:</strong> But that seems crazy! The problem is completely symmetric: you’re presented with two envelopes and you chose one at random. Why would it make any sense to switch when you could have just as easily randomly chosen the other envelope? Furthermore, if you do switch and then are presented with the option to switch again, doesn’t the same logic apply? But switching twice is the same as not switching so… that can’t be right. Common sense (and symmetry) strongly suggests that switching can’t matter.</p>
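<p>As a sanity check on the symmetry argument, here is a minimal simulation sketch. It assumes the two amounts are fixed in advance (the function name <code>simulate</code>, the amount 100, and the trial count are all illustrative choices, not part of the original problem) and compares the average payoff of keeping against the average payoff of switching:</p>

```python
import random

def simulate(small, trials, seed=0):
    """Average payoff of keeping vs. switching for a fixed pair (small, 2 * small)."""
    rng = random.Random(seed)
    envelopes = (small, 2 * small)
    keep_total = 0
    switch_total = 0
    for _ in range(trials):
        picked = rng.randrange(2)              # choose an envelope uniformly at random
        keep_total += envelopes[picked]        # payoff if we keep our envelope
        switch_total += envelopes[1 - picked]  # payoff if we switch
    return keep_total / trials, switch_total / trials

keep, switch = simulate(small=100, trials=100_000)
print(f"keep: {keep:.2f}  switch: {switch:.2f}")
```

<p>Both averages should land close to 150, which is what symmetry predicts. Note that this fixes the pair of amounts, whereas the “always switch” argument fixes the amount in the chosen envelope; that difference is where the trouble starts.</p>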
<p>One common objection to the argument for <strong>always switch</strong> above is
that we assumed that getting \(\frac{1}{2}X\) and \(2X\) were equally
likely, but that doesn’t make a lot of sense. If the probability of
the other envelope having \(\frac{1}{2}X\) or \(2X\) were the same,
that implies the following two states are equally likely: the two
envelopes have \(\frac{1}{2}X\) and \(X\) in them, or the two
envelopes have \(X\) and \(2X\) in them. We will denote these pairs
of amounts as \((\frac{1}{2}X, X)\) and \((X, 2X)\).</p>
<p>But then we can apply the same logic to the case where our chosen
envelope has \(2X\) in it and we conclude that the pairs \((X, 2X)\)
and \((2X, 4X)\) must also be equally likely. And so on for \((2X,
4X)\) and \((4X, 8X)\), \((4X, 8X)\) and \((8X, 16X)\), and so on. We
can also apply this logic to smaller and smaller pairs of amounts,
e.g. \((\frac{1}{4}X, \frac{1}{2}X)\) and \((\frac{1}{2}X, X)\),
\((\frac{1}{8}X, \frac{1}{4}X)\) and \((\frac{1}{4}X, \frac{1}{2}X)\),
etc. In effect, you end up with an infinite number of equally likely
possibilities, which is an <a href="https://en.wikipedia.org/wiki/Prior_probability#Improper_priors">improper prior
distribution</a>.
We need the sum of the probabilities of our possible pairs of amounts to equal
\(1\), but when we sum the probabilities of this weird improper
distribution, we effectively get \(\infty \cdot \frac{1}{\infty}\),
which is not well defined.</p>
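<p>To spell that out a little, suppose (purely as an illustration) that every pair \((2^k X, 2^{k+1} X)\), for every integer \(k\), is assigned the same probability \(c\). Then the total probability is:</p>
\[\sum_{k=-\infty}^{\infty} c = \begin{cases} 0 & \text{if } c = 0 \\ \infty & \text{if } c > 0 \end{cases}\]
<p>Neither case equals \(1\), so no choice of \(c\) gives a proper probability distribution.</p>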
<p>Furthermore, the problem didn’t actually say that getting \(\frac{1}{2}X\) and \(2X\) were equally likely, so let’s dispense with that assumption.</p>
<p>To make this discussion more rigorous and less vague, let’s consider a new problem that has the same paradoxical properties as the original one, but in which we know the exact distribution of the outcomes. To give credit where credit is due, everything below is due to the following excellent YouTube video: <a href="https://www.youtube.com/watch?v=_NGPncypY68">https://www.youtube.com/watch?v=_NGPncypY68</a>.</p>
<hr />
<p><br /></p>
<h3 id="a-more-well-specified-problem">A more well-specified problem</h3>
<p>Below is a table of all possible states that the two envelopes (A and B) can be in, along with the probability of being in that state:</p>
<table>
<thead>
<tr>
<th>State</th>
<th>Probability</th>
<th>Envelope A</th>
<th>Envelope B</th>
</tr>
</thead>
<tbody>
<tr>
<td>\(S_1\)</td>
<td>1/2</td>
<td>$1</td>
<td>$10</td>
</tr>
<tr>
<td>\(S_2\)</td>
<td>1/4</td>
<td>$10</td>
<td>$100</td>
</tr>
<tr>
<td>\(S_3\)</td>
<td>1/8</td>
<td>$100</td>
<td>$1,000</td>
</tr>
<tr>
<td>…</td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>\(S_n\)</td>
<td>\(1/2^n\)</td>
<td>\(10^{n-1}\)</td>
<td>\(10^n\)</td>
</tr>
</tbody>
</table>
<p>To be clear, there are infinitely many states (not just \(n\) of them). Given that, the probabilities sum to \(1\), as they should. As in the original problem, you choose an envelope at random - meaning you’ll choose Envelope A with a 50% chance and Envelope B with a 50% chance, but you won’t know which one you’ve chosen. After picking an envelope, but before inspecting it, you’re given the option to switch. Should you?</p>
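<p>The claim that the probabilities sum to \(1\) is the standard geometric series. As a quick check, the partial sum over the first \(N\) states (the symbol \(N\) is introduced here only for this check) and its limit are:</p>
\[\sum_{n=1}^{N} \frac{1}{2^n} = 1 - \frac{1}{2^N}, \qquad \sum_{n=1}^{\infty} \frac{1}{2^n} = 1\]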
<p>To decide whether or not we should switch envelopes, let’s compute the expected value of switching. I’m going to use the <a href="https://en.wikipedia.org/wiki/Law_of_total_expectation">“law of total expectation”</a>, which is a fancy way of saying that I’ll compute \(E[switching]\) by:</p>
\[E[switching] = \sum\limits_{y} E[switching | Y = y] \cdot P[Y = y]\]
<p>Where \(Y\) is some other random variable. I’m not saying concretely what \(Y\) is because I’m going to compute this three different ways using three different \(Y\)s.</p>
<h3 id="1-y-is-which-state-were-in-s_1-s_2-s_3--"><strong>1. \(Y\) is which state we’re in (\(S_1\), \(S_2\), \(S_3\), … )</strong></h3>
<p>To start, let’s compute \(E[switching]\) by conditioning on which state we’re in.</p>
\[\begin{align}
E[switching] &= E[switching | state = S_1] \cdot P[state = S_1] \\
&+ E[switching | state = S_2] \cdot P[state = S_2] \\
&+ E[switching | state = S_3] \cdot P[state = S_3] \\
...
\end{align}\]
<p>Now we just need to compute each term:</p>
\[\begin{align}
E[switching | state = S_1] &= 1/2 (+9) + 1/2 (-9) &= 0 \\
E[switching | state = S_2] &= 1/2 (+90) + 1/2 (-90) &= 0 \\
E[switching | state = S_3] &= 1/2 (+900) + 1/2 (-900) &= 0 \\
\end{align}\]
<p>… you get the picture. Every term = 0, so <strong>E[switching] is clearly = 0.</strong></p>
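<p>The per-state calculation is mechanical enough to sketch in code. The helper below (its name is an illustrative choice) uses exact fractions and assumes state \(S_n\) holds \(10^{n-1}\) in Envelope A and \(10^n\) in Envelope B, as in the table:</p>

```python
from fractions import Fraction

def conditional_gain_given_state(n):
    """E[switching | state = S_n], where S_n holds 10**(n-1) in A and 10**n in B."""
    a, b = 10 ** (n - 1), 10 ** n
    half = Fraction(1, 2)
    # Picked A (probability 1/2): switching gains b - a.
    # Picked B (probability 1/2): switching gains a - b.
    return half * (b - a) + half * (a - b)

for n in range(1, 6):
    print(f"S_{n}: {conditional_gain_given_state(n)}")
```

<p>Each line prints 0, matching the hand calculation for the first three states.</p>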
<h3 id="2-y-is-the-value-in-the-envelope-we-picked"><strong>2. \(Y\) is the value in the envelope we picked</strong></h3>
<p>We’ll do exactly what we did before, but instead of conditioning on which state we’re in, let’s condition on the value of the envelope we picked (note: we don’t know this value, but it must have some value, right?):</p>
\[\begin{align}
E[switching] &= E[switching | picked = \$1] \cdot P[picked = \$1] \\
&+ E[switching | picked = \$10] \cdot P[picked = \$10] \\
&+ E[switching | picked = \$100] \cdot P[picked = \$100] \\
...
\end{align}\]
<p>Now we compute the terms:</p>
\[\begin{align}
E[switching | picked = \$1] &= +9 &> 0 \\
E[switching | picked = \$10] &= 2/3 (-9) + 1/3 (+90) &> 0 \\
E[switching | picked = \$100] &= 2/3 (-90) + 1/3 (+900) &> 0 \\
\end{align}\]
<p>… you get the picture. Every term > 0 (and we multiply each term by some positive probability), so <strong>E[switching] is clearly > 0.</strong></p>
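<p>The \(2/3\) and \(1/3\) weights come from Bayes’ rule. As a sketch for a general picked value \(\$10^k\) with \(k \geq 1\) (the index \(k\) is notation introduced here), that value is either Envelope B in state \(S_k\) or Envelope A in state \(S_{k+1}\):</p>
\[\begin{align}
P[state = S_k, picked = B] &= \frac{1}{2^k} \cdot \frac{1}{2} = \frac{1}{2^{k+1}} \\
P[state = S_{k+1}, picked = A] &= \frac{1}{2^{k+1}} \cdot \frac{1}{2} = \frac{1}{2^{k+2}} \\
P[state = S_k | picked = \$10^k] &= \frac{1/2^{k+1}}{1/2^{k+1} + 1/2^{k+2}} = \frac{2}{3} \\
E[switching | picked = \$10^k] &= \frac{2}{3}(-9 \cdot 10^{k-1}) + \frac{1}{3}(+9 \cdot 10^{k}) = 24 \cdot 10^{k-1} > 0
\end{align}\]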
<h3 id="3-y-is-the-value-in-the--other--envelope"><strong>3. \(Y\) is the value in the <em>other</em> envelope</strong></h3>
<p>We’ll do exactly what we did before, but instead of conditioning on the value in the envelope that we picked, we’ll condition on the value of the envelope we <em>didn’t</em> pick (which I’m calling “other”):</p>
\[\begin{align}
E[switching] &= E[switching | other = \$1] \cdot P[other = \$1] \\
&+ E[switching | other = \$10] \cdot P[other = \$10] \\
&+ E[switching | other = \$100] \cdot P[other = \$100] \\
...
\end{align}\]
<p>Now we compute the terms:</p>
\[\begin{align}
E[switching | other = \$1] &= -9 &< 0 \\
E[switching | other = \$10] &= 2/3 (+9) + 1/3 (-90) &< 0 \\
E[switching | other = \$100] &= 2/3 (+90) + 1/3 (-900) &< 0 \\
\end{align}\]
<p>… you get the picture. Every term < 0 (and we multiply each term by some positive probability), so <strong>E[switching] is clearly < 0.</strong></p>
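<p>Both sets of conditional expectations can be checked by brute force. The sketch below enumerates the joint outcomes for a finite prefix of the states and applies Bayes’ rule with exact fractions. The function names and the <code>max_exponent</code> cutoff are illustrative assumptions; one extra state is generated so that the largest reported value still sees both of the states it can come from.</p>

```python
from collections import defaultdict
from fractions import Fraction

def outcomes(num_states):
    """Yield (probability, picked, other) for states S_1 .. S_num_states."""
    for n in range(1, num_states + 1):
        p = Fraction(1, 2 ** n) * Fraction(1, 2)  # P[state] * P[which envelope we picked]
        a, b = 10 ** (n - 1), 10 ** n
        yield p, a, b   # picked A, other is B
        yield p, b, a   # picked B, other is A

def conditional_gains(max_exponent, key):
    """E[other - picked | value], conditioning on key = "picked" or "other".

    Reports values 10**0 .. 10**max_exponent.
    """
    mass = defaultdict(Fraction)
    weighted_gain = defaultdict(Fraction)
    for p, picked, other in outcomes(max_exponent + 1):
        value = picked if key == "picked" else other
        mass[value] += p
        weighted_gain[value] += p * (other - picked)
    return {10 ** k: weighted_gain[10 ** k] / mass[10 ** k]
            for k in range(max_exponent + 1)}

print(conditional_gains(2, "picked"))
print(conditional_gains(2, "other"))
```

<p>Conditioning on the picked value gives gains of 9, 24 and 240; conditioning on the other value gives the same numbers with the sign flipped.</p>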
<h2 id="what-gives">What gives!?</h2>
<p>According to the video (and I don’t know how much I should trust this random video), here’s what gives: The random variable “profit of switching envelopes” has no expected value. It’s not zero, it’s not positive infinity, and it’s not negative infinity. It’s simply not defined. This also explains why using the “law of total expectation” breaks down. As the Wikipedia article states, you can only use the law of total expectation on a random variable \(X\) if \(E[X]\) is defined. Here is a link to the video at the moment that he explains the resolution: <a href="https://youtu.be/_NGPncypY68?t=1213">https://youtu.be/_NGPncypY68?t=1213</a></p>
<p>When we compute the expected value, we’re summing up an infinite number of terms. In this case, <strong>the order in which we sum the terms matters</strong>. This is a very unusual property. This property occurs when the sum of all the positive terms in the series is +infinity and the sum of all the negative terms is -infinity. In those cases, you can rearrange the order of summation and get completely different results. Since no one summation order is “more correct” than another, this infinite series has no well-defined sum.</p>
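<p>To see that this series has that property, here is a sketch of the individual terms. State \(S_n\) contributes two outcomes, each with probability \(\frac{1}{2^n} \cdot \frac{1}{2}\): a gain of \(+9 \cdot 10^{n-1}\) if we picked Envelope A and a loss of \(-9 \cdot 10^{n-1}\) if we picked Envelope B. Each term therefore has magnitude</p>
\[\frac{1}{2^{n+1}} \cdot 9 \cdot 10^{n-1} = \frac{9}{4} \cdot 5^{n-1}\]
<p>which grows with \(n\), so the positive terms alone sum to \(+\infty\) and the negative terms alone sum to \(-\infty\).</p>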
<p>In case it’s not obvious, notice that in the three versions of E[switching] above, the only difference is the way in which we ordered and then grouped the terms of the sum. Here’s a picture to make it more clear:</p>
<p><img src=" /assets/by-post/two-envelopes-problem/two-envelopes.jpg" style="max-width: 500px" /></p>
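<p>The same point can be made numerically. The sketch below (function names and the cutoff <code>N</code> are illustrative assumptions) sums exactly the same terms, truncated in three different ways that mirror the three groupings above, and the partial sums behave completely differently:</p>

```python
from fractions import Fraction

def term(n):
    """Size of each of the two terms contributed by state S_n: probability * |gain|."""
    return Fraction(1, 2 ** (n + 1)) * (10 ** n - 10 ** (n - 1))

def partial_sum_by_state(N):
    # States S_1..S_N, each contributing +term (picked A) and -term (picked B).
    return sum(term(n) - term(n) for n in range(1, N + 1))

def partial_sum_by_picked(N):
    # Picked values $1..$10^N: picked A in S_1..S_{N+1}, picked B in S_1..S_N.
    return sum(term(n) for n in range(1, N + 2)) - sum(term(n) for n in range(1, N + 1))

def partial_sum_by_other(N):
    # Other values $1..$10^N: picked B in S_1..S_{N+1}, picked A in S_1..S_N.
    return sum(term(n) for n in range(1, N + 1)) - sum(term(n) for n in range(1, N + 2))

for N in (1, 5, 10):
    print(N, partial_sum_by_state(N), partial_sum_by_picked(N), partial_sum_by_other(N))
```

<p>Grouping by state stays at zero, grouping by picked value grows without bound, and grouping by the other value shrinks without bound, even though all three are partial sums of the same collection of terms.</p>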
<p>The thing I like about this explanation is that it not only resolves the paradox, but it also shows why you can make convincing arguments for either strategy (switch or don’t switch).</p>
<h2 id="remaining-dissonance">Remaining dissonance</h2>
<p>Although I’m quite pleased with the resolution above, I’m still a bit unsettled by the fact that I would still answer “yes” to the following two questions:</p>
<p>If we changed the “well-specified problem” to say that you could open <em>the envelope you chose</em> before deciding whether or not to switch, would you conclude that switching is <strong>better</strong> no matter what you saw in your envelope?</p>
<p>If we changed the “well-specified problem” to say that you could open <em>the other envelope</em> before deciding whether or not to switch, would you conclude that switching is <strong>worse</strong> no matter what you saw in that envelope?</p>
<h1>The Joy of Discovering Math</h1>
<p>2021-10-21 · <a href="http://blog.russelldmatt.com/2021/10/21/the-joy-of-discovering-math">http://blog.russelldmatt.com/2021/10/21/the-joy-of-discovering-math</a></p>
<p>Try to discover things for yourself. Let yourself struggle - <em>really struggle</em> - before seeing the answer. Whether you successfully solve the problem yourself or not, the result will be so much more satisfying.</p>
<p>Below is a passage from Knuth’s book <a href="https://smile.amazon.com/Surreal-Numbers-Donald-Knuth/dp/0201038129">Surreal Numbers</a>. It’s the most insightful and self-aware description of the asymmetry between discovering math for yourself and “being taught” math that I’ve ever seen:</p>
<div style="display:flex; flex-direction: column; align-items:center">
<img src="/assets/by-post/the-joy-of-discovering-math/surreal1.png" style="max-width: 500px" />
<img src="/assets/by-post/the-joy-of-discovering-math/surreal2.png" style="max-width: 500px" />
<img src="/assets/by-post/the-joy-of-discovering-math/surreal3.png" style="max-width: 500px" />
</div>
<p>I noticed that the advice embedded in the passage above aligns almost perfectly with how Po-Shen Loh describes how he taught himself math in the following clip:</p>
<div style="display:flex; flex-direction: column; align-items:center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/vpVRQuBWctQ?start=89" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
</div>
<h1>G13. Godel’s Proof (sketch)</h1>
<p>2021-09-26 · <a href="http://blog.russelldmatt.com/2021/09/26/g13-godels-proof-sketch">http://blog.russelldmatt.com/2021/09/26/g13-godels-proof-sketch</a></p>
<style> .ul { white-space:nowrap; } </style>
<p>In this post, we present a sketch of Godel’s proof of his first
incompleteness theorem. As the word sketch suggests, we will lay out
the broad strokes of the proof without filling in many of the details.
I personally find this level of description best for giving me an
intuitive feel for the proof as a whole. Granted, that may only be
true because I’ve already spent many hours poring over the details
and so I’m not troubled by the lack of rigor. Either way, I hope this
will give you a high level understanding of how the proof goes. At
the end, I will present links to further reading where you can fill in
the details for yourself.</p>
<p>To start, let’s fix on a particular formal system, which we will call
\(PM\). In the end, we will show that Godel’s proof applies to any
sufficiently strong formal system, but we can leave generalizations
for later.</p>
<p>First, Godel came up with an encoding scheme that can associate a
unique number to any formula within \(PM\). In a similar way, he was able
to associate a unique number with any proof (or derivation) within \(PM\).
A formula’s number is sometimes called its “Godel number” (abbreviated
as g.n.) and a proof’s number is called its “super g.n.”.</p>
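<p>To make the idea of an encoding scheme concrete, here is a minimal sketch of one standard approach: give each symbol a small code and use those codes as exponents of successive primes. The symbol table below is a hypothetical one chosen only for this illustration; it is not the actual alphabet or numbering of \(PM\).</p>

```python
def primes():
    """Yield 2, 3, 5, 7, ... by trial division (fine for short formulas)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p != 0 for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1

# Hypothetical symbol codes, chosen only for this illustration.
SYMBOL_CODES = {"0": 1, "S": 2, "+": 3, "=": 4, "(": 5, ")": 6}

def godel_number(formula):
    """Encode a string of symbols as 2**c1 * 3**c2 * 5**c3 * ..."""
    number = 1
    for prime, symbol in zip(primes(), formula):
        number *= prime ** SYMBOL_CODES[symbol]
    return number

def decode(number):
    """Recover the symbol string by reading off the prime exponents."""
    code_to_symbol = {code: symbol for symbol, code in SYMBOL_CODES.items()}
    symbols = []
    for prime in primes():
        if number == 1:
            break
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        symbols.append(code_to_symbol[exponent])
    return "".join(symbols)

print(godel_number("0=0"))
print(decode(godel_number("S0+S0=SS0")))
```

<p>Because prime factorizations are unique, every formula gets a distinct number and the formula can be recovered from it. A sequence of formulas (a proof) can be numbered in a similar way, using the formulas’ own numbers as the exponents.</p>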
<p>Next, Godel constructed the following formula (with Godel number 42<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>):</p>
<p><strong>Formula 42</strong>: \(\lnot \exists m. \mathrm{Proof}(m, 42)\)</p>
<p>\(\mathrm{Proof}(m,n)\) is a relation between two numbers that is true
iff \(m\) encodes a \(PM\) derivation (proof) of the formula with number \(n\).
Take it on faith, for now, that such a relation can be expressed
within \(PM\).</p>
<p>If we interpret formula 42, it says the following:</p>
<blockquote>
<p><em>It is not true that there exists a number m which encodes a \(PM\) proof
of formula 42</em>. In other words, <em>formula 42 is not provable within
\(PM\)</em>.</p>
</blockquote>
<p>The problems that follow from formula 42 are probably somewhat
self-evident, but for the sake of clarity let’s spell them out.</p>
<h2 id="the-semantic-argument">The semantic argument</h2>
<p>Say we found a proof of formula 42 within \(PM\). Then formula 42, when
interpreted, would make a false statement. It claims that no proof
exists, and yet we found one. This is bad; we just proved a false
statement within \(PM\). This means that \(PM\) is not sound, since a sound
system can only derive true statements. To summarize, if we can prove
formula 42, then \(PM\) is not sound. Contrapositively, if \(PM\) is sound,
then \(PM\) cannot prove formula 42.</p>
<p>Let’s assume \(PM\) is sound and therefore it cannot prove formula 42.
Now formula 42, when interpreted, makes a true statement! It claims
that no proof exists, and indeed we cannot find one. The trouble in
this case is that we’ve found a true statement that we cannot prove.
We cannot prove a true statement, and therefore \(PM\) is incomplete.</p>
<p>That conclusion is a little too quick. What if we could prove the negation of
formula 42? Wouldn’t that undermine the claim that \(PM\) is incomplete?
After all, completeness just means you can derive either \(\varphi\)
or \(\lnot \varphi\), for any formula \(\varphi\).</p>
<p>We can show, using similar arguments, that we cannot prove the
negation of formula 42. We have already shown that if \(PM\) is sound,
then it cannot prove formula 42, which means that formula 42 is true.
That implies that the negation of formula 42 is false. A sound formal
system can only prove true statements, so again - assuming \(PM\) is
sound - it cannot prove the negation of formula 42 either.</p>
<p>This completes a sketch of what’s called <strong>the semantic argument</strong>.
It says that if a formal system is sound and sufficiently expressive,
then it is incomplete. Of course we’ve left out all the details. We
haven’t shown why a sufficiently expressive formal system can indeed
express the \(\mathrm{Proof}\) relation. Even after that, it takes
considerable mental acrobatics to construct formula 42 such that its
own Godel number happens to be the same number for which it claims
there is no proof. Lastly, we have not shown Godel’s numbering
scheme, although that part is relatively straightforward.</p>
<p>Notice that the semantic argument derives incompleteness from a <em>sound
and sufficiently expressive</em> formal system, whereas Godel’s
incompleteness theorem claims that <em>consistent and sufficiently
strong</em> formal systems are incomplete. The argument that derives
incompleteness from a <em>consistent and sufficiently strong</em> formal
system is called <strong>the syntactic argument</strong>. It’s considerably more
involved, so it’s worth pausing to reflect that the semantic argument
should be quite satisfying! Any formal system that we hope to use as
a foundation for all of mathematics had better be sound. Otherwise,
it’s able to prove formulas that make false statements, which doesn’t
sound like a great fit. So, if you try to follow the syntactic
argument below and find that your brain is left looking like a
pretzel, you can rest easy knowing that the semantic argument is
“good enough”.</p>
<h2 id="the-syntactic-argument">The syntactic argument</h2>
<p>The syntactic argument does not assume that \(PM\) is sound, only that it
is consistent. Consistency is a weaker requirement than soundness,
which is what makes this argument more impressive. However, in
weakening one requirement, it needs to strengthen another. Instead of
requiring that \(PM\) is sufficiently expressive, it requires that it’s
sufficiently strong. Remember that sufficiently strong means that \(PM\)
can not only express all primitive recursive relations, but that it
can capture them. Here is a refresher on what it means to capture a
property (or relation) \(P\).</p>
<p>A formal system \(T\) can <em>capture</em> a property \(P\) by the open formula
\(\varphi(x)\) iff, for any \(n\):</p>
<ul>
<li>if \(n\) has the property \(P\), then \(T \vdash \varphi(\bar{n})\), and</li>
<li>if \(n\) does not have the property \(P\), then \(T \vdash \lnot \varphi(\bar{n})\)</li>
</ul>
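<p>As a small illustration of the definition (the property and formula here are chosen for this example and do not come from the proof itself), take \(P\) to be “\(n\) is even” and suppose \(T\) is a theory of arithmetic that captures it with the open formula</p>
\[\varphi(x) := \exists y \leq x. \; y + y = x\]
<p>Capturing then requires, for instance, that \(T \vdash \varphi(\bar{4})\) and \(T \vdash \lnot \varphi(\bar{3})\), where \(\bar{n}\) is the numeral for \(n\). The point is that \(T\) must settle every individual instance, one way or the other.</p>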
<h4 id="if-pm-is-consistent-and-sufficiently-strong-then-pm-cannot-prove-formula-42">If \(PM\) is consistent (and sufficiently strong), then \(PM\) cannot prove formula 42</h4>
<p>Say we found a proof of formula 42 within \(PM\). Let’s compute the super
g.n. of that proof and call it \(p\). We can now derive the formula
\(\mathrm{Proof}(p, 42)\) within \(PM\). It’s easy to miss, but here is
where we needed the requirement that \(PM\) is <em>sufficiently strong</em> and
therefore can <em>capture</em> any primitive recursive relation. If \(PM\) can
capture the \(\mathrm{Proof}\) relation, then \(PM \vdash
\mathrm{Proof}(p, 42)\).</p>
<p>Formula 42 states \(\lnot \exists m. \mathrm{Proof}(m, 42)\) which is
equivalent to \(\forall m. \lnot \mathrm{Proof}(m, 42)\). If we
instantiate the \(\forall m\) quantifier with the number \(p\), we get
\(\lnot \mathrm{Proof}(p, 42)\).</p>
<p>In summary, if we find a proof of formula 42 with super g.n. \(p\), \(PM\)
can derive both \(\mathrm{Proof}(p, 42)\) and \(\lnot \mathrm{Proof}(p,
42)\). In other words, if we can find a proof of formula 42, then \(PM\)
is inconsistent. Contrapositively, if \(PM\) is consistent (and
sufficiently strong), then \(PM\) cannot prove formula 42.</p>
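<p>Laid out step by step, under the assumption that formula 42 has a proof with super g.n. \(p\), the argument is:</p>
\[\begin{align}
1.\quad & PM \vdash \lnot \exists m. \mathrm{Proof}(m, 42) && \text{(assumption: formula 42 is provable)} \\
2.\quad & PM \vdash \mathrm{Proof}(p, 42) && \text{(} PM \text{ captures } \mathrm{Proof} \text{)} \\
3.\quad & PM \vdash \forall m. \lnot \mathrm{Proof}(m, 42) && \text{(from 1)} \\
4.\quad & PM \vdash \lnot \mathrm{Proof}(p, 42) && \text{(instantiate 3 with } p \text{)} \\
5.\quad & PM \text{ is inconsistent} && \text{(from 2 and 4)}
\end{align}\]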
<h4 id="if-pm-is-consistent-then-pm-cannot-prove-lnot-formula-42">If \(PM\) is consistent, then \(PM\) cannot prove \(\lnot\) formula 42</h4>
<p>In order to derive incompleteness, we also need to show that \(PM\) cannot
derive the negation of formula 42. In the semantic argument, this was
easy. We relied on the fact that if \(PM\) could not prove formula 42,
then formula 42 was true, and therefore \(\lnot\) formula 42 was false,
and a sound formal system cannot prove false statements. QED.</p>
<p>In the syntactic argument, it’s not so easy. This is where the
details get particularly subtle. We need to take a small digression
to explain the idea of \(\omega\)-consistency.</p>
<p><strong>\(\omega\)-inconsistency</strong>:</p>
<blockquote>
<p>A theory \(T\) is \(\omega\)-inconsistent iff, for some open formula
\(\varphi(x)\), \(T \vdash \exists x. \varphi(x)\) and yet for every number
\(m\) we have \(T \vdash \lnot \varphi(\bar{m})\).</p>
</blockquote>
<p>\(\omega\)-inconsistency is a Very Bad Thing (TM). It basically says
that you can prove something is not true for every single number, but
also you can prove that there exists “some” number for which it’s
true. In the same way that any useful formal system should be
consistent, it should also be \(\omega\)-consistent. Note that
\(\omega\)-consistency is a stronger requirement than plain
consistency; \(\omega\)-consistency implies plain consistency, but
plain consistency does not imply \(\omega\)-consistency.</p>
<p>We will now try to finish the syntactic argument, using the stronger
assumption that \(PM\) is \(\omega\)-consistent.</p>
<h4 id="if-pm-is-omega-consistent-then-pm-cannot-prove-lnot-formula-42">If \(PM\) is \(\omega\)-consistent, then \(PM\) cannot prove \(\lnot\) formula 42</h4>
<p>Say that \(PM\) is \(\omega\)-consistent and we can find a proof of
\(\lnot\) formula 42. If \(PM\) is \(\omega\)-consistent, then it is also
consistent, meaning that it cannot prove formula 42.</p>
<p>If \(PM\) cannot prove formula 42, we know that, for any \(m\), \(\lnot
\mathrm{Proof}(m, 42)\), otherwise we’ve found the proof of formula 42
within \(PM\).</p>
<p>Recall the definition of formula 42 is \(\lnot \exists
m. \mathrm{Proof}(m, 42)\). So \(\lnot\) formula 42 is equivalent to
\(\exists m. \mathrm{Proof}(m, 42)\).</p>
<p>Now let’s bring the argument home. Say that \(PM\) can prove \(\lnot\)
formula 42, which is equivalent to \(\exists m. \mathrm{Proof}(m, 42)\).
If \(PM\) is consistent, then it cannot also prove formula 42, so no
number \(m\) encodes a proof of formula 42. Because \(PM\) captures the
\(\mathrm{Proof}\) relation, it can then prove \(\lnot \mathrm{Proof}(m, 42)\)
for each individual \(m\). The ability to prove \(\lnot \mathrm{Proof}(m, 42)\),
for every \(m\), as well as \(\exists m. \mathrm{Proof}(m, 42)\) would mean that \(PM\) is
\(\omega\)-inconsistent. Contrapositively, if \(PM\) is
\(\omega\)-consistent, then it cannot prove \(\lnot\) formula 42.</p>
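<p>Laid out step by step, under the assumption that \(\lnot\) formula 42 is provable, the argument is:</p>
\[\begin{align}
1.\quad & PM \vdash \exists m. \mathrm{Proof}(m, 42) && \text{(assumption: } \lnot \text{ formula 42 is provable)} \\
2.\quad & PM \nvdash \text{formula 42} && \text{(from 1 and consistency)} \\
3.\quad & \text{no } m \text{ encodes a proof of formula 42} && \text{(from 2)} \\
4.\quad & PM \vdash \lnot \mathrm{Proof}(m, 42) \text{ for each } m && \text{(from 3, since } PM \text{ captures } \mathrm{Proof} \text{)} \\
5.\quad & PM \text{ is } \omega\text{-inconsistent} && \text{(from 1 and 4)}
\end{align}\]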
<p>This completes a sketch of <strong>the syntactic argument</strong>. We’ve
demonstrated that if \(PM\) is \(\omega\)-consistent and sufficiently strong then it
cannot derive either formula 42 or the negation of formula 42, making
it incomplete.</p>
<p>If you think that a bait-and-switch just occurred where we promised to
derive incompleteness from consistency but instead assumed
\(\omega\)-consistency, you’re 100% correct. Godel’s 1931 proof did in
fact require \(\omega\)-consistency to complete the syntactic argument.
However, in 1936, John Barkley Rosser introduced <a href="https://en.wikipedia.org/wiki/Rosser%27s_trick" title="Rosser's trick">Rosser’s
trick</a>, which showed that the requirement for \(\omega\)-consistency
may be weakened to consistency.</p>
<h2 id="filling-in-the-details">Filling in the details</h2>
<p>In the proof sketch above, I asked you to take a few things on faith.
One, that we could associate a unique number with any formula or proof
within \(PM\). Next, that the \(\mathrm{Proof}\) relation could be
expressed and even captured within \(PM\). Finally, that we can construct
“formula 42” (usually called the Godel sentence \(G\)) such that its
own Godel number is 42 <em>and</em> it claims that there is no proof of the
formula with number 42.</p>
<p>Originally, I had planned on filling in each of these details with
additional posts. However, filling in these details in a rigorous way
requires a full book (which I read, it’s called <a href="https://www.amazon.com/Introduction-Theorems-Cambridge-Introductions-Philosophy/dp/0521674530">An Introduction to
Gödel’s
Theorems</a>
by Peter Smith). So instead of trying to explain these details myself at some
intermediate level of rigor, which may or may not satisfy you, I’m
going to refer you to specific points in the book that explain
each point. Not only will that likely be a better explanation, but if
it doesn’t make sense or you require more background, you’ll have an
entire book to fall back on. Without further ado:</p>
<ol>
<li>
<p>Two formalized arithmetics</p>
<p>Everything up to now has been about “formal
systems” in general, but I found it quite helpful to see the
specifics of a few concrete formal theories of arithmetic. In
<a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=76">chapter
10</a>,
Smith introduces \(BA\) (Baby Arithmetic) and then \(Q\). A few
chapters later, Smith introduces <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=106">First-order Peano
Arithmetic</a>
or \(PA\) (Note: \(PA\) and the \(PM\) system which I refer to
above are essentially the same).</p>
</li>
<li>
<p><a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=150">Godel Numbering</a></p>
<p>For whatever reason, I find this detail relatively straightforward. It’s nice to see the details worked out, though.</p>
</li>
<li>
<p>The \(\mathrm{Proof}\) relation can be expressed</p>
<p>This is done in two steps. First, you show that the \(\mathrm{Proof}\) relation is primitive recursive (<a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=154&zoom=100,90,100">19.4</a> and <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=163&zoom=100,90,533">20.4</a>). Then, you show that your formal system can express <em>all</em> primitive recursive relations (<a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=127">15</a>).</p>
</li>
<li>
<p>The \(\mathrm{Proof}\) relation can be captured</p>
<p>I found this part of the proof to be, by far, the hardest to follow. I wish I could give you a two sentence explanation of the key insight here, but I’m not really sure what it is. Sometimes you just have to sit and stare at something until it finally clicks.</p>
<p>In <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=138">chapter 17</a>, Smith shows that any primitive recursive function or relation (including the \(\mathrm{Proof}\) relation) can be captured by \(Q\), and hence in \(PA\). Most of the heavy lifting of this proof is done by invoking <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=93&zoom=100,90,510">Theorem 11.5</a> that states that \(Q\) is \(\Sigma_1\)-complete.</p>
<p>Here’s my best attempt at describing what clicked for me: The proof of Theorem 11.5 shows that \(Q\) is \(\Sigma_1\)-complete, which means that every <em>true</em> \(\Sigma_1\) sentence can be proven in \(Q\). I think I was expecting the proof of this statement to show me <em>how</em> to prove any arbitrary \(\Sigma_1\) formula, but that’s not what’s being claimed. How you show that a true \(\Sigma_1\) formula can be derived is extremely unsatisfying. If you have a true formula that says “there exists some \(x\) for which \(P(x)\) is true”, you can prove that by finding a specific \(x\) for which it’s true and then adding the existential quantifier at the front. But finding that specific \(x\) for which \(P(x)\) is true might be insanely hard.</p>
<p>Let’s say that Goldbach’s conjecture is false. Namely, “there exists some even number greater than two that is <em>not</em> the sum of two primes”. If that statement is true, then it’s provable within \(PA\). How would you prove it? Step 1: find an even number greater than two that is not the sum of two primes. Step 2: From that instance, derive the existential statement. Step 1 may take you a while.</p>
</li>
<li>
<p>Constructing Godel’s sentence</p>
<p>The coup de grâce comes in <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=166">Chapter 21: \(PA\) is incomplete</a>. In this chapter, Smith shows how to construct Godel’s version of “formula 42” which claims (about itself) that it has no proof.</p>
</li>
</ol>
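<p>The Goldbach example in item 4 above can be sketched as a brute-force search. The function names and the search limit are illustrative assumptions; the only point is that “Step 1” is nothing more than checking even numbers one at a time, with no guarantee of when (or whether) the search will stop.</p>

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_sum_of_two_primes(n):
    """True if n = p + q for some primes p <= q."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

def find_goldbach_counterexample(limit):
    """Return the first even n with 2 < n <= limit that is not a sum of two primes, or None."""
    for n in range(4, limit + 1, 2):
        if not is_sum_of_two_primes(n):
            return n
    return None

print(find_goldbach_counterexample(10_000))
```

<p>No counterexample turns up in this range. If one ever did, the witness itself would be the heart of the proof: from that single instance the existential statement follows immediately.</p>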
<p>And with that, I conclude my series on Godel’s First Incompleteness Theorem!</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>His actual formula most certainly did not have a Godel number of 42. Normally, people refer to the number of his formula with the letter \(G\), but I find having a concrete number makes my brain hurt just a bit less. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1>G12. Understanding Godel’s First Incompleteness Theorem - A Summary</h1>
<p>2021-07-18 · <a href="http://blog.russelldmatt.com/2021/07/18/g12-understanding-godels-first-incompleteness-theorem">http://blog.russelldmatt.com/2021/07/18/g12-understanding-godels-first-incompleteness-theorem</a></p>
<style> .ul { white-space:nowrap; } </style>
<p>If you’ve read through the entire Godel series thus far, you now have the prerequisite knowledge to precisely understand the statement that Godel’s first incompleteness theorem is making.</p>
<p>Here, again, is Godel’s first incompleteness theorem:</p>
<div id="theorem">
<p>Any <span class="consistency">consistent</span> <span class="formal-systems">formal system</span> F that is <span class="strong">sufficiently strong</span> is <span class="completeness">incomplete</span>; i.e., there are statements of the language of F which can neither be proved nor disproved in F.</p>
</div>
<p>This theorem pertains only to <span class="formal-systems"><em>formal
systems</em></span>, which are very rigid systems in which one starts from
a set of axioms and transforms them using the predefined
transformation rules of the system to derive more theorems. The
theorem is not making a statement about informal proofs which are
allowed to use any assumptions or leaps of logic that seem “obviously
true”. Formal systems in isolation can be somewhat meaningless, but
they are usually designed with an interpretation in mind. One can use
such an interpretation to translate formulas within a formal system
into mathematical statements.</p>
<p>Furthermore, this theorem is dealing with <span class="strong"><em>sufficiently strong</em></span> formal systems, by which
we mean that the formal system can <em>capture</em> all primitive recursive
relations.</p>
<p>The theorem says that if such a formal system is <span class="consistency"><em>consistent</em></span>, meaning that there are no
formulas \(\varphi\) for which it can derive both \(\varphi\) and its
negation \(\lnot \varphi\), then it is <span class="completeness"><em>incomplete</em></span>, meaning that there exists
some formula \(\varphi\) for which it cannot derive either \(\varphi\)
or \(\lnot \varphi\).</p>
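<p>In symbols, writing \(F \vdash \varphi\) for “\(F\) can derive \(\varphi\)”, the two definitions read:</p>
\[\begin{align}
F \text{ is consistent} &\iff \text{there is no } \varphi \text{ with } F \vdash \varphi \text{ and } F \vdash \lnot \varphi \\
F \text{ is incomplete} &\iff \text{there is some } \varphi \text{ with } F \nvdash \varphi \text{ and } F \nvdash \lnot \varphi
\end{align}\]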
<p>The important implication of this is that one of \(\varphi\) or
\(\lnot \varphi\) must be true, so if the formal system can derive
neither, then <strong>there exists a true statement that the formal system
can express, but not derive</strong>.</p>
<h3 id="why-is-that-so-surprising">Why is that so surprising?</h3>
<p>I personally find it quite intuitive (although apparently wrong) that
“mathematical truth” and “provable” are two ways of saying the same
thing. Before studying Godel, if someone had told me that there was a
true statement which was “unprovable”, I’d be pretty confused as to
what they meant by “true”. How are you sure it’s true if you can’t
prove it?</p>
<p>Now having a deeper understanding of Godel’s work, I understand there
are a few problems with that line of thought. For one, provable using
what initial assumptions? Every proof has to start <em>somewhere</em>, and
those are the starting axioms that you assume to be true.</p>
<p>I also now know that one way to show that a “true” statement is
unprovable is by showing that neither it nor its negation can be
proven. The beauty of this technique is that it sidesteps the problem of
having to somehow show that a statement is true without being able to
prove it. Instead, all you need to agree on is that one of X or “not
X” must be true. If a system can prove neither, then there exists <em>a</em>
true statement that it cannot prove.</p>
<p>I want to quickly clarify that Godel’s theorem does not say that there
are true statements that cannot be proven under <em>any</em> formal system.
Just that, for a given formal system, there exist true statements that
cannot be proven. It’s a subtle distinction, but an important one.
<strong>It means that we cannot find a <em>single</em> set of initial assumptions
(and transformation rules) from which to derive all mathematical
truths.</strong> We may be able to derive all truths, but different truths
may need different starting assumptions. You have to admit, something
feels very unsatisfying about that.</p>
<p>If you find this extremely counterintuitive, you’re not alone. As
explained in <a href="/2021/07/05/g3-why-do-we-care-about-formal-systems.html">Why do we care about formal systems?</a>, finding a
single set of initial assumptions from which to derive all
mathematical truths was not some unrealistic ideal that no mathematicians
found plausible. On the contrary, in the early 1900s, many of the
world’s foremost mathematicians were <a href="https://en.wikipedia.org/wiki/Hilbert%27s_program">explicitly working towards this
goal</a>.</p>
<p>So, when Godel published his paper <em>On Formally Undecidable
Propositions of Principia Mathematica And Related Systems</em>, it
seriously shocked the mathematical community. In a single instant,
the goal of finding a single formal system on which all math could be
based - likely the life’s work of many mathematicians at the time -
was shown to be unattainable. Scientifically minded people often say
that being proven wrong is a gift because it’s in those moments when
you learn the most. Even so, I suspect that this was a tough pill for
some to swallow.</p>
<h3 id="the-proof">The proof</h3>
<p>In a future series of posts we will outline how Godel actually went
about proving this theorem.</p>
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
let blue = "#5680E9";
let lblue = "#84CEEB";
let teal = "#5AB9EA";
let grey = "#C1C8E4";
let purple = "#8860D0";
let pastel_red = "#FF6961";
class_to_color = {
"consistency": lblue,
"formal-systems": pastel_red,
"strong": purple,
"completeness": blue,
}
for (class_name in class_to_color) {
let color = class_to_color[class_name];
let elts = document.getElementsByClassName(class_name);
for (el of elts) {
RoughNotation.annotate(el, { type: "underline", color: color }).show();
}
}
let theorem = document.getElementById("theorem");
RoughNotation.annotate(theorem, {
type: 'bracket',
color: pastel_red,
brackets: ['left', 'right'],
animate: false
}).show();
(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
};
})();
});
</script>G11. Primitive Recursive Functions2021-07-14T00:00:00+00:002021-07-14T00:00:00+00:00http://blog.russelldmatt.com/2021/07/14/g11-primitive-recursive-functions<style> .ul { white-space:nowrap; } </style>
<p>Godel’s incompleteness theorem talks about formal systems that are <span class="ul">“sufficiently strong”</span>.
In this post, we will clarify what exactly is meant by that phrase.</p>
<p>Primitive recursive functions are a very large class of functions
that, very roughly speaking, correspond to functions that you can
compute “with only for loops”. The constraint of using only for loops
(as opposed to while loops) means that these functions cannot create
infinite loops and will therefore <span class="ul">always complete in a finite number of steps</span>. Furthermore, the number of steps it will take to complete is <span class="ul">bounded</span> (since
for loops have a predetermined length). We will define them much more
precisely later, but first we should talk about why we’re interested
in them.</p>
<h3 id="motivation">Motivation</h3>
<p>As we mentioned in the previous post, Godel defined what is known as “the Godel sentence” which can be interpreted as “this statement cannot be derived within this formal system”. At first glance, it’s not obvious that such a statement can be constructed within the formal system that Godel was using. However, Godel meticulously built up a series of relations - helper functions if you will - that he used in building his sentence. Furthermore, he showed that each one is “primitive recursive”. Lastly, he showed that his formal system can express (actually capture) <em>any</em> primitive recursive relation. Through this chain of logic, he showed that his formal system <span class="ul">can express the Godel sentence</span>.</p>
<p>This chain of logic actually says a bit more. It shows that <em>any</em> formal system, not just the one that Godel was using, that can express all primitive recursive relations can express a version of “the Godel sentence”. This is what is meant by a <strong>sufficiently expressive</strong> formal system: a formal system that can express all primitive recursive relations. The stronger version of this is a <strong>sufficiently strong</strong> formal system, which is a formal system that can <em>capture</em> all primitive recursive relations.</p>
<blockquote>
<p>Remember that expressing a relation between two numbers \(x\) and \(y\) with an open formula \(\varphi(x, y)\) means that the formula is <em>true</em> iff \(x\) and \(y\) have that particular relation, while capturing the relation means that \(\varphi(x, y)\) is derivable within the formal system if \(x\) and \(y\) have that particular relation and \(\lnot \varphi(x, y)\) is derivable if not.</p>
<p>Expressing is to truth as capturing is to derivability.</p>
</blockquote>
<p>Hopefully that sufficiently motivates the desire to understand what a primitive recursive relation (or function) actually is. So let’s get started.</p>
<h3 id="definition">Definition</h3>
<p>As with all of my posts, there probably exist better explanations out there on the internet. In this case, however, I think I’ve found one. The first 4 videos of <a href="https://www.youtube.com/playlist?list=PLC-8dKj3F0NUnR8LeBGH3utAI9aQjFbi5">this 5-video YouTube playlist</a> do an excellent job of defining and explaining primitive recursive functions. I will attempt to explain it myself below, but I highly recommend watching those videos.</p>
<p>We will start with the precise, but incredibly abstract definition and then work through a series of examples.</p>
<div class="aside">
<p>A quick note about notation before we start. Functions that take \(n\) arguments are called \(n\)-ary functions. One notational method to make it clear that a function takes \(n\) arguments is to write it like so: \(f(x_1, \ldots, x_n)\). This is clear and intuitive, but long, especially when composing \(k\) functions each with \(m\) arguments. A different method is to notate each \(n\)-ary function with an \(n\) superscript, like so: \(f^n\). We will use both methods below.</p>
</div>
<div class="like-blockquote">
<p>The basic primitive recursive functions are given by these axioms:</p>
<ol>
<li><strong>Constant function</strong>: The 0-ary constant function \(Z^0 = 0\) is primitive recursive.</li>
<li><strong>Successor function</strong>: The 1-ary successor function \(S^1\), which returns the successor of its argument, is primitive recursive. That is, \(S^1(k) = k + 1\).</li>
<li><strong>Projection function</strong>: For every \(n≥1\) and each \(i\) with \(1≤i≤n\), the \(n\)-ary projection function \(P^n_i\), which returns its \(i\)-th argument, is primitive recursive. For example, \(P^3_2(x,y,z) = y\).</li>
</ol>
<p>More complex primitive recursive functions can be obtained by applying the operations given by these axioms:</p>
<ol>
<li>
<p><strong>Composition</strong>: Given a \(k\)-ary primitive recursive function \(f^k\), and \(k\) many \(m\)-ary primitive recursive functions \(g^m_1,\ldots,g^m_k\), the composition of \(f^k\) with \(g^m_1,\ldots,g^m_k\), i.e. the \(m\)-ary function
\(h^m(x_1,\ldots,x_m) = f^k(g^m_1(x_1,\ldots,x_m),\ldots,g^m_k(x_1,\ldots,x_m))\) is primitive recursive.</p>
</li>
<li>
<p><strong>Primitive recursion operator</strong>: Given \(f^k\), a \(k\)-ary primitive recursive function, and \(g^{k+2}\), a \((k+2)\)-ary primitive recursive function, the primitive recursion of \(f^k\) and \(g^{k+2}\) is defined as the \((k+1)\)-ary function \(h^{k+1}\) constructed as follows:
\(\begin{aligned}
h^{k+1} (0, x_1, \ldots, x_k) &= f^k (x_1, \ldots, x_k) \\
h^{k+1} (S(y), x_1, \ldots, x_k) &= g^{k+2} (y, h^{k+1} (y, x_1, \ldots, x_k), x_1, \ldots, x_k)\end{aligned}\)</p>
</li>
</ol>
<p>We will use the symbol \(Pr^{k+1}(f^k,g^{k+2})\) to indicate the primitive recursion of \(f^k\) and \(g^{k+2}\).</p>
<p>The <strong>primitive recursive</strong> functions are the basic functions and those obtained from the basic functions by applying composition and primitive recursion a finite number of times.</p>
</div>
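<p>If you think better in code, the definition above can be transcribed fairly directly. The following Python is only a sketch under my own naming: <code>Z</code>, <code>S</code>, <code>P</code>, <code>compose</code>, and <code>Pr</code> mirror the symbols in the definition, while <code>double</code> is a made-up example that exists only to exercise them.</p>

```python
def Z():
    """The 0-ary constant function Z^0."""
    return 0


def S(k):
    """The 1-ary successor function S^1."""
    return k + 1


def P(n, i):
    """Build the n-ary projection P^n_i, which returns its i-th argument (1-indexed)."""
    def proj(*args):
        assert len(args) == n, "projection called with the wrong number of arguments"
        return args[i - 1]
    return proj


def compose(f, *gs):
    """Composition: h(x_1..x_m) = f(g_1(x_1..x_m), ..., g_k(x_1..x_m))."""
    def h(*xs):
        return f(*(g(*xs) for g in gs))
    return h


def Pr(f, g):
    """Primitive recursion, written recursively to match the two defining equations."""
    def h(y, *xs):
        if y == 0:
            return f(*xs)                      # h(0, xs) = f(xs)
        return g(y - 1, h(y - 1, *xs), *xs)    # h(S(y'), xs) = g(y', h(y', xs), xs)
    return h


# Hypothetical example: double(y) = 2 * y, built only from the pieces above.
# Base case: Z^0. Step: apply S twice to the previous result (selected by P^2_2).
double = Pr(Z, compose(S, compose(S, P(2, 2))))

print(P(3, 2)(7, 8, 9))  # 8
print(double(4))         # 8
```

<p>Nothing here uses Python’s own arithmetic except inside <code>S</code> and the bookkeeping of <code>Pr</code>, which is the point: everything else is assembled from the basic functions and the two operators.</p>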
<h3 id="interpretation-of-the-primitive-recursion-operator">Interpretation of the Primitive Recursion Operator</h3>
<p>In a rare turn of events, Wikipedia gives a (somewhat) intuitive way to think about the primitive recursion operator as a for loop:</p>
<blockquote>
<p>Interpretation. The function \(h\) acts as a for loop from 0 up to the value of its first argument. The rest of the arguments for \(h\), denoted here with \(x_i\)’s \((i = 1, \ldots, k)\), are a set of initial conditions for the for loop which may be used by it during calculations but which are immutable by it. The functions \(f\) and \(g\) on the right side of the equations which define \(h\) represent the body of the loop, which performs calculations. Function \(f\) is only used once to perform initial calculations. Calculations for subsequent steps of the loop are performed by \(g\). The first parameter of \(g\) is the “current” value of the for loop’s index. The second parameter of \(g\) is the result of the for loop’s previous calculations, from previous steps. The rest of the parameters for \(g\) are those immutable initial conditions for the for loop mentioned earlier. They may be used by \(g\) to perform calculations but they will not themselves be altered by \(g\).</p>
</blockquote>
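<p>That interpretation can be written down almost word for word. Here is a minimal sketch in Python, where <code>pr_loop</code> is my own name for the operator and <code>fact</code> is an illustrative example that uses ordinary Python multiplication as a shortcut rather than building it from the basic functions.</p>

```python
def pr_loop(f, g):
    """Primitive recursion of f and g, expressed as a single for loop."""
    def h(n, *xs):
        acc = f(*xs)              # f is used once, for the initial calculation
        for y in range(n):        # y is the "current" loop index: 0, 1, ..., n - 1
            acc = g(y, acc, *xs)  # g sees the index, the previous result, and the fixed xs
        return acc
    return h


# Illustrative example: factorial, with base case 1 and step (y + 1) * previous.
fact = pr_loop(lambda: 1, lambda y, prev: (y + 1) * prev)

print(fact(5))  # 120
```

<p>The loop runs exactly <code>n</code> times and the extra arguments <code>xs</code> are never reassigned, which matches the “immutable initial conditions” in the quote.</p>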
<h3 id="examples">Examples</h3>
<p>The only way that I was able to really understand primitive recursion was seeing many examples and then working through a few myself. Let’s start with an easy one.</p>
<div class="brkt-l">
<h4 id="add2x--x--2">Add2(x) = x + 2</h4>
<p>To implement \(Add2^1(x) = x + 2\), we just need to apply the successor function \(S^1\) twice, which we can do via composition. Since the successor function is primitive recursive and composing primitive recursive functions yields a primitive recursive function, the resulting \(Add2^1\) function is also primitive recursive.</p>
<p>\(Add2^1(x) = S^1(S^1(x)) = x + 2\)</p>
</div>
<div class="brkt-l">
<h4 id="zerox--0">Zero(x) = 0</h4>
<p>Notice that the 0-ary zero function \(Z^0\) is given to us as an axiom, but not the 1-ary zero function \(Z^1(x) = 0\). We can define it ourselves using primitive recursion:</p>
<p>\(\begin{aligned}
Z^1(0) &= f^0() = Z^0 \\
Z^1(y+1) &= g^2(y, Z^1(y)) = P^2_2(y,Z^1(y)) = Z^1(y) \\
Z^1 &= Pr(f^0, g^2)
\end{aligned}\)</p>
</div>
<p>Try to manually compute \(Z^1(2)\) using the definition above. Once you’re done, click <a href="/assets/by-post/g11-primitive-recursive-functions/Z2.jpeg">here</a> to check your work.</p>
<div class="brkt-l">
<h4 id="addxy--x--y">Add(x,y) = x + y</h4>
<p>\(\begin{aligned}
Add^2(0, y) &= f^1(y) = P^1_1(y) = y \\
Add^2(x+1, y) &= g^3(x, Add^2(x,y), y) = S(P^3_2) = Add^2(x,y) + 1 = x + y + 1 \\
Add^2 &= Pr(f^1, g^3)
\end{aligned}\)</p>
</div>
<div class="brkt-l">
<h4 id="multxy--x--y">Mult(x,y) = x * y</h4>
<p>\(\begin{aligned}
Mult^2(0, y) &= f^1(y) = Z^1(y) = 0 \\
Mult^2(x+1, y) & = g^3(x,Mult(x,y), y) = Add^2(P^3_2, P^3_3) = Add^2(Mult(x,y),y) = x \cdot y + y \\
Mult^2 &= Pr(f^1, g^3)
\end{aligned}\)</p>
</div>
<p>Notice the role of the projection functions in the examples thus far. They serve a critical, but trivial role. They allow you to select which arguments to pass to another function. In \(Add^2\), the composition of \(S(P^3_2)\) is just the function \(g(x,y,z) = S(y)\), since \(P^3_2\) is a function which takes 3 arguments and returns the 2nd. In \(Mult^2\), \(Add^2(P^3_2, P^3_3)\) is really just a confusing way to write \(g(x,y,z) = Add^2(y, z)\). From now on, for the sake of readability, I will omit the projection functions and allow myself to select and reorder arguments with the knowledge that we can make this rigorous via the use of projection functions if we need.</p>
<p>Also notice that I used \(Add^2\) to define \(Mult^2\). This is perfectly acceptable since primitive recursive functions are the basic functions and those obtained from the basic functions by applying composition and primitive recursion <em>a finite number of times</em>. So, we can use any primitive recursive function in the definition of another primitive recursive function.</p>
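<p>To see the projection functions doing their job, here is a sketch of \(Z^1\), \(Add^2\) and \(Mult^2\) in Python with every projection written out. The helper names (<code>P</code>, <code>compose</code>, <code>pr</code>) are my own, and <code>pr</code> uses the for-loop reading of primitive recursion.</p>

```python
def S(k):
    return k + 1


def P(n, i):
    """n-ary projection onto the i-th argument (1-indexed)."""
    def proj(*args):
        assert len(args) == n
        return args[i - 1]
    return proj


def compose(f, *gs):
    """h(xs) = f(g_1(xs), ..., g_k(xs))."""
    def h(*xs):
        return f(*(g(*xs) for g in gs))
    return h


def pr(f, g):
    """Primitive recursion on the first argument, as a for loop."""
    def h(n, *xs):
        acc = f(*xs)
        for y in range(n):
            acc = g(y, acc, *xs)
        return acc
    return h


Z1 = pr(lambda: 0, P(2, 2))                      # Z^1 = Pr(Z^0, P^2_2)
add = pr(P(1, 1), compose(S, P(3, 2)))           # Add^2 = Pr(P^1_1, S(P^3_2))
mult = pr(Z1, compose(add, P(3, 2), P(3, 3)))    # Mult^2 = Pr(Z^1, Add^2(P^3_2, P^3_3))

print(add(4, 6))   # 10
print(mult(4, 6))  # 24
```

<p>Each call to <code>mult(4, 6)</code> runs the outer loop 4 times, and each of those iterations runs <code>add</code>, which is itself a loop. Nested for loops, and nothing else.</p>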
<p>Practice is critical in order to intuitively grasp how primitive recursion is like a for loop. Try to manually compute \(Mult^2(3,5)\). Once you’re done, click <a href="/assets/by-post/g11-primitive-recursive-functions/Mult35.jpeg">here</a> to check your work.</p>
<div class="brkt-l">
<h4 id="pown-x--xn">Pow(n, x) = \(x^n\)</h4>
<p>\(\begin{aligned}
Pow^2(0, x) &= f^1(x) = S(Z^1(x)) = 1 \\
Pow^2(n+1, x) &= g^3(n, Pow^2(n, x), x) = Mult^2(x, Pow^2(n,x)) = x \cdot x^n \\
Pow^2 &= Pr(f^1, g^3)
\end{aligned}\)</p>
</div>
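<p>Here is how \(Pow^2\) might look in code, taking the base case to be the constant 1 (since \(x^0 = 1\)). This is a sketch with my own helper name <code>pr</code>, and with projections omitted in favour of plain lambdas for readability.</p>

```python
def pr(f, g):
    """Primitive recursion on the first argument, as a for loop."""
    def h(n, *xs):
        acc = f(*xs)
        for y in range(n):
            acc = g(y, acc, *xs)
        return acc
    return h


# Each function is defined only in terms of the one before it.
add = pr(lambda y: y, lambda x, prev, y: prev + 1)            # Add(x, y) = x + y
mult = pr(lambda y: 0, lambda x, prev, y: add(prev, y))       # Mult(x, y) = x * y
power = pr(lambda x: 1, lambda n, prev, x: mult(x, prev))     # Pow(n, x) = x ** n

print(power(0, 5))  # 1
print(power(3, 2))  # 8
```

<p>Note the argument order: the exponent comes first because primitive recursion always recurses on the first argument.</p>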
<p>Are you getting the hang of these yet? If you can define a base case for when the first argument is equal to 0, and a recursive case that computes \(f(x+1,y)\) based on some combination of \(x\), \(y\), and \(f(x,y)\), then you can combine these using primitive recursion to define your function for any \(x\) and \(y\).</p>
<h3 id="a-goal-the-div-function">A Goal: The Div function</h3>
<p>We’re going to now head towards an ambitious goal. I want to define the following primitive recursive function: \(Div(x,y)\) which equals \(1\) if \(x\) is divisible by \(y\) and \(0\) otherwise. To do this, we’re going to need to build up a series of simpler primitive recursive functions to help.</p>
<div class="brkt-l">
<h4 id="sgnx--1-if-x--0-else-0">Sgn(x) = 1 if x > 0 else 0</h4>
<p>\(\begin{aligned}
Sgn^1(0) &= f^0() = Z^0 = 0 \\
Sgn^1(x+1) &= g^2(x, Sgn(x)) = S(Z^2) = 1 \\
Sgn^1 &= Pr(f^0, g^2)
\end{aligned}\)</p>
</div>
<div class="brkt-l">
<h4 id="predx--max0-x---1">Pred(x) = max(0, x - 1)</h4>
<p>Note that our “predecessor” function can never be negative, because primitive recursive functions only deal with the natural numbers, so \(Pred(0) = 0\).</p>
\[\begin{aligned}
Pred^1(0) &= f^0() = Z^0 = 0 \\
Pred^1(x+1) &= g^2(x, Pred^1(x)) = x \\
Pred^1 &= Pr(f^0, g^2)
\end{aligned}\]
</div>
<div class="brkt-l">
<h4 id="subxy--max0-y---x">Sub(x,y) = max(0, y - x)</h4>
<p>Note that our subtraction function can never be negative, like \(Pred\). Also note that \(Sub(x,y)\) is \(max(0, y - x)\) not \(max(0, x - y)\).</p>
\[\begin{aligned}
Sub^2(0, y) &= f^1(y) = y \\
Sub^2(x+1, y) &= g^3(x, Sub^2(x,y), y) = Pred^1(Sub^2(x,y)) \\
Sub^2 &= Pr(f^1, g^3)
\end{aligned}\]
</div>
<div class="brkt-l">
<h4 id="absdiffxy---x---y">Absdiff(x,y) = \(| x - y|\)</h4>
<p>\(Absdiff^2(x,y) = Add^2(Sub^2(x,y),Sub^2(y,x))\)</p>
</div>
<div class="brkt-l">
<h4 id="neqxy--1-if-x-neq-y-else-0">Neq(x,y) = 1 if \(x \neq y\) else 0</h4>
<p>\(Neq^2(x,y) = Sgn^1(Absdiff^2(x,y))\)</p>
</div>
<div class="brkt-l">
<h4 id="eqxy--1-if-x--y-else-0">Eq(x,y) = 1 if \(x = y\) else 0</h4>
<p>\(Eq^2(x,y) = Sub^2(Neq^2(x,y), S^1(Z^2)) = 1 - Neq^2(x,y)\)</p>
</div>
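<p>The helper functions so far can be checked mechanically. Below is a sketch in Python; <code>pr</code> is my own name for the for-loop version of primitive recursion, and projections are replaced by lambdas. Pay attention to the argument order of <code>sub</code>, which follows the definition above.</p>

```python
def pr(f, g):
    """Primitive recursion on the first argument, as a for loop."""
    def h(n, *xs):
        acc = f(*xs)
        for y in range(n):
            acc = g(y, acc, *xs)
        return acc
    return h


sgn = pr(lambda: 0, lambda x, prev: 1)                    # 1 if x > 0 else 0
pred = pr(lambda: 0, lambda x, prev: x)                   # max(0, x - 1)
add = pr(lambda y: y, lambda x, prev, y: prev + 1)        # x + y
sub = pr(lambda y: y, lambda x, prev, y: pred(prev))      # max(0, y - x), not x - y


def absdiff(x, y):
    return add(sub(x, y), sub(y, x))   # one of the two terms is always 0


def neq(x, y):
    return sgn(absdiff(x, y))


def eq(x, y):
    return sub(neq(x, y), 1)           # 1 - Neq(x, y)


print(sub(3, 10), sub(10, 3))  # 7 0
print(eq(4, 4), eq(4, 5))      # 1 0
```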
<div class="brkt-l">
<h4 id="remxy--x--y">Rem(x,y) = x % y</h4>
\[\begin{aligned}
Rem^2(0, y) &= f^1(y) = Z^1(y) = 0 \\
Rem^2(x+1, y) &= g^3(x, Rem^2(x, y), y) = Neq(Rem^2(x,y) + 1,y) \cdot (Rem^2(x,y) + 1) \\
Rem^2 &= Pr(f^1, g^3)
\end{aligned}\]
<p>The recursive case (\(g^3\)) is a little unintuitive. It basically says, if the (remainder of x / y) + 1 = y then 0 else (remainder of x / y) + 1.</p>
</div>
<div class="brkt-l">
<h4 id="divxy--1-if-x-is-divisible-by-y-else-0">Div(x,y) = 1 if x is divisible by y else 0</h4>
\[\begin{aligned}
Div^2(x,y) = Eq^2(Rem^2(x,y),Z^2(x,y)) = Eq^2(Rem^2(x,y), 0)
\end{aligned}\]
<p>The last step was refreshingly simple. No primitive recursion, just simple function composition.</p>
</div>
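<p>To convince myself that the recursive case of \(Rem^2\) really computes a remainder, I find it helpful to run it. The sketch below uses my own helper <code>pr</code>, and it cheats slightly by writing <code>neq</code> and <code>eq</code> with Python comparisons instead of rebuilding them from \(Sgn\) and \(Absdiff\).</p>

```python
def pr(f, g):
    """Primitive recursion on the first argument, as a for loop."""
    def h(n, *xs):
        acc = f(*xs)
        for y in range(n):
            acc = g(y, acc, *xs)
        return acc
    return h


def neq(a, b):
    return 1 if a != b else 0   # shortcut for Sgn(Absdiff(a, b))


def eq(a, b):
    return 1 if a == b else 0   # shortcut for 1 - Neq(a, b)


# Step: bump the previous remainder by one, and reset to 0 when it reaches y.
rem = pr(lambda y: 0, lambda x, prev, y: neq(prev + 1, y) * (prev + 1))


def div(x, y):
    return eq(rem(x, y), 0)


print(rem(7, 3))                # 1
print(div(12, 4), div(7, 3))    # 1 0
```

<p>For every positive \(y\) I tried, <code>rem(x, y)</code> agrees with Python’s <code>x % y</code>. With \(y = 0\) the reset never fires, so <code>rem(x, 0)</code> simply returns \(x\).</p>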
<h3 id="whats-not-primitive-recursive">What’s not primitive recursive?</h3>
<p>When introducing a property, it’s helpful to show a few examples of
things that <em>do not</em> have that property. In this case, what functions
<em>are not</em> primitive recursive?</p>
<p>Let’s take a step back and define three broad classes of functions:</p>
<ul>
<li>Primitive Recursive Functions</li>
<li>Computable functions that are not primitive recursive</li>
<li>Uncomputable functions</li>
</ul>
<p>Intuitively, the primitive recursive functions are the set of
functions that can be computed using “only for loops” which means they
must terminate (there are no infinite while loops) <em>and</em> the number of
steps can be bounded (since for loops have a predetermined length).</p>
<p>What, then, are computable functions? Intuitively, computable
functions are the set of functions that can be computed by a computer
(e.g. a Turing machine) given unlimited amounts of time and space.
Essentially, we remove the restriction that it must only use for loops
which means that the number of steps it takes to compute the result is
no longer necessarily bounded. The classic example of a function that
is computable but not primitive recursive is the
<a href="https://en.wikipedia.org/wiki/Ackermann_function">Ackermann
function</a>.</p>
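<p>For the curious, here is the standard two-argument form of the Ackermann function transcribed into Python. The definition is not spelled out in this post, so treat this as a sketch for small inputs only: the values (and Python’s recursion depth) explode very quickly.</p>

```python
def ackermann(m, n):
    """Two-argument Ackermann function; only safe to call with very small m and n."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    # The recursive call appears inside its own argument, so the amount of
    # work is not fixed by a loop bound that is known before we start.
    return ackermann(m - 1, ackermann(m, n - 1))


print(ackermann(2, 3))  # 9
print(ackermann(3, 3))  # 61
```

<p>It always terminates, so it is computable, but informally it grows faster than anything you can build from a fixed nest of for loops.</p>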
<p>What, then, are the uncomputable functions!? An unhelpful (but
accurate) definition is that they are the set of functions that are
not… computable. A more intuitive definition is that there is no
<em>finite</em> procedure (or algorithm) that can compute the function. The
most famous example of such a problem is <a href="https://en.wikipedia.org/wiki/Halting_problem">the Halting
problem</a>. A simpler
example is <a href="https://en.wikipedia.org/wiki/Busy_beaver">the Busy
beaver</a>. I will give a
slapdash explanation of the busy beaver problem here and why it’s
uncomputable.</p>
<p>An “nth Busy beaver” is a binary-alphabet Turing machine with \(n\)
states that reads a tape initially consisting of all zeros. The Turing
machine will run and must eventually halt. At the point of halting,
the tape must contain as many or more 1’s on it than any other
\(n\)-state Turing machine would produce under the same scenario.</p>
<p>The Busy beaver function takes as input \(n\) and returns the number
of 1’s that the “nth Busy beaver” would produce. Wikipedia states
that determining whether an arbitrary Turing machine is a busy beaver
is undecidable.</p>
<p>You may think: why not just enumerate all possible \(n\)-state
binary-alphabet Turing machines, run them all, and see which one
produces the most 1’s after they all halt? I <em>think</em> the problem with
this is that some of those Turing machines may run forever. Consider
a Turing machine that you’re attempting to “test” that has run for one
million steps so far. How would you reliably decide whether or not
it will eventually halt? This seems like the Halting problem, which
is also uncomputable. Take this entire paragraph with a large grain
of salt as these are my own speculations and I’m relatively new to
these ideas.</p>
<p>One interesting addendum is that most functions are uncomputable,
which is unintuitive given every function we run into on a daily basis
is probably computable. This reminds me of the fact that almost all
real numbers are transcendental, but I bet you can’t name more than 2.</p>
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
};
})();
(function () {
let elts = Array.from(document.getElementsByClassName("brkt-l"));
for (el of elts) {
RoughNotation.annotate(el, { type: 'bracket', color: 'red', padding: [0, 5], brackets: ['left'] }).show();
};
})();
});
</script>G10. Expressibility and Capturability2021-07-13T00:00:00+00:002021-07-13T00:00:00+00:00http://blog.russelldmatt.com/2021/07/13/g10-expressibility-and-capturability<style> .ul { white-space:nowrap; } </style>
<p>A critical step in Godel’s proof is his construction of “the Godel sentence” which, when interpreted, means <span class="ul">“this statement cannot be derived within this formal system”</span>. The formal system in which he constructed this statement is one that deals with the natural numbers, first order logic, and elementary arithmetic such as the successor function.</p>
<p><span class="ul">How in the world, then, did he express such a statement?</span> It is certainly not obvious that it’s possible. After all, it is not true that any formal system can express any statement, so the ability to write down a formula that has the above meaning is not a given. It takes a lot of work to demonstrate that such a statement can be expressed.</p>
<p>We’re not going to demonstrate how Godel expressed this statement in this post, but rather talk about the notion of expressibility in general. We want to know, precisely, what it means to be able to express something in a formal system, as well as the stronger property of capturing (or representing<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>).</p>
<p>First, let’s recall statements that cannot be expressed within particular formal systems. We’ve seen examples of this already. In the Add system, we could not express (true) statements of addition that dealt with negative numbers. When discussing first vs second order logic, we noted that first order logic could not quantify over sets. So, first order logic would not be able to express the statement “there exists a property \(P\) that has 3 elements”.</p>
<p>Now let’s see a few examples of formulas that <em>do</em> express familiar properties:</p>
<h4 id="evenness">Evenness</h4>
\[\exists v (2 \times v = x)\]
<p>The above formula is an <em>open</em> formula that expresses the property of being even. A formula is open when there are one or more variables that are not bound. In this case, \(v\) is bound by the quantifier \(\exists\), but \(x\) is not. So this formula has one free variable: \(x\). This open formula expresses the property of being even because for any number \(x\), this formula is true iff \(x\) is even. Put another way, this open formula has the set of even numbers as its extension.</p>
<h4 id="primeness">Primeness</h4>
\[(x \neq 1 \land \forall u \forall v (u \times v = x \supset (u = 1 \lor v = 1)))\]
<p>The above open formula expresses the property of being prime. In words, it says that \(x \neq 1\) and, for any two numbers \(u\) and \(v\), if \(u \times v = x\) then either \(u = 1\) or \(v = 1\).</p>
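<p>One way to get a feel for what these formulas say is to evaluate them by brute force. The Python sketch below reads \(\exists\) as <code>any</code> and \(\forall\) as <code>all</code>. Restricting the quantified variables to the range \(0, \ldots, x\) is my own simplification; it is safe here because any witness or counterexample for these two formulas must lie in that range.</p>

```python
def is_even(x):
    # exists v (2 * v = x); any witness v is at most x
    return any(2 * v == x for v in range(x + 1))


def is_prime(x):
    # x != 1 and forall u forall v (u * v = x  implies  (u = 1 or v = 1))
    candidates = range(x + 1)
    return x != 1 and all(
        (u * v != x) or (u == 1 or v == 1)   # "A implies B" written as "not A or B"
        for u in candidates
        for v in candidates
    )


print([x for x in range(12) if is_even(x)])   # [0, 2, 4, 6, 8, 10]
print([x for x in range(20) if is_prime(x)])  # [2, 3, 5, 7, 11, 13, 17, 19]
```

<p>The lists printed are exactly the extensions of the two formulas, restricted to small numbers.</p>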
<h4 id="definition">Definition</h4>
<p>An open formula \(\varphi(x)\) can <em>express</em> a property \(P\) iff, for any \(n\):</p>
<ul>
<li>if \(n\) has the property \(P\), then \(\varphi(\bar{n})\) is true, and</li>
<li>if \(n\) does not have the property \(P\), then \(\lnot \varphi(\bar{n})\) is true.</li>
</ul>
<p>This definition can be extended to many-place relations (not just one-place properties) in the obvious way.</p>
<h2 id="capturing-relations">Capturing Relations</h2>
<p>There is a stronger version of expressing a property (or relation) which is <em>capturing</em> a property (or relation).</p>
<p>A formal system \(T\) can <em>capture</em> a property \(P\) by the open formula \(\varphi(x)\) iff, for any \(n\):</p>
<ul>
<li>if \(n\) has the property \(P\), then \(T \vdash \varphi(\bar{n})\), and</li>
<li>if \(n\) does not have the property \(P\), then \(T \vdash \lnot \varphi(\bar{n})\)</li>
</ul>
<p>This definition can be extended to many-place relations (not just one-place properties) in the obvious way.</p>
<p>Expressing a property with a formula \(\varphi(x)\) means that \(\varphi(x)\) is <span class="ul"><em>true</em></span> iff \(x\) has the relevant property, while capturing a property means that \(\varphi(x)\) is <span class="ul">derivable</span> in the formal system if \(x\) has the relevant property and \(\lnot \varphi(x)\) is derivable if not.</p>
<p><span class="ul"><em>Expressing is to truth as capturing is to derivability.</em></span></p>
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
}
});
</script>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>In another example of Naming Is Hard™, the notion of capturability goes by other names, most notably representability (which is used by <a href="https://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/dp/0465026567">GEB</a>). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>G9. First vs. Second Order Logic2021-07-12T00:00:00+00:002021-07-12T00:00:00+00:00http://blog.russelldmatt.com/2021/07/12/g9-first-vs-second-order-logic<style> .ul { white-space:nowrap; } </style>
<p>Godel’s incompleteness theorem deals with “first order logic”, so it’s worth a post explaining what that is. In particular, we will distinguish it from zeroth order logic as well as from second and higher order logics.</p>
<p>What distinguishes these levels of logic from one another is what they’re allowed to <em>quantify</em> over. The two most common quantifiers are the symbols \(\forall\) which stands for “for all” and \(\exists\) which stands for “there exists”. One might use these quantifiers to make a statement like so:</p>
\[\forall x ((Prime(x) \land x > 2) \supset Odd(x))\]
<p>which is a formal way of stating that all primes greater than 2 are odd. If I translated this formula into a sentence, it would read: “for all \(x\), if \(x\) is prime and \(x\) is greater than 2, then \(x\) is odd”. I’m assuming I’ve already defined the \(Prime\) and \(Odd\) properties elsewhere, but you get the idea.</p>
<p>When we quantify over variables (\(x\) in this case), we first need to specify the domain over which these variables can range. Maybe the domain of \(x\) is the natural numbers, maybe it’s the real numbers, or maybe it’s something altogether different like colors. I can quantify over colors. There exists a color \(x\) that I like more than red. QED.</p>
<p>Let’s say that we’ve specified the domain is the natural numbers, as Godel did.</p>
<p><strong>Zeroth order logic</strong> is <span class="ul">not allowed to quantify</span> over the domain at all. By the way, propositional logic is another name for zeroth order logic.</p>
<p><strong>First order logic</strong> is <span class="ul">allowed to quantify over <em>individuals</em></span> of the domain (e.g. over natural numbers). The statement I made above about primes is a statement of first order logic in that \(x\) is to be interpreted as a natural number.</p>
<p><strong>Second order logic</strong> is <span class="ul">allowed to quantify over <em>sets</em></span> of the domain. Recall that a property of the natural numbers can be thought of as the set of numbers which have that property. So, the ability to quantify over sets gives one the ability to quantify over properties (or relations, more generally). Here is a statement that requires second order logic:</p>
\[\exists P. P(5) \land P(7)\]
<p>In words, there exists a property \(P\) such that both \(5\) and \(7\) have the property \(P\). Maybe \(P\) is the property of being odd, or the property of being prime, or the property of being less than 10, … the list can go on.</p>
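<p>The difference between the two kinds of quantification shows up clearly if we shrink the domain to something a computer can enumerate. In the Python sketch below, the finite <code>DOMAIN</code> is purely my own stand-in for the natural numbers: the first order statement loops over individual numbers, while the second order statement loops over every subset of the domain, each subset playing the role of a property \(P\).</p>

```python
from itertools import combinations

DOMAIN = list(range(12))  # finite stand-in for the natural numbers (an assumption)


def is_prime(n):
    return n >= 2 and all(n % d != 0 for d in range(2, n))


def is_odd(n):
    return n % 2 == 1


# First order: the quantifier ranges over individual numbers x.
all_primes_above_2_are_odd = all(
    is_odd(x) for x in DOMAIN if is_prime(x) and x > 2
)


def subsets(items):
    """Yield every subset of items as a frozenset."""
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


# Second order: the quantifier ranges over sets of numbers (properties) P.
witnesses = [P for P in subsets(DOMAIN) if 5 in P and 7 in P]
exists_property_of_5_and_7 = len(witnesses) > 0

print(all_primes_above_2_are_odd)   # True
print(exists_property_of_5_and_7)   # True
print(len(witnesses))               # 1024
```

<p>Even on a 12-element domain there are 1024 properties shared by 5 and 7, including “odd” and “less than 10”, which hints at how much larger the space being quantified over becomes at second order.</p>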
<p><strong>Higher order logics</strong>: You can extend this pattern to higher order logics. Third order logic can quantify over sets of sets… and so on.</p>
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
}
});
</script>G8. Relations (in Logic, not in Love)2021-07-11T00:00:00+00:002021-07-11T00:00:00+00:00http://blog.russelldmatt.com/2021/07/11/g8-relations<style> .ul { white-space:nowrap; } </style>
<p>The proof of Godel’s theorems deals with “relations”, which was not a term that I was familiar with before learning about Godel. I could have benefited from a short introduction, so here goes.</p>
<p>Let’s see what Wikipedia has to say…</p>
<blockquote>
<p>In mathematics, a <strong>finitary relation</strong> over sets \(X_1, \ldots, X_n\) is a subset of the Cartesian product \(X_1 \times \cdots \times X_n\); that is, it is a set of \(n\)-tuples \((x_1, \ldots, x_n)\) consisting of elements \(x_i\) in \(X_i\).</p>
</blockquote>
<p>Oh, Wikipedia… how you can turn simple ideas into unintelligible gibberish. Let’s try another source: Encyclopaedia Britannica…</p>
<blockquote>
<p><strong>Relation</strong>, in <a href="https://www.britannica.com/topic/logic">logic</a>, a set of ordered pairs, triples, quadruples, and so on. A set of ordered pairs is called a two-place (or dyadic) relation; a set of ordered triples is a three-place (or triadic) relation; and so on. In general, a relation is any set of ordered n-tuples of objects.</p>
</blockquote>
<p>Much clearer. Now let’s see a bunch of examples:</p>
<p><strong>Equality</strong>: \(Eq(m, n)\) is a two-place relation that consists of all pairs \((m, n)\) such that \(m = n\).</p>
<p><strong>Less than</strong>: \(Lt(m, n)\) is a two-place relation that consists of all pairs \((m, n)\) where \(m < n\).</p>
<p>Hopefully you’re starting to get the picture.</p>
<p><strong>Prime</strong>: \(Prim(n)\) is a one-place relation that consists of all elements \(n\) that are prime. One-place relations are often called “properties”.</p>
<p><strong>Divisibility</strong>: \(Div(m, n)\) consists of all pairs \((m, n)\) such that \(m\) is divisible by \(n\).</p>
<p><strong>Factorial</strong>: \(Fact(m, n)\) consists of all pairs \((m, n)\) s.t. \(m = n!\).</p>
<p><strong>Prime Factors</strong>: \(NPrimeFactors(m, n)\) consists of all pairs \((m, n)\) s.t. the number \(m\) has \(n\) distinct prime factors.</p>
<p><strong>Prime Factor Multiplicity</strong>: \(exf(m, n, i)\) consists of all triples \((m, n, i)\) s.t. \(i\) is the exponent of the \(n\)th prime number in \(m\)’s prime factorization.</p>
<p>In this way, you can see how relations are extremely general and can represent properties of numbers (e.g. primeness) as well as things that are normally described as functions (e.g. factorial or the exponent of the \(n\)th prime number in \(m\)’s factorization), but if you boil it down… it’s just a set.</p>
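<p>To make “it’s just a set” concrete, here are a few of the relations above written as literal Python sets of tuples. Restricting everything to numbers below <code>N</code> is my own simplification, since the real relations are infinite sets.</p>

```python
from math import factorial

N = 10
domain = range(N)

# Two-place relations are sets of pairs.
Eq = {(m, n) for m in domain for n in domain if m == n}
Lt = {(m, n) for m in domain for n in domain if m < n}
Div = {(m, n) for m in domain for n in range(1, N) if m % n == 0}

# A one-place relation (a property) is a set of 1-tuples.
Prim = {(n,) for n in domain if n >= 2 and all(n % d != 0 for d in range(2, n))}

# A function is also a relation: the set of (output, input) pairs.
Fact = {(factorial(n), n) for n in domain}

print((2, 5) in Lt)      # True
print((6, 3) in Div)     # True
print(sorted(Prim))      # [(2,), (3,), (5,), (7,)]
print((120, 5) in Fact)  # True
```

<p>Asking whether a relation “holds” of some numbers is nothing more than a membership test.</p>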
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (const el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
}
});
</script>G7. Can insufficiently strong formal systems be both consistent and complete?2021-07-10T00:00:00+00:002021-07-10T00:00:00+00:00http://blog.russelldmatt.com/2021/07/10/g7-can-insufficiently-strong-formal-systems-be-both-consistent-and-complete<style> .ul { white-space:nowrap; } </style>
<p>Yes<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>. We’ve actually already seen an example of this: the Add system, first introduced in <a href="/2021/07/02/g1-what-is-a-formal-system.html">G1. What is a formal system?</a> and analyzed more thoroughly in <a href="/2021/07/07/g5-soundness-consistency-and-completeness.html">G5. Soundness, Consistency, and Completeness</a>. However, I can come up with an even more trivial example. Consider the following system:</p>
<p>The <strong>True</strong> system:</p>
<ol>
<li>Alphabet: \(True\), \(False\)</li>
<li>Grammar:
<ul>
<li>The only valid formulas are \(True\) and \(False\)</li>
</ul>
</li>
<li>Transformation rules: None</li>
<li>Axioms:
<ul>
<li>\(True\)</li>
</ul>
</li>
</ol>
<p>I think you can guess my intended interpretation for these “symbols”. This system is so incredibly trivial that we can enumerate all formulas (\(True\) and \(False\)) as well as all theorems (just \(True\)).</p>
<p><strong>Is this system sound?</strong> Unquestionably yes. All theorems are “true” in the standard interpretation. The only theorem is literally the symbol \(True\), which stands for true.</p>
<p><strong>Is this system consistent?</strong> Yes. I am quite certain that this system cannot derive a contradiction, as there is only one theorem.</p>
<p><strong>Is this system complete?</strong> Again, yes. Every true formula that is expressible in the system (which is just the formula \(True\)) is a theorem.</p>
<p>Hopefully this trivial example makes it clear that weak formal systems can quite easily be sound, consistent, and complete.</p>
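<p>Because the system is so small, the three checks above can be done by brute force. The following Python sketch is one hypothetical encoding: the variable names are mine, and since the system has no negation symbol I am assuming that “deriving a contradiction” means deriving both \(True\) and \(False\).</p>

```python
# Hypothetical encoding of the True system; all names are illustrative.
formulas = {"True", "False"}  # everything the grammar allows
axioms = {"True"}
rules = []                    # the system has no transformation rules


def theorems(axioms, rules):
    """Close the axioms under the rules (trivial here, since there are none)."""
    derived = set(axioms)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Each rule maps the current theorems to newly derivable formulas.
            for new in set(rule(frozenset(derived))):
                if new not in derived:
                    derived.add(new)
                    changed = True
    return derived


# Intended interpretation: which formulas are actually true.
interpretation = {"True": True, "False": False}

thms = theorems(axioms, rules)

# Sound: every theorem is true under the interpretation.
sound = all(interpretation[t] for t in thms)
# Consistent (assumed meaning): we cannot derive both True and False.
consistent = not {"True", "False"} <= thms
# Complete: every true formula is a theorem.
complete = all(f in thms for f in formulas if interpretation[f])

print(sound, consistent, complete)  # True True True
```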
<p>Are there more interesting examples of systems that are sound, consistent, and complete? Yes, many. You may not know that before Gödel proved his incompleteness theorem, he first proved his completeness theorem, which shows that first-order logic is, in fact, (semantically) complete!</p>
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (const el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
}
});
</script>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>A rare exception to <a href="https://en.wikipedia.org/wiki/Betteridge%27s_law_of_headlines">Betteridge’s law of headlines</a> <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>G6. Inconsistent systems can prove anything2021-07-08T00:00:00+00:002021-07-08T00:00:00+00:00http://blog.russelldmatt.com/2021/07/08/g6-inconsistent-systems-can-prove-anything<style> .ul { white-space:nowrap; } </style>
<h3 id="or-inconsistent-systems-are-complete">Or… inconsistent systems are complete</h3>
<h3 id="or-if-p-and-neg-p-then-anything">Or… if \(p\) and \(\neg p\), then anything</h3>
<p>A rather counterintuitive observation is that many inconsistent systems are, in fact, complete! More precisely, if a formal system that includes propositional logic can derive a contradiction, then it can derive anything, and is therefore complete.</p>
<p>Why? Because it turns out that the following formula is a theorem of propositional logic:</p>
\[p \supset (\sim p \supset q)\]
<p><em>Exercise for the reader: derive the formula.</em> <a href="/assets/by-post/if-p-and-not-p-then-anything/proof.html">[My almost solution]</a></p>
<p>In words: ‘if p, then if not p, then q’. So if you can derive both \(p\) and \(\lnot p\), then (via the rule of detachment twice) you can derive \(q\). And remember, by the rule of substitution, <em>any</em> formula can be substituted for a sentential variable to derive another formula. So, if we can derive \(p\) and \(\lnot p\), and <span class="ul">we can substitute anything we want for \(q\)</span>, then we can derive anything! In short, from a contradiction, anything can be derived.</p>
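<p>If you’d like some reassurance that the formula really is always true before attempting the derivation, a truth table will do. The Python sketch below checks all four assignments of \(p\) and \(q\). Note that this is a semantic check (the formula is a tautology), not a derivation inside the system, so it doesn’t replace the exercise; the function names are my own.</p>

```python
from itertools import product


def implies(a: bool, b: bool) -> bool:
    """Material implication: a ⊃ b is false only when a is true and b is false."""
    return (not a) or b


def formula(p: bool, q: bool) -> bool:
    """The formula p ⊃ (~p ⊃ q)."""
    return implies(p, implies(not p, q))


# Check every row of the truth table.
is_tautology = all(formula(p, q) for p, q in product([False, True], repeat=2))
print(is_tautology)  # True
```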
<p>This fact has an interesting consequence. It’s actually quite cute. We just showed that if a system (that includes propositional logic) can derive a contradiction, then it can derive any formula. So, <span class="ul">if even one formula <em>cannot</em> be derived, then the system must not be able to derive a contradiction!</span> In other words, proving that a single formula is not derivable within a system is equivalent to proving that the system is consistent.</p>
<p>How might we prove that a formula <em>cannot</em> be derived? By employing some <em>meta-mathematical</em> reasoning about the system as a whole. Note, we are going to prove something about the system, which is very different from deriving something within the system. Here’s one general strategy:</p>
<ol>
<li>Find a property which is true of all the axioms.</li>
<li>Demonstrate that the property is preserved via all the transformation rules. In other words, demonstrate that all formulas derived from the axioms, i.e. theorems, will also have this property.</li>
<li>Find a formula which does not have this property. This formula must not be a theorem.</li>
</ol>
<p>As a toy example, imagine all the axioms were made up of 25 or more symbols. And imagine every transformation rule only increases the number of symbols in the derived formula. If you can construct a formula with fewer than 25 symbols, then you know it’s not a theorem.</p>
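<p>Here is that toy example as a small Python sketch. Everything in it is hypothetical: the axioms, the two rules, and the candidate formula are made up purely to illustrate the three steps. The bounded enumeration in step 2 is only a sanity check; the actual proof is the observation that each rule returns a formula at least as long as its input.</p>

```python
# Hypothetical toy system: every name and value here is made up for illustration.
MIN_LEN = 25
axioms = ["a" * 25, "ab" * 13]  # lengths 25 and 26

# Each rule maps a formula to a strictly longer formula.
rules = [
    lambda f: f + "a",  # append a symbol
    lambda f: f + f,    # duplicate the formula
]


def has_property(formula: str) -> bool:
    """The invariant: the formula has at least MIN_LEN symbols."""
    return len(formula) >= MIN_LEN


def derive(axioms, rules, depth):
    """Collect everything derivable from the axioms in at most `depth` steps."""
    seen = set(axioms)
    frontier = set(axioms)
    for _ in range(depth):
        frontier = {rule(f) for f in frontier for rule in rules} - seen
        seen |= frontier
    return seen


# Step 1: the property holds for every axiom.
assert all(has_property(a) for a in axioms)

# Step 2 (spot check only): the property survives a few rounds of derivation.
derived = derive(axioms, rules, depth=4)
assert all(has_property(f) for f in derived)

# Step 3: a formula without the property cannot be a theorem.
candidate = "ab"
print(has_property(candidate))  # False
```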
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (const el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
}
});
</script>