<h1>The Coriolis Force</h1>
<p>2022-07-26 · http://blog.russelldmatt.com/2022/07/26/the-coriolis-force</p>
<style>
.canvas {
position: relative;
box-shadow: 5px 5px 5px grey;
border: 1px solid grey;
width: 400px;
max-width: 100%;
height: 400px;
display: block;
margin: 0 auto 30px;
}
#overlay {
display: flex;
align-items: baseline;
justify-content: flex-end;
position: absolute;
opacity: 0.6;
background-color: gray;
z-index: 1;
width: 100%;
height: 100%;
width: 400px;
}
#overlay img {
cursor: pointer;
}
</style>
<p>Fun fact: it’s not a force at all!</p>
<p>Here’s a question to ponder: Say I’m standing on the equator and I shoot a bullet directly north with an extremely powerful gun. Assume that (somehow) my bullet will stay at a constant altitude. Will my bullet travel:</p>
<ol>
<li>straight up and pass directly over the north pole</li>
<li>upwards, but at an angle such that it will pass to the right of the north pole</li>
<li>upwards, but at an angle such that it will pass to the left of the north pole</li>
</ol>
<p>It’s really not obvious. Or at least it wasn’t to me. The Coriolis effect (again, not a force!) explains which of those is the correct answer.</p>
<h2 id="linear-conveyor-belts">Linear Conveyor Belts</h2>
<p>Let’s start with a much simpler question. Imagine there is a series of conveyor belts, like so:</p>
<div id="linear" class="canvas"></div>
<p>The conveyor belts are moving at different speeds. The bottom one is traveling the fastest while the top one isn’t moving at all. Imagine standing on the bottom one and firing a bullet “directly north”, i.e. directly towards the top of the screen.</p>
<p>Imagine you fire the bullet at the start of the animation when all the red lines are perfectly aligned into a line (restart the animation above to see what I mean). Will the bullet travel:</p>
<ol>
<li>straight up and eventually pass directly over the top red line (which isn’t moving)</li>
<li>upwards, but at an angle such that it will pass to the right of the top red line</li>
<li>upwards, but at an angle such that it will pass to the left of the top red line</li>
</ol>
<p>Make a guess and then play the animation below:</p>
<div id="linear-with-ball" class="canvas"></div>
<p>The answer is 2. It travels at an angle and passes to the right of the top red line. Why? Even though you fire the bullet “directly north”, the bullet inherits your horizontal velocity as well. The velocity of the bullet is the sum of whatever velocity you have when you fire it <em>plus</em> the velocity the gun adds to the bullet in the direction that you fire it.</p>
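<p>Here is a minimal sketch of that velocity addition. The numbers and names (<code>shooterVelocity</code>, <code>muzzleVelocity</code>, <code>distanceToTopBelt</code>) are made up for illustration and are not taken from the animation’s code.</p>

```javascript
// Hypothetical numbers, chosen only for illustration.
const shooterVelocity = { x: 3, y: 0 };  // you ride the bottom belt, moving right
const muzzleVelocity = { x: 0, y: 10 };  // the gun pushes the bullet "directly north"

// Velocities add component by component.
function addVelocities(a, b) {
  return { x: a.x + b.x, y: a.y + b.y };
}

// Position after `t` seconds, starting from the origin at constant velocity.
function positionAt(velocity, t) {
  return { x: velocity.x * t, y: velocity.y * t };
}

const bulletVelocity = addVelocities(shooterVelocity, muzzleVelocity);
const distanceToTopBelt = 50;
const timeToTop = distanceToTopBelt / bulletVelocity.y;
const bulletAtTop = positionAt(bulletVelocity, timeToTop);

console.log(bulletVelocity); // { x: 3, y: 10 }
console.log(bulletAtTop);    // { x: 15, y: 50 }
```

<p>The top red line never leaves <code>x = 0</code>, so in this sketch the bullet crosses the top belt 15 units to its right.</p>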
<h2 id="circular-conveyor-belts">Circular Conveyor Belts</h2>
<p>Now let’s change the conveyor belts to move in a circle, rather than in a straight line, like so:</p>
<div id="circular" class="canvas"></div>
<p>Just like in the linear conveyor belts example, we have a series of conveyor belts that are all moving at different speeds. The outermost conveyor belt is moving the fastest and the innermost conveyor belt is moving the slowest.</p>
<p>But notice an important difference between this example and the last one: now the red line segments <em>stay aligned</em>, even as the conveyor belts move at different speeds. Why? Because all the belts are moving at the same angular velocity and therefore rotate around the circle the same number of times per second. The outermost belt needs to move faster in order to perform a rotation in the same amount of time as the innermost belt.</p>
<p>Ok now let’s ask the same question as before: Imagine you fire the bullet at the start of the animation when all the red lines are perfectly aligned vertically. Will the bullet travel:</p>
<ol>
<li>straight up and eventually pass directly over the center of the circle</li>
<li>upwards, but at an angle such that it will pass to the right of the center of the circle</li>
<li>upwards, but at an angle such that it will pass to the left of the center of the circle</li>
</ol>
<p>Make a guess and then play the animation below:</p>
<div id="circular-with-ball" class="canvas"></div>
<p>Again, the answer is 2 - and for exactly the same reason as before. Even though you fire the bullet “directly north”, the bullet inherits your horizontal velocity. This makes the bullet travel “northeast” and pass to the right of the center of the circle.</p>
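<p>A rough sketch of the same reasoning in numbers. It assumes the belts turn counterclockwise, that you stand at the bottom of the outermost belt, and that the center of the circle is the origin; <code>omega</code>, <code>outerRadius</code> and <code>muzzleSpeed</code> are illustrative values, not ones used by the animation.</p>

```javascript
// Hypothetical numbers, for illustration only.
const omega = 0.5;        // angular velocity shared by every belt (radians per second)
const outerRadius = 10;   // radius of the outermost belt
const muzzleSpeed = 20;   // speed the gun adds, aimed straight at the center

// Every belt turns at the same angular velocity, so its linear speed grows with radius.
function beltSpeed(radius) {
  return omega * radius;
}

// You stand at (0, -outerRadius) on a counterclockwise belt,
// so at that instant your own velocity points to the right (+x).
const bulletVelocity = { x: beltSpeed(outerRadius), y: muzzleSpeed };

// Time for the bullet to climb to the height of the center (y = 0).
const timeToCenterHeight = outerRadius / bulletVelocity.y;

// Horizontal offset from the center at that moment; positive means "to the right".
const offsetAtCenterHeight = bulletVelocity.x * timeToCenterHeight;

console.log(beltSpeed(2), beltSpeed(10)); // 1 5
console.log(offsetAtCenterHeight);        // 2.5
```

<p>The offset is positive for any positive <code>omega</code>, which is the “passes to the right of the center” answer.</p>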
<h2 id="the-globe">The Globe</h2>
<p>Let’s make one last adjustment. Instead of circular conveyor belts, let’s consider a 2D picture of the globe with the north pole exactly at the center, like so:</p>
<p><img src="/assets/by-post/the-coriolis-force/globe-n.jpg" class="canvas" /></p>
<div id="globe-n" class="canvas"></div>
<p>What’s different about this example? In all the previous examples, we were dealing with “flat 2D space”. That means that the trajectory of the bullet looked like a straight line from our “bird’s-eye” perspective. That’s no longer true. We’ve now projected the top of a sphere (the northern hemisphere of the globe) onto a 2D circle by flattening it out. Distances on our projection don’t faithfully correspond to distances on the sphere. In particular, since the outer edges of the sphere (near the equator) have been flattened more than the center of the circle, the bullet appears to move slower at the edges of the circle. This effect can be seen by the non-uniform spacing of the concentric circles. The true distance between consecutive circles is constant, but it doesn’t appear that way in our projection.</p>
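<p>To see where the uneven spacing comes from, here is a small sketch. It assumes a plain “looking straight down” (orthographic) projection of a unit sphere, in which a circle at latitude \(\phi\) is drawn at radius \(\cos\phi\); the actual animation may use a different projection.</p>

```javascript
// Assumption: an orthographic projection of a unit sphere, viewed from above the pole.
function projectedRadius(latitudeDegrees) {
  const latitude = (latitudeDegrees * Math.PI) / 180;
  return Math.cos(latitude);
}

// Circles of latitude that are equally spaced on the sphere (every 15 degrees).
const latitudes = [0, 15, 30, 45, 60, 75, 90];
const radii = latitudes.map(projectedRadius);

// Gap between consecutive circles as drawn on the flat picture.
const gaps = radii.slice(1).map((r, i) => radii[i] - r);

gaps.forEach((gap, i) => {
  console.log(`${latitudes[i]}° to ${latitudes[i + 1]}°: ${gap.toFixed(3)}`);
});
```

<p>Under that assumption the drawn gaps are smallest next to the equator and grow towards the pole, even though each step covers the same distance on the sphere.</p>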
<p>Importantly, though, one fact stays the same. The outermost regions of the circle (and sphere) are still moving faster than the innermost regions - exactly as they did when we were dealing with circular conveyor belts.</p>
<p>You guessed it; let’s ask the same question as before: Imagine you’re on the equator and you fire a bullet directly north. Will the bullet travel:</p>
<ol>
<li>straight up and pass directly over the north pole</li>
<li>upwards, but at an angle such that it will pass to the right of the north pole</li>
<li>upwards, but at an angle such that it will pass to the left of the north pole</li>
</ol>
<p>Make a guess and then play the animation below:</p>
<div id="globe-n-with-ball" class="canvas"></div>
<p>Again, the answer is 2. The bullet travels with a constant northern velocity as well as a constant eastward velocity (which it picked up from you as the source). This, combined with the spherical geometry of the globe, makes the bullet’s path look like a spiral that veers to the right of the north pole. Unlike in the circular conveyor belt example, eventually the bullet does pass directly over the north pole. The simulation assumes it always travels north with a fixed velocity, and the north pole is the northernmost point, so it has to hit it eventually.</p>
<h2 id="the-observer-and-fictitious-forces">The Observer and “Fictitious Forces”</h2>
<p>Every animation so far has taken the perspective of a bird’s-eye observer stationed high above the north pole. What if we instead visualized the globe example from the perspective of the observer who is firing the bullet? The important difference is that, according to the observer, they are not spinning in a circle around the globe. They are stationary.</p>
<p>What might that look like?</p>
<div id="globe-n-with-ball-observer" class="canvas"></div>
<p>From the reference frame of the (rotating) observer, it looks like the bullet starts traveling directly north but then oddly accelerates to the east. When viewed from the perspective of the observer, this eastward acceleration is hard to explain. However, having come to this through a series of much more straightforward examples, we now know that this eastward motion is just a consequence of the eastward velocity the bullet picked up from the observer who fired it (who we assumed was standing on the equator). There is no actual acceleration.</p>
<p>This sort of “unexplained motion” has a funny name - it’s called a “<a href="https://en.wikipedia.org/wiki/Fictitious_force">fictitious force</a>”. It’s also called a pseudo-force, an inertial force, and an apparent force. It’s the first time (that I can remember) that I’ve run into this concept. It applies when an object appears to accelerate (e.g. change direction, in this case) as if some external force were acting on it, when in fact none is. The apparent force comes from the fact that the observer is the one accelerating. In this case, the observer is spinning in a circle, constantly accelerating towards the center of the circle. The bullet is traveling in a straight line (albeit on a curved surface, which makes it hard to see).</p>
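<p>The idea can be checked numerically in the simpler flat case (the circular conveyor belts, not the globe). The sketch below is only an illustration with made-up numbers: it moves a point in a straight line as seen from above, re-describes the same point in axes that rotate with the observer, and estimates the acceleration in each description.</p>

```javascript
const omega = 0.5; // observer's angular velocity in radians per second (hypothetical)

// Straight-line motion as seen from above (the non-rotating frame).
function inertialPosition(t) {
  const start = { x: 0, y: -10 };
  const velocity = { x: 5, y: 20 };
  return { x: start.x + velocity.x * t, y: start.y + velocity.y * t };
}

// The same point described in axes that rotate counterclockwise with the observer.
function toRotatingFrame(point, t) {
  const angle = omega * t;
  return {
    x: point.x * Math.cos(angle) + point.y * Math.sin(angle),
    y: -point.x * Math.sin(angle) + point.y * Math.cos(angle),
  };
}

function rotatingPosition(t) {
  return toRotatingFrame(inertialPosition(t), t);
}

// Estimate acceleration from three nearby samples (a central second difference).
function accelerationAt(positionFn, t, dt = 1e-3) {
  const before = positionFn(t - dt);
  const now = positionFn(t);
  const after = positionFn(t + dt);
  return {
    x: (after.x - 2 * now.x + before.x) / (dt * dt),
    y: (after.y - 2 * now.y + before.y) / (dt * dt),
  };
}

const magnitude = (v) => Math.hypot(v.x, v.y);

console.log(magnitude(accelerationAt(inertialPosition, 0.2))); // approximately zero
console.log(magnitude(accelerationAt(rotatingPosition, 0.2))); // clearly non-zero
```

<p>No force appears anywhere in this code, yet the rotating description shows an acceleration. That leftover acceleration is what gets attributed to a fictitious force.</p>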
<h2 id="the-coriolis-effect">The Coriolis Effect</h2>
<p>This finally brings us to the Coriolis effect. We’ve actually already fully explained it, but to summarize:</p>
<p>The Coriolis effect describes the tendency for things (usually winds) to accelerate to the east when traveling north and to accelerate to the west when traveling south.</p>
<p>Or wait… actually that’s not quite right. That’s only true in the northern hemisphere. What’s different about the southern hemisphere? Let’s take a look:</p>
<p><img src="/assets/by-post/the-coriolis-force/globe-n-and-s.jpg" class="canvas" style="height: 200px" /></p>
<div id="globe-s-with-ball" class="canvas"></div>
<p>The only difference is that when you view the globe from the south, it spins in the opposite direction. Working through all our examples while changing the direction of rotation will make it clear that in the southern hemisphere, the Coriolis effect is reversed, i.e.:</p>
<p>In the southern hemisphere, winds have a tendency to accelerate to the east when traveling <em>south</em> and accelerate to the west when traveling <em>north</em>.</p>
<h2 id="hurricanes">Hurricanes</h2>
<p>The canonical example of the Coriolis effect is that hurricane winds create a spiral in one direction in the northern hemisphere and the opposite direction in the southern hemisphere.</p>
<p>To understand this, you first have to know that the center of a hurricane is an area of low pressure. Winds generally want to move towards low pressure areas and so winds are pulled towards the center of the hurricane from all directions.</p>
<p>How can we visualize this? Actually, we already have. In the globe examples above, we were treating the north pole as the destination of the bullet. Instead, let’s imagine the north pole is the center of the hurricane and the object traveling towards it is air. If we assume that the air travels at a constant northern velocity (probably a bad assumption), our previous animations are exactly what we’re looking for:</p>
<p><img src="/assets/by-post/the-coriolis-force/hurricanes.jpg" class="canvas" style="height: 267px" /></p>
<h3>Northern Hemisphere:</h3>
<div id="hurricane-n" class="canvas"></div>
<h3>Southern Hemisphere:</h3>
<div id="hurricane-s" class="canvas"></div>
<script src="/assets/js/p5/0.8.0/p5.js"></script>
<script src="/assets/by-post/the-coriolis-force/sketches.js"></script>
<script>
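// Attach a p5 sketch to the element with the given id and cover it with a
// clickable "play" overlay until the animation is started.
function makeAnimation(id, sketchf) {
  let canvas = document.getElementById(id);
  // Semi-transparent overlay holding the play button (styled by #overlay above).
  let overlay = document.createElement('div');
  overlay.id = 'overlay';
  canvas.appendChild(overlay);
  let img = document.createElement('img');
  img.src = '/assets/by-post/the-coriolis-force/youtube.svg';
  img.height = '96';
  img.width = '96';
  img.onclick = hide;
  overlay.appendChild(img);
  // Shared with the sketch so it can tell whether it should be animating.
  let state = { running: false };
  let p = new p5((p) => sketchf(p, state, onStop), id);
  // Clicking play: reset the sketch, mark it as running, and hide the overlay.
  function hide() {
    p.reset();
    state.running = true;
    overlay.style.visibility = 'hidden';
  }
  // Handed to the sketch as its stop callback: show the overlay again.
  function onStop() {
    state.running = false;
    overlay.style.visibility = 'visible';
  }
}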
makeAnimation('linear', (p, state, onStop) =>
  sketchLinear(p, false, state, onStop)
);
makeAnimation('linear-with-ball', (p, state, onStop) =>
  sketchLinear(p, true, state, onStop)
);
makeAnimation('circular', (p, state, onStop) =>
  sketchCircle(p, false, state, onStop)
);
makeAnimation('circular-with-ball', (p, state, onStop) =>
  sketchCircle(p, true, state, onStop)
);
makeAnimation('globe-n', (p, state, onStop) =>
  sketchSphere(
    p,
    { dw: 0.005, observer: false, displayBall: false },
    state,
    onStop
  )
);
makeAnimation('globe-n-with-ball', (p, state, onStop) =>
  sketchSphere(
    p,
    { dw: 0.005, observer: false, displayBall: true },
    state,
    onStop
  )
);
makeAnimation('globe-n-with-ball-observer', (p, state, onStop) =>
  sketchSphere(
    p,
    { dw: 0.005, observer: true, displayBall: true },
    state,
    onStop
  )
);
makeAnimation('globe-s-with-ball', (p, state, onStop) =>
  sketchSphere(
    p,
    { dw: -0.005, observer: false, displayBall: true },
    state,
    onStop
  )
);
makeAnimation('hurricane-n', (p, state, onStop) =>
  sketchSphere(
    p,
    { dw: 0.005, observer: false, displayBall: true },
    state,
    onStop
  )
);
makeAnimation('hurricane-s', (p, state, onStop) =>
  sketchSphere(
    p,
    { dw: -0.005, observer: false, displayBall: true },
    state,
    onStop
  )
);
</script>
<h1>Why was Special Relativity invented?</h1>
<p>2022-07-19 · http://blog.russelldmatt.com/2022/07/19/why-was-special-relativity-invented</p>
<p>Before trying to understand special relativity, you should ask yourself the following question: <em>what problem does special relativity solve?</em> Put another way, <em>why was this theory invented in the first place?</em></p>
<p>Like many theories in physics, special relativity was invented to fit empirical data. In particular, experiments had shown that light did not behave like a particle, nor did it behave like a wave. Let’s take a closer look.</p>
<h3 id="does-light-behave-like-a-particle">Does light behave like a particle?</h3>
<p>Let’s say I can throw a baseball at a top speed of 50mph. If I’m standing on the pitcher’s mound and you’re standing on home plate and I throw the baseball towards you at my top speed, ignoring things like air resistance, the baseball will move towards you at a speed of 50mph.</p>
<p>How might I increase the baseball’s speed relative to you? One way would be to stand on the back of a pickup truck that was moving towards you at a speed of 30mph. If I throw the baseball at you now, the baseball will move away from me at my top speed (50mph) but towards you at a speed of 80mph. The speed of the pickup truck and the baseball add together to produce a higher speed relative to you.</p>
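<p>Written as a worked equation (a small sketch; the subscripted symbols are labels introduced here for illustration), the speeds simply add:</p>
<p>\[ v_{\text{ball, you}} = v_{\text{ball, truck}} + v_{\text{truck, you}} = 50\,\text{mph} + 30\,\text{mph} = 80\,\text{mph} \]</p>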
<p>But there’s nothing special about a baseball in this example. All particle-like things act this way. I could change the example such that I’m now firing a bullet at you. The numbers would change, but the general point would stay the same: the bullet would travel faster, relative to you, when I fire it from the back of a moving pickup truck.</p>
<p>How about light? Instead of firing a bullet, let’s say I “fired” a photon (a particle of light) towards you by turning on a flashlight in your direction. Classical Newtonian mechanics would suggest the same result: the photon would travel faster, relative to you, when fired from the back of a moving pickup truck.</p>
<p>However, this is where the behavior of light deviates from the behavior of other particles. Experiments in the late nineteenth and early twentieth centuries strongly indicated that this effect does <em>not</em> occur with light. A later confirmation came in 1955, when Russian astronomers measured the speed of light from two opposite sides of the rotating sun. One side is rotating towards us while the other is rotating away from us, so one would expect to measure different light speeds from each side. However, the speed of light was found to be the same from both sides.</p>
<p>In short, the speed of light did not seem to depend on the motion of its source.</p>
<h3 id="does-light-behave-like-a-wave">Does light behave like a wave?</h3>
<p>Now, before you go off and think <em>how strange</em> (as I did), let me try to convince you that this is, in fact, not strange at all when it comes to waves! Maybe there is nothing mysterious about light; maybe it just behaves like all other waves.</p>
<p>Take sound waves as an example. The speed of a sound wave does not depend on the speed of the source of the sound. In fact, all sound waves travel at roughly the same speed<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<p>Why doesn’t it depend on the speed of the source? I don’t really know but… <em>&lt;digression&gt;</em> I think it has something to do with the fact that - in a wave - no single physical object is traveling with the wave. Instead, the wave is causing very local perturbations in the particles that make up whatever medium it’s traveling in (e.g. water for water waves, air for sound waves). Consider a single water wave propagating to the right, and specifically think about a water molecule that is currently on the crest of that wave. Now consider that same molecule one second later. How far to the right has it moved? Not at all. It has moved down, not to the right (even though the wave crest is moving right). Not all waves work this way. Transverse waves cause the medium to move perpendicular to the direction of the wave; longitudinal waves cause the medium to move parallel to the wave. But in either case, particles are moving back and forth in a relatively small region; they are not tracking the motion of, say, the crest of the wave. <em>&lt;/digression&gt;</em></p>
<p>I bet you already knew that waves don’t behave like particles, even if you hadn’t thought too deeply about it. Have you ever heard of a “sonic boom”? That occurs when an aircraft travels faster than the sound waves that it is producing. This sort of thing makes no sense when talking about the motion of particles. If I throw a baseball from a vehicle moving at a constant velocity, I’m not going to somehow pass that baseball. The baseball’s speed is, by definition, my speed <em>plus</em> whatever speed I can throw a baseball at. But supersonic aircraft do in fact pass the sound waves that they themselves produce. Waves are weird.</p>
<p>So we’ve established that the speed of a wave is independent of the speed of its source. Experiments suggested that the speed of light is also independent of the motion of its source. So maybe light is just a wave?</p>
<p>Sure, let’s continue thinking about light as a wave. If it’s a wave, then it travels by disturbing some medium. Let’s call this medium the “<a href="https://en.wikipedia.org/wiki/Luminiferous_aether">luminiferous aether</a>” as they did in the nineteenth century, or “ether” for short. We’ve established that the speed of the wave doesn’t depend on the speed of the source, but does the speed of the wave <em>relative to an observer</em> depend on the speed of the observer <em>relative to the ether</em>? In most waves, it does.</p>
<p>Here’s an example. Consider a sound wave that travels through the air at some constant velocity \(v\). The speed of the sound wave <em>relative to me</em> is also \(v\) if I’m standing still (or more precisely, if I’m not moving relative to the air). But it’s more than \(v\) if I run through the air towards the oncoming sound wave, and less than \(v\) if I run through the air away from the sound wave.</p>
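<p>As a tiny sketch of that rule (the function name and the runner’s speed are made up for illustration; 343 m/s is a typical value for sound in room-temperature air):</p>

```javascript
const speedOfSound = 343; // meters per second in air, roughly, at room temperature

// `observerSpeedTowardWave` is measured relative to the air:
// positive when running toward the oncoming wave, negative when running away.
// Note that the speed of the sound's source is not an input at all.
function waveSpeedRelativeToObserver(observerSpeedTowardWave) {
  return speedOfSound + observerSpeedTowardWave;
}

console.log(waveSpeedRelativeToObserver(0));   // 343
console.log(waveSpeedRelativeToObserver(5));   // 348
console.log(waveSpeedRelativeToObserver(-5));  // 338
```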
<p>Is the same true of light? Does the speed of light depend on the motion of the observer? Put another way, does it depend on the motion of the observer relative to the ether (the suggested medium for light)?</p>
<p>Enter the <a href="https://en.wikipedia.org/wiki/Michelson%E2%80%93Morley_experiment">Michelson–Morley experiment</a>. If light moves at a constant speed through the ether and we move at different speeds relative to the ether, then light should move at different speeds relative to us! So Michelson and Morley measured the speed of light (relative to us) in many different directions, hoping that they would get different answers. From this, they would be able to infer our motion relative to the ether. Alas, as you can probably predict, the experiment found no detectable differences in the speed of light in any direction. This was the first strong evidence against the existence of a light ether.</p>
<h3 id="the-real-problem">The real problem</h3>
<p>This brings us to the real problem that special relativity is designed to solve. Not only does the speed of light not depend on the speed of the light source, but it also doesn’t depend on the speed of the observer. Now we are in truly unfamiliar territory. Neither particles nor waves behave in this manner.</p>
<p>The strangeness can be described as follows. Let’s say there are three things A, B, and C, arranged in that order from left to right, and they are all moving at constant speed to the right. A is moving the fastest, B the second fastest, and C the slowest. A will pass B at some speed, relative to B. A must then pass C at a <em>different</em> speed, relative to C, right? Since B and C are travelling at different speeds, then A must pass B and C at different (relative) speeds, right?</p>
<p>Our everyday experience suggests yes, but that’s not how light works. If A was a light beam, it would somehow pass B at a speed of \(c\), relative to B, and then also pass C at a speed of \(c\), relative to C. Again, this is assuming constant speeds for all three objects. This defies intuition.</p>
<p>Let’s consider one last example to drive the absurdity home. There’s a race between a sports car, a human, and a tree. Since, generally speaking, sports cars are faster than humans who are in turn faster than trees, we give them different starting points for this race. The race begins! The race car quickly achieves its top speed of 200mph. The human quickly achieves their top speed of 10mph. The tree doesn’t move. Before long, the race car passes the human. What is the race car’s speed <em>relative to the human</em>? 190mph. Later, it passes the tree. What is its speed <em>relative to the tree</em>? 200mph. It has to be different because the human and the tree aren’t moving at the same speed. Well, not so for light beams. Replace the race car with a beam of light and the answer to both questions is \(c\).</p>
<p>Something has to give.</p>
<h3 id="special-relativity">Special Relativity</h3>
<p>Einstein’s theory of special relativity assumes the following two postulates to be true and follows them through to their (fairly absurd) logical conclusions:</p>
<ol>
<li>There is no way to tell whether an object is at rest or in uniform motion relative to a fixed ether. <em>Translation: light is unlike other waves in that there is no detectable medium.</em></li>
<li>Regardless of the motion of its source, light always moves through empty space with the same constant speed. <em>Translation: light is unlike particles in that its velocity does not add with the velocity of its source.</em></li>
</ol>
<p>Other physicists had considered these two postulates, but most thought they violated common sense so much that they preferred to believe that one of them must be wrong. Einstein accepted them and, from them, deduced that there must be no meaning to the concepts of absolute length or time.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>This isn’t strictly true. It depends on certain properties of the medium, which is air in this case. Temperature, pressure, humidity, etc. can affect the speed of a sound wave. But the important point remains, which is that the speed of the source is not a factor. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>
<h1>Why do things sink or float?</h1>
<p>2022-07-10 · http://blog.russelldmatt.com/2022/07/10/why-do-things-float</p>
<style>
img {
width: 300px;
max-width: 100%;
box-shadow: 5px 5px 5px grey;
border: 1px solid grey;
display: block;
margin: 30px auto 30px;
}
</style>
<p>Because they’re more or less dense than water. QED. Any more questions?</p>
<p>Yeah, sure… of course that’s true, but <em>why</em>? Why do denser things sink while less dense things float?</p>
<p>That’s a harder question. Let’s start with a common explanation (that’s pretty much correct, by the way) and then try to poke some holes in it.</p>
<h2 id="common-explanation">Common explanation</h2>
<p>Consider a vertical column of water. Now consider a horizontal plane \(p_1\) cutting through the column of water at a depth of \(h_1\). What are the forces acting on this horizontal plane?</p>
<p><img src="/assets/by-post/why-do-things-float/1.jpg" /></p>
<p>Well, all the water sitting above the plane is pushing down on it. With what force? The force is just the weight of the water, so \(F = mg\) where \(m\) is the mass of all the water above the plane.</p>
<p>Are there any opposing forces? Yes. The water below the plane is pushing back (up) with an equal and opposite force. In a sense, it’s holding the water up. This would be more obvious if we were talking about solids, but it’s still true for liquids.</p>
<p><img src="/assets/by-post/why-do-things-float/2.jpg" /></p>
<p>Ok, now consider the exact same scenario except position the horizontal plane at a greater depth \(h_2 > h_1\). Let’s call this (deeper) plane \(p_2\). What are the forces on \(p_2\)?</p>
<p>It’s the same story as before - the force pushing down is \(F = mg\). The only difference is that \(m\) is greater since there is more water (with more mass) above the plane.</p>
<p>Similarly, the water below the plane is pushing back up with the same, greater force.</p>
<p><img src="/assets/by-post/why-do-things-float/3.jpg" /></p>
<p>Now imagine the water that’s sitting between \(p_1\) and \(p_2\). We’ve just established that there’s a greater force pushing up on \(p_2\) (at a depth of \(h_2\)) than pushing down on \(p_1\) (at a depth of \(h_1\)). So if there’s a net force up on that region of water, why isn’t it moving up? Because the net force up perfectly balances with the force of gravity pulling that region/volume of water down. A net force of zero means it doesn’t move up or down.</p>
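<p>To make that balance concrete, here is a short worked version. It assumes two symbols that don’t appear above: \(\rho\) for the density of water and \(A\) for the cross-sectional area of the column.</p>
<p>\[ F_{\text{down on } p_1} = \rho A h_1 g, \qquad F_{\text{up on } p_2} = \rho A h_2 g \]</p>
<p>\[ F_{\text{up on } p_2} - F_{\text{down on } p_1} = \rho A (h_2 - h_1) g = m_{\text{slab}}\, g \]</p>
<p>Here \(m_{\text{slab}} = \rho A (h_2 - h_1)\) is the mass of the water between the two planes, so the net upward push is exactly that water’s weight.</p>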
<p>For the last step of this explanation, let’s imagine replacing all the water in the column between \(p_1\) and \(p_2\) with an empty box with zero mass. Now what happens? There is still a net force up on that region but - since the box has no mass - there is no balancing gravitational force. In this case there really is a net force pushing up and so the box starts floating up towards the surface.</p>
<p>This argument doesn’t rely on the box having literally zero mass. Anything whose weight is less than the net force pushing up will float up. Since the net force up exactly equals the gravitational force on that volume of water, anything that has less mass per unit volume than water will float up. In other words, anything that’s less dense than water will float. QED.</p>
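<p>The conclusion of the common explanation can be sketched in a few lines. The densities below are rough, illustrative values, and the function names are made up for this example.</p>

```javascript
const g = 9.81;              // gravitational acceleration, m/s^2
const waterDensity = 1000;   // kg/m^3, approximate

// Net upward force (in newtons) on a fully submerged object.
// The push from the surrounding water equals the weight of the water the object replaced.
function netUpwardForce(objectDensity, volume) {
  const pushFromWater = waterDensity * volume * g;
  const objectWeight = objectDensity * volume * g;
  return pushFromWater - objectWeight;
}

function sinksOrFloats(objectDensity) {
  const force = netUpwardForce(objectDensity, 1);
  if (force > 0) return 'floats';
  if (force < 0) return 'sinks';
  return 'stays put';
}

console.log(sinksOrFloats(0));     // floats (the massless box)
console.log(sinksOrFloats(700));   // floats (roughly wood)
console.log(sinksOrFloats(7870));  // sinks (roughly iron)
```

<p>The volume cancels out of the comparison, which is why only density matters.</p>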
<h2 id="whats-the-problem">What’s the problem?</h2>
<p>The problem is that I can make the exact same argument about solids. Instead of imagining a column of water with three discrete regions (the water above \(p_1\), between \(p_1\) and \(p_2\), and below \(p_2\)), imagine three blocks of iron stacked on top of each other. Let’s go through the argument again.</p>
<p><img src="/assets/by-post/why-do-things-float/4.jpg" /></p>
<p>\(p_1\) is the plane that separates the top block \(b_1\) from the second block \(b_2\). What forces are acting on that plane? Same as before - the block \(b_1\) is pushing down with a gravitational force \(F = m_{b_1}g\). The block \(b_2\) is pushing back up with an equal and opposite force.</p>
<p>What are the forces on \(p_2\)? The force pushing down is the gravitational force of block \(b_1\) and \(b_2\) combined: \(F = (m_{b_1} + m_{b_2})g\). The block \(b_3\) must be pushing back up with an equal and opposite force.</p>
<p>If \(b_3\) is pushing up on \(b_2\) with a force of \((m_{b_1} + m_{b_2})g\) and \(b_1\) is pushing down on \(b_2\) with a force of \(m_{b_1}g\), then there is a net force up of \(m_{b_2}g\). Why isn’t \(b_2\) moving up then? Because the net force is exactly equal to the gravitational force on \(b_2\) itself: \(F = m_{b_2}g\).</p>
<p>Now replace block \(b_2\) with an equally sized block of wood that weighs less than the iron one. Let’s call this block \(w_2\) (\(w\) for wood). The force pushing up on the wood block is greater than the force pushing down on it. Furthermore, the net force up is <em>greater than</em> the gravitational force on the wood block (\(m_{b_2}g > m_{w_2}g\)). So what will happen? The wood block will start floating up!</p>
<p>Oh wait… except it won’t. Common sense tells us that if you stack three solid blocks on top of each other, the middle block will definitely not start floating in the air - even if the middle block weighs less than the other two blocks.</p>
<h2 id="so-what-went-wrong">So what went wrong?</h2>
<p>The argument does not rely on the ability of the medium to <em>flow</em>. But clearly this is an important property when it comes to whether something will actually be able to sink or float in real life. Things can float or sink in liquids and gases, but not solids. Liquids and gases can flow; solids cannot.</p>
<p>We can also approach “what went wrong” from a different angle. In the case with three solids, one step of the argument is just false. When you replace the second block of iron with a wood block, the force pushing up on the wood block from below will decrease. Why? The force pushing up on the wood block is exactly equal to the gravitational force of the top two blocks pushing down. When you replace the iron middle block \(b_2\) with a wooden one, the total mass of the top two blocks decreases, which in turn decreases the force with which they push down, which in turn decreases the force with which the bottom block pushes up. Everything changes to quickly reach a new equilibrium where all forces balance and nothing moves.</p>
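<p>A small sketch of that bookkeeping for a stack of solid blocks at rest. The masses are hypothetical, and the code simply encodes the assumption made above: whatever sits below a block pushes up with exactly enough force to carry that block plus everything on top of it.</p>

```javascript
const g = 9.81; // gravitational acceleration, m/s^2

// Forces on each block in a resting stack, listed top to bottom.
// Each block is pushed down by the weight of everything above it and held up
// from below with exactly enough force to carry the block and its load.
function forcesInStack(masses) {
  let massAbove = 0;
  return masses.map((mass) => {
    const pushFromAbove = massAbove * g;
    const ownWeight = mass * g;
    const pushFromBelow = pushFromAbove + ownWeight;
    massAbove += mass;
    return {
      pushFromAbove,
      ownWeight,
      pushFromBelow,
      net: pushFromBelow - pushFromAbove - ownWeight,
    };
  });
}

// Hypothetical masses in kilograms: three iron blocks, then wood in the middle.
const allIron = forcesInStack([8, 8, 8]);
const woodInMiddle = forcesInStack([8, 1, 8]);

console.log(allIron[1].pushFromBelow);      // push on the iron middle block
console.log(woodInMiddle[1].pushFromBelow); // smaller push on the wood middle block
console.log(woodInMiddle[1].net);           // still balanced (zero up to rounding)
```

<p>Swapping in the lighter block lowers the push from below along with the block’s own weight, so the net force on it stays at zero and nothing floats.</p>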
<p>So why doesn’t that happen with water? I think in some cases it can, but usually it doesn’t. Why not?</p>
<p>Honestly I’m not 100% sure. Any argument I put forth will very likely have flaws at least as serious as the argument I’m currently poking holes in. But if you want me to give it a hand-wavy attempt, here goes:</p>
<p>Liquids have a tendency to equalize pressure in a way that solids don’t. I think this relates to their ability to flow.</p>
<p>Consider two stacks of solid blocks side by side. One stack contains iron blocks and the other wood blocks. The pressure in the bottom iron block will be much greater than the pressure in the bottom wooden block. There is no transfer of pressure between the solid blocks.</p>
<p>Now consider two columns of water side by side (and water can flow between these two columns). I think it would be difficult to create a situation where the pressure at the bottom of one column was very different than the pressure at the bottom of the other column. Water would quickly flow from the high pressure column to the low pressure column to equalize the pressures.</p>
<h2 id="fixing-the-argument">Fixing the argument</h2>
<p>Again, I’d like to caveat this section by saying that <em>I’m not sure</em> how to fix the argument. This is just my best guess.</p>
<p>Let’s say we start with a large pool of water in equilibrium. Then, we magically snap our fingers and replace some small region of water with an empty massless box. What happens?</p>
<p>The water below the box was previously at whatever pressure was necessary to hold up all the water above it. Now, however, it has less mass weighing down on it than before. So, at time zero, it’s pushing back “too hard” and it will move the column of water up just a tiny bit as it expands (and reduces its internal pressure).</p>
<p>However, now we’re in a strange state where there is a region of water (under the empty box) which has a lower pressure than the water directly to the left and right of it. Remember, the water directly to the left and right is still holding up just as much water as before we snapped our fingers and added the empty box. So what happens?</p>
<p>Since the water in the region below the box has lower pressure it’s pushing less hard than it used to on the water to its left and right. The water to the left and right is pushing just as hard as before. Therefore, the water to the left and right will expand just enough to equalize the pressure. But now it has a lower pressure than the water around it!</p>
<p>And this process continues on and on until everything is in rough equilibrium again and… long story short, the water below the empty box is pushing up on it <em>roughly</em> as hard as it was before, causing a net upward force and moving the empty box up.</p>
<h2 id="reflections">Reflections</h2>
<p>We started with an argument that made a decent amount of sense, but then realized that it must be wrong because it implied that solid objects would float. One way to realize that something was wrong with the argument was to see that it didn’t rely - in any way - on the substance’s ability to flow, and we know that’s required.</p>
<p>So am I trying to say the starting argument is “bad”? No, actually. It’s just simplified. It explained a lot, but it ignored some aspects of the problem. But what explanation doesn’t do that? Even after adding a reliance on the ability of a liquid to flow, I’m <em>sure</em> there’s still something wrong or unrealistic with the new explanation.</p>
<p>Really, I just thought this was an interesting problem to begin with. I honestly did want to understand what made something float or sink. Furthermore, I thought the process of realizing that something must be wrong with the initial explanation by realizing that it applied perfectly well to solids was also interesting. It reminded me of the way proofs work in math. You have some argument, but then realize it implies things that are false, and you need to either abandon or update the argument.</p>
<p>Hope you found it interesting too!</p>Why do astronauts float?2022-04-26T00:00:00+00:002022-04-26T00:00:00+00:00http://blog.russelldmatt.com/2022/04/26/why-do-astronauts-float<p>Consider the astronauts in the International Space Station (ISS),
which is a giant spaceship orbiting earth. They aren’t walking around
with their feet on the floor like you or me on earth - they’re
floating. Why? Why are they floating and we aren’t?</p>
<p>As usual, I recommend trying to think about this for yourself before
reading on; the explanation will be much more interesting that way.</p>
<p>I’ve asked a few people this question and so, instead of going
straight to the answer, let’s follow a common train of thought.</p>
<p><em>Maybe it’s because there isn’t any gravity up there? Ok, maybe not
no gravity, but very little. It’s like when Neil Armstrong took a
step on the moon and seemed to float for a while before coming back
down to the moon’s surface.</em></p>
<p>Good thought, but no. The earth’s radius is about 4,000 miles. The
ISS is only about 250 miles above the earth’s surface. That’s
nothing! If you took a huge step back and looked at the earth from
afar, you’d barely be able to tell the difference between us on the
earth’s surface and the astronauts in the ISS.</p>
<p>If you want to get numerical, here’s the formula for gravity:</p>
\[F = G \frac{m_1 m_2}{r^2}\]
<p>So how much less gravity do you feel on the ISS than on the earth’s
surface? Well, the only difference is \(r\). \(4000^2/4250^2\) is
about 0.89. So, the force of gravity is noticeably less on the ISS,
but only by about 11%. That doesn’t account for the difference
between walking on the ground and floating through space.</p>
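<p>As a quick sanity check of that arithmetic, here is a minimal Python sketch. The function name is mine, and the 4,000 and 250 mile figures are just the rounded values used above, not precise ones.</p>

```python
def gravity_ratio(radius_miles: float, altitude_miles: float) -> float:
    """Ratio of gravitational force at a given altitude to the force at the surface.

    Follows from the inverse-square law: F is proportional to 1 / r^2, where r
    is the distance from the center of the earth, so everything else cancels.
    """
    r_surface = radius_miles
    r_orbit = radius_miles + altitude_miles
    return (r_surface / r_orbit) ** 2


ratio = gravity_ratio(4000, 250)
print(f"gravity at ISS altitude is {ratio:.1%} of surface gravity")
```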
<p><em>Does it have something to do with the fact that they’re orbiting?
Like, when you’re in orbit, gravity is pulling you towards the earth
with just the right acceleration (relative to your tangential
velocity) to keep you at exactly the same distance from earth. Am I
on the right track?</em></p>
<p>Definitely on the right track, although here’s a hint. Even if they
weren’t in orbit, I still think they’d be floating. Let’s say their
tangential velocity wasn’t fast enough to keep them perfectly in
orbit, so instead they were spiraling in towards earth, I still think
they’d be floating. Or go the other way - say their tangential
velocity was too fast and they were spiraling away from earth. I
still think they’d be floating.</p>
<p><em>Oh… it’s like skydiving then? You’re falling? But it feels like floating?</em></p>
<p><strong>Yes, exactly! The astronauts are in “free fall”</strong>, meaning that the
only force acting upon them is gravity. Why doesn’t the space station
exert any force on them? Because the space station is also in free
fall, and it’s falling at exactly the same rate!</p>
<p>Something important to remember is that gravity will accelerate two
objects at the same rate, regardless of their mass. This is because
gravity exerts a force proportional to mass, but - for a given force -
acceleration is inversely proportional to mass. Those two effects end
up canceling out so that acceleration due to gravity does not depend
on mass. Probably easier to see it with symbols than with words:</p>
\[\begin{align}
F &= G \frac{m_1 m_{earth}}{r^2} \\
F &= m_1 a \\
G \frac{m_1 m_{earth}}{r^2} &= m_1 a \\
G \frac{m_{earth}}{r^2} &= a
\end{align}\]
<p>So, acceleration (of an object) depends on the mass of the earth, but
not the mass of the object.</p>
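<p>Here is a small numerical sketch of that cancellation. The constants are standard rounded values, and the three masses (an apple, a person, and very roughly the ISS) are only illustrative assumptions; the function names are mine.</p>

```python
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of the earth, kg
R_EARTH = 6.371e6    # mean radius of the earth, m


def gravitational_force(mass_kg: float, r_m: float) -> float:
    """F = G * m1 * m_earth / r^2, in newtons."""
    return G * mass_kg * M_EARTH / r_m**2


def acceleration(mass_kg: float, r_m: float) -> float:
    """a = F / m1, in m/s^2. The object's own mass cancels out."""
    return gravitational_force(mass_kg, r_m) / mass_kg


# Illustrative masses in kg: an apple, a person, and (very roughly) the ISS.
for mass in (0.1, 80.0, 420_000.0):
    force = gravitational_force(mass, R_EARTH)
    accel = acceleration(mass, R_EARTH)
    print(f"mass {mass:>9} kg: force {force:12.2f} N, acceleration {accel:.3f} m/s^2")
```

<p>The force changes by orders of magnitude from row to row, but the acceleration column stays the same.</p>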
<p>Where were we? Right, so both the astronaut and the spaceship and
everything else around them are falling towards the earth at exactly
the same rate, which means they don’t exert any forces on each other.</p>
<p>Here’s an analogy: What if you and a friend jumped off a cliff
together? You’d both “float” right next to each other (while
careening to your doom). Air resistance would make that example feel
less serene than it does for astronauts, but the analogy mostly holds.
Let’s say that instead of jumping, you and your friend were in the back of a
large van (not strapped in) as it drove off the cliff. Then you both
would be floating (inside the van), much like two astronauts inside
the space station.</p>
<p><em>Cool! But if being in orbit isn’t required, does that mean that astronauts are always floating?</em></p>
<p>Yes, anytime the spaceship that they’re on doesn’t have its engines
turned on. If the spaceship turns on its engines, that will
accelerate the spaceship. Assuming you were previously floating
inside it, eventually a spaceship wall will “bump into you” and exert
a force on you to keep your acceleration in line with the ship’s.</p>The Two Envelopes Problem2021-10-22T00:00:00+00:002021-10-22T00:00:00+00:00http://blog.russelldmatt.com/2021/10/22/two-envelopes-problem<p>If you haven’t already encountered the famous paradoxical “two
envelopes problem”, then I highly suggest you consider the following
prompt and return to this article much later. Without struggling with
the problem yourself first, the “resolution” below won’t seem nearly
as satisfying.</p>
<p>From <a href="https://en.wikipedia.org/wiki/Two_envelopes_problem">Wikipedia</a>:</p>
<blockquote>
<p>You are given two indistinguishable envelopes, each containing money. One contains twice as much as the other. You may pick one envelope and keep the money it contains. Having chosen an envelope at will, but before inspecting it, you are given the chance to switch envelopes. Should you switch?</p>
</blockquote>
<p>It seems that you can make compelling arguments for “always switch” and “it doesn’t matter whether or not you switch”.</p>
<p><strong>Always switch:</strong> Let’s denote the amount of money in the envelope that you chose as \(X\). The other envelope contains either \(2X\) or \(\frac{1}{2}X\). If we think these outcomes are equally likely, then the expected amount you hold after switching is \(\frac{1}{2}(2X) + \frac{1}{2}(\frac{1}{2}X) = \frac{5}{4}X\). In other words, if you switch then, in expectation, you end up with more than you started with. So always switch!</p>
<p><strong>It doesn’t matter:</strong> But that seems crazy! The problem is completely symmetric: you’re presented with two envelopes and you chose one at random. Why would it make any sense to switch when you could have just as easily randomly chosen the other envelope? Furthermore, if you do switch and then are presented with the option to switch again, doesn’t the same logic apply? But switching twice is the same as not switching so… that can’t be right. Common sense (and symmetry) strongly suggests that switching can’t matter.</p>
<p>One common objection to the argument for <strong>always switch</strong> above is
that we assumed that getting \(\frac{1}{2}X\) and \(2X\) were equally
likely, but that doesn’t make a lot of sense. If the probability of
the other envelope having \(\frac{1}{2}X\) or \(2X\) were the same,
that implies the following two states are equally likely: the two
envelopes have \(\frac{1}{2}X\) and \(X\) in them, or the two
envelopes have \(X\) and \(2X\) in them. We will denote these pairs
of amounts as \((\frac{1}{2}X, X)\) and \((X, 2X)\).</p>
<p>But then we can apply the same logic to the case where our chosen
envelope has \(2X\) in it and we conclude that the pairs \((X, 2X)\)
and \((2X, 4X)\) must also be equally likely. And so on for \((2X,
4X)\) and \((4X, 8X)\), \((4X, 8X)\) and \((8X, 16X)\), and so on. We
can also apply this logic to smaller and smaller pairs of amounts,
e.g. \((\frac{1}{4}X, \frac{1}{2}X)\) and \((\frac{1}{2}X, X)\),
\((\frac{1}{8}X, \frac{1}{4}X)\) and \((\frac{1}{4}X, \frac{1}{2}X)\),
etc. In effect, you end up with an infinite number of equally likely
possibilities, which is an <a href="https://en.wikipedia.org/wiki/Prior_probability#Improper_priors">improper prior
distribution</a>.
We need the sum of the probabilities of our possible pairs of amounts to equal
\(1\), but when we sum the probabilities of this weird improper
distribution, we effectively get \(\infty \cdot \frac{1}{\infty}\),
which is not well defined.</p>
<p>Furthermore, the problem didn’t actually say that getting \(\frac{1}{2}X\) and \(2X\) were equally likely, so let’s dispense with that assumption.</p>
<p>To make this discussion more rigorous and less vague, let’s consider a new problem that has the same paradoxical properties as the original one, but in which we know the exact distribution of the outcomes. To give credit where credit is due, everything below is due to the following excellent YouTube video: <a href="https://www.youtube.com/watch?v=_NGPncypY68">https://www.youtube.com/watch?v=_NGPncypY68</a>.</p>
<hr />
<p><br /></p>
<h3 id="a-more-well-specified-problem">A more well-specified problem</h3>
<p>Below is a table of all possible states that the two envelopes (A and B) can be in, along with the probability of being in that state:</p>
<table>
<thead>
<tr>
<th>State</th>
<th>Probability</th>
<th>Envelope A</th>
<th>Envelope B</th>
</tr>
</thead>
<tbody>
<tr>
<td>\(S_1\)</td>
<td>1/2</td>
<td>$1</td>
<td>$10</td>
</tr>
<tr>
<td>\(S_2\)</td>
<td>1/4</td>
<td>$10</td>
<td>$100</td>
</tr>
<tr>
<td>\(S_3\)</td>
<td>1/8</td>
<td>$100</td>
<td>$1,000</td>
</tr>
<tr>
<td>…</td>
<td> </td>
<td> </td>
<td> </td>
</tr>
<tr>
<td>\(S_n\)</td>
<td>\(1/2^n\)</td>
<td>\(10^{n-1}\)</td>
<td>\(10^n\)</td>
</tr>
</tbody>
</table>
<p>To be clear, there are infinitely many states (not just \(n\) of them). Given that, the probabilities sum to \(1\), as they should. As in the original problem, you choose an envelope at random - meaning you’ll choose Envelope A with a 50% chance and Envelope B with a 50% chance, but you won’t know which one you’ve chosen. After picking an envelope, but before inspecting it, you’re given the option to switch. Should you?</p>
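<p>To make the setup concrete, here is a minimal sketch of the game in Python. The function names are mine, and the coin-flipping loop is just one convenient way (an assumption, not something from the video) to draw state \(S_n\) with probability \(1/2^n\).</p>

```python
import random
from fractions import Fraction


def state_probability(n: int) -> Fraction:
    """P[state = S_n] = 1 / 2^n, for n = 1, 2, 3, ..."""
    return Fraction(1, 2**n)


def envelope_amounts(n: int) -> tuple[int, int]:
    """Dollar amounts in (Envelope A, Envelope B) when the state is S_n."""
    return 10 ** (n - 1), 10**n


def sample_state(rng: random.Random) -> int:
    """Draw n with probability 1/2^n: keep flipping a fair coin until it says stop."""
    n = 1
    while rng.random() < 0.5:
        n += 1
    return n


def play(rng: random.Random) -> tuple[int, int]:
    """Return (picked, other): the amount in the envelope we chose and in the other one."""
    a, b = envelope_amounts(sample_state(rng))
    return (a, b) if rng.random() < 0.5 else (b, a)


# The partial sums of the state probabilities approach 1.
partial = sum(state_probability(n) for n in range(1, 21))
print(f"sum of the first 20 state probabilities: {float(partial):.6f}")
print("one round (picked, other):", play(random.Random(0)))
```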
<p>To decide whether or not we should switch envelopes, let’s compute the expected value of switching. I’m going to use the <a href="https://en.wikipedia.org/wiki/Law_of_total_expectation">“law of total expectation”</a>, which is a fancy way of saying that I’ll compute \(E[switching]\) by:</p>
\[E[switching] = \sum\limits_{Y=y} E[switching | Y = y] \cdot P[Y = y]\]
<p>Where \(Y\) is some other random variable. I’m not saying concretely what \(Y\) is because I’m going to compute this three different ways using three different \(Y\)s.</p>
<h3 id="1-y-is-which-state-were-in-s_1-s_2-s_3--"><strong>1. \(Y\) is which state we’re in (\(S_1\), \(S_2\), \(S_3\), … )</strong></h3>
<p>To start, let’s compute E[switching] by conditioning on which state we’re in.</p>
\[\begin{align}
E[switching] &= E[switching | state = S_1] \cdot P[state = S_1] \\
&+ E[switching | state = S_2] \cdot P[state = S_2] \\
&+ E[switching | state = S_3] \cdot P[state = S_3] \\
...
\end{align}\]
<p>Now we just need to compute each term:</p>
\[\begin{align}
E[switching | state = S_1] &= 1/2 (+9) + 1/2 (-9) &= 0 \\
E[switching | state = S_2] &= 1/2 (+90) + 1/2 (-90) &= 0 \\
E[switching | state = S_3] &= 1/2 (+900) + 1/2 (-900) &= 0 \\
\end{align}\]
<p>… you get the picture. Every term = 0, so <strong>E[switching] is clearly = 0.</strong></p>
<h3 id="2-y-is-the-value-in-the-envelope-we-picked"><strong>2. \(Y\) is the value in the envelope we picked</strong></h3>
<p>We’ll do exactly what we did before, but instead of conditioning on which state we’re in, let’s condition on the value of the envelope we picked (note: we don’t know this value, but it must have some value, right?):</p>
\[\begin{align}
E[switching] &= E[switching | picked = \$1] \cdot P[picked = \$1] \\
&+ E[switching | picked = \$10] \cdot P[picked = \$10] \\
&+ E[switching | picked = \$100] \cdot P[picked = \$100] \\
...
\end{align}\]
<p>Now we compute the terms:</p>
\[\begin{align}
E[switching | picked = \$1] &= +9 &> 0 \\
E[switching | picked = \$10] &= 2/3 (-9) + 1/3 (+90) &> 0 \\
E[switching | picked = \$100] &= 2/3 (-90) + 1/3 (+900) &> 0 \\
\end{align}\]
<p>… you get the picture. Every term > 0 (and we multiply each term by some positive probability), so <strong>E[switching] is clearly > 0.</strong></p>
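<p>If the \(2/3\) and \(1/3\) weights look like they came out of nowhere, the sketch below derives them from the table with exact fractions. It assumes only the state probabilities and the 50/50 envelope pick from the problem statement; the function name and the indexing by \(k\) (picked amount \(= \$10^k\)) are mine.</p>

```python
from fractions import Fraction


def expected_gain_given_picked(k: int) -> Fraction:
    """E[switching | picked = $10^k], for k = 0, 1, 2, ...

    There are at most two ways to be holding $10^k:
      * we picked Envelope A in state S_{k+1}, so the other envelope has 10x more
      * we picked Envelope B in state S_k, so the other envelope has 10x less (k >= 1)
    """
    amount = 10**k

    # Joint probability of (state S_{k+1}, picked A), and the gain from switching.
    p_low = Fraction(1, 2 ** (k + 1)) * Fraction(1, 2)
    gain_low = 10 * amount - amount

    if k == 0:
        # $1 only ever appears as Envelope A of S_1, so switching always gains.
        return Fraction(gain_low)

    # Joint probability of (state S_k, picked B), and the (negative) gain.
    p_high = Fraction(1, 2**k) * Fraction(1, 2)
    gain_high = Fraction(amount, 10) - amount

    # Condition on the picked amount by normalizing the two joint probabilities.
    total = p_low + p_high
    return (p_low * gain_low + p_high * gain_high) / total


for k in range(4):
    print(f"E[switching | picked = ${10**k}] = {expected_gain_given_picked(k)}")
```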
<h3 id="3-y-is-the-value-in-the--other--envelope"><strong>3. \(Y\) is the value in the <em>other</em> envelope</strong></h3>
<p>We’ll do exactly what we did before, but instead of conditioning on the value in the envelope that we picked, we’ll condition on the value of the envelope we <em>didn’t</em> pick (which I’m calling “other”):</p>
\[\begin{align}
E[switching] &= E[switching | other = \$1] \cdot P[other = \$1] \\
&+ E[switching | other = \$10] \cdot P[other = \$10] \\
&+ E[switching | other = \$100] \cdot P[other = \$100] \\
...
\end{align}\]
<p>Now we compute the terms:</p>
\[\begin{align}
E[switching | other = \$1] &= -9 &< 0 \\
E[switching | other = \$10] &= 2/3 (+9) + 1/3 (-90) &< 0 \\
E[switching | other = \$100] &= 2/3 (+90) + 1/3 (-900) &< 0 \\
\end{align}\]
<p>… you get the picture. Every term < 0 (and we multiply each term by some positive probability), so <strong>E[switching] is clearly < 0.</strong></p>
<h2 id="what-gives">What gives!?</h2>
<p>According to the video (and I don’t know how much I should trust this random video), here’s what gives: The random variable “profit of switching envelopes” has no expected value. It’s not zero, it’s not positive infinity, and it’s not negative infinity. It’s simply not defined. This also explains why using the “law of total expectation” breaks down. As the Wikipedia article states, you can only use the law of total expectation on a random variable \(X\) if \(E[X]\) is defined. Here is a link to the video at the moment that he explains the resolution: <a href="https://youtu.be/_NGPncypY68?t=1213">https://youtu.be/_NGPncypY68?t=1213</a></p>
<p>When we compute the expected value, we’re summing up an infinite number of terms. In this case, <strong>the order in which we sum the terms matters</strong>. This is a very unusual property. This property occurs when the sum of all the positive terms in the series is +infinity and the sum of all the negative terms is -infinity. In those cases, you can rearrange the order of summation and get completely different results. Since no one summation order is “more correct” than another, this infinite series has no well-defined sum.</p>
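<p>To see the order dependence numerically, here is a sketch that builds the individual terms of the sum (one per state and per choice of envelope) and then adds them up using the three groupings from above. The function names are mine; the terms themselves come straight from the table.</p>

```python
from fractions import Fraction


def term(n: int, picked_a: bool) -> Fraction:
    """Contribution to E[switching] from state S_n and one particular pick.

    Probability: (1/2^n) * (1/2). Gain: +9 * 10^(n-1) if we hold A, the negative if we hold B.
    """
    prob = Fraction(1, 2**n) * Fraction(1, 2)
    gain = 9 * 10 ** (n - 1)
    return prob * gain if picked_a else -prob * gain


def partial_sums_by_state(count: int) -> list[Fraction]:
    """Group the (S_n, A) term with the (S_n, B) term."""
    sums, total = [], Fraction(0)
    for n in range(1, count + 1):
        total += term(n, True) + term(n, False)
        sums.append(total)
    return sums


def partial_sums_by_picked(count: int) -> list[Fraction]:
    """Group by picked amount: $1 alone, then (S_k, B) with (S_{k+1}, A) for $10^k."""
    total = term(1, True)
    sums = [total]
    for k in range(1, count):
        total += term(k, False) + term(k + 1, True)
        sums.append(total)
    return sums


def partial_sums_by_other(count: int) -> list[Fraction]:
    """Group by the other envelope's amount: the mirror image of the grouping above."""
    total = term(1, False)
    sums = [total]
    for k in range(1, count):
        total += term(k, True) + term(k + 1, False)
        sums.append(total)
    return sums


print("by state: ", [float(s) for s in partial_sums_by_state(5)])
print("by picked:", [float(s) for s in partial_sums_by_picked(5)])
print("by other: ", [float(s) for s in partial_sums_by_other(5)])
```

<p>The same terms give partial sums that stay at zero, run off towards \(+\infty\), or run off towards \(-\infty\), depending only on how they are grouped.</p>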
<p>In case it’s not obvious, notice that in the three versions of E[switching] above, the only difference is the way in which we ordered and then grouped the terms of the sum. Here’s a picture to make it more clear:</p>
<p><img src=" /assets/by-post/two-envelopes-problem/two-envelopes.jpg" style="max-width: calc(min(500px, 100vw))" /></p>
<p>The thing I like about this explanation is that it not only resolves the paradox, but it also shows why you can make convincing arguments for either strategy (switch or don’t switch).</p>
<h2 id="remaining-dissonance">Remaining dissonance</h2>
<p>Although I’m quite pleased with the resolution above, I’m still a bit unsettled by the fact that I would still answer “yes” to the following two questions:</p>
<p>If we changed the “well-specified problem” to say that you could open <em>the envelope you chose</em> before deciding whether or not to switch, would you conclude that switching is <strong>better</strong> no matter what you saw in your envelope?</p>
<p>If we changed the “well-specified problem” to say that you could open <em>the other envelope</em> before deciding whether or not to switch, would you conclude that switching is <strong>worse</strong> no matter what you saw in that envelope?</p>The Joy of Discovering Math2021-10-21T00:00:00+00:002021-10-21T00:00:00+00:00http://blog.russelldmatt.com/2021/10/21/the-joy-of-discovering-math<p>Try to discover things for yourself. Let yourself struggle - <em>really struggle</em> - before seeing the answer. Whether you successfully solve the problem yourself or not, the solution will be so much more satisfying.</p>
<p>Below is a passage from Knuth’s book <a href="https://smile.amazon.com/Surreal-Numbers-Donald-Knuth/dp/0201038129">Surreal Numbers</a>. It’s the most insightful and self-aware description of the asymmetry between discovering math for yourself and “being taught” math that I’ve ever seen:</p>
<div style="display:flex; flex-direction: column; align-items:center">
<img src="
/assets/by-post/the-joy-of-discovering-math/surreal1.png" style="max-width: 500px" />
<img src="
/assets/by-post/the-joy-of-discovering-math/surreal2.png" style="max-width: 500px" />
<img src="
/assets/by-post/the-joy-of-discovering-math/surreal3.png" style="max-width: 500px" />
</div>
<p>I noticed that the advice embedded in the passage above aligns almost perfectly with how Po-Shen Loh describes how he taught himself math in the following clip:</p>
<div style="display:flex; flex-direction: column; align-items:center">
<iframe width="560" height="315" src="https://www.youtube.com/embed/vpVRQuBWctQ?start=89" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen=""></iframe>
</div>G13. Godel’s Proof (sketch)2021-09-26T00:00:00+00:002021-09-26T00:00:00+00:00http://blog.russelldmatt.com/2021/09/26/g13-godels-proof-sketch<style> .ul { white-space:nowrap; } </style>
<p>In this post, we present a sketch of Godel’s proof of his first
incompleteness theorem. As the word sketch suggests, we will lay out
the broad strokes of the proof without filling in many of the details.
I personally find this level of description best for giving me an
intuitive feel for the proof as a whole. Granted, that may only be
true because I’ve already spent many hours poring over the details
and so I’m not troubled by the lack of rigor. Either way, I hope this
will give you a high level understanding of how the proof goes. At
the end, I will present links to further reading where you can fill in
the details for yourself.</p>
<p>To start, let’s fix on a particular formal system, which we will call
\(PM\). In the end, we will show that Godel’s proof applies to any
sufficiently strong formal system, but we can leave generalizations
for later.</p>
<p>First, Godel came up with an encoding scheme that can associate a
unique number to any formula within \(PM\). In a similar way, he was able
to associate a unique number with any proof (or derivation) within \(PM\).
A formula’s number is sometimes called its “Godel number” (abbreviated
as g.n.) and a proof’s number is called its “super g.n.”.</p>
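<p>To give a feel for what such an encoding can look like, here is a toy sketch of the classic prime-power scheme: give each symbol a small code, then encode a formula as \(2^{c_1} \cdot 3^{c_2} \cdot 5^{c_3} \cdots\), where \(c_i\) is the code of the \(i\)-th symbol. The tiny symbol table below is a made-up assumption for illustration, not the alphabet of \(PM\) or the exact scheme from any particular text; uniqueness comes from unique prime factorization.</p>

```python
# Hypothetical symbol table for a toy language; every code must be >= 1.
SYMBOL_CODES = {"0": 1, "S": 2, "+": 3, "=": 4, "(": 5, ")": 6}
CODE_SYMBOLS = {code: symbol for symbol, code in SYMBOL_CODES.items()}


def primes():
    """Yield 2, 3, 5, 7, ... using simple trial division."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def godel_number(formula: str) -> int:
    """Encode a string of symbols as a product of prime powers."""
    number = 1
    for prime, symbol in zip(primes(), formula):
        number *= prime ** SYMBOL_CODES[symbol]
    return number


def decode(number: int) -> str:
    """Recover the formula by reading off the exponent of each successive prime."""
    symbols = []
    for prime in primes():
        if number == 1:
            break
        exponent = 0
        while number % prime == 0:
            number //= prime
            exponent += 1
        if exponent not in CODE_SYMBOLS:
            raise ValueError("not the Godel number of a formula in this toy scheme")
        symbols.append(CODE_SYMBOLS[exponent])
    return "".join(symbols)


g = godel_number("S0=S0")
print(g, "->", decode(g))
```

<p>A proof is a sequence of formulas, so the same trick can be applied one level up (to the sequence of g.n.s) to get a single number for a whole proof.</p>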
<p>Next, Godel constructed the following formula (with Godel number 42<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>):</p>
<p><strong>Formula 42</strong>: \(\lnot \exists m. \mathrm{Proof}(m, 42)\)</p>
<p>\(\mathrm{Proof}(m,n)\) is a relation between two numbers that is true
iff \(m\) encodes a \(PM\) derivation (proof) of the formula with number \(n\).
Take it on faith, for now, that such a relation can be expressed
within \(PM\).</p>
<p>If we interpret formula 42, it says the following:</p>
<blockquote>
<p><em>It is not true that there exists a number m which encodes a \(PM\) proof
of formula 42</em>. In other words, <em>formula 42 is not provable within
\(PM\)</em>.</p>
</blockquote>
<p>The problems that follow from formula 42 are probably somewhat
self-evident, but for the sake of clarity let’s spell them out.</p>
<h2 id="the-semantic-argument">The semantic argument</h2>
<p>Say we found a proof of formula 42 within \(PM\). Then formula 42, when
interpreted, would make a false statement. It claims that no proof
exists, and yet we found one. This is bad; we just proved a false
statement within \(PM\). This means that \(PM\) is not sound, since a sound
system can only derive true statements. To summarize, if we can prove
formula 42, then \(PM\) is not sound. Contrapositively, if \(PM\) is sound,
then \(PM\) cannot prove formula 42.</p>
<p>Let’s assume \(PM\) is sound and therefore it cannot prove formula 42.
Now formula 42, when interpreted, makes a true statement! It claims
that no proof exists, and indeed we cannot find one. The trouble in
this case is that we’ve found a true statement that we cannot prove.
We cannot prove a true statement, and therefore \(PM\) is incomplete.</p>
<p>That is not strictly true. What if we could prove the negation of
formula 42? Wouldn’t that undermine the claim that \(PM\) is incomplete?
After all, completeness just means you can derive either \(\varphi\)
or \(\lnot \varphi\), for any formula \(\varphi\).</p>
<p>We can show, using similar arguments, that we cannot prove the
negation of formula 42. We have already shown that if \(PM\) is sound,
then it cannot prove formula 42, which means that formula 42 is true.
That implies that the negation of formula 42 is false. A sound formal
system can only prove true statements, so again - assuming \(PM\) is
sound - it cannot prove the negation of formula 42 either.</p>
<p>This completes a sketch of what’s called <strong>the semantic argument</strong>.
It says that if a formal system is sound and sufficiently expressive,
then it is incomplete. Of course we’ve left out all the details. We
haven’t shown why a sufficiently expressive formal system can indeed
express the \(\mathrm{Proof}\) relation. Even after that, it takes
considerable mental acrobatics to construct formula 42 such that its
own Godel number happens to be the same number for which it claims
there is no proof. Lastly, we have not shown Godel’s numbering
scheme, although that part is relatively straightforward.</p>
<p>Notice that the semantic argument derives incompleteness from a <em>sound
and sufficiently expressive</em> formal system, whereas Godel’s
incompleteness theorem claims that <em>consistent and sufficiently
strong</em> formal systems are incomplete. The argument that derives
incompleteness from a <em>consistent and sufficiently strong</em> formal
system is called <strong>the syntactic argument</strong>. It’s considerably more
involved, so it’s worth pausing to reflect that the semantic argument
should be quite satisfying! Any formal system that we hope to use as
a foundation for all of mathematics had better be sound. Otherwise,
it’s able to prove formulas that make false statements, which doesn’t
sound like a great fit. So, if you try to follow the syntactic
argument below and find that your brain is left looking like a
pretzel, you can rest easy knowing that the semantic argument is
“good enough”.</p>
<h2 id="the-syntactic-argument">The syntactic argument</h2>
<p>The syntactic argument does not assume that \(PM\) is sound, only that it
is consistent. Consistency is a weaker requirement than soundness,
which is what makes this argument more impressive. However, in
weakening one requirement, it needs to strengthen another. Instead of
requiring that \(PM\) is sufficiently expressive, it requires that it’s
sufficiently strong. Remember that sufficiently strong means that \(PM\)
can not only express all primitive recursive relations, but that it
can capture them. Here is a refresher on what it means to capture a
property (or relation) \(P\).</p>
<p>A formal system \(T\) can <em>capture</em> a property \(P\) by the open formula
\(\varphi(x)\) iff, for any \(n\):</p>
<ul>
<li>if \(n\) has the property \(P\), then \(T \vdash \varphi(\bar{n})\), and</li>
<li>if \(n\) does not have the property \(P\), then \(T \vdash \lnot \varphi(\bar{n})\)</li>
</ul>
<h4 id="if-pm-is-consistent-and-sufficiently-strong-then-pm-cannot-prove-formula-42">If \(PM\) is consistent (and sufficiently strong), then \(PM\) cannot prove formula 42</h4>
<p>Say we found a proof of formula 42 within \(PM\). Let’s compute the super
g.n. of that proof and call it \(p\). We can now derive the formula
\(\mathrm{Proof}(p, 42)\) within \(PM\). It’s easy to miss, but here is
where we needed the requirement that \(PM\) is <em>sufficiently strong</em> and
therefore can <em>capture</em> any primitive recursive relation. If \(PM\) can
capture the \(\mathrm{Proof}\) relation, then \(PM \vdash
\mathrm{Proof}(p, 42)\).</p>
<p>Formula 42 states \(\lnot \exists m. \mathrm{Proof}(m, 42)\) which is
equivalent to \(\forall m \lnot \mathrm{Proof}(m, 42)\). If we
instantiate the \(\forall m\) quantifier with the number \(p\), we get
\(\lnot \mathrm{Proof}(p, 42)\).</p>
<p>In summary, if we find a proof of formula 42 with super g.n. \(p\), \(PM\)
can derive both \(\mathrm{Proof}(p, 42)\) and \(\lnot \mathrm{Proof}(p,
42)\). In other words, if we can find a proof of formula 42, then \(PM\)
is inconsistent. Contrapositively, if \(PM\) is consistent (and
sufficiently strong), then \(PM\) cannot prove formula 42.</p>
<h4 id="if-pm-is-consistent-then-pm-cannot-prove-lnot-formula-42">If \(PM\) is consistent, then \(PM\) cannot prove \(\lnot\) formula 42</h4>
<p>In order to derive incompleteness, we also need to show that \(PM\) cannot
derive the negation of formula 42. In the semantic argument, this was
easy. We relied on the fact that if \(PM\) could not prove formula 42,
then formula 42 was true, and therefore \(\lnot\) formula 42 was false,
and a sound formal system cannot prove false statements. QED.</p>
<p>In the syntactic argument, it’s not so easy. This is where the
details get particularly subtle. We need to take a small digression
to explain the idea of \(\omega\)-consistency.</p>
<p><strong>\(\omega\)-inconsistency</strong>:</p>
<blockquote>
<p>A theory \(T\) is \(\omega\)-inconsistent iff, for some open formula
\(\varphi(x)\), \(T \vdash \exists x. \varphi(x)\) and yet for every number
\(m\) we have \(T \vdash \lnot \varphi(\bar{m})\).</p>
</blockquote>
<p>\(\omega\)-inconsistency is a Very Bad Thing (TM). It basically says
that you can prove something is not true for every single number, but
also you can prove that there exists “some” number for which it’s
true. In the same way that any useful formal system should be
consistent, it should also be \(\omega\)-consistent. Note that
\(\omega\)-consistency is a stronger requirement than plain
consistency; \(\omega\)-consistency implies plain consistency, but
plain consistency does not imply \(\omega\)-consistency.</p>
<p>We will now try to finish the syntactic argument, using the stronger
assumption that \(PM\) is \(\omega\)-consistent.</p>
<h4 id="if-pm-is-omega-consistent-then-pm-cannot-prove-lnot-formula-42">If \(PM\) is \(\omega\)-consistent, then \(PM\) cannot prove \(\lnot\) formula 42</h4>
<p>Say that \(PM\) is \(\omega\)-consistent and we can find a proof of
\(\lnot\) formula 42. If \(PM\) is \(\omega\)-consistent, then it is also
consistent, meaning that it cannot prove formula 42.</p>
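<p>If \(PM\) cannot prove formula 42, we know that, for any \(m\), \(\lnot
\mathrm{Proof}(m, 42)\) holds; otherwise \(m\) would encode a proof of formula 42
within \(PM\). And because \(PM\) captures the \(\mathrm{Proof}\) relation, it can
prove \(\lnot \mathrm{Proof}(m, 42)\) for each such \(m\).</p>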
<p>Recall the definition of formula 42 is \(\lnot \exists
m. \mathrm{Proof}(m, 42)\). So \(\lnot\) formula 42 is equivalent to
\(\exists m. \mathrm{Proof}(m, 42)\).</p>
<p>Now let’s bring the argument home. Say that \(PM\) can prove \(\lnot\)
formula 42, which is equivalent to \(\exists m. \mathrm{Proof}(m, 42)\).
If \(PM\) is consistent, then it cannot also prove formula 42, which means
that for any \(m\), it can prove \(\lnot \mathrm{Proof}(m, 42)\). The
ability to prove \(\lnot \mathrm{Proof}(m, 42)\), for any \(m\), as well
as \(\exists m. \mathrm{Proof}(m, 42)\) would mean that \(PM\) is
\(\omega\)-inconsistent. Contrapositively, if \(PM\) is
\(\omega\)-consistent, then it cannot prove \(\lnot\) formula 42.</p>
<p>This completes a sketch of <strong>the syntactic argument</strong>. We’ve
demonstrated that if \(PM\) is \(\omega\)-consistent and sufficiently strong then it
cannot derive either formula 42 or the negation of formula 42, making
it incomplete.</p>
<p>If you think that a bait-and-switch just occurred where we promised to
derive incompleteness from consistency but instead assumed
\(\omega\)-consistency, you’re 100% correct. Godel’s 1931 proof did in
fact require \(\omega\)-consistency to complete the syntactic argument.
However, in 1936, John Barkley Rosser introduced <a href="https://en.wikipedia.org/wiki/Rosser%27s_trick" title="Rosser's trick">Rosser’s
trick</a>, which showed that the requirement for \(\omega\)-consistency
may be weakened to consistency.</p>
<h2 id="filling-in-the-details">Filling in the details</h2>
<p>In the proof sketch above, I asked you to take a few things on faith.
One, that we could associate a unique number with any formula or proof
within \(PM\). Next, that the \(\mathrm{Proof}\) relation could be
expressed and even captured within \(PM\). Finally, that we can construct
“formula 42” (usually called the Godel sentence \(G\)) such that its
own Godel number is 42 <em>and</em> it claims that there is no proof of the
formula with number 42.</p>
<p>Originally, I had planned on filling in each of these details with
additional posts. However, filling in these details in a rigorous way
requires a full book (which I read, it’s called <a href="https://www.amazon.com/Introduction-Theorems-Cambridge-Introductions-Philosophy/dp/0521674530">An Introduction to
Gödel’s
Theorems</a>
by Peter Smith). So instead of trying to explain these details myself at some
intermediate level of rigor, which may or may not satisfy you, I’m
going to refer you to the specific places in the book that explain
each point. Not only will that likely be a better explanation, but if
it doesn’t make sense or you require more background, you’ll have an
entire book to fall back on. Without further ado:</p>
<ol>
<li>
<p>Two formalized arithmetics</p>
<p>Everything up to now has been about “formal
systems” in general, but I found it quite helpful to see the
specifics of a few concrete formal theories of arithmetic. In
<a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=76">chapter
10</a>,
Smith introduces \(BA\) (Baby Arithmetic) and then \(Q\). A few
chapters later, Smith introduces <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=106">First-order Peano
Arithmetic</a>
or \(PA\) (Note: \(PA\) and the \(PM\) system which I refer to
above are essentially the same).</p>
</li>
<li>
<p><a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=150">Godel Numbering</a></p>
<p>For whatever reason, I find this detail relatively straightforward. It’s nice to see the details worked out, though.</p>
</li>
<li>
<p>The \(\mathrm{Proof}\) relation can be expressed</p>
<p>This is done in two steps. First, you show that the \(\mathrm{Proof}\) relation is primitive recursive (<a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=154&zoom=100,90,100">19.4</a> and <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=163&zoom=100,90,533">20.4</a>). Then, you show that your formal system can express <em>all</em> primitive recursive relations (<a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=127">15</a>).</p>
</li>
<li>
<p>The \(\mathrm{Proof}\) relation can be captured</p>
<p>I found this part of the proof to be, by far, the hardest to follow. I wish I could give you a two sentence explanation of the key insight here, but I’m not really sure what it is. Sometimes you just have to sit and stare at something until it finally clicks.</p>
<p>In <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=138">chapter 17</a>, Smith shows that any primitive recursive function or relation (including the \(\mathrm{Proof}\) relation) can be captured by \(Q\), and hence in \(PA\). Most of the heavy lifting of this proof is done by invoking <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=93&zoom=100,90,510">Theorem 11.5</a> that states that \(Q\) is \(\Sigma_1\)-complete.</p>
<p>Here’s my best attempt at describing what clicked for me: The proof of Theorem 11.5 shows that \(Q\) is \(\Sigma_1\)-complete, which means that every <em>true</em> \(\Sigma_1\) sentence can be proven in \(Q\). I think I was expecting the proof of this statement to show me <em>how</em> to prove any arbitrary \(\Sigma_1\) formula, but that’s not what’s being claimed. How you show that a true \(\Sigma_1\) formula can be derived is extremely unsatisfying. If you have a true formula that says “there exists some \(x\) for which \(P(x)\) is true”, you can prove that by finding a specific \(x\) for which it’s true and then adding the existential quantifier at the front. But finding that specific \(x\) for which \(P(x)\) is true might be insanely hard.</p>
<p>Let’s say that Goldbach’s conjecture is false. Namely, “there exists some even number greater than two that is <em>not</em> the sum of two primes”. If that statement is true, then it’s provable within \(PA\). How would you prove it? Step 1: find an even number greater than two that is not the sum of two primes. Step 2: From that instance, derive the existential statement. Step 1 may take you a while.</p>
</li>
<li>
<p>Constructing Godel’s sentence</p>
<p>The final coup de grâce comes in <a href="/assets/by-post/books/pdfs/Introduction-to-Godels-Theorems.pdf#page=166">Chapter 21: \(PA\) is incomplete</a>. In this chapter, Smith shows how to construct Godel’s version of “formula 42” which claims (about itself) that it has no proof.</p>
</li>
</ol>
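<p>To make the two-step recipe from item 4 concrete, here is a minimal Python sketch of “Step 1” for the Goldbach example. The function names and the <code>limit</code> cutoff are my own illustrative assumptions, not anything from Smith’s book. The point is only that checking a single candidate is a finite, mechanical task, while the search for a witness has no obvious stopping point if no witness exists.</p>

```python
def is_prime(n):
    """Trial division; slow but adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def is_sum_of_two_primes(n):
    """Checking one candidate is finite: only pairs (p, n - p) with p <= n / 2."""
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))


def find_goldbach_counterexample(limit):
    """Step 1, cut off at a hypothetical search limit.

    Returns an even number in [4, limit] that is not the sum of two primes,
    or None if there is no such number in that range. Without the limit,
    the loop would only ever stop if a counterexample exists.
    """
    for n in range(4, limit + 1, 2):
        if not is_sum_of_two_primes(n):
            return n
    return None
```

<p>If the search ever returned a number, “Step 2” would be the easy part: that one verified instance is all you need to derive the existential statement.</p>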
<p>And with that, I conclude my series on Godel’s First Incompleteness Theorem!</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>His actual formula most certainly did not have a Godel number of 42. Normally, people refer to the number of his formula with the letter \(G\), but I find having a concrete number makes my brain hurt just a bit less. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>G12. Understanding Godel’s First Incompleteness Theorem - A Summary2021-07-18T00:00:00+00:002021-07-18T00:00:00+00:00http://blog.russelldmatt.com/2021/07/18/g12-understanding-godels-first-incompleteness-theorem<style> .ul { white-space:nowrap; } </style>
<p>If you’ve read through the entire Godel series thus far, you now have the prerequisite knowledge to precisely understand the statement that Godel’s first incompleteness theorem is making.</p>
<p>Here, again, is Godel’s first incompleteness theorem:</p>
<div id="theorem">
<p>Any <span class="consistency">consistent</span> <span class="formal-systems">formal system</span> F that is <span class="strong">sufficiently strong</span> is <span class="completeness">incomplete</span>; i.e., there are statements of the language of F which can neither be proved nor disproved in F.</p>
</div>
<p>This theorem pertains only to <span class="formal-systems"><em>formal
systems</em></span>, which are very rigid systems in which one starts from
a set of axioms and transforms them using the predefined
transformation rules of the system to derive more theorems. The
theorem is not making a statement about informal proofs which are
allowed to use any assumptions or leaps of logic that seem “obviously
true”. Formal systems in isolation can be somewhat meaningless, but
they are usually designed with an interpretation in mind. One can use
such an interpretation to translate formulas within a formal system
into mathematical statements.</p>
<p>Furthermore, this theorem is dealing with <span class="strong"><em>sufficiently strong</em></span> formal systems, by which
we mean that the formal system can <em>capture</em> all primitive recursive
relations.</p>
<p>The theorem says that if such a formal system is <span class="consistency"><em>consistent</em></span>, meaning that there are no
formulas \(\varphi\) for which it can derive both \(\varphi\) and its
negation \(\lnot \varphi\), then it is <span class="completeness"><em>incomplete</em></span>, meaning that there exists
some formula \(\varphi\) for which it cannot derive either \(\varphi\)
or \(\lnot \varphi\).</p>
<p>The important implication of this is that one of \(\varphi\) or
\(\lnot \varphi\) must be true, so if the formal system can derive
neither, then <strong>there exists a true statement that the formal system
can express, but not derive</strong>.</p>
<h3 id="why-is-that-so-surprising">Why is that so surprising?</h3>
<p>I personally find it quite intuitive (although apparently wrong) that
“mathematical truth” and “provable” are two ways of saying the same
thing. Before studying Godel, if someone had told me that there was a
true statement which was “unprovable”, I’d be pretty confused as to
what they meant by “true”. How are you sure it’s true if you can’t
prove it?</p>
<p>Now having a deeper understanding of Godel’s work, I understand there
are a few problems with that line of thought. For one, provable using
what initial assumptions? Every proof has to start <em>somewhere</em>, and
those are the starting axioms that you assume to be true.</p>
<p>I also now know that one way to show that a “true” statement is
unprovable is by showing that neither it nor its negation can be
proven. The beauty of this technique is that it sidesteps the problem of
having to somehow show that a statement is true without being able to
prove it. Instead, all you need to agree on is that one of X or “not
X” must be true. If a system can prove neither, then there exists <em>a</em>
true statement that it cannot prove.</p>
<p>I want to quickly clarify that Godel’s theorem does not say that there
are true statements that cannot be proven under <em>any</em> formal system.
Just that, for a given formal system, there exist true statements that
cannot be proven. It’s a subtle distinction, but an important one.
<strong>It means that we cannot find a <em>single</em> set of initial assumptions
(and transformation rules) from which to derive all mathematical
truths.</strong> We may be able to derive all truths, but different truths
may need different starting assumptions. You have to admit, something
feels very unsatisfying about that.</p>
<p>If you find this extremely counterintuitive, you’re not alone. As
explained in <a href="{% post_url
2021-07-05-g3-why-do-we-care-about-formal-systems %}">Why do we care about formal systems?</a>, finding a
single set of initial assumptions from which to derive all
mathematical truths was not some unrealistic ideal that no mathematicians
found plausible. On the contrary, in the early 1900s many of the
world’s foremost mathematicians were <a href="https://en.wikipedia.org/wiki/Hilbert%27s_program">explicitly working towards this
goal</a>.</p>
<p>So, when Godel published his paper <em>On Formally Undecidable
Propositions of Principia Mathematica And Related Systems</em>, it
seriously shocked the mathematical community. In a single instant,
the goal of finding a single formal system on which all math could be
based - likely the life’s work of many mathematicians at the time -
was shown to be unattainable. Scientifically minded people often say
that being proven wrong is a gift because it’s in those moments when
you learn the most. Even so, I suspect that this was a tough pill for
some to swallow.</p>
<h3 id="the-proof">The proof</h3>
<p>In a future series of posts we will outline how Godel actually went
about proving this theorem.</p>
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
let blue = "#5680E9";
let lblue = "#84CEEB";
let teal = "#5AB9EA";
let grey = "#C1C8E4";
let purple = "#8860D0";
let pastel_red = "#FF6961";
const class_to_color = {
"consistency": lblue,
"formal-systems": pastel_red,
"strong": purple,
"completeness": blue,
};
for (const class_name in class_to_color) {
let color = class_to_color[class_name];
let elts = document.getElementsByClassName(class_name);
for (const el of elts) {
RoughNotation.annotate(el, { type: "underline", color: color }).show();
}
}
let theorem = document.getElementById("theorem");
RoughNotation.annotate(theorem, {
type: 'bracket',
color: pastel_red,
brackets: ['left', 'right'],
animate: false
}).show();
(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (const el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
};
})();
});
</script>G11. Primitive Recursive Functions2021-07-14T00:00:00+00:002021-07-14T00:00:00+00:00http://blog.russelldmatt.com/2021/07/14/g11-primitive-recursive-functions<style> .ul { white-space:nowrap; } </style>
<p>Godel’s incompleteness theorem talks about formal systems that are <span class="ul">“sufficiently strong”</span>.
In this post, we will clarify what exactly is meant by that phrase.</p>
<p>Primitive recursive functions are a very large class of functions
that, very roughly speaking, correspond to functions that you can
compute “with only for loops”. The constraint of using only for loops
(as opposed to while loops) means that these functions cannot create
infinite loops and will therefore <span class="ul">always complete in a finite number of steps</span>. Furthermore, the number of steps it will take to complete is <span class="ul">bounded</span> (since
for loops have a predetermined length). We will define them much more
precisely later, but first we should talk about why we’re interested
in them.</p>
<h3 id="motivation">Motivation</h3>
<p>As we mentioned in the previous post, Godel defined what is known as “the Godel sentence” which can be interpreted as “this statement cannot be derived within this formal system”. At first glance, it’s not obvious that such a statement can be constructed within the formal system that Godel was using. However, Godel meticulously built up a series of relations - helper functions if you will - that he used in building his sentence. Furthermore, he showed that each one is “primitive recursive”. Lastly, he showed that his formal system can express (actually capture) <em>any</em> primitive recursive relation. Through this chain of logic, he showed that his formal system <span class="ul">can express the Godel sentence</span>.</p>
<p>This chain of logic actually says a bit more. It shows that <em>any</em> formal system, not just the one that Godel was using, that can express all primitive recursive relations can express a version of “the Godel sentence”. This is what is meant by a <strong>sufficiently expressive</strong> formal system: a formal system that can express all primitive recursive relations. The stronger version of this is a <strong>sufficiently strong</strong> formal system, which is a formal system that can <em>capture</em> all primitive recursive relations.</p>
<blockquote>
<p>Remember that expressing a relation between two numbers \(x\) and \(y\) with an open formula \(\varphi(x, y)\) means that the formula is <em>true</em> iff \(x\) and \(y\) have that particular relation, while capturing the relation means that \(\varphi(x, y)\) is derivable within the formal system if \(x\) and \(y\) have that particular relation and \(\lnot \varphi(x, y)\) is derivable if not.</p>
<p>Expressing is to truth as capturing is to derivability.</p>
</blockquote>
<p>Hopefully that sufficiently motivates the desire to understand what a primitive recursive relation (or function) actually is. So let’s get started.</p>
<h3 id="definition">Definition</h3>
<p>As with all of my posts, there probably exist better explanations out there on the internet. In this case, however, I think I’ve found one. The first 4 videos of <a href="https://www.youtube.com/playlist?list=PLC-8dKj3F0NUnR8LeBGH3utAI9aQjFbi5">this 5-video YouTube playlist</a> do an excellent job at defining and explaining primitive recursive functions. I will attempt to explain it myself below, but I highly recommend watching those videos.</p>
<p>We will start with the precise, but incredibly abstract definition and then work through a series of examples.</p>
<div class="aside">
<p>A quick note about notation before we start. Functions that take \(n\) arguments are called \(n\)-ary functions. One notational method to make it clear that a function takes \(n\) arguments is to write it like so: \(f(x_1, \ldots, x_n)\). This is clear and intuitive, but long - especially when composing \(k\) functions each with \(m\) arguments. A different method would be to notate each \(n\)-ary function with an \(n\) superscript, like so: \(f^n\). We will use both methods below.</p>
</div>
<div class="like-blockquote">
<p>The basic primitive recursive functions are given by these axioms:</p>
<ol>
<li><strong>Constant function</strong>: The 0-ary constant function \(Z^0 = 0\) is primitive recursive.</li>
<li><strong>Successor function</strong>: The 1-ary successor function \(S^1\), which returns the successor of its argument, is primitive recursive. That is, \(S^1(k) = k + 1\).</li>
<li><strong>Projection function</strong>: For every \(n \geq 1\) and each \(i\) with \(1 \leq i \leq n\), the \(n\)-ary projection function \(P^n_i\), which returns its \(i\)-th argument, is primitive recursive. For example, \(P^3_2(x,y,z) = y\).</li>
</ol>
<p>More complex primitive recursive functions can be obtained by applying the operations given by these axioms:</p>
<ol>
<li>
<p><strong>Composition</strong>: Given a \(k\)-ary primitive recursive function \(f^k\), and \(k\) many \(m\)-ary primitive recursive functions \(g^m_1,\ldots,g^m_k\), the composition of \(f^k\) with \(g^m_1,\ldots,g^m_k\), i.e. the \(m\)-ary function
\(h^m(x_1,\ldots,x_m) = f^k(g^m_1(x_1,\ldots,x_m),\ldots,g^m_k(x_1,\ldots,x_m))\) is primitive recursive.</p>
</li>
<li>
<p><strong>Primitive recursion operator</strong>: Given \(f^k\), a \(k\)-ary primitive recursive function, and \(g^{k+2}\), a \((k+2)\)-ary primitive recursive function, the primitive recursion of \(f^k\) and \(g^{k+2}\) is defined as the \((k+1)\)-ary function \(h^{k+1}\) constructed as follows:
\(\begin{aligned}
h^{k+1} (0, x_1, \ldots, x_k) &= f^k (x_1, \ldots, x_k) \\
h^{k+1} (S(y), x_1, \ldots, x_k) &= g^{k+2} (y, h (y, x_1, \ldots, x_k), x_1, \ldots, x_k)\end{aligned}\)</p>
</li>
</ol>
<p>We will use the symbol \(Pr^{k+1}(f^k,g^{k+2})\) to indicate the primitive recursion of \(f^k\) and \(g^{k+2}\).</p>
<p>The <strong>primitive recursive</strong> functions are the basic functions and those obtained from the basic functions by applying composition and primitive recursion a finite number of times.</p>
</div>
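<p>If the notation feels heavy, it may help to see the same five axioms written as code. This is only a sketch in Python under my own naming assumptions (<code>Z</code>, <code>S</code>, <code>P</code>, <code>compose</code>, and <code>Pr</code> are illustrative names that mirror the symbols above), and it relies on Python’s own recursion rather than anything specific to a formal system.</p>

```python
def Z():
    """0-ary constant function: Z^0 = 0."""
    return 0


def S(k):
    """1-ary successor function: S^1(k) = k + 1."""
    return k + 1


def P(n, i):
    """Build the n-ary projection P^n_i, which returns its i-th argument (1-based)."""
    def proj(*args):
        assert len(args) == n, "P^n_i expects exactly n arguments"
        return args[i - 1]
    return proj


def compose(f, *gs):
    """Composition: h(xs) = f(g_1(xs), ..., g_k(xs))."""
    def h(*xs):
        return f(*(g(*xs) for g in gs))
    return h


def Pr(f, g):
    """Primitive recursion operator, transcribed directly from the two equations.

    h(0, xs)    = f(xs)
    h(y+1, xs)  = g(y, h(y, xs), xs)
    """
    def h(y, *xs):
        if y == 0:
            return f(*xs)
        return g(y - 1, h(y - 1, *xs), *xs)
    return h
```

<p>For example, <code>compose(S, S)(3)</code> applies the successor twice, and <code>P(3, 2)(7, 8, 9)</code> picks out the middle argument.</p>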
<h3 id="interpretation-of-the-primitive-recursion-operator">Interpretation of the Primitive Recursion Operator</h3>
<p>In a rare turn of events, Wikipedia gives a (somewhat) intuitive way to think about the primitive recursion operator as a for loop:</p>
<blockquote>
<p>Interpretation. The function \(h\) acts as a for loop from 0 up to the value of its first argument. The rest of the arguments for \(h\), denoted here with \(x_i\)’s \((i = 1, \ldots, k)\), are a set of initial conditions for the for loop which may be used by it during calculations but which are immutable by it. The functions \(f\) and \(g\) on the right side of the equations which define \(h\) represent the body of the loop, which performs calculations. Function \(f\) is only used once to perform initial calculations. Calculations for subsequent steps of the loop are performed by \(g\). The first parameter of \(g\) is the “current” value of the for loop’s index. The second parameter of \(g\) is the result of the for loop’s previous calculations, from previous steps. The rest of the parameters for \(g\) are those immutable initial conditions for the for loop mentioned earlier. They may be used by \(g\) to perform calculations but they will not themselves be altered by \(g\).</p>
</blockquote>
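<p>The for loop reading in that quote can be written down almost word for word. Here is a hedged Python sketch, where <code>Pr_loop</code> is a name I made up for illustration: \(f\) runs once before the loop, \(g\) is the loop body, and the extra arguments are passed through untouched.</p>

```python
def Pr_loop(f, g):
    """Primitive recursion written as a for loop instead of recursion."""
    def h(y, *xs):
        acc = f(*xs)              # f is used once, for the initial value
        for i in range(y):        # exactly y iterations, fixed before the loop starts
            acc = g(i, acc, *xs)  # g sees the index, the previous result, and xs
        return acc
    return h


# Two illustrative uses (plain Python lambdas stand in for f and g here).
add = Pr_loop(lambda y: y, lambda i, acc, y: acc + 1)
factorial = Pr_loop(lambda: 1, lambda i, acc: (i + 1) * acc)
```

<p>Notice that the loop bound <code>y</code> is known before the first iteration and the body cannot change it, which is exactly why the number of steps is bounded.</p>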
<h3 id="examples">Examples</h3>
<p>The only way that I was able to really understand primitive recursion was seeing many examples and then working through a few myself. Let’s start with an easy one.</p>
<div class="brkt-l">
<h4 id="add2x--x--2">Add2(x) = x + 2</h4>
<p>To implement \(Add2^1(x) = x + 2\), we just need to apply the successor function \(S^1\) twice, which we can do via composition. Since the successor function is primitive recursive and composing primitive recursive functions yields a primitive recursive function, the resulting \(Add2^1\) function is also primitive recursive.</p>
<p>\(Add2^1(x) = S^1(S^1(x)) = x + 2\)</p>
</div>
<div class="brkt-l">
<h4 id="zerox--0">Zero(x) = 0</h4>
<p>Notice that the 0-ary zero function \(Z^0\) is given to us as an axiom, but not the 1-ary zero function \(Z^1(x) = 0\). We can define it ourselves using primitive recursion:</p>
<p>\(\begin{aligned}
Z^1(0) &= f^0() = Z^0 \\
Z^1(y+1) &= g^2(y, Z^1(y)) = P^2_2(y,Z^1(y)) = Z^1(y) \\
Z^1 &= Pr(f^0, g^2)
\end{aligned}\)</p>
</div>
<p>Try to manually compute \(Z^1(2)\) using the definition above. Once you’re done, click <a href="/assets/by-post/g11-primitive-recursive-functions/Z2.jpeg">here</a> to check your work.</p>
<div class="brkt-l">
<h4 id="addxy--x--y">Add(x,y) = x + y</h4>
<p>\(\begin{aligned}
Add^2(0, y) &= f^1(y) = P^1_1(y) = y \\
Add^2(x+1, y) &= g^3(x, Add^2(x,y), y) = S(P^3_2) = Add^2(x,y) + 1 = x + y + 1 \\
Add^2 &= Pr(f^1, g^3)
\end{aligned}\)</p>
</div>
<div class="brkt-l">
<h4 id="multxy--x--y">Mult(x,y) = x * y</h4>
<p>\(\begin{aligned}
Mult^2(0, y) &= f^1(y) = Z^1(y) = 0 \\
Mult^2(x+1, y) & = g^3(x,Mult(x,y), y) = Add^2(P^3_2, P^3_3) = Add^2(Mult(x,y),y) = x \cdot y + y \\
Mult^2 &= Pr(f^1, g^3)
\end{aligned}\)</p>
</div>
<p>Notice the role of the projection functions in the examples thus far. They serve a critical, but trivial role. They allow you to select which arguments to pass to another function. In \(Add^2\), the composition of \(S(P^3_2)\) is just the function \(g(x,y,z) = S(y)\), since \(P^3_2\) is a function which takes 3 arguments and returns the 2nd. In \(Mult^2\), \(Add^2(P^3_2, P^3_3)\) is really just a confusing way to write \(g(x,y,z) = Add^2(y, z)\). From now on, for the sake of readability, I will omit the projection functions and allow myself to select and reorder arguments with the knowledge that we can make this rigorous via the use of projection functions if we need.</p>
<p>Also notice that I used \(Add^2\) to define \(Mult^2\). This is perfectly acceptable since primitive recursive functions are the basic functions and those obtained from the basic functions by applying composition and primitive recursion <em>a finite number of times</em>. So, we can use any primitive recursive function in the definition of another primitive recursive function.</p>
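<p>To see the projection functions doing their bookkeeping job, here is a sketch of \(Z^1\), \(Add^2\) and \(Mult^2\) in Python with every projection written out. The helper names (<code>P</code>, <code>compose</code>, <code>Pr</code>) are my own illustrative choices, and <code>Pr</code> is written as a loop purely for convenience.</p>

```python
def Z():
    return 0


def S(k):
    return k + 1


def P(n, i):
    """n-ary projection returning the i-th argument (1-based)."""
    def proj(*args):
        assert len(args) == n
        return args[i - 1]
    return proj


def compose(f, *gs):
    """h(xs) = f(g_1(xs), ..., g_k(xs))."""
    return lambda *xs: f(*(g(*xs) for g in gs))


def Pr(f, g):
    """Primitive recursion as a bounded loop."""
    def h(y, *xs):
        acc = f(*xs)
        for i in range(y):
            acc = g(i, acc, *xs)
        return acc
    return h


# Z^1: the base case is Z^0, the step just passes the previous result along.
Zero1 = Pr(Z, P(2, 2))
# Add^2: base case returns y, step applies S to the previous result (argument 2 of 3).
Add = Pr(P(1, 1), compose(S, P(3, 2)))
# Mult^2: base case is 0, step adds y (argument 3) to the previous result (argument 2).
Mult = Pr(Zero1, compose(Add, P(3, 2), P(3, 3)))
```

<p>Calling <code>Add(2, 3)</code> runs the step function twice starting from 3. Every function here is built only from the basic functions, composition, and primitive recursion.</p>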
<p>Practice is critical in order to intuitively grasp how primitive recursion is like a for loop. Try to manually compute \(Mult^2(3,5)\). Once you’re done, click <a href="/assets/by-post/g11-primitive-recursive-functions/Mult35.jpeg">here</a> to check your work.</p>
<div class="brkt-l">
<h4 id="pown-x--xn">Pow(n, x) = \(x^n\)</h4>
<p>\(\begin{aligned}
Pow^2(0, x) &= f^1(x) = S(Z^1(x)) = 1 \\
Pow^2(n+1, x) &= g^3(n, Pow^2(n, x), x) = Mult^2(x, Pow^2(n,x)) = x \cdot x^n \\
Pow^2 &= Pr(f^1, g^3)
\end{aligned}\)</p>
</div>
<p>Are you getting the hang of these yet? If you can define a base case for when the first argument is equal to 0, and a recursive case that computes \(f(x+1,y)\) based on some combination of \(x\), \(y\), and \(f(x,y)\), then you can combine these using primitive recursion to define your function for any \(x\) and \(y\).</p>
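<p>That “base case plus recursive case” recipe is easy to try out in code. Below is a small Python sketch, with names of my own choosing, that builds exponentiation on top of addition and multiplication. It takes \(x^0 = 1\) as the base case and omits the projection functions, using lambdas to select arguments instead.</p>

```python
def Pr(f, g):
    """Primitive recursion as a bounded loop: f is the base case, g the step."""
    def h(y, *xs):
        acc = f(*xs)
        for i in range(y):
            acc = g(i, acc, *xs)
        return acc
    return h


# add(x, y) = x + y: start from y, apply the successor x times.
add = Pr(lambda y: y, lambda x, acc, y: acc + 1)
# mult(x, y) = x * y: start from 0, add y a total of x times.
mult = Pr(lambda y: 0, lambda x, acc, y: add(acc, y))
# power(n, x) = x ** n: start from 1, multiply by x a total of n times.
power = Pr(lambda x: 1, lambda n, acc, x: mult(x, acc))
```

<p>Note the argument order: as in the definition above, the exponent comes first, so <code>power(3, 2)</code> is \(2^3\).</p>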
<h3 id="a-goal-the-div-function">A Goal: The Div function</h3>
<p>We’re going to now head towards an ambitious goal. I want to define the following primitive recursive function: \(Div(x,y)\) which equals \(1\) if \(x\) is divisible by \(y\) and \(0\) otherwise. To do this, we’re going to need to build up a series of simpler primitive recursive functions to help.</p>
<div class="brkt-l">
<h4 id="sgnx--1-if-x--0-else-0">Sgn(x) = 1 if x > 0 else 0</h4>
<p>\(\begin{aligned}
Sgn^1(0) &= f^0() = Z^0 = 0 \\
Sgn^1(x+1) &= g^2(x, Sgn(x)) = S(Z^2) = 1 \\
Sgn^1 &= Pr(f^0, g^2)
\end{aligned}\)</p>
</div>
<div class="brkt-l">
<h4 id="predx--max0-x---1">Pred(x) = max(0, x - 1)</h4>
<p>Note that our “predecessor” function can never be negative, because primitive recursive functions only deal with the natural numbers, so \(Pred(0) = 0\).</p>
\[\begin{aligned}
Pred^1(0) &= f^0() = Z^0 = 0 \\
Pred^1(x+1) &= g^2(x, Pred^1(x)) = x \\
Pred^1 &= Pr(f^0, g^2)
\end{aligned}\]
</div>
<div class="brkt-l">
<h4 id="subxy--max0-y---x">Sub(x,y) = max(0, y - x)</h4>
<p>Note that our subtraction function can never be negative, like \(Pred\). Also note that \(Sub(x,y)\) is \(max(0, y - x)\) not \(max(0, x - y)\).</p>
\[\begin{aligned}
Sub^2(0, y) &= f^1(y) = y \\
Sub^2(x+1, y) &= g^3(x, Sub^2(x,y), y) = Pred^1(Sub^2(x,y)) \\
Sub^2 &= Pr(f^1, g^3)
\end{aligned}\]
</div>
<div class="brkt-l">
<h4 id="absdiffxy---x---y">Absdiff(x,y) = \(| x - y|\)</h4>
<p>\(Absdiff^2(x,y) = Add^2(Sub^2(x,y),Sub^2(y,x))\)</p>
</div>
<div class="brkt-l">
<h4 id="neqxy--1-if-x-neq-y-else-0">Neq(x,y) = 1 if \(x \neq y\) else 0</h4>
<p>\(Neq^2(x,y) = Sgn^1(Absdiff^2(x,y))\)</p>
</div>
<div class="brkt-l">
<h4 id="eqxy--1-if-x--y-else-0">Eq(x,y) = 1 if \(x = y\) else 0</h4>
<p>\(Eq(x,y) = Sub^2(Neq^2(x,y), S^1(Z^2)) = 1 - Neq^2(x,y)\)</p>
</div>
<div class="brkt-l">
<h4 id="remxy--x--y">Rem(x,y) = x % y</h4>
\[\begin{aligned}
Rem^2(0, y) &= f^1(y) = Z^1(y) = 0 \\
Rem^2(x+1, y) &= g^3(x, Rem^2(x, y), y) = Neq(Rem^2(x,y) + 1,y) \cdot (Rem^2(x,y) + 1) \\
Rem^2 &= Pr(f^1, g^3)
\end{aligned}\]
<p>The recursive case (\(g^3\)) is a little unintuitive. It basically says, if the (remainder of x / y) + 1 = y then 0 else (remainder of x / y) + 1.</p>
</div>
<div class="brkt-l">
<h4 id="divxy--1-if-x-is-divisible-by-y-else-0">Div(x,y) = 1 if x is divisible by y else 0</h4>
\[\begin{aligned}
Div^2(x,y) = Eq^2(Rem^2(x,y),Z^2(x,y)) = Eq^2(Rem^2(x,y), 0)
\end{aligned}\]
<p>The last step was refreshingly simple. No primitive recursion, just simple function composition.</p>
</div>
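<p>Here is the whole chain from \(Sgn\) to \(Div\) as a Python sketch, so you can poke at it. The lowercase names and the loop-based <code>Pr</code> helper are my own assumptions for illustration. I write the constants 0 and 1 directly where the definitions above use \(Z\) and \(S(Z)\), and use lambdas instead of projection functions.</p>

```python
def Pr(f, g):
    """Primitive recursion as a bounded loop: f is the base case, g the step."""
    def h(y, *xs):
        acc = f(*xs)
        for i in range(y):
            acc = g(i, acc, *xs)
        return acc
    return h


add = Pr(lambda y: y, lambda x, acc, y: acc + 1)
mult = Pr(lambda y: 0, lambda x, acc, y: add(acc, y))

sgn = Pr(lambda: 0, lambda x, acc: 1)               # 1 if x > 0 else 0
pred = Pr(lambda: 0, lambda x, acc: x)              # max(0, x - 1)
sub = Pr(lambda y: y, lambda x, acc, y: pred(acc))  # sub(x, y) = max(0, y - x)


def absdiff(x, y):
    return add(sub(x, y), sub(y, x))                # |x - y|


def neq(x, y):
    return sgn(absdiff(x, y))                       # 1 if x != y else 0


def eq(x, y):
    return sub(neq(x, y), 1)                        # 1 - neq(x, y)


# rem(x, y) = x % y: bump the previous remainder, reset to 0 when it reaches y.
rem = Pr(lambda y: 0,
         lambda x, acc, y: mult(neq(add(acc, 1), y), add(acc, 1)))


def div(x, y):
    return eq(rem(x, y), 0)                         # 1 if y divides x else 0
```

<p>For instance, <code>div(12, 4)</code> gives 1 and <code>div(12, 5)</code> gives 0. Nothing in the chain uses a while loop or unbounded recursion.</p>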
<h3 id="whats-not-primitive-recursive">What’s not primitive recursive?</h3>
<p>When introducing a property, it’s helpful to show a few examples of
things that <em>do not</em> have that property. In this case, what functions
<em>are not</em> primitive recursive?</p>
<p>Let’s take a step back and define three broad classes of functions:</p>
<ul>
<li>Primitive Recursive Functions</li>
<li>Computable functions that are not primitive recursive</li>
<li>Uncomputable functions</li>
</ul>
<p>Intuitively, the primitive recursive functions are the set of
functions that can be computed using “only for loops” which means they
must terminate (there are no infinite while loops) <em>and</em> the number of
steps can be bounded (since for loops have a predetermined length).</p>
<p>What, then, are computable functions? Intuitively, computable
functions are the set of functions that can be computed by a computer
(e.g. a Turing machine) given unlimited amounts of time and space.
Essentially, we remove the restriction that it must only use for loops
which means that the number of steps it takes to compute the result is
no longer necessarily bounded in advance. The classic example of a
function that is computable but not primitive recursive is the <a href="https://en.wikipedia.org/wiki/Ackermann_function">Ackermann
function</a>.</p>
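<p>For reference, here is the usual two-argument form of the Ackermann function as a short Python sketch. It is clearly computable, since it is just a few lines of code, but notice that the recursion is not of the “for loop with a predetermined length” shape: the second argument of the inner call is itself the result of another Ackermann call.</p>

```python
def ackermann(m, n):
    """Two-argument Ackermann function on natural numbers."""
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    # The nested call is what breaks the "predetermined loop length" pattern.
    return ackermann(m - 1, ackermann(m, n - 1))
```

<p>Only try this with small inputs. The values grow so quickly that even modest arguments will exhaust Python’s recursion limit long before they finish.</p>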
<p>What, then, are the uncomputable functions!? An unhelpful (but
accurate) definition is that they are the set of functions that are
not… computable. A more intuitive definition is that there is no
<em>finite</em> procedure (or algorithm) that can compute the function. The
most famous example of such a problem is <a href="https://en.wikipedia.org/wiki/Halting_problem">the Halting
problem</a>. A simpler
example is <a href="https://en.wikipedia.org/wiki/Busy_beaver">the Busy
beaver</a>. I will give a
slapdash explanation of the busy beaver problem here and why it’s
uncomputable.</p>
<p>An “nth Busy beaver” is a binary-alphabet Turing machine with \(n\)
states that reads a tape initially consisting of all zeros. The Turing
machine will run and must eventually halt. At the point of halting,
the tape must contain as many or more 1’s on it than any other halting \(n\)-state
Turing machine would produce under the same scenario.</p>
<p>The Busy beaver function takes as input \(n\) and returns the number
of 1’s that the “nth Busy beaver” would produce. Wikipedia states
that determining whether an arbitrary Turing machine is a busy beaver
is undecidable.</p>
<p>You may think: why not just enumerate all possible \(n\)-state
binary-alphabet Turing machines, run them all, and see which one
produces the most 1’s after they all halt? I <em>think</em> the problem with
this is that some of those Turing machines may run forever. Consider
a Turing machine that you’re attempting to “test” that has run for one
million steps so far. How would you reliably decide whether or not
it will eventually halt? This seems like the Halting problem, which
is also uncomputable. Take this entire paragraph with a large grain
of salt as these are my own speculations and I’m relatively new to
these ideas.</p>
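<p>Here is a rough Python sketch of that “just run them” idea, to show where it gets stuck. The simulator, the step budget, and the <code>LOOPER</code> machine are my own illustrative assumptions. <code>BB2</code> is the well-known 2-state busy beaver champion. When the budget runs out, all the simulator can say is “hasn’t halted yet”, which is not the same as “never halts”.</p>

```python
def run(machine, max_steps):
    """Simulate a binary-alphabet Turing machine on an all-zero tape.

    `machine` maps (state, symbol) to (symbol_to_write, head_move, next_state).
    The start state is "A" and the halt state is "H".
    Returns (number_of_ones, steps) if it halts within max_steps, else None.
    """
    tape = {}                      # sparse tape; missing cells read as 0
    head, state, steps = 0, "A", 0
    while state != "H":
        if steps == max_steps:
            return None            # budget exhausted: we learn nothing definite
        write, move, state = machine[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        steps += 1
    return sum(tape.values()), steps


# The 2-state busy beaver champion.
BB2 = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),
}

# A hypothetical 1-state machine that moves right forever and never halts.
LOOPER = {
    ("A", 0): (0, +1, "A"),
    ("A", 1): (0, +1, "A"),
}
```

<p>Running <code>run(BB2, 100)</code> halts quickly, while <code>run(LOOPER, 1000)</code> returns <code>None</code> no matter how large you make the budget. For this particular looper it is obvious why, but for an arbitrary machine there is no general way to tell those two situations apart.</p>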
<p>One interesting addendum is that most functions are uncomputable,
which is unintuitive given every function we run into on a daily basis
is probably computable. This reminds me of the fact that almost all
real numbers are transcendental, but I bet you can’t name more than 2.</p>
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (const el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
};
})();
(function () {
let elts = Array.from(document.getElementsByClassName("brkt-l"));
for (const el of elts) {
RoughNotation.annotate(el, { type: 'bracket', color: 'red', padding: [0, 5], brackets: ['left'] }).show();
};
})();
});
</script>G10. Expressibility and Capturability2021-07-13T00:00:00+00:002021-07-13T00:00:00+00:00http://blog.russelldmatt.com/2021/07/13/g10-expressibility-and-capturability<style> .ul { white-space:nowrap; } </style>
<p>A critical step in Godel’s proof is his construction of “the Godel sentence” which, when interpreted, means <span class="ul">“this statement cannot be derived within this formal system”</span>. The formal system in which he constructed this statement is one that deals with the natural numbers, first order logic, and elementary arithmetic such as the successor function.</p>
<p><span class="ul">How in the world, then, did he express such a statement?</span> It is certainly not obvious that it’s possible. After all, it is not true that any formal system can express any statement, so the ability to write down a formula that has the above meaning is not a given. It takes a lot of work to demonstrate that such a statement can be expressed.</p>
<p>We’re not going to demonstrate how Godel expressed this statement in this post, but rather talk about the notion of expressibility in general. We want to know, precisely, what it means to be able to express something in a formal system, as well as the stronger property of capturing (or representing<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>).</p>
<p>First, let’s recall statements that cannot be expressed within particular formal systems. We’ve seen examples of this already. In the Add system, we could not express (true) statements of addition that dealt with negative numbers. When discussing first vs second order logic, we noted that first order logic could not quantify over sets. So, first order logic would not be able to express the statement “there exists a property \(P\) that has 3 elements”.</p>
<p>Now let’s see a few examples of formulas that <em>do</em> express familiar properties:</p>
<h4 id="evenness">Evenness</h4>
\[\exists v (2 \times v = x)\]
<p>The above formula is an <em>open</em> formula that expresses the property of being even. A formula is open when there are one or more variables that are not bound. In this case, \(v\) is bound by the quantifier \(\exists\), but \(x\) is not. So this formula has one free variable: \(x\). This open formula expresses the property of being even because for any number \(x\), this formula is true iff \(x\) is even. Put another way, this open formula has the set of even numbers as its extension.</p>
<h4 id="primeness">Primeness</h4>
\[(x \neq 1 \land \forall u \forall v (u \times v = x \supset (u = 1 \lor v = 1)))\]
<p>The above open formula expresses the property of being prime. In words, it says that \(x \neq 1\) and for any two numbers \(u\) and \(v\), if \(u \times v = x\) then either \(u = 1\) or \(v = 1\).</p>
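<p>One way to convince yourself that these two formulas really have the right extensions is to transcribe them into code and test small numbers. The Python sketch below is my own illustration, with made-up function names. It replaces the unbounded quantifiers with searches up to \(x\), which is safe here because any \(u\), \(v\) with \(2 \times v = x\) or \(u \times v = x\) can be found at or below \(x\).</p>

```python
def even_formula(x):
    """Exists v such that 2 * v = x, with v searched over 0..x."""
    return any(2 * v == x for v in range(x + 1))


def prime_formula(x):
    """x != 1 and, for all u, v with u * v = x, either u = 1 or v = 1."""
    return x != 1 and all(
        u == 1 or v == 1
        for u in range(x + 1)
        for v in range(x + 1)
        if u * v == x          # the antecedent of the conditional
    )
```

<p>Note that the prime formula correctly rejects 0 as well as 1: since \(0 \times 0 = 0\) and neither factor is 1, the universally quantified part fails.</p>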
<h4 id="definition">Definition</h4>
<p>An open formula \(\varphi(x)\) can <em>express</em> a property \(P\) iff, for any \(n\):</p>
<ul>
<li>if \(n\) has the property \(P\), then \(\varphi(\bar{n})\) is true, and</li>
<li>if \(n\) does not have the property \(P\), then \(\lnot \varphi(\bar{n})\) is true.</li>
</ul>
<p>This definition can be extended to many-place relations (not just one-place properties) in the obvious way.</p>
<h2 id="capturing-relations">Capturing Relations</h2>
<p>There is a stronger version of expressing a property (or relation) which is <em>capturing</em> a property (or relation).</p>
<p>A formal system \(T\) can <em>capture</em> a property \(P\) by the open formula \(\varphi(x)\) iff, for any \(n\):</p>
<ul>
<li>if \(n\) has the property \(P\), then \(T \vdash \varphi(\bar{n})\), and</li>
<li>if \(n\) does not have the property \(P\), then \(T \vdash \lnot \varphi(\bar{n})\)</li>
</ul>
<p>This definition can be extended to many-place relations (not just one-place properties) in the obvious way.</p>
<p>Expressing a property with a formula \(\varphi(x)\) means that \(\varphi(x)\) is <span class="ul"><em>true</em></span> iff \(x\) has the relevant property. Capturing a property means that \(\varphi(x)\) is <span class="ul">derivable</span> in the formal system if \(x\) has the relevant property, and \(\lnot \varphi(x)\) is derivable if it does not.</p>
<p><span class="ul"><em>Expressing is to truth as capturing is to derivability.</em></span></p>
<script src="/assets/js/rough-notation.js"></script>
<script defer="">
MathJax.Hub.Queue(function () {
let elts = Array.from(document.getElementsByClassName("ul"));
for (const el of elts) {
RoughNotation.annotate(el, { type: "underline", color: "red" }).show();
}
});
</script>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>In another example of Naming Is Hard™, the notion of capturability goes by other names, most notably representability (which is used by <a href="https://www.amazon.com/G%C3%B6del-Escher-Bach-Eternal-Golden/dp/0465026567">GEB</a>). <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>