There’s a lot to say about Bayes’ theorem, and most of it has been said. In my little echo chamber of the internet, just about every YouTuber I watch or blogger I read has talked about it. So the critical question is: what do I have to add?

Just this, for now. Almost every introduction to Bayes’ theorem presents the formula like so:

\[P(A|B) = \frac{P(B|A)P(A)}{P(B)}\]

I think this formulation is so common because, when you want to use Bayes’ theorem, you’re probably trying to compute \(P(A|B)\). Having \(P(A|B)\) isolated on one side makes the formula practical to use.

However, I’d argue that an introduction to a topic should prioritize giving the reader an intuition for why the formula is true. If possible, the reader should come away thinking that it’s so obvious that it’s almost uninteresting. I think the following formulation does a better job of that:

\[P(A|B) P(B) = P(A \& B) = P(B \& A) = P(B|A) P(A)\]

Derivation

One way to compute the probability of two events \(A\) and \(B\) happening is as follows:

\[P(A \& B) = P(A|B) P(B)\]

To make this concrete, let’s say \(A\) is “I go to the park tomorrow” and \(B\) is “it rains tomorrow”. The probability that it rains tomorrow and I go to the park is [the probability that it rains] times [the probability that I go to the park given it rains]. Stated that way, it seems kind of obvious.

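From there it’s a small leap to notice that \(P(A\&B) = P(B \& A)\), since “\(A\) and \(B\)” is the same event as “\(B\) and \(A\)”. Applying the same rule with the roles of \(A\) and \(B\) swapped, we could also write the probability like so: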

\[P(A \& B) = P(B|A) P(A)\]
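To convince myself that both ways of writing \(P(A \& B)\) really agree, here is a minimal Python sketch using the park-and-rain example. The 20-day record and the helper names (`days`, `prob`, `park`, `rain`, `both`) are made up for illustration; only the two product formulas come from the text above.

```python
from fractions import Fraction

# Hypothetical record of 20 days as (went_to_park, rained) pairs.
# The counts are invented purely for illustration.
days = (
    [(True, True)] * 1      # went to the park, it rained
    + [(True, False)] * 9   # went to the park, no rain
    + [(False, True)] * 5   # stayed home, it rained
    + [(False, False)] * 5  # stayed home, no rain
)


def prob(event, given=None):
    """Fraction of days satisfying `event` among the days satisfying `given`."""
    pool = [d for d in days if given is None or given(d)]
    return Fraction(sum(1 for d in pool if event(d)), len(pool))


def park(day):  # event A: I go to the park
    return day[0]


def rain(day):  # event B: it rains
    return day[1]


def both(day):  # event A & B
    return day[0] and day[1]


p_joint = prob(both)                             # P(A & B), counted directly
via_rain = prob(park, given=rain) * prob(rain)   # P(A|B) * P(B)
via_park = prob(rain, given=park) * prob(park)   # P(B|A) * P(A)

print(p_joint, via_rain, via_park)  # 1/20 1/20 1/20
```

With these made-up counts, \(P(A|B) = 1/6\) and \(P(B) = 3/10\), while \(P(B|A) = 1/10\) and \(P(A) = 1/2\). Both products come out to \(1/20\), the same number you get by counting the rainy park days directly.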

And the very last step is to combine the two equations:

\[\begin{align} P(A|B) P(B) &= P(A \& B) = P(B|A) P(A) \\ P(A|B) P(B) &= P(B|A) P(A) \end{align}\]

Divide each side by \(P(B)\) (assuming \(P(B) > 0\)) and you get the standard formulation of Bayes’ theorem.
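The division step is small enough to write as a one-line function. This is only a sketch: the function name `bayes` and the input numbers are my own illustrative choices, not anything standard.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A) * P(A) / P(B).

    Dividing by P(B) only makes sense when P(B) > 0.
    """
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs for the park-and-rain example:
# P(rain | park) = 1/10, P(park) = 1/2, P(rain) = 3/10.
p_park_given_rain = bayes(Fraction(1, 10), Fraction(1, 2), Fraction(3, 10))
print(p_park_given_rain)  # 1/6
```

Using `Fraction` keeps the arithmetic exact, so multiplying the result back by \(P(B)\) recovers \(P(B|A) P(A)\) with no rounding error.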

I think this route to deriving Bayes’ theorem makes the connection to \(P(A \& B)\) clearer. At least in my opinion, the fact that you can write \(P(A\&B)\) in two ways is a nice intuition for why Bayes’ theorem works.