There’s a lot to say about Bayes’ theorem, and most of it has been said. In my little echo chamber of the internet, just about every YouTuber I watch or blogger I read has talked about it. So the critical question is: what do I have to add?

Just this, for now. Almost every introduction to Bayes’ theorem presents the formula like so:

P(A|B)=P(B|A)P(A)/P(B)

I think this formulation is so common because when you want to use Bayes’ theorem, you’re probably trying to compute P(A|B). Having P(A|B) isolated on one side makes the formula practical to use.

However, I’d argue that an introduction to a topic should prioritize giving the reader an intuition for why the formula is true. If possible, the reader should come away thinking that it’s so obvious that it’s almost uninteresting. I think the following formulation does a better job of that:

P(A|B)P(B)=P(A&B)=P(B&A)=P(B|A)P(A)

Derivation

One way to compute the probability of two events A and B happening is as follows:

P(A&B)=P(A|B)P(B)

To make this concrete, let’s say A is “I go to the park tomorrow” and B is “it rains tomorrow”. The probability that it rains tomorrow and I go to the park is [the probability that it rains] times [the probability that I go to the park given it rains]. Stated that way, it seems kind of obvious.
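To put numbers on the park example, here is a minimal Python sketch. The joint probabilities are invented purely for illustration, and the names `joint`, `p_rain`, `p_park_given_rain`, and `p_park_and_rain` are my own. The check passes by construction, since P(A|B) is computed by restricting attention to the rainy outcomes, which is rather the point.

```python
from fractions import Fraction

# Hypothetical joint probabilities over (park, rain); invented for illustration.
joint = {
    (True, True): Fraction(1, 20),    # I go to the park and it rains
    (True, False): Fraction(9, 20),   # I go to the park and it stays dry
    (False, True): Fraction(5, 20),   # I stay home and it rains
    (False, False): Fraction(5, 20),  # I stay home and it stays dry
}

# P(B): probability that it rains, summed over both park outcomes.
p_rain = sum(p for (park, rain), p in joint.items() if rain)

# P(A|B): the share of the rainy probability mass in which I go to the park.
p_park_given_rain = joint[(True, True)] / p_rain

# P(A&B): read straight off the table.
p_park_and_rain = joint[(True, True)]

# Product rule: P(A&B) = P(A|B)P(B)
assert p_park_and_rain == p_park_given_rain * p_rain
print(p_rain, p_park_given_rain, p_park_and_rain)
```

Using `Fraction` instead of floats keeps the equality check exact.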

From there it’s a small leap to notice that P(A&B)=P(B&A), so we could also write the probability like so:

P(A&B)=P(B|A)P(A)

And the very last step is to combine the two equations:

P(A|B)P(B)=P(A&B)=P(B|A)P(A)

P(A|B)P(B)=P(B|A)P(A)

Divide each side by P(B), which requires P(B) > 0, and you get the standard formulation of Bayes’ theorem.
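As a quick numeric sanity check of that last step, here is a small Python sketch. The helper name `bayes` and the input probabilities are hypothetical, chosen only so the arithmetic comes out in clean fractions.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Return P(A|B) = P(B|A)P(A)/P(B). Requires P(B) > 0."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_b_given_a * p_a / p_b


# Hypothetical inputs: A is "I go to the park", B is "it rains".
p_park = Fraction(1, 2)              # P(A)
p_rain = Fraction(3, 10)             # P(B)
p_rain_given_park = Fraction(1, 10)  # P(B|A)

p_park_given_rain = bayes(p_rain_given_park, p_park, p_rain)

# Both ways of writing P(A&B) agree: P(A|B)P(B) = P(B|A)P(A)
assert p_park_given_rain * p_rain == p_rain_given_park * p_park
print(p_park_given_rain)  # 1/6
```

The assertion is just the two-sided equation from the derivation, checked on one set of numbers.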

I think this route to Bayes’ theorem makes the connection to P(A&B) clearer, and the fact that you can write P(A&B) in two ways is a nice intuition for why the theorem works.