Bayes Theorem: p(A|B) = p(A) * p(B|A) / p(B) — But why?

A student recently asked me if I could show them why Bayes Theorem looks the way it does. If we are trying to determine the probability A given B, how did we arrive at the theorem in the title?

Let’s create some data and see if we can sort it out. We will simulate a 2×2 table, similar to one we might find in sports medicine journals when looking at some type of test (positive or negative) and some type of outcome (disease or no disease).

dat_tbl <- data.frame(
  test = c("positive", "negative", "total"),
  disease = c(25, 5, 30),
  no_disease = c(3, 40, 43),
  total = c(28, 45, 73)
)

dat_tbl

A logical question we’d like to answer here is, “What is the probability of disease given a positive test.” Written in probability notation we are asking, p(Disease | Positive).

Using the data in the table, we can quickly compute this as 25 people who were positive and had the disease divided by 28 total positive tests. 25/28 = 89.3%

Of course, we could also compute this using Bayes Theorem:

p(Disease | Positive) = p(Disease) * p(Positive | Disease) / p(Positive)

We store the necessary values in R objects and then compute the result

### What we want to know: p(Disease | Positive)
# p(Disease | Positive) = p(Disease) * p(Positive | Disease) / p(Positive)
# p(A | B) = p(A) * p(B|A) / p(B)

p_disease <- 30/73
p_positive_given_disease <- 25/30
p_positive <- 28/73

p_no_disease <- 43/73
p_positive_given_no_disease <- 3/43

p_disease_given_positive <- (p_disease * p_positive_given_disease) / (p_disease * p_positive_given_disease + p_no_disease * p_positive_given_no_disease) 
p_disease_given_positive

This checks out. We get the exact same result as when we did 25/28.

Okay, how did we get here? Why does it work out like that?

The math works out because we start with two different joint probabilities, p(A n B) and p(B n A). Or, in our case, p(Disease n Positive) and p(Positive n Disease). Formally, we can write these as follows:

We’ve already stored the necessary probabilities in specific elements, above. But here it what they both look like using our 2×2. First I’ll calculate it with the counts directly from the table and then calculate it with the R elements that we stored. You’ll see the answers are the same.

## Joint Probability 1: p(Positive n Disease) = p(Positive | Disease) * p(Disease)

25/30 * 30/73

p_positive_given_disease * p_disease

## Joint Probability 2: p(Disease n Positive) = p(Disease | Positive) * p(Positive)

25/28 * 28/73

p_disease_given_positive * p_positive

Well, would you look at that! The two joint probabilities are equal to each other. We can formally test that they are equal to each other by setting up a logical equation in R.

## These two joint probabilities are equal!
#  p(Positive n Disease) = p(Disease n Positive)

(p_positive_given_disease * p_disease) == (p_disease_given_positive * p_positive)

So, if they are equal, what we are saying is this:

p(Disease | Positive) * p(Positive) = p(Positive | Disease) * p(Disease)

Now, with some algebra, we can divide the right side of the equation by p(Positive) and we are left with Bayes Theorem for our problem:

p(Disease | Positive) = p(Disease) * p(Positive | Disease) / p(Positive)

Putting it altogether, it looks like this:

So, all we’ve done is taken two joint probabilities and used some algebra to arrange the terms in order to get us to the conditional probability we were interested in, p(Disease | Positive) and we end up with Bayes Theorem.