Lesson 3-1: Bayes' Theorem & Conditional Probability

We build intuition first, then derive formulas, then practice with real data. All formulas are rendered with KaTeX and kept on a single line for readability.

Learning Objectives

  • Interpret events, sample spaces, and conditional probability.
  • Apply the Law of Total Probability to compute evidence.
  • Use Bayes' theorem to convert prior beliefs into posterior probabilities.
  • Model simple text classification using a Naive Bayes assumption.

Core Formulas

Conditional Probability

P(A\mid B) = \dfrac{P(A\cap B)}{P(B)}

Law of Total Probability

P(B) = \sum_{i} P(B\mid A_i)\,P(A_i)

Bayes' Theorem

P(A_i\mid B) = \dfrac{P(B\mid A_i)\,P(A_i)}{\sum_j P(B\mid A_j)\,P(A_j)}
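The three formulas compose into one computation: weight each likelihood by its prior, sum to get the evidence, then normalize. A minimal Python sketch (the function name `posteriors` is ours, not from the lesson):

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem over a partition: returns P(A_i | B) for each hypothesis.

    priors      -- P(A_i), must sum to 1
    likelihoods -- P(B | A_i), same length as priors
    """
    # Law of Total Probability: P(B) = sum_i P(B | A_i) P(A_i)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    # Bayes' theorem: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Two hypotheses A and not-A, e.g. P(A)=0.3, P(B|A)=0.8, P(B|~A)=0.4
print(posteriors([0.3, 0.7], [0.8, 0.4]))  # first entry is P(A | B)
```

Note that the posteriors always sum to 1: the evidence in the denominator is exactly the normalizing constant.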

Worked Example 1 — Medical Test

Suppose the disease prevalence is P(D)=0.01, the test sensitivity is P(+\mid D)=0.95, and the specificity is P(-\mid \neg D)=0.95. If a patient tests positive, what is P(D\mid +)?

Step 1: Evidence

P(+) = P(+\mid D)P(D) + P(+\mid \neg D)P(\neg D) = 0.95\cdot 0.01 + 0.05\cdot 0.99 = 0.059

Step 2: Posterior

P(D\mid +) = \dfrac{P(+\mid D)P(D)}{P(+)} = \dfrac{0.95\cdot 0.01}{0.059} \approx 0.161
Interpretation: despite a positive result, the probability of actually having the disease is only about 16.1% due to the low prior.
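The two steps above can be checked numerically; a short sketch (variable names are ours):

```python
# Given quantities from the example
p_d = 0.01   # prevalence P(D)
sens = 0.95  # sensitivity P(+ | D)
spec = 0.95  # specificity P(- | ~D), so the false-positive rate is 1 - spec

# Step 1: evidence via the Law of Total Probability
p_pos = sens * p_d + (1 - spec) * (1 - p_d)   # 0.059

# Step 2: posterior via Bayes' theorem
p_d_given_pos = sens * p_d / p_pos            # ~0.161
print(f"P(+) = {p_pos:.3f}, P(D | +) = {p_d_given_pos:.3f}")
```

Raising the prevalence to 0.1 in this snippet pushes the posterior above 0.6, which makes the role of the prior concrete.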

Worked Example 2 — Email Classification (Naive Bayes)

Classes: Spam (S), Work (W), Personal (P). Priors: P(S)=0.4,\ P(W)=0.35,\ P(P)=0.25. Let event U denote that the token "urgent" appears. Likelihoods: P(U\mid S)=0.1,\ P(U\mid W)=0.8,\ P(U\mid P)=0.05.

Step 1: Evidence

P(U) = 0.1\cdot 0.4 + 0.8\cdot 0.35 + 0.05\cdot 0.25 = 0.3325

Step 2: Posterior

P(S\mid U) = \dfrac{0.1\cdot 0.4}{0.3325} \approx 0.120
P(W\mid U) = \dfrac{0.8\cdot 0.35}{0.3325} \approx 0.842
P(P\mid U) = \dfrac{0.05\cdot 0.25}{0.3325} \approx 0.038
Result: with the token “urgent”, the email is most likely Work (≈84.2%).
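The same evidence-then-posterior pattern extends to any number of classes. A sketch of this example in Python (variable names are illustrative):

```python
# Priors and likelihoods for the token "urgent" (event U), from the example
classes = ["Spam", "Work", "Personal"]
priors = [0.4, 0.35, 0.25]
likelihoods = [0.1, 0.8, 0.05]   # P(U | class)

# Evidence P(U), then posteriors P(class | U)
p_u = sum(l * p for l, p in zip(likelihoods, priors))          # 0.3325
posteriors = [l * p / p_u for l, p in zip(likelihoods, priors)]

best = max(zip(classes, posteriors), key=lambda cp: cp[1])
print(f"P(U) = {p_u:.4f}; most likely: {best[0]} ({best[1]:.3f})")
```

A full Naive Bayes classifier would multiply likelihoods over many tokens under the conditional-independence assumption; this snippet shows the single-token case from the lesson.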

Practice Problems

Problem 1

Given P(A)=0.3, P(B\mid A)=0.8, and P(B\mid \neg A)=0.4, find P(A\mid B).

P(B) = 0.8\cdot 0.3 + 0.4\cdot 0.7 = 0.52,\quad P(A\mid B) = \dfrac{0.8\cdot 0.3}{0.52} \approx 0.462

Problem 2

Partition the sample space into three disjoint events A_1, A_2, A_3 with priors (0.5, 0.3, 0.2) and likelihoods for evidence B of (0.2, 0.5, 0.7). Find the posteriors.

P(B) = 0.39,\quad P(A_1\mid B) \approx 0.256,\quad P(A_2\mid B) \approx 0.385,\quad P(A_3\mid B) \approx 0.359
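Both practice answers follow the same two-step recipe and can be verified with a quick check (the helper `bayes` is ours, not part of the lesson):

```python
def bayes(priors, likelihoods):
    """Return (evidence P(B), list of posteriors P(A_i | B))."""
    p_b = sum(l * p for l, p in zip(likelihoods, priors))
    return p_b, [l * p / p_b for l, p in zip(likelihoods, priors)]

# Problem 1: hypotheses A and not-A
p_b, post = bayes([0.3, 0.7], [0.8, 0.4])
print(round(p_b, 2), round(post[0], 3))            # 0.52 0.462

# Problem 2: partition A1, A2, A3
p_b, post = bayes([0.5, 0.3, 0.2], [0.2, 0.5, 0.7])
print(round(p_b, 2), [round(x, 3) for x in post])  # 0.39 [0.256, 0.385, 0.359]
```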

Key Takeaways

  • Posterior = Likelihood × Prior ÷ Evidence. The evidence is a weighted average over competing hypotheses.
  • Always check priors: rare events with imperfect tests often yield counterintuitive posteriors.
  • Keep formulas on a single line for readability and use KaTeX consistently.
Continue to Lesson 3-2 for hypothesis testing with two samples.

Common Misconceptions

  • Confusing P(A\mid B) with P(B\mid A).
  • Ignoring prior probabilities, leading to counterintuitive conclusions.
  • Forgetting to use the Law of Total Probability to calculate the evidence P(B).

Real-World Applications

  • Medical screening: interpreting positive test results with low disease prevalence.
  • Spam classification: using Naive Bayes with keyword likelihoods.
  • Risk assessment: updating beliefs with new evidence and historical priors.