Lesson 3-1: Bayes' Theorem & Conditional Probability

We build intuition first, then derive formulas, then practice with real data. All formulas are rendered with KaTeX and kept on a single line for readability.

Learning Objectives

  • Interpret events, sample spaces, and conditional probability.
  • Apply the Law of Total Probability to compute evidence.
  • Use Bayes' theorem to convert prior beliefs into posterior probabilities.
  • Model simple text classification using a Naive Bayes assumption.

Core Formulas

Conditional Probability

P(A\mid B) = \dfrac{P(A\cap B)}{P(B)}

Law of Total Probability

P(B) = \sum_{i} P(B\mid A_i)\,P(A_i)

Bayes' Theorem

P(A_i\mid B) = \dfrac{P(B\mid A_i)\,P(A_i)}{\sum_j P(B\mid A_j)\,P(A_j)}
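The three formulas compose into one computation: weight each likelihood by its prior, sum to get the evidence, then normalize. A minimal Python sketch (the function name `posteriors` is ours, not from the lesson):

```python
def posteriors(priors, likelihoods):
    """Bayes' theorem over a partition: returns P(A_i | B) for each hypothesis.

    priors      -- P(A_i), must sum to 1
    likelihoods -- P(B | A_i), same length as priors
    """
    # Law of Total Probability: P(B) = sum_i P(B | A_i) P(A_i)
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    # Bayes' theorem: P(A_i | B) = P(B | A_i) P(A_i) / P(B)
    return [l * p / evidence for l, p in zip(likelihoods, priors)]

# Two hypotheses A and not-A, e.g. P(A)=0.3, P(B|A)=0.8, P(B|~A)=0.4
print(posteriors([0.3, 0.7], [0.8, 0.4]))  # first entry is P(A | B)
```

Note that the posteriors always sum to 1: the evidence in the denominator is exactly the normalizing constant.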

Worked Example 1 — Medical Test

Suppose the disease prevalence is P(D)=0.01, the test sensitivity is P(+\mid D)=0.95, and the specificity is P(-\mid \neg D)=0.95. If a patient tests positive, what is P(D\mid +)?

Step 1: Evidence

P(+) = P(+\mid D)P(D) + P(+\mid \neg D)P(\neg D) = 0.95\cdot 0.01 + 0.05\cdot 0.99 = 0.059

Step 2: Posterior

P(D\mid +) = \dfrac{P(+\mid D)P(D)}{P(+)} = \dfrac{0.95\cdot 0.01}{0.059} \approx 0.161
Interpretation: despite a positive result, the probability of actually having the disease is only about 16.1% due to the low prior.
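The two steps above can be checked numerically; a short sketch (variable names are ours):

```python
# Given quantities from the example
p_d = 0.01   # prevalence P(D)
sens = 0.95  # sensitivity P(+ | D)
spec = 0.95  # specificity P(- | ~D), so the false-positive rate is 1 - spec

# Step 1: evidence via the Law of Total Probability
p_pos = sens * p_d + (1 - spec) * (1 - p_d)   # 0.059

# Step 2: posterior via Bayes' theorem
p_d_given_pos = sens * p_d / p_pos            # ~0.161
print(f"P(+) = {p_pos:.3f}, P(D | +) = {p_d_given_pos:.3f}")
```

Raising the prevalence to 0.1 in this snippet pushes the posterior above 0.6, which makes the role of the prior concrete.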

Worked Example 2 — Email Classification (Naive Bayes)

Classes: Spam (S), Work (W), Personal (P). Priors: P(S)=0.4,\ P(W)=0.35,\ P(P)=0.25. Let event U denote that the token "urgent" appears. Likelihoods: P(U\mid S)=0.1,\ P(U\mid W)=0.8,\ P(U\mid P)=0.05.

Step 1: Evidence

P(U) = 0.1\cdot 0.4 + 0.8\cdot 0.35 + 0.05\cdot 0.25 = 0.3325

Step 2: Posterior

P(S\mid U) = \dfrac{0.1\cdot 0.4}{0.3325} \approx 0.120
P(W\mid U) = \dfrac{0.8\cdot 0.35}{0.3325} \approx 0.842
P(P\mid U) = \dfrac{0.05\cdot 0.25}{0.3325} \approx 0.038
Result: with the token “urgent”, the email is most likely Work (≈84.2%).
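The same evidence-then-posterior pattern extends to any number of classes. A sketch of this example in Python (variable names are illustrative):

```python
# Priors and likelihoods for the token "urgent" (event U), from the example
classes = ["Spam", "Work", "Personal"]
priors = [0.4, 0.35, 0.25]
likelihoods = [0.1, 0.8, 0.05]   # P(U | class)

# Evidence P(U), then posteriors P(class | U)
p_u = sum(l * p for l, p in zip(likelihoods, priors))          # 0.3325
posteriors = [l * p / p_u for l, p in zip(likelihoods, priors)]

best = max(zip(classes, posteriors), key=lambda cp: cp[1])
print(f"P(U) = {p_u:.4f}; most likely: {best[0]} ({best[1]:.3f})")
```

A full Naive Bayes classifier would multiply likelihoods over many tokens under the conditional-independence assumption; this snippet shows the single-token case from the lesson.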

Practice Problems

Problem 1

Given P(A)=0.3, P(B\mid A)=0.8, and P(B\mid \neg A)=0.4, find P(A\mid B).

P(B) = 0.8\cdot 0.3 + 0.4\cdot 0.7 = 0.52,\quad P(A\mid B) = \dfrac{0.8\cdot 0.3}{0.52} \approx 0.462

Problem 2

Partition the sample space into three disjoint events A_1, A_2, A_3 with priors (0.5, 0.3, 0.2) and likelihoods for evidence B of (0.2, 0.5, 0.7). Find the posteriors.

P(B) = 0.39,\quad P(A_1\mid B) \approx 0.256,\quad P(A_2\mid B) \approx 0.385,\quad P(A_3\mid B) \approx 0.359
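Both practice answers follow the same two-step recipe and can be verified with a quick check (the helper `bayes` is ours, not part of the lesson):

```python
def bayes(priors, likelihoods):
    """Return (evidence P(B), list of posteriors P(A_i | B))."""
    p_b = sum(l * p for l, p in zip(likelihoods, priors))
    return p_b, [l * p / p_b for l, p in zip(likelihoods, priors)]

# Problem 1: hypotheses A and not-A
p_b, post = bayes([0.3, 0.7], [0.8, 0.4])
print(round(p_b, 2), round(post[0], 3))            # 0.52 0.462

# Problem 2: partition A1, A2, A3
p_b, post = bayes([0.5, 0.3, 0.2], [0.2, 0.5, 0.7])
print(round(p_b, 2), [round(x, 3) for x in post])  # 0.39 [0.256, 0.385, 0.359]
```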

Key Takeaways

  • Posterior = Likelihood × Prior ÷ Evidence. The evidence is a weighted average over competing hypotheses.
  • Always check priors: rare events with imperfect tests often yield counterintuitive posteriors.
  • Keep formulas on a single line for readability and use KaTeX consistently.
Continue to Lesson 3-2 for hypothesis testing with two samples.

Common Misconceptions

  • Confusing P(A\mid B) with P(B\mid A).
  • Ignoring prior probabilities, leading to counterintuitive conclusions.
  • Forgetting to use the Law of Total Probability to calculate the evidence P(B).

Real-World Applications

  • Medical screening: interpreting positive test results with low disease prevalence.
  • Spam classification: using Naive Bayes with keyword likelihoods.
  • Risk assessment: updating beliefs with new evidence and historical priors.