Classification problems have a reality that's often overlooked: the cost of being wrong isn't always equal.
Letting a spam email into your inbox? Annoying, but manageable. Sending an important work email to spam? Could cost you a deal.
Misdiagnosing a healthy person as sick? They get extra tests, some worry. Misdiagnosing a sick person as healthy? They might miss their treatment window, with serious consequences.
Traditional classifiers care only about "right vs. wrong," but the real world demands more: decisions that minimize risk by weighing the cost of each type of error.
This is exactly what Bayesian Decision Theory addresses.
The Setup: Medical Diagnosis
Imagine you're a doctor deciding whether a patient is infected with a disease. You have two classes:
- Class 1 (Healthy): Patient is not infected
- Class 2 (Sick): Patient is infected
Your task: make a diagnosis based on the patient's symptoms.
Historical Patient Data
Here are 10 confirmed cases from your hospital's records:
| ID | Temp (°C) | Cough | Headache | Fatigue | Diagnosis |
|---|---|---|---|---|---|
| 1 | 37.2 | No | Mild | No | Healthy |
| 2 | 38.5 | Yes | Severe | Yes | Sick |
| 3 | 36.8 | No | None | No | Healthy |
| 4 | 39.1 | Yes | Moderate | Yes | Sick |
| 5 | 37.0 | No | Mild | No | Healthy |
| 6 | 38.8 | Yes | Severe | Yes | Sick |
| 7 | 37.5 | Yes | Moderate | No | Healthy |
| 8 | 39.5 | Yes | Severe | Yes | Sick |
| 9 | 36.9 | No | None | No | Healthy |
| 10 | 38.2 | Yes | Moderate | Yes | Sick |
Summary: 5 healthy (50%), 5 sick (50%)
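The records above are small enough to carry around directly in code. A minimal sketch (the tuple layout and variable names are my own) that recovers the 50/50 class priors by counting:

```python
# The 10 historical records as Python data, with class priors computed by counting.
records = [
    # (temp °C, cough, headache, fatigue, diagnosis)
    (37.2, "No",  "Mild",     "No",  "Healthy"),
    (38.5, "Yes", "Severe",   "Yes", "Sick"),
    (36.8, "No",  "None",     "No",  "Healthy"),
    (39.1, "Yes", "Moderate", "Yes", "Sick"),
    (37.0, "No",  "Mild",     "No",  "Healthy"),
    (38.8, "Yes", "Severe",   "Yes", "Sick"),
    (37.5, "Yes", "Moderate", "No",  "Healthy"),
    (39.5, "Yes", "Severe",   "Yes", "Sick"),
    (36.9, "No",  "None",     "No",  "Healthy"),
    (38.2, "Yes", "Moderate", "Yes", "Sick"),
]
prior_sick = sum(r[-1] == "Sick" for r in records) / len(records)
prior_healthy = 1 - prior_sick
print(prior_healthy, prior_sick)  # 0.5 0.5
```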
A New Patient: How Do We Diagnose?
A new patient arrives with these symptoms:
| Feature | Value |
|---|---|
| Temperature | 38.3°C |
| Persistent Cough | Yes |
| Headache | Moderate |
| Fatigue | No |
Some symptoms suggest sickness (fever, cough, headache), but others suggest health (no fatigue). Based on the historical data and these symptoms, you estimate the posterior probabilities:
- P(Healthy | symptoms) = 0.3 (30% chance healthy)
- P(Sick | symptoms) = 0.7 (70% chance sick)
If we only looked at probability, the answer is simple: 70% is greater than 30%, diagnose as sick.
But wait—we still need to consider the cost of being wrong.
Misclassification Loss: Pricing Your Errors
We use λᵢⱼ to denote the cost of classifying a sample whose true class is j as class i:
| Decision → Reality ↓ | Diagnose "Healthy" | Diagnose "Sick" |
|---|---|---|
| Actually Healthy | λ₁₁ = 0 (correct) | λ₂₁ = 10 (false alarm) |
| Actually Sick | λ₁₂ = 100 (missed diagnosis) | λ₂₂ = 0 (correct) |
Notice the asymmetry:
- Missed diagnosis (λ₁₂ = 100): Diagnosing a sick patient as healthy could delay treatment, with serious consequences
- False alarm (λ₂₁ = 10): Diagnosing a healthy patient as sick just means extra tests—relatively minor
This 10:1 ratio reflects medical reality—better to over-test than to miss a diagnosis.
Conditional Risk: Do the Math Before You Decide
The core formula: if you decide to classify a patient as class i, the expected risk is

R(cᵢ | x) = Σⱼ λᵢⱼ · P(cⱼ | x)

In plain English:
Risk of diagnosing as i = sum, over every possible true class j, of (misclassification cost × posterior probability)
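That sum translates directly into a few lines of code. A minimal sketch (function and variable names are my own; class 0 = Healthy, class 1 = Sick, with the losses from the table above):

```python
def conditional_risk(decision, posteriors, loss):
    """Expected loss of choosing class `decision`.

    posteriors[j] = P(class j | x); loss[i][j] = cost of deciding
    class i when the true class is j.
    """
    return sum(loss[decision][j] * p for j, p in enumerate(posteriors))

# Classes: 0 = Healthy, 1 = Sick.  Loss values from the table above.
LOSS = [[0, 100],   # decide Healthy: 0 if correct, 100 for a missed diagnosis
        [10, 0]]    # decide Sick: 10 for a false alarm, 0 if correct

posteriors = [0.3, 0.7]  # the patient in this example
print(conditional_risk(0, posteriors, LOSS))  # 70.0
print(conditional_risk(1, posteriors, LOSS))  # 3.0
```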
Complete Calculation: Risk of Each Decision
Back to our patient (30% healthy, 70% sick).
Decision 1: Diagnose as "Healthy"
Expected risk: R(Healthy | x) = λ₁₁ × 0.3 + λ₁₂ × 0.7 = 0 × 0.3 + 100 × 0.7 = 70
Interpretation: 30% chance we're correct (no cost), but 70% chance we miss a diagnosis (cost 100). Average expected loss: 70.
Decision 2: Diagnose as "Sick"
Expected risk: R(Sick | x) = λ₂₁ × 0.3 + λ₂₂ × 0.7 = 10 × 0.3 + 0 × 0.7 = 3
Interpretation: 30% chance of a false alarm (cost 10), 70% chance we're correct (no cost). Average expected loss: only 3.
The Optimal Decision
| Decision | Conditional Risk |
|---|---|
| Diagnose Healthy | 70 |
| Diagnose Sick ✓ | 3 |
Optimal decision: Diagnose as sick, order further testing.
A Counter-Intuitive Case
What if the probabilities were different? Consider another patient:
- P(Healthy | symptoms) = 0.8 (80% healthy)
- P(Sick | symptoms) = 0.2 (20% sick)
By probability alone, we should diagnose as healthy. But let's check the risks:
Diagnose "Healthy"
R(Healthy | x) = 0 × 0.8 + 100 × 0.2 = 20
Diagnose "Sick"
R(Sick | x) = 10 × 0.8 + 0 × 0.2 = 8
Result: Even with 80% probability of being healthy, the optimal decision is still to diagnose as sick (risk 8 < 20)!
This is the essence of Bayesian decision theory: we don't pick the most likely answer—we pick the answer with the lowest expected risk.
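That rule — take the minimum-risk class, not the most probable one — fits in a few self-contained lines. A sketch (names are my own; class 0 = Healthy, class 1 = Sick):

```python
LOSS = [[0, 100],  # decide Healthy: correct / missed diagnosis (truth Sick)
        [10, 0]]   # decide Sick: false alarm (truth Healthy) / correct
CLASSES = ["Healthy", "Sick"]

def bayes_decide(posteriors):
    """Return the minimum-risk class and the risk of every decision."""
    risks = [sum(LOSS[i][j] * p for j, p in enumerate(posteriors))
             for i in range(len(CLASSES))]
    best = min(range(len(CLASSES)), key=risks.__getitem__)
    return CLASSES[best], risks

decision, risks = bayes_decide([0.8, 0.2])
print(decision, risks)  # Sick [20.0, 8.0]
```

Even at 80% probability of health, the risk of declaring "Healthy" (20) exceeds the risk of declaring "Sick" (8), so the decision flips away from the most probable class.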
Where's the Threshold?
At what probability does "diagnose healthy" become optimal? Let P(Healthy) = p:
- Risk of "healthy": R₁ = 100(1-p)
- Risk of "sick": R₂ = 10p
"Healthy" is better when R₁ < R₂:
100 - 100p < 10p
100 < 110p
p > 100/110 ≈ 0.909
Only when the probability of being healthy exceeds 90.9% should we diagnose as healthy. This threshold is far above 50% precisely because of the asymmetric costs (100:10).
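With zero cost for correct decisions, solving 100(1−p) = 10p generalizes to a break-even point p* = λ₁₂ / (λ₁₂ + λ₂₁). A quick sketch to check it (the helper name is my own):

```python
def healthy_threshold(miss_cost, false_alarm_cost):
    """P(Healthy) above which 'diagnose Healthy' has the lower expected risk.

    Solving miss_cost * (1 - p) = false_alarm_cost * p for p gives
    p* = miss_cost / (miss_cost + false_alarm_cost).
    """
    return miss_cost / (miss_cost + false_alarm_cost)

print(healthy_threshold(100, 10))  # 0.9090909090909091
print(healthy_threshold(1, 1))     # 0.5  (equal costs recover the 50% rule)
```

Note how the threshold depends only on the ratio of the two error costs, not their absolute scale.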
Special Case: 0-1 Loss
If all errors cost the same (wrong = 1, correct = 0), the cost matrix becomes symmetric and the conditional risk simplifies to R(cᵢ | x) = 1 − P(cᵢ | x).
The optimal decision is simply to pick the class with highest posterior probability—this is the familiar "maximum a posteriori" (MAP) classification.
So the traditional "pick the most probable class" approach is actually a special case of Bayesian decision theory, valid only when all errors are equally costly.
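A quick sanity check of that claim: under a 0-1 loss matrix, minimizing conditional risk picks the same class as maximizing the posterior. A sketch (names are my own):

```python
def min_risk_class(posteriors, loss):
    """Index of the class with minimum conditional risk."""
    risks = [sum(loss[i][j] * p for j, p in enumerate(posteriors))
             for i in range(len(posteriors))]
    return min(range(len(risks)), key=risks.__getitem__)

ZERO_ONE = [[0, 1],  # every error costs 1, every correct decision costs 0
            [1, 0]]

for post in ([0.3, 0.7], [0.8, 0.2], [0.55, 0.45]):
    map_class = max(range(2), key=post.__getitem__)  # maximum a posteriori
    assert min_risk_class(post, ZERO_ONE) == map_class
print("minimum risk under 0-1 loss agrees with MAP")
```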
Key Takeaways
- Misclassification loss λᵢⱼ: The cost of classifying true class j as class i—different errors have different costs
- Posterior probability P(cⱼ|x): The probability of each class given the observed data
- Conditional risk R(cᵢ|x): Expected loss of classifying as i = Σ (cost × probability)
- Optimal decision: Pick the class with minimum conditional risk, not necessarily maximum probability
- Asymmetric costs shift thresholds: When missed diagnoses are costly, the threshold for "healthy" becomes very high