
Naive Bayes Explained with a 20-Patient Flu Diagnosis Example

A step-by-step walkthrough of priors, likelihoods, and posterior probabilities

Naive Bayes
Bayesian
Classification
Probability
Machine Learning Basics

Naive Bayes has a reputation for being both surprisingly simple and surprisingly useful.

If you have a small symptom table, a handful of categories, and you need a transparent classifier instead of a black box, Naive Bayes is often the first model worth trying.

The name sounds intimidating, but the core idea is ordinary probability: start with how common each class is, then keep updating that belief with evidence.

A 20-Patient Flu Dataset

Suppose a clinic recorded 20 historical cases with four categorical features:

  • Temperature: normal, fever, high fever
  • Cough: none, mild, severe
  • Headache: yes or no
  • Fatigue: yes or no

The final diagnosis was:

  • Flu: 13 patients
  • Not Flu: 7 patients
Feature       Value        Count in Flu   Count in Not Flu
Temperature   normal       0              7
Temperature   fever        8              0
Temperature   high fever   5              0
Cough         none         0              4
Cough         mild         4              3
Cough         severe       9              0
Headache      yes          13             1
Headache      no           0              6
Fatigue       yes          11             1
Fatigue       no           2              6

The Bayes Rule Behind the Model

Naive Bayes estimates the posterior probability of each class:

P(y | x) = P(x | y) · P(y) / P(x)

The "Naive" Assumption

Instead of estimating one giant joint probability for all symptoms, the model assumes features are independent and multiplies individual conditional probabilities:

P(x | y) ≈ P(x₁ | y) · P(x₂ | y) · P(x₃ | y) · P(x₄ | y)

Step 1: Priors

Before seeing any symptoms, the prior class probabilities are:

  • P(Flu) = 13 / 20 = 0.65
  • P(Not Flu) = 7 / 20 = 0.35
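The priors are just class counts divided by the dataset size. A minimal Python sketch (the variable names are mine, not from the article):

```python
# Prior class probabilities from the 20-patient dataset
class_counts = {"Flu": 13, "Not Flu": 7}
total = sum(class_counts.values())  # 20 patients
priors = {c: n / total for c, n in class_counts.items()}
print(priors)  # {'Flu': 0.65, 'Not Flu': 0.35}
```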

Step 2: A New Patient

Now a new patient arrives with:

  • Temperature: fever
  • Cough: severe
  • Headache: yes
  • Fatigue: yes

Step 3: Laplace-Smoothed Likelihoods

Why Laplace Smoothing?

If we use raw counts, the "not flu" class gets zero probability because no non-flu patient in the training data had fever or severe cough. That would make the entire product zero — eliminating the class completely. Laplace smoothing adds 1 to each count to fix this.
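To see the problem concretely, here is a sketch of the unsmoothed computation for the "Not Flu" class using the raw counts from the table above; the two zero counts wipe out the whole product:

```python
from math import prod

# Raw (unsmoothed) Not-Flu likelihoods for this patient's four symptoms:
# fever (0/7), severe cough (0/7), headache=yes (1/7), fatigue=yes (1/7)
raw_likelihoods = [0 / 7, 0 / 7, 1 / 7, 1 / 7]
score_not_flu = (7 / 20) * prod(raw_likelihoods)
print(score_not_flu)  # 0.0 — the class is eliminated regardless of other evidence
```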

Temperature and cough each have 3 categories. Headache and fatigue each have 2 categories.

Likelihood       P(· | Flu)              P(· | Not Flu)
Fever            (8+1)/(13+3) = 9/16     (0+1)/(7+3) = 1/10
Severe cough     (9+1)/(13+3) = 10/16    (0+1)/(7+3) = 1/10
Headache = yes   (13+1)/(13+2) = 14/15   (1+1)/(7+2) = 2/9
Fatigue = yes    (11+1)/(13+2) = 12/15   (1+1)/(7+2) = 2/9

Step 4: Unnormalized Posterior Scores

Score(Flu) = 13/20 × 9/16 × 10/16 × 14/15 × 12/15 ≈ 0.1706

Score(Not Flu) = 7/20 × 1/10 × 1/10 × 2/9 × 2/9 ≈ 0.00017

The flu score is orders of magnitude larger, so the predicted class is Flu.
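Putting the priors and smoothed likelihoods together, here is a sketch of the final scoring step, including the normalization the article skips (dividing by the sum of the scores gives the actual posterior probabilities):

```python
from math import prod

priors = {"Flu": 13 / 20, "Not Flu": 7 / 20}
likelihoods = {
    "Flu":     [9/16, 10/16, 14/15, 12/15],  # fever, severe cough, headache, fatigue
    "Not Flu": [1/10, 1/10, 2/9, 2/9],
}

# Unnormalized posterior scores: prior times product of likelihoods
scores = {c: priors[c] * prod(likelihoods[c]) for c in priors}
print(scores)  # Flu ≈ 0.1706, Not Flu ≈ 0.00017

# Normalize to get posterior probabilities
total = sum(scores.values())
posteriors = {c: s / total for c, s in scores.items()}
print(posteriors["Flu"])  # ≈ 0.999

print(max(scores, key=scores.get))  # Flu
```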

What This Example Teaches

  • Priors matter: common classes start with an advantage.
  • Likelihoods matter more: when certain symptoms are heavily concentrated in one class, the evidence can overwhelm the prior.
  • Laplace smoothing matters: whenever rare combinations would otherwise create zero probabilities.
  • Interpretability is the big win: every prediction can be traced back to simple counts.

That's why Naive Bayes still shows up in text classification, email filtering, triage systems, and small tabular problems. It's fast, explainable, and often much more competitive than its "naive" label suggests.

Want to Explore More ML Fundamentals?

Dive into Bayesian Decision Theory, generative models, and beyond in our comprehensive Machine Learning course.
