Simple yet powerful probabilistic classifier with attribute independence assumption
The Naive Bayes Classifier is one of the simplest yet most effective probabilistic classifiers. Its "naive" name comes from a strong assumption: that all attributes are conditionally independent given the class label.
In Bayes' theorem, estimating the class-conditional probability $P(\boldsymbol{x} \mid c)$ is the main obstacle. When a sample contains multiple attributes, this joint probability over all attributes is difficult to estimate from limited training samples, leading to combinatorial explosion and data sparsity problems.
Assume that, given the class label $c$, all attributes are conditionally independent. This allows us to decompose the joint probability into a product of individual attribute probabilities.
Starting with Bayes' theorem:

$$P(c \mid \boldsymbol{x}) = \frac{P(c)\,P(\boldsymbol{x} \mid c)}{P(\boldsymbol{x})}$$

Under the attribute independence assumption, the class-conditional probability can be decomposed:

$$P(\boldsymbol{x} \mid c) = \prod_{i=1}^{d} P(x_i \mid c)$$

where $d$ is the number of attributes and $x_i$ is the value of $\boldsymbol{x}$ on the $i$-th attribute.

Substituting into Bayes' theorem and ignoring the evidence $P(\boldsymbol{x})$ (the same for all classes):

$$P(c \mid \boldsymbol{x}) \propto P(c) \prod_{i=1}^{d} P(x_i \mid c)$$

The Naive Bayes classifier decision rule is:

$$h_{nb}(\boldsymbol{x}) = \underset{c \in \mathcal{Y}}{\arg\max}\; P(c) \prod_{i=1}^{d} P(x_i \mid c)$$
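As a minimal sketch of this decision rule (assuming the probability tables have already been estimated; the names `nb_predict`, `priors`, and `cond_prob` are illustrative, not a fixed API):

```python
import math

def nb_predict(x, priors, cond_prob):
    """Naive Bayes decision rule: argmax_c P(c) * prod_i P(x_i | c).

    x:         list of attribute values of the test sample
    priors:    dict mapping class -> P(c)
    cond_prob: dict mapping (class, attribute index, value) -> P(x_i | c)
    """
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        score = prior
        for i, xi in enumerate(x):
            score *= cond_prob[(c, i, xi)]   # multiply the per-attribute probabilities
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```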
Estimate the prior probability using class frequencies:

$$P(c) = \frac{|D_c|}{|D|}$$

where $|D_c|$ is the number of samples of class $c$ in training set $D$, and $|D|$ is the total number of samples.

For a discrete attribute, let $D_{c,x_i}$ denote the set of samples in $D_c$ where the $i$-th attribute takes value $x_i$; the class-conditional probability is then estimated as:

$$P(x_i \mid c) = \frac{|D_{c,x_i}|}{|D_c|}$$
For continuous attributes, typically assume a Gaussian (normal) distribution. If $p(x_i \mid c) \sim \mathcal{N}(\mu_{c,i}, \sigma_{c,i}^2)$, the probability density function is:

$$p(x_i \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c,i}} \exp\!\left(-\frac{(x_i - \mu_{c,i})^2}{2\sigma_{c,i}^2}\right)$$

where $\mu_{c,i}$ and $\sigma_{c,i}^2$ are the mean and variance of the $i$-th attribute for class $c$, estimated by maximum likelihood (MLE) from the training samples.
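A rough sketch of this estimation step follows; `nb_train`, the dict-based sample layout, and the helper `gaussian_pdf` are illustrative choices under the assumptions above, not a prescribed interface:

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pvariance

def gaussian_pdf(x, mu, var):
    """p(x_i | c) under the Gaussian assumption, with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def nb_train(samples, labels, discrete_attrs, continuous_attrs):
    """Estimate P(c), P(x_i | c) for discrete attributes, and (mu, var) for continuous ones.

    samples: list of dicts {attribute name -> value}, aligned with labels
    """
    n = len(labels)
    class_counts = Counter(labels)
    priors = {c: cnt / n for c, cnt in class_counts.items()}   # P(c) = |D_c| / |D|

    # Discrete attributes: P(x_i | c) = |D_{c,x_i}| / |D_c|
    counts = defaultdict(int)
    for s, c in zip(samples, labels):
        for a in discrete_attrs:
            counts[(c, a, s[a])] += 1
    cond_prob = {k: v / class_counts[k[0]] for k, v in counts.items()}

    # Continuous attributes: MLE mean and (population) variance per class
    gauss = {}
    for c in class_counts:
        for a in continuous_attrs:
            vals = [s[a] for s, y in zip(samples, labels) if y == c]
            gauss[(c, a)] = (mean(vals), pvariance(vals))
    return priors, cond_prob, gauss
```

At prediction time, discrete attribute values are looked up in `cond_prob`, while continuous ones are evaluated through `gaussian_pdf` with the stored parameters.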
If a certain attribute value never appears together with a class in the training set, the estimated probability $P(x_i \mid c) = 0$. Since Naive Bayes multiplies the per-attribute probabilities, this drives the entire posterior probability to zero, "erasing" the information carried by all other attributes.
For example, if the training data never shows "敲声=清脆" (sound=crisp) together with "好瓜=是" (good melon=yes), then $P(\text{清脆} \mid \text{是}) = 0$. When a test sample with "敲声=清脆" is encountered, it will be classified as "not a good melon" regardless of how good its other attributes look.
Laplacian correction (also called smoothing) adds a small constant to the numerator and denominator to avoid zero probabilities:
Corrected prior probability:

$$\hat{P}(c) = \frac{|D_c| + 1}{|D| + N}$$

where $N$ is the number of distinct classes in training set $D$.

Corrected class-conditional probability:

$$\hat{P}(x_i \mid c) = \frac{|D_{c,x_i}| + 1}{|D_c| + N_i}$$

where $N_i$ is the number of possible values of the $i$-th attribute.
Laplacian correction assumes a uniform distribution over attribute values and classes, introducing some bias. However, it effectively solves the zero probability problem and improves model robustness, especially with small datasets.
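A minimal sketch of the corrected estimators (the function names are illustrative):

```python
def smoothed_prior(count_c, n_total, n_classes):
    """Laplacian-corrected prior: (|D_c| + 1) / (|D| + N)."""
    return (count_c + 1) / (n_total + n_classes)

def smoothed_conditional(count_c_xi, count_c, n_values):
    """Laplacian-corrected conditional: (|D_{c,x_i}| + 1) / (|D_c| + N_i)."""
    return (count_c_xi + 1) / (count_c + n_values)

# A value never observed with a class no longer yields a hard zero, e.g.
# smoothed_conditional(0, 8, 3) == 1 / 11, which is about 0.091.
```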
Step-by-step Naive Bayes classification with mixed discrete and continuous attributes
Test Sample: 色泽=青绿 (Color=Green), 根蒂=蜷缩 (Stem=Curled), 敲声=浊响 (Sound=Dull), 纹理=清晰 (Texture=Clear), 脐部=凹陷 (Navel=Sunken), 触感=硬滑 (Touch=Hard-smooth), 密度=0.697 (Density=0.697), 含糖率=0.460 (Sugar=0.460)
Goal: Classify as "好瓜=是" (Good Melon=Yes) or "好瓜=否" (Good Melon=No)
Assume the training set contains 8 good melons and 9 bad melons, 17 samples in total.
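From these counts, the class priors are:

$$P(\text{好瓜=是}) = \frac{8}{17} \approx 0.471, \qquad P(\text{好瓜=否}) = \frac{9}{17} \approx 0.529$$

For each attribute value of the test sample, the class-conditional probabilities are then estimated by frequency counting within each class: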
色泽=青绿 (Color=Green):
根蒂=蜷缩 (Stem=Curled):
敲声=浊响 (Sound=Dull):
纹理=清晰 (Texture=Clear):
脐部=凹陷 (Navel=Sunken):
触感=硬滑 (Touch=Hard-smooth):
For continuous attributes, assume a Gaussian distribution. After calculating the mean and variance from the training data:
密度=0.697 (Density=0.697):
含糖率=0.460 (Sugar=0.460):
For "好瓜=是" (Good Melon=Yes):
For "好瓜=否" (Good Melon=No):
Since , the Naive Bayes classifier classifies the test sample as:
好瓜=是 (Good Melon=Yes)
If the training data never shows "敲声=清脆" (Sound=Crisp) for "好瓜=是" (Good Melon=Yes), then:

$$P(\text{清脆} \mid \text{是}) = \frac{0}{8} = 0$$

This forces $P(\text{是}) \prod_{i} P(x_i \mid \text{是}) = 0$ regardless of the other attributes, leading to an incorrect classification.
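A quick numeric illustration of this erasure effect (all non-zero factors below are invented purely for the example):

```python
prior_yes = 8 / 17                     # P(好瓜=是) from the training counts above
factors = [0.4, 0.6, 0.9, 0.7, 0.0]    # made-up P(x_i | 是) values; the last is P(清脆 | 是) = 0

score = prior_yes
for f in factors:
    score *= f

print(score)  # 0.0 -- a single zero factor erases all the other evidence
```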
Corrected prior probabilities:

$$\hat{P}(\text{好瓜=是}) = \frac{8 + 1}{17 + 2} = \frac{9}{19} \approx 0.474, \qquad \hat{P}(\text{好瓜=否}) = \frac{9 + 1}{17 + 2} = \frac{10}{19} \approx 0.526$$

Corrected conditional probability for the unseen value:

$$\hat{P}(\text{清脆} \mid \text{是}) = \frac{0 + 1}{8 + N_i} > 0$$

where $N_i$ is the number of possible values of 敲声, so the estimate is no longer zero and the other attributes keep their influence.
Strategy: Pre-compute all probability estimates, then use direct "lookup" during prediction. Naive Bayes training mainly involves counting frequencies and computing means/variances. Once complete, prediction requires only simple multiplication and comparison—extremely fast.
Strategy: No training required. Compute estimates on-the-fly when prediction requests arrive (lazy learning). Suitable for data streams or scenarios requiring real-time model updates, avoiding frequent retraining.
Strategy: Based on existing estimates, update only probability estimates involving new samples (incremental learning). Naive Bayes easily supports incremental learning by updating relevant counters or statistics without reprocessing all historical data.
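A rough sketch of how such incremental updates might look for discrete attributes (the class name and count layout are illustrative assumptions):

```python
from collections import Counter, defaultdict

class IncrementalNB:
    """Stores raw counts so each new sample only touches a handful of counters."""

    def __init__(self):
        self.n_total = 0
        self.class_counts = Counter()               # |D_c|
        self.value_counts = defaultdict(Counter)    # (class, attribute) -> Counter of values

    def update(self, sample, label):
        """Absorb one new (sample, label) pair without reprocessing historical data."""
        self.n_total += 1
        self.class_counts[label] += 1
        for attr, value in sample.items():
            self.value_counts[(label, attr)][value] += 1

    def prior(self, c, n_classes):
        # Laplacian-corrected prior: (|D_c| + 1) / (|D| + N)
        return (self.class_counts[c] + 1) / (self.n_total + n_classes)

    def conditional(self, c, attr, value, n_values):
        # Laplacian-corrected conditional: (|D_{c,x_i}| + 1) / (|D_c| + N_i)
        return (self.value_counts[(c, attr)][value] + 1) / (self.class_counts[c] + n_values)
```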
Application: Email spam detection, document categorization, sentiment analysis. Naive Bayes works excellently for text because word independence assumption is often reasonable, and it handles high-dimensional feature spaces efficiently.
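For instance, a toy spam-detection pipeline built on scikit-learn's `MultinomialNB` might look as follows; the four example messages and their labels are invented purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "win a free prize now",                # spam
    "limited offer, claim your reward",    # spam
    "meeting rescheduled to 3pm",          # ham
    "please review the attached report",   # ham
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words counts + multinomial Naive Bayes; alpha=1.0 is Laplacian smoothing.
model = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
model.fit(texts, labels)

print(model.predict(["claim your free prize"]))  # likely ['spam'] on this toy corpus
```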