The attribute independence assumption in Naive Bayes is often too strong in practice. Real-world attributes frequently exhibit dependencies. Semi-Naive Bayes classifiers aim to appropriately consider some attribute dependencies to improve classification performance.
How to model dependencies without falling into the combinatorial explosion problem? We need methods that balance between the simplicity of Naive Bayes and the complexity of full dependency modeling.
Assume each attribute depends on at most one other attribute besides the class. This dependent attribute is called the parent attribute .
where is the parent attribute of attribute
The main challenge is determining which attribute should be the parent for each attribute. Different methods (SPODE, TAN, AODE) use different strategies to address this.
SPODE assumes all attributes depend on the same attribute, called the super-parent. This attribute serves as the parent for all other attributes.
For SPODE with super-parent :
The best super-parent is determined through cross-validation or other model selection methods. We try each attribute as the super-parent and choose the one that gives the best performance.
SPODE simplifies the dependency structure while still being more flexible than Naive Bayes. It's easier to implement and train than more complex methods like TAN.
where is determined by the tree structure (either the class or another attribute)
TAN builds a tree structure that captures the most important attribute dependencies while maintaining computational efficiency. It balances between model complexity and performance.
AODE is a more powerful ODE method proposed by Geoff Webb. Instead of choosing a single super-parent, it averages over multiple SPODE models:
where is the set of samples where attribute equals , and is a threshold constant to ensure sufficient training data support
Joint probability:
Conditional probability:
where is the set of samples with class , attribute equals , and attribute equals
By averaging multiple SPODE models, AODE reduces the risk of choosing a poor super-parent. It improves robustness and accuracy compared to a single SPODE model, while still being more efficient than full Bayesian networks.
Class is the parent of all attributes . Attributes are independent given the class.
Class is parent of all attributes. (super-parent) is also parent of all other attributes.
Class is parent of all attributes. Attributes form a tree structure representing dependencies.
Applying TAN to model customer attribute dependencies
We want to classify customers into segments (High-Value, Medium-Value, Low-Value) based on attributes:
Step 1: Calculate conditional mutual information between attributes given the class. We might find that Income and Purchase Frequency are strongly dependent.
Step 2: Build maximum weighted spanning tree. The tree might show:
Class → Age → Income → Purchase Frequency → Average Order Value
Step 3: Use this structure for classification, where each attribute depends on the class and its parent in the tree.
Can we improve generalization performance by considering higher-order dependencies between attributes?
We could extend ODE to kDE (k-Dependent Estimator) by replacing the parent attribute with a set of attributes:
where now contains attributes
As increases, the number of samples needed to estimate grows exponentially.