Learn advanced decision trees that use linear combinations of features for oblique splits, creating more expressive and accurate decision boundaries
Traditional decision trees (univariate or axis-parallel) test one feature at a time at each node (e.g., "income > 50k?" or "age ≤ 30?"). This creates decision boundaries parallel to the feature axes. Multivariate decision trees (also called oblique decision trees) test linear combinations of multiple features at each node (e.g., "2×income + 3×spending > 250?"), creating diagonal (oblique) decision boundaries that can better fit complex data patterns.
Univariate (axis-parallel): Each node tests a single feature. Splits are perpendicular to feature axes, creating rectangular regions.
Limitations: May need many splits to approximate diagonal boundaries; less expressive.
Multivariate (oblique): Each node tests a weighted combination of features. Splits can be diagonal, creating more flexible regions.
Advantages: More compact trees; better for correlated features; can model complex boundaries directly.
Imagine classifying customers based on income and spending. An axis-parallel tree might create many rectangular regions with rules like "if income > 50k AND spending > 30k...". A multivariate tree could capture the same pattern more elegantly with a single oblique split: "if 0.5×income + 0.5×spending > 40k, then premium customer". This is especially powerful when features are naturally correlated.
Linear discriminants at tree nodes
In a multivariate decision tree, each internal node performs a test of the form:
w₁x₁ + w₂x₂ + ... + w_dx_d ≤ t
or more compactly: wᵀx ≤ t
Where w = (w₁, w₂, ..., w_d) is a learned weight vector, x is the feature vector, and t is a threshold. A univariate split is simply the special case in which only one weight is non-zero:
Test: x₁ ≤ t (e.g., income ≤ 50k)
In 2D feature space (income, spending), this creates a vertical line at income=50k. Decision boundary is perpendicular to the x-axis.
Weights: w = [1, 0]
Equation: 1×income + 0×spending = 50 (income measured in $1000s)
Test: 0.6×income + 0.4×spending ≤ 50
Creates a diagonal line through the 2D space. Decision boundary has non-zero slope, capturing correlations between features.
Weights: w = [0.6, 0.4]
Rearranged: spending = -1.5×income + 125
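Both tests reduce to the same dot product wᵀx ≤ t; only the weight vector changes. A minimal NumPy sketch (the customer values here are made up for illustration):

```python
import numpy as np

# Hypothetical customer: income = 70 (in $1000s), spending = 55
x = np.array([70.0, 55.0])                  # feature vector [income, spending]

# Univariate test: income <= 50   (w = [1, 0], t = 50)
w_axis, t_axis = np.array([1.0, 0.0]), 50.0
print(w_axis @ x <= t_axis)                 # False: 70 > 50

# Oblique test: 0.6*income + 0.4*spending <= 50   (w = [0.6, 0.4], t = 50)
w_obl, t_obl = np.array([0.6, 0.4]), 50.0
print(w_obl @ x <= t_obl)                   # False: 0.6*70 + 0.4*55 = 64 > 50
```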
Separating Premium vs Budget customers based on income (in $1000s) and monthly spending (in $100s):
Axis-parallel tree:
Split 1: income ≤ 60 → majority Budget
Split 2: income > 60 AND spending ≤ 50 → Standard
Split 3: income > 60 AND spending > 50 → Premium
Needs 3 splits to separate the classes; creates a staircase boundary
Oblique tree:
Split 1: 0.7×income + 0.3×spending ≤ 55 → Budget
Otherwise → Premium/Standard (one more split)
Single oblique split captures the diagonal trend; more compact tree
The challenge in multivariate trees is determining the weight vector w and threshold t. Unlike univariate trees, where you test each feature independently, here you must search a continuous space of possible linear combinations. Several methods exist:
Use LDA to find the direction that maximizes class separability. LDA finds weights that maximize the ratio of between-class variance to within-class variance.
Process: (1) take the samples that reach the node; (2) fit LDA to find the projection direction w that best separates the classes; (3) project the samples onto w and choose the threshold t that minimizes impurity; (4) split and recurse on each child.
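A minimal sketch of one LDA-based split, assuming NumPy arrays X (samples × features) and y (labels) for the data reaching the node; the function name lda_split is illustrative, not a library API:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def gini(labels):
    """Gini impurity of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def lda_split(X, y):
    """Use the leading LDA direction as w, then pick the threshold t
    (on the projection X @ w) with the lowest weighted Gini impurity."""
    lda = LinearDiscriminantAnalysis(solver="svd").fit(X, y)
    w = lda.scalings_[:, 0]            # most discriminative direction (assumes full-rank features)
    z = X @ w                          # project every sample onto w
    best_t, best_imp = None, np.inf
    for t in np.unique(z)[:-1]:        # skip the max value (would leave the right child empty)
        left, right = y[z <= t], y[z > t]
        imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return w, best_t, best_imp         # node test: w @ x <= best_t
```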
Train a linear SVM at each node to find the hyperplane that best separates classes with maximum margin.
Process: (1) fit a linear SVM on the samples at the node; (2) take the hyperplane weights as w and derive the threshold t from the intercept; (3) send samples with wᵀx ≤ t to the left child and the rest to the right; (4) recurse.
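A similarly hedged sketch of an SVM-derived split for a two-class node, again assuming arrays X and y; svm_split is an illustrative name:

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_split(X, y):
    """Max-margin hyperplane w @ x + b = 0, rewritten as the node test w @ x <= t with t = -b."""
    svm = LinearSVC(C=1.0, max_iter=10_000).fit(X, y)
    w = svm.coef_[0]            # hyperplane weights
    t = -svm.intercept_[0]      # threshold so the test reads w @ x <= t
    return w, t

# Illustrative usage at a node:
# w, t = svm_split(X_node, y_node)
# go_left = X_node @ w <= t
```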
Use optimization algorithms to search for weights that minimize impurity (Gini or entropy).
Process: (1) start from an initial weight vector (random, or taken from the best univariate split); (2) repeatedly perturb or update the weights; (3) keep only changes that reduce the weighted impurity of the induced split; (4) stop after a fixed budget or when no improvement is found.
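One way to realize this is simple hill climbing over weight vectors; the step size, iteration budget, and random initialization below are illustrative choices rather than part of any standard algorithm:

```python
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_threshold(X, y, w):
    """Best threshold (and its weighted Gini) for the projection X @ w."""
    z = X @ w
    thresholds = np.unique(z)[:-1]          # skip the max (empty right child)
    if len(thresholds) == 0:
        return np.inf, 0.0
    scores = [(len(y[z <= t]) * gini(y[z <= t]) +
               len(y[z > t]) * gini(y[z > t])) / len(y) for t in thresholds]
    i = int(np.argmin(scores))
    return scores[i], thresholds[i]

def hill_climb_split(X, y, n_iter=200, step=0.1, seed=0):
    """Perturb the weights repeatedly, keeping only moves that lower impurity."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1])
    score, t = best_threshold(X, y, w)
    for _ in range(n_iter):
        cand = w + step * rng.normal(size=w.shape)
        cand_score, cand_t = best_threshold(X, y, cand)
        if cand_score < score:              # accept only improving moves
            w, score, t = cand, cand_score, cand_t
    return w, t, score                      # node test: X @ w <= t
```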
Generate many random weight vectors, evaluate each, and keep the best (used in some ensemble methods).
Process: (1) sample many random weight vectors; (2) project the data onto each and find that projection's best threshold; (3) score every candidate split by impurity; (4) keep the best one.
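A compact sketch of this random search; the number of candidates and the unit-length normalization are arbitrary illustrative choices:

```python
import numpy as np

def random_split(X, y, n_candidates=500, seed=0):
    """Try many random directions; keep the one whose best threshold gives the lowest weighted Gini."""
    rng = np.random.default_rng(seed)

    def gini(labels):
        _, c = np.unique(labels, return_counts=True)
        p = c / c.sum()
        return 1.0 - np.sum(p ** 2)

    best_imp, best_w, best_t = np.inf, None, None
    for _ in range(n_candidates):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)                   # unit-length direction
        z = X @ w
        for t in np.unique(z)[:-1]:
            l, r = y[z <= t], y[z > t]
            imp = (len(l) * gini(l) + len(r) * gini(r)) / len(y)
            if imp < best_imp:
                best_imp, best_w, best_t = imp, w, t
    return best_w, best_t, best_imp              # node test: X @ best_w <= best_t
```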
Computational Cost: Finding the optimal multivariate split is NP-hard and much more expensive than finding the best univariate split. For d features and n samples, a univariate tree evaluates at most O(d×n) candidate splits per node; a multivariate tree must search the continuous space of all possible weight vectors and thresholds, a vastly harder problem. This is why multivariate trees are less common in practice, though they can achieve comparable or better accuracy with fewer nodes.
An e-commerce company wants to segment customers into Budget, Standard, and Premium tiers based on their behavior:
| ID | Income ($k) | Spending ($k) | Age | Loyalty (yrs) | Segment |
|---|---|---|---|---|---|
| 1 | 45 | 30 | 28 | 2 | Budget |
| 2 | 85 | 70 | 35 | 5 | Premium |
| 3 | 55 | 40 | 42 | 3 | Standard |
| 4 | 95 | 80 | 38 | 6 | Premium |
| 5 | 35 | 25 | 25 | 1 | Budget |
| 6 | 70 | 55 | 45 | 4 | Standard |
| 7 | 50 | 45 | 31 | 3 | Standard |
| 8 | 40 | 28 | 29 | 2 | Budget |
Univariate (axis-parallel) tree:
Root:
income ≤ 60?
Left → mostly Budget, Right → Standard/Premium mixed
Right child:
spending ≤ 60?
Left → Standard, Right → Premium
Statistics: 2 internal nodes, 3 leaves, depth 2. The income ≤ 60 leaf is impure (3 Budget, 2 Standard), so training accuracy is 6/8; a third split would be needed to separate those customers.
Multivariate (oblique) tree:
Root:
0.6×income + 0.4×spending ≤ 47?
Left → Budget, Right → Standard/Premium
Right child:
0.7×spending + 0.3×loyalty ≤ 45?
Left → Standard, Right → Premium
Statistics: 2 internal nodes, 3 leaves, depth 2. All three leaves are pure on the 8 customers (8/8 training accuracy), so the oblique tree matches the univariate tree's size while separating the segments exactly.
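A short NumPy sketch that replays both trees on the 8-customer table and confirms the statistics above (the thresholds are the ones used in the splits shown):

```python
import numpy as np

income   = np.array([45, 85, 55, 95, 35, 70, 50, 40], dtype=float)
spending = np.array([30, 70, 40, 80, 25, 55, 45, 28], dtype=float)
loyalty  = np.array([ 2,  5,  3,  6,  1,  4,  3,  2], dtype=float)
segment  = np.array(["Budget", "Premium", "Standard", "Premium",
                     "Budget", "Standard", "Standard", "Budget"])

# Univariate tree: income <= 60, then spending <= 60 on the right branch
uni_pred = np.where(income <= 60, "Budget",
                    np.where(spending <= 60, "Standard", "Premium"))

# Multivariate tree: 0.6*income + 0.4*spending <= 47,
# then 0.7*spending + 0.3*loyalty <= 45 on the right branch
multi_pred = np.where(0.6 * income + 0.4 * spending <= 47, "Budget",
                      np.where(0.7 * spending + 0.3 * loyalty <= 45,
                               "Standard", "Premium"))

print("univariate accuracy:  ", np.mean(uni_pred == segment))    # 0.75
print("multivariate accuracy:", np.mean(multi_pred == segment))  # 1.0
```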
| Aspect | Univariate (Axis-Parallel) | Multivariate (Oblique) |
|---|---|---|
| Decision Boundary | Axis-parallel (perpendicular to features) | Oblique (any angle, more flexible) |
| Tree Size | Larger (more splits needed) | Smaller (more expressive splits) |
| Training Time | Fast (O(d×n log n) per node) | Slow (optimization at each node) |
| Interpretability | Very high (simple rules) | Moderate (linear combinations) |
| Accuracy | Good for independent features | Better for correlated features |
| Implementation | Simple, widely available | Complex, limited libraries |
| Best For | General use, high-dimensional data | Correlated features, low dimensions |
Q: Can I train multivariate (oblique) decision trees with scikit-learn?
A: Standard scikit-learn only supports univariate trees. However, you can find implementations in specialized libraries: obliquetree (Python package), OC1 (Oblique Classifier 1, a classic algorithm), and various research implementations on GitHub. Alternatively, you can approximate LDA-style oblique splits by appending LDA-projected features to the input of scikit-learn's DecisionTreeClassifier. For most applications, standard univariate trees or ensemble methods (Random Forest, XGBoost) perform well enough.
Q: If oblique splits are more expressive, why aren't multivariate trees used more widely?
A: Three main reasons: (1) Computational cost: finding optimal linear combinations is much slower than testing single features, especially with many features. (2) Interpretability loss: rules like "0.73×income + 0.41×age - 0.19×spending > 42" are harder to explain than "income > 50k". (3) Diminishing returns with ensembles: modern ensemble methods (Random Forests, Gradient Boosting) combine many univariate trees to implicitly approximate oblique boundaries, achieving similar or better accuracy without the complexity.
Q: Can a single tree mix univariate and multivariate splits?
A: Yes. A hybrid approach evaluates both univariate and multivariate candidates at each node and chooses whichever provides the better impurity reduction. This balances expressiveness against computational cost: oblique splits are used only where they significantly help. Some implementations test a few random linear combinations alongside all single-feature splits and keep the best overall, as in the sketch below. This adds moderate computational overhead while capturing some of the benefits of multivariate trees.
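A sketch of such a hybrid candidate search at one node, assuming arrays X and y; the number of random oblique candidates is an arbitrary illustrative choice:

```python
import numpy as np

def hybrid_split(X, y, n_oblique=20, seed=0):
    """Score every single-feature split plus a few random oblique candidates;
    return whichever weight vector and threshold give the lowest weighted Gini."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def gini(labels):
        _, c = np.unique(labels, return_counts=True)
        p = c / c.sum()
        return 1.0 - np.sum(p ** 2)

    def best_for(w):
        z = X @ w
        best_imp, best_t = np.inf, None
        for t in np.unique(z)[:-1]:
            l, r = y[z <= t], y[z > t]
            imp = (len(l) * gini(l) + len(r) * gini(r)) / len(y)
            if imp < best_imp:
                best_imp, best_t = imp, t
        return best_imp, best_t

    # Axis-parallel candidates (one-hot weights) plus a few random oblique ones
    candidates = [np.eye(d)[j] for j in range(d)]
    candidates += [rng.normal(size=d) for _ in range(n_oblique)]

    results = [(best_for(w), w) for w in candidates]
    (imp, t), w = min(results, key=lambda r: r[0][0])
    return w, t, imp
```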
Q: Do Random Forests use oblique splits?
A: Standard Random Forests use univariate trees. However, there are research variants called Oblique Random Forests that use multivariate splits at each node. These can outperform standard RF on certain datasets but are computationally expensive and rarely used in practice. Interestingly, ensembles of univariate trees can approximate oblique boundaries in aggregate: different trees split on different features, and the ensemble vote effectively creates non-axis-parallel boundaries.
Q: Should I normalize features before fitting a multivariate tree?
A: Yes, it is strongly recommended. Since the weights combine features, different scales distort the linear combination: if income (0-200k) and age (18-100) live on different scales, the weight on income will naturally be smaller just to compensate. Normalizing (z-score or min-max) to similar ranges ensures the weights reflect feature importance rather than raw units. LDA accounts for scale through the within-class covariance, but SVM-based and gradient-based searches are scale-sensitive, so for custom implementations normalize first.
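For a custom implementation, a z-score step before the split search might look like this; X here stands for an assumed raw feature matrix:

```python
from sklearn.preprocessing import StandardScaler

# X is an assumed raw feature matrix (e.g., columns: income, spending, age, loyalty).
# Z-scoring puts the columns on a comparable scale before the weight search,
# so the learned weights reflect importance rather than units.
X_scaled = StandardScaler().fit_transform(X)
```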
Multivariate trees use linear combinations: w₁x₁ + w₂x₂ + ... ≤ t
Create oblique (diagonal) decision boundaries, more flexible than axis-parallel
More compact trees (fewer nodes) for the same accuracy when features are correlated
Finding weights is computationally expensive: LDA, SVM, or optimization algorithms
Trade-off: Better accuracy vs longer training time and lower interpretability
Best for low-dimensional data with correlated features
Univariate trees remain the default due to speed, interpretability, and ensemble success
Limited library support; mostly research/specialized implementations