Decision trees might be the most intuitive algorithm in machine learning. They work much the way you make decisions: through a series of questions that progressively narrow down the answer.
Imagine you're a bank loan officer. Faced with a loan application, you might ask:
- Does the applicant have a stable job?
- What's their credit score?
- Is their debt-to-income ratio too high?
Each question divides applicants into different groups. Eventually, you can give a clear judgment for each group: approve or reject.
This is the essence of decision trees—using a series of questions to "split" data into increasingly pure chunks.
But there's a key question: which question should we ask first?
Ask the right one, and the data becomes clear immediately. Ask the wrong one, and things remain a mess.
Entropy: The First "Purity Meter"
In a previous article, we introduced Entropy, which uses a logarithmic formula to measure how "chaotic" your data is:

Entropy = -Σ (p_k) log₂(p_k)

where p_k is the proportion of samples in class k. Higher entropy means more mixed data; lower entropy means purer data. The ID3 and C4.5 decision tree algorithms use entropy to select the best splitting feature.
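As a quick refresher, here is a minimal sketch of that calculation in plain Python (the `entropy` helper and the toy label lists are illustrative, not part of the original example):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# A 50/50 mix is maximally "chaotic"; a single-class group is perfectly pure.
print(entropy(["Yes", "Yes", "No", "No"]))  # 1.0
print(entropy(["No", "No", "No", "No"]))    # -0.0 (effectively 0)
```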
But entropy has a small drawback: logarithm operations add computational overhead.
Is there a simpler alternative?
Gini Index: Another Purity Measure
The Gini Index is the purity metric used by CART decision trees. The idea behind it is more intuitive:

If you randomly pick two samples (with replacement) from a group, what's the probability that they belong to different classes?

A higher probability means the group is more impure; a lower probability means it is purer.
The formula:

Gini = 1 - Σ (p_k)²

- p_k: proportion of samples in class k
- Σ (p_k)²: the probability that two random picks land in the same class
- 1 - Σ (p_k)²: the probability that the two picks belong to different classes
Gini Index range (for binary classification):

- Gini = 0: perfectly pure (100% one class)
- Gini = 0.5: maximum impurity (50/50 split)
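The same idea in code, as a minimal sketch (the `gini` helper and the toy labels are illustrative only):

```python
from collections import Counter

def gini(labels):
    """Gini Index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

print(gini(["Yes"] * 10))              # 0.0 -> perfectly pure
print(gini(["Yes"] * 5 + ["No"] * 5))  # 0.5 -> maximum impurity for two classes
```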
Compared to entropy, Gini has no logarithm operations and computes faster—which is why Scikit-learn uses it by default.
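If you have scikit-learn installed, you can confirm the default directly:

```python
from sklearn.tree import DecisionTreeClassifier

# The criterion parameter defaults to "gini"; "entropy" is also available.
print(DecisionTreeClassifier().criterion)  # gini
```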
Complete Example: Building a Decision Tree with Gini
Let's walk through a complete dataset and demonstrate how CART uses Gini to select splitting features.
Scenario: Bank loan approval. The goal is to predict whether an applicant will default.
Dataset (20 records)
| ID | Age | Home Owner | Credit | Income | Default? |
|---|---|---|---|---|---|
| 1 | Young | No | Fair | Low | Yes |
| 2 | Young | No | Fair | Medium | Yes |
| 3 | Young | Yes | Fair | Low | No |
| 4 | Young | Yes | Good | Low | No |
| 5 | Young | No | Good | Medium | No |
| 6 | Middle | No | Fair | Low | Yes |
| 7 | Middle | No | Fair | Medium | No |
| 8 | Middle | Yes | Fair | Medium | No |
| 9 | Middle | Yes | Good | High | No |
| 10 | Middle | Yes | Good | Medium | No |
| 11 | Senior | Yes | Good | High | No |
| 12 | Senior | No | Good | Medium | Yes |
| 13 | Senior | No | Fair | High | Yes |
| 14 | Senior | No | Fair | Medium | Yes |
| 15 | Senior | Yes | Fair | Low | Yes |
| 16 | Senior | No | Good | High | No |
| 17 | Senior | No | Good | Medium | No |
| 18 | Young | No | Fair | High | Yes |
| 19 | Middle | No | Good | High | No |
| 20 | Young | Yes | Good | Medium | No |
Summary: 20 records, 8 defaults (Yes), 12 non-defaults (No)
Step 1: Calculate Initial Gini Index
Before any split, what's the impurity of the entire dataset?
Initial Gini = 1 - (8/20)² - (12/20)² = 1 - 0.16 - 0.36 = 0.48 (moderate impurity)
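The same number falls out of a two-line computation using the 8/20 and 12/20 proportions from the summary above:

```python
# Initial impurity of the full dataset: 8 "Yes" and 12 "No" out of 20.
p_yes, p_no = 8 / 20, 12 / 20
print(round(1 - p_yes**2 - p_no**2, 2))  # 0.48
```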
Step 2: Evaluate Each Feature
We'll test all 4 features to find which produces the lowest weighted Gini Index after splitting.
Feature 1: Age
| Age | Count | Yes | No | Gini |
|---|---|---|---|---|
| Young | 7 | 3 | 4 | 0.49 |
| Middle | 6 | 1 | 5 | 0.28 |
| Senior | 7 | 4 | 3 | 0.49 |
Weighted average: (7/20) × 0.49 + (6/20) × 0.28 + (7/20) × 0.49 ≈ 0.43
Feature 2: Home Owner
| Home Owner | Count | Yes | No | Gini |
|---|---|---|---|---|
| Yes | 8 | 1 | 7 | 0.22 |
| No | 12 | 7 | 5 | 0.49 |
Weighted average: (8/20) × 0.22 + (12/20) × 0.49 ≈ 0.38
Feature 3: Credit Rating
| Credit | Count | Yes | No | Gini |
|---|---|---|---|---|
| Fair | 10 | 7 | 3 | 0.42 |
| Good | 10 | 1 | 9 | 0.18 |
Weighted average: (10/20) × 0.42 + (10/20) × 0.18 = 0.30
Feature 4: Income Level
| Income | Count | Yes | No | Gini |
|---|---|---|---|---|
| Low | 5 | 3 | 2 | 0.48 |
| Medium | 9 | 3 | 6 | 0.44 |
| High | 6 | 2 | 4 | 0.44 |
Weighted average: (5/20) × 0.48 + (9/20) × 0.44 + (6/20) × 0.44 = 0.45
Step 3: Select the Best Feature
| Feature | Weighted Gini |
|---|---|
| Age | 0.43 |
| Home Owner | 0.38 |
| Credit Rating ✓ | 0.30 |
| Income | 0.45 |
Credit Rating has the lowest weighted Gini (0.30), so it becomes the root node.
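As a sanity check, this sketch recomputes the weighted Gini for all four features directly from the (Yes, No) counts in the tables above; the dictionary layout and helper name are just one way to organize it:

```python
def gini(yes, no):
    """Gini Index of a branch containing `yes` defaults and `no` non-defaults."""
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

# (Yes, No) counts per branch, taken from the feature tables above.
features = {
    "Age":        {"Young": (3, 4), "Middle": (1, 5), "Senior": (4, 3)},
    "Home Owner": {"Yes": (1, 7), "No": (7, 5)},
    "Credit":     {"Fair": (7, 3), "Good": (1, 9)},
    "Income":     {"Low": (3, 2), "Medium": (3, 6), "High": (2, 4)},
}

n_total = 20
for name, branches in features.items():
    weighted = sum((yes + no) / n_total * gini(yes, no) for yes, no in branches.values())
    print(f"{name}: {weighted:.2f}")

# Output (lowest wins):
# Age: 0.43
# Home Owner: 0.38
# Credit: 0.30
# Income: 0.45
```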
Step 4: Recursive Splitting
Next, for the "Credit = Fair" branch (10 samples) and "Credit = Good" branch (10 samples), we repeat the same process to select the next best splitting feature...
This continues until:
- Leaf nodes are completely pure (Gini = 0), or
- A stopping condition is reached (max depth, min samples, etc.)
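These stopping conditions map directly onto hyperparameters of scikit-learn's DecisionTreeClassifier. A minimal sketch, using tiny placeholder arrays rather than the loan table (scikit-learn trees expect numeric features, so the categorical columns above would first need encoding):

```python
from sklearn.tree import DecisionTreeClassifier

# Placeholder numeric features and labels, just to make the snippet runnable.
X = [[0, 1], [1, 0], [1, 1], [0, 0], [1, 2], [0, 2]]
y = [1, 0, 0, 1, 0, 1]

clf = DecisionTreeClassifier(
    criterion="gini",     # CART-style impurity (the default)
    max_depth=3,          # stop: limit the depth of the tree
    min_samples_split=2,  # stop: a node needs at least this many samples to split
    min_samples_leaf=1,   # stop: every leaf must keep at least this many samples
)
clf.fit(X, y)
print(clf.get_depth(), clf.get_n_leaves())
```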
Gini Index vs. Entropy
| Metric | Formula | Range | Notes |
|---|---|---|---|
| Entropy | -Σ p_k log₂(p_k) | 0 ~ 1 (binary) | Information-theory basis, measured in bits |
| Gini Index | 1 - Σ (p_k)² | 0 ~ 0.5 (binary) | No logarithms, faster to compute |
In practice, they produce nearly identical trees. Unless you have specific theoretical requirements, either works well.
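One quick way to see why: for a binary split with a "Yes" proportion of p, both metrics are zero at p = 0 or p = 1 and peak at p = 0.5, so they rank candidate splits in much the same order. A small illustrative script:

```python
from math import log2

def entropy(p):
    """Binary entropy in bits; taken as 0 at p = 0 or p = 1."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def gini(p):
    """Binary Gini Index: 1 - p² - (1 - p)²."""
    return 1 - p**2 - (1 - p) ** 2

for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    print(f"p={p:.1f}  entropy={entropy(p):.3f}  gini={gini(p):.3f}")
```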
Key Takeaways
- Decision trees split data using a series of questions, aiming for increasingly pure groups
- Gini Index measures the probability of picking two samples of different classes
- Feature selection: choose the feature that minimizes weighted Gini after splitting
- Gini vs. Entropy: different formulas, nearly identical results in practice