
Gini Index Explained: How CART Decision Trees Pick the Best Split

A plain-English guide to understanding impurity and feature selection

2026-01-18
Gini Index
CART
Decision Trees
Classification
Machine Learning

Decision trees might be the most intuitive algorithm in machine learning. They work much like the way you make decisions: through a series of questions that progressively narrow down the answer.

Imagine you're a bank loan officer. Faced with a loan application, you might ask:

  • Does the applicant have a stable job?
  • What's their credit score?
  • Is their debt-to-income ratio too high?

Each question divides applicants into different groups. Eventually, you can give a clear judgment for each group: approve or reject.

This is the essence of decision trees—using a series of questions to "split" data into increasingly pure chunks.

But there's a key question: which question should we ask first?

Ask the right one, and the data becomes clear immediately. Ask the wrong one, and things remain a mess.

Entropy: The First "Purity Meter"

In a previous article, we introduced Entropy—it uses a logarithmic formula to measure how "chaotic" your data is:

H(D) = -\sum_k p_k \log_2(p_k)

Higher entropy means more mixed data; lower entropy means purer data. The ID3 and C4.5 decision tree algorithms use entropy to select the best splitting feature.
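If it helps to see the arithmetic, here is a minimal Python sketch of that formula (the entropy helper below is our own illustration, not tied to any particular library):

```python
import math

def entropy(proportions):
    """Shannon entropy of a node, given class proportions that sum to 1."""
    # Skip zero proportions: a class that never appears contributes nothing.
    return sum(-p * math.log2(p) for p in proportions if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 -> maximally mixed binary node
print(entropy([1.0]))       # 0.0 -> perfectly pure node
```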

But entropy has a small drawback: logarithm operations add computational overhead.

Is there a simpler alternative?

Gini Index: Another Purity Measure

The Gini Index is the purity metric used by CART decision trees. Its idea is more intuitive:

If you randomly pick two samples from a group (independently, with replacement), what's the probability they belong to different classes?

Higher probability = more impure. Lower probability = purer.

The formula:

Gini(D) = 1 - \sum_k (p_k)^2

  • p_k: proportion of samples in class k
  • Σ (p_k)²: probability that two random picks land in the same class
  • 1 - Σ (p_k)²: probability that the two picks come from different classes

Gini Index range (for binary classification):

  • Gini = 0: perfectly pure (100% one class)
  • Gini = 0.5: maximum impurity (50/50 split)

Compared to entropy, Gini has no logarithm operations and computes faster—which is why Scikit-learn uses it by default.
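Here is the same kind of sketch for the Gini Index (again, the gini helper is just an illustration):

```python
def gini(proportions):
    """Gini index of a node, given class proportions that sum to 1."""
    return 1.0 - sum(p * p for p in proportions)

print(gini([1.0, 0.0]))  # 0.0   -> perfectly pure node
print(gini([0.5, 0.5]))  # 0.5   -> maximum impurity for two classes
print(gini([0.4, 0.6]))  # ~0.48 -> a moderately mixed 40/60 node
```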

Complete Example: Building a Decision Tree with Gini

Let's walk through a complete dataset and demonstrate how CART uses Gini to select splitting features.

Scenario: Bank loan approval. The goal is to predict whether an applicant will default.

Dataset (20 records)

| ID | Age    | Home Owner | Credit | Income | Default? |
|----|--------|------------|--------|--------|----------|
| 1  | Young  | No         | Fair   | Low    | Yes      |
| 2  | Young  | No         | Fair   | Medium | Yes      |
| 3  | Young  | Yes        | Fair   | Low    | No       |
| 4  | Young  | Yes        | Good   | Low    | No       |
| 5  | Young  | No         | Good   | Medium | No       |
| 6  | Middle | No         | Fair   | Low    | Yes      |
| 7  | Middle | No         | Fair   | Medium | No       |
| 8  | Middle | Yes        | Fair   | Medium | No       |
| 9  | Middle | Yes        | Good   | High   | No       |
| 10 | Middle | Yes        | Good   | Medium | No       |
| 11 | Senior | Yes        | Good   | High   | No       |
| 12 | Senior | No         | Good   | Medium | Yes      |
| 13 | Senior | No         | Fair   | High   | Yes      |
| 14 | Senior | No         | Fair   | Medium | Yes      |
| 15 | Senior | Yes        | Fair   | Low    | Yes      |
| 16 | Senior | No         | Good   | High   | No       |
| 17 | Senior | No         | Good   | Medium | No       |
| 18 | Young  | No         | Fair   | High   | Yes      |
| 19 | Middle | No         | Good   | High   | No       |
| 20 | Young  | Yes        | Good   | Medium | No       |

Summary: 20 records, 8 defaults (Yes), 12 non-defaults (No)

Step 1: Calculate Initial Gini Index

Before any split, what's the impurity of the entire dataset?

p_{Yes} = 8/20 = 0.40

p_{No} = 12/20 = 0.60

Gini(D) = 1 - (0.40^2 + 0.60^2) = 1 - 0.52 = 0.48

Initial Gini = 0.48 (moderate impurity)
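As a quick check, the same number drops out of a tiny helper that works directly from class counts (gini_from_counts is hypothetical, shown only for illustration):

```python
def gini_from_counts(counts):
    """Gini index computed from per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini_from_counts([8, 12]))  # ~0.48: 8 defaults vs. 12 non-defaults
```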

Step 2: Evaluate Each Feature

We'll test all 4 features to find which produces the lowest weighted Gini Index after splitting.

Feature 1: Age

| Age    | Count | Yes | No | Gini |
|--------|-------|-----|----|------|
| Young  | 7     | 3   | 4  | 0.49 |
| Middle | 6     | 1   | 5  | 0.28 |
| Senior | 7     | 4   | 3  | 0.49 |

Weighted average: (7/20)(0.49) + (6/20)(0.28) + (7/20)(0.49) ≈ 0.427

Feature 2: Home Owner

| Home Owner | Count | Yes | No | Gini |
|------------|-------|-----|----|------|
| Yes        | 8     | 1   | 7  | 0.22 |
| No         | 12    | 7   | 5  | 0.49 |

Weighted average: (8/20)(0.22) + (12/20)(0.49) = 0.382

Feature 3: Credit Rating

| Credit | Count | Yes | No | Gini |
|--------|-------|-----|----|------|
| Fair   | 10    | 7   | 3  | 0.42 |
| Good   | 10    | 1   | 9  | 0.18 |

Weighted average: (10/20)(0.42) + (10/20)(0.18) = 0.30

Feature 4: Income Level

| Income | Count | Yes | No | Gini |
|--------|-------|-----|----|------|
| Low    | 5     | 3   | 2  | 0.48 |
| Medium | 9     | 3   | 6  | 0.44 |
| High   | 6     | 2   | 4  | 0.44 |

Weighted average: (5/20)(0.48) + (9/20)(0.44) + (6/20)(0.44) = 0.45

Step 3: Select the Best Feature

| Feature         | Weighted Gini |
|-----------------|---------------|
| Age             | 0.427         |
| Home Owner      | 0.382         |
| Credit Rating ✓ | 0.30          |
| Income          | 0.45          |

Credit Rating has the lowest weighted Gini (0.30), so it becomes the root node.
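To double-check the table, here is a short, self-contained Python sketch that rebuilds the 20-record dataset and computes each feature's weighted Gini. The record tuples and the gini / weighted_gini helpers are our own encoding of the tables above, not library code:

```python
# Each record: (age, home_owner, credit, income, default_label), matching the dataset table.
records = [
    ("Young", "No", "Fair", "Low", "Yes"),    ("Young", "No", "Fair", "Medium", "Yes"),
    ("Young", "Yes", "Fair", "Low", "No"),    ("Young", "Yes", "Good", "Low", "No"),
    ("Young", "No", "Good", "Medium", "No"),  ("Middle", "No", "Fair", "Low", "Yes"),
    ("Middle", "No", "Fair", "Medium", "No"), ("Middle", "Yes", "Fair", "Medium", "No"),
    ("Middle", "Yes", "Good", "High", "No"),  ("Middle", "Yes", "Good", "Medium", "No"),
    ("Senior", "Yes", "Good", "High", "No"),  ("Senior", "No", "Good", "Medium", "Yes"),
    ("Senior", "No", "Fair", "High", "Yes"),  ("Senior", "No", "Fair", "Medium", "Yes"),
    ("Senior", "Yes", "Fair", "Low", "Yes"),  ("Senior", "No", "Good", "High", "No"),
    ("Senior", "No", "Good", "Medium", "No"), ("Young", "No", "Fair", "High", "Yes"),
    ("Middle", "No", "Good", "High", "No"),   ("Young", "Yes", "Good", "Medium", "No"),
]

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def weighted_gini(rows, col):
    """Weighted Gini index after splitting rows on the feature in column `col`."""
    total = len(rows)
    branches = {}
    for row in rows:
        branches.setdefault(row[col], []).append(row[-1])  # group labels by feature value
    return sum(len(labels) / total * gini(labels) for labels in branches.values())

for name, col in [("Age", 0), ("Home Owner", 1), ("Credit", 2), ("Income", 3)]:
    print(f"{name:<12} {weighted_gini(records, col):.3f}")
# Prints roughly: Age 0.426, Home Owner 0.379, Credit 0.300, Income 0.453.
# Credit is lowest, matching the table above (which rounds per-branch Ginis
# to two decimals before weighting, hence the tiny differences).
```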

Step 4: Recursive Splitting

Next, for the "Credit = Fair" branch (10 samples) and "Credit = Good" branch (10 samples), we repeat the same process to select the next best splitting feature...

This continues until:

  • Leaf nodes are completely pure (Gini = 0), or
  • A stopping condition is reached (max depth, min samples, etc.)
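In scikit-learn, these ingredients map onto DecisionTreeClassifier parameters. Here is a minimal sketch that reuses the records list from the previous snippet; the ordinal encoding and the hyperparameter values are illustrative choices, not recommendations:

```python
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Reuses `records` from the previous snippet: 4 categorical features + a label.
X = OrdinalEncoder().fit_transform([r[:4] for r in records])
y = [r[-1] for r in records]

clf = DecisionTreeClassifier(
    criterion="gini",    # CART-style impurity measure (scikit-learn's default)
    max_depth=3,         # stopping condition: limit tree depth
    min_samples_leaf=2,  # stopping condition: minimum samples in each leaf
    random_state=0,
)
clf.fit(X, y)
print(export_text(clf, feature_names=["Age", "Home Owner", "Credit", "Income"]))
```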

Gini Index vs. Entropy

| Metric     | Formula            | Range (binary) | Notes                                    |
|------------|--------------------|----------------|------------------------------------------|
| Entropy    | -Σ p_k log₂(p_k)   | 0 ~ 1          | Information-theoretic; measured in bits  |
| Gini Index | 1 - Σ (p_k)²       | 0 ~ 0.5        | No logarithms, faster to compute         |

In practice, they produce nearly identical trees. Unless you have specific theoretical requirements, either works well.
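To see how closely the two measures track each other, a small sketch like this tabulates both for a range of binary class proportions (the helper names are our own):

```python
import math

def binary_entropy(p):
    """Entropy of a two-class node with positive-class proportion p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def binary_gini(p):
    """Gini index of a two-class node with positive-class proportion p."""
    return 1.0 - (p ** 2 + (1 - p) ** 2)

print("p     entropy  gini")
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    print(f"{p:.1f}   {binary_entropy(p):.3f}    {binary_gini(p):.3f}")
# Both curves are 0 at p = 0 or 1 and peak at p = 0.5,
# so they almost always rank candidate splits the same way.
```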

Key Takeaways

  1. Decision trees split data using a series of questions, aiming for increasingly pure groups
  2. Gini Index measures the probability of picking two samples of different classes
  3. Feature selection: choose the feature that minimizes weighted Gini after splitting
  4. Gini vs. Entropy: different formulas, nearly identical results in practice

Ready to learn more?

Our Decision Trees course covers advanced topics like pruning strategies, handling overfitting, and when decision trees outperform other algorithms.
