
k-Nearest Neighbors (kNN)

Master the lazy learning algorithm that serves as a key application scenario for dimensionality reduction. Learn how kNN finds neighbors, makes predictions, and why its performance degrades in high dimensions.

Module 1 of 9
Intermediate Level
80-100 min

What is k-Nearest Neighbors?

k-Nearest Neighbors (kNN) is a lazy learning (also called instance-based learning) algorithm with no explicit training process. Given a test sample, kNN finds the k training samples closest to it based on some distance metric, then makes predictions based on these neighbors.

Key Characteristics

Lazy Learning

No training phase - all computation happens at prediction time

Non-Parametric

Makes no assumptions about data distribution

Distance-Based

Performance directly depends on distance metric quality

Sensitive to Dimensionality

Performance degrades severely in high-dimensional spaces (curse of dimensionality)

Advantages

  • Simple and intuitive
  • No training time
  • Works for both classification and regression
  • Can handle non-linear decision boundaries
  • Naturally handles multi-class problems

Limitations

  • Slow prediction (computes all distances)
  • Sensitive to irrelevant features
  • Requires good distance metric
  • Performance degrades with high dimensions
  • Memory intensive (stores all training data)

kNN Algorithm Steps

The kNN algorithm follows a simple three-step process:

Step 1

Compute Distances

For a test sample x, calculate distances to all training samples using a distance metric (typically Euclidean distance).

Details: Distance calculation: dist(x, x_i) = ||x - x_i|| for all training samples x_i.
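A minimal sketch of this step in Python (NumPy), assuming the training samples are stored as rows of an array; the names X_train and x are illustrative placeholders, not from the text.

```python
import numpy as np

# Step 1 sketch: Euclidean distances from a test sample x to every training row x_i.
def pairwise_distances(X_train: np.ndarray, x: np.ndarray) -> np.ndarray:
    return np.linalg.norm(X_train - x, axis=1)
```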

Step 2

Find k Nearest Neighbors

Select the k training samples with smallest distances to x. These are the 'neighbors'.

Details: k is a hyperparameter. Common values: k=1 (nearest neighbor), k=3, k=5, or k=√n (n = number of training samples).
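Continuing the sketch, the k smallest distances can be found without a full sort; np.argpartition is one common idiom (this assumes 1 ≤ k ≤ number of training samples).

```python
import numpy as np

# Step 2 sketch: indices of the k training samples with the smallest distances.
def k_nearest_indices(distances: np.ndarray, k: int) -> np.ndarray:
    # argpartition places the k smallest values in the first k positions (in no particular order)
    return np.argpartition(distances, k - 1)[:k]
```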

Step 3

Make Prediction

For classification: majority vote among k neighbors' labels. For regression: average of k neighbors' values.

Details: Classification: y_pred = mode({y_i | i ∈ k nearest neighbors}). Regression: y_pred = (1/k) Σ y_i.
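And the final step, majority vote for classification and averaging for regression, in the same illustrative style:

```python
import numpy as np
from collections import Counter

# Step 3 sketch: prediction from the k neighbors' labels or target values.
def predict_classification(neighbor_labels):
    # majority vote: the most common class label among the neighbors
    return Counter(neighbor_labels).most_common(1)[0][0]

def predict_regression(neighbor_values):
    # average of the neighbors' target values
    return float(np.mean(neighbor_values))
```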

Classification vs Regression

kNN can be used for both classification and regression tasks, with different prediction strategies:

kNN Classification

For classification tasks, kNN uses majority voting among the k nearest neighbors.

y_{\text{pred}} = \text{mode}(\{y_i \mid i \in \text{k nearest neighbors}\})

Select the most common class label among k neighbors

Example: k=3

Neighbors: [Class A, Class A, Class B]

Prediction: Class A (majority: 2 votes)
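The same majority-vote behavior is what scikit-learn's KNeighborsClassifier implements; here is a minimal sketch with a made-up 2-D dataset:

```python
from sklearn.neighbors import KNeighborsClassifier

# Made-up 2-D points with two classes, "A" and "B".
X_train = [[1.0, 1.0], [1.2, 0.9], [3.0, 3.1], [0.9, 1.1], [3.2, 2.9]]
y_train = ["A", "A", "B", "A", "B"]

clf = KNeighborsClassifier(n_neighbors=3)  # k = 3, Euclidean distance by default
clf.fit(X_train, y_train)                  # "fit" only stores the data (lazy learning)

print(clf.predict([[1.1, 1.0]]))           # 3 nearest neighbors are all class "A" -> ['A']
```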

kNN Regression

For regression tasks, kNN uses averaging of the k nearest neighbors' target values.

y_{\text{pred}} = \frac{1}{k} \sum_{i=1}^{k} y_i

Average the target values of k neighbors

Example: k=3

Neighbors: [450000, 480000, 420000]

Prediction: $450,000 (average)
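Likewise, KNeighborsRegressor averages the neighbors' targets; a minimal sketch with one made-up feature and the three prices from the example:

```python
from sklearn.neighbors import KNeighborsRegressor

# One made-up feature (square footage) with the prices from the example above.
X_train = [[2400], [2500], [2300], [3500], [1200]]
y_train = [450_000, 480_000, 420_000, 750_000, 240_000]

reg = KNeighborsRegressor(n_neighbors=3)
reg.fit(X_train, y_train)

# The 3 nearest neighbors of 2,450 sqft are the first three houses, so the
# prediction is their average: (450k + 480k + 420k) / 3 = 450k.
print(reg.predict([[2450]]))  # -> [450000.]
```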

Error Rate Analysis

Understanding the theoretical error rate of nearest neighbor classifiers provides insight into why dimensionality reduction is crucial for kNN performance.

Error Rate for 1-NN (Nearest Neighbor)

For a test sample x, let z be its nearest neighbor. The error probability is:

P(\text{err}) = 1 - \sum_{c \in \mathcal{Y}} P(c \mid x) P(c \mid z)

Where \mathcal{Y} is the set of all class labels, and P(c \mid x) is the probability of class c given sample x.

Key Theoretical Result

Under the assumption that samples are independent and identically distributed (i.i.d.), and training samples are sufficiently dense (any test sample has training samples nearby), the error rate can be approximated as:

P(\text{err}) \simeq 1 - \sum_{c \in \mathcal{Y}} P^2(c \mid x)

Core Conclusion:

The generalization error rate of the nearest neighbor classifier is at most twice the error rate of the Bayes optimal classifier. This provides strong theoretical justification for kNN's effectiveness when distances are meaningful.
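The factor of two comes from a short chain of inequalities. Writing c^* = \arg\max_{c \in \mathcal{Y}} P(c \mid x) for the Bayes optimal prediction:

\begin{aligned}
P(\text{err}) &\simeq 1 - \sum_{c \in \mathcal{Y}} P^2(c \mid x) \\
&\leq 1 - P^2(c^* \mid x) \\
&= \left(1 + P(c^* \mid x)\right)\left(1 - P(c^* \mid x)\right) \\
&\leq 2 \times \left(1 - P(c^* \mid x)\right)
\end{aligned}

The last quantity, 1 - P(c^* \mid x), is exactly the error rate of the Bayes optimal classifier, so the nearest neighbor error is at most twice it.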

Why This Matters for Dimensionality Reduction

The theoretical guarantee assumes that:

  • Training samples are dense (any test sample has nearby neighbors)
  • Distance computations are meaningful (closer samples are more similar)

In high-dimensional spaces, both assumptions break down:

  • Sample sparsity: High dimensions make samples sparse, so "nearby" neighbors may not exist
  • Distance distortion: All pairwise distances become similar, making neighbor selection unreliable

This is why dimensionality reduction is essential for kNN: it restores sample density and makes distances meaningful again.
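A small, illustrative experiment (not from the text) makes the distance-distortion point concrete: as the dimension grows, the farthest and nearest of 1,000 random points end up almost the same distance from a query point.

```python
import numpy as np

rng = np.random.default_rng(0)

# For growing dimension d, compare the farthest and nearest of 1,000 random
# points to a random query point. The ratio shrinking toward 1 means "near"
# and "far" neighbors become indistinguishable.
for d in [2, 10, 100, 1000]:
    points = rng.random((1000, d))
    query = rng.random(d)
    dists = np.linalg.norm(points - query, axis=1)
    print(f"d={d:4d}  max/min distance ratio = {dists.max() / dists.min():.2f}")
```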

Housing Price Prediction Example (kNN Regression)

Apply kNN regression to predict house prices based on similar properties:

Dataset: House Sales

ID | Sqft  | Bedrooms | Bathrooms | Year Built | Location Score | Price
1  | 2,400 | 4        | 2.5       | 2005       | 8.5/10         | $450,000
2  | 1,800 | 3        | 2         | 1998       | 7.2/10         | $320,000
3  | 3,200 | 5        | 3         | 2010       | 9.1/10         | $680,000
4  | 1,500 | 2        | 1.5       | 1995       | 6.8/10         | $240,000
5  | 2,800 | 4        | 2.5       | 2008       | 8.9/10         | $520,000
6  | 2,100 | 3        | 2         | 2002       | 7.5/10         | $380,000
7  | 3,600 | 5        | 3.5       | 2012       | 9.5/10         | $750,000
8  | 1,900 | 3        | 2         | 2000       | 7.0/10         | $310,000

Features: Square footage (continuous), Bedrooms/Bathrooms (discrete), Year Built (ordinal), Location Score (1-10), Price (continuous target)

kNN Prediction Example (k=3)

To predict the price of a new house with 2,600 sqft, 4 bedrooms, 2.5 bathrooms, built in 2007, and location score 8.7:

1. Compute distances to all training houses
2. Find 3 nearest neighbors:

• House #1: 2400 sqft, 4 bed, 2.5 bath, 2005, loc 8.5 → Price: $450,000

• House #5: 2800 sqft, 4 bed, 2.5 bath, 2008, loc 8.9 → Price: $520,000

• House #6: 2100 sqft, 3 bed, 2 bath, 2002, loc 7.5 → Price: $380,000

Predicted Price: $450,000

Average of 3 nearest neighbors: ($450k + $520k + $380k) / 3 = $450,000
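This prediction can be reproduced with scikit-learn's KNeighborsRegressor on the raw feature values; note the scaling caveat in the comments, since which houses count as "nearest" depends on how the features are weighted.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Features from the table above: sqft, bedrooms, bathrooms, year built, location score.
X = np.array([
    [2400, 4, 2.5, 2005, 8.5],
    [1800, 3, 2.0, 1998, 7.2],
    [3200, 5, 3.0, 2010, 9.1],
    [1500, 2, 1.5, 1995, 6.8],
    [2800, 4, 2.5, 2008, 8.9],
    [2100, 3, 2.0, 2002, 7.5],
    [3600, 5, 3.5, 2012, 9.5],
    [1900, 3, 2.0, 2000, 7.0],
])
y = np.array([450_000, 320_000, 680_000, 240_000, 520_000, 380_000, 750_000, 310_000])

reg = KNeighborsRegressor(n_neighbors=3)   # Euclidean distance by default
reg.fit(X, y)

new_house = np.array([[2600, 4, 2.5, 2007, 8.7]])
print(reg.predict(new_house))              # -> [450000.]  (houses #1, #5, #6)

# Caveat: on these raw features the distance is dominated by square footage,
# which happens to reproduce the hand-picked neighbors above. In practice you
# would standardize the features first (e.g., with StandardScaler), and the
# chosen neighbors -- and therefore the predicted price -- could then change.
```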

Connection to Dimensionality Reduction

kNN's performance directly depends on the quality of distance computations, making it a prime application scenario for dimensionality reduction techniques.

The Problem: High-Dimensional Data

In high-dimensional spaces, kNN faces critical challenges:

  • Sample sparsity: With d dimensions, samples become exponentially sparse. A "neighborhood" that contains k neighbors in 2D may be empty in 100D.
  • Distance concentration: All pairwise distances become similar, making it impossible to distinguish "near" from "far" neighbors.
  • Irrelevant features: High-dimensional data often contains many irrelevant features that add noise to distance computations.
  • Computational cost: Distance computation scales with dimensionality, making kNN slow for high-dimensional data.

The Solution: Dimensionality Reduction

Dimensionality reduction addresses these issues by:

  • Increasing sample density: Mapping to lower dimensions makes samples more densely distributed, ensuring nearby neighbors exist.
  • Preserving meaningful distances: Good dimensionality reduction methods preserve the structure that makes distances meaningful (e.g., PCA preserves variance, Isomap preserves geodesic distances).
  • Removing noise: Dimensionality reduction often filters out irrelevant dimensions, improving distance quality.
  • Accelerating computation: Lower dimensions mean faster distance calculations.

Practical Workflow

A common machine learning pipeline:

  1. High-dimensional data: Start with d-dimensional features (e.g., d=1000 for images)
  2. Dimensionality reduction: Apply PCA, Isomap, or other methods to reduce to d' dimensions (e.g., d'=50)
  3. kNN in low dimensions: Run kNN on the reduced-dimensional data (see the pipeline sketch after this list)
  4. Result: Better accuracy, faster computation, more interpretable results
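A sketch of this workflow with scikit-learn, chaining PCA and kNN in one pipeline; the dataset (scikit-learn's 64-feature digits images), the number of components, and k are illustrative choices, not prescribed by the text.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# 64-dimensional image features; PCA reduces them before kNN runs.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipeline = make_pipeline(
    StandardScaler(),                     # put features on a common scale
    PCA(n_components=20),                 # step 2: reduce d=64 to d'=20
    KNeighborsClassifier(n_neighbors=5),  # step 3: kNN in the reduced space
)
pipeline.fit(X_train, y_train)
print(pipeline.score(X_test, y_test))     # step 4: accuracy on held-out data
```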