MathIsimple
Foundation Topic
4-6 Hours

Multivariate Statistics Fundamentals

Master the foundational concepts of multivariate analysis: random vectors, covariance matrices, and sample statistics

Learning Objectives
Understand multivariate data representation and data matrices
Review essential matrix algebra operations
Define random vectors, mean vectors, and covariance matrices
Compute sample statistics for multivariate data
Understand properties of linear combinations
Calculate and interpret Mahalanobis distance

Introduction to Multivariate Data

What is Multivariate Analysis?

Multivariate analysis is the statistical analysis of data involving multiple variables measured on the same observations. It considers relationships among three or more variables simultaneously.

Data Reduction

Simplify complex data (PCA, Factor Analysis)

Classification

Group observations (Discriminant, Cluster Analysis)

Data Matrix Representation

Multivariate data is organized in a data matrix where rows represent observations and columns represent variables:

\mathbf{X}_{n \times p} = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix}

Dimensions

n = observations (rows), p = variables (columns)

Row Vector

\mathbf{x}_i^T = (x_{i1}, x_{i2}, \ldots, x_{ip})
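A minimal NumPy sketch of this layout, using a small made-up dataset (the numbers are illustrative, not from this page):

```python
import numpy as np

# Hypothetical data matrix X with n = 4 observations (rows) and p = 3 variables (columns).
X = np.array([
    [5.1, 3.5, 1.4],
    [4.9, 3.0, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
])

n, p = X.shape      # n = 4, p = 3
x1 = X[0, :]        # first row vector x_1^T = (x_11, x_12, x_13)
print(n, p, x1)
```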

Matrix Algebra Review

Basic Matrix Operations

Transpose

(\mathbf{A}^T)_{ij} = a_{ji}

Trace

\text{tr}(\mathbf{A}) = \sum_{i=1}^n a_{ii}
Eigenvalues and Eigenvectors
\mathbf{A}\mathbf{e} = \lambda \mathbf{e}

If \mathbf{e} is non-zero and satisfies this equation, then \lambda is an eigenvalue and \mathbf{e} is the corresponding eigenvector.

Key Properties
  • Sum of eigenvalues = trace: \sum_i \lambda_i = \text{tr}(\mathbf{A})
  • Product of eigenvalues = determinant: \prod_i \lambda_i = |\mathbf{A}|
Positive Definite Matrices

A symmetric matrix \mathbf{A} is positive definite if:

\mathbf{x}^T\mathbf{A}\mathbf{x} > 0 \quad \text{for all } \mathbf{x} \neq \mathbf{0}

Equivalent condition: all eigenvalues \lambda_i > 0
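A short NumPy check of these facts on a small illustrative symmetric matrix, assuming eigenvalues are computed with np.linalg.eigh:

```python
import numpy as np

# Illustrative symmetric matrix.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eigh(A)   # eigh is intended for symmetric matrices

# Key properties: sum of eigenvalues = trace, product of eigenvalues = determinant.
print(np.isclose(eigvals.sum(), np.trace(A)))        # True
print(np.isclose(eigvals.prod(), np.linalg.det(A)))  # True

# Positive definite <=> all eigenvalues > 0.
print(np.all(eigvals > 0))                           # True for this A
```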

Random Vectors & Covariance

Mean Vector
\boldsymbol{\mu} = E[\mathbf{X}] = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{pmatrix}

Linear Transformation

E[\mathbf{A}\mathbf{X} + \mathbf{b}] = \mathbf{A}\boldsymbol{\mu} + \mathbf{b}
Covariance Matrix
\boldsymbol{\Sigma} = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]
\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{pmatrix}

Properties

  • Symmetric: \boldsymbol{\Sigma} = \boldsymbol{\Sigma}^T
  • Positive semi-definite

Transformation

\text{Cov}(\mathbf{A}\mathbf{X}) = \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T
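A quick sketch, with an illustrative \boldsymbol{\Sigma} and \mathbf{A}, that computes \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T and compares it with the sample covariance of simulated transformed data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population covariance and transformation matrix.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])

cov_AX = A @ Sigma @ A.T          # theoretical Cov(AX) = A Sigma A^T
print(cov_AX)

# Empirical check: simulate X with covariance Sigma, transform, and compare.
X = rng.multivariate_normal(mean=np.zeros(2), cov=Sigma, size=100_000)
print(np.cov(X @ A.T, rowvar=False))   # close to cov_AX
```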

Sample Statistics

Sample Mean Vector & Covariance Matrix

Sample Mean Vector

\bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i

Sample Covariance Matrix

\mathbf{S} = \frac{1}{n-1}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T

Note: Division by n-1 (Bessel's correction) gives an unbiased estimator: E[\mathbf{S}] = \boldsymbol{\Sigma}
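A small NumPy sketch, using a hypothetical data matrix, that computes \bar{\mathbf{x}} and \mathbf{S} directly from the formulas above and checks them against np.cov:

```python
import numpy as np

# Hypothetical data matrix: n = 5 observations, p = 3 variables.
X = np.array([
    [2.0, 4.1, 0.5],
    [1.8, 3.9, 0.7],
    [2.4, 4.6, 0.4],
    [2.1, 4.0, 0.6],
    [1.9, 4.2, 0.5],
])
n = X.shape[0]

x_bar = X.mean(axis=0)           # sample mean vector
Xc = X - x_bar                   # centered data
S = Xc.T @ Xc / (n - 1)          # sample covariance with Bessel's correction

print(np.allclose(S, np.cov(X, rowvar=False)))  # matches NumPy's built-in (ddof = 1)
```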

Mahalanobis Distance

The Mahalanobis distance measures distance accounting for correlations between variables:

d^2(\mathbf{x}, \boldsymbol{\mu}) = (\mathbf{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})

When \boldsymbol{\Sigma} = \mathbf{I}

Reduces to squared Euclidean distance

Applications

Outlier detection, classification, clustering
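A minimal sketch of the squared Mahalanobis distance with an illustrative \boldsymbol{\mu} and \boldsymbol{\Sigma}; the helper name mahalanobis_sq is just a label for this example:

```python
import numpy as np

# Illustrative mean vector and covariance matrix.
mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 1.2],
                  [1.2, 1.0]])

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance d^2 = (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return diff @ np.linalg.solve(Sigma, diff)   # solve avoids forming the explicit inverse

x = np.array([2.0, 1.0])
print(mahalanobis_sq(x, mu, Sigma))

# With Sigma = I the result reduces to the squared Euclidean distance.
print(mahalanobis_sq(x, mu, np.eye(2)), np.sum((x - mu) ** 2))
```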

Spectral Decomposition

Spectral Theorem for Symmetric Matrices

Any symmetric matrix \mathbf{A} can be decomposed as:

\mathbf{A} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^T = \sum_{i=1}^{p} \lambda_i \mathbf{e}_i\mathbf{e}_i^T

\mathbf{P} is orthogonal

\mathbf{P}^T\mathbf{P} = \mathbf{P}\mathbf{P}^T = \mathbf{I}

\boldsymbol{\Lambda} is diagonal

\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \ldots, \lambda_p)

Square Root of Positive Definite Matrix

\mathbf{A}^{1/2} = \mathbf{P}\boldsymbol{\Lambda}^{1/2}\mathbf{P}^T \quad \text{where} \quad \boldsymbol{\Lambda}^{1/2} = \text{diag}(\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_p})
Powers and Inverse via Spectral Decomposition

Matrix Powers

\mathbf{A}^k = \mathbf{P}\boldsymbol{\Lambda}^k\mathbf{P}^T

Matrix Inverse

\mathbf{A}^{-1} = \mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}^T

Determinant via Eigenvalues

|\mathbf{A}| = \prod_{i=1}^{p} \lambda_i

The determinant equals the product of all eigenvalues.
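A short NumPy illustration of these spectral-decomposition identities on an illustrative symmetric positive definite matrix:

```python
import numpy as np

# Illustrative symmetric positive definite matrix.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

lam, P = np.linalg.eigh(A)          # A = P diag(lam) P^T

# Reconstruction, square root, and inverse via the spectrum.
A_rebuilt = P @ np.diag(lam) @ P.T
A_sqrt    = P @ np.diag(np.sqrt(lam)) @ P.T
A_inv     = P @ np.diag(1.0 / lam) @ P.T

print(np.allclose(A_rebuilt, A))                   # True
print(np.allclose(A_sqrt @ A_sqrt, A))             # A^{1/2} A^{1/2} = A
print(np.allclose(A_inv, np.linalg.inv(A)))        # True
print(np.isclose(lam.prod(), np.linalg.det(A)))    # determinant = product of eigenvalues
```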

Partitioned Matrices

Block Matrix Representation

A covariance matrix can be partitioned to analyze subsets of variables:

\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}

\boldsymbol{\Sigma}_{11}: p_1 \times p_1

Covariance within first group

\boldsymbol{\Sigma}_{12}: p_1 \times p_2

Cross-covariance between groups

Symmetry Property

\boldsymbol{\Sigma}_{21} = \boldsymbol{\Sigma}_{12}^T

Inverse of Partitioned Matrix

The inverse of a partitioned positive definite matrix involves the Schur complement:

Schur Complement

\boldsymbol{\Sigma}_{11 \cdot 2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}

Interpretation

For a multivariate normal distribution, the Schur complement \boldsymbol{\Sigma}_{11 \cdot 2} is the conditional covariance matrix of \mathbf{X}^{(1)} given \mathbf{X}^{(2)}.
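A small sketch computing the Schur complement for an illustrative 3×3 covariance matrix partitioned with p_1 = 1 and p_2 = 2:

```python
import numpy as np

# Illustrative covariance matrix, partitioned as p1 = 1, p2 = 2.
Sigma = np.array([[2.0, 0.8, 0.3],
                  [0.8, 1.5, 0.4],
                  [0.3, 0.4, 1.0]])
p1 = 1
S11 = Sigma[:p1, :p1]
S12 = Sigma[:p1, p1:]
S21 = Sigma[p1:, :p1]
S22 = Sigma[p1:, p1:]

# Schur complement: Sigma_{11.2} = Sigma_11 - Sigma_12 Sigma_22^{-1} Sigma_21
schur = S11 - S12 @ np.linalg.solve(S22, S21)
print(schur)   # conditional covariance of X^(1) given X^(2) under normality
```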

Generalized Variance

Measures of Total Variability

Total Variance

\text{tr}(\boldsymbol{\Sigma}) = \sum_{i=1}^{p} \sigma_{ii} = \sum_{i=1}^{p} \lambda_i

Sum of all variances

Generalized Variance

|\boldsymbol{\Sigma}| = \prod_{i=1}^{p} \lambda_i

Product of eigenvalues

Geometric Interpretation

The generalized variance |\boldsymbol{\Sigma}| is proportional to the squared volume of the concentration ellipsoid. It measures how "spread out" the data is in all directions simultaneously.

When |\boldsymbol{\Sigma}| = 0

The variables are linearly dependent (at least one eigenvalue is zero). The data lies in a lower-dimensional subspace.

Concentration Ellipsoid

For multivariate data, the concentration ellipsoid generalizes the concept of confidence intervals:

\{\mathbf{x}: (\mathbf{x} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}) \leq c^2\}

Principal Axes

Directions given by eigenvectors \mathbf{e}_i

Semi-axis Lengths

Proportional to c\sqrt{\lambda_i}
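A brief NumPy sketch, on an illustrative 2×2 covariance matrix, relating total variance, generalized variance, and the ellipsoid's semi-axis lengths:

```python
import numpy as np

# Illustrative covariance matrix.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

lam, E = np.linalg.eigh(Sigma)

total_variance = np.trace(Sigma)              # = sum of eigenvalues
generalized_variance = np.linalg.det(Sigma)   # = product of eigenvalues
print(total_variance, lam.sum())
print(generalized_variance, lam.prod())

# Concentration ellipsoid for a chosen c: axes along the eigenvectors (columns of E),
# semi-axis lengths proportional to c * sqrt(lambda_i).
c = 2.0
print(c * np.sqrt(lam))
```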

Linear Combinations

Properties of Linear Combinations

For Y = \mathbf{a}^T\mathbf{X} = a_1X_1 + a_2X_2 + \cdots + a_pX_p:

Mean

E[Y] = \mathbf{a}^T\boldsymbol{\mu}

Variance

\text{Var}(Y) = \mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a}

Multiple Linear Combinations

For \mathbf{Y} = \mathbf{A}\mathbf{X}:

\text{Cov}(\mathbf{Y}) = \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T
Covariance Between Linear Combinations

For two linear combinations Y_1 = \mathbf{a}^T\mathbf{X} and Y_2 = \mathbf{b}^T\mathbf{X}:

\text{Cov}(Y_1, Y_2) = \mathbf{a}^T\boldsymbol{\Sigma}\mathbf{b}

Special Case: Uncorrelated Combinations

Y_1 and Y_2 are uncorrelated if \mathbf{a}^T\boldsymbol{\Sigma}\mathbf{b} = 0
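A compact sketch of these formulas with an illustrative mean vector, covariance matrix, and coefficient vectors \mathbf{a}, \mathbf{b}:

```python
import numpy as np

# Illustrative mean vector and covariance matrix for X = (X1, X2, X3)^T.
mu = np.array([1.0, 2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.2],
                  [0.6, 1.0, 0.3],
                  [0.2, 0.3, 1.5]])

a = np.array([1.0, -1.0, 0.0])
b = np.array([0.0,  1.0, 1.0])

mean_Y1   = a @ mu            # E[a^T X]
var_Y1    = a @ Sigma @ a     # Var(a^T X)
cov_Y1_Y2 = a @ Sigma @ b     # Cov(a^T X, b^T X); zero would mean uncorrelated
print(mean_Y1, var_Y1, cov_Y1_Y2)

# Several combinations at once: Y = A X, Cov(Y) = A Sigma A^T.
A = np.vstack([a, b])
print(A @ Sigma @ A.T)
```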

Maximum Variance Direction

Among all unit-length linear combinations Y = \mathbf{a}^T\mathbf{X} with \|\mathbf{a}\| = 1:

\max_{\|\mathbf{a}\|=1} \text{Var}(\mathbf{a}^T\mathbf{X}) = \lambda_1

achieved when \mathbf{a} = \mathbf{e}_1 (the eigenvector of the largest eigenvalue \lambda_1)

Foundation for PCA

This result is the theoretical basis for Principal Component Analysis. The first PC is the direction of maximum variance.
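A quick numerical check, on an illustrative covariance matrix, that the leading eigenvector attains the maximum variance \lambda_1 among unit vectors:

```python
import numpy as np

# Illustrative covariance matrix.
Sigma = np.array([[4.0, 1.5],
                  [1.5, 1.0]])

lam, E = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
e1 = E[:, -1]                    # eigenvector of the largest eigenvalue
lambda1 = lam[-1]

# Variance of a^T X for a = e1 equals lambda_1.
print(np.isclose(e1 @ Sigma @ e1, lambda1))

# Any other unit vector gives variance at most lambda_1.
rng = np.random.default_rng(1)
a = rng.normal(size=2)
a /= np.linalg.norm(a)
print(a @ Sigma @ a <= lambda1 + 1e-12)
```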

Random Vectors and Sampling

Random Sampling

Sample of n observations from p-variate distribution:

\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_n \sim \text{i.i.d. } F

Sample mean vector: \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i

Practice Quiz

Test your understanding with 10 multiple-choice questions

1. Given two matrices \mathbf{A}_{2 \times 3} and \mathbf{B}_{3 \times 4}, what is the dimension of the product \mathbf{AB}?
2. For a random vector \mathbf{X} = (X_1, X_2, X_3)^T, the covariance matrix \boldsymbol{\Sigma} has dimension:
3. The sample mean vector \bar{\mathbf{x}} for a dataset with n observations is computed as:
4. A matrix \mathbf{A} is positive definite if and only if:
5. The covariance matrix \boldsymbol{\Sigma} of a random vector satisfies which property?
6. For a linear combination Y = \mathbf{a}^T\mathbf{X} where \mathbf{X} has covariance \boldsymbol{\Sigma}, the variance of Y is:
7. The squared Mahalanobis distance between point \mathbf{x} and mean \boldsymbol{\mu} is:
8. The correlation matrix \mathbf{R} is related to the covariance matrix \boldsymbol{\Sigma} by:
9. For a centered data matrix \mathbf{X}_{n \times p}, the sample covariance matrix is:
10. If the eigenvalues of a 3 \times 3 covariance matrix are \lambda_1 = 5, \lambda_2 = 3, \lambda_3 = 2, the total variance is:

Frequently Asked Questions

What is the difference between covariance and correlation matrices?

The covariance matrix contains raw covariances between variables, which depend on the scales of measurement. The correlation matrix standardizes these to values between -1 and 1, making it easier to compare relationships across variables with different scales. The correlation matrix has 1s on its diagonal.

Why is the Mahalanobis distance important?

Unlike Euclidean distance, Mahalanobis distance accounts for correlations between variables and scales by the variance structure. It measures how many standard deviations away a point is from the center, making it useful for outlier detection and classification in multivariate data.

Why must covariance matrices be positive semi-definite?

The covariance matrix must be positive semi-definite because variances cannot be negative. For any linear combination of variables, the variance must be non-negative: \text{Var}(\mathbf{a}^T\mathbf{X}) = \mathbf{a}^T\boldsymbol{\Sigma}\mathbf{a} \geq 0. This property ensures the covariance matrix represents a valid probability distribution.

What do eigenvalues of a covariance matrix represent?

The eigenvalues of a covariance matrix represent the variance along the principal axes of the data distribution. Larger eigenvalues indicate directions with more variability. The sum of eigenvalues equals the total variance, and this concept is fundamental to Principal Component Analysis (PCA).

When should I use n vs n-1 in sample statistics?

Use n-1 (Bessel's correction) when computing the sample covariance matrix to get an unbiased estimator of the population covariance. Use n when you want the maximum likelihood estimate or when working with the entire population. Most statistical software uses n-1 by default.
