
Multivariate Normal Distribution

The cornerstone of multivariate analysis: properties, linear transformations, marginal and conditional distributions

Learning Objectives
Understand the multivariate normal PDF and its parameters
Apply linear transformation properties
Derive marginal and conditional distributions
Compute MLEs for mean and covariance
Understand the Wishart distribution
Apply Hotelling's T² for hypothesis testing

Definition & PDF

Multivariate Normal PDF

A p-dimensional random vector \mathbf{X} follows a multivariate normal distribution N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}) if its PDF is:

f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)

Parameters

  • \boldsymbol{\mu}: mean vector (p \times 1)
  • \boldsymbol{\Sigma}: covariance matrix (p \times p, positive definite)

Notation

\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})

Bivariate Normal Special Case

For p = 2 with correlation \rho:

f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left(-\frac{Q}{2(1-\rho^2)}\right)

where Q = \left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2
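
As a quick numerical sanity check, the bivariate closed form can be compared against a general-purpose implementation of the p-dimensional PDF. This is a minimal sketch assuming NumPy and SciPy are available; the parameter values are arbitrary examples:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary example parameters (any valid values work)
mu1, mu2 = 1.0, -0.5
s1, s2, rho = 2.0, 1.5, 0.6
x1, x2 = 0.3, 0.8

# Bivariate closed form, using the quadratic Q from the text
z1, z2 = (x1 - mu1) / s1, (x2 - mu2) / s2
Q = z1**2 - 2 * rho * z1 * z2 + z2**2
f_closed = np.exp(-Q / (2 * (1 - rho**2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho**2))

# General p-dimensional PDF via SciPy
mean = np.array([mu1, mu2])
cov = np.array([[s1**2, rho * s1 * s2],
                [rho * s1 * s2, s2**2]])
f_general = multivariate_normal(mean=mean, cov=cov).pdf([x1, x2])

print(f_closed, f_general)  # the two values should agree
```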

Key Properties

Linear Transformation

Theorem

If \mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}) and \mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b} where \mathbf{A} is q \times p, then:

\mathbf{Y} \sim N_q(\mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)

Implication

Linear combinations of multivariate normal vectors are also multivariate normal.
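
A short Monte Carlo sketch of this property (assuming NumPy; the matrices A, b, μ, Σ below are arbitrary examples): the empirical mean and covariance of Y = AX + b should match Aμ + b and AΣAᵀ.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0, 3.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
A = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])   # q x p with q = 2, p = 3
b = np.array([0.0, 1.0])

# Theoretical moments of Y = AX + b
mu_Y = A @ mu + b
Sigma_Y = A @ Sigma @ A.T

# Monte Carlo check
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b
print(mu_Y, Y.mean(axis=0))              # should be close
print(Sigma_Y, np.cov(Y, rowvar=False))  # should be close
```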

Marginal Distributions

Partition \mathbf{X} = (\mathbf{X}_1^T, \mathbf{X}_2^T)^T with corresponding partitions of \boldsymbol{\mu} and \boldsymbol{\Sigma}:

\mathbf{X}_1 \sim N_{p_1}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})

The marginal distribution of any subset of components is multivariate normal with the corresponding mean and covariance submatrices.

Conditional Distributions

Conditional Distribution

\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim N_{p_1}(\boldsymbol{\mu}_{1|2}, \boldsymbol{\Sigma}_{1|2})

Conditional Mean

\boldsymbol{\mu}_{1|2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)

Conditional Covariance

\boldsymbol{\Sigma}_{1|2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}
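
The partitioned formulas translate directly into code. Below is a minimal sketch (assuming NumPy; the helper name conditional_mvn and the example numbers are illustrative, not from the text):

```python
import numpy as np

def conditional_mvn(mu, Sigma, idx1, idx2, x2):
    """Mean and covariance of X[idx1] | X[idx2] = x2 under N(mu, Sigma)."""
    mu1, mu2 = mu[idx1], mu[idx2]
    S11 = Sigma[np.ix_(idx1, idx1)]
    S12 = Sigma[np.ix_(idx1, idx2)]
    S22 = Sigma[np.ix_(idx2, idx2)]
    S22_inv = np.linalg.inv(S22)
    mu_cond = mu1 + S12 @ S22_inv @ (x2 - mu2)
    Sigma_cond = S11 - S12 @ S22_inv @ S12.T   # Sigma_21 = Sigma_12^T by symmetry
    return mu_cond, Sigma_cond

mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.3],
                  [0.2, 0.3, 1.5]])
m, S = conditional_mvn(mu, Sigma, idx1=[0], idx2=[1, 2], x2=np.array([1.5, 1.0]))
print(m, S)
```
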
Independence & Uncorrelatedness

Key Property

For multivariate normal: Uncorrelated ⟺ Independent

Important: This equivalence is unique to multivariate normal. For general distributions, uncorrelated does NOT imply independent.

Estimation & Sampling

Maximum Likelihood Estimation

MLE for Mean

\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i

MLE for Covariance

\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T

Note: The MLE for covariance divides by n (biased). The unbiased estimator divides by n-1.
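
A minimal sketch of both estimators on simulated data (assuming NumPy; the true parameters are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = np.array([0.0, 2.0])
Sigma_true = np.array([[1.0, 0.4], [0.4, 2.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=500)
n = X.shape[0]

mu_hat = X.mean(axis=0)                       # MLE of the mean
centered = X - mu_hat
Sigma_mle = centered.T @ centered / n         # MLE: divides by n (biased)
S_unbiased = centered.T @ centered / (n - 1)  # sample covariance: divides by n-1
print(mu_hat)
print(Sigma_mle)
print(S_unbiased)  # equals np.cov(X, rowvar=False)
```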

Wishart Distribution

The multivariate generalization of the chi-squared distribution:

(n-1)\mathbf{S} = \sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T \sim W_p(n-1, \boldsymbol{\Sigma})

Properties

  • E[\mathbf{W}] = m\boldsymbol{\Sigma} for W_p(m, \boldsymbol{\Sigma})
  • When p = 1: reduces to \sigma^2 \chi^2_m
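
The expectation property can be checked by simulation, using the definition of W_p(m, Σ) as a sum of m outer products of N_p(0, Σ) draws (a sketch assuming NumPy; m, Σ, and the replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
p, m, reps = 2, 10, 5000
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])

# W = sum of m outer products of N_p(0, Sigma) draws follows W_p(m, Sigma)
W_mean = np.zeros((p, p))
for _ in range(reps):
    Z = rng.multivariate_normal(np.zeros(p), Sigma, size=m)
    W_mean += Z.T @ Z / reps   # running average of W over replications

print(W_mean)      # should be close to m * Sigma
print(m * Sigma)
```
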
Hotelling's T² Distribution

T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu})^T\mathbf{S}^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu})

Relationship to F-distribution

\frac{n-p}{(n-1)p}T^2 \sim F_{p,\,n-p}
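
A minimal sketch of the one-sample test built from these two formulas (assuming NumPy and SciPy; the function name hotelling_t2_test and the simulated data are illustrative):

```python
import numpy as np
from scipy.stats import f

def hotelling_t2_test(X, mu0):
    """One-sample Hotelling's T^2 test of H0: mu = mu0."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)             # unbiased sample covariance
    diff = xbar - mu0
    T2 = n * diff @ np.linalg.solve(S, diff)
    F_stat = (n - p) / ((n - 1) * p) * T2   # ~ F_{p, n-p} under H0
    p_value = f.sf(F_stat, p, n - p)
    return T2, F_stat, p_value

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0.2, 0.1], np.eye(2), size=50)
print(hotelling_t2_test(X, np.zeros(2)))
```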

Moment Generating Function

MGF of Multivariate Normal

The moment generating function of \mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}) is:

M_{\mathbf{X}}(\mathbf{t}) = E[e^{\mathbf{t}^T\mathbf{X}}] = \exp\left(\mathbf{t}^T\boldsymbol{\mu} + \frac{1}{2}\mathbf{t}^T\boldsymbol{\Sigma}\mathbf{t}\right)

First Term

\mathbf{t}^T\boldsymbol{\mu} captures the mean

Second Term

\frac{1}{2}\mathbf{t}^T\boldsymbol{\Sigma}\mathbf{t} captures the covariance structure
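
The closed form can be verified against a Monte Carlo estimate of E[exp(tᵀX)] (a sketch assuming NumPy; t, μ, Σ are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.5, -1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])
t = np.array([0.2, 0.4])

# Closed-form MGF vs. Monte Carlo estimate of E[exp(t'X)]
mgf_closed = np.exp(t @ mu + 0.5 * t @ Sigma @ t)
X = rng.multivariate_normal(mu, Sigma, size=500_000)
mgf_mc = np.exp(X @ t).mean()
print(mgf_closed, mgf_mc)  # should agree to a few decimal places
```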

Characteristic Function

The characteristic function (always exists, unlike MGF):

\phi_{\mathbf{X}}(\mathbf{t}) = E[e^{i\mathbf{t}^T\mathbf{X}}] = \exp\left(i\mathbf{t}^T\boldsymbol{\mu} - \frac{1}{2}\mathbf{t}^T\boldsymbol{\Sigma}\mathbf{t}\right)

Key Property

The characteristic function uniquely determines the distribution. Two random vectors with the same characteristic function have the same distribution.

Quadratic Forms

Chi-Squared from Quadratic Forms

For \mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}) with \boldsymbol{\Sigma} positive definite:

(\mathbf{X} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{X} - \boldsymbol{\mu}) \sim \chi^2_p

Interpretation

This is the squared Mahalanobis distance from \mathbf{X} to \boldsymbol{\mu}. It follows a chi-squared distribution with p degrees of freedom.
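
A simulation sketch comparing empirical quantiles of the quadratic form to χ²_p quantiles (assuming NumPy and SciPy; the example Σ is arbitrary):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
p = 3
mu = np.zeros(p)
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
Sigma_inv = np.linalg.inv(Sigma)
# d2[i] = (x_i - mu)' Sigma^{-1} (x_i - mu), computed row-wise
d2 = np.einsum('ij,jk,ik->i', X - mu, Sigma_inv, X - mu)

# Empirical vs. theoretical chi-squared_p quantiles
for q in [0.5, 0.9, 0.99]:
    print(q, np.quantile(d2, q), chi2.ppf(q, df=p))
```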

Independence of Quadratic Forms

For normal \mathbf{X}, quadratic forms \mathbf{X}^T\mathbf{A}\mathbf{X} and \mathbf{X}^T\mathbf{B}\mathbf{X} are independent if:

\mathbf{A}\boldsymbol{\Sigma}\mathbf{B} = \mathbf{0}

Application

This theorem explains why \bar{\mathbf{x}} and \mathbf{S} are independent in normal sampling, which is crucial for deriving the distribution of Hotelling's T².

Checking Multivariate Normality

Graphical Methods

Q-Q Plots for Each Variable

Check marginal normality for each variable individually. Deviations suggest non-normality.

Chi-Square Plot

Plot sorted squared Mahalanobis distances against chi-square quantiles. Should be approximately linear.

Mahalanobis Distance Check

Compute d_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^T\mathbf{S}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}}) for each observation. Under normality, approximately 50% should fall below \chi^2_{p,0.5}, the median of the \chi^2_p distribution.
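
A minimal sketch of this diagnostic (assuming NumPy and SciPy; the helper name mahalanobis_check is illustrative):

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_check(X):
    """Fraction of observations with d_i^2 below the chi-squared_p median."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    diff = X - xbar
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(S), diff)
    return np.mean(d2 <= chi2.ppf(0.5, df=p))  # ~0.5 under normality

rng = np.random.default_rng(6)
X = rng.multivariate_normal([0, 0, 0], np.eye(3), size=1000)
print(mahalanobis_check(X))
```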

Formal Tests

Mardia's Test

Tests multivariate skewness and kurtosis. Large samples needed.
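
Mardia's statistics are simple enough to compute directly. Below is a minimal sketch of the asymptotic skewness and kurtosis tests (assuming NumPy and SciPy; the function name mardia_tests is illustrative, and small-sample corrections are omitted):

```python
import numpy as np
from scipy.stats import chi2, norm

def mardia_tests(X):
    """Mardia's multivariate skewness and kurtosis tests (asymptotic p-values)."""
    n, p = X.shape
    diff = X - X.mean(axis=0)
    S = diff.T @ diff / n                  # MLE covariance, as in Mardia's statistic
    G = diff @ np.linalg.inv(S) @ diff.T   # G[i,j] = (x_i - xbar)' S^{-1} (x_j - xbar)

    b1 = (G**3).sum() / n**2               # multivariate skewness
    skew_stat = n * b1 / 6                 # ~ chi2 with p(p+1)(p+2)/6 df under H0
    skew_df = p * (p + 1) * (p + 2) / 6
    p_skew = chi2.sf(skew_stat, skew_df)

    b2 = (np.diag(G)**2).mean()            # multivariate kurtosis
    kurt_stat = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)  # ~ N(0,1)
    p_kurt = 2 * norm.sf(abs(kurt_stat))
    return p_skew, p_kurt

rng = np.random.default_rng(7)
X = rng.multivariate_normal(np.zeros(2), np.eye(2), size=500)
print(mardia_tests(X))  # large p-values expected for normal data
```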

Henze-Zirkler Test

Based on empirical characteristic function. Good power properties.

Royston Test

Extension of Shapiro-Wilk to multivariate case.

Energy Test

Based on energy statistics. Consistent against all alternatives.

Caution

With large samples, tests may reject normality for minor departures that don't affect practical analyses. Use graphical methods alongside formal tests.

Large Sample Properties

Multivariate Central Limit Theorem

For i.i.d. random vectors with mean \boldsymbol{\mu} and covariance \boldsymbol{\Sigma}:

\sqrt{n}(\bar{\mathbf{X}}_n - \boldsymbol{\mu}) \xrightarrow{d} N_p(\mathbf{0}, \boldsymbol{\Sigma})

Implication

Even if the original data is not normal, the sample mean vector is approximately multivariate normal for large n. This justifies asymptotic inference procedures.
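
A quick simulation sketch with clearly non-normal (exponential) data (assuming NumPy and SciPy; the sample sizes are arbitrary): if √n(X̄ − μ) is approximately N_p(0, Σ), its squared Mahalanobis norm should match χ²_p quantiles.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
n, p, reps = 200, 2, 20_000

# Clearly non-normal marginals: independent Exponential(1) components
mu = np.ones(p)     # mean of Exponential(1)
                    # covariance is the identity (variance 1, independent)

means = rng.exponential(1.0, size=(reps, n, p)).mean(axis=1)
Z = np.sqrt(n) * (means - mu)

# If Z is approximately N_p(0, I), these quadratic forms are ~ chi2_p
d2 = np.einsum('ij,ij->i', Z, Z)   # Sigma = I, so Sigma^{-1} = I
print(np.quantile(d2, 0.95), chi2.ppf(0.95, df=p))  # should be close
```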

Asymptotic Distribution of Sample Covariance

For the sample covariance matrix \mathbf{S}:

Consistency

\mathbf{S} \xrightarrow{p} \boldsymbol{\Sigma}

Asymptotic Normality

\sqrt{n}(\text{vec}(\mathbf{S}) - \text{vec}(\boldsymbol{\Sigma})) is asymptotically normal

Gaussian Copula

Copula Definition

A copula separates the marginal distributions from the dependence structure:

F(x_1, \ldots, x_p) = C(F_1(x_1), \ldots, F_p(x_p))

Gaussian Copula

Derived from the multivariate normal with correlation matrix \mathbf{R}

Application

Model dependence with non-normal marginals

Tail Dependence

The Gaussian copula has zero tail dependence:

\lambda_U = \lambda_L = 0 \quad \text{for } |\rho| < 1

Implication: Extreme events in one variable don't increase probability of extremes in another. This was a key issue in the 2008 financial crisis when Gaussian copulas underestimated joint tail risk.
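
Sampling from a Gaussian copula follows the definition directly: draw from N_p(0, R), push each coordinate through Φ to get correlated uniforms, then apply inverse marginal CDFs. A minimal sketch (assuming NumPy and SciPy; R and the marginals are arbitrary examples):

```python
import numpy as np
from scipy.stats import norm, expon, gamma

rng = np.random.default_rng(9)
R = np.array([[1.0, 0.7], [0.7, 1.0]])   # copula correlation matrix

# 1. draw from N_2(0, R); 2. map to uniforms via Phi; 3. apply inverse marginal CDFs
Z = rng.multivariate_normal(np.zeros(2), R, size=10_000)
U = norm.cdf(Z)                          # each column ~ Uniform(0,1), correlated
X1 = expon.ppf(U[:, 0], scale=2.0)       # Exponential marginal
X2 = gamma.ppf(U[:, 1], a=3.0)           # Gamma marginal

# Dependence is inherited from R even though the marginals are non-normal
print(np.corrcoef(X1, X2)[0, 1])
```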

Simulation Methods

Cholesky Method

Generate \mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}):

  1. Compute the Cholesky decomposition: \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T
  2. Generate \mathbf{Z} \sim N_p(\mathbf{0}, \mathbf{I})
  3. Set \mathbf{X} = \boldsymbol{\mu} + \mathbf{L}\mathbf{Z}

Verification

E[\mathbf{X}] = \boldsymbol{\mu}, \quad \text{Cov}(\mathbf{X}) = \mathbf{L}\mathbf{I}\mathbf{L}^T = \boldsymbol{\Sigma}
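
A minimal sketch of the three steps (assuming NumPy; μ and Σ are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(10)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

L = np.linalg.cholesky(Sigma)          # Sigma = L L^T (L lower triangular)
Z = rng.standard_normal((100_000, 3))  # rows are N_3(0, I) draws
X = mu + Z @ L.T                       # each row is mu + L z

print(X.mean(axis=0))                  # ~ mu
print(np.cov(X, rowvar=False))         # ~ Sigma
```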

Eigenvalue Method

Alternative using spectral decomposition:

\mathbf{X} = \boldsymbol{\mu} + \mathbf{P}\boldsymbol{\Lambda}^{1/2}\mathbf{Z}

where

\boldsymbol{\Sigma} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^T

Advantage

Works even if \boldsymbol{\Sigma} is singular
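
A sketch of the eigenvalue method on a deliberately singular Σ, where the Cholesky factorization would fail (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(11)
mu = np.zeros(2)
# Singular covariance: rank 1, so np.linalg.cholesky would raise an error
Sigma = np.array([[1.0, 1.0], [1.0, 1.0]])

vals, P = np.linalg.eigh(Sigma)        # Sigma = P diag(vals) P^T
vals = np.clip(vals, 0.0, None)        # guard against tiny negative round-off
half = P @ np.diag(np.sqrt(vals))      # P Lambda^{1/2}

Z = rng.standard_normal((100_000, 2))
X = mu + Z @ half.T
print(np.cov(X, rowvar=False))         # ~ Sigma, despite singularity
```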

Maximum Likelihood Estimation

Log-Likelihood Function

For n i.i.d. observations from N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}):

\ell(\boldsymbol{\mu}, \boldsymbol{\Sigma}) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\boldsymbol{\Sigma}| - \frac{1}{2}\sum_{i=1}^n (\mathbf{x}_i - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{x}_i - \boldsymbol{\mu})
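
The formula can be evaluated directly and cross-checked against a library log-density (a sketch assuming NumPy and SciPy; the simulated data are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(12)
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])
X = rng.multivariate_normal(mu, Sigma, size=100)
n, p = X.shape

# Direct evaluation of the log-likelihood formula
Sigma_inv = np.linalg.inv(Sigma)
diff = X - mu
quad = np.einsum('ij,jk,ik->i', diff, Sigma_inv, diff).sum()
ll_direct = (-n * p / 2 * np.log(2 * np.pi)
             - n / 2 * np.log(np.linalg.det(Sigma))
             - quad / 2)

# Cross-check against SciPy
ll_scipy = multivariate_normal(mean=mu, cov=Sigma).logpdf(X).sum()
print(ll_direct, ll_scipy)  # should match
```
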
MLE Results

Mean MLE

\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}_i

Covariance MLE

\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^T

Note: The MLE for \boldsymbol{\Sigma} divides by n, not n-1. The unbiased estimator is \mathbf{S} = \frac{n}{n-1}\hat{\boldsymbol{\Sigma}}.

Asymptotic Properties

Consistency

\hat{\boldsymbol{\mu}} \xrightarrow{P} \boldsymbol{\mu}, \quad \hat{\boldsymbol{\Sigma}} \xrightarrow{P} \boldsymbol{\Sigma}

Asymptotic Normality

\sqrt{n}(\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}) \xrightarrow{d} N_p(\mathbf{0}, \boldsymbol{\Sigma})

Worked Examples

Example: Bivariate Normal Conditional

Given (X_1, X_2)^T \sim N_2\left((0,0)^T, \begin{pmatrix} 1 & 0.6 \\ 0.6 & 1 \end{pmatrix}\right), find the distribution of X_1 \mid X_2 = 2:

Conditional Mean

\mu_{1|2} = 0 + 0.6 \cdot 1^{-1} \cdot (2 - 0) = 1.2

Conditional Variance

\sigma^2_{1|2} = 1 - 0.6^2 = 0.64

Result: X_1 \mid X_2 = 2 \sim N(1.2, 0.64)
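
The arithmetic can be reproduced in a few lines (assuming NumPy):

```python
import numpy as np

# Numbers from the worked example above
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.6], [0.6, 1.0]])
x2 = 2.0

mu_cond = mu[0] + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu[1])
var_cond = Sigma[0, 0] - Sigma[0, 1]**2 / Sigma[1, 1]
print(mu_cond, var_cond)  # 1.2, 0.64
```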

Example: Mahalanobis Distance

Find percentage of data within Mahalanobis distance 2 for bivariate normal:

P(D^2 \leq 4) = P(\chi^2_2 \leq 4) \approx 0.865

Interpretation: About 86.5% of observations fall within the ellipse defined by Mahalanobis distance ≤ 2.
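
The probability comes straight from the χ²₂ CDF (assuming SciPy):

```python
from scipy.stats import chi2

# P(D^2 <= 4) for p = 2
print(chi2.cdf(4, df=2))  # ~0.8647
```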

Practice Quiz

1. The PDF of a multivariate normal distribution N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}) involves which term in the exponent?
2. If \mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}) and \mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}, then \mathbf{Y} follows:
3. For multivariate normal, if two components are uncorrelated (\sigma_{ij} = 0), they are also:
4. The marginal distribution of a subset of components from a multivariate normal is:
5. The MLE for the mean vector \boldsymbol{\mu} in multivariate normal is:
6. The sample covariance matrix (n-1)\mathbf{S} follows which distribution when sampling from N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})?
7. Hotelling's T^2 statistic is used to test hypotheses about:
8. The conditional distribution of \mathbf{X}_1 given \mathbf{X}_2 = \mathbf{x}_2 in a partitioned multivariate normal is:
9. The normalizing constant of the multivariate normal PDF involves:
10. For \mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma}), the quadratic form (\mathbf{X} - \boldsymbol{\mu})^T\boldsymbol{\Sigma}^{-1}(\mathbf{X} - \boldsymbol{\mu}) follows:

Frequently Asked Questions

Why is the multivariate normal so important?

It's the foundation of most multivariate methods. Many techniques (PCA, factor analysis, discriminant analysis) assume or rely on multivariate normality. The Central Limit Theorem also extends to multivariate settings.

How do I check for multivariate normality?

Use Q-Q plots for each variable, chi-square plots of Mahalanobis distances, or formal tests like Mardia's test for multivariate skewness and kurtosis.

What if my data isn't multivariate normal?

Consider transformations (e.g., log, Box-Cox), use robust methods, or employ distribution-free (nonparametric) alternatives. Many methods are robust to mild departures from normality.

What's the relationship between Wishart and chi-squared?

Wishart is the multivariate generalization. When p=1, the Wishart distribution reduces to a scaled chi-squared distribution.

When should I use Hotelling's T² vs. regular t-test?

Use Hotelling's T² when testing hypotheses about mean vectors (multiple variables simultaneously). It accounts for correlations between variables and controls overall Type I error.
