The cornerstone of multivariate analysis: properties, linear transformations, marginal and conditional distributions
A $p$-dimensional random vector $\mathbf{X}$ follows a multivariate normal distribution if its PDF is:

$$f(\mathbf{x}) = (2\pi)^{-p/2}\,|\boldsymbol{\Sigma}|^{-1/2} \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)$$

Parameters
$\boldsymbol{\mu} \in \mathbb{R}^p$ is the mean vector; $\boldsymbol{\Sigma}$ is the $p \times p$ positive definite covariance matrix.
Notation
$\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
For $p = 2$ with correlation $\rho$:

$$f(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left(-\frac{z}{2(1-\rho^2)}\right)$$

where

$$z = \frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}$$
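The correlation-form bivariate density and the general matrix-form density are the same function; a quick numerical cross-check (a sketch assuming numpy/scipy; the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Arbitrary illustrative parameters
mu1, mu2 = 1.0, -0.5
s1, s2, rho = 2.0, 1.5, 0.6
Sigma = np.array([[s1 ** 2,       rho * s1 * s2],
                  [rho * s1 * s2, s2 ** 2]])

x1, x2 = 0.3, 0.8
# Correlation-form density via the quadratic form z
z = ((x1 - mu1) ** 2 / s1 ** 2
     - 2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2)
     + (x2 - mu2) ** 2 / s2 ** 2)
pdf_biv = np.exp(-z / (2 * (1 - rho ** 2))) / (2 * np.pi * s1 * s2 * np.sqrt(1 - rho ** 2))

# General matrix-form density via scipy
pdf_mat = multivariate_normal(mean=[mu1, mu2], cov=Sigma).pdf([x1, x2])
```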
Theorem
If $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ and $\mathbf{Y} = \mathbf{A}\mathbf{X} + \mathbf{b}$ where $\mathbf{A}$ is $q \times p$, then:

$$\mathbf{Y} \sim N_q\!\left(\mathbf{A}\boldsymbol{\mu} + \mathbf{b},\; \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^\top\right)$$
Implication
Linear combinations of multivariate normal vectors are also multivariate normal.
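The theorem gives the exact mean and covariance of any linear transformation, which a simulation can confirm (a sketch assuming numpy; the matrices $\mathbf{A}$, $\mathbf{b}$, $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$ are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0, 0.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
A = np.array([[1.0, -1.0, 0.0],     # q x p with q = 2, p = 3
              [0.5,  0.5, 1.0]])
b = np.array([0.0, 3.0])

# Theorem: Y = A X + b ~ N_q(A mu + b, A Sigma A^T)
mean_Y = A @ mu + b
cov_Y = A @ Sigma @ A.T

# Monte Carlo check of the theoretical moments
X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ A.T + b
```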
Partition $\mathbf{X} = \begin{pmatrix} \mathbf{X}_1 \\ \mathbf{X}_2 \end{pmatrix}$ with corresponding partitions of $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$:

$$\boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_1 \\ \boldsymbol{\mu}_2 \end{pmatrix}, \qquad \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}$$

The marginal distribution of any subset of components is multivariate normal with the corresponding mean and covariance submatrices; in particular, $\mathbf{X}_1 \sim N(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_{11})$.
Conditional Distribution
$$\mathbf{X}_1 \mid \mathbf{X}_2 = \mathbf{x}_2 \sim N\!\left(\boldsymbol{\mu}_{1\mid 2},\; \boldsymbol{\Sigma}_{1\mid 2}\right)$$
Conditional Mean
$$\boldsymbol{\mu}_{1\mid 2} = \boldsymbol{\mu}_1 + \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(\mathbf{x}_2 - \boldsymbol{\mu}_2)$$
Conditional Covariance
$$\boldsymbol{\Sigma}_{1\mid 2} = \boldsymbol{\Sigma}_{11} - \boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}$$
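The conditional mean and covariance formulas translate directly into linear algebra (a sketch assuming numpy; the partition sizes and parameter values are illustrative):

```python
import numpy as np

# Partitioned parameters for X = (X1, X2) with dim(X1) = 1, dim(X2) = 2
mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[4.0, 1.2, 0.8],
                  [1.2, 2.0, 0.4],
                  [0.8, 0.4, 1.0]])

mu1, mu2 = mu[:1], mu[1:]
S11, S12 = Sigma[:1, :1], Sigma[:1, 1:]
S21, S22 = Sigma[1:, :1], Sigma[1:, 1:]

x2 = np.array([1.5, -0.5])  # observed value of X2

# Conditional mean: mu1 + S12 S22^{-1} (x2 - mu2)
cond_mean = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
# Conditional covariance (Schur complement): S11 - S12 S22^{-1} S21
cond_cov = S11 - S12 @ np.linalg.solve(S22, S21)
```

Conditioning never increases variance: the Schur complement is at most the marginal variance.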
Key Property
For jointly multivariate normal components: uncorrelated ($\boldsymbol{\Sigma}_{12} = \mathbf{0}$) ⟺ independent
Important: This equivalence is unique to multivariate normal. For general distributions, uncorrelated does NOT imply independent.
MLE for Mean
$$\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n} \mathbf{x}_i$$
MLE for Covariance
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^{n} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top$$
Note: The MLE for covariance divides by $n$ (biased). The unbiased estimator $\mathbf{S}$ divides by $n-1$.
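A minimal numerical illustration of the two covariance estimators and their relationship (a sketch assuming numpy; the simulated data are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 2.0]], size=500)
n = X.shape[0]

mu_hat = X.mean(axis=0)                 # MLE of the mean
centered = X - mu_hat
Sigma_mle = centered.T @ centered / n   # MLE: divides by n (biased)
S = centered.T @ centered / (n - 1)     # unbiased sample covariance
```

The two estimators differ only by the factor $n/(n-1)$, so they coincide asymptotically.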
The multivariate generalization of the chi-squared distribution: if $\mathbf{Z}_1, \dots, \mathbf{Z}_n \overset{iid}{\sim} N_p(\mathbf{0}, \boldsymbol{\Sigma})$, then $\mathbf{W} = \sum_{i=1}^{n} \mathbf{Z}_i\mathbf{Z}_i^\top \sim W_p(\boldsymbol{\Sigma}, n)$.
Properties
$E[\mathbf{W}] = n\boldsymbol{\Sigma}$; in normal sampling, $(n-1)\mathbf{S} \sim W_p(\boldsymbol{\Sigma}, n-1)$.
Relationship to F-distribution
Hotelling's $T^2$, built from a Wishart-distributed covariance estimate, converts to an exact F statistic: $\frac{n-p}{p(n-1)}\,T^2 \sim F_{p,\,n-p}$.
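A quick simulation check that Wishart draws have mean $n\boldsymbol{\Sigma}$ (a sketch assuming scipy's `wishart`; the scale matrix and degrees of freedom are arbitrary illustrative values):

```python
import numpy as np
from scipy.stats import wishart

Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
df = 10

# W ~ W_p(Sigma, df) has E[W] = df * Sigma
W_samples = wishart.rvs(df=df, scale=Sigma, size=20_000,
                        random_state=np.random.default_rng(2))
mean_W = W_samples.mean(axis=0)
```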
The moment generating function of $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is:

$$M_{\mathbf{X}}(\mathbf{t}) = \exp\!\left(\mathbf{t}^\top\boldsymbol{\mu} + \tfrac{1}{2}\mathbf{t}^\top\boldsymbol{\Sigma}\mathbf{t}\right)$$

First Term
$\mathbf{t}^\top\boldsymbol{\mu}$ captures the mean
Second Term
$\tfrac{1}{2}\mathbf{t}^\top\boldsymbol{\Sigma}\mathbf{t}$ captures the variance structure
The characteristic function (which always exists, unlike the MGF):

$$\varphi_{\mathbf{X}}(\mathbf{t}) = \exp\!\left(i\,\mathbf{t}^\top\boldsymbol{\mu} - \tfrac{1}{2}\mathbf{t}^\top\boldsymbol{\Sigma}\mathbf{t}\right)$$
Key Property
The characteristic function uniquely determines the distribution. Two random vectors with the same characteristic function have the same distribution.
For $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ with $\boldsymbol{\Sigma}$ positive definite:

$$(\mathbf{X}-\boldsymbol{\mu})^\top\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu}) \sim \chi^2_p$$

Interpretation
This is the squared Mahalanobis distance from $\mathbf{X}$ to $\boldsymbol{\mu}$. It follows a chi-squared distribution with $p$ degrees of freedom.
For normal $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$, quadratic forms $\mathbf{X}^\top\mathbf{A}\mathbf{X}$ and $\mathbf{X}^\top\mathbf{B}\mathbf{X}$ are independent if:

$$\mathbf{A}\boldsymbol{\Sigma}\mathbf{B} = \mathbf{0}$$
Application
This theorem explains why $\bar{\mathbf{x}}$ and $\mathbf{S}$ are independent in normal sampling, which is crucial for deriving the distribution of Hotelling's T².
Q-Q Plots for Each Variable
Check marginal normality for each variable individually. Deviations suggest non-normality.
Chi-Square Plot
Plot sorted squared Mahalanobis distances against chi-square quantiles. Should be approximately linear.
Mahalanobis Distance Check
Compute $d_i^2 = (\mathbf{x}_i - \bar{\mathbf{x}})^\top\mathbf{S}^{-1}(\mathbf{x}_i - \bar{\mathbf{x}})$ for each observation. Under normality, approximately 50% should be below $\chi^2_p(0.5)$, the median of the $\chi^2_p$ distribution.
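The chi-square plot and the 50%-below-the-median check can be sketched together (assuming numpy/scipy; the normal data here are simulated, so both diagnostics should pass):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
p, n = 3, 2000
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

xbar = X.mean(axis=0)
S = np.cov(X.T)
centered = X - xbar
# Squared Mahalanobis distance of each observation
d2 = np.einsum('ij,ij->i', centered, np.linalg.solve(S, centered.T).T)

# Under normality, ~50% of d2 should fall below the chi2_p median
frac_below_median = np.mean(d2 < chi2.ppf(0.5, df=p))

# Chi-square plot coordinates: sorted d2 vs chi2_p quantiles (should be ~linear)
d2_sorted = np.sort(d2)
q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)
```

Plotting `d2_sorted` against `q` and checking for a straight line is the graphical version of this test.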
Mardia's Test
Tests multivariate skewness and kurtosis. Large samples needed.
Henze-Zirkler Test
Based on empirical characteristic function. Good power properties.
Royston Test
Extension of Shapiro-Wilk to multivariate case.
Energy Test
Based on energy statistics. Consistent against all alternatives.
Caution
With large samples, tests may reject normality for minor departures that don't affect practical analyses. Use graphical methods alongside formal tests.
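As a concrete illustration of one formal test, Mardia's skewness and kurtosis statistics can be written out from their standard asymptotic approximations (a sketch assuming numpy/scipy; this illustrative implementation is not a substitute for a vetted package):

```python
import numpy as np
from scipy.stats import chi2, norm

def mardia_test(X):
    """Mardia's multivariate skewness/kurtosis tests (asymptotic p-values)."""
    n, p = X.shape
    centered = X - X.mean(axis=0)
    S_inv = np.linalg.inv(centered.T @ centered / n)  # MLE covariance, inverted
    G = centered @ S_inv @ centered.T                 # g_ij = (x_i - xbar)' S^-1 (x_j - xbar)
    b1 = (G ** 3).sum() / n ** 2                      # multivariate skewness
    b2 = (np.diag(G) ** 2).mean()                     # multivariate kurtosis
    skew_stat = n * b1 / 6                            # ~ chi2 with p(p+1)(p+2)/6 df
    skew_df = p * (p + 1) * (p + 2) / 6
    kurt_z = (b2 - p * (p + 2)) / np.sqrt(8 * p * (p + 2) / n)  # ~ N(0, 1)
    return chi2.sf(skew_stat, skew_df), 2 * norm.sf(abs(kurt_z))

rng = np.random.default_rng(4)
p_skew_norm, p_kurt_norm = mardia_test(
    rng.multivariate_normal([0, 0], [[1, 0.4], [0.4, 1]], size=1000))
p_skew_exp, p_kurt_exp = mardia_test(rng.exponential(size=(1000, 2)))
```

Heavily skewed (exponential) data should be rejected decisively, while normal data usually should not be.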
For i.i.d. random vectors $\mathbf{X}_1, \dots, \mathbf{X}_n$ with mean $\boldsymbol{\mu}$ and covariance $\boldsymbol{\Sigma}$:

$$\sqrt{n}\,(\bar{\mathbf{x}} - \boldsymbol{\mu}) \xrightarrow{d} N_p(\mathbf{0}, \boldsymbol{\Sigma})$$
Implication
Even if the original data is not normal, the sample mean vector is approximately multivariate normal for large n. This justifies asymptotic inference procedures.
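A small simulation illustrating the multivariate CLT with decidedly non-normal inputs (a sketch assuming numpy/scipy; exponential components with mean 1 and variance 1 are used so that $\boldsymbol{\mu} = \mathbf{1}$ and $\boldsymbol{\Sigma} = \mathbf{I}$):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
p, n, reps = 2, 200, 5000

# Non-normal data: i.i.d. exponential components (mean 1, variance 1)
xbar = rng.exponential(size=(reps, n, p)).mean(axis=1)
Z = np.sqrt(n) * (xbar - 1.0)      # sqrt(n)(xbar - mu), one row per replicate

# If Z ~ N_p(0, I) approximately, then ||Z||^2 ~ chi2_p approximately,
# so a 95% chi-square ellipse should capture ~95% of replicates
d2 = (Z ** 2).sum(axis=1)
coverage = np.mean(d2 <= chi2.ppf(0.95, df=p))
```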
For the sample covariance matrix $\mathbf{S}$:
Consistency
$\mathbf{S} \xrightarrow{p} \boldsymbol{\Sigma}$ as $n \to \infty$
Asymptotic Normality
$\sqrt{n}\,\bigl(\operatorname{vec}(\mathbf{S}) - \operatorname{vec}(\boldsymbol{\Sigma})\bigr)$ is asymptotically normal
A copula separates the marginal distributions from the dependence structure (Sklar's theorem):

$$F(x_1, \dots, x_p) = C\!\left(F_1(x_1), \dots, F_p(x_p)\right)$$

Gaussian Copula
Derived from the multivariate normal with correlation matrix $\mathbf{R}$: $C_{\mathbf{R}}(u_1, \dots, u_p) = \Phi_{\mathbf{R}}\!\left(\Phi^{-1}(u_1), \dots, \Phi^{-1}(u_p)\right)$
Application
Model dependence with non-normal marginals
The Gaussian copula has zero tail dependence: $\lambda_U = \lambda_L = 0$ whenever $|\rho| < 1$.
Implication: Extreme events in one variable don't increase probability of extremes in another. This was a key issue in the 2008 financial crisis when Gaussian copulas underestimated joint tail risk.
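Sampling from a Gaussian copula with non-normal marginals is a three-step recipe: draw correlated normals, push them through the normal CDF to get correlated uniforms, then apply the inverse CDFs of the desired marginals (a sketch assuming numpy/scipy; the exponential and uniform marginals are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm, expon

rng = np.random.default_rng(6)
rho = 0.7
R = np.array([[1.0, rho],
              [rho, 1.0]])

# 1. Draw from the multivariate normal with correlation matrix R
Z = rng.multivariate_normal(np.zeros(2), R, size=100_000)
# 2. Push through the standard normal CDF -> uniform marginals (the copula)
U = norm.cdf(Z)
# 3. Apply inverse CDFs of the desired (non-normal) marginals
X1 = expon.ppf(U[:, 0])   # exponential(1) marginal
X2 = U[:, 1]              # uniform(0, 1) marginal

dependence = np.corrcoef(U[:, 0], U[:, 1])[0, 1]
```

The resulting pair keeps the Gaussian dependence structure while the marginals are exponential and uniform.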
Generate $\mathbf{X} \sim N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$:
1. Compute the Cholesky factorization $\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^\top$.
2. Generate $\mathbf{Z} \sim N_p(\mathbf{0}, \mathbf{I})$.
3. Set $\mathbf{X} = \boldsymbol{\mu} + \mathbf{L}\mathbf{Z}$.
Verification
$E[\mathbf{X}] = \boldsymbol{\mu}$, $\operatorname{Cov}(\mathbf{X}) = \mathbf{L}\mathbf{L}^\top = \boldsymbol{\Sigma}$
Alternative using spectral decomposition:

$$\mathbf{X} = \boldsymbol{\mu} + \mathbf{P}\boldsymbol{\Lambda}^{1/2}\mathbf{Z}$$

where $\boldsymbol{\Sigma} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}^\top$ with orthogonal $\mathbf{P}$ and diagonal $\boldsymbol{\Lambda}$
Advantage
Works even if $\boldsymbol{\Sigma}$ is singular (positive semidefinite), where the Cholesky factorization may fail
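Both constructions can be sketched side by side (assuming numpy; $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are illustrative values):

```python
import numpy as np

rng = np.random.default_rng(7)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[ 2.0,  0.8, -0.4],
                  [ 0.8,  1.5,  0.3],
                  [-0.4,  0.3,  1.0]])
n = 100_000

Z = rng.standard_normal((n, 3))

# Cholesky route: Sigma = L L^T, X = mu + L Z
L = np.linalg.cholesky(Sigma)
X_chol = mu + Z @ L.T

# Spectral route: Sigma = P Lambda P^T, X = mu + P Lambda^{1/2} Z
# (clipping guards tiny negative eigenvalues; works for singular PSD Sigma too)
lam, P = np.linalg.eigh(Sigma)
A = P @ np.diag(np.sqrt(np.clip(lam, 0.0, None)))
X_spec = mu + Z @ A.T
```

Both factorizations satisfy $\mathbf{L}\mathbf{L}^\top = \mathbf{A}\mathbf{A}^\top = \boldsymbol{\Sigma}$, so the two sample sets have the same distribution.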
For $n$ i.i.d. observations from $N_p(\boldsymbol{\mu}, \boldsymbol{\Sigma})$:
Mean MLE
$$\hat{\boldsymbol{\mu}} = \bar{\mathbf{x}} = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i$$
Covariance MLE
$$\hat{\boldsymbol{\Sigma}} = \frac{1}{n}\sum_{i=1}^{n}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^\top$$
Note: The MLE for $\boldsymbol{\Sigma}$ divides by $n$, not $n-1$. The unbiased estimator $\mathbf{S}$ uses $n-1$.
Consistency
$\hat{\boldsymbol{\mu}} \xrightarrow{p} \boldsymbol{\mu}$, $\hat{\boldsymbol{\Sigma}} \xrightarrow{p} \boldsymbol{\Sigma}$
Asymptotic Normality
$$\sqrt{n}\,(\hat{\boldsymbol{\mu}} - \boldsymbol{\mu}) \xrightarrow{d} N_p(\mathbf{0}, \boldsymbol{\Sigma})$$
Given a bivariate normal $(X_1, X_2)$, find the distribution of $X_1 \mid X_2 = x_2$:
Conditional Mean
$$\mu_{1\mid 2} = \mu_1 + \rho\,\frac{\sigma_1}{\sigma_2}(x_2 - \mu_2)$$
Conditional Variance
$$\sigma^2_{1\mid 2} = \sigma_1^2(1 - \rho^2)$$
Result: $X_1 \mid X_2 = x_2 \sim N\!\left(\mu_{1\mid 2},\; \sigma^2_{1\mid 2}\right)$
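Since the example's original numeric values are not given here, a sketch with assumed illustrative parameters checks the bivariate formulas against the general partitioned ones (numpy assumed; all numbers below are hypothetical):

```python
import numpy as np

# Hypothetical parameters (the original example's values are not shown)
mu1, mu2 = 2.0, 5.0
s1, s2, rho = 3.0, 4.0, 0.8
x2 = 7.0  # observed value of X2

# Bivariate shortcut formulas
cond_mean = mu1 + rho * (s1 / s2) * (x2 - mu2)   # 2 + 0.8*(3/4)*2 = 3.2
cond_var = s1 ** 2 * (1 - rho ** 2)              # 9*(1 - 0.64) = 3.24

# General partitioned formulas for comparison
Sigma = np.array([[s1 ** 2,       rho * s1 * s2],
                  [rho * s1 * s2, s2 ** 2]])
cond_mean_gen = mu1 + Sigma[0, 1] / Sigma[1, 1] * (x2 - mu2)
cond_var_gen = Sigma[0, 0] - Sigma[0, 1] ** 2 / Sigma[1, 1]
```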
Find the percentage of data within Mahalanobis distance 2 for a bivariate normal:

$$P(D^2 \le 2^2) = P(\chi^2_2 \le 4) = 1 - e^{-2} \approx 0.8647$$
Interpretation: About 86.5% of observations fall within the ellipse defined by Mahalanobis distance ≤ 2.
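The probability can be verified numerically (a sketch assuming scipy): the $\chi^2_2$ distribution is exponential with mean 2, so its CDF at 4 is $1 - e^{-2}$.

```python
import numpy as np
from scipy.stats import chi2

# For a bivariate normal, D^2 ~ chi2 with 2 df (exponential with mean 2):
# P(D^2 <= 4) = 1 - exp(-4/2)
prob = chi2.cdf(4.0, df=2)
closed_form = 1.0 - np.exp(-2.0)
```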
It's the foundation of most multivariate methods. Many techniques (PCA, factor analysis, discriminant analysis) assume or rely on multivariate normality. The Central Limit Theorem also extends to multivariate settings.
Use Q-Q plots for each variable, chi-square plots of Mahalanobis distances, or formal tests like Mardia's test for multivariate skewness and kurtosis.
Consider transformations (e.g., log, Box-Cox), use robust methods, or employ distribution-free (nonparametric) alternatives. Many methods are robust to mild departures from normality.
Wishart is the multivariate generalization. When p=1, the Wishart distribution reduces to a scaled chi-squared distribution.
Use Hotelling's T² when testing hypotheses about mean vectors (multiple variables simultaneously). It accounts for correlations between variables and controls overall Type I error.