Master the foundational concepts of multivariate analysis: random vectors, covariance matrices, and sample statistics
Multivariate analysis is the statistical analysis of data involving multiple variables measured on the same observations. It examines relationships among several variables simultaneously rather than one variable at a time.
Data Reduction
Simplify complex data (PCA, Factor Analysis)
Classification
Group observations (Discriminant, Cluster Analysis)
Multivariate data is organized in a data matrix where rows represent observations and columns represent variables:
Dimensions
n = observations (rows), p = variables (columns)
Row Vector
The i-th row xᵢ' of X contains the p measurements for observation i.
Transpose
X' is the p × n matrix obtained by interchanging rows and columns.
Trace
tr(A) is the sum of the diagonal elements of a square matrix A.
Ax = λx. If x is non-zero and satisfies this equation, then λ is an eigenvalue of A and x is the corresponding eigenvector.
A symmetric matrix A is positive definite if x'Ax > 0 for every non-zero vector x.
Equivalent: All eigenvalues λᵢ > 0
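A minimal NumPy sketch of both definitions, using a small made-up symmetric matrix: the eigenpair relation Ax = λx and the eigenvalue test for positive definiteness.

```python
import numpy as np

# A small symmetric matrix (hypothetical example values)
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

# Eigendecomposition for symmetric matrices; columns of vecs are eigenvectors
vals, vecs = np.linalg.eigh(A)

# Verify A x = lambda x for the first eigenpair
x, lam = vecs[:, 0], vals[0]
print(np.allclose(A @ x, lam * x))   # True

# A symmetric matrix is positive definite iff all eigenvalues are > 0
print(np.all(vals > 0))              # True
```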
Linear Transformation
For a random vector X with mean vector μ and covariance matrix Σ, consider Y = CX.
Properties
E(Y) = Cμ
Transformation
Cov(Y) = CΣC'
Sample Mean Vector
x̄ = (1/n) Σᵢ xᵢ
Sample Covariance Matrix
S = (1/(n − 1)) Σᵢ (xᵢ − x̄)(xᵢ − x̄)'
Note: Division by n − 1 (Bessel's correction) gives an unbiased estimator: E(S) = Σ.
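The sample statistics above can be sketched in NumPy with made-up data; `np.cov` applies Bessel's correction (`ddof=1`) by default, so it should match the hand-computed S.

```python
import numpy as np

# Data matrix: n = 4 observations (rows), p = 2 variables (columns); made-up numbers
X = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0],
              [4.0, 9.0]])
n = X.shape[0]

xbar = X.mean(axis=0)                 # sample mean vector

# Sample covariance with Bessel's correction (divide by n - 1)
centered = X - xbar
S = centered.T @ centered / (n - 1)

# np.cov divides by n - 1 by default; rowvar=False treats columns as variables
print(np.allclose(S, np.cov(X, rowvar=False)))   # True
```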
The Mahalanobis distance measures distance accounting for correlations between variables:
D²(x) = (x − μ)' Σ⁻¹ (x − μ)
When Σ = I
Reduces to squared Euclidean distance
Applications
Outlier detection, classification, clustering
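A short sketch of the formula with hypothetical μ and Σ; using a linear solve instead of an explicit inverse is the standard numerically safer choice.

```python
import numpy as np

# Hypothetical mean and covariance of a 2-variable distribution
mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 1.0],
                  [1.0, 2.0]])

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)' Sigma^{-1} (x - mu)."""
    d = x - mu
    return d @ np.linalg.solve(Sigma, d)   # solve instead of explicit inverse

x = np.array([1.0, 1.0])
print(mahalanobis_sq(x, mu, Sigma))        # ≈ 0.667 for this Sigma

# With Sigma = I it reduces to the squared Euclidean distance
print(np.isclose(mahalanobis_sq(x, mu, np.eye(2)), np.sum((x - mu) ** 2)))  # True
```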
Any symmetric matrix A can be decomposed as A = QΛQ', where:
Q is orthogonal (Q'Q = I; its columns are eigenvectors of A)
Λ is diagonal (its entries are the eigenvalues of A)
Square Root of Positive Definite Matrix
A^(1/2) = QΛ^(1/2)Q'
Matrix Powers
A^k = QΛ^k Q'
Matrix Inverse
A⁻¹ = QΛ⁻¹Q'
Determinant via Eigenvalues
|A| = λ₁λ₂⋯λ_p
The determinant equals the product of all eigenvalues.
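All four identities can be checked numerically from one eigendecomposition; a sketch with the same example matrix assumed above.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [1.0, 3.0]])     # symmetric, positive definite (example values)

vals, Q = np.linalg.eigh(A)    # A = Q diag(vals) Q'
Lam = np.diag(vals)

print(np.allclose(A, Q @ Lam @ Q.T))                 # spectral decomposition
A_sqrt = Q @ np.diag(np.sqrt(vals)) @ Q.T            # matrix square root
print(np.allclose(A_sqrt @ A_sqrt, A))               # A^(1/2) A^(1/2) = A
A_inv = Q @ np.diag(1.0 / vals) @ Q.T                # inverse via eigenvalues
print(np.allclose(A_inv, np.linalg.inv(A)))
print(np.isclose(np.prod(vals), np.linalg.det(A)))   # |A| = product of eigenvalues
```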
A covariance matrix can be partitioned to analyze subsets of variables:
Σ = [Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂]
Σ₁₁:
Covariance within first group
Σ₁₂:
Cross-covariance between groups
Symmetry Property
Σ₂₁ = Σ₁₂'
The inverse of a partitioned positive definite matrix involves the Schur complement:
Schur Complement
Σ₂₂·₁ = Σ₂₂ − Σ₂₁Σ₁₁⁻¹Σ₁₂
Interpretation
Under multivariate normality, the Schur complement is the conditional covariance matrix of X₂ given X₁.
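A sketch with a made-up 3×3 covariance, partitioned with the first two variables as group 1: the inverse of the Schur complement Σ₂₂ − Σ₂₁Σ₁₁⁻¹Σ₁₂ equals the bottom-right block of Σ⁻¹, which is the partitioned-inverse identity the text refers to.

```python
import numpy as np

# 3x3 covariance partitioned with the first 2 variables as group 1 (example values)
Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 0.8],
                  [0.5, 0.8, 2.0]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

# Schur complement of Sigma_11: Sigma_22.1 = Sigma_22 - Sigma_21 Sigma_11^{-1} Sigma_12
S22_1 = S22 - S21 @ np.linalg.solve(S11, S12)

# Its inverse equals the bottom-right block of Sigma^{-1}
bottom_right = np.linalg.inv(Sigma)[2:, 2:]
print(np.allclose(np.linalg.inv(S22_1), bottom_right))   # True
```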
Total Variance
Sum of all variances: tr(Σ) = σ₁₁ + ⋯ + σ_pp = λ₁ + ⋯ + λ_p
Generalized Variance
Product of eigenvalues: |Σ| = λ₁λ₂⋯λ_p
Geometric Interpretation
The generalized variance is proportional to the squared volume of the concentration ellipsoid. It measures how "spread out" the data is in all directions simultaneously.
When |Σ| = 0
The variables are linearly dependent (at least one eigenvalue is zero). The data lies in a lower-dimensional subspace.
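A quick numerical check of both summaries with example matrices: trace equals the sum of eigenvalues, determinant equals their product, and a covariance matrix of linearly dependent variables has zero generalized variance.

```python
import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])          # example covariance matrix
vals = np.linalg.eigvalsh(Sigma)

# Total variance = trace = sum of eigenvalues
print(np.isclose(np.trace(Sigma), vals.sum()))        # True

# Generalized variance = determinant = product of eigenvalues
print(np.isclose(np.linalg.det(Sigma), vals.prod()))  # True

# Linearly dependent variables give a singular covariance: |Sigma| = 0
Sing = np.array([[1.0, 1.0],
                 [1.0, 1.0]])
print(np.isclose(np.linalg.det(Sing), 0.0))           # True
```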
For multivariate data, the concentration ellipsoid generalizes the concept of confidence intervals:
(x − μ)' Σ⁻¹ (x − μ) = c²
Principal Axes
Directions given by eigenvectors e₁, …, e_p of Σ
Semi-axis Lengths
Proportional to √λᵢ
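The axes of the ellipsoid can be read directly off an eigendecomposition; a sketch with an example 2×2 covariance (directions from the eigenvectors, relative lengths from √λᵢ).

```python
import numpy as np

Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])           # example 2x2 covariance
vals, vecs = np.linalg.eigh(Sigma)       # eigenvalues in ascending order

# Semi-axis lengths are proportional to sqrt(eigenvalue);
# the longest axis lies along the eigenvector of the largest eigenvalue
lengths = np.sqrt(vals)
print(lengths)

# The principal axes are orthogonal (eigenvectors of a symmetric matrix)
print(np.isclose(vecs[:, 0] @ vecs[:, 1], 0.0))   # True
```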
For Y = a'X:
Mean
E(Y) = a'μ
Variance
Var(Y) = a'Σa
Multiple Linear Combinations
For Y = AX:
E(Y) = Aμ, Cov(Y) = AΣA'
For two linear combinations a'X and b'X:
Cov(a'X, b'X) = a'Σb
Special Case: Uncorrelated Combinations
a'X and b'X are uncorrelated if a'Σb = 0
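A sketch of these three formulas with hypothetical μ, Σ, a, and b:

```python
import numpy as np

mu = np.array([1.0, 2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])          # example mean and covariance
a = np.array([1.0, -1.0])
b = np.array([1.0, 1.0])

mean_aX = a @ mu                        # E(a'X)  = a' mu
var_aX = a @ Sigma @ a                  # Var(a'X) = a' Sigma a
cov_ab = a @ Sigma @ b                  # Cov(a'X, b'X) = a' Sigma b

print(mean_aX, var_aX, cov_ab)          # -1.0 2.0 1.0
```

Here a'Σb = 1.0 ≠ 0, so these two combinations are correlated.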
Among all unit-length linear combinations a'X with a'a = 1:
max Var(a'X) = λ₁, achieved when a = e₁ (first eigenvector)
Foundation for PCA
This result is the theoretical basis for Principal Component Analysis. The first PC is the direction of maximum variance.
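The maximization result can be probed empirically: no random unit vector should give a'Σa above the largest eigenvalue λ₁, while the first eigenvector attains it. A sketch with an example covariance:

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[3.0, 1.0],
                  [1.0, 2.0]])          # example covariance
vals, vecs = np.linalg.eigh(Sigma)      # ascending order
lam1, e1 = vals[-1], vecs[:, -1]        # largest eigenvalue, its eigenvector

# No unit vector gives a' Sigma a above lambda_1 ...
for _ in range(1000):
    a = rng.standard_normal(2)
    a /= np.linalg.norm(a)
    assert a @ Sigma @ a <= lam1 + 1e-12

# ... and the first eigenvector attains the maximum
print(np.isclose(e1 @ Sigma @ e1, lam1))   # True
```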
Sample of n observations from a p-variate distribution: x₁, x₂, …, x_n
Sample mean vector: x̄ = (1/n) Σᵢ xᵢ
The covariance matrix contains raw covariances between variables, which depend on the scales of measurement. The correlation matrix standardizes these to values between -1 and 1, making it easier to compare relationships across variables with different scales. Correlation matrix has 1s on the diagonal.
Unlike Euclidean distance, Mahalanobis distance accounts for correlations between variables and scales by the variance structure. It measures how many standard deviations away a point is from the center, making it useful for outlier detection and classification in multivariate data.
The covariance matrix must be positive semi-definite because variances cannot be negative. For any linear combination a'X of the variables, the variance must be non-negative: Var(a'X) = a'Σa ≥ 0. This property ensures the covariance matrix describes a valid variance structure.
The eigenvalues of a covariance matrix represent the variance along the principal axes of the data distribution. Larger eigenvalues indicate directions with more variability. The sum of eigenvalues equals the total variance, and this concept is fundamental to Principal Component Analysis (PCA).
Use n-1 (Bessel's correction) when computing the sample covariance matrix to get an unbiased estimator of the population covariance. Use n when you want the maximum likelihood estimate or when working with the entire population. Most statistical software uses n-1 by default.