Multivariate Statistics Practice

Problems on the multivariate normal distribution, Hotelling's T², PCA, factor analysis, discriminant analysis, MANOVA, clustering, and canonical correlation

8 Problems

Suggested time: 2 hours

Instructions

• Try to solve each problem before viewing the solution
• Focus on understanding the problem-solving methodology

1. Multivariate Normal Distribution

Problem

Let $\mathbf{X} = (X_1, X_2, X_3)^T \sim N(\boldsymbol{\mu}, \Sigma)$ where:

$$\boldsymbol{\mu} = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 4 & 2 & 0 \\ 2 & 9 & 1 \\ 0 & 1 & 1 \end{pmatrix}$$

(1) Find the distribution of X₁ + X₂.

(2) Find the conditional distribution of X₃ given X₁ = 2, X₂ = 3.

(3) Are X₁ and X₃ independent?

Answer Summary

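Use the rules for linear combinations, conditioning, and covariances of Gaussian vectors: $\mathbf{a}^T\mathbf{X} \sim N(\mathbf{a}^T\boldsymbol{\mu}, \mathbf{a}^T\Sigma\mathbf{a})$, every conditional distribution of a multivariate normal is again normal, and for jointly normal variables zero covariance implies independence.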
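To check the hand calculations, the sketch below applies the linear-combination and conditioning formulas with exact rational arithmetic from Python's standard `fractions` module. Partitioning the vector as (X₁, X₂) versus X₃ and using the closed-form 2×2 inverse are choices made for this illustration; the variable names are hypothetical.

```python
from fractions import Fraction

# Parameters from the problem statement, stored as exact fractions
mu = [Fraction(1), Fraction(2), Fraction(3)]
Sigma = [[Fraction(4), Fraction(2), Fraction(0)],
         [Fraction(2), Fraction(9), Fraction(1)],
         [Fraction(0), Fraction(1), Fraction(1)]]

# (1) Linear combination a^T X with a = (1, 1, 0): mean a^T mu, variance a^T Sigma a
a = [1, 1, 0]
sum_mean = sum(ai * mi for ai, mi in zip(a, mu))
sum_var = sum(a[i] * Sigma[i][j] * a[j] for i in range(3) for j in range(3))

# (2) Condition X3 on (X1, X2) = (2, 3)
S11 = [[Sigma[0][0], Sigma[0][1]], [Sigma[1][0], Sigma[1][1]]]  # Cov of (X1, X2)
S21 = [Sigma[2][0], Sigma[2][1]]                                 # Cov of X3 with (X1, X2)
det = S11[0][0] * S11[1][1] - S11[0][1] * S11[1][0]
S11_inv = [[S11[1][1] / det, -S11[0][1] / det],
           [-S11[1][0] / det, S11[0][0] / det]]
# Regression weights w = S21 S11^{-1} (row vector times matrix)
w = [S21[0] * S11_inv[0][j] + S21[1] * S11_inv[1][j] for j in range(2)]
x_obs = [Fraction(2), Fraction(3)]
cond_mean = mu[2] + sum(w[j] * (x_obs[j] - mu[j]) for j in range(2))
cond_var = Sigma[2][2] - sum(w[j] * S21[j] for j in range(2))

# (3) For jointly normal variables, zero covariance is equivalent to independence
independent_13 = Sigma[0][2] == 0

print(f"X1 + X2 ~ N({sum_mean}, {sum_var})")
print(f"X3 | X1=2, X2=3 ~ N({cond_mean}, {cond_var})")
print("X1 and X3 independent:", independent_13)
```

Run as written, this prints `X1 + X2 ~ N(3, 17)` and `X3 | X1=2, X2=3 ~ N(49/16, 7/8)`. Exact fractions keep the conditional mean and variance readable instead of showing rounded decimals.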

2. Principal Component Analysis Application

Problem

PCA is performed on a 5-variable dataset. The eigenvalues of the correlation matrix are: 2.8, 1.5, 0.4, 0.2, 0.1.

(1) How much variance is explained by the first PC?

(2) How many PCs would you retain using Kaiser's criterion?

(3) What percentage of variance is retained with 2 PCs?

Answer Summary

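Each component's share of variance is its eigenvalue divided by the sum of the eigenvalues, which for a correlation matrix equals the number of variables (here 5). Kaiser's criterion retains components whose eigenvalue exceeds 1; the cumulative proportion of variance gives a complementary retention rule.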
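The arithmetic can be sketched in a few lines of Python; the list of eigenvalues comes from the problem, and the variable names are illustrative only.

```python
from itertools import accumulate

eigenvalues = [2.8, 1.5, 0.4, 0.2, 0.1]

# For a correlation matrix the eigenvalues sum to the number of variables (here 5)
total = sum(eigenvalues)
proportions = [lam / total for lam in eigenvalues]

# Cumulative proportion explained by the first k components
cumulative = list(accumulate(proportions))

# Kaiser's criterion: keep components whose eigenvalue exceeds 1
kaiser_k = sum(1 for lam in eigenvalues if lam > 1)

print(f"PC1 explains {proportions[0]:.1%} of total variance")
print(f"Kaiser's criterion retains {kaiser_k} components")
print(f"First 2 PCs explain {cumulative[1]:.1%}")
```

This prints 56.0% for the first component, 2 retained components, and 86.0% for the first two.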

3. Hotelling's T² Test

Problem

Test H₀: $\boldsymbol{\mu} = \boldsymbol{\mu}_0$ for a 3-dimensional multivariate normal with n = 25 observations.

Given: $\bar{\mathbf{x}} = (5, 7, 9)^T$, $\boldsymbol{\mu}_0 = (4, 6, 8)^T$, and the sample covariance matrix $S$.

(1) Write the T² statistic formula.

(2) What is the null distribution of T²?

(3) How do you convert T² to an F-statistic?

Answer Summary

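Set up the quadratic form $T^2 = n(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)^T S^{-1}(\bar{\mathbf{x}} - \boldsymbol{\mu}_0)$. Under H₀, $T^2 \sim \frac{(n-1)p}{n-p} F_{p,\,n-p}$, so $F = \frac{n-p}{(n-1)p} T^2$ follows $F_{3,22}$ here (p = 3, n = 25).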
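Because the problem leaves S unspecified, the sketch below uses a made-up positive-definite covariance matrix purely to show the mechanics: form the mean difference, solve S z = d instead of inverting S, build T², and rescale it to an F statistic. The `solve` helper is a hypothetical, minimal Gaussian elimination, not a library routine.

```python
# Sample summary from the problem statement
n, p = 25, 3
xbar = [5.0, 7.0, 9.0]
mu0 = [4.0, 6.0, 8.0]
# HYPOTHETICAL sample covariance: the problem does not give S, so this
# positive-definite matrix is invented purely for illustration
S = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, m))
        x[r] = (M[r][m] - tail) / M[r][r]
    return x

d = [xi - mi for xi, mi in zip(xbar, mu0)]
# T^2 = n d^T S^{-1} d, computed by solving S z = d rather than inverting S
z = solve(S, d)
T2 = n * sum(di * zi for di, zi in zip(d, z))

# Under H0, (n - p) / ((n - 1) p) * T^2 ~ F(p, n - p)
F_stat = (n - p) / ((n - 1) * p) * T2
print(f"T^2 = {T2:.4f}, F = {F_stat:.4f} on ({p}, {n - p}) df")
```

With this invented S the exact value is T² = 25 and F = 22 · 25 / 72; with the real sample covariance the numbers would of course differ. Comparing F against an $F_{3,22}$ critical value (or computing a p-value with a statistics library) completes the test.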

4. Discriminant Analysis

Problem

Two populations with equal covariance matrices:

$$\boldsymbol{\mu}_1 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad \boldsymbol{\mu}_2 = \begin{pmatrix} 4 \\ 1 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$$

(1) Find Fisher's linear discriminant function.

(2) Classify a new observation $\mathbf{x}_0 = (3, 2)^T$ assuming equal priors.

Answer Summary

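The discriminant direction is $\mathbf{a} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)$. With equal priors and equal misclassification costs, allocate $\mathbf{x}_0$ to population 1 if $\mathbf{a}^T\mathbf{x}_0 \ge \frac{1}{2}\mathbf{a}^T(\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2)$, and to population 2 otherwise.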
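A minimal sketch of the equal-prior Fisher rule with exact fractions is shown below. The three-way branch (population 1, population 2, or tie) is a choice made here to make ties visible; a strict textbook rule would assign ties to one side by convention.

```python
from fractions import Fraction

mu1 = [Fraction(2), Fraction(3)]
mu2 = [Fraction(4), Fraction(1)]
Sigma = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(2)]]
x0 = [Fraction(3), Fraction(2)]

# Closed-form inverse of the 2x2 common covariance matrix
det = Sigma[0][0] * Sigma[1][1] - Sigma[0][1] * Sigma[1][0]
Sigma_inv = [[Sigma[1][1] / det, -Sigma[0][1] / det],
             [-Sigma[1][0] / det, Sigma[0][0] / det]]

# Fisher's direction a = Sigma^{-1} (mu1 - mu2)
diff = [m1 - m2 for m1, m2 in zip(mu1, mu2)]
a = [sum(Sigma_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]

def project(v):
    """Discriminant score y = a^T v."""
    return sum(ai * vi for ai, vi in zip(a, v))

# Equal priors and costs: the cut-off is the projection of the midpoint of the means
midpoint = [(m1 + m2) / 2 for m1, m2 in zip(mu1, mu2)]
cutoff = project(midpoint)
score = project(x0)

if score > cutoff:
    decision = "population 1"
elif score < cutoff:
    decision = "population 2"
else:
    decision = "on the boundary (tie)"
print(f"a = ({a[0]}, {a[1]}), cutoff = {cutoff}, y0 = {score}: {decision}")
```

The sketch prints `a = (-2, 2), cutoff = -2, y0 = -2: on the boundary (tie)`, so the discriminant is y = -2x₁ + 2x₂. Note that x₀ = (3, 2) is exactly the midpoint of μ₁ and μ₂, which is why it lands on the decision boundary; the final allocation then depends on the tie-breaking convention.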

5. Factor Analysis Rotation

Problem

After extracting 2 factors, the unrotated loadings are:

$$L = \begin{pmatrix} 0.8 & 0.4 \\ 0.7 & 0.5 \\ 0.6 & -0.3 \\ 0.5 & -0.6 \end{pmatrix}$$

(1) Why is rotation often applied?

(2) What is the goal of varimax rotation?

(3) Interpret the factor pattern (which variables load on which factors).

Answer Summary

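Distinguish extraction from rotation. Varimax maximizes the variance of the squared loadings within each factor, pushing loadings toward 0 or ±1 so that each variable loads strongly on as few factors as possible. An orthogonal rotation changes the individual loadings but leaves each variable's communality, and hence the total variance explained by the factors, unchanged.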
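For two factors, varimax reduces to choosing a single rotation angle, so the sketch below simply grid-searches the angle that maximizes the raw varimax criterion and then confirms that communalities are unchanged. This brute-force search is a stand-in for the iterative algorithm used in statistical software, and it omits the Kaiser row normalization that many packages (for example R's `varimax`) apply by default; the helper names are hypothetical.

```python
import math

# Unrotated loadings from the problem (rows = variables, columns = factors)
L = [[0.8, 0.4], [0.7, 0.5], [0.6, -0.3], [0.5, -0.6]]

def rotate(loadings, theta):
    """Post-multiply by the 2x2 orthogonal matrix [[cos, -sin], [sin, cos]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[l1 * c + l2 * s, -l1 * s + l2 * c] for l1, l2 in loadings]

def communalities(loadings):
    """Row sums of squared loadings: variance of each variable explained by the factors."""
    return [sum(x * x for x in row) for row in loadings]

def raw_varimax(loadings):
    """Raw varimax criterion: variance of squared loadings, summed over factors."""
    p = len(loadings)
    total = 0.0
    for k in range(len(loadings[0])):
        sq = [row[k] ** 2 for row in loadings]
        mean_sq = sum(sq) / p
        total += sum(v * v for v in sq) / p - mean_sq ** 2
    return total

# Brute-force search over [0, pi/2); a further pi/2 rotation only swaps the
# columns and flips a sign, which leaves the criterion unchanged
grid = [i * (math.pi / 2) / 2000 for i in range(2000)]
best_theta = max(grid, key=lambda t: raw_varimax(rotate(L, t)))
rotated = rotate(L, best_theta)

print(f"best angle: {math.degrees(best_theta):.1f} degrees")
for orig, new in zip(L, rotated):
    print(orig, "->", [round(x, 2) for x in new])
print("communalities before:", [round(h, 3) for h in communalities(L)])
print("communalities after: ", [round(h, 3) for h in communalities(rotated)])
```

Reading the rotated rows shows which variables load mainly on which factor, while the two communality lines should match (0.8, 0.74, 0.45, 0.61, totaling 2.6).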

6. Cluster Analysis Validation

Problem

K-means clustering with k = 3 produces within-cluster sums of squares WSS₁ = 50, WSS₂ = 45, and WSS₃ = 40.

(1) Calculate total WSS.

(2) If total SS = 200, find the R² measure.

(3) Describe the silhouette coefficient and its interpretation.

Answer Summary

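Judge the clustering with validation metrics rather than the raw partition alone. Total WSS is the sum over clusters, and R² = 1 - WSS/TSS (equivalently, between-cluster SS over total SS) is the share of total variation explained by the partition. The silhouette $s(i) = (b_i - a_i)/\max(a_i, b_i)$, where $a_i$ is the mean distance from point i to the other members of its cluster and $b_i$ is the smallest mean distance to any other cluster, ranges from -1 to 1: values near 1 indicate well-clustered points, values near 0 indicate points between clusters, and negative values suggest misassignment.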
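The sketch below computes total WSS and R² from the problem's numbers, then evaluates the silhouette on a small hypothetical 1-D data set (the problem supplies no raw points), using absolute distance. Assigning s(i) = 0 to singleton clusters follows the usual convention.

```python
# Within-cluster sums of squares and total SS from the problem
wss = [50, 45, 40]
tss = 200

total_wss = sum(wss)
r_squared = 1 - total_wss / tss  # equals between-cluster SS / total SS

def silhouette_scores(points, labels):
    """Silhouette s(i) = (b_i - a_i) / max(a_i, b_i) for 1-D points with absolute distance."""
    scores = []
    for i, (x, own) in enumerate(zip(points, labels)):
        # Mean distance from point i to each cluster, excluding point i itself
        mean_dist = {}
        for cluster in set(labels):
            dists = [abs(x - y) for j, (y, lab) in enumerate(zip(points, labels))
                     if lab == cluster and j != i]
            if dists:
                mean_dist[cluster] = sum(dists) / len(dists)
        if own not in mean_dist:
            scores.append(0.0)  # singleton cluster: s(i) = 0 by convention
            continue
        a = mean_dist[own]
        b = min(d for cluster, d in mean_dist.items() if cluster != own)
        scores.append((b - a) / max(a, b))
    return scores

# HYPOTHETICAL toy data: two tight, well-separated 1-D clusters
toy_points = [0.0, 1.0, 10.0, 11.0]
toy_labels = [0, 0, 1, 1]
scores = silhouette_scores(toy_points, toy_labels)
mean_silhouette = sum(scores) / len(scores)

print(f"total WSS = {total_wss}, R^2 = {r_squared:.3f}")
print(f"mean silhouette on toy data = {mean_silhouette:.3f}")
```

The first line reports total WSS = 135 and R² = 0.325. The toy clusters are tight and far apart, so their mean silhouette is close to 1.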

7. Canonical Correlation

Problem

Two sets of variables: $\mathbf{X} = (X_1, X_2)$ and $\mathbf{Y} = (Y_1, Y_2)$.

First canonical correlation ρ₁ = 0.85.

(1) What does canonical correlation measure?

(2) How many canonical correlations exist?

(3) Interpret ρ₁ = 0.85.

Answer Summary

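Canonical correlation analysis finds pairs of linear combinations $U_k = \mathbf{a}_k^T\mathbf{X}$ and $V_k = \mathbf{b}_k^T\mathbf{Y}$ with maximal correlation, each pair uncorrelated with the earlier ones. There are min(p, q) canonical correlations (here 2), and ρ₁² = 0.7225 is the proportion of variance shared by the first pair of canonical variates.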
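Since only ρ₁ is given, the sketch below invents correlation blocks for two X-variables and two Y-variables and recovers the canonical correlations as square roots of the eigenvalues of $R_{xx}^{-1}R_{xy}R_{yy}^{-1}R_{yx}$. The matrix values and helper functions are hypothetical; the 2×2 eigenvalues come from the trace and determinant.

```python
import math

# HYPOTHETICAL correlation blocks for p = 2 X-variables and q = 2 Y-variables;
# the problem gives only rho_1, so these numbers are invented for illustration
Rxx = [[1.0, 0.5], [0.5, 1.0]]
Ryy = [[1.0, 0.0], [0.0, 1.0]]
Rxy = [[0.6, 0.2], [0.4, 0.1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inv2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Squared canonical correlations are the eigenvalues of Rxx^{-1} Rxy Ryy^{-1} Ryx
M = matmul(matmul(inv2(Rxx), Rxy), matmul(inv2(Ryy), transpose(Rxy)))

# Eigenvalues of a 2x2 matrix from its trace and determinant
tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
eigs = [tr / 2 + disc, tr / 2 - disc]
rhos = [math.sqrt(max(lam, 0.0)) for lam in eigs]

print("number of canonical correlations:", min(len(Rxx), len(Ryy)))
print("canonical correlations:", [round(r, 3) for r in rhos])
# For the problem's rho_1 = 0.85, the first canonical pair shares rho_1^2 of its variance
print(f"rho_1^2 for 0.85 = {0.85 ** 2:.4f}")
```

One useful sanity check: the first canonical correlation can never be smaller than the largest single cross-correlation between an X and a Y variable (here 0.6), because each individual variable is itself a linear combination.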

8. MANOVA vs Multiple ANOVAs

Problem

Compare 3 treatment groups on 4 response variables.

(1) Why use MANOVA instead of 4 separate ANOVAs?

(2) State the null hypothesis in MANOVA.

(3) Name three MANOVA test statistics and their properties.

Answer Summary

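MANOVA tests all four responses jointly, accounts for their correlations, and controls the overall Type I error rate; four separate ANOVAs at α = 0.05 would have a familywise error of about 1 - 0.95⁴ ≈ 0.185 if the tests were independent. The null hypothesis is H₀: μ₁ = μ₂ = μ₃ (equal mean vectors across the three groups). Common test statistics include Wilks' Λ, Pillai's trace (generally the most robust to assumption violations), the Hotelling-Lawley trace, and Roy's largest root (most powerful when group differences lie along a single dimension, but least robust).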
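A short sketch of the Type I error argument and of how the classical MANOVA statistics are built from the eigenvalues of $E^{-1}H$ (error and hypothesis SSCP matrices). The eigenvalues below are hypothetical; the problem provides no data. Note that software packages differ on whether Roy's statistic is reported as λ₁ or λ₁/(1 + λ₁).

```python
# Familywise Type I error of 4 separate ANOVAs at alpha = 0.05,
# under the simplifying assumption that the tests are independent
alpha, n_tests = 0.05, 4
fwer = 1 - (1 - alpha) ** n_tests

# HYPOTHETICAL eigenvalues of E^{-1} H; with 3 groups and 4 responses there are
# at most min(4, 3 - 1) = 2 nonzero eigenvalues
lams = [0.8, 0.1]

wilks = 1.0
for lam in lams:
    wilks *= 1 / (1 + lam)                     # Wilks' Lambda = det(E) / det(E + H)
pillai = sum(lam / (1 + lam) for lam in lams)  # Pillai's trace
hotelling_lawley = sum(lams)                   # Hotelling-Lawley trace
roy = max(lams)                                # Roy's largest root (as lambda_1)

print(f"FWER of 4 ANOVAs ~ {fwer:.3f}")
print(f"Wilks {wilks:.4f}, Pillai {pillai:.4f}, "
      f"Hotelling-Lawley {hotelling_lawley:.4f}, Roy {roy:.4f}")
```

The first line prints a familywise error of about 0.185, nearly four times the nominal 0.05, which is the main argument for a single joint test.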

Related Theory

Revisit the theory that supports these worked practice problems:

• Multivariate Analysis
• Multivariate Normal Distribution
• Inference on Mean Vectors
• Principal Component Analysis