Problems on PCA, factor analysis, discriminant analysis, MANOVA, clustering, and canonical correlation
Instructions
Let X = (X₁, X₂, X₃)ᵀ ~ N₃(μ, Σ), where:
(1) Find the distribution of X₁ + X₂.
(2) Find the conditional distribution of X₃ given X₁ = 2, X₂ = 3.
(3) Are X₁ and X₃ independent?
Use the linear-combination, conditioning, and covariance rules for Gaussian vectors; for jointly Gaussian variables, zero covariance implies independence.
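The three Gaussian rules can be sketched in plain Python. The problem's actual mean vector and covariance matrix are missing from this page, so `MU` and `SIGMA` below are hypothetical placeholders, and the function names are illustrative. The sketch uses aᵀX ~ N(aᵀμ, aᵀΣa), the partitioned-normal conditional formulas, and the fact that jointly Gaussian components are independent exactly when their covariance is zero.

```python
# Hypothetical placeholder parameters: the problem's actual mean vector and
# covariance matrix are not reproduced here, so these values are illustrative.
MU = [1.0, 2.0, 0.0]
SIGMA = [
    [4.0, 1.0, 0.0],
    [1.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
]


def linear_combination(a, mu, sigma):
    """a^T X ~ N(a^T mu, a^T Sigma a) for X ~ N(mu, Sigma)."""
    n = len(mu)
    mean = sum(a[i] * mu[i] for i in range(n))
    var = sum(a[i] * sigma[i][j] * a[j] for i in range(n) for j in range(n))
    return mean, var


def conditional_x3(x1, x2, mu, sigma):
    """Mean and variance of X3 given X1 = x1, X2 = x2 (partitioned-normal formulas)."""
    # Invert the 2x2 covariance block of (X1, X2).
    a, b, c, d = sigma[0][0], sigma[0][1], sigma[1][0], sigma[1][1]
    det = a * d - b * c
    s11_inv = [[d / det, -b / det], [-c / det, a / det]]
    s21 = [sigma[2][0], sigma[2][1]]  # Cov(X3, X1), Cov(X3, X2)
    # Regression coefficients: beta = Sigma_21 Sigma_11^{-1}.
    beta = [s21[0] * s11_inv[0][k] + s21[1] * s11_inv[1][k] for k in range(2)]
    mean = mu[2] + beta[0] * (x1 - mu[0]) + beta[1] * (x2 - mu[1])
    var = sigma[2][2] - (beta[0] * s21[0] + beta[1] * s21[1])
    return mean, var


def independent(i, j, sigma):
    """Jointly Gaussian X_i, X_j are independent exactly when Cov(X_i, X_j) = 0."""
    return sigma[i][j] == 0.0


print(linear_combination([1.0, 1.0, 0.0], MU, SIGMA))  # X1 + X2
print(conditional_x3(2.0, 3.0, MU, SIGMA))
print(independent(0, 2, SIGMA))
```

With these placeholder values, X₁ + X₂ ~ N(3, 9), X₃ | (X₁ = 2, X₂ = 3) ~ N(3/11, 18/11), and X₁, X₃ are independent because σ₁₃ = 0. Substituting the problem's own μ and Σ gives the actual answers.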
PCA is performed on a 5-variable dataset. The eigenvalues of the correlation matrix are: 2.8, 1.5, 0.4, 0.2, 0.1.
(1) How much variance is explained by the first PC?
(2) How many PCs would you retain using Kaiser's criterion?
(3) What percentage of variance is retained with 2 PCs?
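Compute each PC's share of variance as its eigenvalue divided by the sum of the eigenvalues (which equals 5, the number of variables, for a correlation matrix), then apply a retention rule such as Kaiser's criterion (retain eigenvalues greater than 1) or a cumulative-variance threshold.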
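The arithmetic for the PCA problem is short enough to check in a few lines. The eigenvalues come from the problem statement; the variable names are illustrative only.

```python
# Eigenvalues of the correlation matrix, taken from the problem statement.
eigenvalues = [2.8, 1.5, 0.4, 0.2, 0.1]
total = sum(eigenvalues)  # equals the number of variables (5) for a correlation matrix

proportions = [lam / total for lam in eigenvalues]
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

# Kaiser's criterion: keep components whose eigenvalue exceeds 1, i.e. that
# explain more variance than a single standardized variable.
kaiser_k = sum(1 for lam in eigenvalues if lam > 1.0)

print(f"PC1 explains {proportions[0]:.0%}")
print(f"Kaiser retains {kaiser_k} PCs")
print(f"First 2 PCs explain {cumulative[1]:.0%}")
```

This reproduces the hand calculation: 2.8/5 = 56% for PC1, Kaiser's rule keeps 2 components, and (2.8 + 1.5)/5 = 86% of the variance is retained with 2 PCs.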
Test H₀: μ = μ₀ for a 3-dimensional multivariate normal population, using n = 25 observations.
Given: , , and sample covariance S.
(1) Write the T² statistic formula.
(2) What is the null distribution of T²?
(3) How do you convert T² to an F-statistic?
Set up the quadratic form for mean-vector testing, then convert it to the equivalent F statistic for inference.
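A minimal sketch of Hotelling's T² test follows. The statistic is T² = n(x̄ − μ₀)ᵀS⁻¹(x̄ − μ₀); under H₀, T² ~ ((n − 1)p/(n − p)) F(p, n − p), so F = ((n − p)/((n − 1)p)) T². For n = 25 and p = 3 this gives F = (22/72) T² ~ F(3, 22). The problem's x̄, μ₀ and S are not shown on this page, so the inputs below are hypothetical, and the small `solve` helper is a hand-written linear solver rather than a library call.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def hotelling_t2(xbar, mu0, s, n):
    """T^2 = n (xbar - mu0)^T S^{-1} (xbar - mu0)."""
    d = [xi - mi for xi, mi in zip(xbar, mu0)]
    s_inv_d = solve(s, d)  # S^{-1} d without forming S^{-1}
    return n * sum(di * vi for di, vi in zip(d, s_inv_d))


def t2_to_f(t2, n, p):
    """Under H0, F = (n - p) / ((n - 1) p) * T^2 follows F(p, n - p)."""
    return (n - p) / ((n - 1) * p) * t2, (p, n - p)


# Hypothetical inputs (the problem's xbar, mu0 and S are not reproduced here).
t2 = hotelling_t2([1.0, 1.0, 2.0], [0.0, 0.0, 0.0],
                  [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 4.0]], n=25)
f_stat, df = t2_to_f(t2, n=25, p=3)
print(t2, f_stat, df)
```

With these placeholder inputs T² = 62.5; the resulting F value would be compared against an F(3, 22) critical value.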
Two populations with equal covariance matrices:
(1) Find Fisher's linear discriminant function.
(2) Classify a new observation assuming equal priors.
Build the discriminant direction from the group means and covariance structure, then classify the new point using the decision rule.
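Fisher's rule can be sketched for two variables: the direction is a = S_pooled⁻¹(x̄₁ − x̄₂), and with equal priors and equal misclassification costs a new point x₀ goes to population 1 when aᵀx₀ ≥ ½ aᵀ(x̄₁ + x̄₂). The group means and pooled covariance on this page are missing, so the values below are hypothetical.

```python
# Hypothetical two-variable example: the problem's actual group means and
# covariance matrix are not reproduced here.
mean1 = [3.0, 5.0]
mean2 = [1.0, 2.0]
pooled = [[2.0, 0.5], [0.5, 1.0]]  # common (pooled) covariance matrix


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def fisher_direction(m1, m2, s):
    """a = S^{-1} (mean1 - mean2)."""
    si = inv2(s)
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    return [si[0][0] * diff[0] + si[0][1] * diff[1],
            si[1][0] * diff[0] + si[1][1] * diff[1]]


def classify(x0, m1, m2, s):
    """Equal priors and costs: allocate to population 1 if a^T x0 >= midpoint."""
    a = fisher_direction(m1, m2, s)
    score = a[0] * x0[0] + a[1] * x0[1]
    midpoint = 0.5 * (a[0] * (m1[0] + m2[0]) + a[1] * (m1[1] + m2[1]))
    return 1 if score >= midpoint else 2


print(fisher_direction(mean1, mean2, pooled))
print(classify([2.0, 4.0], mean1, mean2, pooled))
```

With these values a = (2/7, 20/7) and the cutoff is 74/7 ≈ 10.57; the point (2, 4) scores 12 and is allocated to population 1.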
After extracting 2 factors, the unrotated loadings are:
(1) Why is rotation often applied?
(2) What is the goal of varimax rotation?
(3) Interpret the factor pattern (which variables load on which factors).
Distinguish extraction from rotation: an orthogonal rotation sharpens interpretation without changing the communalities or the total variance explained by the retained factors.
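For two factors, an orthogonal rotation is a single angle φ, so the varimax idea can be illustrated with a brute-force grid search over φ that maximizes the raw varimax criterion (the variance of squared loadings within each factor, summed over factors). This is a swapped-in teaching technique, not the iterative algorithm statistical packages use, and it omits Kaiser row normalization, which software often applies. The loading table on this page is missing, so `LOADINGS` is hypothetical.

```python
import math

# Hypothetical unrotated loadings (6 variables x 2 factors); the problem's
# actual loading table is not reproduced here.
LOADINGS = [
    [0.70, 0.40],
    [0.75, 0.35],
    [0.65, 0.45],
    [0.50, -0.55],
    [0.45, -0.60],
    [0.55, -0.50],
]


def rotate(loadings, phi):
    """Orthogonal rotation of a two-factor loading matrix by angle phi."""
    c, s = math.cos(phi), math.sin(phi)
    return [[l1 * c + l2 * s, -l1 * s + l2 * c] for l1, l2 in loadings]


def varimax_criterion(loadings):
    """Raw varimax: sum over factors of the variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for k in range(2):
        sq = [row[k] ** 2 for row in loadings]
        mean_sq = sum(sq) / p
        total += sum(v * v for v in sq) / p - mean_sq ** 2
    return total


def varimax_2factor(loadings, steps=3600):
    """Grid search over phi in [0, pi/2); larger angles only permute/reflect factors."""
    angles = (i * (math.pi / 2) / steps for i in range(steps))
    best = max(angles, key=lambda phi: varimax_criterion(rotate(loadings, phi)))
    return rotate(loadings, best), best


def communalities(loadings):
    """Row sums of squared loadings (unchanged by any orthogonal rotation)."""
    return [sum(v * v for v in row) for row in loadings]


rotated, phi = varimax_2factor(LOADINGS)
for row in rotated:
    print([round(v, 2) for v in row])
```

After rotation, the first three hypothetical variables load mainly on one factor and the last three on the other, while each variable's communality stays the same, which is the point of rotating.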
K-means clustering with k = 3 produces within-cluster sum of squares: WSS₁ = 50, WSS₂ = 45, WSS₃ = 40.
(1) Calculate total WSS.
(2) If total SS = 200, find the R² measure.
(3) Describe the silhouette coefficient and its interpretation.
Judge the clustering result with validation metrics and structure diagnostics, not just the algorithm's raw partition output.
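The k-means quantities can be checked directly: total WSS = 50 + 45 + 40 = 135, and R² = 1 − WSS/TSS = 1 − 135/200 = 0.325 (the share of variation between clusters). The silhouette helper below is a sketch for one-dimensional points with absolute distance; the sample points are hypothetical. Each s(i) = (b − a)/max(a, b) lies in [−1, 1], where a is the mean distance to the point's own cluster and b is the mean distance to the nearest other cluster; values near 1 indicate well-separated clusters, near 0 overlapping ones, and negative values likely misassignment.

```python
# WSS values for the three clusters, taken from the problem statement.
wss = [50.0, 45.0, 40.0]
total_ss = 200.0

total_wss = sum(wss)                     # within-cluster SS
r_squared = 1.0 - total_wss / total_ss   # = between-cluster SS / total SS


def silhouette(points, labels):
    """Mean silhouette width for 1-D points, using absolute distance."""
    def mean_dist(x, cluster, skip):
        ds = [abs(x - y) for j, (y, lab) in enumerate(zip(points, labels))
              if lab == cluster and j != skip]
        return sum(ds) / len(ds) if ds else None

    scores = []
    for i, (x, own) in enumerate(zip(points, labels)):
        a = mean_dist(x, own, skip=i)  # cohesion: mean distance within own cluster
        if a is None:                  # singleton cluster: s(i) = 0 by convention
            scores.append(0.0)
            continue
        # Separation: mean distance to the nearest other cluster.
        b = min(mean_dist(x, c, skip=-1) for c in set(labels) if c != own)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


print(total_wss, r_squared)
print(silhouette([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1]))  # hypothetical data
```

The hypothetical two-cluster data is well separated, so its mean silhouette is close to 0.9.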
Two sets of variables: and .
First canonical correlation ρ₁ = 0.85.
(1) What does canonical correlation measure?
(2) How many canonical correlations exist?
(3) Interpret ρ₁ = 0.85.
Relate two variable sets by maximizing correlation between linear combinations and interpret both the number and size of canonical dimensions.
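The defining idea, choosing linear combinations aᵀX and bᵀY with maximal correlation, can be shown by brute force when each set has two variables: search over directions a and b and keep the largest |corr|. Real implementations use an eigen- or singular-value decomposition instead; this grid search is only a teaching sketch, and the correlation blocks are hypothetical (chosen so the answer, 0.6, is known). There are min(p, q) canonical correlations in general, and ρ₁ = 0.85 means the first pair of canonical variates shares ρ₁² = 0.7225, about 72% of their variance.

```python
import math

# Hypothetical correlation blocks for p = 2 X-variables and q = 2 Y-variables.
RXX = [[1.0, 0.0], [0.0, 1.0]]
RYY = [[1.0, 0.0], [0.0, 1.0]]
RXY = [[0.6, 0.0], [0.0, 0.3]]


def quad(a, m, b):
    """a^T M b for 2-vectors."""
    return sum(a[i] * m[i][j] * b[j] for i in range(2) for j in range(2))


def first_canonical_correlation(rxx, ryy, rxy, steps=180):
    """Max |corr(a^T X, b^T Y)| over unit directions a, b (grid over [0, pi))."""
    best = 0.0
    for i in range(steps):
        t = math.pi * i / steps
        a = [math.cos(t), math.sin(t)]
        for j in range(steps):
            u = math.pi * j / steps
            b = [math.cos(u), math.sin(u)]
            r = quad(a, rxy, b) / math.sqrt(quad(a, rxx, a) * quad(b, ryy, b))
            best = max(best, abs(r))
    return best


rho1 = first_canonical_correlation(RXX, RYY, RXY)
n_pairs = min(len(RXX), len(RYY))  # number of canonical correlations = min(p, q)
shared = 0.85 ** 2                 # variance shared by the first pair when rho1 = 0.85
print(rho1, n_pairs, shared)
```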
Compare 3 treatment groups on 4 response variables.
(1) Why use MANOVA instead of 4 separate ANOVAs?
(2) State the null hypothesis in MANOVA.
(3) Name three MANOVA test statistics and their properties.
Compare a joint multivariate test with separate univariate tests, highlighting correlation among the responses, familywise Type I error, and the overall effect structure.
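Two pieces of the MANOVA answer can be made concrete. First, running 4 separate tests at α = 0.05 gives a familywise error of 1 − 0.95⁴ ≈ 0.185 if the tests were independent; the responses here are correlated, so that figure is only illustrative, although the Bonferroni bound 4 × 0.05 = 0.20 holds regardless. Second, the usual statistics are functions of the eigenvalues λᵢ of W⁻¹B; with 3 groups and 4 responses there are at most min(4, 3 − 1) = 2 nonzero eigenvalues. The eigenvalues below are hypothetical.

```python
import math


def familywise_error(alpha, m):
    """P(at least one false rejection) across m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


def manova_statistics(eigs):
    """Test statistics from the eigenvalues of W^{-1} B."""
    return {
        "wilks": math.prod(1.0 / (1.0 + lam) for lam in eigs),  # |W| / |B + W|
        "pillai": sum(lam / (1.0 + lam) for lam in eigs),
        "hotelling_lawley": sum(eigs),
        "roy": max(eigs),  # some software reports max / (1 + max) instead
    }


print(familywise_error(0.05, 4))
print(manova_statistics([1.0, 0.25]))  # hypothetical eigenvalues
```

Small Wilks' Λ and large Pillai, Hotelling-Lawley, or Roy values all point toward rejecting H₀ that the group mean vectors are equal.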