MathIsimple
Advanced Topic
5-7 Hours

Canonical Correlation Analysis

Analyze relationships between two sets of variables through canonical variates and correlations

Learning Objectives
Understand the goal of canonical correlation analysis
Derive canonical variates and correlations
Interpret canonical loadings and cross-loadings
Test significance of canonical correlations
Calculate and interpret redundancy
Apply CCA to real problems

CCA Framework

Problem Setup

Given two variable sets X⁽¹⁾ (p variables) and X⁽²⁾ (q variables), find linear combinations:

U = \mathbf{a}^T\mathbf{X}^{(1)}, \quad V = \mathbf{b}^T\mathbf{X}^{(2)}

Objective

Maximize ρ = Corr(U, V)

Covariance Partition
\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{pmatrix}

Within-set Covariances

Σ₁₁ and Σ₂₂

Between-set Covariance

Σ₁₂ = Σ₂₁ᵀ

Canonical Correlations

Eigenvalue Problem

Canonical correlations are square roots of eigenvalues of:

\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}

Ordering

\rho_1 \geq \rho_2 \geq \cdots \geq \rho_r \geq 0

where r = min(p, q)

Testing Significance

Wilks' Lambda

\Lambda_k = \prod_{i=k}^{r} (1 - \rho_i^2)

Test sequentially: First test if any correlations are significant, then if remaining ones are, etc.

Interpretation

Loadings & Redundancy

Canonical Loadings

Correlations between original variables and their own canonical variates. Interpret like factor loadings.

Cross-Loadings

Correlations between variables and the other set's canonical variates. Show direct relationships.

Redundancy Index

Proportion of variance in one set explained by canonical variates of the other set. More interpretable than canonical correlation alone.

Redundancy Calculation

Redundancy for set 1 given set 2:

R_{Y|X} = \sum_{i=1}^r \left(\frac{1}{p}\sum_{j=1}^p l_{ij}^2\right) \rho_i^2

lᵢⱼ²

Squared loadings (variance extracted by variate)

ρᵢ²

Squared canonical correlation (shared variance)
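The redundancy formula above is short enough to compute directly. A minimal sketch in numpy; the loadings matrix below is a hypothetical illustration, paired with the canonical correlations 0.72 and 0.35 used in the worked example later in this page:

```python
import numpy as np

def redundancy(loadings, rho):
    """Redundancy of a variable set given the other set's variates.

    `loadings` holds canonical loadings l_ij (rows = variables,
    columns = variates); `rho` holds the canonical correlations.
    """
    # (1/p) * sum_j l_ij^2 : average variance extracted by each variate
    var_extracted = (np.asarray(loadings) ** 2).mean(axis=0)
    # weight by the shared variance rho_i^2 and sum over variates
    return float(np.sum(var_extracted * np.asarray(rho) ** 2))

# hypothetical loadings: three variables on two canonical variates
L = np.array([[0.8, 0.2],
              [0.7, 0.3],
              [0.6, 0.5]])
print(round(redundancy(L, [0.72, 0.35]), 3))  # → 0.273
```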

Computing CCA

Computational Approach

Steps to compute canonical correlations:

  1. Compute sample covariance matrices S₁₁, S₂₂, S₁₂
  2. Form the matrix S₁₁⁻¹S₁₂S₂₂⁻¹S₂₁
  3. Find eigenvalues λᵢ; canonical correlations are ρᵢ = √λᵢ
  4. Eigenvectors give canonical weight vectors
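The four steps above can be sketched in a few lines of numpy (the function and variable names here are my own, not from any library):

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations of two data matrices (rows = observations).

    Follows the eigenvalue route: form S11^{-1} S12 S22^{-1} S21 and take
    square roots of its eigenvalues.
    """
    n, p = X.shape
    Z = np.hstack([X - X.mean(0), Y - Y.mean(0)])
    S = Z.T @ Z / (n - 1)                      # partitioned sample covariance
    S11, S12 = S[:p, :p], S[:p, p:]
    S21, S22 = S[p:, :p], S[p:, p:]
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S21)
    lam = np.linalg.eigvals(M).real            # eigenvalues lambda_i
    lam = np.clip(np.sort(lam)[::-1], 0, 1)    # order, guard rounding
    r = min(p, S22.shape[0])                   # r = min(p, q)
    return np.sqrt(lam[:r])

# quick check on synthetic data: only the first pair shares signal
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))
Y = np.hstack([X[:, :1] + 0.5 * rng.standard_normal((500, 1)),
               rng.standard_normal((500, 2))])
rho = canonical_correlations(X, Y)
print(rho.round(2))   # first correlation large, the rest near zero
```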
Sample vs Population

Sample Canonical Correlations

Biased upward, especially with small n or many variables

Shrinkage Adjustment

Use adjusted or cross-validated estimates for interpretation

Connections to Other Methods

Special Cases

Multiple Regression

When one set has one variable, ρ₁ = R (the multiple correlation)

Hotelling's T²

When one set is a group indicator

MANOVA

One set is group membership coded as dummies

Discriminant Analysis

Canonical variates become discriminant functions

Assumptions & Considerations

Key Assumptions

Linear Relationships

CCA only captures linear associations between sets

Multivariate Normality

Required for significance tests

Sample Size

n should be much larger than p + q

No Multicollinearity

Within-set covariance matrices must be invertible

Worked Example

Example: Two Sets of Variables

Set 1: Academic measures (Math, Reading). Set 2: Motivation measures (Interest, Effort).

Results

ρ₁ = 0.72, ρ₂ = 0.35

Test

Λ = 0.42, p < 0.001

Interpretation: First canonical correlation is strong (0.72), suggesting academic and motivation measures share substantial linear relationship.

Hypothesis Testing in CCA

Testing All Correlations

Test H₀: ρ₁ = ρ₂ = ⋯ = ρₛ = 0

\Lambda = \prod_{i=1}^s (1 - \rho_i^2)

Chi-Square Approximation

-\left[n - 1 - \tfrac{1}{2}(p+q+1)\right]\ln\Lambda \sim \chi^2_{pq}

Interpretation

Small Λ (close to 0) → reject H₀ → significant relationship

Sequential Testing

Test remaining correlations after removing first k:

\Lambda_k = \prod_{i=k+1}^s (1 - \rho_i^2)

Procedure: Test all, then test 2nd through sth, etc. Stop when test is not significant.
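The sequential procedure can be sketched as follows, using Bartlett's chi-square approximation from above. The code uses the worked example's correlations (0.72, 0.35); a statistics library such as scipy would supply p-values, so here only the statistic and degrees of freedom are returned:

```python
import numpy as np

def sequential_wilks(rho, n, p, q):
    """Sequential Wilks' Lambda tests.

    Step k (0-based) tests whether the correlations remaining after the
    first k are zero; returns (Lambda_k, chi-square statistic, df) per step.
    """
    rho = np.asarray(rho, dtype=float)
    out = []
    for k in range(len(rho)):
        lam = float(np.prod(1 - rho[k:] ** 2))          # product over remaining
        stat = -(n - 1 - 0.5 * (p + q + 1)) * np.log(lam)
        out.append((lam, stat, (p - k) * (q - k)))
    return out

for lam, stat, df in sequential_wilks([0.72, 0.35], n=100, p=2, q=2):
    print(f"Lambda={lam:.3f}  chi2={stat:.2f}  df={df}")
```

Compare each statistic against the chi-square critical value for its df; stop at the first non-significant step.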

Interpretation Guidelines

Canonical Loadings

Correlations between original variables and canonical variates:

Structure Coefficients

Correlations r(Xⱼ, Uᵢ) and r(Yₖ, Vᵢ)

Use

Identify which variables contribute most to each dimension

Rule of thumb: Focus on loadings with |r| > 0.3 or 0.4 for interpretation.
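Loadings are nothing more than correlations between each original variable and the canonical variate scores, so they are easy to compute once the weights are known. A minimal sketch with synthetic data and a hypothetical weight vector:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))      # synthetic set-1 data
a = np.array([0.6, 0.7, -0.2])         # hypothetical canonical weights
U = X @ a                              # canonical variate scores

# structure coefficients: corr(X_j, U) for each variable j
loadings = np.array([np.corrcoef(X[:, j], U)[0, 1] for j in range(X.shape[1])])
meaningful = np.abs(loadings) > 0.3    # the |r| > 0.3 rule of thumb from above
print(loadings.round(2), meaningful)
```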

Redundancy Analysis

Proportion of variance in one set explained by the other:

Rd_{Y|X} = \sum_{i=1}^s \rho_i^2 \cdot \frac{\text{Var}(V_i \text{ explained by } Y)}{q}

Interpretation

Average variance explained across canonical dimensions

Note

Redundancy is asymmetric: Rd(Y|X) ≠ Rd(X|Y)

Assumptions and Diagnostics

Key Assumptions

Linearity

CCA detects only linear relationships

Multivariate Normality

Required for hypothesis tests; less critical for descriptive use

No Multicollinearity

Variables within sets should not be too highly correlated

Sample Size

n should be at least 10× total number of variables

Applications

Common Use Cases

Psychology

Relating personality traits to behavioral outcomes

Ecology

Species composition vs environmental variables

Education

Academic performance vs motivation/study habits

Neuroimaging

Brain activity patterns vs behavioral measures

Connections to Other Methods

Method Relationships

Multiple Regression

CCA with q=1 equals multiple regression (R = ρ₁)

PCA

Both solve eigenvalue problems, but PCA maximizes variance within a single set while CCA maximizes correlation between two sets

Discriminant Analysis

LDA is CCA with one set being group indicators

PLS

Partial Least Squares maximizes covariance instead of correlation

Software Implementation

Common Software

R

cancor(), CCA package, vegan::cca()

Python

sklearn.cross_decomposition.CCA

SAS

PROC CANCORR

SPSS

Use MANOVA syntax or macros

Practical Considerations

Data Requirements

Sample Size

n should exceed 10(p+q); unstable with small samples

Variable Selection

Include theoretically relevant variables; avoid redundancy

Outliers

Check multivariate outliers; can strongly influence results

Multicollinearity

High collinearity within sets causes instability

Reporting Results
  • Report canonical correlations and their significance
  • Include standardized canonical coefficients
  • Present structure coefficients (loadings)
  • Report redundancy indices for each set
  • Interpret meaningful canonical dimensions

Extensions and Variants

Regularized CCA

For high-dimensional data or when n < p+q:

Ridge CCA

Add penalty to within-set covariance matrices

Sparse CCA

L1 penalty for variable selection (LASSO-type)
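Ridge CCA needs only one change to the ordinary computation: add a penalty to the within-set covariance blocks before inverting. A hedged sketch (the function name and the choice of alpha are my own, not from a library):

```python
import numpy as np

def ridge_cca_correlations(X, Y, alpha=0.1):
    """Ridge-regularized canonical correlations.

    Adds alpha * I to the within-set covariance blocks so the
    computation stays stable even when n < p + q.
    """
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    S11 = Xc.T @ Xc / (n - 1) + alpha * np.eye(X.shape[1])  # ridge penalty
    S22 = Yc.T @ Yc / (n - 1) + alpha * np.eye(Y.shape[1])  # ridge penalty
    S12 = Xc.T @ Yc / (n - 1)
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    lam = np.clip(np.sort(np.linalg.eigvals(M).real)[::-1], 0, 1)
    return np.sqrt(lam[:min(X.shape[1], Y.shape[1])])

# n = 30 with p + q = 35: ordinary CCA would be unusable here
rng = np.random.default_rng(0)
X = rng.standard_normal((30, 20))
Y = rng.standard_normal((30, 15))
rho = ridge_cca_correlations(X, Y)
print(rho.round(2))
```

Larger alpha shrinks the estimated correlations toward zero; in practice alpha is chosen by cross-validation.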

Kernel CCA

Capture nonlinear relationships via kernel trick:

Idea

Map variables to high-dimensional space where relationships are linear

Kernels

Polynomial, RBF (Gaussian), sigmoid

Deep CCA

Neural network-based approach for complex relationships:

Learn nonlinear transformations of X and Y that maximize correlation; useful for multi-modal data (e.g., image + text)

Comprehensive Example

Education Study Example

Research Question: How do cognitive abilities relate to academic achievement?

Set X (Cognitive)

Verbal reasoning, Spatial ability, Processing speed, Working memory (p=4)

Set Y (Academic)

Math score, Reading score, Science score (q=3)

Results (n=200)

ρ₁ = 0.68 (p < 0.001): General cognitive ability ↔ Overall achievement

ρ₂ = 0.42 (p < 0.01): Verbal vs Spatial ↔ Reading vs Math/Science

ρ₃ = 0.18 (p > 0.05): Not significant

Interpretation

Two meaningful dimensions: (1) General ability-achievement link explains most shared variance; (2) Specific verbal-spatial pattern relates to reading vs STEM performance

Common Pitfalls

Mistakes to Avoid

Over-interpretation

Don't interpret all canonical dimensions; focus on significant ones

Ignoring Loadings

Coefficients alone can be misleading; examine structure coefficients

Small Sample

CCA unstable with n < 10(p+q); correlations can be spuriously high

Confusing with Regression

CCA is symmetric; neither set is "dependent"

Practical Workflow for CCA

Step-by-Step Analysis
  1. Data preparation: Check for multivariate normality, outliers, and missing data
  2. Examine correlations: Within-set and between-set correlation matrices
  3. Center and standardize: If variables have different scales
  4. Compute canonical variates: Using eigendecomposition or SVD
  5. Test significance: Use Wilks' Lambda for each canonical correlation
  6. Interpret structure: Examine canonical loadings and cross-loadings
  7. Assess redundancy: Calculate variance explained in each set
  8. Validate results: Cross-validate on holdout sample if possible
Reporting Results

Essential information to include:

Canonical Correlations

Report values and significance tests

Redundancy Analysis

Variance explained in each variable set

Structure Coefficients

Canonical loadings for interpretation

Sample Size

Report n, p, q, and ratio

Comprehensive Example

Academic Performance Study

Scenario: Examine relationship between academic skills and achievement

Set 1 (Skills): Reading comprehension, Math reasoning, Verbal ability

Set 2 (Achievement): GPA, Test scores, Assignment grades

First Canonical Pair

r₁ = 0.82, p < 0.001

Overall academic ability

Redundancy

Skills explain 45% of achievement variance

Interpretation

Strong skills-achievement relationship

CCA vs Other Multivariate Methods

CCA vs Multiple Regression

Multiple Regression: Multiple predictors → single outcome

CCA: Multiple predictors ↔ multiple outcomes (symmetric)

Use CCA when: Multiple DVs all equally important

CCA vs MANOVA

MANOVA: Categorical IVs → multiple continuous DVs

CCA: Continuous variables in both sets

Use CCA when: All variables continuous

CCA vs PCA

PCA: Reduce single set of variables

CCA: Find relationships between two sets

Use CCA when: Two distinct variable sets to relate

CCA vs SEM

SEM: Test specific structural model with latent variables

CCA: Exploratory symmetric relationships

Use SEM when: Theory-driven confirmatory analysis

Assumptions and Diagnostics

Key Assumptions

1. Linearity

Relationships should be linear; check scatter plots

2. Multivariate Normality

Use Mardia's test or Q-Q plots

3. Homoscedasticity

Constant variance across canonical variates

4. No Multicollinearity

Check VIF; avoid near-perfect correlations

Diagnostic Checks
  • Outliers: Use Mahalanobis distance for multivariate outliers
  • Influential cases: Examine Cook's D for canonical variates
  • Stability: Bootstrap canonical correlations if n is moderate
  • Cross-validation: Split sample or use k-fold CV to assess generalizability
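The bootstrap stability check listed above can be sketched directly: resample cases with replacement and recompute the first canonical correlation each time (synthetic data and function names are my own illustration):

```python
import numpy as np

def first_canonical_corr(X, Y):
    """Largest canonical correlation between two data matrices."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = X.shape[0]
    S11, S22 = Xc.T @ Xc / (n - 1), Yc.T @ Yc / (n - 1)
    S12 = Xc.T @ Yc / (n - 1)
    M = np.linalg.solve(S11, S12) @ np.linalg.solve(S22, S12.T)
    return float(np.sqrt(max(np.linalg.eigvals(M).real)))

rng = np.random.default_rng(0)
X = rng.standard_normal((150, 3))
Y = X @ rng.standard_normal((3, 2)) + rng.standard_normal((150, 2))

boot = []
for _ in range(200):
    idx = rng.integers(0, 150, 150)        # resample rows with replacement
    boot.append(first_canonical_corr(X[idx], Y[idx]))

lo, hi = np.percentile(boot, [2.5, 97.5])  # percentile confidence interval
print(f"95% bootstrap CI for rho_1: [{lo:.2f}, {hi:.2f}]")
```

A wide interval, or one hugging 1.0, signals an unstable solution; note that the bootstrap does not remove the upward bias of sample canonical correlations.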

Mathematical Derivation

Optimization Problem

The first canonical correlation maximizes:

\max_{\mathbf{a}, \mathbf{b}} \text{Corr}(\mathbf{a}^T\mathbf{X}^{(1)}, \mathbf{b}^T\mathbf{X}^{(2)}) = \max_{\mathbf{a}, \mathbf{b}} \frac{\mathbf{a}^T\boldsymbol{\Sigma}_{12}\mathbf{b}}{\sqrt{\mathbf{a}^T\boldsymbol{\Sigma}_{11}\mathbf{a}}\sqrt{\mathbf{b}^T\boldsymbol{\Sigma}_{22}\mathbf{b}}}

Subject to normalization constraints:

\mathbf{a}^T\boldsymbol{\Sigma}_{11}\mathbf{a} = 1, \quad \mathbf{b}^T\boldsymbol{\Sigma}_{22}\mathbf{b} = 1

This reduces to finding eigenvectors of Σ₁₁⁻¹Σ₁₂Σ₂₂⁻¹Σ₂₁ or, equivalently, of Σ₂₂⁻¹Σ₂₁Σ₁₁⁻¹Σ₁₂.

Lagrange Multiplier Approach

Using Lagrange multipliers with the normalization constraints:

\mathcal{L} = \mathbf{a}^T\boldsymbol{\Sigma}_{12}\mathbf{b} - \frac{\lambda_1}{2}(\mathbf{a}^T\boldsymbol{\Sigma}_{11}\mathbf{a} - 1) - \frac{\lambda_2}{2}(\mathbf{b}^T\boldsymbol{\Sigma}_{22}\mathbf{b} - 1)

Taking derivatives and setting to zero yields:

\boldsymbol{\Sigma}_{12}\mathbf{b} = \lambda_1 \boldsymbol{\Sigma}_{11}\mathbf{a}, \quad \boldsymbol{\Sigma}_{21}\mathbf{a} = \lambda_2 \boldsymbol{\Sigma}_{22}\mathbf{b}

Key Result

At the optimum, λ₁ = λ₂ = ρ (the canonical correlation)

Solution

Combine equations to get eigenvalue problem

Eigenvalue Equations

The canonical weight vectors satisfy:

\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\mathbf{a} = \rho^2 \mathbf{a}, \quad \boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\mathbf{b} = \rho^2 \mathbf{b}

Note: Both matrices have the same non-zero eigenvalues ρ₁², ρ₂², …, ρᵣ², where r = min(p, q). The canonical correlations are the square roots of these eigenvalues.
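The reduction from the stationarity conditions to the eigenvalue equations is a one-line substitution, shown here explicitly in the notation above:

```latex
% From \Sigma_{21} a = \rho \Sigma_{22} b, solve for b and substitute
% into \Sigma_{12} b = \rho \Sigma_{11} a:
\mathbf{b} = \rho^{-1}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\mathbf{a}
\quad\Longrightarrow\quad
\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\mathbf{a}
  = \rho^{2}\,\boldsymbol{\Sigma}_{11}\mathbf{a}
\quad\Longrightarrow\quad
\boldsymbol{\Sigma}_{11}^{-1}\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21}\mathbf{a}
  = \rho^{2}\,\mathbf{a}
```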

Step-by-Step Numerical Example

Example Setup

Consider two variable sets with correlation structure:

Set 1: X₁, X₂ (p=2)

Set 2: Y₁, Y₂ (q=2)

Correlation matrix:

\mathbf{R} = \begin{pmatrix} 1 & 0.6 & 0.5 & 0.3 \\ 0.6 & 1 & 0.4 & 0.5 \\ 0.5 & 0.4 & 1 & 0.7 \\ 0.3 & 0.5 & 0.7 & 1 \end{pmatrix}

R₁₁

\begin{pmatrix} 1 & 0.6 \\ 0.6 & 1 \end{pmatrix}

R₁₂

\begin{pmatrix} 0.5 & 0.3 \\ 0.4 & 0.5 \end{pmatrix}

R₂₂

\begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix}
Computation Steps

Step 1: Compute the matrix product:

\mathbf{M} = \mathbf{R}_{11}^{-1}\mathbf{R}_{12}\mathbf{R}_{22}^{-1}\mathbf{R}_{21}

Step 2: Find eigenvalues of M:

\lambda_1 \approx 0.271, \quad \lambda_2 \approx 0.191

Step 3: Canonical correlations:

\rho_1 = \sqrt{0.271} \approx 0.52, \quad \rho_2 = \sqrt{0.191} \approx 0.44

First Dimension

ρ₁ = 0.52 captures the main relationship between the sets

Second Dimension

ρ₂ = 0.44 captures the residual relationship

Significance Testing

Test the overall relationship with n = 100:

\Lambda = (1 - 0.52^2)(1 - 0.44^2) = 0.7296 \times 0.8064 \approx 0.59
\chi^2 = -\left[100 - 1 - \tfrac{1}{2}(2+2+1)\right]\ln(0.59) = -96.5 \times (-0.528) \approx 51.0

Degrees of Freedom

df = pq = 2 \times 2 = 4

Decision

χ² = 51.0, p < 0.001 → Reject H₀

Conclusion: There is a significant relationship between the two variable sets.
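A direct numpy check of these steps, forming M from the stated block matrices and recomputing Λ and the chi-square statistic with n = 100:

```python
import numpy as np

# blocks of the correlation matrix from the example setup
R11 = np.array([[1.0, 0.6], [0.6, 1.0]])
R12 = np.array([[0.5, 0.3], [0.4, 0.5]])
R22 = np.array([[1.0, 0.7], [0.7, 1.0]])

# M = R11^{-1} R12 R22^{-1} R21, with R21 = R12^T
M = np.linalg.solve(R11, R12) @ np.linalg.solve(R22, R12.T)
lam = np.sort(np.linalg.eigvals(M).real)[::-1]   # eigenvalues, descending
rho = np.sqrt(lam)                               # canonical correlations

Lambda = float(np.prod(1 - lam))                 # Wilks' Lambda (all corrs)
chi2 = -(100 - 1 - 0.5 * (2 + 2 + 1)) * np.log(Lambda)
print(rho.round(2), round(Lambda, 2), round(chi2, 1))
```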

Interpreting Canonical Variates

Canonical Weights vs Loadings

Canonical Weights (Coefficients)

Vectors a, b that define the linear combinations

May be difficult to interpret due to multicollinearity

Canonical Loadings (Structure)

Correlations r(Xⱼ, U) and r(Yₖ, V)

Preferred for interpretation; analogous to factor loadings

Best Practice: Report both weights and loadings. Use loadings with |r| > 0.3 or 0.4 for substantive interpretation of what each canonical variate represents.

Example Interpretation

Suppose first canonical pair has loadings:

U₁ Loadings (Set 1)

Verbal: 0.85, Math: 0.80, Spatial: 0.45

→ General cognitive ability

V₁ Loadings (Set 2)

GPA: 0.90, Test: 0.85, Projects: 0.70

→ Overall academic success

Interpretation: First dimension (ρ₁ = 0.75) represents the relationship between general cognitive ability and overall academic performance.

Sample Size and Power Considerations

Sample Size Guidelines

Minimum Requirement

n > p + q (matrix invertibility)

Conservative Rule

n ≥ 10(p + q) for stable estimates

For Inference

n ≥ 20(p + q) for reliable hypothesis tests

Shrinkage Issue

Sample canonical correlations overestimate population values

Power Consideration: Power depends on population canonical correlations, sample size, and number of variables. Use simulation or pilot data to plan adequate sample size.

Addressing Small Samples
  • Variable selection: Reduce p and q to most theoretically important variables
  • Regularization: Use ridge or sparse CCA for n < p+q
  • Cross-validation: Assess stability with resampling methods
  • Bootstrap: Estimate confidence intervals for canonical correlations


FAQ

How does CCA relate to multiple regression?

When one set has a single variable, CCA reduces to multiple regression. The canonical correlation equals the multiple R.

What sample size is needed for CCA?

Rule of thumb: n should be at least 10 times the total number of variables. CCA is sensitive to sample size, especially with many variables.

Can I use CCA with categorical variables?

Standard CCA assumes continuous variables. For categorical data, consider correspondence analysis or use dummy coding (with caution about interpretation).

How do I interpret canonical loadings?

Loadings >0.3 are considered meaningful. They show the correlation between original variables and canonical variates. Focus on structure coefficients for interpretation.

What is redundancy analysis in CCA?

Redundancy measures the proportion of variance in one variable set explained by the canonical variates of the other set. It provides practical effect size.
