
Factor Analysis

Discover latent factors underlying observed correlations through factor models, rotation, and interpretation

Learning Objectives
Understand the orthogonal factor model
Estimate factor loadings using different methods
Apply and interpret factor rotation
Calculate communalities and uniqueness
Compute and use factor scores
Compare factor analysis with PCA

The Factor Model

Orthogonal Factor Model
\mathbf{X} - \boldsymbol{\mu} = \mathbf{L}\mathbf{F} + \boldsymbol{\epsilon}

\mathbf{L}_{p \times m}

Factor loadings matrix

\mathbf{F}_{m \times 1}

Common factors (latent)

\boldsymbol{\epsilon}_{p \times 1}

Specific (unique) factors

Covariance Structure
\boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T + \boldsymbol{\Psi}

Communality

h_i^2 = \sum_{j=1}^m l_{ij}^2

Uniqueness

\psi_i = \sigma_{ii} - h_i^2
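A minimal NumPy sketch of these definitions, using illustrative loadings (the same two-factor pattern as the worked example later on this page) and assuming standardized variables so that \sigma_{ii} = 1:

```python
import numpy as np

# Illustrative loadings for p = 5 variables and m = 2 factors
L = np.array([[0.85, 0.10],
              [0.80, 0.15],
              [0.75, 0.20],
              [0.15, 0.82],
              [0.20, 0.78]])

h2 = (L ** 2).sum(axis=1)        # communalities: row sums of squared loadings
psi = 1.0 - h2                   # uniquenesses (standardized variables: sigma_ii = 1)
Sigma = L @ L.T + np.diag(psi)   # model-implied covariance (here: correlation) matrix

print(h2)                        # e.g., h_1^2 = 0.85^2 + 0.10^2 = 0.7325
print(Sigma.round(3))            # diagonal is h2 + psi = 1
```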

Factor Rotation

Rotation Methods

Orthogonal (Varimax, Quartimax)

Factors remain uncorrelated. Varimax simplifies columns (factors); Quartimax simplifies rows (variables).

Oblique (Promax, Oblimin)

Factors can correlate. More flexible but interpretation requires factor correlation matrix.

Goal: Achieve "simple structure" where each variable loads highly on few factors and each factor has few high-loading variables.

Varimax Criterion

Varimax maximizes the variance of squared loadings within each factor:

V = \sum_{j=1}^m \left[ \frac{1}{p}\sum_{i=1}^p l_{ij}^4 - \left(\frac{1}{p}\sum_{i=1}^p l_{ij}^2\right)^2 \right]

Effect

Pushes loadings toward 0 or ±1, making factors easier to interpret by highlighting which variables "belong" to each factor.
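A NumPy sketch: first the criterion V exactly as defined above, then a standard Kaiser-style varimax rotation. The rotation routine is a common textbook algorithm included here as an illustrative sketch, not an implementation taken from this page:

```python
import numpy as np

def varimax_criterion(L: np.ndarray) -> float:
    """V = sum over factors of the variance of squared loadings."""
    sq = L ** 2
    return float(((sq ** 2).mean(axis=0) - sq.mean(axis=0) ** 2).sum())

def varimax(L: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Rotate L by an orthogonal matrix chosen to maximize the varimax criterion."""
    p, m = L.shape
    T = np.eye(m)
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        # SVD-based update step of Kaiser's varimax algorithm
        B = L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        d_new = s.sum()
        if d_new < d * (1 + tol):   # criterion stopped improving
            break
        d = d_new
    return L @ T
```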

Estimation Methods

Principal Component Method

Uses eigenvalue decomposition of correlation matrix:

\hat{\mathbf{L}} = [\sqrt{\hat{\lambda}_1}\hat{\mathbf{e}}_1, \ldots, \sqrt{\hat{\lambda}_m}\hat{\mathbf{e}}_m]

Advantages

Simple, always produces a solution, no iterations required

Disadvantages

Doesn't distinguish between common and unique variance
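A sketch of this estimator in NumPy, assuming a correlation matrix R and a chosen number of factors m:

```python
import numpy as np

def pc_loadings(R: np.ndarray, m: int) -> np.ndarray:
    """Principal component estimate: sqrt(lambda_j) * e_j for the m largest eigenpairs."""
    vals, vecs = np.linalg.eigh(R)       # eigh: ascending eigenvalues of a symmetric matrix
    order = np.argsort(vals)[::-1][:m]   # indices of the m largest eigenvalues
    return vecs[:, order] * np.sqrt(vals[order])
```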

Maximum Likelihood Method

Maximizes the likelihood assuming multivariate normality:

\max_{\mathbf{L}, \boldsymbol{\Psi}} \log L(\mathbf{L}, \boldsymbol{\Psi} \mid \mathbf{S})

Advantages

Statistical inference possible, provides fit statistics (chi-square test)

Disadvantages

Requires normality assumption; may fail to converge or may yield improper solutions (Heywood cases)

Principal Axis Factoring

Iteratively estimates communalities and extracts factors from the reduced correlation matrix (see the sketch after the steps):

  1. Replace diagonal of \mathbf{R} with initial communality estimates
  2. Extract factors via eigendecomposition
  3. Update communality estimates from loadings
  4. Repeat until convergence
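A minimal NumPy sketch of this loop, using squared multiple correlations as the initial communality estimates (a common but not unique choice):

```python
import numpy as np

def principal_axis(R: np.ndarray, m: int, n_iter: int = 100, tol: float = 1e-6):
    """Principal axis factoring on a correlation matrix R; returns loadings and communalities."""
    # Initial communalities: squared multiple correlations, 1 - 1/diag(R^{-1})
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                      # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)
        order = np.argsort(vals)[::-1][:m]            # m largest eigenpairs
        L = vecs[:, order] * np.sqrt(np.clip(vals[order], 0, None))
        h2_new = (L ** 2).sum(axis=1)                 # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:         # converged
            return L, h2_new
        h2 = h2_new
    return L, h2
```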

Factor Scores

Computing Factor Scores

Factor scores estimate the unobserved factor values for each observation:

Regression Method

\hat{\mathbf{F}} = \mathbf{L}^T\boldsymbol{\Sigma}^{-1}(\mathbf{X} - \boldsymbol{\mu})

Bartlett Method

\hat{\mathbf{F}} = (\mathbf{L}^T\boldsymbol{\Psi}^{-1}\mathbf{L})^{-1}\mathbf{L}^T\boldsymbol{\Psi}^{-1}(\mathbf{X} - \boldsymbol{\mu})

Applications: Use factor scores as input for regression, clustering, or other analyses when you want to work with the underlying constructs rather than observed variables.
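A NumPy sketch of both estimators; here X holds observations in rows and Psi is the diagonal uniqueness matrix (all names are illustrative):

```python
import numpy as np

def factor_scores(X, mu, L, Psi, method="regression"):
    """Estimate factor scores for each row of X using the formulas above."""
    Xc = X - mu
    if method == "regression":
        Sigma = L @ L.T + Psi
        W = np.linalg.solve(Sigma, L)             # Sigma^{-1} L
    else:                                          # Bartlett (weighted least squares)
        Pinv_L = np.linalg.solve(Psi, L)          # Psi^{-1} L
        W = Pinv_L @ np.linalg.inv(L.T @ Pinv_L)  # Psi^{-1} L (L^T Psi^{-1} L)^{-1}
    return Xc @ W                                  # one row of scores per observation
```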

Model Assessment

Goodness of Fit

Chi-Square Test (ML)

Tests whether \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T + \boldsymbol{\Psi} fits the data. Sensitive to sample size.

RMSEA

Root Mean Square Error of Approximation. Values < 0.05 suggest good fit.

Residual Matrix

Compare \mathbf{S} - (\hat{\mathbf{L}}\hat{\mathbf{L}}^T + \hat{\boldsymbol{\Psi}}). Small residuals indicate good fit.

Tucker-Lewis Index

Compares model fit to null model. Values > 0.95 suggest good fit.

Determining Number of Factors

Scree Plot

Plot eigenvalues and look for the "elbow" where values level off

Parallel Analysis

Compare eigenvalues to those from random data; retain factors with eigenvalues above random

Kaiser Criterion

Retain factors with eigenvalue > 1 (for correlation matrix)

Interpretability

Consider whether factors are theoretically meaningful
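Of these criteria, parallel analysis is the most involved to compute by hand. A minimal NumPy sketch, assuming a data matrix X with observations in rows and using the 95th percentile of simulated eigenvalues (one common convention):

```python
import numpy as np

def parallel_analysis(X, n_sims=100, quantile=0.95, seed=0):
    """Retain factors whose observed eigenvalues exceed those of random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for s in range(n_sims):
        Z = rng.standard_normal((n, p))            # random data, same shape
        sims[s] = np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    threshold = np.quantile(sims, quantile, axis=0)
    retained = 0
    for o, t in zip(obs, threshold):               # count leading eigenvalues above random
        if o > t:
            retained += 1
        else:
            break
    return retained
```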

FA vs PCA

Key Differences

Factor Analysis

  • Model-based: latent factors cause correlations
  • Separates common and unique variance
  • Loadings are not uniquely determined (rotation)
  • Statistical inference possible (ML)

PCA

  • Descriptive: creates weighted composites
  • Uses total variance
  • Components uniquely determined
  • No distributional assumptions

When to use FA: When you believe underlying latent constructs exist and cause the observed correlations. When to use PCA: When you simply want to reduce dimensionality or create composite scores.

Worked Example

Example: Two-Factor Model

Given 5 variables with the following factor structure:

\mathbf{L} = \begin{pmatrix} 0.85 & 0.10 \\ 0.80 & 0.15 \\ 0.75 & 0.20 \\ 0.15 & 0.82 \\ 0.20 & 0.78 \end{pmatrix}

Factor 1 Interpretation

Variables 1-3 load highly → label based on what these measure

Factor 2 Interpretation

Variables 4-5 load highly → separate construct

Communalities

h_1^2 = 0.85^2 + 0.10^2 \approx 0.73 (73% of variable 1's variance explained by the factors)

Confirmatory Factor Analysis

EFA vs CFA

Exploratory FA (EFA)

  • No prior hypotheses about structure
  • All variables can load on all factors
  • Used for theory development

Confirmatory FA (CFA)

  • Tests specific factor structure
  • Loadings fixed to 0 or estimated
  • Used for theory testing

CFA Fit Indices: Chi-square, CFI (>0.95), TLI (>0.95), RMSEA (<0.06), SRMR (<0.08)

Practical Considerations

Sample Size Guidelines

Minimum Sample

At least 100-200 observations; ideally 10-20 per variable

Communality-based

Higher communalities allow smaller samples

KMO Test: Kaiser-Meyer-Olkin measure should be >0.6 to proceed with FA. Bartlett's test should be significant.

Heywood Cases

Problematic solutions where estimated uniqueness becomes negative (communality > 1):

Causes

Too few factors, too many factors, small sample, multicollinearity

Solutions

Try different number of factors, use different estimation method, constrain uniqueness

Factor Score Methods

Regression Method (Thomson)

Estimate factor scores using regression:

\hat{\mathbf{f}} = \mathbf{L}^T\boldsymbol{\Sigma}^{-1}\mathbf{x} = \mathbf{L}^T(\mathbf{L}\mathbf{L}^T + \boldsymbol{\Psi})^{-1}\mathbf{x}

Properties

Minimizes MSE, scores are correlated even for orthogonal factors

When to Use

Most common method, good for prediction

Bartlett Method

Weighted least squares approach:

\hat{\mathbf{f}} = (\mathbf{L}^T\boldsymbol{\Psi}^{-1}\mathbf{L})^{-1}\mathbf{L}^T\boldsymbol{\Psi}^{-1}\mathbf{x}

Property: Produces unbiased estimates with uncorrelated scores for orthogonal factors, but higher variance than regression method.

Anderson-Rubin Method

Produces orthogonal scores regardless of factor correlation:

Advantage

Scores are exactly uncorrelated and standardized

Use Case

When uncorrelated scores are required for subsequent analysis

Model Fit Assessment

Goodness of Fit Statistics

Chi-Square Test

Tests H_0: \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T + \boldsymbol{\Psi}

RMSEA

Root Mean Square Error of Approximation; <0.05 good, <0.08 acceptable

TLI/CFI

Comparative fit indices; >0.95 indicates good fit

Residual Matrix

Check \mathbf{S} - \hat{\boldsymbol{\Sigma}} for large residuals

Reproduced Correlation Matrix

The model-implied correlation matrix, where \boldsymbol{\Phi} is the factor correlation matrix (the identity for orthogonal factors):

\hat{\mathbf{R}} = \mathbf{L}\boldsymbol{\Phi}\mathbf{L}^T + \boldsymbol{\Psi}

Assessment: Residual correlations r_{ij} - \hat{r}_{ij} should be small (typically <0.05) for good fit.
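A small NumPy helper for this check, assuming a sample matrix S, loadings L, diagonal Psi, and an optional factor correlation matrix Phi:

```python
import numpy as np

def residual_summary(S, L, Psi, Phi=None):
    """Residuals between the sample matrix S and the model-implied matrix."""
    Phi = np.eye(L.shape[1]) if Phi is None else Phi   # orthogonal factors by default
    R_hat = L @ Phi @ L.T + Psi                        # reproduced correlation matrix
    resid = S - R_hat
    off = resid[~np.eye(S.shape[0], dtype=bool)]       # off-diagonal entries only
    return resid, float(np.abs(off).max())             # largest off-diagonal residual
```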

Interpretation Guidelines

Reading Loading Matrices

Salient Loadings

Focus on loadings with |l_{ij}| > 0.3 or 0.4

Simple Structure

Each variable loads highly on one factor, low on others

Cross-loadings

Variables loading on multiple factors may indicate complex structure

Factor Naming

Label factors based on common theme of high-loading variables

Communality Interpretation
h_i^2 = \sum_{j=1}^m l_{ij}^2

High Communality (>0.6)

Variable well-explained by the factors

Low Communality (<0.4)

Variable may not fit the factor model; consider removal

Common Pitfalls

Mistakes to Avoid

Over-extraction

Extracting too many factors leads to unstable, uninterpretable solutions

Under-extraction

Too few factors may miss important structure

Ignoring Rotation

Unrotated solutions are often uninterpretable

Small Sample

Results unstable with n < 100 or ratio < 5:1

Best Practice: Always report multiple criteria for factor number, use multiple rotations, and cross-validate with independent samples when possible.

Applications

Common Use Cases

Psychology

Personality traits (Big Five), intelligence (g-factor)

Marketing

Brand perception, customer satisfaction dimensions

Finance

Risk factors, asset pricing models

Education

Test validation, learning style dimensions

Confirmatory Factor Analysis (CFA)

EFA vs CFA

Exploratory FA (EFA)

No prior structure; all loadings estimated freely; data-driven

Confirmatory FA (CFA)

Pre-specified structure; some loadings fixed to 0; theory-driven

Typical workflow: Use EFA on one sample to develop model, then CFA on independent sample to confirm structure.

CFA Fit Indices

Chi-Square (χ²)

Exact fit test; sensitive to sample size

CFI/TLI

Comparative fit; should be > 0.95

RMSEA

Approximate fit; should be < 0.06

SRMR

Standardized residuals; should be < 0.08
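As a sketch, a two-factor CFA in Python using the semopy package listed under Software Implementation below. The lavaan-style model string and function names follow semopy's documented interface, but the item names and data file are hypothetical:

```python
import pandas as pd
import semopy

# Hypothetical two-factor CFA: items x1-x3 load on F1, x4-x6 on F2
desc = """
F1 =~ x1 + x2 + x3
F2 =~ x4 + x5 + x6
"""

df = pd.read_csv("items.csv")        # hypothetical data file of item responses
model = semopy.Model(desc)
model.fit(df)
print(semopy.calc_stats(model))      # fit indices (chi-square, CFI, TLI, RMSEA, ...)
```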

Advanced Factor Models

Higher-Order Factor Models

First-order factors load on second-order (general) factors:

Structure

Variables → Group Factors → General Factor

Example

Intelligence: Verbal, Spatial → g-factor

Bifactor Models

Variables load on both general and specific factors simultaneously:

Advantage

Separates general from domain-specific variance

Use Case

When items share method variance or have multidimensional structure

Software Implementation

Common Software

R

psych::fa(), lavaan (CFA)

Python

factor_analyzer, semopy

SPSS

Analyze → Dimension Reduction → Factor

Mplus

Specialized for SEM and FA

Reliability and Validity

Factor Reliability

Cronbach's Alpha

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma_i^2}{\sigma_T^2}\right)

Should be > 0.7 for adequate reliability
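A direct translation of this formula into Python, where items is an n x k array of item responses:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha from an n-observations x k-items array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # per-item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)
```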

McDonald's Omega

Better for congeneric measures; accounts for different loadings

Validity Considerations

Content Validity

Do items adequately represent the construct?

Construct Validity

Does factor structure match theory?

Convergent Validity

High loadings on intended factor (>0.5)

Discriminant Validity

Low cross-loadings; factors not too correlated

Practical Workflow

Step-by-Step Analysis
  1. Check sample adequacy (KMO > 0.6, Bartlett's test significant)
  2. Examine correlation matrix for suitable structure
  3. Extract initial factors (eigenvalue > 1, scree plot)
  4. Use parallel analysis to confirm number of factors
  5. Apply appropriate rotation (oblique if correlations expected)
  6. Evaluate fit (ML: chi-square, RMSEA, CFI)
  7. Interpret factor loadings and name factors
  8. Compute factor scores if needed
  9. Assess reliability (alpha/omega)
  10. Report and visualize results
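A sketch of steps 1-8 using the Python factor_analyzer package (mentioned under Software Implementation); the data file name and factor count are illustrative:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

df = pd.read_csv("items.csv")                       # hypothetical item-response data

chi2, p = calculate_bartlett_sphericity(df)          # step 1: sampling adequacy
kmo_per_item, kmo_total = calculate_kmo(df)
assert kmo_total > 0.6 and p < 0.05

fa = FactorAnalyzer(n_factors=4, rotation="promax", method="ml")  # steps 3-6
fa.fit(df)
print(fa.loadings_)                 # step 7: rotated loadings to interpret
print(fa.get_communalities())       # communalities per variable
scores = fa.transform(df)           # step 8: factor scores
```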

EFA vs CFA Comparison

Key Differences

EFA (Exploratory)

  • All variables can load on all factors
  • Data-driven approach
  • Discovery of factor structure
  • Used in scale development

CFA (Confirmatory)

  • Pre-specified loading pattern
  • Theory-driven approach
  • Testing hypothesized structure
  • Used in scale validation
Research Workflow
  1. Conduct EFA on Sample 1 to discover structure
  2. Refine measurement model based on results
  3. Conduct CFA on Sample 2 to confirm structure
  4. Evaluate model fit and make modifications if needed
  5. Test measurement invariance across groups

Model Modification and Improvement

Improving Model Fit

Modification Indices

Estimate improvement in chi-square if parameter is freed

Residuals

Examine standardized residuals with absolute value > 2.58 for evidence of misspecification

Correlated Errors

Allow when items share method variance or wording

Cross-loadings

Free if theoretically justified and improves fit substantially

Caution: Don't over-modify based solely on data. Each change should be theory-driven and validated on new sample.

Measurement Invariance

Testing Across Groups

Sequential tests to ensure factor structure is equivalent across groups:

  1. Configural invariance: Same factor structure
  2. Metric invariance: Equal factor loadings
  3. Scalar invariance: Equal intercepts
  4. Strict invariance: Equal residual variances

Practical rule: At minimum, metric invariance should hold to compare structural relationships. Scalar invariance needed to compare latent means.

Factor Scores

Computing Individual Scores

Factor scores estimate factor values for each observation:

\hat{\mathbf{f}}_i = \mathbf{W}^T \mathbf{x}_i, where \mathbf{W} is the weight matrix determined by the scoring method below.

Regression Method

Produces correlated scores even with orthogonal factors

Bartlett Method

Produces unbiased estimates (uncorrelated for orthogonal factors) via uniqueness-weighted least squares

Anderson-Rubin

Standardized and uncorrelated; unique solution

Uses of Factor Scores
  • Use as predictors in regression/ANOVA
  • Create composite variables for SEM
  • Cluster analysis on reduced dimensions
  • Visualization in scatter plots
  • Identify high/low scoring individuals

Common Mistakes and Solutions

Mistake 1: Treating FA as PCA

Problem: Using PCA when FA is appropriate

Solution: Use FA when modeling latent constructs; use PCA for dimension reduction

Mistake 2: Ignoring KMO

Problem: Running FA on unsuitable data

Solution: Check KMO > 0.6; remove items with low MSA values

Mistake 3: Over-extraction

Problem: Extracting too many factors

Solution: Use parallel analysis; ensure each factor has at least 3 salient loadings

Mistake 4: Poor Interpretation

Problem: Naming factors without examining all loadings

Solution: Consider all salient loadings; name based on conceptual meaning

Advanced Topics in Factor Analysis

Multiple Group Factor Analysis

Test whether factor structure is equivalent across groups:

  1. Test configural invariance: same number of factors
  2. Test metric invariance: equal factor loadings
  3. Test scalar invariance: equal intercepts
  4. Compare groups on factor means if scalar holds
Longitudinal Factor Analysis

Model stability and change in factor structure over time:

Parallel Items Model

Same items measured at multiple times; test for stability

Growth Curve Model

Model latent trajectories over time with factor scores

Complete Example

Personality Assessment Study

Goal: Develop personality scale with 20 items

  1. Check KMO = 0.85, Bartlett p < 0.001 → suitable for FA
  2. Parallel analysis suggests 4 factors
  3. Extract 4 factors using ML estimation
  4. Apply oblique rotation (promax)
  5. 4 factors explain 62% of variance
  6. Name factors: Extraversion, Agreeableness, Conscientiousness, Neuroticism

Statistical Inference for Factor Models

Maximum Likelihood Test Statistic

Under multivariate normality, the ML estimators allow formal hypothesis testing:

H_0: \boldsymbol{\Sigma} = \mathbf{L}\mathbf{L}^T + \boldsymbol{\Psi}

The likelihood ratio test statistic is:

\chi^2 = (n-1)\left[\text{tr}(\mathbf{S}\hat{\boldsymbol{\Sigma}}^{-1}) - \ln|\mathbf{S}\hat{\boldsymbol{\Sigma}}^{-1}| - p\right]

Degrees of Freedom

df = \frac{1}{2}\left[(p-m)^2 - p - m\right]

Interpretation

Large chi-square relative to df indicates poor fit; however, the test is sensitive to sample size
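A NumPy sketch of this statistic and its degrees of freedom; the p-value would come from the chi-square distribution with df degrees of freedom:

```python
import numpy as np

def ml_fit_test(S, L, Psi, n):
    """Likelihood ratio chi-square and df for H0: Sigma = L L^T + Psi."""
    p, m = L.shape
    Sigma_hat = L @ L.T + Psi
    M = S @ np.linalg.inv(Sigma_hat)
    sign, logdet = np.linalg.slogdet(M)            # ln|S Sigma_hat^{-1}|, numerically stable
    chi2 = (n - 1) * (np.trace(M) - logdet - p)
    df = ((p - m) ** 2 - p - m) / 2
    return chi2, df
```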

Identifiability and Uniqueness

The factor model has a fundamental indeterminacy: for any orthogonal matrix \mathbf{T}, the transformation preserves the covariance structure:

\mathbf{L}^* = \mathbf{L}\mathbf{T}, \quad \mathbf{F}^* = \mathbf{T}^T\mathbf{F}

\mathbf{L}^*\mathbf{L}^{*T} = \mathbf{L}\mathbf{T}\mathbf{T}^T\mathbf{L}^T = \mathbf{L}\mathbf{L}^T

Implication: Rotation does not change the fit but changes interpretation. This is why rotation is chosen for substantive interpretability rather than statistical optimality.
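This invariance is easy to verify numerically; the sketch below builds a random orthogonal \mathbf{T} via a QR decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.normal(size=(5, 2))                      # arbitrary loadings

T, _ = np.linalg.qr(rng.normal(size=(2, 2)))     # Q from QR is orthogonal
L_star = L @ T                                   # rotated loadings

print(np.allclose(L_star @ L_star.T, L @ L.T))   # True: same covariance structure
```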

Detailed Numerical Example

Step-by-Step Factor Extraction

Given a correlation matrix for 4 variables:

\mathbf{R} = \begin{pmatrix} 1 & 0.72 & 0.31 & 0.28 \\ 0.72 & 1 & 0.35 & 0.30 \\ 0.31 & 0.35 & 1 & 0.65 \\ 0.28 & 0.30 & 0.65 & 1 \end{pmatrix}

Step 1: Eigenvalues of R are 2.12, 1.45, 0.28, 0.15. Kaiser criterion suggests 2 factors.

Step 2: Extract loadings using principal axis method:

\mathbf{L} = \begin{pmatrix} 0.82 & 0.12 \\ 0.85 & 0.08 \\ 0.25 & 0.78 \\ 0.22 & 0.80 \end{pmatrix}

Communalities

h_1^2 = 0.82^2 + 0.12^2 \approx 0.69

h_3^2 = 0.25^2 + 0.78^2 \approx 0.67

Interpretation

Variables 1-2 load on Factor 1; Variables 3-4 load on Factor 2. Clear simple structure achieved.
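A quick NumPy check of these computations, reproducing the communalities and the model-implied correlation matrix:

```python
import numpy as np

L = np.array([[0.82, 0.12],
              [0.85, 0.08],
              [0.25, 0.78],
              [0.22, 0.80]])

h2 = (L ** 2).sum(axis=1)                 # approx [0.69, 0.73, 0.67, 0.69]
R_hat = L @ L.T + np.diag(1 - h2)         # reproduced correlation matrix
print(h2.round(2))
print(R_hat.round(2))                     # compare to R above; residuals are small
```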


FAQ

How many factors should I extract?

Use multiple criteria: eigenvalue > 1, scree plot, parallel analysis, interpretability, and fit indices for ML estimation.

When should I use oblique vs orthogonal rotation?

Use oblique when you expect factors to correlate (common in social sciences). Use orthogonal when independence is theoretically expected or for simpler interpretation.

What's the difference between FA and PCA?

FA is model-based, positing latent factors causing correlations. PCA is descriptive, creating composites that maximize variance. FA separates common and unique variance; PCA uses total variance.

What is communality?

Communality is the proportion of variance in an observed variable explained by all factors. High communality (>0.5) indicates the variable is well-represented by the factor solution.

How do I handle cross-loadings?

Items loading >0.32 on multiple factors are problematic. Consider: (1) delete the item, (2) respecify factors, (3) use oblique rotation, or (4) accept if loadings differ by >0.2.
