Discover latent factors underlying observed correlations through factor models, rotation, and interpretation
Factor loadings matrix (Λ): coefficients relating the observed variables to the factors in the model x = Λf + ε
Common factors (f): latent variables shared across the observed variables
Specific (unique) factors (ε): variation unique to each observed variable
Communality (hᵢ²): the proportion of variable i's variance explained by the common factors, hᵢ² = Σⱼ λᵢⱼ²
Uniqueness (ψᵢ): the remaining specific variance, ψᵢ = 1 − hᵢ² for standardized variables
Orthogonal (Varimax, Quartimax)
Factors remain uncorrelated. Varimax simplifies columns (factors); Quartimax simplifies rows (variables).
Oblique (Promax, Oblimin)
Factors can correlate. More flexible but interpretation requires factor correlation matrix.
Goal: Achieve "simple structure" where each variable loads highly on few factors and each factor has few high-loading variables.
Varimax maximizes the variance of squared loadings within each factor:
V = (1/p) Σⱼ [ Σᵢ λᵢⱼ⁴ − (1/p)(Σᵢ λᵢⱼ²)² ]
Effect
Pushes loadings toward 0 or ±1, making factors easier to interpret by highlighting which variables "belong" to each factor.
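As a sketch, varimax can be computed with the standard SVD-based iteration; the unrotated loadings below are a made-up example, not from the text:

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """Rotate a loadings matrix toward simple structure (Kaiser's varimax,
    via the SVD-based iteration)."""
    p, m = L.shape
    T = np.eye(m)            # accumulated orthogonal rotation
    crit_old = 0.0
    for _ in range(n_iter):
        Lr = L @ T
        # SVD of the gradient of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - Lr @ np.diag((Lr**2).sum(axis=0)) / p))
        T = u @ vt
        if s.sum() < crit_old * (1 + tol):
            break
        crit_old = s.sum()
    return L @ T, T

# hypothetical unrotated loadings: 4 variables, 2 factors
L = np.array([[0.7, 0.3], [0.6, 0.4], [0.3, 0.7], [0.2, 0.6]])
L_rot, T = varimax(L)
# an orthogonal rotation leaves each variable's communality unchanged
print(np.allclose((L_rot**2).sum(axis=1), (L**2).sum(axis=1)))
```

Because T is orthogonal, the fit is identical before and after rotation; only the interpretation changes.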
Uses the eigenvalue decomposition of the correlation matrix, R = Σₖ λₖ eₖ eₖᵀ; the loadings for m factors are Λ̂ = [√λ₁ e₁, …, √λₘ eₘ].
Advantages
Simple, always produces a solution, no iterations required
Disadvantages
Doesn't distinguish between common and unique variance
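A minimal numpy sketch of this extraction, using an assumed toy correlation matrix:

```python
import numpy as np

# toy 3-variable correlation matrix (assumed for illustration)
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# eigendecomposition; reorder eigenvalues in descending order
evals, evecs = np.linalg.eigh(R)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]

# loadings for the first m factors: column k is sqrt(lambda_k) * e_k
m = 1
Lambda = evecs[:, :m] * np.sqrt(evals[:m])

# communalities = row sums of squared loadings
h2 = (Lambda**2).sum(axis=1)
print(Lambda.ravel(), h2)
```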
Maximizes the likelihood assuming multivariate normality, under the covariance structure Σ = ΛΛᵀ + Ψ.
Advantages
Statistical inference possible, provides fit statistics (chi-square test)
Disadvantages
Requires normality assumption, may not converge (Heywood cases)
Iteratively estimates communalities and extracts factors from the reduced correlation matrix Rᵣ (R with communality estimates ĥᵢ² replacing the 1s on the diagonal), updating ĥᵢ² from the extracted loadings until convergence.
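A rough numpy sketch of principal axis factoring, starting from squared multiple correlations as initial communality estimates; the correlation matrix is a toy example:

```python
import numpy as np

def principal_axis(R, m, n_iter=50, tol=1e-6):
    """Principal axis factoring: iterate communality estimates."""
    # initial communalities: squared multiple correlations
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)          # reduced correlation matrix
        evals, evecs = np.linalg.eigh(Rr)
        order = np.argsort(evals)[::-1][:m]
        L = evecs[:, order] * np.sqrt(np.maximum(evals[order], 0))
        h2_new = (L**2).sum(axis=1)       # updated communalities
        if np.max(np.abs(h2_new - h2)) < tol:
            h2 = h2_new
            break
        h2 = h2_new
    return L, h2

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
L, h2 = principal_axis(R, m=1)
```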
Factor scores estimate the unobserved factor values for each observation:
Regression Method
f̂ = Λᵀ R⁻¹ x for standardized x; minimizes mean squared error
Bartlett Method
f̂ = (Λᵀ Ψ⁻¹ Λ)⁻¹ Λᵀ Ψ⁻¹ x; weighted least squares, unbiased
Applications: Use factor scores as input for regression, clustering, or other analyses when you want to work with the underlying constructs rather than observed variables.
Chi-Square Test (ML)
Tests whether the m-factor model Σ = ΛΛᵀ + Ψ fits the data. Sensitive to sample size.
RMSEA
Root Mean Square Error of Approximation. Values < 0.05 suggest good fit.
Residual Matrix
Compare R with the model-implied matrix Λ̂Λ̂ᵀ + Ψ̂. Small residuals indicate good fit.
Tucker-Lewis Index
Compares model fit to null model. Values > 0.95 suggest good fit.
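The RMSEA and TLI formulas can be computed directly from chi-square results; the fit values below are hypothetical inputs for illustration:

```python
import math

def rmsea(chi2, df, n):
    """Root Mean Square Error of Approximation from a chi-square fit test."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def tli(chi2, df, chi2_null, df_null):
    """Tucker-Lewis Index: compares model fit to the independence (null) model."""
    ratio_null = chi2_null / df_null
    return (ratio_null - chi2 / df) / (ratio_null - 1.0)

# hypothetical fit results (not from the text)
print(round(rmsea(chi2=45.0, df=30, n=300), 4))
print(round(tli(chi2=45.0, df=30, chi2_null=900.0, df_null=45), 4))
```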
Scree Plot
Plot eigenvalues and look for the "elbow" where values level off
Parallel Analysis
Compare eigenvalues to those from random data; retain factors with eigenvalues above random
Kaiser Criterion
Retain factors with eigenvalue > 1 (for correlation matrix)
Interpretability
Consider whether factors are theoretically meaningful
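Parallel analysis can be sketched with numpy; the data here are simulated with one built-in common factor (loadings and noise level are assumptions):

```python
import numpy as np

def parallel_analysis(data, n_sims=200, quantile=95, seed=0):
    """Retain factors whose sample eigenvalues exceed the chosen percentile
    of eigenvalues from random normal data of the same shape."""
    rng = np.random.default_rng(seed)
    n, p = data.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(data, rowvar=False)))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        sims[i] = np.sort(np.linalg.eigvalsh(
            np.corrcoef(rng.standard_normal((n, p)), rowvar=False)))[::-1]
    threshold = np.percentile(sims, quantile, axis=0)
    return int(np.sum(obs > threshold))

# simulate 6 items driven by one common factor (assumed example)
rng = np.random.default_rng(1)
f = rng.standard_normal((500, 1))
X = f @ np.full((1, 6), 0.8) + 0.6 * rng.standard_normal((500, 6))
print(parallel_analysis(X))
```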
Factor Analysis
PCA
When to use FA: When you believe underlying latent constructs exist and cause the observed correlations. When to use PCA: When you simply want to reduce dimensionality or create composite scores.
Given 5 variables with the following factor structure:
Factor 1 Interpretation
Variables 1-3 load highly → label based on what these measure
Factor 2 Interpretation
Variables 4-5 load highly → separate construct
Communalities
h₁² = Σⱼ λ₁ⱼ² = 0.73 (73% of variable 1's variance explained by the factors)
Exploratory FA (EFA): data-driven; the factor structure is discovered from the data
Confirmatory FA (CFA): theory-driven; a pre-specified structure is tested
CFA Fit Indices: Chi-square, CFI (>0.95), TLI (>0.95), RMSEA (<0.06), SRMR (<0.08)
Minimum Sample
At least 100-200 observations; ideally 10-20 per variable
Communality-based
Higher communalities allow smaller samples
KMO Test: Kaiser-Meyer-Olkin measure should be >0.6 to proceed with FA. Bartlett's test should be significant.
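The KMO measure can be sketched from the partial correlations implied by the inverse correlation matrix; the 3-variable matrix is an assumed toy example:

```python
import numpy as np

def kmo(R):
    """Kaiser-Meyer-Olkin measure of sampling adequacy from a correlation matrix."""
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(Rinv), np.diag(Rinv)))
    partial = -Rinv / d                      # partial correlations
    mask = ~np.eye(R.shape[0], dtype=bool)   # off-diagonal elements only
    r2 = (R[mask]**2).sum()
    p2 = (partial[mask]**2).sum()
    # KMO: observed correlations large relative to partial correlations
    return r2 / (r2 + p2)

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])
print(round(kmo(R), 3))
```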
Heywood cases: problematic solutions where an estimated uniqueness becomes negative (communality > 1):
Causes
Too few factors, too many factors, small sample, multicollinearity
Solutions
Try different number of factors, use different estimation method, constrain uniqueness
Estimate factor scores by regressing the factors on the observed variables: f̂ = Λᵀ R⁻¹ x for standardized x.
Properties
Minimizes MSE, scores are correlated even for orthogonal factors
When to Use
Most common method, good for prediction
Weighted least squares approach: f̂ = (Λᵀ Ψ⁻¹ Λ)⁻¹ Λᵀ Ψ⁻¹ x.
Property: Produces unbiased estimates with uncorrelated scores for orthogonal factors, but higher variance than regression method.
Anderson-Rubin method: produces standardized, exactly orthogonal scores regardless of factor correlation.
Advantage
Scores are exactly uncorrelated and standardized
Use Case
When uncorrelated scores are required for subsequent analysis
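The regression and Bartlett score formulas above can be sketched with numpy on simulated single-factor data; the loadings and sample size are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# assumed loadings and uniquenesses for a 1-factor, 4-variable model
Lam = np.array([[0.8], [0.7], [0.6], [0.5]])
Psi = np.diag(1 - (Lam**2).ravel())
R = Lam @ Lam.T + Psi                 # model-implied correlation matrix

# standardized observed data generated from the model
n = 1000
f = rng.standard_normal((n, 1))
X = f @ Lam.T + rng.standard_normal((n, 4)) @ np.sqrt(Psi)

# regression (Thurstone) scores: f_hat = x' R^{-1} Lambda
scores_reg = X @ np.linalg.inv(R) @ Lam

# Bartlett scores: f_hat = (Lambda' Psi^{-1} Lambda)^{-1} Lambda' Psi^{-1} x
Pinv = np.linalg.inv(Psi)
scores_bart = X @ Pinv @ Lam @ np.linalg.inv(Lam.T @ Pinv @ Lam)
```

Both sets of scores track the true factor closely; regression scores are shrunken toward zero, while Bartlett scores are conditionally unbiased.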
Chi-Square Test
Tests H₀: Σ = ΛΛᵀ + Ψ (the m-factor model) against an unconstrained covariance matrix
RMSEA
Root Mean Square Error of Approximation; <0.05 good, <0.08 acceptable
TLI/CFI
Comparative fit indices; >0.95 indicates good fit
Residual Matrix
Check for large residuals
The model-implied correlation matrix is Σ̂ = Λ̂Λ̂ᵀ + Ψ̂; the residual matrix is R − Σ̂.
Assessment: Residual correlations should be small (typically <0.05) for good fit.
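A quick numpy check of the residual matrix, with hypothetical estimated loadings and an assumed observed correlation matrix:

```python
import numpy as np

# hypothetical estimates for a 3-variable, 1-factor model
Lam = np.array([[0.8], [0.7], [0.6]])
Psi = np.diag(1 - (Lam**2).ravel())
R_implied = Lam @ Lam.T + Psi

# assumed observed sample correlation matrix
R_obs = np.array([[1.00, 0.57, 0.46],
                  [0.57, 1.00, 0.43],
                  [0.46, 0.43, 1.00]])

# off-diagonal residuals should be small (here, typically < 0.05)
residual = R_obs - R_implied
off_diag = residual[~np.eye(3, dtype=bool)]
print(np.abs(off_diag).max())
```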
Salient Loadings
Focus on loadings with |λ| > 0.3 or 0.4
Simple Structure
Each variable loads highly on one factor, low on others
Cross-loadings
Variables loading on multiple factors may indicate complex structure
Factor Naming
Label factors based on common theme of high-loading variables
High Communality (>0.6)
Variable well-explained by the factors
Low Communality (<0.4)
Variable may not fit the factor model; consider removal
Over-extraction
Extracting too many factors leads to unstable, uninterpretable solutions
Under-extraction
Too few factors may miss important structure
Ignoring Rotation
Unrotated solutions are often uninterpretable
Small Sample
Results unstable with n < 100 or ratio < 5:1
Best Practice: Always report multiple criteria for factor number, use multiple rotations, and cross-validate with independent samples when possible.
Psychology
Personality traits (Big Five), intelligence (g-factor)
Marketing
Brand perception, customer satisfaction dimensions
Finance
Risk factors, asset pricing models
Education
Test validation, learning style dimensions
Exploratory FA (EFA)
No prior structure; all loadings estimated freely; data-driven
Confirmatory FA (CFA)
Pre-specified structure; some loadings fixed to 0; theory-driven
Typical workflow: Use EFA on one sample to develop model, then CFA on independent sample to confirm structure.
Chi-Square (χ²)
Exact fit test; sensitive to sample size
CFI/TLI
Comparative fit; should be > 0.95
RMSEA
Approximate fit; should be < 0.06
SRMR
Standardized residuals; should be < 0.08
First-order factors load on second-order (general) factors:
Structure
Variables → Group Factors → General Factor
Example
Intelligence: Verbal, Spatial → g-factor
Bifactor model: variables load on both a general factor and specific (group) factors simultaneously:
Advantage
Separates general from domain-specific variance
Use Case
When items share method variance or have multidimensional structure
R
psych::fa(), lavaan (CFA)
Python
factor_analyzer, semopy
SPSS
Analyze → Dimension Reduction → Factor
Mplus
Specialized for SEM and FA
Cronbach's Alpha
Should be > 0.7 for adequate reliability
McDonald's Omega
Better for congeneric measures; accounts for different loadings
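Cronbach's alpha is straightforward to compute from an observations-by-items matrix; the data below are simulated from a single-factor model purely for illustration:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha: items is an (n_observations, k_items) array."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# simulated 5-item scale with one common factor (assumed example)
rng = np.random.default_rng(0)
f = rng.standard_normal((500, 1))
items = f @ np.full((1, 5), 0.7) + 0.7 * rng.standard_normal((500, 5))
alpha = cronbach_alpha(items)
```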
Content Validity
Do items adequately represent the construct?
Construct Validity
Does factor structure match theory?
Convergent Validity
High loadings on intended factor (>0.5)
Discriminant Validity
Low cross-loadings; factors not too correlated
EFA (Exploratory)
CFA (Confirmatory)
Modification Indices
Estimate improvement in chi-square if parameter is freed
Residuals
Examine standardized residuals > 2.58 for misspecification
Correlated Errors
Allow when items share method variance or wording
Cross-loadings
Free if theoretically justified and improves fit substantially
Caution: Don't over-modify based solely on data. Each change should be theory-driven and validated on new sample.
Sequential tests to ensure the factor structure is equivalent across groups: configural (same pattern), metric (equal loadings), scalar (equal intercepts), strict (equal residual variances).
Practical rule: At minimum, metric invariance should hold to compare structural relationships. Scalar invariance needed to compare latent means.
Factor scores estimate factor values for each observation:
Regression Method
Produces correlated scores even with orthogonal factors
Bartlett Method
Produces uncorrelated scores; minimizes uniqueness
Anderson-Rubin
Standardized and uncorrelated; unique solution
Problem: Using PCA when FA is appropriate
Solution: Use FA when modeling latent constructs; use PCA for dimension reduction
Problem: Running FA on unsuitable data
Solution: Check KMO > 0.6; remove items with low MSA values
Problem: Extracting too many factors
Solution: Use parallel analysis; ensure each factor has at least 3 salient loadings
Problem: Naming factors without examining all loadings
Solution: Consider all salient loadings; name based on conceptual meaning
Multi-group models test whether the factor structure is equivalent across groups (see measurement invariance above).
Model stability and change in factor structure over time:
Parallel Items Model
Same items measured at multiple times; test for stability
Growth Curve Model
Model latent trajectories over time with factor scores
Goal: Develop personality scale with 20 items
Under multivariate normality, the ML estimators allow formal hypothesis testing:
The likelihood ratio test statistic, with Bartlett's correction, is:
χ² = [n − 1 − (2p + 4m + 5)/6] · ln(|Σ̂| / |S|), where S is the sample covariance matrix and Σ̂ = Λ̂Λ̂ᵀ + Ψ̂
Degrees of Freedom
df = [(p − m)² − (p + m)] / 2, with p variables and m factors
Interpretation
Large chi-square relative to df indicates poor fit; however, the test is sensitive to sample size
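A small sketch of the Bartlett-corrected statistic and its degrees of freedom; the determinants, sample size, and dimensions below are hypothetical inputs:

```python
import math

def lr_statistic(S_det, Sigma_det, n, p, m):
    """Bartlett-corrected likelihood ratio statistic for the m-factor model."""
    correction = n - 1 - (2 * p + 4 * m + 5) / 6
    return correction * math.log(Sigma_det / S_det)

def lr_df(p, m):
    """Degrees of freedom: ((p - m)^2 - (p + m)) / 2 (always an integer)."""
    return ((p - m) ** 2 - (p + m)) // 2

# hypothetical determinants for a 6-variable, 2-factor fit
stat = lr_statistic(S_det=0.20, Sigma_det=0.22, n=300, p=6, m=2)
df = lr_df(6, 2)
```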
The factor model has a fundamental indeterminacy: for any orthogonal matrix T, the transformation Λ* = ΛT preserves the covariance structure, since Λ*Λ*ᵀ + Ψ = ΛTTᵀΛᵀ + Ψ = ΛΛᵀ + Ψ.
Implication: Rotation does not change the fit but changes interpretation. This is why rotation is chosen for substantive interpretability rather than statistical optimality.
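This invariance is easy to verify numerically; the loadings, uniquenesses, and orthogonal T below are randomly generated for the check:

```python
import numpy as np

rng = np.random.default_rng(0)
Lam = rng.standard_normal((5, 2))        # arbitrary loadings
Psi = np.diag(rng.uniform(0.2, 0.8, 5))  # positive uniquenesses

# random orthogonal T via QR decomposition
T, _ = np.linalg.qr(rng.standard_normal((2, 2)))
Lam_star = Lam @ T

# both loading matrices imply the identical covariance structure
Sigma = Lam @ Lam.T + Psi
Sigma_star = Lam_star @ Lam_star.T + Psi
print(np.allclose(Sigma, Sigma_star))
```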
Given a correlation matrix for 4 variables:
Step 1: Eigenvalues of R are 2.12, 1.45, 0.28, 0.15. Kaiser criterion suggests 2 factors.
Step 2: Extract loadings using principal axis method:
Communalities
Interpretation
Variables 1-2 load on Factor 1; Variables 3-4 load on Factor 2. Clear simple structure achieved.
Use multiple criteria: eigenvalue > 1, scree plot, parallel analysis, interpretability, and fit indices for ML estimation.
Use oblique when you expect factors to correlate (common in social sciences). Use orthogonal when independence is theoretically expected or for simpler interpretation.
FA is model-based, positing latent factors causing correlations. PCA is descriptive, creating composites that maximize variance. FA separates common and unique variance; PCA uses total variance.
Communality is the proportion of variance in an observed variable explained by all factors. High communality (>0.5) indicates the variable is well-represented by the factor solution.
Items loading >0.32 on multiple factors are problematic. Consider: (1) delete the item, (2) respecify factors, (3) use oblique rotation, or (4) accept if loadings differ by >0.2.