Lesson 4.3: Correlation vs. Causation

Related—But Did One Cause the Other?

Learn what the correlation coefficient r means, why correlation ≠ causation, and how randomization supports causal conclusions.

Learning Objectives

Interpret r (close to 1: strong positive; close to -1: strong negative; close to 0: little or no linear association)
Explain confounding variables and common cause
Design randomized experiments to infer causality
Use random sampling to generalize to populations

Worked Example

A data set of daily steps and body weight has $r=-0.75$. What does this mean?

Strong negative correlation: people who take more steps tend to have lower body weight. This does not prove that taking more steps causes weight loss; other factors (diet, health) may influence both variables.
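To make the meaning of r concrete, here is a minimal sketch that computes the Pearson correlation coefficient from its definition. The `steps` and `weight` values are hypothetical numbers invented for illustration; they are not the data behind the $r=-0.75$ above, and `pearson_r` is a helper name chosen here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily steps (thousands) and body weight (kg) for 8 people.
steps = [2, 4, 5, 6, 8, 9, 10, 12]
weight = [88, 92, 80, 84, 76, 79, 70, 74]

r = pearson_r(steps, weight)
print(f"r = {r:.2f}")  # negative: more steps go with lower weight in this sample
```

A negative value here only describes the pattern in the sample. Nothing in the calculation can tell which variable, if either, drives the other.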

Confounding & Simpson’s Paradox

Confounding variable: a third factor that influences both x and y, creating a misleading association between them.
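A small simulation can show how a common cause produces correlation without causation. In this sketch, x and y are each generated from a shared factor z plus independent noise, so neither one affects the other. The noise levels, sample size, and seed are arbitrary assumptions made for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# z is the confounder; x and y each depend on z but not on each other.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 0.5) for zi in z]
y = [zi + rng.gauss(0, 0.5) for zi in z]

r_xy = pearson_r(x, y)

# Subtract z's contribution. This works only because the simulation
# defines x and y as z plus noise; with real data the effect of z
# would have to be estimated.
x_resid = [xi - zi for xi, zi in zip(x, z)]
y_resid = [yi - zi for yi, zi in zip(y, z)]
r_resid = pearson_r(x_resid, y_resid)

print(f"r(x, y) = {r_xy:.2f}")
print(f"r(x, y) after removing z = {r_resid:.2f}")
```

The raw correlation is strong, while the correlation that remains after removing z is close to zero: the association between x and y comes entirely from the shared factor.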

Simpson’s paradox: a trend that appears within each of several groups reverses or disappears when the groups are combined, because the groups differ in size or a confounder is unevenly distributed across them.
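The reversal is easiest to see with numbers. The sketch below uses hypothetical counts, invented for illustration, in which a treatment has the higher success rate in both a "mild" and a "severe" group (0.90 vs. 0.80 and 0.30 vs. 0.20) yet the lower rate once the groups are pooled (390/1100 vs. 820/1100), because most treated patients are severe cases.

```python
# Hypothetical counts as (successes, total) for each group and arm.
counts = {
    "mild":   {"treated": (90, 100),   "control": (800, 1000)},
    "severe": {"treated": (300, 1000), "control": (20, 100)},
}

def rate(successes, total):
    """Success rate as a fraction."""
    return successes / total

def pooled_rate(arm):
    """Success rate for one arm after combining all groups."""
    successes = sum(counts[group][arm][0] for group in counts)
    total = sum(counts[group][arm][1] for group in counts)
    return successes / total

# Within each group, the treated arm does better.
for group, arms in counts.items():
    treated = rate(*arms["treated"])
    control = rate(*arms["control"])
    print(f"{group}: treated {treated:.2f} vs. control {control:.2f}")

# Combined, the ordering flips.
overall_treated = pooled_rate("treated")
overall_control = pooled_rate("control")
print(f"overall: treated {overall_treated:.2f} vs. control {overall_control:.2f}")
```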

Experimental Design Essentials

Random sampling → generalize to population; Random assignment → infer causality.

Control group, blinding, and placebo reduce bias; replication increases reliability.

Blocking: group similar subjects, then randomize within each block to control for known sources of variability.
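The ideas above can be sketched in a few lines: draw a random sample from a population, form blocks of similar subjects, and randomly assign treatment within each block. The population of numbered subjects, the `experience_level` blocking rule, and the function name `blocked_assignment` are all hypothetical, chosen only to make the example runnable.

```python
import random
from collections import defaultdict

def blocked_assignment(subjects, block_of, rng):
    """Randomly assign subjects to 'treatment' or 'control' within each block."""
    blocks = defaultdict(list)
    for subject in subjects:
        blocks[block_of(subject)].append(subject)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]          # copy so the caller's data is untouched
        rng.shuffle(shuffled)          # random order within the block
        half = len(shuffled) // 2
        for subject in shuffled[:half]:
            assignment[subject] = "treatment"
        for subject in shuffled[half:]:
            assignment[subject] = "control"
    return assignment

def experience_level(subject_id):
    # Hypothetical blocking variable: even IDs are novices, odd IDs experienced.
    return "novice" if subject_id % 2 == 0 else "experienced"

rng = random.Random(42)                # fixed seed for a repeatable example
population = list(range(200))          # hypothetical subject IDs

# Random sampling: supports generalizing results to the population.
sample = rng.sample(population, 40)

# Random assignment within blocks: supports causal conclusions.
assignment = blocked_assignment(sample, experience_level, rng)

for level in ("novice", "experienced"):
    members = [s for s in sample if experience_level(s) == level]
    treated = sum(assignment[s] == "treatment" for s in members)
    print(f"{level}: {treated} treatment, {len(members) - treated} control")
```

Randomizing inside each block keeps the two arms balanced on the blocking variable, so a difference between arms cannot be explained by one arm having more novices.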

Practice

1) Give an example of correlation due to a lurking variable.

Hint

Ice cream sales vs. drownings—temperature is the common cause.

2) Outline a randomized experiment to test if a study method increases test scores.

Sample Outline

Randomly assign students to the study method or a control group; give pre- and post-tests; blind the graders to group membership; compare the mean score gains of the two groups.
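The final analysis step of such an outline might look like the sketch below. It compares mean score gains (post-test minus pre-test) between the two groups and uses a permutation test, one of several reasonable choices, to ask how often random relabeling alone would produce a difference at least as large. The gain values are hypothetical, and `permutation_p_value` is a name introduced here.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(treatment, control, rng, n_perm=5000):
    """One-sided permutation test for mean(treatment) - mean(control)."""
    observed = mean(treatment) - mean(control)
    pooled = treatment + control       # new list; inputs are not modified
    k = len(treatment)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # relabel students at random
        diff = mean(pooled[:k]) - mean(pooled[k:])
        if diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_perm

# Hypothetical score gains for 10 students per group.
method_gains = [8, 5, 7, 10, 6, 9, 4, 7, 8, 6]
control_gains = [3, 5, 2, 6, 4, 1, 5, 3, 4, 2]

rng = random.Random(1)                 # fixed seed for a repeatable example
observed_diff, p_value = permutation_p_value(method_gains, control_gains, rng)

print(f"difference in mean gains: {observed_diff:.1f}")
print(f"approximate one-sided p-value: {p_value:.4f}")
```

Because students were assigned at random, a small p-value supports the conclusion that the study method itself, and not a pre-existing difference between groups, raised the scores.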

3) Describe a scenario showing Simpson’s paradox and why it happens.

Hint

A treatment has a higher success rate than the control within each gender, yet a lower success rate overall, because the two genders are unequally represented in the treatment and control groups.
