Lesson 4.2: Scatter Plots & Linear Models

Lines that Explain Relationships

Represent paired data in scatter plots, identify positive, negative, or no association, and model relationships with linear functions for prediction.

Learning Objectives

Construct scatter plots for paired data
Identify association direction and strength
Fit linear models and interpret the slope $m$ and intercept $b$
Use models for prediction and evaluate reasonableness

Worked Example

Study time $x$ (hours): 1, 2, 3, 4; scores $y$: 45, 50, 55, 60. Find a linear model and predict $y$ when $x=5$.

Each additional hour raises the score by 5, so slope $m=5$; solving $45=5(1)+b$ gives intercept $b=40$ → model: $\hat{y}=5x+40$

Prediction at $x=5$ → $\hat{y}=65$
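The arithmetic in this example can be checked with a short Python sketch. It assumes the points lie exactly on a line (true for this data set), so the slope is taken from the first and last points. The helper name `line_through` is illustrative and not part of the lesson.

```python
def line_through(p, q):
    """Return (m, b) for the line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    m = (y2 - y1) / (x2 - x1)  # rise over run
    b = y1 - m * x1            # solve y1 = m*x1 + b for b
    return m, b

hours = [1, 2, 3, 4]
scores = [45, 50, 55, 60]

m, b = line_through((hours[0], scores[0]), (hours[-1], scores[-1]))

# Every observed point lies on the line, so this model fits exactly.
assert all(m * x + b == y for x, y in zip(hours, scores))

prediction = m * 5 + b
print(f"model: y_hat = {m:g}x + {b:g}; prediction at x=5: {prediction:g}")
```

With real data the points rarely fall exactly on a line, and a two-point slope would depend on which points were chosen; that is the case least squares is designed for.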

Residuals & Least Squares

Residual of a point $(x_i,y_i)$ : $e_i=y_i-\hat{y}_i$ . Positive residual means the model underpredicts.

The least-squares line minimizes $\sum e_i^2$. Its slope and intercept can be computed from summary statistics (means, standard deviations, and correlation), a derivation usually covered in more advanced courses.
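As a minimal sketch of that computation, the standard formulas are $m = S_{xy}/S_{xx}$ and $b = \bar{y} - m\bar{x}$, where $S_{xy}=\sum (x_i-\bar{x})(y_i-\bar{y})$ and $S_{xx}=\sum (x_i-\bar{x})^2$. The function name `least_squares` is an assumption made for illustration; the data are the study-time points from the Worked Example.

```python
def least_squares(xs, ys):
    """Return (m, b) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-deviations and squared x-deviations about the means
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    b = y_bar - m * x_bar  # the fitted line passes through (x_bar, y_bar)
    return m, b

m, b = least_squares([1, 2, 3, 4], [45, 50, 55, 60])
print(m, b)  # 5.0 40.0
```

Because these four points are perfectly linear, least squares recovers the same model found by inspection, $\hat{y}=5x+40$.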

Residual plot: random scatter around 0 suggests a linear model is appropriate; patterns suggest nonlinearity or non-constant spread (heteroscedasticity).

Correlation r and R²

Correlation coefficient $r$ measures linear association strength ($-1 \le r \le 1$). Sign gives direction; $|r|$ close to 1 indicates strong linear association.

Coefficient of determination $R^2$ : fraction of variability in $y$ explained by the model; $R^2=r^2$ for simple linear regression.

Contextual interpretation: “About $100R^2\%$ of the variation in y can be explained by x via the linear model.”
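The sketch below illustrates these definitions on the three points (1,45), (2,50), (3,60) that also appear in this lesson's residuals example. It computes $r$ from deviations about the means, then checks numerically that $1 - \text{SSE}/\text{SST}$ for the least-squares line equals $r^2$. The function and variable names are illustrative assumptions.

```python
import math

def deviations_sums(xs, ys):
    """Return (S_xx, S_yy, S_xy, x_bar, y_bar) for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy, x_bar, y_bar

xs, ys = [1, 2, 3], [45, 50, 60]
s_xx, s_yy, s_xy, x_bar, y_bar = deviations_sums(xs, ys)

# Correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

# Least-squares line for the same data
m = s_xy / s_xx
b = y_bar - m * x_bar

# R^2 as the fraction of variation in y explained by the line
sse = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - sse / s_yy

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}")
```

Here $r$ is about 0.982 and $R^2$ about 0.964, so roughly 96% of the variation in these three scores is accounted for by the fitted line.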

Common Pitfalls

Extrapolation: predictions far beyond observed x are unreliable.
Leverage/outliers: extreme x values can unduly affect the fit; inspect residuals and influential points.
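Nonlinear relationships: $r$ measures only linear association, so a strong curved relationship can have a low $|r|$, and a high $|r|$ does not guarantee the pattern is linear; check plots before fitting linear models.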
Correlation ≠ causation: strong r does not imply cause-and-effect.
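To illustrate the nonlinear-relationship pitfall, the sketch below uses a small made-up data set (hypothetical, not from the lesson) in which $y$ is completely determined by $x$ through $y=x^2$, yet the correlation coefficient is 0 because the relationship is not linear.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical data: a perfect parabola, symmetric about x = 0
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
print(r)  # 0.0: no linear association despite an exact relationship
```

A scatter plot of these points would show the curve immediately, which is why plotting comes before fitting.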

Advanced Worked Examples

A) Compute residuals

Using $\hat{y}=5x+40$ , compute residuals for points (1,45), (2,50), (3,60).

(1,45): $e=45-(45)=0$

(2,50): $e=50-(50)=0$

(3,60): $e=60-(55)=5$ (model underpredicts)
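The same residuals can be reproduced in a few lines of Python. This is a minimal sketch using the lesson's model $\hat{y}=5x+40$; the names `predict` and `residuals` are illustrative.

```python
def predict(x):
    """Model from the Worked Example: y_hat = 5x + 40."""
    return 5 * x + 40

points = [(1, 45), (2, 50), (3, 60)]

# Residual = observed y minus predicted y
residuals = [y - predict(x) for x, y in points]
print(residuals)  # [0, 0, 5]

# Sum of squared residuals, the quantity least squares minimizes
sse = sum(e ** 2 for e in residuals)
print(sse)  # 25
```

A positive residual, as for (3,60), means the observed value lies above the line.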

B) Interpret R²

If $r=0.9$, then $R^2=0.81$: about 81% of the variation in scores is explained by study time via the linear model.

Practice

1) For the points (0,2), (2,6), (4,10), find the linear model.

Solution

$m=2$, $b=2 \Rightarrow \hat{y}=2x+2$

2) Given the model $\hat{y}=3x+1$, compute the residual for $(x,y)=(4,11)$.

Solution

$e=11-(3\cdot 4+1)=11-13=-2$

3) Explain why extrapolating from $x\in[1,4]$ to $x=20$ can be risky.

Sample Answer

The relationship may change; the linear pattern is not guaranteed to hold outside the observed range.
