MathIsimple
Lesson 4.2: Scatter Plots & Linear Models

Lines that Explain Relationships

Represent paired data, identify positive/negative or no association, and model relationships with linear functions for prediction.

Learning Objectives

  • Construct scatter plots for paired data
  • Identify association direction and strength
  • Fit linear models and interpret the slope $m$ and intercept $b$
  • Use models for prediction and evaluate reasonableness

Worked Example

Study time $x$ (hours): 1, 2, 3, 4; scores $y$: 45, 50, 55, 60. Find a linear model and predict $y$ when $x = 5$.

Slope $m = \frac{60 - 45}{4 - 1} = 5$; intercept from $45 = 5(1) + b$, so $b = 40$ → model: $\hat{y} = 5x + 40$

Prediction at $x = 5$: $\hat{y} = 5(5) + 40 = 65$
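The arithmetic above can be checked in a few lines. The sketch below builds the line through the first and last data points with exact fractions, confirms that every point lies on it, and predicts at $x = 5$. The helper `line_through` is our own illustrative name. This two-point approach works here only because all four points are collinear. For scattered data, a least-squares fit is used instead.

```python
from fractions import Fraction

def line_through(p1, p2):
    """Illustrative helper (not from the lesson): (m, b) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    m = Fraction(y2 - y1, x2 - x1)  # slope = rise / run, kept exact
    b = y1 - m * x1                 # solve y1 = m*x1 + b for b
    return m, b

# Worked-example data: study time x (hours) and score y
xs = [1, 2, 3, 4]
ys = [45, 50, 55, 60]

m, b = line_through((xs[0], ys[0]), (xs[-1], ys[-1]))
print(f"m = {m}, b = {b}")  # m = 5, b = 40

# The data are perfectly linear, so every point sits exactly on the line
assert all(m * x + b == y for x, y in zip(xs, ys))

y_hat_5 = m * 5 + b
print(f"prediction at x = 5: {y_hat_5}")  # prediction at x = 5: 65
```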

Residuals & Least Squares

Residual of a point $(x_i, y_i)$: $e_i = y_i - \hat{y}_i$. A positive residual means the model underpredicts, because the observed point lies above the line.

The least-squares line minimizes $\sum e_i^2$. Its slope and intercept can be computed from summary statistics (derived in more advanced courses).
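The sketch below applies the standard least-squares formulas, $m = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$ and $b = \bar{y} - m\bar{x}$. It uses the points (1, 45), (2, 50), (3, 60), which do not lie on one line, and compares the sum of squared residuals (SSE) of the fitted line against that of $\hat{y} = 5x + 40$. The helpers `least_squares` and `sse` are our own illustrative names.

```python
def least_squares(xs, ys):
    """Illustrative helper: slope and intercept minimizing the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    m = s_xy / s_xx
    b = y_bar - m * x_bar
    return m, b

def sse(xs, ys, m, b):
    """Illustrative helper: sum of squared residuals for the line y_hat = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3]
ys = [45, 50, 60]

m, b = least_squares(xs, ys)
print(f"least-squares line: y_hat = {m:.2f}x + {b:.2f}")  # y_hat = 7.50x + 36.67
print(f"SSE (least squares): {sse(xs, ys, m, b):.2f}")     # 4.17
print(f"SSE (y_hat = 5x + 40): {sse(xs, ys, 5, 40):.2f}")   # 25.00
```

No other line gives a smaller SSE for these three points. That is what "least squares" means.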

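Residual plot (residuals $e_i$ plotted against $x_i$): random scatter around 0 suggests a linear model is appropriate. Patterns suggest a problem: a curve points to nonlinearity, and a spread that grows or shrinks with $x$ points to heteroscedasticity (non-constant variability).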

Correlation r and R²

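The correlation coefficient $r$ measures the strength and direction of linear association ($-1 \le r \le 1$). Its sign gives the direction; $|r|$ close to 1 indicates a strong linear association, and $r$ near 0 indicates little or no linear association.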

Coefficient of determination $R^2$: the fraction of the variability in $y$ explained by the model; $R^2 = r^2$ for simple linear regression.

Contextual interpretation: “About $100R^2\%$ of the variation in $y$ can be explained by $x$ via the linear model.”

Common Pitfalls

  • Extrapolation: predictions far beyond observed x are unreliable.
  • Leverage/outliers: extreme x values can unduly affect the fit; inspect residuals and influential points.
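  • Nonlinear relationships: $r$ measures only linear association, so a high $|r|$ does not guarantee that a line is the right model, and a strong curved relationship can still give $|r|$ near 0; check the scatter plot before fitting a linear model.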
  • Correlation ≠ causation: strong r does not imply cause-and-effect.

Advanced Worked Examples

A) Compute residuals

Using $\hat{y} = 5x + 40$, compute residuals for points (1, 45), (2, 50), (3, 60).

(1, 45): $e = 45 - (45) = 0$

(2, 50): $e = 50 - (50) = 0$

(3, 60): $e = 60 - (55) = 5$ (model underpredicts)
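The same residuals can be computed in code. The sketch below also labels each one by the sign rule ($e > 0$ means the model underpredicts). The names `residual` and `model` are our own illustrative helpers.

```python
def residual(y, y_hat):
    """Illustrative helper: e = y - y_hat; positive means the model underpredicts."""
    return y - y_hat

def model(x):
    """Worked-example model y_hat = 5x + 40."""
    return 5 * x + 40

points = [(1, 45), (2, 50), (3, 60)]
for x, y in points:
    e = residual(y, model(x))
    if e > 0:
        verdict = "model underpredicts"
    elif e < 0:
        verdict = "model overpredicts"
    else:
        verdict = "on the line"
    print(f"({x}, {y}): e = {e} ({verdict})")
```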

B) Interpret R²

If $r = 0.9$, then $R^2 = 0.81$: about 81% of score variation is explained by study time.

Practice

1) For points (0, 2), (2, 6), (4, 10), find a linear model.

Solution
$m = \frac{6 - 2}{2 - 0} = 2$, $b = 2$ (the value of $y$ at $x = 0$) $\Rightarrow \hat{y} = 2x + 2$

2) Given the model $\hat{y} = 3x + 1$, compute the residual for $(x, y) = (4, 11)$.

Solution
$e = 11 - (3\cdot 4 + 1) = 11 - 13 = -2$ (model overpredicts)

3) Explain why extrapolating from $x \in [1, 4]$ to $x = 20$ can be risky.

Sample Answer
The relationship may change outside the observed range; the data give no evidence that the linear pattern continues out to $x = 20$.
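A simple safeguard is to flag any prediction whose $x$ falls outside the range of the data. The sketch below uses a hypothetical `predict` helper with the worked-example model $\hat{y} = 5x + 40$, fitted on study times from 1 to 4 hours. At $x = 20$ it returns 140. If scores are capped at 100 (an assumption, since the lesson does not state the scale), that value is impossible. Note that the worked example's own prediction at $x = 5$ would also be flagged, as a mild extrapolation one unit past the data.

```python
def predict(x, m, b, x_min, x_max):
    """Hypothetical helper: return (y_hat, warning); warning flags x outside [x_min, x_max]."""
    y_hat = m * x + b
    warning = None
    if not x_min <= x <= x_max:
        warning = f"x = {x} is outside the observed range [{x_min}, {x_max}]"
    return y_hat, warning

# Worked-example model y_hat = 5x + 40, fitted on study times 1 to 4 hours
print(predict(3, 5, 40, 1, 4))   # (55, None)
print(predict(20, 5, 40, 1, 4))  # (140, 'x = 20 is outside the observed range [1, 4]')
```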