MathIsimple

Lesson 5-2: Hypothesis Testing & Significance

Learning Goals

  • Set up H0 and Ha with precise parameter statements.
  • Compute test statistics for mean and proportion scenarios.
  • Interpret p-values and connect them to α and decisions.
  • Explain Type I/II errors and their practical costs.

Assumptions

  • Random sampling and independence (or justified design).
  • Approximate normality of the test statistic (CLT for means; np₀ ≥ 10 and n(1-p₀) ≥ 10 for proportions).
  • Clear definition of α (e.g., 0.05) before viewing data.

Structure of a Test

Hypotheses

Specify the parameter: mean $\mu$ or proportion $p$. Then define a null hypothesis $H_0$ and an alternative $H_a$.

$$H_0: \; p=p_0 \quad \text{vs} \quad H_a: \; p>p_0,\; p<p_0,\; \text{or}\; p\ne p_0$$
$$H_0: \; \mu=\mu_0 \quad \text{vs} \quad H_a: \; \mu>\mu_0,\; \mu<\mu_0,\; \text{or}\; \mu\ne \mu_0$$

Workflow

  1. State the parameter and define H0 and Ha (one- or two-sided).
  2. Check conditions and assumptions for the chosen test.
  3. Compute test statistic and p-value.
  4. Compare p-value with significance level α and conclude.
  5. Interpret the decision in the real context.

Test Statistics

Proportion (Large n)

$$z = \dfrac{ \hat{p} - p_0 }{ \sqrt{ \dfrac{p_0(1-p_0)}{n} } }$$

Under $H_0$, replace $p$ by $p_0$ for SE.

Mean (Unknown SD, Large n)

$$z = \dfrac{ \bar{x} - \mu_0 }{ s/\sqrt{n} }$$

When n is large, the z approximation is often reasonable by the CLT.

p-Value

The p-value is the probability, under $H_0$, of seeing a test statistic at least as extreme as the observed one. Smaller p-values provide stronger evidence against $H_0$.
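The definition above can be sketched in a few lines of Python. This is a minimal illustration that assumes the test statistic is approximately standard normal under the null hypothesis; the function name `p_value_from_z` and the `tail` labels are chosen here for illustration and are not part of the lesson.

```python
from statistics import NormalDist

def p_value_from_z(z, tail="two-sided"):
    """Return the p-value of a z statistic under a standard normal null.

    tail: "greater" (Ha: parameter > null value),
          "less"    (Ha: parameter < null value),
          "two-sided" (Ha: parameter != null value).
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "greater":
        return 1.0 - std_normal.cdf(z)
    if tail == "less":
        return std_normal.cdf(z)
    if tail == "two-sided":
        # Both tails count as "at least as extreme".
        return 2.0 * (1.0 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'greater', 'less', or 'two-sided'")

print(p_value_from_z(1.96, "two-sided"))  # close to 0.05
```

The direction of "extreme" follows the alternative hypothesis, which is why the tail must be fixed before looking at the data.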

Example: Proportion, One-Sided

A factory claims defect rate $p \le 0.03$. In a sample of 200, 10 are defective. Test $H_0: p=0.03$ vs $H_a: p>0.03$ at $\alpha=0.05$.

$$\hat{p}=10/200=0.05, \quad SE=\sqrt{ \dfrac{0.03\cdot 0.97}{200} } \approx 0.01206$$
$$z= \dfrac{0.05-0.03}{0.01206} \approx 1.66$$

The one-sided p-value is about 0.049, just under $\alpha=0.05$, so the result is borderline; report the exact p-value and context before making decisions. Note also that $np_0 = 200 \cdot 0.03 = 6$ is below the usual threshold of 10, so the normal approximation is rough here.
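The factory example can be reproduced with a short Python sketch. The function name `one_sample_proportion_z` is hypothetical, and the calculation assumes the large-sample normal approximation. Carrying full precision through the arithmetic, the SE comes out near 0.0121 and z near 1.66, so the one-sided p-value lands just under 0.05.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_z(successes, n, p0):
    """z statistic and upper-tail p-value for H0: p = p0 vs Ha: p > p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # SE uses p0 because it is computed under H0
    z = (p_hat - p0) / se
    p_value = 1.0 - NormalDist().cdf(z)  # upper tail only
    return p_hat, se, z, p_value

# Numbers from the example: 10 defectives out of 200, p0 = 0.03
p_hat, se, z, p_value = one_sample_proportion_z(10, 200, 0.03)
print(f"p_hat={p_hat:.3f}  SE={se:.4f}  z={z:.2f}  p={p_value:.3f}")

# Large-sample check: the expected count under H0 should be at least 10
print("n * p0 =", 200 * 0.03)
```

Because the expected number of defectives under the null is only 6, an exact binomial test would be a more cautious choice for these data than the z approximation.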

Example: Mean, Two-Sided

Standard lifetime is 1,000 hours. From n=36, $\bar{x}=980$ hours, $s=120$ hours. Test $H_0: \mu=1000$ vs $H_a: \mu \ne 1000$.

$$z= \dfrac{980-1000}{120/6} = \dfrac{-20}{20} = -1$$

Two-sided p-value is approximately 0.317. At $\alpha=0.05$, fail to reject $H_0$.
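The lifetime example can be checked the same way. The sketch below assumes the z approximation used in this lesson; `one_sample_mean_z` is an illustrative name, not an established API.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_mean_z(x_bar, s, n, mu0):
    """z statistic and two-sided p-value for H0: mu = mu0 vs Ha: mu != mu0."""
    se = s / sqrt(n)  # estimated standard error of the sample mean
    z = (x_bar - mu0) / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Numbers from the example: n = 36, x_bar = 980, s = 120, mu0 = 1000
z, p_value = one_sample_mean_z(980, 120, 36, 1000)
print(f"z={z:.2f}  two-sided p={p_value:.3f}")
```

With n = 36 and an estimated standard deviation, a t test with 35 degrees of freedom would give a slightly larger p-value; the conclusion here would not change.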

Errors and Test Power

Type I and Type II

  • Type I: Reject true $H_0$ (probability α).
  • Type II: Fail to reject false $H_0$ (probability β).
  • Power: $1-\beta$; increases with n or larger true effect size.
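To see how power responds to n and to the effect size, here is a rough Python sketch for an upper-tail z test of a mean. It assumes a known population standard deviation `sigma` and a true mean of μ0 + `delta`; both values in the loop are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail_z(delta, sigma, n, alpha=0.05):
    """Power of H0: mu = mu0 vs Ha: mu > mu0 when the true mean is mu0 + delta.

    Assumes a known standard deviation sigma (a simplifying assumption).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)  # rejection cutoff for z
    shift = delta * sqrt(n) / sigma         # mean of z under the alternative
    return 1.0 - std_normal.cdf(z_crit - shift)

# Hypothetical effect of 2 units with sigma = 10
for n in (25, 50, 100):
    print(n, round(power_upper_tail_z(delta=2.0, sigma=10.0, n=n), 3))
```

Setting `delta` to zero returns α itself, which matches the definition of the Type I error rate: with no true effect, every rejection is a false one.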

Design Considerations

  • Choose α based on real costs of errors.
  • Increase n to reduce SE and improve power.
  • Use one-sided tests only when justified by context.

Guided Practice

Set 1: One-Sided Mean Test

  1. Parameter μ: population mean weight. H₀: μ = 50, Hₐ: μ > 50.
  2. Check: random sample; n = 40 > 30, so the CLT approximation is reasonable.
  3. z = (x̄-50)/(s/√n), p-value = P(Z > z) for upper tail.
  4. If the p-value < 0.05, reject H₀: evidence that the mean weight exceeds 50.

Set 2: Two-Sided Proportion Test

  1. Parameter p: defect rate. H₀: p = 0.05, Hₐ: p ≠ 0.05.
  2. Check: np₀≥10, n(1-p₀)≥10 with p₀=0.05.
  3. z = (p̂-0.05)/√(0.05×0.95/n), p-value = 2P(Z > |z|).
  4. Two-sided test: reject H₀ if the p-value < 0.05 and conclude that the defect rate differs from 5%.

Set 3: Type I & II Errors

  1. Medical test: H₀: no disease, Hₐ: disease present.
  2. Type I: false positive (healthy→diagnosed sick).
  3. Type II: false negative (sick→diagnosed healthy).
  4. α is the Type I error rate; β is the Type II error rate, and power = 1-β.

Set 4: P-value Interpretation

  1. Parameter μ: mean response time. Test H₀: μ = 5.0 seconds.
  2. Compute p-value from sample data and test statistic.
  3. p-value = probability, assuming H₀ is true, of observing data at least as extreme as the sample.
  4. Small p-value (< α) provides evidence against H₀.

Set 5: Power Analysis

  1. Test design: detect 10% improvement in success rate.
  2. Calculate required sample size for 80% power at α=0.05.
  3. Power = P(reject H₀ | Hₐ true), depends on effect size and n.
  4. For a fixed effect size, higher power requires a larger sample or a less stringent (larger) α.
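The sample-size step in Set 5 can be sketched with a common normal-approximation formula for comparing two proportions. The baseline of 0.50 and the reading of "10% improvement" as 10 percentage points (0.50 to 0.60) are assumptions made here for illustration; the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided test of H0: p1 = p2.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p1-p2)^2,
    a standard large-sample approximation with equal group sizes.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance_sum / (p1 - p2) ** 2)

# Hypothetical baseline 50% success rate, target 60%
print(n_per_group_two_proportions(0.50, 0.60))  # 385 per group
```

Other formulas (for example, ones that use a pooled variance under the null) give slightly different answers, so treat the result as a planning estimate.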

Set 6: Multiple Testing

  1. Three treatments compared: family-wise error rate concern.
  2. Bonferroni correction: use α/3 for each individual test.
  3. Controls overall Type I error rate at 5% level.
  4. Trade-off: reduced power for individual comparisons.
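A small Python sketch of the Bonferroni steps above, using three made-up p-values. The helper names are illustrative, and the family-wise error calculation assumes independent tests, which is a simplification.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Return the per-test threshold alpha/m and a reject flag for each p-value."""
    m = len(p_values)
    per_test_alpha = alpha / m
    return per_test_alpha, [p < per_test_alpha for p in p_values]

def fwer_if_independent(per_test_alpha, m):
    """Chance of at least one false rejection across m independent true nulls."""
    return 1.0 - (1.0 - per_test_alpha) ** m

p_values = [0.010, 0.030, 0.200]  # hypothetical results of three comparisons
per_test_alpha, decisions = bonferroni_decisions(p_values)
print(round(per_test_alpha, 4), decisions)

print(round(fwer_if_independent(0.05, 3), 4))      # uncorrected: about 0.1426
print(round(fwer_if_independent(0.05 / 3, 3), 4))  # corrected: just under 0.05
```

The second p-value (0.030) would be significant on its own at 0.05 but not after correction, which illustrates the loss of power for individual comparisons.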

Two-Sample Tests (Means & Proportions)

Proportions

$$z = \dfrac{ \hat{p}_1-\hat{p}_2 }{ \sqrt{ \hat{p}(1-\hat{p})(1/n_1+1/n_2) } }$$

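Here $\hat{p} = (x_1+x_2)/(n_1+n_2)$ is the pooled proportion under $H_0: p_1=p_2$, where $x_1$ and $x_2$ are the success counts in the two samples.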
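The pooled two-proportion statistic can be sketched as follows. The counts in the usage line are invented for illustration, and `two_proportion_z` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic and two-sided p-value for H0: p1 = p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion assumed under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    p_two_sided = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

# Hypothetical data: 60 of 400 successes vs 40 of 400 successes
z, p_value = two_proportion_z(60, 400, 40, 400)
print(f"z={z:.2f}  two-sided p={p_value:.3f}")
```

Pooling is appropriate for the test because the null hypothesis says both groups share one proportion; a confidence interval for the difference would normally use the unpooled standard error instead.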

Means (Large n)

$$z = \dfrac{ (\bar{x}_1-\bar{x}_2) - (\mu_1-\mu_2)_0 }{ \sqrt{ s_1^2/n_1 + s_2^2/n_2 } }$$

Use the z approximation when both samples are large; small samples require a t test with Welch-Satterthwaite degrees of freedom.
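A minimal sketch of the two-sample mean statistic together with the Welch-Satterthwaite degrees of freedom that a small-sample t test would use. The summary statistics in the usage line are hypothetical, as is the function name.

```python
from math import sqrt

def two_sample_mean_stat(x1_bar, s1, n1, x2_bar, s2, n2, diff0=0.0):
    """Unpooled test statistic and Welch-Satterthwaite df for H0: mu1 - mu2 = diff0."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2  # squared standard errors of each mean
    se = sqrt(v1 + v2)
    stat = ((x1_bar - x2_bar) - diff0) / se
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return stat, df

# Hypothetical summaries: group 1 (mean 52, sd 8, n 50), group 2 (mean 49, sd 10, n 60)
stat, df = two_sample_mean_stat(52, 8, 50, 49, 10, 60)
print(f"statistic={stat:.2f}  Welch df={df:.1f}")
```

With large samples the statistic is compared with the standard normal; with small samples the same statistic is compared with a t distribution on `df` degrees of freedom.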

Power & Sample Size

For a minimally interesting effect size Δ, pick n to achieve target power 1-β at significance α.

  • Trade-off: larger n → smaller SE → higher power.
  • Decide one- vs two-sided based on context before seeing data.
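For a one-sample mean, the choice of n can be sketched with the usual normal-approximation formula n ≥ ((z_{1-α/2} + z_{1-β})·σ/Δ)². The sketch assumes a two-sided test and a known or well-estimated standard deviation; the numeric inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_mean_test(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n so a two-sided z test detects a shift of delta with the given power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical planning values: sigma = 10, smallest effect of interest = 2
print(n_for_mean_test(delta=2.0, sigma=10.0))  # 197
print(n_for_mean_test(delta=1.0, sigma=10.0))  # 785
```

Halving Δ roughly quadruples the required n, since n scales with 1/Δ².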

Confidence Intervals and Tests

For two-sided α=0.05 tests, rejecting $H_0$ is equivalent to the hypothesized value lying outside the 95% CI.
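The equivalence can be illustrated for a one-sample mean, where the test and the interval use the same standard error. The sketch reuses the lifetime numbers from the earlier example (n = 36, mean 980, s = 120); the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def ci_and_test(x_bar, s, n, mu0, alpha=0.05):
    """Return the (1 - alpha) z interval, two-sided p-value, and both decisions."""
    std_normal = NormalDist()
    se = s / sqrt(n)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)
    z = (x_bar - mu0) / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    reject = p_value < alpha
    outside_ci = not (ci[0] <= mu0 <= ci[1])
    return ci, p_value, reject, outside_ci

ci, p_value, reject, outside_ci = ci_and_test(980, 120, 36, 1000)
print(ci, round(p_value, 3), reject, outside_ci)
```

Here 1000 falls inside the interval and the test does not reject, so the two views agree. For proportions the match is only approximate, because the test uses $p_0$ in the standard error while the usual interval uses $\hat{p}$.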

Practice Bank

Bank A: One-Sample z (Proportion)

  1. State H₀, Hₐ
  2. Compute z and p
  3. Conclude at α=0.05 with context

Bank B: One-Sample z (Mean, large n)

  1. Check CLT
  2. Compute z
  3. Interpret two-sided p

Bank C: Two-Sample Proportions

  1. Pooled p̂ under H₀
  2. z statistic
  3. Decision and CI connection

Bank D: Two-Sample Means

  1. Large n z-approx
  2. Discuss small-n t alternative
  3. Report effect size

Bank E: One-Sided vs Two-Sided

  1. When one-sided is justified
  2. Pre-register direction
  3. Cautions on p-hacking

Bank F: Power & Sample Size

  1. Target Δ and β
  2. Approximate n for 80% power
  3. Trade-offs

Bank G: Multiple Testing

  1. Bonferroni control
  2. FWER vs FDR concepts
  3. Interpretation pitfalls

Bank H: Practical Significance

  1. Compare CI with meaningful thresholds
  2. Distinguish statistical vs practical

Bank I: Assumptions Audit

  1. Randomness/independence
  2. Approximate normality
  3. Outliers/robustness

Bank J: Reporting

  1. Report exact p and CI
  2. Describe effect size
  3. State limitations

FAQ (Extended)

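Q: Is there an essential difference between p=0.049 and p=0.051?

Not an essential one; reporting the exact p-value together with the effect size and a confidence interval is more informative.

Q: When should a one-sided test be used?

Only when a clear directional hypothesis was stated before the study and an effect in the opposite direction would be meaningless.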

Mini Projects

Project A: Manufacturing Defects

  • Define p and target p₀
  • Choose α and tail direction
  • Collect data and report exact p with CI

Project B: Marketing Uplift

  • Two-sample proportion test
  • Compute pooled SE and z
  • Discuss power for given Δ

Project C: Mean Response Time

  • One-sample mean test
  • Check CLT or justify normality
  • Report effect size (Cohen's d)

Project D: A/B Experiment

  • Random assignment
  • Pre-register metric and α
  • Analyse and share reproducible report

Project E: Medical Screening

  • Type I/II costs
  • Pick α to balance risks
  • Include sensitivity analysis