Lesson 5-2: Hypothesis Testing & Significance
Learning Goals
- Set up H₀ and Hₐ with precise parameter statements.
- Compute test statistics for mean and proportion scenarios.
- Interpret p-values and connect them to α and decisions.
- Explain Type I/II errors and their practical costs.
Assumptions
- Random sampling and independence (or justified design).
- Approximate normality of the test statistic (CLT or conditions).
- Clear definition of α (e.g., 0.05) before viewing data.
Structure of a Test
Hypotheses
Specify the parameter: mean μ or proportion p. Then define a null hypothesis H₀ and an alternative Hₐ.
Workflow
- State the parameter and define H₀ and Hₐ (one- or two-sided).
- Check conditions and assumptions for the chosen test.
- Compute test statistic and p-value.
- Compare p-value with significance level α and conclude.
- Interpret the decision in the real context.
Test Statistics
Proportion (Large n)
z = (p̂ - p₀)/√(p₀(1 - p₀)/n). Under H₀, replace p̂ by p₀ for SE.
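A minimal Python sketch of this statistic may help. The function name `proportion_z` and the sample numbers are illustrative assumptions, not part of the lesson; the point is only that the standard error is built from p₀ rather than p̂.

```python
import math


def proportion_z(successes: int, n: int, p0: float) -> float:
    """z statistic for H0: p = p0 in a large-sample one-proportion test."""
    p_hat = successes / n                    # sample proportion
    se_null = math.sqrt(p0 * (1 - p0) / n)   # SE computed under H0, so it uses p0
    return (p_hat - p0) / se_null


# Hypothetical data: 60 successes in 100 trials, testing p0 = 0.5.
print(round(proportion_z(60, 100, 0.5), 3))
```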
Mean (Unknown SD, Large n)
z = (x̄ - μ₀)/(s/√n). When n is large, the z approximation is often reasonable by the CLT.
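As a sketch, the large-sample mean statistic can be computed from summary statistics alone. The helper `mean_z` and its example inputs are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math


def mean_z(x_bar: float, s: float, n: int, mu0: float) -> float:
    """Large-sample z statistic for H0: mu = mu0, with s standing in for sigma."""
    se = s / math.sqrt(n)        # estimated standard error of the sample mean
    return (x_bar - mu0) / se


# Hypothetical summary: x_bar = 52, s = 6, n = 36, tested against mu0 = 50.
print(mean_z(52.0, 6.0, 36, 50.0))
```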
p-Value
The p-value is the probability, under H₀, of seeing a test statistic at least as extreme as the observed one. Smaller p-values provide stronger evidence against H₀.
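One way to turn a z statistic into a p-value, sketched with the standard library's `statistics.NormalDist`. The function name `p_value` and the tail labels are assumptions made for illustration; "at least as extreme" is read relative to the direction of Hₐ.

```python
from statistics import NormalDist


def p_value(z: float, tail: str = "two-sided") -> float:
    """p-value of a z statistic under the standard normal null distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":        # Ha: parameter > null value
        return 1 - std_normal.cdf(z)
    if tail == "lower":        # Ha: parameter < null value
        return std_normal.cdf(z)
    if tail == "two-sided":    # Ha: parameter != null value
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower', or 'two-sided'")


print(round(p_value(1.96), 3))           # two-sided
print(round(p_value(1.96, "upper"), 3))  # half the two-sided value
```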
Example: Proportion, One-Sided
A factory claims a defect rate p₀. In a sample of 200, 10 are defective, so p̂ = 10/200 = 0.05. Test H₀: p = p₀ against a one-sided Hₐ at α = 0.05.
The one-sided p-value is about 0.05. At α = 0.05, this is borderline; report the exact p-value and context before making decisions.
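The arithmetic behind an example like this can be sketched as follows. The sample counts come from the example, but the claimed rate used here (`P0_ASSUMED = 0.03`) and the upper-tail direction are assumptions made purely for illustration; they happen to give a one-sided p-value close to 0.05.

```python
import math
from statistics import NormalDist

N, DEFECTS = 200, 10   # sample size and defect count from the example
P0_ASSUMED = 0.03      # HYPOTHETICAL claimed rate, assumed for illustration only

p_hat = DEFECTS / N                                      # 0.05
se_null = math.sqrt(P0_ASSUMED * (1 - P0_ASSUMED) / N)   # SE under H0
z = (p_hat - P0_ASSUMED) / se_null
p_upper = 1 - NormalDist().cdf(z)                        # assumes Ha: p > p0

print(f"z = {z:.3f}, one-sided p = {p_upper:.4f}")
```

With a p-value this close to α, a small change in the data or in the assumed p₀ flips the decision, which is why the exact value is worth reporting.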
Example: Mean, Two-Sided
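Standard lifetime is 1,000 hours. From n = 36, the sample mean is x̄ hours and the sample standard deviation is s hours. Test H₀: μ = 1000 vs Hₐ: μ ≠ 1000.
The two-sided p-value is approximately 0.317, which corresponds to |z| ≈ 1. At α = 0.05, fail to reject H₀.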
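As a rough check, the reported p-value can be inverted to see what it implies about the data. This sketch uses only n = 36 and p ≈ 0.317 from the example; the variable names are illustrative.

```python
import math
from statistics import NormalDist

N = 36                # sample size from the example
P_TWO_SIDED = 0.317   # reported two-sided p-value

# Invert p = 2 * (1 - Phi(|z|)) to recover |z|.
abs_z = NormalDist().inv_cdf(1 - P_TWO_SIDED / 2)

# Since z = (x_bar - 1000) / (s / sqrt(n)), the gap |x_bar - 1000| equals |z| * s / sqrt(n).
gap_in_s_units = abs_z / math.sqrt(N)

print(f"|z| is about {abs_z:.2f}; |x_bar - 1000| is about {gap_in_s_units:.3f} * s")
```

In other words, the sample mean sits about one standard error (roughly s/6) from 1,000 hours, which is well within ordinary sampling variation.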
Errors and Test Power
Type I and Type II
- Type I: Reject a true H₀ (probability α).
- Type II: Fail to reject a false H₀ (probability β).
- Power: 1 - β; increases with larger n or a larger true effect size.
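To make the dependence of power on n and effect size concrete, here is a sketch for an upper-tailed z test of a mean. It assumes a known σ and a true mean of μ₀ + Δ; the function name `power_upper_tail` and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def power_upper_tail(delta: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of an upper-tailed z test when the true mean is mu0 + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)   # rejection cutoff under H0
    shift = delta * math.sqrt(n) / sigma     # mean of the z statistic under Ha
    return 1 - std_normal.cdf(z_crit - shift)


# Hypothetical effect: delta = 0.5 with sigma = 1.
for n in (10, 25, 50):
    print(n, round(power_upper_tail(0.5, 1.0, n), 3))
```

When Δ = 0 the function returns α, which matches the definition of the Type I error rate.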
Design Considerations
- Choose α based on real costs of errors.
- Increase n to reduce SE and improve power.
- Use one-sided tests only when justified by context.
Guided Practice
Set 1: One-Sided Mean Test
- Parameter μ: population mean weight. H₀: μ = 50, Hₐ: μ > 50.
- Check: random sample, n = 40 > 30, so the CLT approximation is reasonable.
- z = (x̄-50)/(s/√n), p-value = P(Z > z) for upper tail.
- If the p-value < 0.05, reject H₀: evidence that mean weight exceeds 50.
Set 2: Two-Sided Proportion Test
- Parameter p: defect rate. H₀: p = 0.05, Hₐ: p ≠ 0.05.
- Check: np₀≥10, n(1-p₀)≥10 with p₀=0.05.
- z = (p̂-0.05)/√(0.05×0.95/n), p-value = 2P(Z > |z|).
- Two-sided test: reject H₀ if the p-value < 0.05 and conclude the rate differs from 5%.
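The steps of Set 2 can be sketched in one function that checks the large-sample conditions before computing anything. The function name and the sample counts (32 defects in 400) are hypothetical; only p₀ = 0.05 and the threshold of 10 come from the exercise.

```python
import math
from statistics import NormalDist

P0 = 0.05  # null defect rate from Set 2


def two_sided_proportion_test(defects: int, n: int, p0: float = P0):
    """Return (z, p_value) for H0: p = p0 vs Ha: p != p0, after checking conditions."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("large-sample conditions n*p0 >= 10 and n*(1-p0) >= 10 not met")
    z = (defects / n - p0) / math.sqrt(p0 * (1 - p0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample: 32 defects in 400 items.
z_stat, p_val = two_sided_proportion_test(32, 400)
print(f"z = {z_stat:.2f}, p = {p_val:.4f}")
```

With p₀ = 0.05 the first condition requires n ≥ 200, so smaller samples call for a different method (an exact binomial test, for instance).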
Set 3: Type I & II Errors
- Medical test: H₀: no disease, Hₐ: disease present.
- Type I: false positive (healthy→diagnosed sick).
- Type II: false negative (sick→diagnosed healthy).
- α is the Type I error rate; β is the Type II error rate, and power = 1-β.
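A small simulation can illustrate what "α is the Type I error rate" means in practice. This is a sketch under simplifying assumptions (normal data, known σ = 1, H₀ true in every trial); the function name, seed, and trial count are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_type_i_rate(n: int = 30, alpha: float = 0.05,
                         trials: int = 4000, seed: int = 0) -> float:
    """Fraction of samples drawn with H0 true for which a two-sided z test rejects."""
    rng = random.Random(seed)                     # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided cutoff
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # H0: mu = 0, sigma = 1
        z = fmean(sample) * math.sqrt(n)                  # (x_bar - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1                               # a false positive
    return rejections / trials


print(simulate_type_i_rate())
```

The printed fraction should land near 0.05, give or take simulation noise.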
Set 4: P-value Interpretation
- Parameter μ: mean response time. Test H₀: μ = 5.0 seconds.
- Compute p-value from sample data and test statistic.
- p-value = probability of observing data at least as extreme as the sample, assuming H₀ is true.
- Small p-value (< α) provides evidence against H₀.
Set 5: Power Analysis
- Test design: detect 10% improvement in success rate.
- Calculate required sample size for 80% power at α=0.05.
- Power = P(reject H₀ | Hₐ true), depends on effect size and n.
- Higher power requires larger sample or less stringent α.
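A rough sample-size calculation for Set 5 might look like the sketch below. It uses a common normal-approximation formula for comparing two proportions with equal group sizes and a two-sided α. The baseline of 50% and the reading of "10% improvement" as 10 percentage points (50% to 60%) are assumptions, since the exercise does not state them.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-group n to detect p1 vs p2 with a two-sided two-proportion z test."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = std_normal.inv_cdf(power)            # quantile matching the target power
    p_bar = (p1 + p2) / 2                         # average rate, used for the H0 variance
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Hypothetical rates: 50% baseline vs 60% target.
print(n_per_group(0.50, 0.60))
```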
Set 6: Multiple Testing
- Three treatments compared: family-wise error rate concern.
- Bonferroni correction: use α/3 for each individual test.
- Controls the family-wise Type I error rate at or below the 5% level.
- Trade-off: reduced power for individual comparisons.
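The Bonferroni rule in Set 6 is short enough to sketch directly. The function name and the three p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each H0 whose p-value falls below alpha divided by the number of tests."""
    threshold = alpha / len(p_values)   # e.g. 0.05 / 3 for three comparisons
    return [p < threshold for p in p_values]


# Hypothetical p-values for three treatment comparisons.
print(bonferroni_reject([0.010, 0.020, 0.040]))
```

Here only the first comparison clears the adjusted threshold of about 0.0167, even though all three p-values are below 0.05; that is the power trade-off noted above.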
Two-Sample Tests (Means & Proportions)
Proportions
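Under H₀: p₁ = p₂, use the pooled estimate p̂ = (x₁ + x₂)/(n₁ + n₂), giving z = (p̂₁ - p̂₂)/√(p̂(1 - p̂)(1/n₁ + 1/n₂)).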
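A sketch of the pooled two-proportion z test follows. The function name and the counts are hypothetical; the pooled estimate combines both samples because H₀ says the two proportions are equal.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Return (z, two-sided p-value) for H0: p1 = p2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                              # shared estimate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))   # pooled standard error
    z = (p1_hat - p2_hat) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 60/100 successes vs 40/100 successes.
z2, p2 = two_proportion_z(60, 100, 40, 100)
print(f"z = {z2:.3f}, p = {p2:.4f}")
```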
Means (Large n)
z = (x̄₁ - x̄₂)/√(s₁²/n₁ + s₂²/n₂). Use the z approximation when both samples are large; small samples call for a t test with Satterthwaite degrees of freedom.
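The unpooled statistic and the Satterthwaite degrees of freedom can be sketched from summary statistics. The function name and example values are hypothetical. The same statistic is referred to the standard normal when both samples are large, or to a t distribution with the computed df otherwise.

```python
import math


def welch_statistic_and_df(x1_bar, s1, n1, x2_bar, s2, n2):
    """Two-sample statistic with unpooled variances, plus Satterthwaite df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2            # per-sample variance of the mean
    stat = (x1_bar - x2_bar) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return stat, df


# Hypothetical summaries: two groups of 10 with equal spread.
stat, df = welch_statistic_and_df(5.0, 2.0, 10, 4.0, 2.0, 10)
print(f"statistic = {stat:.3f}, df = {df:.1f}")
```

With equal sample sizes and equal standard deviations, the df reduces to n₁ + n₂ - 2, which is a handy sanity check.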
Power & Sample Size
For a minimally interesting effect size Δ, pick n to achieve target power 1-β at significance α.
- Trade-off: larger n → smaller SE → higher power.
- Decide one- vs two-sided based on context before seeing data.
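For a one-sample mean, a standard approximation is n ≈ ((z₁₋α/₂ + z₁₋β)·σ/Δ)². The sketch below assumes a two-sided test and a known or well-estimated σ; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_mean(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n for a two-sided one-sample z test to detect a shift of delta."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = std_normal.inv_cdf(power)            # quantile for target power 1 - beta
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical effect sizes, measured in units of sigma.
print(n_for_mean(0.50, 1.0))
print(n_for_mean(0.25, 1.0))
```

Halving Δ roughly quadruples the required n, because n scales with 1/Δ².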
Confidence Intervals and Tests
For two-sided α=0.05 tests, rejecting H₀ is equivalent to the hypothesized value lying outside the 95% CI.
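The equivalence can be checked numerically for a large-sample mean test, where the test and the interval share the same standard error. The function name and summary values below are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_and_ci(x_bar, s, n, mu0, alpha=0.05):
    """Return (reject H0?, mu0 outside CI?, CI) for a two-sided large-sample mean test."""
    std_normal = NormalDist()
    se = s / math.sqrt(n)
    z = (x_bar - mu0) / se
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    margin = std_normal.inv_cdf(1 - alpha / 2) * se   # about 1.96 * SE at alpha = 0.05
    ci = (x_bar - margin, x_bar + margin)
    reject = p_value < alpha
    outside = not (ci[0] <= mu0 <= ci[1])
    return reject, outside, ci


# Hypothetical summary: x_bar = 52, s = 6, n = 36, tested against several null values.
for mu0 in (50.0, 51.0, 53.0, 54.5):
    reject, outside, ci = z_test_and_ci(52.0, 6.0, 36, mu0)
    print(mu0, reject, outside)
```

For proportions the match is only approximate, since the test's standard error uses p₀ while the usual interval uses p̂.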
Practice Bank
Bank A: One-Sample z (Proportion)
- State H₀, Hₐ
- Compute z and p
- Conclude at α=0.05 with context
Bank B: One-Sample z (Mean, large n)
- Check CLT
- Compute z
- Interpret two-sided p
Bank C: Two-Sample Proportions
- Pooled p̂ under H₀
- z statistic
- Decision and CI connection
Bank D: Two-Sample Means
- Large n z-approx
- Discuss small-n t alternative
- Report effect size
Bank E: One-Sided vs Two-Sided
- When one-sided is justified
- Pre-register direction
- Cautions on p-hacking
Bank F: Power & Sample Size
- Target Δ and β
- Approximate n for 80% power
- Trade-offs
Bank G: Multiple Testing
- Bonferroni control
- FWER vs FDR concepts
- Interpretation pitfalls
Bank H: Practical Significance
- Compare CI with meaningful thresholds
- Distinguish statistical vs practical
Bank I: Assumptions Audit
- Randomness/independence
- Approximate normality
- Outliers/robustness
Bank J: Reporting
- Report exact p and CI
- Describe effect size
- State limitations
FAQ (Extended)
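Q: Is there a fundamental difference between p=0.049 and p=0.051?
No, not a fundamental one; reporting the exact p-value together with the effect size and an interval is more informative.
Q: When should a one-sided test be used?
Only when a clear one-directional hypothesis was set before the study and an effect in the opposite direction would be meaningless.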
Mini Projects
Project A: Manufacturing Defects
- Define p and target p₀
- Choose α and tail direction
- Collect data and report exact p with CI
Project B: Marketing Uplift
- Two-sample proportion test
- Compute pooled SE and z
- Discuss power for given Δ
Project C: Mean Response Time
- One-sample mean test
- Check CLT or justify normal
- Report effect size (Cohen's d)
Project D: A/B Experiment
- Random assignment
- Pre-register metric and α
- Analyse and share reproducible report
Project E: Medical Screening
- Type I/II costs
- Pick α to balance risks
- Include sensitivity analysis