
Hypothesis Testing

Master the fundamental principles of statistical hypothesis testing: from basic concepts and error analysis to advanced methods and real-world applications in statistical inference.

Learning Objectives
Understand null and alternative hypotheses construction principles
Master Type I and Type II error concepts and their trade-offs
Apply the Neyman-Pearson principle for optimal test design
Construct test functions and power functions for statistical tests
Learn the general framework for hypothesis testing procedures
Master common tests: U-test, t-test, chi-square test, and F-test
Apply generalized likelihood ratio testing (GLRT) methods
Understand the relationship between confidence intervals and hypothesis testing

Essential Definitions

Null Hypothesis (H₀)

The baseline hypothesis under test, typically containing '=', '≥', or '≤', representing the status quo or no effect condition.

Mathematical:
H_0: \theta \in \Theta_0
Example: H₀: μ = μ₀ (population mean equals specified value)
Alternative Hypothesis (H₁)

The hypothesis that contradicts H₀, typically containing '≠', '>', or '<', representing what we're trying to detect.

Mathematical:
H_1: \theta \in \Theta_1
Example: H₁: μ ≠ μ₀ (two-sided), H₁: μ > μ₀ (right-sided)
Type I Error (α)

The probability of rejecting H₀ when it is actually true (false positive). Controlled by significance level.

Mathematical:
\alpha(\theta) = P_{\theta}(X \in D \mid \theta \in \Theta_0)
Example: α = 0.05 means 5% chance of false rejection
Type II Error (β)

The probability of failing to reject H₀ when H₁ is true (false negative). Related to statistical power.

Mathematical:
\beta(\theta) = P_{\theta}(X \in \overline{D} \mid \theta \in \Theta_1)
Example: Power = 1 - β measures test's ability to detect true effects

Hypothesis Construction

Hypothesis Construction Principles
Guidelines for properly formulating null and alternative hypotheses

Complementary Hypotheses

H₀ and H₁ must be mutually exclusive and collectively exhaustive

Mathematical:
\Theta_0 \cap \Theta_1 = \emptyset \text{ and } \Theta_0 \cup \Theta_1 = \Theta
Example: For μ: H₀: μ = μ₀ vs H₁: μ ≠ μ₀

Status Quo in H₀

H₀ typically represents the current belief, no change, or no effect

Mathematical:
H_0: \text{parameter} = \text{claimed value}
Example: Testing drug effectiveness: H₀: drug has no effect

Burden of Proof

H₁ represents what requires evidence to establish (burden of proof)

Mathematical:
H_1: \text{what we want to detect}
Example: Proving guilt: H₀: innocent, H₁: guilty

Directionality

Choose one-sided or two-sided based on research question

Mathematical:
\text{Two-sided: } H_1: \theta \neq \theta_0, \quad \text{One-sided: } H_1: \theta > \theta_0
Example: Quality control often uses one-sided tests
Types of Hypothesis Tests
Classification based on alternative hypothesis structure

Two-Sided (Two-Tailed)

Structure: H₀: θ = θ₀ vs H₁: θ ≠ θ₀

Rejection Region: T < c₁ or T > c₂

Example: Testing if population mean differs from specified value

Applications:

  • Quality assurance
  • Scientific experiments
  • A/B testing

Right-Sided (Upper-Tailed)

Structure: H₀: θ ≤ θ₀ vs H₁: θ > θ₀

Rejection Region: T > c

Example: Testing if new process increases efficiency

Applications:

  • Process improvement
  • Treatment effectiveness
  • Performance enhancement

Left-Sided (Lower-Tailed)

Structure: H₀: θ ≥ θ₀ vs H₁: θ < θ₀

Rejection Region: T < c

Example: Testing if new method reduces error rate

Applications:

  • Cost reduction
  • Risk minimization
  • Error rate improvement

Error Analysis & Statistical Power

Type I Error (False Positive)
Rejecting true H₀ - the probability of a false alarm
\alpha = \max_{\theta \in \Theta_0} P_{\theta}(\text{Reject } H_0)

Characteristics:

  • Controlled by significance level α
  • Set by researcher before data collection
  • Common values: α = 0.01, 0.05, 0.10
  • Lower α → more conservative test
Real-world analogy: Convicting an innocent person (α-error in justice system)
Type II Error (False Negative)
Failing to reject false H₀ - missing a true effect
\beta(\theta) = P_{\theta}(\text{Accept } H_0 \mid \theta \in \Theta_1)

Characteristics:

  • Depends on true parameter value
  • Decreases as sample size increases
  • Related to test power: Power = 1 - β
  • Higher α → lower β (trade-off)
Real-world analogy: Failing to detect a disease when it's present
Power Function
Probability of correctly rejecting H₀ when it's false
g(\theta) = P_{\theta}(X \in D) = P_{\theta}(\text{Reject } H_0)

Key Properties:

  • For θ ∈ Θ₀: g(θ) = α(θ) (Type I error rate)
  • For θ ∈ Θ₁: g(θ) = 1 - β(θ) (Power)
  • Ideal: g(θ) ≤ α for θ ∈ Θ₀, g(θ) ≈ 1 for θ ∈ Θ₁
  • Higher power means better ability to detect true effects

Factors Affecting Power:

  • Effect size (larger effects easier to detect)
  • Sample size (larger n increases power)
  • Significance level (larger α increases power)
  • Population variability (lower σ increases power)
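To make the power function concrete, here is a minimal sketch for a right-sided U-test with known σ, computed with SciPy. All numeric settings (μ₀ = 50, σ = 10, n = 25, α = 0.05) are illustrative choices, not values from this page.

```python
# Sketch: power function g(mu) of a right-sided U-test, H0: mu <= mu0 vs H1: mu > mu0
# (sigma assumed known; all numbers below are illustrative)
import numpy as np
from scipy.stats import norm

mu0, sigma, n, alpha = 50.0, 10.0, 25, 0.05
c = norm.ppf(1 - alpha)                         # critical value u_alpha

def power(mu):
    # g(mu) = P_mu(U > c), where U = (Xbar - mu0) / (sigma / sqrt(n))
    shift = (mu - mu0) / (sigma / np.sqrt(n))
    return norm.sf(c - shift)

for mu in [50, 52, 54, 56]:
    print(f"mu = {mu}: g(mu) = {power(mu):.3f}")
# At mu = mu0 the power equals alpha; it increases toward 1 as mu moves deeper into H1.
```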
Error Trade-off Relationship
For fixed sample size, Type I and Type II errors are inversely related
\text{As } \alpha \downarrow \text{, } \beta \uparrow \quad \text{(for fixed } n\text{)}

Solutions to Improve Both Errors:

  • Increase sample size n (reduces both errors)
  • Use more informative experimental design
  • Apply sequential testing methods
  • Use composite decision theory approaches
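As a numerical illustration of this trade-off, and of how a larger n relaxes it, the sketch below computes β at a fixed alternative for several choices of α and n in the same right-sided U-test setting; all values are made up for illustration.

```python
# Sketch: Type II error beta at a fixed alternative mu1, for several alpha and n
# (right-sided U-test with known sigma; all numbers are illustrative)
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma = 50.0, 53.0, 10.0

def beta(alpha, n):
    c = norm.ppf(1 - alpha)                        # critical value u_alpha
    shift = (mu1 - mu0) / (sigma / np.sqrt(n))     # standardized true effect
    return norm.cdf(c - shift)                     # P_mu1(U <= c) = beta

for alpha in [0.01, 0.05, 0.10]:
    print(f"alpha = {alpha:.2f}: beta(n=25) = {beta(alpha, 25):.3f}, "
          f"beta(n=100) = {beta(alpha, 100):.3f}")
# Lowering alpha raises beta at fixed n; raising n lowers beta at every alpha.
```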

Neyman-Pearson Principle

Neyman-Pearson Principle
The fundamental framework for controlling Type I error while minimizing Type II error

Principle Statement:

Control the maximum Type I error probability at level α, and among all such tests, choose the one with minimum Type II error (maximum power).

\sup_{\theta \in \Theta_0} \alpha(\theta) \leq \alpha

Key Advantages:

  • Provides clear error control framework
  • Enables comparison between different tests
  • Forms basis for optimal test construction
  • Widely applicable across statistical problems
Significance Level (α)
The maximum allowable Type I error probability
α = 0.01

Very strong evidence required

Common in: Medical trials, Safety testing
α = 0.05

Standard in most fields

Common in: Social sciences, Quality control
α = 0.10

Exploratory analysis

Common in: Preliminary studies, Screening tests
Critical Value Selection:
\text{Choose critical value } c \text{ such that } \sup_{\theta \in \Theta_0} P(T > c \mid H_0) = \alpha
Optimal Test Construction
Steps to construct tests following Neyman-Pearson principle
  1. Specify H₀ and H₁ clearly
  2. Choose significance level α
  3. Identify test statistic T with known distribution under H₀
  4. Determine rejection region D to satisfy the size constraint
  5. Evaluate the power function for different alternatives
Optimality Concept:

Among all tests with same significance level, choose the one with highest power (Uniformly Most Powerful when exists)

Testing Procedure & Decision Rules

General Hypothesis Testing Procedure
Standard framework for conducting statistical hypothesis tests
Step 1: Formulate Hypotheses

Clearly state H₀ and H₁ based on research question

Key Details:
  • Define parameter of interest
  • Specify parameter space
  • Ensure hypotheses are complementary
Example:

H₀: μ = 50 vs H₁: μ ≠ 50

Step 2: Choose Test Statistic

Select appropriate statistic based on data type and assumptions

Key Details:
  • Consider sampling distribution
  • Check model assumptions
  • Ensure statistic captures effect of interest
Example:

T = (X̄ - μ₀)/(S/√n) for normal population with unknown variance

Step 3: Determine Rejection Region

Based on H₁ direction and significance level α

Key Details:
  • Two-sided: |T| > c
  • Right-sided: T > c
  • Left-sided: T < c
Example:

For α = 0.05, two-sided: |T| > t₀.₀₂₅(n-1)

Step 4: Calculate Test Statistic

Compute statistic value using sample data

Key Details:
  • Substitute sample values
  • Check calculation accuracy
  • Verify units and scale
Example:

t = (15.2 - 15.0)/(0.8/√25) = 1.25

Step 5: Make Decision

Compare statistic to critical value and conclude

Key Details:
  • State decision rule clearly
  • Provide statistical conclusion
  • Interpret in context
Example:

Since |1.25| < 2.064, fail to reject H₀

Step 6: Calculate P-value

Find probability of observed result or more extreme under H₀

Key Details:
  • Use appropriate distribution
  • Account for test direction
  • Report exact value when possible
Example:

P-value = 2 × P(t₂₄ > 1.25) = 0.224
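The worked numbers in steps 4–6 (x̄ = 15.2, μ₀ = 15.0, s = 0.8, n = 25, α = 0.05) can be reproduced with a short SciPy sketch; this is only a check of the arithmetic above, not additional analysis.

```python
# Sketch: reproducing the worked two-sided one-sample t-test above
import numpy as np
from scipy.stats import t

xbar, mu0, s, n, alpha = 15.2, 15.0, 0.8, 25, 0.05

t_stat = (xbar - mu0) / (s / np.sqrt(n))        # step 4: 1.25
t_crit = t.ppf(1 - alpha / 2, df=n - 1)         # about 2.064
p_value = 2 * t.sf(abs(t_stat), df=n - 1)       # step 6: about 0.22

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")   # step 5
```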

Decision Rules and Interpretation
Guidelines for making and interpreting test decisions

Critical Value Approach

Compare test statistic to critical value

Decision Rule:
\text{If } |T| > c_{\alpha/2}\text{, reject } H_0

Advantages: Direct comparison, Clear decision boundary

Disadvantages: Doesn't show strength of evidence

P-value Approach

Compare P-value to significance level

Decision Rule:
\text{If P-value} < \alpha\text{, reject } H_0

Advantages: Shows strength of evidence, More informative

Disadvantages: Can be misinterpreted

Interpretation Guidelines

Reject H₀:

Strong evidence against H₀ in favor of H₁

Fail to Reject H₀:

Insufficient evidence to reject H₀ (not proof of H₀)

Common Mistakes:

  • Don't say 'accept H₀' - we never prove H₀
  • Don't confuse statistical and practical significance
  • P-value is not probability that H₀ is true

Common Statistical Tests

Single Normal Population Tests
Tests for normal population parameters with different scenarios

U-Test (Z-Test)

Scenario:

Testing population mean with known variance

Hypotheses:

H₀: μ = μ₀ vs H₁: μ ≠ μ₀, μ > μ₀, or μ < μ₀

Assumptions:

  • X ~ N(μ, σ²)
  • σ² known
  • Random sample

Test Statistic:

U = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \text{ under } H_0

Rejection Regions:

  • Two-sided: |U| > u_{α/2}
  • Right-sided: U > u_α
  • Left-sided: U < -u_α

Example Application:

Testing if mean height = 170cm with σ = 5cm known
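A minimal sketch of this U-test for the height example: σ = 5 and μ₀ = 170 come from the example above, while the sample summary (x̄ = 171.2, n = 40) is hypothetical.

```python
# Sketch: two-sided U-test (Z-test) for H0: mu = 170 with sigma = 5 known
import numpy as np
from scipy.stats import norm

mu0, sigma = 170.0, 5.0
xbar, n, alpha = 171.2, 40, 0.05                 # hypothetical sample summary

u = (xbar - mu0) / (sigma / np.sqrt(n))
u_crit = norm.ppf(1 - alpha / 2)                 # u_{alpha/2}, about 1.96
p_value = 2 * norm.sf(abs(u))

print(f"U = {u:.3f}, |U| > u_crit: {abs(u) > u_crit}, p-value = {p_value:.4f}")
```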

T-Test

Scenario:

Testing population mean with unknown variance

Hypotheses:

H₀: μ = μ₀ vs H₁: μ ≠ μ₀, μ > μ₀, or μ < μ₀

Assumptions:

  • X ~ N(μ, σ²)
  • σ² unknown
  • Random sample

Test Statistic:

T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t(n-1) \text{ under } H_0

Rejection Regions:

  • Two-sided: |T| > t_{α/2}(n-1)
  • Right-sided: T > t_α(n-1)
  • Left-sided: T < -t_α(n-1)

Example Application:

Testing if new teaching method improves test scores
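With raw data, the same t-test is one call to scipy.stats.ttest_1samp; the scores array below is hypothetical.

```python
# Sketch: one-sample t-test on raw data (hypothetical scores)
import numpy as np
from scipy.stats import ttest_1samp

scores = np.array([72, 78, 81, 69, 75, 83, 77, 74, 80, 76])   # hypothetical data
mu0 = 75.0

t_stat, p_value = ttest_1samp(scores, popmean=mu0)   # two-sided by default
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
# For the right-sided alternative H1: mu > mu0, pass alternative='greater'
# (available in SciPy 1.6 and later).
```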

Chi-Square Test

Scenario:

Testing population variance

Hypotheses:

H₀: σ² = σ₀² vs H₁: σ² ≠ σ₀², σ² > σ₀², or σ² < σ₀²

Assumptions:

  • X ~ N(μ, σ²)
  • μ unknown
  • Random sample

Test Statistic:

\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2(n-1) \text{ under } H_0

Rejection Regions:

  • Two-sided: χ² < χ²_{1-α/2}(n-1) or χ² > χ²_{α/2}(n-1)
  • Right-sided: χ² > χ²_{α}(n-1)
  • Left-sided: χ² < χ²_{1-α}(n-1)

Example Application:

Testing if process variability meets specifications
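SciPy has no ready-made one-sample variance test, so the sketch below builds the χ² test directly from the statistic above; the summary values (n = 20, S² = 1.8, σ₀² = 1.0) are hypothetical.

```python
# Sketch: two-sided chi-square test for H0: sigma^2 = sigma0^2
from scipy.stats import chi2

n, s2, sigma0_sq, alpha = 20, 1.8, 1.0, 0.05     # hypothetical summary values

chi2_stat = (n - 1) * s2 / sigma0_sq
lower = chi2.ppf(alpha / 2, df=n - 1)            # chi2_{1-alpha/2}(n-1) in the notation above
upper = chi2.ppf(1 - alpha / 2, df=n - 1)        # chi2_{alpha/2}(n-1)

reject = chi2_stat < lower or chi2_stat > upper
print(f"chi2 = {chi2_stat:.2f}, acceptance region = [{lower:.2f}, {upper:.2f}], reject H0: {reject}")
```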

Two Sample Comparison Tests
Tests comparing parameters between two independent populations

Two-Sample U-Test

Scenario:

Comparing means with known variances

Hypotheses:

H₀: μ_X = μ_Y vs H₁: μ_X ≠ μ_Y, μ_X > μ_Y, or μ_X < μ_Y

Assumptions:

  • X ~ N(μ_X, σ_X²), Y ~ N(μ_Y, σ_Y²)
  • σ_X², σ_Y² known
  • Independent samples

Test Statistic:

U = \frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_X^2/m + \sigma_Y^2/n}} \sim N(0,1) \text{ under } H_0

Example Application:

Comparing treatment effects in clinical trial with known population variances
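A compact sketch of the two-sample U-test from summary statistics; every number below (means, known variances, sample sizes) is illustrative.

```python
# Sketch: two-sided two-sample U-test with known variances (illustrative numbers)
import numpy as np
from scipy.stats import norm

xbar, ybar = 12.4, 11.8          # sample means
var_x, var_y = 4.0, 3.0          # known population variances
m, n, alpha = 50, 60, 0.05

u = (xbar - ybar) / np.sqrt(var_x / m + var_y / n)
p_value = 2 * norm.sf(abs(u))
print(f"U = {u:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```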

Two-Sample T-Test

Scenario:

Comparing means with unknown but equal variances

Hypotheses:

H₀: μ_X = μ_Y vs H₁: μ_X ≠ μ_Y, μ_X > μ_Y, or μ_X < μ_Y

Assumptions:

  • X ~ N(μ_X, σ²), Y ~ N(μ_Y, σ²)
  • σ² unknown but equal
  • Independent samples

Test Statistic:

T = \frac{\bar{X} - \bar{Y}}{S_w\sqrt{1/m + 1/n}} \sim t(m+n-2) \text{ under } H_0

Pooled Variance:

S_w^2 = \frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}

Example Application:

Comparing test scores between two teaching methods
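A sketch of the pooled two-sample t-test with scipy.stats.ttest_ind; equal_var=True gives exactly the pooled-Sw version above. The two score arrays are hypothetical.

```python
# Sketch: pooled two-sample t-test (equal but unknown variances)
import numpy as np
from scipy.stats import ttest_ind

method_a = np.array([78, 82, 75, 80, 85, 79, 81, 77])   # hypothetical scores
method_b = np.array([72, 76, 74, 70, 78, 73, 75, 71])

t_stat, p_value = ttest_ind(method_a, method_b, equal_var=True)   # df = m + n - 2
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
# If equal variances are doubtful, equal_var=False switches to Welch's t-test instead.
```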

F-Test

Scenario:

Comparing population variances

Hypotheses:

H₀: σ_X² = σ_Y² vs H₁: σ_X² ≠ σ_Y², σ_X² > σ_Y², or σ_X² < σ_Y²

Assumptions:

  • X ~ N(μ_X, σ_X²), Y ~ N(μ_Y, σ_Y²)
  • Independent samples

Test Statistic:

F = \frac{S_X^2}{S_Y^2} \sim F(m-1, n-1) \text{ under } H_0

Rejection Regions:

  • Two-sided: F < F_{1-α/2}(m-1,n-1) or F > F_{α/2}(m-1,n-1)
  • Right-sided: F > F_{α}(m-1,n-1)

Example Application:

Testing if two processes have equal variability before pooling data
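SciPy does not ship this two-sample variance F-test as a single function, so here is a minimal sketch built from scipy.stats.f, following the rejection regions above; the two samples are hypothetical.

```python
# Sketch: two-sided F-test for H0: sigma_X^2 = sigma_Y^2
import numpy as np
from scipy.stats import f

x = np.array([10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2])   # hypothetical sample X
y = np.array([10.5, 9.2, 10.9, 9.5, 10.8, 9.0, 10.6])    # hypothetical sample Y
alpha = 0.05

m, n = len(x), len(y)
F = np.var(x, ddof=1) / np.var(y, ddof=1)                 # S_X^2 / S_Y^2
lower = f.ppf(alpha / 2, m - 1, n - 1)                    # F_{1-alpha/2}(m-1, n-1)
upper = f.ppf(1 - alpha / 2, m - 1, n - 1)                # F_{alpha/2}(m-1, n-1)

reject = F < lower or F > upper
print(f"F = {F:.3f}, acceptance region = [{lower:.3f}, {upper:.3f}], reject H0: {reject}")
```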

Generalized Likelihood Ratio Test

Generalized Likelihood Ratio Test (GLRT)
A general method for constructing hypothesis tests using likelihood functions

Motivation:

When optimal tests don't exist or are unknown, GLRT provides a systematic approach

Principle:

Compare maximum likelihood under full parameter space to maximum likelihood under null hypothesis constraint

GLRT Construction

Likelihood Ratio Definition:

\lambda(\tilde{x}) = \frac{\sup_{\theta \in \Theta} L(\theta; \tilde{x})}{\sup_{\theta \in \Theta_0} L(\theta; \tilde{x})} = \frac{L(\hat{\theta}; \tilde{x})}{L(\hat{\theta}_0; \tilde{x})}

Components:

  • L(θ;x̃): Likelihood function
  • θ̂: Unrestricted MLE (global maximum)
  • θ̂₀: Restricted MLE under H₀ (constrained maximum)
  • λ(x̃): Likelihood ratio statistic

Test Rule:

\text{Reject } H_0 \text{ if } \lambda(\tilde{X}) > c

Critical Value:

\text{Choose } c \text{ such that } \sup_{\theta \in \Theta_0} P_{\theta}(\lambda(\tilde{X}) > c) \leq \alpha
GLRT Examples

Normal Mean Test (σ² unknown)

Hypotheses: H₀: μ = μ₀ vs H₁: μ ≠ μ₀
Global MLE:
\hat{\mu} = \bar{X}, \quad \hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2
Restricted MLE:
\hat{\mu}_0 = \mu_0, \quad \hat{\sigma}_0^2 = \frac{1}{n}\sum(X_i - \mu_0)^2
Likelihood Ratio:
\lambda = \left(1 + \frac{t^2}{n-1}\right)^{n/2}
Test Statistic:
t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}
Equivalence: λ is monotone increasing in |t|, so the GLRT is equivalent to the t-test
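A quick numerical check (n = 10 chosen arbitrarily) that λ = (1 + t²/(n-1))^{n/2} increases with |t|, which is why rejecting for large λ and rejecting for large |t| are the same rule.

```python
# Sketch: lambda = (1 + t^2/(n-1))^(n/2) is increasing in |t|
import numpy as np

n = 10
t_values = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
lam = (1 + t_values**2 / (n - 1)) ** (n / 2)

for t_val, l in zip(t_values, lam):
    print(f"|t| = {t_val:.1f}  ->  lambda = {l:.3f}")
# lambda > c is therefore equivalent to |t| > c' for some c',
# which is the ordinary two-sided t-test.
```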

Normal Variance Test

Hypotheses: H₀: σ² = σ₀² vs H₁: σ² ≠ σ₀²
Global MLE:
\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2
Restricted MLE:
\hat{\sigma}_0^2 = \sigma_0^2
Likelihood Ratio:
\lambda = \left(\frac{\sigma_0^2}{\hat{\sigma}^2}\right)^{n/2} \exp\left(\frac{n}{2}\left(\frac{\hat{\sigma}^2}{\sigma_0^2} - 1\right)\right)
Equivalence: the GLRT is equivalent to the chi-square test
Large Sample Properties

Wilks' Theorem:

\text{Under regularity conditions: } 2\log\lambda(\tilde{X}) \xrightarrow{d} \chi^2(r) \text{ as } n \to \infty

where r = \dim(\Theta) - \dim(\Theta_0)

Applications:

  • Provides approximate critical values for large samples
  • Enables testing in complex models where exact distributions unknown
  • Forms basis for many modern statistical tests

Limitations:

  • Requires large sample sizes for accuracy
  • Regularity conditions may not hold
  • May not be optimal for specific alternatives
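A small Monte Carlo sketch of Wilks' theorem for the normal-mean GLRT (H₀: μ = μ₀ true, σ² unknown), where r = 1 and 2 log λ = n log(σ̂₀²/σ̂²). All simulation settings (n = 200, 5000 replications, the seed) are arbitrary choices for illustration.

```python
# Sketch: Monte Carlo check that 2*log(lambda) is approximately chi2(1) under H0
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu0, sigma, n, reps = 0.0, 1.0, 200, 5000

stats = np.empty(reps)
for i in range(reps):
    x = rng.normal(mu0, sigma, n)
    s2_hat = np.mean((x - x.mean()) ** 2)     # unrestricted MLE of sigma^2
    s2_0 = np.mean((x - mu0) ** 2)            # restricted MLE under H0: mu = mu0
    stats[i] = n * np.log(s2_0 / s2_hat)      # 2*log(lambda) for this model

cutoff = chi2.ppf(0.95, df=1)
print(f"P(2 log lambda > {cutoff:.2f}) = {np.mean(stats > cutoff):.3f}  (chi2(1) tail: 0.050)")
```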

Confidence Intervals & Hypothesis Testing

Confidence Interval - Hypothesis Test Duality
Fundamental two-way relationship between interval estimation and hypothesis testing

There's a one-to-one correspondence between confidence intervals and hypothesis tests at the same confidence/significance level

Test → Interval

From acceptance regions to confidence sets

Formula:
C(\tilde{x}) = \{\theta_0 : \tilde{x} \in A(\theta_0)\}

Explanation: The confidence set contains all parameter values that would not be rejected by the test

Example: If |t| ≤ t_{α/2}(n-1) defines the acceptance region, then the confidence interval is x̄ ± t_{α/2}(n-1) · s/√n

Interval → Test

From confidence sets to acceptance regions

Formula:
A(\theta_0) = \{\tilde{x} : \theta_0 \in C(\tilde{x})\}

Explanation: Accept H₀: θ = θ₀ if and only if θ₀ lies within the confidence interval

Example: Reject H₀: μ = μ₀ if μ₀ falls outside the 95% confidence interval for μ

Practical Implications:

  • Confidence intervals provide range of plausible parameter values
  • Hypothesis tests provide binary decisions about specific values
  • Intervals show effect size, tests show statistical significance
  • Same data, same conclusions when properly applied
  • Intervals more informative for practical decision making
Duality Examples

Normal Mean (σ unknown)

Confidence Interval:
\left[\bar{x} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}\right]
Hypothesis Test:
\text{Reject } H_0: \mu = \mu_0 \text{ if } \mu_0 \notin \text{CI}
Interpretation: 95% CI gives all μ₀ values that would not be rejected at α = 0.05
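A sketch of this duality in code: build the 95% t-interval and verify that a candidate μ₀ is rejected by the two-sided t-test exactly when it falls outside the interval. The data array and candidate values are hypothetical.

```python
# Sketch: confidence-interval / t-test duality for the normal mean (sigma unknown)
import numpy as np
from scipy.stats import t, ttest_1samp

x = np.array([14.9, 15.3, 15.1, 15.4, 14.8, 15.2, 15.0, 15.5])   # hypothetical data
alpha = 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

half = t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)
ci = (xbar - half, xbar + half)

for mu0 in [15.0, 15.6]:
    p = ttest_1samp(x, mu0).pvalue
    inside = ci[0] <= mu0 <= ci[1]
    print(f"mu0 = {mu0}: inside 95% CI ({ci[0]:.3f}, {ci[1]:.3f})? {inside}; "
          f"p = {p:.3f}; reject at alpha = 0.05? {p < alpha}")
# mu0 lies outside the interval exactly when the two-sided test rejects it.
```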

Uniform Distribution U(0,θ)

Confidence Interval:
[X_{(n)},\ X_{(n)}/\sqrt[n]{\alpha}]
Hypothesis Test:
\text{Reject } H_0: \theta = \theta_0 \text{ if } X_{(n)} > \theta_0 \text{ or } X_{(n)} < \theta_0\sqrt[n]{\alpha}
Interpretation: Exact correspondence between interval and test boundaries

Real-World Applications

Quality Control Testing
Monitor production processes to ensure specifications are met

Common Scenarios:

  • Testing if mean product dimension meets target specification
  • Monitoring process variability within acceptable limits
  • Detecting shifts in production quality over time

Typical Tests:

One-sample t-test
Chi-square test for variance
Control charts

Key Considerations:

  • Economic consequences
  • Cost of Type I vs Type II errors
  • Sample size planning
Medical Research
Evaluate treatment effectiveness and drug safety

Common Scenarios:

  • Testing if new treatment improves patient outcomes
  • Comparing side effect rates between treatments
  • Establishing bioequivalence between generic and brand drugs

Typical Tests:

Two-sample t-test
Chi-square independence test
Equivalence testing

Key Considerations:

  • Patient safety
  • Regulatory requirements
  • Ethical implications
A/B Testing
Compare different versions to optimize performance

Common Scenarios:

  • Testing if new website design increases conversion rate
  • Comparing marketing campaign effectiveness
  • Evaluating user interface changes

Typical Tests:

Two-proportion z-test
Two-sample t-test
Chi-square test

Key Considerations:

  • Business impact
  • Sample size constraints
  • Multiple testing corrections
Environmental Monitoring
Assess compliance with environmental standards

Common Scenarios:

  • Testing if pollutant levels exceed safety thresholds
  • Monitoring changes in ecosystem health indicators
  • Evaluating effectiveness of environmental interventions

Typical Tests:

One-sample tests
Trend analysis
Non-parametric tests

Key Considerations:

  • Regulatory compliance
  • Public health impact
  • Measurement uncertainty