
Hypothesis Testing

Master the fundamental principles of statistical hypothesis testing: from basic concepts and error analysis to advanced methods and real-world applications in statistical inference.

Learning Objectives
Understand null and alternative hypotheses construction principles
Master Type I and Type II error concepts and their trade-offs
Apply the Neyman-Pearson principle for optimal test design
Construct test functions and power functions for statistical tests
Learn the general framework for hypothesis testing procedures
Master common tests: U-test, t-test, chi-square test, and F-test
Apply generalized likelihood ratio testing (GLRT) methods
Understand the relationship between confidence intervals and hypothesis testing

Essential Definitions

Null Hypothesis (H₀)

The baseline hypothesis under test, typically containing '=', '≥', or '≤', representing the status quo or no effect condition.

Mathematical:
H_0: \theta \in \Theta_0
Example: H₀: μ = μ₀ (population mean equals specified value)
Alternative Hypothesis (H₁)

The hypothesis that contradicts H₀, typically containing '≠', '>', or '<', representing what we're trying to detect.

Mathematical:
H_1: \theta \in \Theta_1
Example: H₁: μ ≠ μ₀ (two-sided), H₁: μ > μ₀ (right-sided)
Type I Error (α)

The probability of rejecting H₀ when it is actually true (false positive). Controlled by significance level.

Mathematical:
\alpha(\theta) = P_{\theta}(X \in D \mid \theta \in \Theta_0)
Example: α = 0.05 means 5% chance of false rejection
Type II Error (β)

The probability of failing to reject H₀ when H₁ is true (false negative). Related to statistical power.

Mathematical:
\beta(\theta) = P_{\theta}(X \in \overline{D} \mid \theta \in \Theta_1)
Example: Power = 1 - β measures test's ability to detect true effects

Hypothesis Construction

Hypothesis Construction Principles
Guidelines for properly formulating null and alternative hypotheses

Complementary Hypotheses

H₀ and H₁ must be mutually exclusive and collectively exhaustive

Mathematical:
\Theta_0 \cap \Theta_1 = \emptyset \text{ and } \Theta_0 \cup \Theta_1 = \Theta
Example: For μ: H₀: μ = μ₀ vs H₁: μ ≠ μ₀

Status Quo in H₀

H₀ typically represents the current belief, no change, or no effect

Mathematical:
H_0: \text{parameter} = \text{claimed value}
Example: Testing drug effectiveness: H₀: drug has no effect

Burden of Proof

H₁ represents what requires evidence to establish (burden of proof)

Mathematical:
H_1: \text{what we want to detect}
Example: Proving guilt: H₀: innocent, H₁: guilty

Directionality

Choose one-sided or two-sided based on research question

Mathematical:
\text{Two-sided: } H_1: \theta \neq \theta_0, \quad \text{One-sided: } H_1: \theta > \theta_0
Example: Quality control often uses one-sided tests
Types of Hypothesis Tests
Classification based on alternative hypothesis structure

Two-Sided (Two-Tailed)

Structure: H₀: θ = θ₀ vs H₁: θ ≠ θ₀

Rejection Region: T < c₁ or T > c₂

Example: Testing if population mean differs from specified value

Applications:

  • Quality assurance
  • Scientific experiments
  • A/B testing

Right-Sided (Upper-Tailed)

Structure: H₀: θ ≤ θ₀ vs H₁: θ > θ₀

Rejection Region: T > c

Example: Testing if new process increases efficiency

Applications:

  • Process improvement
  • Treatment effectiveness
  • Performance enhancement

Left-Sided (Lower-Tailed)

Structure: H₀: θ ≥ θ₀ vs H₁: θ < θ₀

Rejection Region: T < c

Example: Testing if new method reduces error rate

Applications:

  • Cost reduction
  • Risk minimization
  • Error rate improvement

Error Analysis & Statistical Power

Type I Error (False Positive)
Rejecting true H₀ - the probability of a false alarm
\alpha = \max_{\theta \in \Theta_0} P_{\theta}(\text{Reject } H_0)

Characteristics:

  • Controlled by significance level α
  • Set by researcher before data collection
  • Common values: α = 0.01, 0.05, 0.10
  • Lower α → more conservative test
Real-world analogy: Convicting an innocent person (α-error in justice system)
Type II Error (False Negative)
Failing to reject false H₀ - missing a true effect
\beta(\theta) = P_{\theta}(\text{Accept } H_0 \mid \theta \in \Theta_1)

Characteristics:

  • Depends on true parameter value
  • Decreases as sample size increases
  • Related to test power: Power = 1 - β
  • Higher α → lower β (trade-off)
Real-world analogy: Failing to detect a disease when it's present
Power Function
Probability of correctly rejecting H₀ when it's false
g(\theta) = P_{\theta}(X \in D) = P_{\theta}(\text{Reject } H_0)

Key Properties:

  • For θ ∈ Θ₀: g(θ) = α(θ) (Type I error rate)
  • For θ ∈ Θ₁: g(θ) = 1 - β(θ) (Power)
  • Ideal: g(θ) ≤ α for θ ∈ Θ₀, g(θ) ≈ 1 for θ ∈ Θ₁
  • Higher power means better ability to detect true effects

Factors Affecting Power:

  • Effect size (larger effects easier to detect)
  • Sample size (larger n increases power)
  • Significance level (larger α increases power)
  • Population variability (lower σ increases power)
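To make the power function concrete, here is a minimal sketch for a right-sided U-test with known σ, computed with SciPy. All numeric settings (μ₀ = 50, σ = 10, n = 25, α = 0.05) are illustrative choices, not values from this page.

```python
# Sketch: power function g(mu) of a right-sided U-test, H0: mu <= mu0 vs H1: mu > mu0
# (sigma assumed known; all numbers below are illustrative)
import numpy as np
from scipy.stats import norm

mu0, sigma, n, alpha = 50.0, 10.0, 25, 0.05
c = norm.ppf(1 - alpha)                         # critical value u_alpha

def power(mu):
    # g(mu) = P_mu(U > c), where U = (Xbar - mu0) / (sigma / sqrt(n))
    shift = (mu - mu0) / (sigma / np.sqrt(n))
    return norm.sf(c - shift)

for mu in [50, 52, 54, 56]:
    print(f"mu = {mu}: g(mu) = {power(mu):.3f}")
# At mu = mu0 the power equals alpha; it increases toward 1 as mu moves deeper into H1.
```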
Error Trade-off Relationship
For fixed sample size, Type I and Type II errors are inversely related
\text{As } \alpha \downarrow \text{, } \beta \uparrow \quad \text{(for fixed } n\text{)}

Solutions to Improve Both Errors:

  • Increase sample size n (reduces both errors)
  • Use more informative experimental design
  • Apply sequential testing methods
  • Use composite decision theory approaches
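As a numerical illustration of this trade-off, and of how a larger n relaxes it, the sketch below computes β at a fixed alternative for several choices of α and n in the same right-sided U-test setting; all values are made up for illustration.

```python
# Sketch: Type II error beta at a fixed alternative mu1, for several alpha and n
# (right-sided U-test with known sigma; all numbers are illustrative)
import numpy as np
from scipy.stats import norm

mu0, mu1, sigma = 50.0, 53.0, 10.0

def beta(alpha, n):
    c = norm.ppf(1 - alpha)                        # critical value u_alpha
    shift = (mu1 - mu0) / (sigma / np.sqrt(n))     # standardized true effect
    return norm.cdf(c - shift)                     # P_mu1(U <= c) = beta

for alpha in [0.01, 0.05, 0.10]:
    print(f"alpha = {alpha:.2f}: beta(n=25) = {beta(alpha, 25):.3f}, "
          f"beta(n=100) = {beta(alpha, 100):.3f}")
# Lowering alpha raises beta at fixed n; raising n lowers beta at every alpha.
```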

Neyman-Pearson Principle

Neyman-Pearson Principle
The fundamental framework for controlling Type I error while minimizing Type II error

Principle Statement:

Control the maximum Type I error probability at level α, and among all such tests, choose the one with minimum Type II error (maximum power).

\sup_{\theta \in \Theta_0} \alpha(\theta) \leq \alpha

Key Advantages:

  • Provides clear error control framework
  • Enables comparison between different tests
  • Forms basis for optimal test construction
  • Widely applicable across statistical problems
Significance Level (α)
The maximum allowable Type I error probability
α = 0.01

Very strong evidence required

Common in: Medical trials, Safety testing
α = 0.05

Standard in most fields

Common in: Social sciences, Quality control
α = 0.10

Exploratory analysis

Common in: Preliminary studies, Screening tests
Critical Value Selection:
\text{Choose critical value } c \text{ such that } \sup_{\theta \in \Theta_0} P(T > c \mid H_0) = \alpha
Optimal Test Construction
Steps to construct tests following Neyman-Pearson principle
  1. Specify H₀ and H₁ clearly
  2. Choose significance level α
  3. Identify test statistic T with known distribution under H₀
  4. Determine rejection region D to satisfy the size constraint
  5. Evaluate the power function for different alternatives
Optimality Concept:

Among all tests with same significance level, choose the one with highest power (Uniformly Most Powerful when exists)

Testing Procedure & Decision Rules

General Hypothesis Testing Procedure
Standard framework for conducting statistical hypothesis tests
Step 1: Formulate Hypotheses

Clearly state H₀ and H₁ based on research question

Key Details:
  • Define parameter of interest
  • Specify parameter space
  • Ensure hypotheses are complementary
Example:

H₀: μ = 50 vs H₁: μ ≠ 50

Step 2: Choose Test Statistic

Select appropriate statistic based on data type and assumptions

Key Details:
  • Consider sampling distribution
  • Check model assumptions
  • Ensure statistic captures effect of interest
Example:

T = (X̄ - μ₀)/(S/√n) for normal population with unknown variance

Step 3: Determine Rejection Region

Based on H₁ direction and significance level α

Key Details:
  • Two-sided: |T| > c
  • Right-sided: T > c
  • Left-sided: T < c
Example:

For α = 0.05, two-sided: |T| > t₀.₀₂₅(n-1)

Step 4: Calculate Test Statistic

Compute statistic value using sample data

Key Details:
  • Substitute sample values
  • Check calculation accuracy
  • Verify units and scale
Example:

t = (15.2 - 15.0)/(0.8/√25) = 1.25

Step 5: Make Decision

Compare statistic to critical value and conclude

Key Details:
  • State decision rule clearly
  • Provide statistical conclusion
  • Interpret in context
Example:

Since |1.25| < 2.064, fail to reject H₀

Step 6: Calculate P-value

Find probability of observed result or more extreme under H₀

Key Details:
  • Use appropriate distribution
  • Account for test direction
  • Report exact value when possible
Example:

P-value = 2 × P(t₂₄ > 1.25) = 0.224
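The worked numbers in steps 4–6 (x̄ = 15.2, μ₀ = 15.0, s = 0.8, n = 25, α = 0.05) can be reproduced with a short SciPy sketch; this is only a check of the arithmetic above, not additional analysis.

```python
# Sketch: reproducing the worked two-sided one-sample t-test above
import numpy as np
from scipy.stats import t

xbar, mu0, s, n, alpha = 15.2, 15.0, 0.8, 25, 0.05

t_stat = (xbar - mu0) / (s / np.sqrt(n))        # step 4: 1.25
t_crit = t.ppf(1 - alpha / 2, df=n - 1)         # about 2.064
p_value = 2 * t.sf(abs(t_stat), df=n - 1)       # step 6: about 0.22

print(f"t = {t_stat:.3f}, critical value = {t_crit:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if abs(t_stat) > t_crit else "Fail to reject H0")   # step 5
```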

Decision Rules and Interpretation
Guidelines for making and interpreting test decisions

Critical Value Approach

Compare test statistic to critical value

Decision Rule:
\text{If } |T| > c_{\alpha/2}\text{, reject } H_0

Advantages: Direct comparison, Clear decision boundary

Disadvantages: Doesn't show strength of evidence

P-value Approach

Compare P-value to significance level

Decision Rule:
\text{If P-value} < \alpha\text{, reject } H_0

Advantages: Shows strength of evidence, More informative

Disadvantages: Can be misinterpreted

Interpretation Guidelines

Reject H₀:

Strong evidence against H₀ in favor of H₁

Fail to Reject H₀:

Insufficient evidence to reject H₀ (not proof of H₀)

Common Mistakes:

  • Don't say 'accept H₀' - we never prove H₀
  • Don't confuse statistical and practical significance
  • P-value is not probability that H₀ is true

Common Statistical Tests

Single Normal Population Tests
Tests for normal population parameters with different scenarios

U-Test (Z-Test)

Scenario:

Testing population mean with known variance

Hypotheses:

H₀: μ = μ₀ vs H₁: μ ≠ μ₀, μ > μ₀, or μ < μ₀

Assumptions:

  • X ~ N(μ, σ²)
  • σ² known
  • Random sample

Test Statistic:

U = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1) \text{ under } H_0

Rejection Regions:

  • Two-sided: |U| > u_{α/2}
  • Right-sided: U > u_α
  • Left-sided: U < -u_α

Example Application:

Testing if mean height = 170cm with σ = 5cm known
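A minimal sketch of this U-test for the height example: σ = 5 and μ₀ = 170 come from the example above, while the sample summary (x̄ = 171.2, n = 40) is hypothetical.

```python
# Sketch: two-sided U-test (Z-test) for H0: mu = 170 with sigma = 5 known
import numpy as np
from scipy.stats import norm

mu0, sigma = 170.0, 5.0
xbar, n, alpha = 171.2, 40, 0.05                 # hypothetical sample summary

u = (xbar - mu0) / (sigma / np.sqrt(n))
u_crit = norm.ppf(1 - alpha / 2)                 # u_{alpha/2}, about 1.96
p_value = 2 * norm.sf(abs(u))

print(f"U = {u:.3f}, |U| > u_crit: {abs(u) > u_crit}, p-value = {p_value:.4f}")
```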

T-Test

Scenario:

Testing population mean with unknown variance

Hypotheses:

H₀: μ = μ₀ vs H₁: μ ≠ μ₀, μ > μ₀, or μ < μ₀

Assumptions:

  • X ~ N(μ, σ²)
  • σ² unknown
  • Random sample

Test Statistic:

T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \sim t(n-1) \text{ under } H_0

Rejection Regions:

  • Two-sided: |T| > t_{α/2}(n-1)
  • Right-sided: T > t_α(n-1)
  • Left-sided: T < -t_α(n-1)

Example Application:

Testing if new teaching method improves test scores
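With raw data, the same t-test is one call to scipy.stats.ttest_1samp; the scores array below is hypothetical.

```python
# Sketch: one-sample t-test on raw data (hypothetical scores)
import numpy as np
from scipy.stats import ttest_1samp

scores = np.array([72, 78, 81, 69, 75, 83, 77, 74, 80, 76])   # hypothetical data
mu0 = 75.0

t_stat, p_value = ttest_1samp(scores, popmean=mu0)   # two-sided by default
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
# For the right-sided alternative H1: mu > mu0, pass alternative='greater'
# (available in SciPy 1.6 and later).
```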

Chi-Square Test

Scenario:

Testing population variance

Hypotheses:

H₀: σ² = σ₀² vs H₁: σ² ≠ σ₀², σ² > σ₀², or σ² < σ₀²

Assumptions:

  • X ~ N(μ, σ²)
  • μ unknown
  • Random sample

Test Statistic:

\chi^2 = \frac{(n-1)S^2}{\sigma_0^2} \sim \chi^2(n-1) \text{ under } H_0

Rejection Regions:

  • Two-sided: χ² < χ²_{1-α/2}(n-1) or χ² > χ²_{α/2}(n-1)
  • Right-sided: χ² > χ²_{α}(n-1)
  • Left-sided: χ² < χ²_{1-α}(n-1)

Example Application:

Testing if process variability meets specifications
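SciPy has no ready-made one-sample variance test, so the sketch below builds the χ² test directly from the statistic above; the summary values (n = 20, S² = 1.8, σ₀² = 1.0) are hypothetical.

```python
# Sketch: two-sided chi-square test for H0: sigma^2 = sigma0^2
from scipy.stats import chi2

n, s2, sigma0_sq, alpha = 20, 1.8, 1.0, 0.05     # hypothetical summary values

chi2_stat = (n - 1) * s2 / sigma0_sq
lower = chi2.ppf(alpha / 2, df=n - 1)            # chi2_{1-alpha/2}(n-1) in the notation above
upper = chi2.ppf(1 - alpha / 2, df=n - 1)        # chi2_{alpha/2}(n-1)

reject = chi2_stat < lower or chi2_stat > upper
print(f"chi2 = {chi2_stat:.2f}, acceptance region = [{lower:.2f}, {upper:.2f}], reject H0: {reject}")
```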

Two Sample Comparison Tests
Tests comparing parameters between two independent populations

Two-Sample U-Test

Scenario:

Comparing means with known variances

Hypotheses:

H₀: μ_X = μ_Y vs H₁: μ_X ≠ μ_Y, μ_X > μ_Y, or μ_X < μ_Y

Assumptions:

  • X ~ N(μ_X, σ_X²), Y ~ N(μ_Y, σ_Y²)
  • σ_X², σ_Y² known
  • Independent samples

Test Statistic:

U = \frac{\bar{X} - \bar{Y}}{\sqrt{\sigma_X^2/m + \sigma_Y^2/n}} \sim N(0,1) \text{ under } H_0

Example Application:

Comparing treatment effects in clinical trial with known population variances
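A compact sketch of the two-sample U-test from summary statistics; every number below (means, known variances, sample sizes) is illustrative.

```python
# Sketch: two-sided two-sample U-test with known variances (illustrative numbers)
import numpy as np
from scipy.stats import norm

xbar, ybar = 12.4, 11.8          # sample means
var_x, var_y = 4.0, 3.0          # known population variances
m, n, alpha = 50, 60, 0.05

u = (xbar - ybar) / np.sqrt(var_x / m + var_y / n)
p_value = 2 * norm.sf(abs(u))
print(f"U = {u:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```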

Two-Sample T-Test

Scenario:

Comparing means with unknown but equal variances

Hypotheses:

H₀: μ_X = μ_Y vs H₁: μ_X ≠ μ_Y, μ_X > μ_Y, or μ_X < μ_Y

Assumptions:

  • X ~ N(μ_X, σ²), Y ~ N(μ_Y, σ²)
  • σ² unknown but equal
  • Independent samples

Test Statistic:

T = \frac{\bar{X} - \bar{Y}}{S_w\sqrt{1/m + 1/n}} \sim t(m+n-2) \text{ under } H_0

Pooled Variance:

S_w^2 = \frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}

Example Application:

Comparing test scores between two teaching methods
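A sketch of the pooled two-sample t-test with scipy.stats.ttest_ind; equal_var=True gives exactly the pooled-Sw version above. The two score arrays are hypothetical.

```python
# Sketch: pooled two-sample t-test (equal but unknown variances)
import numpy as np
from scipy.stats import ttest_ind

method_a = np.array([78, 82, 75, 80, 85, 79, 81, 77])   # hypothetical scores
method_b = np.array([72, 76, 74, 70, 78, 73, 75, 71])

t_stat, p_value = ttest_ind(method_a, method_b, equal_var=True)   # df = m + n - 2
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
# If equal variances are doubtful, equal_var=False switches to Welch's t-test instead.
```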

F-Test

Scenario:

Comparing population variances

Hypotheses:

H₀: σ_X² = σ_Y² vs H₁: σ_X² ≠ σ_Y², σ_X² > σ_Y², or σ_X² < σ_Y²

Assumptions:

  • X ~ N(μ_X, σ_X²), Y ~ N(μ_Y, σ_Y²)
  • Independent samples

Test Statistic:

F = \frac{S_X^2}{S_Y^2} \sim F(m-1, n-1) \text{ under } H_0

Rejection Regions:

  • Two-sided: F < F_{1-α/2}(m-1,n-1) or F > F_{α/2}(m-1,n-1)
  • Right-sided: F > F_{α}(m-1,n-1)

Example Application:

Testing if two processes have equal variability before pooling data
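SciPy does not ship this two-sample variance F-test as a single function, so here is a minimal sketch built from scipy.stats.f, following the rejection regions above; the two samples are hypothetical.

```python
# Sketch: two-sided F-test for H0: sigma_X^2 = sigma_Y^2
import numpy as np
from scipy.stats import f

x = np.array([10.1, 9.8, 10.4, 10.0, 9.7, 10.3, 10.2])   # hypothetical sample X
y = np.array([10.5, 9.2, 10.9, 9.5, 10.8, 9.0, 10.6])    # hypothetical sample Y
alpha = 0.05

m, n = len(x), len(y)
F = np.var(x, ddof=1) / np.var(y, ddof=1)                 # S_X^2 / S_Y^2
lower = f.ppf(alpha / 2, m - 1, n - 1)                    # F_{1-alpha/2}(m-1, n-1)
upper = f.ppf(1 - alpha / 2, m - 1, n - 1)                # F_{alpha/2}(m-1, n-1)

reject = F < lower or F > upper
print(f"F = {F:.3f}, acceptance region = [{lower:.3f}, {upper:.3f}], reject H0: {reject}")
```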

Generalized Likelihood Ratio Test

Generalized Likelihood Ratio Test (GLRT)
A general method for constructing hypothesis tests using likelihood functions

Motivation:

When optimal tests don't exist or are unknown, GLRT provides a systematic approach

Principle:

Compare maximum likelihood under full parameter space to maximum likelihood under null hypothesis constraint

GLRT Construction

Likelihood Ratio Definition:

\lambda(\tilde{x}) = \frac{\sup_{\theta \in \Theta} L(\theta; \tilde{x})}{\sup_{\theta \in \Theta_0} L(\theta; \tilde{x})} = \frac{L(\hat{\theta}; \tilde{x})}{L(\hat{\theta}_0; \tilde{x})}

Components:

  • L(θ;x̃): Likelihood function
  • θ̂: Unrestricted MLE (global maximum)
  • θ̂₀: Restricted MLE under H₀ (constrained maximum)
  • λ(x̃): Likelihood ratio statistic

Test Rule:

\text{Reject } H_0 \text{ if } \lambda(\tilde{X}) > c

Critical Value:

\text{Choose } c \text{ such that } \sup_{\theta \in \Theta_0} P_{\theta}(\lambda(\tilde{X}) > c) \leq \alpha
GLRT Examples

Normal Mean Test (σ² unknown)

Hypotheses: H₀: μ = μ₀ vs H₁: μ ≠ μ₀
Global MLE:
\hat{\mu} = \bar{X}, \quad \hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2
Restricted MLE:
\hat{\mu}_0 = \mu_0, \quad \hat{\sigma}_0^2 = \frac{1}{n}\sum(X_i - \mu_0)^2
Likelihood Ratio:
\lambda = \left(1 + \frac{t^2}{n-1}\right)^{n/2}
Test Statistic:
t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}
Equivalence: λ is monotone increasing in |t|, so the GLRT is equivalent to the t-test
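A quick numerical check (n = 10 chosen arbitrarily) that λ = (1 + t²/(n-1))^{n/2} increases with |t|, which is why rejecting for large λ and rejecting for large |t| are the same rule.

```python
# Sketch: lambda = (1 + t^2/(n-1))^(n/2) is increasing in |t|
import numpy as np

n = 10
t_values = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
lam = (1 + t_values**2 / (n - 1)) ** (n / 2)

for t_val, l in zip(t_values, lam):
    print(f"|t| = {t_val:.1f}  ->  lambda = {l:.3f}")
# lambda > c is therefore equivalent to |t| > c' for some c',
# which is the ordinary two-sided t-test.
```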

Normal Variance Test

Hypotheses: H₀: σ² = σ₀² vs H₁: σ² ≠ σ₀²
Global MLE:
\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2
Restricted MLE:
\hat{\sigma}_0^2 = \sigma_0^2
Likelihood Ratio:
\lambda = \left(\frac{\sigma_0^2}{\hat{\sigma}^2}\right)^{n/2} \exp\left(\frac{n}{2}\left(\frac{\hat{\sigma}^2}{\sigma_0^2} - 1\right)\right)
Equivalence: the GLRT is equivalent to the chi-square test
Large Sample Properties

Wilks' Theorem:

\text{Under regularity conditions: } 2\log\lambda(\tilde{X}) \xrightarrow{d} \chi^2(r) \text{ as } n \to \infty

where r = \dim(\Theta) - \dim(\Theta_0)

Applications:

  • Provides approximate critical values for large samples
  • Enables testing in complex models where exact distributions unknown
  • Forms basis for many modern statistical tests

Limitations:

  • Requires large sample sizes for accuracy
  • Regularity conditions may not hold
  • May not be optimal for specific alternatives
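A small Monte Carlo sketch of Wilks' theorem for the normal-mean GLRT (H₀: μ = μ₀ true, σ² unknown), where r = 1 and 2 log λ = n log(σ̂₀²/σ̂²). All simulation settings (n = 200, 5000 replications, the seed) are arbitrary choices for illustration.

```python
# Sketch: Monte Carlo check that 2*log(lambda) is approximately chi2(1) under H0
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
mu0, sigma, n, reps = 0.0, 1.0, 200, 5000

stats = np.empty(reps)
for i in range(reps):
    x = rng.normal(mu0, sigma, n)
    s2_hat = np.mean((x - x.mean()) ** 2)     # unrestricted MLE of sigma^2
    s2_0 = np.mean((x - mu0) ** 2)            # restricted MLE under H0: mu = mu0
    stats[i] = n * np.log(s2_0 / s2_hat)      # 2*log(lambda) for this model

cutoff = chi2.ppf(0.95, df=1)
print(f"P(2 log lambda > {cutoff:.2f}) = {np.mean(stats > cutoff):.3f}  (chi2(1) tail: 0.050)")
```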

Confidence Intervals & Hypothesis Testing

Confidence Interval - Hypothesis Test Duality
Fundamental two-way relationship between interval estimation and hypothesis testing

There's a one-to-one correspondence between confidence intervals and hypothesis tests at the same confidence/significance level

Test → Interval

From acceptance regions to confidence sets

Formula:
C(\tilde{x}) = \{\theta_0 : \tilde{x} \in A(\theta_0)\}

Explanation: The confidence set contains all parameter values that would not be rejected by the test

Example: If |t| ≤ t_{α/2}(n-1) defines the acceptance region, then the confidence interval is x̄ ± t_{α/2}(n-1) · s/√n

Interval → Test

From confidence sets to acceptance regions

Formula:
A(\theta_0) = \{\tilde{x} : \theta_0 \in C(\tilde{x})\}

Explanation: Accept H₀: θ = θ₀ if and only if θ₀ lies within the confidence interval

Example: Reject H₀: μ = μ₀ if μ₀ falls outside the 95% confidence interval for μ

Practical Implications:

  • Confidence intervals provide range of plausible parameter values
  • Hypothesis tests provide binary decisions about specific values
  • Intervals show effect size, tests show statistical significance
  • Same data, same conclusions when properly applied
  • Intervals more informative for practical decision making
Duality Examples

Normal Mean (σ unknown)

Confidence Interval:
\left[\bar{x} - t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2}(n-1)\frac{s}{\sqrt{n}}\right]
Hypothesis Test:
\text{Reject } H_0: \mu = \mu_0 \text{ if } \mu_0 \notin \text{CI}
Interpretation: 95% CI gives all μ₀ values that would not be rejected at α = 0.05
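A sketch of this duality in code: build the 95% t-interval and verify that a candidate μ₀ is rejected by the two-sided t-test exactly when it falls outside the interval. The data array and candidate values are hypothetical.

```python
# Sketch: confidence-interval / t-test duality for the normal mean (sigma unknown)
import numpy as np
from scipy.stats import t, ttest_1samp

x = np.array([14.9, 15.3, 15.1, 15.4, 14.8, 15.2, 15.0, 15.5])   # hypothetical data
alpha = 0.05
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

half = t.ppf(1 - alpha / 2, n - 1) * s / np.sqrt(n)
ci = (xbar - half, xbar + half)

for mu0 in [15.0, 15.6]:
    p = ttest_1samp(x, mu0).pvalue
    inside = ci[0] <= mu0 <= ci[1]
    print(f"mu0 = {mu0}: inside 95% CI ({ci[0]:.3f}, {ci[1]:.3f})? {inside}; "
          f"p = {p:.3f}; reject at alpha = 0.05? {p < alpha}")
# mu0 lies outside the interval exactly when the two-sided test rejects it.
```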

Uniform Distribution U(0,θ)

Confidence Interval:
[X_{(n)},\ X_{(n)}/\sqrt[n]{\alpha}]
Hypothesis Test:
\text{Reject } H_0: \theta = \theta_0 \text{ if } X_{(n)} > \theta_0 \text{ or } X_{(n)} < \theta_0\sqrt[n]{\alpha}
Interpretation: Exact correspondence between interval and test boundaries

Real-World Applications

Quality Control Testing
Monitor production processes to ensure specifications are met

Common Scenarios:

  • Testing if mean product dimension meets target specification
  • Monitoring process variability within acceptable limits
  • Detecting shifts in production quality over time

Typical Tests:

One-sample t-test
Chi-square test for variance
Control charts

Key Considerations:

  • Economic consequences
  • Cost of Type I vs Type II errors
  • Sample size planning
Medical Research
Evaluate treatment effectiveness and drug safety

Common Scenarios:

  • Testing if new treatment improves patient outcomes
  • Comparing side effect rates between treatments
  • Establishing bioequivalence between generic and brand drugs

Typical Tests:

Two-sample t-test
Chi-square independence test
Equivalence testing

Key Considerations:

  • Patient safety
  • Regulatory requirements
  • Ethical implications
A/B Testing
Compare different versions to optimize performance

Common Scenarios:

  • Testing if new website design increases conversion rate
  • Comparing marketing campaign effectiveness
  • Evaluating user interface changes

Typical Tests:

Two-proportion z-test
Two-sample t-test
Chi-square test

Key Considerations:

  • Business impact
  • Sample size constraints
  • Multiple testing corrections
Environmental Monitoring
Assess compliance with environmental standards

Common Scenarios:

  • Testing if pollutant levels exceed safety thresholds
  • Monitoring changes in ecosystem health indicators
  • Evaluating effectiveness of environmental interventions

Typical Tests:

One-sample tests
Trend analysis
Non-parametric tests

Key Considerations:

  • Regulatory compliance
  • Public health impact
  • Measurement uncertainty