Distribution-Free Methods

Nonparametric Hypothesis Testing

Master distribution-free statistical tests for robust hypothesis testing without strict distributional assumptions

Key Concepts & Definitions

Essential terminology and mathematical foundations for nonparametric testing

Nonparametric Test

Statistical hypothesis test that does not rely on specific distributional assumptions about the population, using only rank, sign, or frequency information from sample data.

Test statistic based on ranks, signs, or frequencies
Sign Test Statistic (N⁺)

Count of positive differences or values above the hypothesized median in sign test procedures.

N^+ = \sum_{i=1}^n I(X_i - t_0 > 0) \sim B(n, \theta)
Rank Sum (W)

Sum of ranks assigned to one sample group in Wilcoxon rank sum test for comparing two independent samples.

W = \sum_{i=1}^n R_i \text{ where } R_i \text{ is the rank of } Y_i
Empirical Distribution Function

Step function that estimates the cumulative distribution function from sample data, used in Kolmogorov-Smirnov tests.

F_n(x) = \frac{1}{n} \sum_{i=1}^n I(X_i \leq x)

Fundamental Concepts

Core principles and advantages of nonparametric hypothesis testing

Core Principles

Distribution-Free Nature

Tests do not require specific distributional assumptions about the population, making them robust and applicable to diverse data types and situations.

Rank-Based Analysis

Many nonparametric tests use rank information instead of raw values, reducing sensitivity to outliers and extreme observations.

Ordinal Data Compatibility

Suitable for ordinal, interval, and ratio scale data, providing flexibility for different measurement levels and research contexts.

Key Advantages

Robustness

Resistant to outliers, non-normal distributions, and violations of parametric test assumptions, providing reliable results across diverse scenarios.

Small Sample Efficiency

Effective with small sample sizes where parametric test assumptions may not hold, making them valuable for pilot studies and limited data.

Broad Applicability

Applicable to categorical, ordinal, and continuous data without requiring transformation or complex distributional modeling.

Nonparametric Test Methods

Comprehensive overview of major nonparametric hypothesis testing procedures

Sign Test
Tests median values and paired sample differences using only sign information

Application Scenarios

Single sample median testing
Paired sample comparisons
Ordinal data analysis

Key Formulas

Single sample: N^+ = \sum I(X_i > t_0) \sim B(n, \theta)
Paired sample: N^+ = \sum I(Z_i > 0) \sim B(n, 1/2) \text{ under } H_0
Rejection region (right-tail): N^+ \geq C^*
Rejection region (two-tail): N^+ \leq C_1^* \text{ or } N^+ \geq C_2^*

Methodology Steps

1. Convert sample values to binary indicators (positive/negative)
2. Count the number of positive differences (N⁺)
3. Use the binomial distribution for significance testing
4. Apply appropriate critical values based on the alternative hypothesis

Advantages

Distribution-free
Robust to outliers
Simple computation

Limitations

Ignores magnitude information
Lower power than parametric tests
Wilcoxon Rank Sum Test
Compares two independent samples using rank information (Mann-Whitney U equivalent)

Application Scenarios

Two independent sample comparison
Location shift detection
Ordinal data comparison

Key Formulas

Rank sum: W = \sum_{i=1}^n R_i \text{ (ranks of the second sample)}
Expected value: E[W] = \frac{n(m+n+1)}{2}
Variance: \text{Var}(W) = \frac{mn(m+n+1)}{12}
Large sample: W^* = \frac{W - E[W]}{\sqrt{\text{Var}(W)}} \sim N(0,1)

Methodology Steps

1. Combine and rank all observations from both samples
2. Calculate the sum of ranks for one sample group
3. Compare the rank sum to its expected value under the null hypothesis
4. Use the normal approximation for large samples

Advantages

Uses magnitude information
More powerful than sign test
Handles tied observations

Limitations

Requires ordinal or continuous data
Assumes same distribution shape
Wilcoxon Signed Rank Test
Analyzes paired samples using both sign and magnitude information through ranks

Application Scenarios

Paired sample analysis
Before-after comparisons
Symmetric distribution testing

Key Formulas

Signed rank statistic: W^+ = \sum_{i=1}^n R_i I(Z_i > 0)
Expected value: E[W^+] = \frac{n(n+1)}{4}
Variance: \text{Var}(W^+) = \frac{n(n+1)(2n+1)}{24}
Large sample: W^{*+} = \frac{W^+ - E[W^+]}{\sqrt{\text{Var}(W^+)}} \sim N(0,1)

Methodology Steps

1. Calculate differences between paired observations
2. Rank the absolute values of the non-zero differences
3. Sum the ranks corresponding to positive differences
4. Compare to the expected value under the null hypothesis

Advantages

Combines sign and magnitude
More powerful than sign test
Robust to outliers

Limitations

Assumes symmetric distribution
Cannot handle extreme ties
Chi-Square Goodness-of-Fit Test
Tests whether sample data follows a specified theoretical distribution

Application Scenarios

Distribution fitting
Model validation
Categorical data analysis

Key Formulas

Test statistic: \chi^2 = \sum_{i=1}^r \frac{(O_i - E_i)^2}{E_i}
Degrees of freedom: df = r - m - 1
Expected frequency: E_i = n \cdot p_i(\hat{\theta})
Rejection region: \chi^2 > \chi^2_{\alpha}(df)

Methodology Steps

1. Group data into categories or intervals
2. Calculate observed frequencies for each category
3. Compute expected frequencies under the null hypothesis
4. Calculate the chi-square test statistic

Advantages

Flexible for any distribution
Handles categorical data
Well-established theory

Limitations

Requires large samples
Loses information through grouping
Sensitive to category choice
Kolmogorov-Smirnov Test
Compares empirical and theoretical distributions using maximum absolute difference

Application Scenarios

Distribution fitting
Two-sample comparison
Continuous data analysis

Key Formulas

Test statistic: D_n = \sup_x |F_n(x) - F_0(x)|
Two-sample: D_{m,n} = \sup_x |F_{1,m}(x) - F_{2,n}(x)|
Large sample: \sqrt{n} D_n \xrightarrow{d} K \text{ (Kolmogorov distribution)}
Critical value: reject H_0 if D_n > D_{n,\alpha}

Methodology Steps

1. Construct the empirical distribution function from the sample
2. Calculate the maximum absolute difference from the theoretical CDF
3. Compare the test statistic to critical values
4. Use the asymptotic distribution for large samples

Advantages

No grouping required
Uses all sample information
Distribution-free

Limitations

Only for continuous data
Conservative test
Sensitive to ties
Chi-Square Independence Test
Tests independence between two categorical variables using contingency tables

Application Scenarios

Variable independence
Association analysis
Categorical data relationships

Key Formulas

Test statistic: \chi^2 = \sum_{i=1}^r \sum_{j=1}^s \frac{(O_{ij} - E_{ij})^2}{E_{ij}}
Expected frequency: E_{ij} = \frac{n_{i\cdot} n_{\cdot j}}{n}
Degrees of freedom: df = (r-1)(s-1)
Rejection region: \chi^2 > \chi^2_{\alpha}((r-1)(s-1))

Methodology Steps

1. Organize data into an r×s contingency table
2. Calculate expected frequencies under independence
3. Compute the chi-square test statistic
4. Compare to the critical value with the appropriate degrees of freedom

Advantages

Tests general association
Handles multiple categories
Intuitive interpretation

Limitations

Requires adequate sample sizes
Only detects association, not causation
Sensitive to small expected frequencies
Run Test
Tests randomness of sequences by counting runs of consecutive identical symbols

Application Scenarios

Randomness testing
Pattern detection
Time series analysis

Key Formulas

Run count: R = \text{number of runs in the sequence}
Expected runs: E[R] = \frac{2n_1n_2}{n_1+n_2} + 1
Variance: \text{Var}(R) = \frac{2n_1n_2(2n_1n_2-n_1-n_2)}{(n_1+n_2)^2(n_1+n_2-1)}
Large sample: \frac{R - E[R]}{\sqrt{\text{Var}(R)}} \sim N(0,1)

Methodology Steps

1. Convert the sequence to binary (0-1) format
2. Count the total number of runs (maximal blocks of identical symbols)
3. Compare the observed run count to its expectation under randomness
4. Use the normal approximation for large samples

Advantages

Simple randomness test
No distributional assumptions
Detects systematic patterns

Limitations

Only detects certain patterns
Less powerful than other tests
Binary conversion loses information

Worked Examples

Step-by-step solutions to nonparametric testing problems

Sign Test for Population Median

Problem:

A manufacturer claims that the median lifetime of their light bulbs is 1000 hours. A consumer group tests 12 bulbs and obtains the following lifetimes (in hours): 985, 1010, 992, 1008, 1015, 988, 1020, 995, 1012, 990, 1005, 1002. Test the claim at α = 0.05 using the Sign Test.

Solution:

  1. State hypotheses

     H_0: M = 1000 \text{ (population median is 1000 hours)}
     H_1: M \neq 1000 \text{ (two-tailed test)}

  2. Compute signs

     Compare each observation to the hypothesized median M₀ = 1000:

     \text{Signs: } -, +, -, +, +, -, +, -, +, -, +, + \text{ (ties would be excluded)}

  3. Count positive signs

     Count the observations above and below the median:

     N^+ = 7, \quad N^- = 5, \quad n = 12

  4. Find the test statistic

     Under H₀, N⁺ ~ Binomial(12, 0.5). Use the smaller count:

     \text{Test statistic} = \min(N^+, N^-) = \min(7, 5) = 5

  5. Determine the critical value

     For a two-tailed test at α = 0.05 with n = 12, the binomial table gives:

     P(N^+ \leq 2 \text{ or } N^+ \geq 10) \approx 0.039 < 0.05

  6. Conclusion

     Since 5 > 2 (the critical value), we fail to reject H₀: there is not enough evidence to reject the claim that the median lifetime is 1000 hours.

Key Insight:

The Sign Test only uses directional information (+/-) and ignores magnitude. With 7 positive and 5 negative differences, the result is not extreme enough to reject H₀ at the 5% significance level.
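
For readers who want to reproduce these numbers, here is a minimal Python sketch of the same sign test, using only the standard library and the data above (the p-value route rather than the critical-value table, which gives the same conclusion):

```python
# Sign test for H0: M = 1000 vs H1: M != 1000 (two-tailed), stdlib only.
from math import comb

data = [985, 1010, 992, 1008, 1015, 988, 1020, 995, 1012, 990, 1005, 1002]
m0 = 1000

n_pos = sum(x > m0 for x in data)   # N+ = 7
n = sum(x != m0 for x in data)      # ties excluded; n = 12

# Two-tailed p-value: 2 * P(N+ >= max(N+, N-)) under Binomial(n, 0.5)
k = max(n_pos, n - n_pos)
p_value = 2 * sum(comb(n, j) for j in range(k, n + 1)) / 2**n
print(n_pos, round(p_value, 3))     # 7, 0.774 -> fail to reject H0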

Sign Test for Paired Samples

Problem:

A study compares blood pressure before and after a new medication for 10 patients. The differences (After - Before) are: -8, -5, +2, -12, -3, +1, -6, -9, -4, -7. Test whether the medication reduces blood pressure at α = 0.05.

Solution:

  1. State hypotheses

     Let M_D be the median of the differences (After − Before):

     H_0: M_D = 0, \quad H_1: M_D < 0 \text{ (one-tailed, reduction expected)}

  2. Count signs

     Count the positive and negative differences:

     N^+ = 2 \text{ (positive)}, \quad N^- = 8 \text{ (negative)}, \quad n = 10

  3. Identify the test statistic

     For a one-tailed test of reduction, use N⁺ as the test statistic:

     \text{Test statistic} = N^+ = 2

  4. Find the p-value

     Under H₀, N⁺ ~ Binomial(10, 0.5). Calculate:

     P(N^+ \leq 2) = \sum_{k=0}^{2} \binom{10}{k}(0.5)^{10} = \frac{1 + 10 + 45}{1024} \approx 0.055

  5. Make a decision

     Compare the p-value to α:

     p = 0.055 > \alpha = 0.05 \Rightarrow \text{fail to reject } H_0

  6. Conclusion

     At α = 0.05 there is insufficient evidence to conclude that the medication reduces blood pressure (at α = 0.10 we would reject H₀).

Key Insight:

The Sign Test for paired data is equivalent to testing whether the median of differences equals zero. With 8 negative out of 10 differences, the p-value (0.055) is borderline but does not quite reach the 0.05 threshold.
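
A quick Python check of the binomial p-value computed above (standard library only; the data are the ten differences from the problem):

```python
# One-sided sign test for paired differences (After - Before).
from math import comb

diffs = [-8, -5, 2, -12, -3, 1, -6, -9, -4, -7]
n_pos = sum(d > 0 for d in diffs)   # N+ = 2
n = sum(d != 0 for d in diffs)      # n = 10

# H1: median difference < 0, so a small N+ is evidence against H0
p_value = sum(comb(n, j) for j in range(n_pos + 1)) / 2**n
print(round(p_value, 4))            # (1 + 10 + 45)/1024 = 0.0547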

Wilcoxon Rank Sum Test (Mann-Whitney U)

Problem:

Two teaching methods are compared by test scores. Method A: 78, 82, 85, 88, 90 (n₁=5). Method B: 72, 76, 80, 84, 86, 92, 94 (n₂=7). Test whether there is a significant difference at α = 0.05.

Solution:

  1. State hypotheses

     Let F₁ and F₂ be the score distributions under Methods A and B:

     H_0: F_1(x) = F_2(x), \quad H_1: F_1(x) \neq F_2(x)

  2. Combine and rank all data

     Order all 12 observations and assign ranks:

     \begin{array}{c|cccccccccccc} \text{Value} & 72 & 76 & 78 & 80 & 82 & 84 & 85 & 86 & 88 & 90 & 92 & 94 \\ \text{Rank} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \\ \text{Group} & B & B & A & B & A & B & A & B & A & A & B & B \end{array}

  3. Calculate the rank sum for the smaller sample

     Sum of ranks for Method A (n₁ = 5):

     W_A = 3 + 5 + 7 + 9 + 10 = 34

  4. Calculate the expected value

     Under H₀:

     E[W_A] = \frac{n_1(n_1+n_2+1)}{2} = \frac{5(13)}{2} = 32.5

  5. Calculate the variance

     \text{Var}(W_A) = \frac{n_1 n_2 (n_1+n_2+1)}{12} = \frac{5 \times 7 \times 13}{12} \approx 37.917

  6. Compute the Z-statistic

     Standardize the rank sum:

     Z = \frac{W_A - E[W_A]}{\sqrt{\text{Var}(W_A)}} = \frac{34 - 32.5}{\sqrt{37.917}} = \frac{1.5}{6.158} \approx 0.244

  7. Make a decision

     For a two-tailed test at α = 0.05, the critical value is z₀.₀₂₅ = 1.96:

     |Z| = 0.244 < 1.96 \Rightarrow \text{fail to reject } H_0

Key Insight:

The Wilcoxon Rank Sum Test compares the sum of ranks between groups. If one group consistently has higher values, its rank sum will be higher than expected under H₀. Here, the rank sums are nearly equal to expected values, indicating no significant difference.
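
A minimal Python sketch of this rank-sum computation with the normal approximation; it assumes no tied scores, which holds for this data:

```python
# Wilcoxon rank sum (Mann-Whitney equivalent) for the two teaching methods.
from math import sqrt

method_a = [78, 82, 85, 88, 90]
method_b = [72, 76, 80, 84, 86, 92, 94]

# Pool the scores, sort, and assign ranks 1..12 (no ties in this data)
pooled = sorted([(x, "A") for x in method_a] + [(x, "B") for x in method_b])
w_a = sum(rank for rank, (x, grp) in enumerate(pooled, start=1) if grp == "A")

n1, n2 = len(method_a), len(method_b)
e_w = n1 * (n1 + n2 + 1) / 2              # 32.5
var_w = n1 * n2 * (n1 + n2 + 1) / 12      # ~37.917
z = (w_a - e_w) / sqrt(var_w)
print(w_a, round(z, 3))                   # 34, 0.244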

Wilcoxon Signed Rank Test

Problem:

A diet program is tested on 8 participants. Weight changes (Before - After, in kg) are: 2.1, -0.5, 3.2, 1.8, 4.5, 0.9, 2.7, 1.5. Test if the diet causes significant weight loss at α = 0.05.

Solution:

  1. State hypotheses

     Let M_D be the median of the weight differences:

     H_0: M_D = 0, \quad H_1: M_D > 0 \text{ (one-tailed, weight loss expected)}

  2. Rank absolute differences

     Order by |difference| and assign ranks:

     \begin{array}{c|cccccccc} |D| & 0.5 & 0.9 & 1.5 & 1.8 & 2.1 & 2.7 & 3.2 & 4.5 \\ \text{Rank} & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ \text{Sign} & - & + & + & + & + & + & + & + \end{array}

  3. Calculate the signed rank statistics

     Sum the ranks by sign:

     W^+ = 2+3+4+5+6+7+8 = 35, \quad W^- = 1

  4. Calculate the expected value and variance

     Under H₀:

     E[W^+] = \frac{n(n+1)}{4} = \frac{8(9)}{4} = 18, \quad \text{Var}(W^+) = \frac{n(n+1)(2n+1)}{24} = \frac{8(9)(17)}{24} = 51

  5. Compute the Z-statistic

     Standardize:

     Z = \frac{W^+ - E[W^+]}{\sqrt{\text{Var}(W^+)}} = \frac{35 - 18}{\sqrt{51}} = \frac{17}{7.14} \approx 2.38

  6. Make a decision

     For a one-tailed test at α = 0.05, the critical value is z₀.₀₅ = 1.645:

     Z = 2.38 > 1.645 \Rightarrow \text{reject } H_0

     (With n = 8 an exact table would normally be used; the exact calculation, P(W⁺ ≥ 35) = 2/256 ≈ 0.008, gives the same conclusion.)

  7. Conclusion

     There is significant evidence that the diet program causes weight loss:

     p\text{-value} = P(Z > 2.38) \approx 0.0087 < 0.05

Key Insight:

The Wilcoxon Signed Rank Test uses both sign and magnitude information through ranks. With 7 positive and only 1 negative difference, and the positive differences being larger (higher ranks), the evidence strongly supports weight loss.
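
A minimal Python sketch reproducing W⁺ and the Z-statistic; it assumes no zero differences and no ties among the absolute differences, both true for this data:

```python
# Wilcoxon signed rank test for the diet data (Before - After).
from math import sqrt

diffs = [2.1, -0.5, 3.2, 1.8, 4.5, 0.9, 2.7, 1.5]
nonzero = [d for d in diffs if d != 0]
n = len(nonzero)                                      # 8

# Rank |d| ascending (no ties here), then sum ranks of positive differences
w_plus = sum(rank for rank, d in enumerate(sorted(nonzero, key=abs), start=1)
             if d > 0)                                # 35

e_w = n * (n + 1) / 4                                 # 18
var_w = n * (n + 1) * (2 * n + 1) / 24                # 51
z = (w_plus - e_w) / sqrt(var_w)
print(w_plus, round(z, 2))                            # 35, 2.38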

Chi-Square Goodness-of-Fit Test

Problem:

A die is rolled 120 times with results: 1→18, 2→22, 3→17, 4→25, 5→16, 6→22. Test whether the die is fair at α = 0.05.

Solution:

  1. State hypotheses

     For a fair die, each face has probability 1/6:

     H_0: p_1 = p_2 = \cdots = p_6 = \frac{1}{6}

  2. Calculate expected frequencies

     Under H₀, each face should appear:

     E_i = np_i = 120 \times \frac{1}{6} = 20 \text{ times}

  3. Set up the chi-square statistic

     \chi^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i}

  4. Substitute values

     Calculate each term:

     \chi^2 = \frac{(18-20)^2}{20} + \frac{(22-20)^2}{20} + \frac{(17-20)^2}{20} + \frac{(25-20)^2}{20} + \frac{(16-20)^2}{20} + \frac{(22-20)^2}{20}

  5. Compute the result

     Sum all terms:

     \chi^2 = \frac{4 + 4 + 9 + 25 + 16 + 4}{20} = \frac{62}{20} = 3.1

  6. Determine the critical value

     Degrees of freedom = k − 1 = 6 − 1 = 5:

     \chi^2_{0.05,5} = 11.07

  7. Conclusion

     Compare the test statistic to the critical value:

     \chi^2 = 3.1 < 11.07 \Rightarrow \text{fail to reject } H_0

Key Insight:

The chi-square goodness-of-fit test compares observed frequencies to expected frequencies. The deviation (χ² = 3.1) is well below the critical value (11.07), indicating the observed distribution is consistent with a fair die.
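
A quick Python check of the chi-square statistic for the die data (the critical value 11.07 still comes from a chi-square table with df = 5):

```python
# Chi-square goodness-of-fit for a fair die, stdlib only.
observed = [18, 22, 17, 25, 16, 22]
n = sum(observed)                     # 120
expected = n / len(observed)          # 20 per face under H0

chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(chi2)                           # 3.1 < 11.07 -> fail to reject H0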

Kolmogorov-Smirnov Test

Problem:

Test whether the sample {0.12, 0.35, 0.47, 0.62, 0.81, 0.93} comes from a Uniform(0,1) distribution at α = 0.05.

Solution:

  1. State hypotheses

     Test goodness of fit to Uniform(0,1):

     H_0: F(x) = x \text{ for } 0 \leq x \leq 1

  2. Construct the empirical CDF

     The empirical CDF is a step function:

     F_n(x) = \frac{\text{number of } X_i \leq x}{n}

  3. Define D⁺ and D⁻

     For each order statistic x₍ᵢ₎:

     D^+ = \max_i \left\{\frac{i}{n} - F_0(x_{(i)})\right\}, \quad D^- = \max_i \left\{F_0(x_{(i)}) - \frac{i-1}{n}\right\}

  4. Create the computation table

     With n = 6 and F₀(x) = x:

     \begin{array}{c|c|c|c|c|c} i & x_{(i)} & i/n & F_0(x_{(i)}) & i/n - x_{(i)} & x_{(i)} - (i-1)/n \\ 1 & 0.12 & 0.167 & 0.12 & 0.047 & 0.120 \\ 2 & 0.35 & 0.333 & 0.35 & -0.017 & 0.183 \\ 3 & 0.47 & 0.500 & 0.47 & 0.030 & 0.137 \\ 4 & 0.62 & 0.667 & 0.62 & 0.047 & 0.120 \\ 5 & 0.81 & 0.833 & 0.81 & 0.023 & 0.143 \\ 6 & 0.93 & 1.000 & 0.93 & 0.070 & 0.097 \end{array}

  5. Find the maximum deviations

     From the last two columns (note that D⁻ is attained at i = 2, not i = 1):

     D^+ = 0.070, \quad D^- = 0.35 - \tfrac{1}{6} \approx 0.183, \quad D_n = \max(D^+, D^-) \approx 0.183

  6. Compare to the critical value

     For n = 6 and α = 0.05, the K-S table gives:

     D_{6,0.05} = 0.521

  7. Conclusion

     Compare the test statistic to the critical value:

     D_n = 0.183 < 0.521 \Rightarrow \text{fail to reject } H_0

Key Insight:

The K-S test measures the maximum vertical distance between the empirical and theoretical CDFs. With Dₙ ≈ 0.183 far below the critical value 0.521, there is no evidence against the Uniform(0,1) hypothesis.
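
The following Python sketch recomputes D⁺, D⁻, and Dₙ for this sample and reproduces the corrected value 0.183 (scipy.stats.kstest would return the same statistic, but this stays with the standard library):

```python
# One-sample K-S statistic against Uniform(0,1), whose CDF is F0(x) = x.
sample = sorted([0.12, 0.35, 0.47, 0.62, 0.81, 0.93])
n = len(sample)

d_plus = max(i / n - x for i, x in enumerate(sample, start=1))          # 0.070
d_minus = max(x - (i - 1) / n for i, x in enumerate(sample, start=1))   # 0.183
d_n = max(d_plus, d_minus)
print(round(d_plus, 3), round(d_minus, 3), round(d_n, 3))  # 0.07 0.183 0.183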

Chi-Square Test for Independence

Problem:

A survey of 200 people examines the relationship between education level and voting preference. Test independence at α = 0.05. Data: High School (A:30, B:40), Bachelor's (A:50, B:35), Graduate (A:20, B:25).

Solution:

  1. State hypotheses

     Test the independence of education level and voting preference:

     H_0: \text{Education and Voting are independent}

  2. Create the contingency table

     Organize the observed frequencies:

     \begin{array}{c|cc|c} & A & B & \text{Total} \\ \text{High School} & 30 & 40 & 70 \\ \text{Bachelor's} & 50 & 35 & 85 \\ \text{Graduate} & 20 & 25 & 45 \\ \text{Total} & 100 & 100 & 200 \end{array}

  3. Calculate expected frequencies

     Under independence, E_{ij} = (\text{Row}_i \times \text{Col}_j) / \text{Total}:

     E_{11} = \frac{70 \times 100}{200} = 35, \quad E_{12} = 35, \quad E_{21} = 42.5, \quad E_{22} = 42.5, \quad E_{31} = 22.5, \quad E_{32} = 22.5

  4. Calculate the chi-square statistic

     Compute each term:

     \chi^2 = \frac{(30-35)^2}{35} + \frac{(40-35)^2}{35} + \frac{(50-42.5)^2}{42.5} + \frac{(35-42.5)^2}{42.5} + \frac{(20-22.5)^2}{22.5} + \frac{(25-22.5)^2}{22.5}

  5. Compute the result

     Sum all terms:

     \chi^2 = 0.714 + 0.714 + 1.324 + 1.324 + 0.278 + 0.278 = 4.632

  6. Determine the degrees of freedom

     For an r×c table:

     df = (r-1)(c-1) = (3-1)(2-1) = 2

  7. Find the critical value and conclude

     From the chi-square table:

     \chi^2_{0.05,2} = 5.99, \quad \chi^2 = 4.632 < 5.99 \Rightarrow \text{fail to reject } H_0

Key Insight:

The chi-square test for independence compares observed cell frequencies to those expected under independence. With χ² = 4.632 < 5.99, we cannot conclude that education level and voting preference are related.
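
A minimal Python sketch of the contingency-table computation above, standard library only:

```python
# Chi-square test of independence on the 3x2 education/voting table.
table = [[30, 40], [50, 35], [20, 25]]   # rows: HS, Bachelor's, Graduate

row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
n = sum(row_tot)

chi2 = sum((table[i][j] - row_tot[i] * col_tot[j] / n) ** 2
           / (row_tot[i] * col_tot[j] / n)
           for i in range(len(table)) for j in range(len(table[0])))
df = (len(table) - 1) * (len(table[0]) - 1)
print(round(chi2, 3), df)                # 4.632, df = 2 -> compare to 5.99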

Run Test for Randomness

Problem:

A sequence of stock price movements shows: + + + - - + + - - - + + + + - - + - +. Test whether the sequence is random at α = 0.05.

Solution:

  1. State hypotheses

     Test the randomness of the sequence:

     H_0: \text{Sequence is random}, \quad H_1: \text{Sequence is not random}

  2. Count runs and symbols

     A run is a maximal block of identical symbols:

     \text{Sequence: } \underbrace{+++}_{1} \underbrace{--}_{2} \underbrace{++}_{3} \underbrace{---}_{4} \underbrace{++++}_{5} \underbrace{--}_{6} \underbrace{+}_{7} \underbrace{-}_{8} \underbrace{+}_{9}

  3. Determine the counts

     Count the total runs and each symbol:

     R = 9, \quad n_1 = 11 \text{ (plus)}, \quad n_2 = 8 \text{ (minus)}, \quad n = 19

  4. Calculate the expected number of runs

     Under randomness:

     E[R] = \frac{2n_1n_2}{n_1+n_2} + 1 = \frac{2(11)(8)}{19} + 1 = 9.26 + 1 = 10.26

  5. Calculate the variance

     Variance of R under H₀:

     \text{Var}(R) = \frac{2n_1n_2(2n_1n_2-n_1-n_2)}{(n_1+n_2)^2(n_1+n_2-1)} = \frac{2(11)(8)(176-19)}{361 \times 18} = \frac{27632}{6498} \approx 4.25

  6. Compute the Z-statistic

     Standardize:

     Z = \frac{R - E[R]}{\sqrt{\text{Var}(R)}} = \frac{9 - 10.26}{\sqrt{4.25}} = \frac{-1.26}{2.06} \approx -0.61

  7. Make a decision

     For a two-tailed test at α = 0.05:

     |Z| = 0.61 < 1.96 \Rightarrow \text{fail to reject } H_0

Key Insight:

The Run Test detects departures from randomness. Too few runs suggest clustering (positive autocorrelation), too many suggest alternation (negative autocorrelation). With R = 9 close to the expected 10.26, the sequence appears random.
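
A minimal Python sketch of the run count and the normal approximation, matching the corrected variance in step 5:

```python
# Run test for randomness on the stock-movement sequence.
from math import sqrt

seq = "+++--++---++++--+-+"              # the 19 symbols from the example
runs = 1 + sum(seq[i] != seq[i - 1] for i in range(1, len(seq)))  # R = 9
n1, n2 = seq.count("+"), seq.count("-")  # 11, 8

e_r = 2 * n1 * n2 / (n1 + n2) + 1
var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
         / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
z = (runs - e_r) / sqrt(var_r)
print(runs, round(e_r, 2), round(var_r, 2), round(z, 2))  # 9 10.26 4.25 -0.61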

Core Theorem Proofs

Mathematical derivations of key nonparametric statistics

Asymptotic Distribution of Sign Test Statistic
Basis for Large Sample Sign Test

For large n, the sign test statistic N⁺ converges in distribution to a Normal distribution.

Theorem Statement

\frac{N^+ - n/2}{\sqrt{n}/2} \xrightarrow{d} N(0,1)

This allows us to use the standard normal table for hypothesis testing when n > 20.

Proof Steps

1
Define Indicator Variables

Let Iᵢ = 1 if Xᵢ > M₀ and 0 otherwise. Under H₀: M = M₀, P(Xᵢ > M₀) = 0.5.

I_i \sim \text{Bernoulli}(0.5)
2
Sum of Indicators

The test statistic N⁺ is the sum of these i.i.d. Bernoulli trials.

N^+ = \sum_{i=1}^n I_i \sim \text{Binomial}(n, 0.5)
3
Calculate Moments

The mean and variance of N⁺ under H₀ are:

E[N^+] = np = 0.5n, \quad \text{Var}(N^+) = np(1-p) = 0.25n
4
Apply Central Limit Theorem

Since Iᵢ are i.i.d. with finite variance, the standardized sum converges to standard normal.

Z = \frac{N^+ - E[N^+]}{\sqrt{\text{Var}(N^+)}} = \frac{N^+ - 0.5n}{0.5\sqrt{n}} \xrightarrow{d} N(0,1)
5
Continuity Correction

For better approximation, we often apply a continuity correction of 0.5.

Z_{\text{corrected}} = \frac{|N^+ - 0.5n| - 0.5}{0.5\sqrt{n}}
6
Conclusion

Thus, for large n, we can reject H₀ if |Z| > z_{α/2}.

P(|Z| > z_{\alpha/2}) \approx \alpha

Example Application

For n = 100 and N⁺ = 60: Z = (60 − 50)/5 = 2.0. Since 2.0 > 1.96, reject H₀ at α = 0.05.
Moments of Wilcoxon Rank-Sum Statistic
Foundation for Normal Approximation

Derivation of the mean and variance of the Rank-Sum statistic W under the null hypothesis.

Theorem Statement

E[W] = \frac{n(m+n+1)}{2}, \quad \text{Var}(W) = \frac{mn(m+n+1)}{12}

These moments are crucial for constructing the Z-statistic for large samples.

Proof Steps

1
Define Rank Sum

Let R₁, …, R_{m+n} be the ranks of the combined sample. Under H₀, any subset of size n is equally likely to be the ranks of the second sample.

W = \sum_{j=1}^n R_j \text{ (sum of } n \text{ randomly chosen ranks)}
2
Expectation of a Single Rank

The average rank in a set of size N=m+n is (1+N)/2.

E[R_j] = \frac{1}{N} \sum_{k=1}^N k = \frac{N(N+1)}{2N} = \frac{N+1}{2}
3
Expectation of W

By linearity of expectation:

E[W] = \sum_{j=1}^n E[R_j] = n \cdot \frac{m+n+1}{2}
4
Variance of W

Since ranks are sampled without replacement, they are not independent. Var(W) = ΣVar(Rⱼ) + Σⱼ≠ₖCov(Rⱼ,Rₖ).

\text{Var}(R_j) = \frac{N^2-1}{12}, \quad \text{Cov}(R_j, R_k) = -\frac{N+1}{12}
5
Algebraic Simplification

Summing the variance and covariance terms (detailed algebra omitted for brevity) yields:

\text{Var}(W) = \frac{mn(m+n+1)}{12}
6
Normal Approximation

As m, n → ∞, the distribution of W approaches normality.

Z = \frac{W - E[W]}{\sqrt{\text{Var}(W)}} \sim N(0,1)

Example Application

For m = 10 and n = 10: E[W] = 10(21)/2 = 105 and Var(W) = 100(21)/12 = 175.
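
These moments can also be verified by brute-force enumeration for small samples, since under H₀ every subset of n ranks is equally likely. A minimal Python sketch with m = n = 3 (sizes chosen here purely for illustration):

```python
# Verify E[W] and Var(W) by enumerating all C(N, n) rank subsets.
from itertools import combinations

m, n = 3, 3
N = m + n
sums = [sum(c) for c in combinations(range(1, N + 1), n)]   # all 20 subsets

mean = sum(sums) / len(sums)                          # n(N+1)/2   = 10.5
var = sum((s - mean) ** 2 for s in sums) / len(sums)  # mn(N+1)/12 = 5.25
print(mean, var)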
Distribution of Wilcoxon Signed Rank Statistic
Foundation for Paired Sample Inference

Under H₀ (symmetric distribution around 0), the signed rank statistic W⁺ has a known distribution based on random sign assignments.

Theorem Statement

W^+ = \sum_{i=1}^n R_i \cdot I(Z_i > 0), \quad E[W^+] = \frac{n(n+1)}{4}, \quad \text{Var}(W^+) = \frac{n(n+1)(2n+1)}{24}

This enables both exact small-sample tests and large-sample normal approximations.

Proof Steps

1
Setup Under Null Hypothesis

Under H₀: the distribution is symmetric about 0. Thus, for each |Zᵢ|, P(Zᵢ > 0) = P(Zᵢ < 0) = 0.5.

\text{Signs } S_i = I(Z_i > 0) \text{ are i.i.d. Bernoulli}(0.5)
2
Express W⁺ as Random Sum

Let Rᵢ be the rank of |Zᵢ|. The signed rank statistic is:

W^+ = \sum_{i=1}^n R_i \cdot S_i = \sum_{i=1}^n R_i \cdot I(Z_i > 0)
3
Derive Expected Value

Since E[Sᵢ] = 0.5 and ranks are fixed (1, 2, ..., n):

E[W^+] = \sum_{i=1}^n R_i \cdot E[S_i] = 0.5 \sum_{i=1}^n i = 0.5 \cdot \frac{n(n+1)}{2} = \frac{n(n+1)}{4}
4
Derive Variance

Since signs are independent and Var(Sᵢ) = 0.25:

\text{Var}(W^+) = \sum_{i=1}^n R_i^2 \cdot \text{Var}(S_i) = 0.25 \sum_{i=1}^n i^2 = 0.25 \cdot \frac{n(n+1)(2n+1)}{6} = \frac{n(n+1)(2n+1)}{24}
5
Exact Distribution

For small n, the exact distribution can be enumerated. There are 2ⁿ equally likely sign patterns:

P(W^+ = w) = \frac{\text{number of sign patterns giving sum } w}{2^n}
6
Normal Approximation

For large n (typically n ≥ 20), the CLT applies:

Z = \frac{W^+ - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \xrightarrow{d} N(0,1)

Example Application

For n = 8: E[W⁺] = 8(9)/4 = 18, Var(W⁺) = 8(9)(17)/24 = 51, SD ≈ 7.14.
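
As a sanity check on the moment formulas, the exact distribution from step 5 can be enumerated by brute force for small n. A minimal Python sketch over the 2⁸ = 256 equally likely sign patterns for n = 8:

```python
# Enumerate all sign patterns to verify E[W+] and Var(W+) for n = 8.
from itertools import product

n = 8
values = [sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
          for signs in product([0, 1], repeat=n)]

mean = sum(values) / 2**n                           # 18.0 = n(n+1)/4
var = sum((v - mean) ** 2 for v in values) / 2**n   # 51.0 = n(n+1)(2n+1)/24
print(mean, var)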
Chi-Square Asymptotic Distribution
Basis for Goodness-of-Fit and Independence Tests

Pearson's chi-square statistic converges to a chi-square distribution under the null hypothesis.

Theorem Statement

\chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} \xrightarrow{d} \chi^2_{k-m-1}

This provides the theoretical foundation for chi-square tests in categorical data analysis.

Proof Steps

1
Multinomial Setup

Let (O₁, ..., Oₖ) follow Multinomial(n; p₁, ..., pₖ) where Eᵢ = npᵢ.

(O_1, \ldots, O_k) \sim \text{Multinomial}(n; p_1, \ldots, p_k)
2
Asymptotic Normality

By the multivariate CLT, the standardized frequencies are asymptotically normal:

\frac{O_i - np_i}{\sqrt{np_i}} \xrightarrow{d} N(0, 1 - p_i)
3
Pearson's Statistic as Quadratic Form

The chi-square statistic can be written as:

\chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} = n \sum_{i=1}^k \frac{(\hat{p}_i - p_i)^2}{p_i}
4
Covariance Structure

The multinomial covariance creates dependence. The quadratic form becomes:

\chi^2 = \mathbf{Z}^T \Sigma^{-1} \mathbf{Z} \text{ where } \mathbf{Z} \sim N(\mathbf{0}, \Sigma)
5
Degrees of Freedom Reduction

The constraint Σpᵢ = 1 reduces effective dimensions by 1. If m parameters are estimated, subtract m more:

df = k - 1 - m = k - m - 1
6
Limiting Distribution

By properties of quadratic forms of normal vectors:

\chi^2 \xrightarrow{d} \chi^2_{k-m-1} \text{ as } n \to \infty

Example Application

For k = 6 categories with no estimated parameters: df = 6 − 1 = 5, and the critical value at α = 0.05 is \chi^2_{0.05,5} = 11.07.
Kolmogorov-Smirnov Limiting Distribution
Exact Distribution-Free Testing

The scaled K-S statistic √n·Dₙ converges to the Kolmogorov distribution.

Theorem Statement

\sqrt{n} D_n = \sqrt{n} \sup_x |F_n(x) - F_0(x)| \xrightarrow{d} K

The Kolmogorov distribution K has CDF: P(K ≤ x) = 1 - 2Σ(-1)^(j-1)e^(-2j²x²).

Proof Steps

1
Empirical Process Definition

Define the empirical process:

\alpha_n(x) = \sqrt{n}\,[F_n(x) - F_0(x)]
2
Glivenko-Cantelli Theorem

First, we need the uniform law of large numbers:

\sup_x |F_n(x) - F(x)| \xrightarrow{a.s.} 0 \text{ as } n \to \infty
3
Donsker's Theorem

The empirical process converges to a Brownian bridge B(t) on [0,1]:

\alpha_n(F^{-1}(t)) \xrightarrow{d} B(t) \text{ in } D[0,1]
4
Continuous Mapping

Apply the continuous mapping theorem to the supremum functional:

\sqrt{n} D_n = \sup_t |\alpha_n(F^{-1}(t))| \xrightarrow{d} \sup_{0 \leq t \leq 1} |B(t)|
5
Brownian Bridge Supremum

The distribution of sup|B(t)| is the Kolmogorov distribution:

P\left(\sup_{0 \leq t \leq 1} |B(t)| \leq x\right) = 1 - 2\sum_{j=1}^{\infty} (-1)^{j-1} e^{-2j^2 x^2}
6
Critical Values

The Kolmogorov distribution provides asymptotic critical values:

K_{0.05} \approx 1.36, \quad K_{0.01} \approx 1.63 \quad \text{(critical values for } \sqrt{n}D_n\text{)}

Example Application

For n = 100 and Dₙ = 0.12: √n·Dₙ = 10 × 0.12 = 1.2 < 1.36, so fail to reject H₀ at α = 0.05.
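
A minimal Python sketch that evaluates the Kolmogorov series numerically, confirming K₀.₀₅ ≈ 1.36 and giving an approximate p-value for this example:

```python
# Evaluate P(K <= x) = 1 - 2 * sum_{j>=1} (-1)^(j-1) * exp(-2 j^2 x^2).
from math import exp

def kolmogorov_cdf(x, terms=100):
    return 1 - 2 * sum((-1) ** (j - 1) * exp(-2 * j**2 * x**2)
                       for j in range(1, terms + 1))

print(round(kolmogorov_cdf(1.36), 4))  # ~0.9505, so K_0.05 ~ 1.36
print(round(kolmogorov_cdf(1.2), 4))   # ~0.8878 -> p-value ~0.11, consistent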

Practice Quiz

Test your understanding with 10 multiple-choice questions

1. What is the primary advantage of nonparametric tests over parametric tests?
2. In the Sign Test for a median, if we have 15 observations with 11 positive differences, 3 negative differences, and 1 zero (which we discard), what is the test statistic N⁺?
3. For the Wilcoxon Rank Sum Test comparing two groups with m = 6 and n = 8, what is the expected value of the rank sum W under the null hypothesis?
4. When should you use the Wilcoxon Signed Rank Test instead of the Sign Test?
5. In a Chi-Square Goodness-of-Fit Test with k = 5 categories and no estimated parameters, the degrees of freedom is:
6. The Kolmogorov-Smirnov test statistic Dₙ measures:
7. For a Chi-Square Independence Test with a 3×4 contingency table, the degrees of freedom is:
8. In the Run Test, if we observe too few runs compared to the expected value, this suggests:
9. Which assumption is required for the Wilcoxon Rank Sum Test but NOT for the Sign Test?
10. For the Chi-Square Goodness-of-Fit Test, the rule of thumb is that the expected frequency in each cell should be at least:

Frequently Asked Questions

Common questions about nonparametric hypothesis testing

What is Nonparametric Testing and why use it?
Nonparametric tests are statistical testing methods that do not depend on a specific distributional form (such as the normal distribution). When data do not meet parametric test assumptions (normality, homogeneity of variance), or when data are ordinal rather than interval-scaled, nonparametric tests are often the only appropriate choice. Although their power is usually slightly lower than that of parametric tests, they are more robust.
Key Point: Distribution-free & Robust
What's the difference between the Sign Test and the Wilcoxon Rank-Sum Test?
The Sign Test only uses the "sign" information of data (greater or less than median), discarding specific numerical magnitudes, thus having lower efficiency. The Rank-Sum Test uses the "ranking" information of data, preserving more information, making it more efficient (higher power) in most cases than the Sign Test.
Comparison: Sign Test: only signs; Rank-Sum: rankings
What are typical null hypotheses in nonparametric tests?
Null hypotheses in nonparametric tests typically concern distribution shape or location. For example, the two-sample rank-sum test's null hypothesis is "the two population distributions are completely identical". Rejecting the null hypothesis indicates significant differences in location (median) or shape between distributions.
H_0: F_1(x) = F_2(x)
Why are nonparametric tests insensitive to outliers?
Because nonparametric tests are usually based on ranks rather than original values. An extreme outlier (like 1000) after ranking might just be "rank n", and its specific numerical magnitude doesn't affect the test statistic. This makes nonparametric tests very reliable for data with outliers.
Example: Data 1, 2, 3, 1000 → Ranks 1, 2, 3, 4
What are the limitations of Chi-Square Goodness-of-Fit Test?
The Chi-Square test requires sufficiently large sample size, typically requiring expected frequency of at least 5 in each cell. If expected frequencies are too small, chi-square approximation fails. Additionally, the test is sensitive to data binning methods - different groupings may lead to different conclusions.
Key Point: Expected frequency >= 5
What advantages does Kolmogorov-Smirnov (K-S) test have over Chi-Square?
The K-S test is directly based on the empirical distribution function (EDF) and doesn't require data binning, thus avoiding information loss and subjective grouping decisions. It's particularly effective for continuous distributions and very sensitive to shape differences. However, it's generally only applicable to continuous data.
Comparison: K-S: no binning needed, for continuous data
Is nonparametric test power always lower than parametric tests?
Usually, yes. If the data truly follow a normal distribution, the t-test is most efficient, and nonparametric tests lose some information (the Wilcoxon test's asymptotic relative efficiency versus the t-test is about 95%, a roughly 5% loss). However, if the data are heavily skewed or heavy-tailed, nonparametric tests can actually have higher power than parametric tests.
Key Point: Efficiency trade-off
When should we use the Run Test?
The Run Test is mainly used to detect data randomness. If you suspect there's some trend, periodicity, or autocorrelation in the data collection process (e.g., positive and negative values alternate too frequently or too rarely), the Run Test can help determine if the sample was randomly drawn.
Example: Sequence + + - - + + - - (suspiciously regular alternation, not random)