Master distribution-free statistical tests for robust hypothesis testing without strict distributional assumptions
Essential terminology and mathematical foundations for nonparametric testing
Statistical hypothesis test that does not rely on specific distributional assumptions about the population, using only rank, sign, or frequency information from sample data.
Count of positive differences or values above the hypothesized median in sign test procedures.
Sum of ranks assigned to one sample group in Wilcoxon rank sum test for comparing two independent samples.
Step function that estimates the cumulative distribution function from sample data, used in Kolmogorov-Smirnov tests.
Core principles and advantages of nonparametric hypothesis testing
Tests do not require specific distributional assumptions about the population, making them robust and applicable to diverse data types and situations.
Many nonparametric tests use rank information instead of raw values, reducing sensitivity to outliers and extreme observations.
Suitable for ordinal, interval, and ratio scale data, providing flexibility for different measurement levels and research contexts.
Resistant to outliers, non-normal distributions, and violations of parametric test assumptions, providing reliable results across diverse scenarios.
Effective with small sample sizes where parametric test assumptions may not hold, making them valuable for pilot studies and limited data.
Applicable to categorical, ordinal, and continuous data without requiring transformation or complex distributional modeling.
Comprehensive overview of major nonparametric hypothesis testing procedures
Step-by-step solutions to nonparametric testing problems
Problem:
A manufacturer claims that the median lifetime of their light bulbs is 1000 hours. A consumer group tests 12 bulbs and obtains the following lifetimes (in hours): 985, 1010, 992, 1008, 1015, 988, 1020, 995, 1012, 990, 1005, 1002. Test the claim at α = 0.05 using the Sign Test.
Solution:
State hypotheses
H₀: M = 1000 (population median is 1000 hours)
H₁: M ≠ 1000 (two-tailed alternative)
Compute signs
Compare each observation to the hypothesized median M₀ = 1000: 985 (−), 1010 (+), 992 (−), 1008 (+), 1015 (+), 988 (−), 1020 (+), 995 (−), 1012 (+), 990 (−), 1005 (+), 1002 (+).
Count positive signs
Count the number of observations above and below the median: N⁺ = 7 (above), N⁻ = 5 (below), with no ties.
Find test statistic
Under H₀, N⁺ ~ Binomial(12, 0.5). Use the smaller count as the test statistic: min(N⁺, N⁻) = min(7, 5) = 5.
Determine critical value
For a two-tailed test at α = 0.05 with n = 12, the binomial table gives critical value 2: reject H₀ if the smaller count is ≤ 2 (since P(N⁺ ≤ 2) = 79/4096 ≈ 0.019, giving a two-tailed level of ≈ 0.039 ≤ 0.05).
Conclusion
Since 5 > 2 (critical value), we fail to reject H₀.
Key Insight:
The Sign Test only uses directional information (+/-) and ignores magnitude. With 7 positive and 5 negative differences, the result is not extreme enough to reject H₀ at the 5% significance level.
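The exact calculation above can be sketched in a few lines of standard-library Python; the helper name `sign_test_two_tailed` is hypothetical, not from the original text.

```python
from math import comb

def sign_test_two_tailed(data, m0):
    """Exact two-tailed sign test for H0: median = m0 (illustrative sketch)."""
    n_pos = sum(1 for x in data if x > m0)
    n_neg = sum(1 for x in data if x < m0)
    n = n_pos + n_neg                      # ties with m0 are dropped
    k = min(n_pos, n_neg)                  # smaller count is the test statistic
    # Two-tailed p-value: 2 * P(N <= k) under Binomial(n, 0.5)
    p = 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n
    return n_pos, n_neg, min(p, 1.0)

bulbs = [985, 1010, 992, 1008, 1015, 988, 1020, 995, 1012, 990, 1005, 1002]
n_pos, n_neg, p = sign_test_two_tailed(bulbs, 1000)
print(n_pos, n_neg, round(p, 4))   # 7 positive, 5 negative, p ≈ 0.7744
```

The large p-value agrees with the critical-value approach: 7 vs 5 signs is nowhere near extreme enough to reject H₀.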
Problem:
A study compares blood pressure before and after a new medication for 10 patients. The differences (After - Before) are: -8, -5, +2, -12, -3, +1, -6, -9, -4, -7. Test whether the medication reduces blood pressure at α = 0.05.
Solution:
State hypotheses
Let M_D be the median of differences (After − Before): H₀: M_D = 0 versus H₁: M_D < 0 (medication reduces blood pressure).
Count signs
Count positive and negative differences: N⁺ = 2, N⁻ = 8, with no zero differences (n = 10).
Identify test statistic
For a one-tailed test (reduction), use N⁺ as the test statistic: N⁺ = 2; small values of N⁺ favor H₁.
Find p-value
Under H₀, N⁺ ~ Binomial(10, 0.5). Calculate: p = P(N⁺ ≤ 2) = (C(10,0) + C(10,1) + C(10,2))/2¹⁰ = (1 + 10 + 45)/1024 ≈ 0.055.
Make decision
Compare p-value to α: 0.055 > 0.05, so fail to reject H₀.
Conclusion
At α = 0.05, there is insufficient evidence to conclude that the medication reduces blood pressure.
Key Insight:
The Sign Test for paired data is equivalent to testing whether the median of differences equals zero. With 8 negative out of 10 differences, the p-value (0.055) is borderline but does not quite reach the 0.05 threshold.
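The one-sided p-value can be reproduced directly from the binomial tail; this is a minimal stdlib sketch of the computation, not the original solution's code.

```python
from math import comb

# One-tailed exact sign test on the paired differences (After - Before).
diffs = [-8, -5, 2, -12, -3, 1, -6, -9, -4, -7]
n_pos = sum(1 for d in diffs if d > 0)     # 2 positive differences
n = len([d for d in diffs if d != 0])      # 10: no zero differences here
# P(N+ <= 2) under Binomial(10, 0.5) is the one-sided p-value for H1: M_D < 0
p = sum(comb(n, i) for i in range(n_pos + 1)) / 2**n
print(round(p, 4))   # ≈ 0.0547 > 0.05 -> fail to reject H0
```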
Problem:
Two teaching methods are compared by test scores. Method A: 78, 82, 85, 88, 90 (n₁=5). Method B: 72, 76, 80, 84, 86, 92, 94 (n₂=7). Test whether there is a significant difference at α = 0.05.
Solution:
State hypotheses
Let F₁ and F₂ be the distributions of Methods A and B: H₀: F₁ = F₂ versus H₁: F₁ ≠ F₂.
Combine and rank all data
Order all 12 observations and assign ranks: 72(1), 76(2), 78(3), 80(4), 82(5), 84(6), 85(7), 86(8), 88(9), 90(10), 92(11), 94(12); there are no ties.
Calculate rank sum for smaller sample
Sum of ranks for Method A (n₁ = 5): W = 3 + 5 + 7 + 9 + 10 = 34.
Calculate expected value and variance
Under H₀: E[W] = n₁(N + 1)/2 = 5(13)/2 = 32.5, where N = n₁ + n₂ = 12.
Calculate variance
Variance formula: Var(W) = n₁n₂(N + 1)/12 = (5)(7)(13)/12 ≈ 37.92.
Compute Z-statistic
Standardize the rank sum: Z = (W − E[W])/√Var(W) = (34 − 32.5)/√37.92 ≈ 0.24.
Make decision
For a two-tailed test at α = 0.05, the critical value is z₀.₀₂₅ = 1.96: since |0.24| < 1.96, fail to reject H₀; there is no significant difference between the methods.
Key Insight:
The Wilcoxon Rank Sum Test compares the sum of ranks between groups. If one group consistently has higher values, its rank sum will be higher than expected under H₀. Here, the rank sums are nearly equal to expected values, indicating no significant difference.
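The ranking and standardization steps can be verified with a short stdlib sketch (valid here because all 12 values are distinct, so no tie correction is needed):

```python
import math

# Large-sample Wilcoxon rank sum test for the two teaching methods.
a = [78, 82, 85, 88, 90]
b = [72, 76, 80, 84, 86, 92, 94]
combined = sorted(a + b)
rank = {x: i + 1 for i, x in enumerate(combined)}   # ranks 1..12, no ties
w = sum(rank[x] for x in a)                          # rank sum of smaller sample
n1, n2 = len(a), len(b)
N = n1 + n2
mean_w = n1 * (N + 1) / 2                 # E[W] = 32.5
var_w = n1 * n2 * (N + 1) / 12            # Var(W) = 455/12 ≈ 37.92
z = (w - mean_w) / math.sqrt(var_w)
print(w, mean_w, round(z, 3))             # 34, 32.5, z ≈ 0.244
```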
Problem:
A diet program is tested on 8 participants. Weight changes (Before - After, in kg) are: 2.1, -0.5, 3.2, 1.8, 4.5, 0.9, 2.7, 1.5. Test if the diet causes significant weight loss at α = 0.05.
Solution:
State hypotheses
Let M_D be the median of weight differences (Before − After): H₀: M_D = 0 versus H₁: M_D > 0 (the diet causes weight loss).
Rank absolute differences
Order by |difference| and assign ranks: 0.5(1, −), 0.9(2, +), 1.5(3, +), 1.8(4, +), 2.1(5, +), 2.7(6, +), 3.2(7, +), 4.5(8, +).
Calculate signed rank statistics
Sum ranks by sign: W⁺ = 2 + 3 + 4 + 5 + 6 + 7 + 8 = 35, W⁻ = 1.
Calculate expected value and variance
Under H₀: E[W⁺] = n(n + 1)/4 = 8(9)/4 = 18 and Var(W⁺) = n(n + 1)(2n + 1)/24 = 8(9)(17)/24 = 51.
Compute Z-statistic
Standardize: Z = (W⁺ − 18)/√51 = 17/7.14 ≈ 2.38.
Make decision
For a one-tailed test at α = 0.05, the critical value is z₀.₀₅ = 1.645: since 2.38 > 1.645, reject H₀.
Conclusion
There is significant evidence that the diet program causes weight loss.
Key Insight:
The Wilcoxon Signed Rank Test uses both sign and magnitude information through ranks. With 7 positive and only 1 negative difference, and the positive differences being larger (higher ranks), the evidence strongly supports weight loss.
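The signed-rank computation above can be sketched as follows (a simplification that works here because the absolute differences are all distinct and nonzero):

```python
import math

# Large-sample Wilcoxon signed rank test for the diet data.
diffs = [2.1, -0.5, 3.2, 1.8, 4.5, 0.9, 2.7, 1.5]
by_abs = sorted(diffs, key=abs)                 # rank by absolute value
w_plus = sum(r for r, d in enumerate(by_abs, start=1) if d > 0)
n = len(diffs)
mean_w = n * (n + 1) / 4                        # E[W+] = 18
var_w = n * (n + 1) * (2 * n + 1) / 24          # Var(W+) = 51
z = (w_plus - mean_w) / math.sqrt(var_w)
print(w_plus, round(z, 2))   # W+ = 35, z ≈ 2.38 > 1.645 -> reject H0
```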
Problem:
A die is rolled 120 times with results: 1→18, 2→22, 3→17, 4→25, 5→16, 6→22. Test whether the die is fair at α = 0.05.
Solution:
State hypotheses
For a fair die, each face has probability 1/6: H₀: p₁ = ⋯ = p₆ = 1/6 versus H₁: at least one pᵢ ≠ 1/6.
Calculate expected frequencies
Under H₀, each face should appear: Eᵢ = npᵢ = 120(1/6) = 20 times.
Calculate chi-square statistic
Compute the test statistic: χ² = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ.
Substitute values
Calculate each term: (18−20)²/20 = 0.20, (22−20)²/20 = 0.20, (17−20)²/20 = 0.45, (25−20)²/20 = 1.25, (16−20)²/20 = 0.80, (22−20)²/20 = 0.20.
Compute result
Sum all terms: χ² = 0.20 + 0.20 + 0.45 + 1.25 + 0.80 + 0.20 = 3.1.
Determine critical value
Degrees of freedom = k − 1 = 6 − 1 = 5: critical value χ²₀.₀₅,₅ = 11.07.
Conclusion
Compare test statistic to critical value: since 3.1 < 11.07, fail to reject H₀; the observed frequencies are consistent with a fair die.
Key Insight:
The chi-square goodness-of-fit test compares observed frequencies to expected frequencies. The deviation (χ² = 3.1) is well below the critical value (11.07), indicating the observed distribution is consistent with a fair die.
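The goodness-of-fit statistic reduces to a one-line sum over faces; a minimal sketch:

```python
# Chi-square goodness-of-fit statistic for the fair-die example.
observed = [18, 22, 17, 25, 16, 22]
n = sum(observed)               # 120 rolls
expected = n / len(observed)    # 20 per face under H0
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(round(chi2, 2))           # 3.1 < 11.07 = chi-square(0.05, df=5) -> fail to reject
```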
Problem:
Test whether the sample {0.12, 0.35, 0.47, 0.62, 0.81, 0.93} comes from a Uniform(0,1) distribution at α = 0.05.
Solution:
State hypotheses
Test goodness of fit to Uniform(0,1): H₀: F(x) = F₀(x) = x on [0,1] versus H₁: F ≠ F₀.
Construct empirical CDF
The empirical CDF is a step function: Fₙ(x) = (number of observations ≤ x)/6, jumping by 1/6 at each order statistic.
Calculate D⁺ and D⁻
For each order statistic x₍ᵢ₎: D⁺ᵢ = i/n − x₍ᵢ₎ and D⁻ᵢ = x₍ᵢ₎ − (i − 1)/n.
Create computation table
With n = 6 and F₀(x) = x:
i   x₍ᵢ₎   i/n     (i−1)/n   i/n − x₍ᵢ₎   x₍ᵢ₎ − (i−1)/n
1   0.12   0.167   0.000      0.047        0.120
2   0.35   0.333   0.167     −0.017        0.183
3   0.47   0.500   0.333      0.030        0.137
4   0.62   0.667   0.500      0.047        0.120
5   0.81   0.833   0.667      0.023        0.143
6   0.93   1.000   0.833      0.070        0.097
Find maximum deviations
Calculate D⁺, D⁻ and Dₙ: D⁺ = 0.070, D⁻ = 0.183, so Dₙ = max(D⁺, D⁻) = 0.183.
Compare to critical value
For n = 6 and α = 0.05, from the K-S table: critical value d₀.₀₅ ≈ 0.521.
Conclusion
Compare test statistic to critical value: since 0.183 < 0.521, fail to reject H₀.
Key Insight:
The K-S test measures the maximum vertical distance between empirical and theoretical CDFs. With Dₙ = 0.183 far below the critical value 0.521, there's no evidence against the Uniform(0,1) hypothesis.
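The order-statistic formulas for D⁺ and D⁻ translate directly into code; note that the maximum deviation (D⁻ ≈ 0.183, attained at the second order statistic 0.35) exceeds the deviations at the jump tops:

```python
# Exact K-S statistic against Uniform(0,1), using the order-statistic formulas.
xs = sorted([0.12, 0.35, 0.47, 0.62, 0.81, 0.93])
n = len(xs)
d_plus = max(i / n - x for i, x in enumerate(xs, start=1))        # deviations above F0
d_minus = max(x - (i - 1) / n for i, x in enumerate(xs, start=1)) # deviations below F0
d_n = max(d_plus, d_minus)
print(round(d_plus, 3), round(d_minus, 3), round(d_n, 3))   # 0.07, 0.183, 0.183
```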
Problem:
A survey of 200 people examines the relationship between education level and voting preference. Test independence at α = 0.05. Data: High School (A:30, B:40), Bachelor's (A:50, B:35), Graduate (A:20, B:25).
Solution:
State hypotheses
Test independence of education and voting: H₀: education level and voting preference are independent versus H₁: they are associated.
Create contingency table
Organize observed frequencies:
               A     B     Row total
High School    30    40    70
Bachelor's     50    35    85
Graduate       20    25    45
Column total  100   100   200
Calculate expected frequencies
Under independence: E_{ij} = (Row_i × Col_j) / Total, giving expected counts 35, 35 (High School), 42.5, 42.5 (Bachelor's), and 22.5, 22.5 (Graduate).
Calculate chi-square statistic
Compute each term: (30−35)²/35 = 0.714, (40−35)²/35 = 0.714, (50−42.5)²/42.5 = 1.324, (35−42.5)²/42.5 = 1.324, (20−22.5)²/22.5 = 0.278, (25−22.5)²/22.5 = 0.278.
Compute result
Sum all terms: χ² = 0.714 + 0.714 + 1.324 + 1.324 + 0.278 + 0.278 ≈ 4.632.
Determine degrees of freedom
For an r×c table: df = (r − 1)(c − 1) = (3 − 1)(2 − 1) = 2.
Find critical value and conclude
From the chi-square table: χ²₀.₀₅,₂ = 5.991; since 4.632 < 5.991, fail to reject H₀.
Key Insight:
The chi-square test for independence compares observed cell frequencies to those expected under independence. With χ² = 4.632 < 5.99, we cannot conclude that education level and voting preference are related.
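The expected-count and chi-square calculations can be checked with a short sketch (the slight difference from 4.632 comes from rounding each term in the hand calculation):

```python
# Chi-square test of independence for the 3x2 education/voting table.
table = [[30, 40], [50, 35], [20, 25]]         # rows: HS, Bachelor's, Graduate
row_totals = [sum(row) for row in table]        # 70, 85, 45
col_totals = [sum(col) for col in zip(*table)]  # 100, 100
total = sum(row_totals)                         # 200
chi2 = sum((table[i][j] - row_totals[i] * col_totals[j] / total) ** 2
           / (row_totals[i] * col_totals[j] / total)
           for i in range(3) for j in range(2))
df = (3 - 1) * (2 - 1)
print(round(chi2, 2), df)    # ≈ 4.63, df = 2; below 5.991 -> fail to reject
```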
Problem:
A sequence of stock price movements shows: + + + - - + + - - - + + + + - - + - +. Test whether the sequence is random at α = 0.05.
Solution:
State hypotheses
Test randomness of the sequence: H₀: the sequence of +/− signs is random versus H₁: it is not random.
Count runs and symbols
A run is a sequence of identical symbols: the sequence splits into +++ | −− | ++ | −−− | ++++ | −− | + | − | +.
Determine counts
Count total runs and each symbol: R = 9 runs, with n₁ = 11 plus signs and n₂ = 8 minus signs (N = 19).
Calculate expected runs
Under randomness: E[R] = 1 + 2n₁n₂/N = 1 + 2(11)(8)/19 ≈ 10.26.
Calculate variance
Variance of R under H₀: Var(R) = 2n₁n₂(2n₁n₂ − N)/(N²(N − 1)) = 176(157)/((361)(18)) ≈ 4.25.
Compute Z-statistic
Standardize: Z = (R − E[R])/√Var(R) = (9 − 10.26)/√4.25 ≈ −0.61.
Make decision
For a two-tailed test at α = 0.05: since |−0.61| < 1.96, fail to reject H₀; the sequence is consistent with randomness.
Key Insight:
The Run Test detects departures from randomness. Too few runs suggest clustering (positive autocorrelation), too many suggest alternation (negative autocorrelation). With R = 9 close to the expected 10.26, the sequence appears random.
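Counting runs and standardizing can be done with `itertools.groupby`, which groups consecutive identical symbols; a minimal sketch:

```python
import math
from itertools import groupby

# Large-sample runs test for randomness of the +/- sequence.
seq = "+++--++---++++--+-+"
runs = sum(1 for _ in groupby(seq))        # each group of identical symbols = one run
n1 = seq.count("+")                         # 11 plus signs
n2 = seq.count("-")                         # 8 minus signs
N = n1 + n2
mean_r = 1 + 2 * n1 * n2 / N                                # ≈ 10.26
var_r = 2 * n1 * n2 * (2 * n1 * n2 - N) / (N**2 * (N - 1))  # ≈ 4.25
z = (runs - mean_r) / math.sqrt(var_r)
print(runs, round(mean_r, 2), round(z, 2))  # 9 runs, E[R] ≈ 10.26, z ≈ -0.61
```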
Mathematical derivations of key nonparametric statistics
For large n, the sign test statistic N⁺ converges in distribution to a Normal distribution.
This allows us to use the standard normal table for hypothesis testing when n > 20.
Let Iᵢ = 1 if Xᵢ > M₀ and 0 otherwise. Under H₀: M = M₀, P(Xᵢ > M₀) = 0.5.
The test statistic N⁺ is the sum of these i.i.d. Bernoulli trials.
The mean and variance of N⁺ under H₀ are: E[N⁺] = n/2 and Var(N⁺) = n/4.
Since Iᵢ are i.i.d. with finite variance, the standardized sum converges to standard normal.
For better approximation, we often apply a continuity correction of 0.5.
Thus, for large n, Z = (N⁺ − n/2)/√(n/4) is approximately standard normal, and we reject H₀ if |Z| > z_{α/2}.
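The quality of the continuity-corrected approximation can be checked numerically against the exact binomial tail; the sample values n = 25, k = 17 below are illustrative, not from the original text.

```python
import math

# Compare exact binomial tail to the continuity-corrected normal approximation.
n, k = 25, 17                                                   # illustrative values
exact = sum(math.comb(n, i) for i in range(k, n + 1)) / 2**n    # P(N+ >= k) exactly
z = (k - 0.5 - n / 2) / math.sqrt(n / 4)                        # continuity-corrected
approx = 0.5 * math.erfc(z / math.sqrt(2))                      # upper normal tail
print(round(exact, 4), round(approx, 4))   # ≈ 0.0539 vs ≈ 0.0548
```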
Derivation of the mean and variance of the Rank-Sum statistic W under the null hypothesis.
These moments are crucial for constructing the Z-statistic for large samples.
Let R₁, …, R_{m+n} be the ranks of the combined sample. Under H₀, any subset of size n is equally likely to be the ranks of the second sample.
The average rank in a set of size N=m+n is (1+N)/2.
By linearity of expectation: E[W] = n(N + 1)/2 for the sample of size n whose ranks are summed.
Since ranks are sampled without replacement, they are not independent. Var(W) = ΣVar(Rⱼ) + Σⱼ≠ₖCov(Rⱼ,Rₖ).
Summing the variance and covariance terms (detailed algebra omitted for brevity) yields: Var(W) = mn(N + 1)/12.
As m, n → ∞, the distribution of W approaches normality.
Under H₀ (symmetric distribution around 0), the signed rank statistic W⁺ has a known distribution based on random sign assignments.
This enables both exact small-sample tests and large-sample normal approximations.
Under H₀: the distribution is symmetric about 0. Thus, for each |Zᵢ|, P(Zᵢ > 0) = P(Zᵢ < 0) = 0.5.
Let Rᵢ be the rank of |Zᵢ|. The signed rank statistic is: W⁺ = Σᵢ Sᵢ Rᵢ, where Sᵢ = 1 if Zᵢ > 0 and Sᵢ = 0 otherwise.
Since E[Sᵢ] = 0.5 and the ranks are fixed (1, 2, ..., n): E[W⁺] = 0.5 Σᵢ Rᵢ = n(n + 1)/4.
Since signs are independent and Var(Sᵢ) = 0.25: Var(W⁺) = 0.25 Σᵢ Rᵢ² = n(n + 1)(2n + 1)/24.
For small n, the exact distribution can be enumerated. There are 2ⁿ equally likely sign patterns: P(W⁺ = w) = (number of sign patterns with rank sum w)/2ⁿ.
For large n (typically n ≥ 20), the CLT applies: Z = (W⁺ − n(n + 1)/4)/√(n(n + 1)(2n + 1)/24) converges in distribution to N(0, 1).
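The exact enumeration over 2ⁿ sign patterns can be sketched directly; `wplus_pmf` is a hypothetical helper name, and n = 5 is an illustrative size:

```python
from itertools import product

def wplus_pmf(n):
    """Exact null PMF of W+ by enumerating all 2^n equally likely sign patterns."""
    counts = {}
    for signs in product([0, 1], repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)  # sum of + ranks
        counts[w] = counts.get(w, 0) + 1
    return {w: c / 2**n for w, c in sorted(counts.items())}

pmf = wplus_pmf(5)
print(pmf[0], pmf[15])   # the extremes (all -, all +) each have probability 1/32
```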
Pearson's chi-square statistic converges to a chi-square distribution under the null hypothesis.
This provides the theoretical foundation for chi-square tests in categorical data analysis.
Let (O₁, ..., Oₖ) follow Multinomial(n; p₁, ..., pₖ) where Eᵢ = npᵢ.
By the multivariate CLT, the standardized frequencies are asymptotically normal: the vector with components (Oᵢ − npᵢ)/√(npᵢ) converges jointly to a multivariate normal limit.
The chi-square statistic can be written as: χ² = Σᵢ (Oᵢ − Eᵢ)²/Eᵢ = Σᵢ (Oᵢ − npᵢ)²/(npᵢ).
The multinomial covariance creates dependence. The quadratic form becomes: χ² = XᵀX, where X is the standardized frequency vector whose limiting covariance matrix is a projection of rank k − 1.
The constraint Σpᵢ = 1 reduces the effective dimension by 1. If m parameters are estimated, subtract m more: df = k − 1 − m.
By properties of quadratic forms of normal vectors: χ² converges in distribution to a chi-square distribution with k − 1 − m degrees of freedom.
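A quick Monte Carlo check illustrates the convergence: under a uniform multinomial null, the statistic exceeds the χ²(k−1) critical value at roughly the nominal rate. The choices k = 4, n = 500, and 2000 replications are arbitrary illustrative values.

```python
import random

# Monte Carlo check that Pearson's statistic behaves like chi-square(k-1) under H0.
random.seed(0)
k, n, reps = 4, 500, 2000
crit = 7.815                               # chi-square(0.05, df = 3) critical value
exceed = 0
for _ in range(reps):
    counts = [0] * k
    for _ in range(n):
        counts[random.randrange(k)] += 1   # one uniform multinomial sample
    chi2 = sum((o - n / k) ** 2 / (n / k) for o in counts)
    exceed += chi2 > crit
print(round(exceed / reps, 3))             # empirical rejection rate, near 0.05
```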
The scaled K-S statistic √n·Dₙ converges to the Kolmogorov distribution.
The Kolmogorov distribution K has CDF: P(K ≤ x) = 1 − 2 Σⱼ₌₁^∞ (−1)^(j−1) e^(−2j²x²).
Define the empirical process: Gₙ(t) = √n (Fₙ(t) − F₀(t)).
First, we need the uniform law of large numbers: by the Glivenko–Cantelli theorem, supₓ |Fₙ(x) − F₀(x)| → 0 almost surely under H₀.
The empirical process converges to a Brownian bridge B(t) on [0,1]:
Apply the continuous mapping theorem to the supremum functional: √n·Dₙ = sup_t |Gₙ(t)| converges in distribution to sup_t |B(t)|.
The distribution of sup|B(t)| is the Kolmogorov distribution:
The Kolmogorov distribution provides asymptotic critical values: for example, at α = 0.05 reject when √n·Dₙ > 1.358, i.e., Dₙ > 1.358/√n for large n.
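The series for the Kolmogorov CDF converges very fast, so the asymptotic critical value is easy to verify numerically; a minimal sketch:

```python
import math

def kolmogorov_cdf(x, terms=100):
    """CDF of the Kolmogorov distribution via its alternating series."""
    return 1 - 2 * sum((-1) ** (j - 1) * math.exp(-2 * j * j * x * x)
                       for j in range(1, terms + 1))

# The 0.95 quantile is about 1.358, matching the alpha = 0.05 critical value.
print(round(kolmogorov_cdf(1.358), 3))   # ≈ 0.95
```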
Test your understanding with 10 multiple-choice questions
Common questions about nonparametric hypothesis testing