
Nonparametric Testing Formula Reference

Complete collection of formulas for distribution-free statistical tests: sign tests, rank-based methods, chi-square tests, and Kolmogorov-Smirnov procedures with practical applications.

Distribution-Free Methods · Practical Applications · Robust Procedures

Quick Formula Reference

Essential nonparametric testing formulas for quick lookup

Sign Test Fundamentals
Basic formulas for sign-based hypothesis testing procedures

Sign Test Statistic

N^+ = \sum_{i=1}^n I(X_i - t_0 > 0)

Count of positive differences from hypothesized median

Binomial Distribution

N^+ \sim B(n, \theta), where \theta = P(X > t_0)

Distribution of sign test statistic under null hypothesis

P-value (Two-sided)

P = 2\min\{P(N^+ \leq k), P(N^+ \geq k)\}

Exact two-sided p-value for the sign test, where k is the observed value of N^+ (capped at 1)

Large Sample Approximation

Z = \frac{N^+ - n\theta}{\sqrt{n\theta(1-\theta)}} \sim N(0,1)

Normal approximation for large sample sign test

Rank-Based Test Statistics
Key formulas for Wilcoxon rank sum and signed rank tests

Rank Sum Statistic (W)

W = \sum_{i=1}^n R_i (sum of ranks for sample 2)

Wilcoxon rank sum test statistic

Expected Rank Sum

E[W] = \frac{n(m+n+1)}{2}

Expected value of rank sum under null hypothesis

Rank Sum Variance

Var(W) = \frac{mn(m+n+1)}{12}

Variance of rank sum statistic

Signed Rank Statistic

W^+ = \sum_{i=1}^n R_i I(Z_i > 0)

Sum of positive signed ranks

Chi-Square Test Statistics
Formulas for goodness-of-fit and independence testing

Chi-Square Statistic

\chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}

General chi-square test statistic for goodness-of-fit

Independence Test

\chi^2 = \sum_{i=1}^r \sum_{j=1}^s \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

Chi-square statistic for independence testing

Expected Frequency

E_{ij} = \frac{n_{i\cdot} n_{\cdot j}}{n}

Expected frequency under independence assumption

Degrees of Freedom

df = (r-1)(s-1)

Degrees of freedom for independence test

Sign Test Procedures

Complete formulas for median testing using sign information

Single Sample Median Test

Hypotheses

H_0: t_p = t_0 vs H_1: t_p \neq t_0 (two-sided)
H_0: t_p \leq t_0 vs H_1: t_p > t_0 (right-sided)
H_0: t_p \geq t_0 vs H_1: t_p < t_0 (left-sided)

Assumptions

Random sample from a continuous population
Independent observations
Population median t_p exists

Test Statistic

N^+ = \sum_{i=1}^n I(X_i > t_0) \sim B(n, \theta)

Rejection Regions

Two-sided: N^+ \leq C_1 or N^+ \geq C_2
Right-sided: N^+ \geq C^*
Left-sided: N^+ \leq C^*

Additional Formulas

Under H_0: \theta = 0.5 (median test)
Exact p-value (right tail): P = \sum_{k=N^+}^{n} \binom{n}{k} 0.5^n
Large sample: Z = \frac{N^+ - 0.5n}{0.5\sqrt{n}} \sim N(0,1)
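
As a concrete illustration, here is a minimal sketch of the exact single-sample sign test in Python, using scipy.stats.binomtest (SciPy 1.7+); the data values are hypothetical:

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical sample; test H0: median = 10 against a two-sided alternative
x = np.array([12.1, 9.4, 11.8, 10.6, 13.2, 9.9, 12.5, 10.1, 11.0, 8.7])
t0 = 10.0

diffs = x - t0
diffs = diffs[diffs != 0]          # drop observations equal to t0 (ties)
n_plus = int(np.sum(diffs > 0))    # N+ = count of positive differences

# Under H0, N+ ~ B(n, 0.5); exact two-sided binomial p-value
result = binomtest(n_plus, n=len(diffs), p=0.5, alternative='two-sided')
print(n_plus, result.pvalue)
```
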
Paired Sample Sign Test

Hypotheses

H_0: P(X > Y) = 0.5 vs H_1: P(X > Y) \neq 0.5
H_0: Median(X - Y) = 0 vs H_1: Median(X - Y) \neq 0

Assumptions

Paired observations (X_i, Y_i), i = 1, \ldots, n
Differences Z_i = X_i - Y_i are independent
Population of differences is continuous

Test Statistic

N^+ = \sum_{i=1}^n I(Z_i > 0) \sim B(n', 0.5)

Rejection Regions

Two-sided: N^+ \leq C_1 or N^+ \geq C_2
where n' = number of non-zero differences

Additional Formulas

Tied observations (Z_i = 0) are excluded
Effective sample size: n' = n - (number of ties)
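
The paired version reduces to the same binomial test after dropping zero differences. A short sketch, again with hypothetical before/after values:

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical paired measurements (e.g., before/after a treatment)
x = np.array([7.2, 6.8, 8.1, 5.9, 7.7, 6.3, 8.4, 7.0])
y = np.array([6.5, 7.0, 7.4, 5.1, 7.2, 6.3, 7.9, 6.2])

z = x - y
z = z[z != 0]                  # exclude ties: n' = n - (number of zeros)
n_plus = int(np.sum(z > 0))    # N+ ~ B(n', 0.5) under H0

print(binomtest(n_plus, n=len(z), p=0.5).pvalue)
```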

Wilcoxon Rank Tests

Rank-based procedures for sample comparisons

Wilcoxon Rank Sum Test (Mann-Whitney U)

Hypotheses

H_0: F(x) = G(x) vs H_1: F(x) \neq G(x)
H_0: P(X > Y) = 0.5 vs H_1: P(X > Y) \neq 0.5

Assumptions

Two independent random samples
X_1, \ldots, X_m \sim F(x) and Y_1, \ldots, Y_n \sim G(x)
Observations are continuous (no ties)

Test Statistic

W = \sum_{i=1}^n R_i (sum of ranks for sample 2)

Distribution Properties

E[W] = \frac{n(m+n+1)}{2}
Var(W) = \frac{mn(m+n+1)}{12}
Range: \frac{n(n+1)}{2} \leq W \leq \frac{n(2m+n+1)}{2}

Rejection Regions

Two-sided: W \leq W_{\alpha/2} or W \geq W_{1-\alpha/2}
Large sample: |Z| > z_{\alpha/2}, where Z = \frac{W - E[W]}{\sqrt{Var(W)}}

Tie Correction

Variance with ties: Var(W) = \frac{mn}{12}\left[(m+n+1) - \frac{\sum t_i^3 - \sum t_i}{(m+n)(m+n-1)}\right]
where t_i = size of the i-th group of observations tied at the same value
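
A sketch that computes W and its normal approximation from the definitions above, cross-checked against scipy.stats.mannwhitneyu (the Mann-Whitney U test is equivalent to the rank sum test); the two samples are hypothetical:

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

# Hypothetical independent samples: x of size m, y of size n
x = np.array([1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30])
y = np.array([0.88, 0.65, 0.60, 2.05, 1.06, 1.29, 1.12])

m, n = len(x), len(y)
ranks = rankdata(np.concatenate([x, y]))  # mid-ranks if ties occur
W = ranks[m:].sum()                       # W = sum of ranks for sample 2

EW = n * (m + n + 1) / 2                  # E[W]
VW = m * n * (m + n + 1) / 12             # Var(W), no-tie formula
Z = (W - EW) / np.sqrt(VW)                # large-sample approximation

res = mannwhitneyu(x, y, alternative='two-sided')  # equivalent U-statistic test
print(W, Z, res.pvalue)
```
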
Wilcoxon Signed Rank Test

Hypotheses

H_0: population is symmetric about \theta_0
H_0: Median(Z) = 0 vs H_1: Median(Z) \neq 0

Assumptions

Paired sample differences Z_i = X_i - Y_i
Differences are independent and continuous
Population of differences is symmetric

Test Statistic

W^+ = \sum_{i=1}^n R_i I(Z_i > 0), where R_i = rank of |Z_i|

Distribution Properties

E[W^+] = \frac{n(n+1)}{4}
Var(W^+) = \frac{n(n+1)(2n+1)}{24}
Range: 0 \leq W^+ \leq \frac{n(n+1)}{2}

Rejection Regions

Two-sided: W^+ \leq W^+_{\alpha/2} or W^+ \geq W^+_{1-\alpha/2}
Large sample: |Z| > z_{\alpha/2}, where Z = \frac{W^+ - E[W^+]}{\sqrt{Var(W^+)}}
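
A sketch computing W+ from its definition, with scipy.stats.wilcoxon as a cross-check (it reports exact p-values for small samples without ties); the paired differences are hypothetical:

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

# Hypothetical paired differences Z_i = X_i - Y_i
z = np.array([1.2, -0.4, 2.1, 0.8, -1.5, 0.3, 1.9, -0.7, 1.1, 0.6])
z = z[z != 0]                        # drop zero differences, as usual

ranks = rankdata(np.abs(z))          # rank the |Z_i| (mid-ranks for ties)
W_plus = ranks[z > 0].sum()          # W+ = sum of positive signed ranks

n = len(z)
EW = n * (n + 1) / 4                 # E[W+]
VW = n * (n + 1) * (2 * n + 1) / 24  # Var(W+)
Z_stat = (W_plus - EW) / np.sqrt(VW)

print(W_plus, Z_stat, wilcoxon(z).pvalue)
```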

Chi-Square Goodness-of-Fit Tests

Testing distributional assumptions with categorical data

Simple Goodness-of-Fit (No Parameters)

Hypotheses

H_0: P(X \in A_i) = p_i, i = 1, \ldots, k
H_1: P(X \in A_i) \neq p_i for at least one i

Assumptions

Multinomial counts: (n_1, \ldots, n_k) \sim Multinomial(n, p_1, \ldots, p_k)
Expected frequencies E_i = np_i \geq 5 for all i
\sum_{i=1}^k p_i = 1

Test Statistic

\chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} \sim \chi^2(k-1)

Rejection Regions

Reject H_0 if \chi^2 > \chi^2_{\alpha}(k-1)

Additional Formulas

Degrees of freedom: df = k - 1
Alternative form: \chi^2 = \sum_{i=1}^k \frac{O_i^2}{E_i} - n
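
For the fully specified case, scipy.stats.chisquare applies these formulas directly. A sketch with hypothetical die-roll counts:

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical die-fairness check: H0: p_i = 1/6 for each face, n = 120
observed = np.array([18, 23, 16, 21, 18, 24])     # O_i
expected = np.full(6, observed.sum() / 6)         # E_i = n p_i = 20

stat, pval = chisquare(observed, f_exp=expected)  # df = k - 1 = 5
print(stat, pval)
```
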
Composite Goodness-of-Fit (With Parameters)

Hypotheses

H_0: X \sim F(x; \theta_1, \ldots, \theta_m)
H_1: X \not\sim F(x; \theta_1, \ldots, \theta_m)

Assumptions

Parameters \theta_1, \ldots, \theta_m are estimated from the data
Maximum likelihood estimates \hat{\theta}_1, \ldots, \hat{\theta}_m
Large sample size and adequate expected frequencies

Test Statistic

\chi^2 = \sum_{i=1}^k \frac{(O_i - \hat{E}_i)^2}{\hat{E}_i} \sim \chi^2(k-m-1)

Rejection Regions

Reject H_0 if \chi^2 > \chi^2_{\alpha}(k-m-1)

Additional Formulas

Degrees of freedom: df = k - m - 1
\hat{E}_i = n p_i(\hat{\theta}_1, \ldots, \hat{\theta}_m)
Common cases: Normal (m = 2), Exponential (m = 1), Poisson (m = 1)
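
When parameters are estimated first, the reduced degrees of freedom can be passed to scipy.stats.chisquare through its ddof argument. A sketch for a Poisson fit (m = 1) with hypothetical event counts, pooling the upper tail so expected frequencies are not too small:

```python
import numpy as np
from scipy.stats import poisson, chisquare

# Hypothetical counts of events per interval; H0: X ~ Poisson(lambda)
counts = np.array([3, 1, 2, 0, 4, 2, 1, 3, 2, 2, 5, 1, 0, 2, 3,
                   2, 1, 4, 2, 3, 1, 2, 0, 3, 2, 2, 1, 3, 4, 2])
n = len(counts)
lam_hat = counts.mean()              # MLE of lambda (m = 1 parameter)

# Categories {0, 1, 2, 3, >=4}; the tail is pooled into one cell
O = np.array([np.sum(counts == k) for k in range(4)] + [np.sum(counts >= 4)])
p = np.append(poisson.pmf(range(4), lam_hat), poisson.sf(3, lam_hat))
E = n * p                            # estimated expected frequencies E_i

# ddof=1 shifts the reference distribution from chi2(k-1) to chi2(k-m-1)
stat, pval = chisquare(O, f_exp=E, ddof=1)
print(stat, pval)
```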

Chi-Square Independence Tests

Testing associations in contingency tables

r × s Contingency Table Test

Hypotheses

H_0: row and column variables are independent
H_0: p_{ij} = p_{i\cdot} p_{\cdot j} for all i, j
H_1: the variables are associated

Assumptions

Random sample of size n
Each observation is classified by two categorical variables
Expected frequencies E_{ij} \geq 5 (approximately)

Test Statistic

\chi^2 = \sum_{i=1}^r \sum_{j=1}^s \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2((r-1)(s-1))

Expected Frequencies

E_{ij} = \frac{n_{i\cdot} n_{\cdot j}}{n}
Row totals: n_{i\cdot} = \sum_{j=1}^s O_{ij}
Column totals: n_{\cdot j} = \sum_{i=1}^r O_{ij}
Grand total: n = \sum_{i=1}^r \sum_{j=1}^s O_{ij}

Rejection Regions

Reject H_0 if \chi^2 > \chi^2_{\alpha}((r-1)(s-1))

Special Cases

2×2 table (cell counts a, b, c, d): \chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
Yates' continuity correction: \chi^2 = \frac{n(|ad - bc| - n/2)^2}{(a+b)(c+d)(a+c)(b+d)}
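
In practice scipy.stats.chi2_contingency carries out the whole computation: expected frequencies, degrees of freedom, and p-value. A sketch with a hypothetical 2×3 table:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table of observed counts O_ij
table = np.array([[30, 45, 25],
                  [20, 30, 50]])

# E_ij = n_i. * n_.j / n and df = (r-1)(s-1) are computed internally;
# for 2x2 tables the default correction=True applies Yates' correction
stat, pval, df, expected = chi2_contingency(table)
print(stat, pval, df)
print(expected)
```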

Kolmogorov-Smirnov Tests

Distribution-free tests using empirical distribution functions

One-Sample K-S Test

Hypotheses

H_0: F(x) = F_0(x) for all x
H_1: F(x) \neq F_0(x) for some x

Assumptions

Random sample X_1, \ldots, X_n from a continuous distribution
F_0(x) is completely specified (no unknown parameters)
Sample size sufficiently large for the asymptotic approximation

Test Statistic

D_n = \sup_x |F_n(x) - F_0(x)|

Rejection Regions

Reject H_0 if D_n > D_{n,\alpha}
Large sample: D_n > \frac{K_\alpha}{\sqrt{n}}, where K_{0.05} \approx 1.36

Asymptotic Distribution

\lim_{n \to \infty} P(\sqrt{n}\, D_n \leq x) = K(x) = 1 - 2\sum_{j=1}^\infty (-1)^{j-1} e^{-2j^2 x^2}

Empirical Distribution Function

F_n(x) = \frac{1}{n} \sum_{i=1}^n I(X_i \leq x)
Step function that increases by 1/n at each observation
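
A sketch using scipy.stats.kstest against a fully specified N(0, 1); the sample is simulated. Note that if the parameters of F_0 were instead estimated from the same data, the plain K-S critical values would be conservative (Lilliefors' test is the usual remedy):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=50)   # simulated sample

# F0 completely specified: standard normal with known mu = 0, sigma = 1
res = kstest(x, 'norm', args=(0.0, 1.0))
print(res.statistic, res.pvalue)              # D_n and its p-value
```
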
Two-Sample K-S Test

Hypotheses

H_0: F_1(x) = F_2(x) for all x
H_1: F_1(x) \neq F_2(x) for some x

Assumptions

Two independent samples from continuous distributions
Sample sizes m and n (not necessarily equal)
No unknown parameters in either distribution

Test Statistic

D_{m,n} = \sup_x |F_{1,m}(x) - F_{2,n}(x)|

Rejection Regions

Reject H_0 if D_{m,n} > D_{m,n,\alpha}
Large samples: D_{m,n} > K_\alpha \sqrt{\frac{m+n}{mn}}
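
The two-sample version is a one-liner with scipy.stats.ks_2samp; the unequal-size samples here are simulated:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=60)   # sample of size m
y = rng.normal(0.3, 1.0, size=45)   # sample of size n, shifted location

res = ks_2samp(x, y)                # D_{m,n} = sup |F_{1,m}(x) - F_{2,n}(x)|
print(res.statistic, res.pvalue)
```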

Run Tests for Randomness

Testing randomness and detecting patterns in sequences

Run Test for Randomness

Basic Definition

Run: a maximal sequence of consecutive identical symbols

Test Statistic

R = total number of runs in the sequence

Applications

Testing randomness of binary sequences
Detecting trends in time series
Two-sample distribution comparison

Key Formulas

Expected Number of Runs
E[R] = \frac{2n_1 n_2}{n_1 + n_2} + 1

Expected runs under the randomness hypothesis, where n_1 and n_2 are the counts of the two symbol types

Variance of Runs
Var(R) = \frac{2n_1 n_2 (2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}

Variance of run count statistic

Large Sample Approximation
Z = \frac{R - E[R]}{\sqrt{Var(R)}} \sim N(0,1)

Normal approximation for large samples

Two-Sample Application
Combine the samples, label each observation 0/1 by its sample of origin, and count runs

Use for comparing two independent samples

Rejection Regions

Two-sided: R \leq R_{\alpha/2} or R \geq R_{1-\alpha/2}
Pattern detection: R \leq R_\alpha (too few runs)
Large sample: |Z| > z_{\alpha/2}
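
SciPy has no built-in run test, so here is a minimal self-contained sketch of the large-sample version built from the formulas above; the binary sequence is hypothetical:

```python
import numpy as np
from scipy.stats import norm

def runs_test(seq):
    """Two-sided large-sample run test for randomness of a binary sequence."""
    seq = np.asarray(seq)
    n1 = int(np.sum(seq == seq[0]))            # count of one symbol
    n2 = len(seq) - n1                         # count of the other symbol
    R = 1 + int(np.sum(seq[1:] != seq[:-1]))   # runs = 1 + number of switches

    ER = 2 * n1 * n2 / (n1 + n2) + 1           # E[R]
    VR = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
          / ((n1 + n2) ** 2 * (n1 + n2 - 1)))  # Var(R)
    Z = (R - ER) / np.sqrt(VR)
    return R, Z, 2 * norm.sf(abs(Z))           # two-sided p-value

# Hypothetical 0/1 sequence (e.g., above/below the sample median)
print(runs_test([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]))
```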

📋 Practical Guidelines

Essential guidelines for applying nonparametric tests effectively

Sample Size Requirements
Minimum sample sizes for valid nonparametric tests

Sign Test

Requirement: n ≥ 5 for exact test, n ≥ 20 for normal approximation

Notes: Exclude tied observations from sample size

Wilcoxon Rank Sum

Requirement: m, n ≥ 8 for normal approximation

Notes: Use exact tables for smaller samples

Chi-Square Tests

Requirement: All expected frequencies ≥ 5

Notes: Combine categories if necessary

K-S Test

Requirement: n ≥ 35 for asymptotic approximation

Notes: Use exact tables for smaller samples

Test Selection Criteria
Guidelines for choosing appropriate nonparametric tests

Single sample median

Data Type: Continuous, ordinal

Recommended: Sign Test

Alternative: Wilcoxon Signed Rank (if symmetric)

Two independent samples

Data Type: Continuous, ordinal

Recommended: Wilcoxon Rank Sum

Alternative: Kolmogorov-Smirnov

Paired samples

Data Type: Continuous, ordinal

Recommended: Wilcoxon Signed Rank

Alternative: Sign Test

Distribution fitting

Data Type: Continuous

Recommended: Kolmogorov-Smirnov

Alternative: Chi-Square Goodness-of-Fit

📊 How to Use These Formulas Effectively

Master nonparametric testing with these formula application strategies and best practices

Check Assumptions

Verify independence, randomness, and measurement scale requirements before applying tests.

Choose Appropriate Test

Select tests based on data type, sample size, and research question for optimal power.

Handle Ties Properly

Apply appropriate tie corrections and consider alternative tests when ties are extensive.
