
Nonparametric Testing Formula Reference

Complete collection of formulas for distribution-free statistical tests: sign tests, rank-based methods, chi-square tests, and Kolmogorov-Smirnov procedures with practical applications.

Distribution-Free Methods · Practical Applications · Robust Procedures

Quick Formula Reference

Essential nonparametric testing formulas for quick lookup

Sign Test Fundamentals
Basic formulas for sign-based hypothesis testing procedures

Sign Test Statistic

N^+ = \sum_{i=1}^n I(X_i - t_0 > 0)

Count of positive differences from hypothesized median

Binomial Distribution

N^+ \sim B(n, \theta), where \theta = P(X > t_0)

Distribution of sign test statistic under null hypothesis

P-value (Two-sided)

P = 2\min\{P(N^+ \leq k), P(N^+ \geq k)\}

Exact two-sided p-value for the sign test, where k is the observed value of N^+ (capped at 1)

Large Sample Approximation

Z = \frac{N^+ - n\theta}{\sqrt{n\theta(1-\theta)}} \sim N(0,1)

Normal approximation for large sample sign test

Rank-Based Test Statistics
Key formulas for Wilcoxon rank sum and signed rank tests

Rank Sum Statistic (W)

W = \sum_{i=1}^n R_i (sum of ranks for sample 2)

Wilcoxon rank sum test statistic

Expected Rank Sum

E[W] = \frac{n(m+n+1)}{2}

Expected value of rank sum under null hypothesis

Rank Sum Variance

Var(W) = \frac{mn(m+n+1)}{12}

Variance of rank sum statistic

Signed Rank Statistic

W^+ = \sum_{i=1}^n R_i I(Z_i > 0)

Sum of positive signed ranks

Chi-Square Test Statistics
Formulas for goodness-of-fit and independence testing

Chi-Square Statistic

\chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i}

General chi-square test statistic for goodness-of-fit

Independence Test

\chi^2 = \sum_{i=1}^r \sum_{j=1}^s \frac{(O_{ij} - E_{ij})^2}{E_{ij}}

Chi-square statistic for independence testing

Expected Frequency

E_{ij} = \frac{n_{i\cdot} n_{\cdot j}}{n}

Expected frequency under independence assumption

Degrees of Freedom

df = (r-1)(s-1)

Degrees of freedom for independence test

Sign Test Procedures

Complete formulas for median testing using sign information

Single Sample Median Test

Hypotheses

H_0: t_p = t_0 vs H_1: t_p \neq t_0 (two-sided)
H_0: t_p \leq t_0 vs H_1: t_p > t_0 (right-sided)
H_0: t_p \geq t_0 vs H_1: t_p < t_0 (left-sided)

Assumptions

Random sample from a continuous population
Independent observations
Population median t_p exists

Test Statistic

N^+ = \sum_{i=1}^n I(X_i > t_0) \sim B(n, \theta)

Rejection Regions

Two-sided: N^+ \leq C_1 or N^+ \geq C_2
Right-sided: N^+ \geq C^*
Left-sided: N^+ \leq C^*

Additional Formulas

Under H_0: \theta = 0.5 (median test)
Exact p-value (right tail): P = \sum_{k=N^+}^{n} \binom{n}{k} 0.5^n
Large sample: Z = \frac{N^+ - 0.5n}{0.5\sqrt{n}} \sim N(0,1)
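
As a concrete illustration, here is a minimal sketch of the exact single-sample sign test in Python, using scipy.stats.binomtest (SciPy 1.7+); the data values are hypothetical:

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical sample; test H0: median = 10 against a two-sided alternative
x = np.array([12.1, 9.4, 11.8, 10.6, 13.2, 9.9, 12.5, 10.1, 11.0, 8.7])
t0 = 10.0

diffs = x - t0
diffs = diffs[diffs != 0]          # drop observations equal to t0 (ties)
n_plus = int(np.sum(diffs > 0))    # N+ = count of positive differences

# Under H0, N+ ~ B(n, 0.5); exact two-sided binomial p-value
result = binomtest(n_plus, n=len(diffs), p=0.5, alternative='two-sided')
print(n_plus, result.pvalue)
```
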
Paired Sample Sign Test

Hypotheses

H_0: P(X > Y) = 0.5 vs H_1: P(X > Y) \neq 0.5
H_0: Median(X - Y) = 0 vs H_1: Median(X - Y) \neq 0

Assumptions

Paired observations (X_i, Y_i), i = 1, \ldots, n
Differences Z_i = X_i - Y_i are independent
Population of differences is continuous

Test Statistic

N^+ = \sum_{i=1}^n I(Z_i > 0) \sim B(n', 0.5)

Rejection Regions

Two-sided: N^+ \leq C_1 or N^+ \geq C_2
where n' = number of non-zero differences

Additional Formulas

Tied observations (Z_i = 0) are excluded
Effective sample size: n' = n - (number of ties)
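
The paired version reduces to the same binomial test after dropping zero differences. A short sketch, again with hypothetical before/after values:

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical paired measurements (e.g., before/after a treatment)
x = np.array([7.2, 6.8, 8.1, 5.9, 7.7, 6.3, 8.4, 7.0])
y = np.array([6.5, 7.0, 7.4, 5.1, 7.2, 6.3, 7.9, 6.2])

z = x - y
z = z[z != 0]                  # exclude ties: n' = n - (number of zeros)
n_plus = int(np.sum(z > 0))    # N+ ~ B(n', 0.5) under H0

print(binomtest(n_plus, n=len(z), p=0.5).pvalue)
```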

Wilcoxon Rank Tests

Rank-based procedures for sample comparisons

Wilcoxon Rank Sum Test (Mann-Whitney U)

Hypotheses

H_0: F(x) = G(x) vs H_1: F(x) \neq G(x)
H_0: P(X > Y) = 0.5 vs H_1: P(X > Y) \neq 0.5

Assumptions

Two independent random samples
X_1, \ldots, X_m \sim F(x) and Y_1, \ldots, Y_n \sim G(x)
Observations are continuous (no ties)

Test Statistic

W = \sum_{i=1}^n R_i (sum of ranks for sample 2)

Distribution Properties

E[W] = \frac{n(m+n+1)}{2}
Var(W) = \frac{mn(m+n+1)}{12}
Range: \frac{n(n+1)}{2} \leq W \leq \frac{n(2m+n+1)}{2}

Rejection Regions

Two-sided: W \leq W_{\alpha/2} or W \geq W_{1-\alpha/2}
Large sample: |Z| > z_{\alpha/2}, where Z = \frac{W - E[W]}{\sqrt{Var(W)}}

Tie Correction

Variance with ties: Var(W) = \frac{mn}{12}\left[(m+n+1) - \frac{\sum t_i^3 - \sum t_i}{(m+n)(m+n-1)}\right]
where t_i = size of the i-th group of observations tied at the same value
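
A sketch that computes W and its normal approximation from the definitions above, cross-checked against scipy.stats.mannwhitneyu (the Mann-Whitney U test is equivalent to the rank sum test); the two samples are hypothetical:

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

# Hypothetical independent samples: x of size m, y of size n
x = np.array([1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30])
y = np.array([0.88, 0.65, 0.60, 2.05, 1.06, 1.29, 1.12])

m, n = len(x), len(y)
ranks = rankdata(np.concatenate([x, y]))  # mid-ranks if ties occur
W = ranks[m:].sum()                       # W = sum of ranks for sample 2

EW = n * (m + n + 1) / 2                  # E[W]
VW = m * n * (m + n + 1) / 12             # Var(W), no-tie formula
Z = (W - EW) / np.sqrt(VW)                # large-sample approximation

res = mannwhitneyu(x, y, alternative='two-sided')  # equivalent U-statistic test
print(W, Z, res.pvalue)
```
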
Wilcoxon Signed Rank Test

Hypotheses

H_0: population is symmetric about \theta_0
H_0: Median(Z) = 0 vs H_1: Median(Z) \neq 0

Assumptions

Paired sample differences Z_i = X_i - Y_i
Differences are independent and continuous
Population of differences is symmetric

Test Statistic

W^+ = \sum_{i=1}^n R_i I(Z_i > 0), where R_i = rank of |Z_i|

Distribution Properties

E[W^+] = \frac{n(n+1)}{4}
Var(W^+) = \frac{n(n+1)(2n+1)}{24}
Range: 0 \leq W^+ \leq \frac{n(n+1)}{2}

Rejection Regions

Two-sided: W^+ \leq W^+_{\alpha/2} or W^+ \geq W^+_{1-\alpha/2}
Large sample: |Z| > z_{\alpha/2}, where Z = \frac{W^+ - E[W^+]}{\sqrt{Var(W^+)}}
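
A sketch computing W+ from its definition, with scipy.stats.wilcoxon as a cross-check (it reports exact p-values for small samples without ties); the paired differences are hypothetical:

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

# Hypothetical paired differences Z_i = X_i - Y_i
z = np.array([1.2, -0.4, 2.1, 0.8, -1.5, 0.3, 1.9, -0.7, 1.1, 0.6])
z = z[z != 0]                        # drop zero differences, as usual

ranks = rankdata(np.abs(z))          # rank the |Z_i| (mid-ranks for ties)
W_plus = ranks[z > 0].sum()          # W+ = sum of positive signed ranks

n = len(z)
EW = n * (n + 1) / 4                 # E[W+]
VW = n * (n + 1) * (2 * n + 1) / 24  # Var(W+)
Z_stat = (W_plus - EW) / np.sqrt(VW)

print(W_plus, Z_stat, wilcoxon(z).pvalue)
```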

Chi-Square Goodness-of-Fit Tests

Testing distributional assumptions with categorical data

Simple Goodness-of-Fit (No Parameters)

Hypotheses

H_0: P(X \in A_i) = p_i, i = 1, \ldots, k
H_1: P(X \in A_i) \neq p_i for at least one i

Assumptions

Multinomial counts: (n_1, \ldots, n_k) \sim Multinomial(n, p_1, \ldots, p_k)
Expected frequencies E_i = np_i \geq 5 for all i
\sum_{i=1}^k p_i = 1

Test Statistic

\chi^2 = \sum_{i=1}^k \frac{(O_i - E_i)^2}{E_i} \sim \chi^2(k-1)

Rejection Regions

Reject H_0 if \chi^2 > \chi^2_{\alpha}(k-1)

Additional Formulas

Degrees of freedom: df = k - 1
Alternative form: \chi^2 = \sum_{i=1}^k \frac{O_i^2}{E_i} - n
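
For the fully specified case, scipy.stats.chisquare applies these formulas directly. A sketch with hypothetical die-roll counts:

```python
import numpy as np
from scipy.stats import chisquare

# Hypothetical die-fairness check: H0: p_i = 1/6 for each face, n = 120
observed = np.array([18, 23, 16, 21, 18, 24])     # O_i
expected = np.full(6, observed.sum() / 6)         # E_i = n p_i = 20

stat, pval = chisquare(observed, f_exp=expected)  # df = k - 1 = 5
print(stat, pval)
```
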
Composite Goodness-of-Fit (With Parameters)

Hypotheses

H_0: X \sim F(x; \theta_1, \ldots, \theta_m)
H_1: X \not\sim F(x; \theta_1, \ldots, \theta_m)

Assumptions

Parameters \theta_1, \ldots, \theta_m are estimated from the data
Maximum likelihood estimates \hat{\theta}_1, \ldots, \hat{\theta}_m
Large sample size and adequate expected frequencies

Test Statistic

\chi^2 = \sum_{i=1}^k \frac{(O_i - \hat{E}_i)^2}{\hat{E}_i} \sim \chi^2(k-m-1)

Rejection Regions

Reject H_0 if \chi^2 > \chi^2_{\alpha}(k-m-1)

Additional Formulas

Degrees of freedom: df = k - m - 1
\hat{E}_i = n p_i(\hat{\theta}_1, \ldots, \hat{\theta}_m)
Common cases: Normal (m = 2), Exponential (m = 1), Poisson (m = 1)
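
When parameters are estimated first, the reduced degrees of freedom can be passed to scipy.stats.chisquare through its ddof argument. A sketch for a Poisson fit (m = 1) with hypothetical event counts, pooling the upper tail so expected frequencies are not too small:

```python
import numpy as np
from scipy.stats import poisson, chisquare

# Hypothetical counts of events per interval; H0: X ~ Poisson(lambda)
counts = np.array([3, 1, 2, 0, 4, 2, 1, 3, 2, 2, 5, 1, 0, 2, 3,
                   2, 1, 4, 2, 3, 1, 2, 0, 3, 2, 2, 1, 3, 4, 2])
n = len(counts)
lam_hat = counts.mean()              # MLE of lambda (m = 1 parameter)

# Categories {0, 1, 2, 3, >=4}; the tail is pooled into one cell
O = np.array([np.sum(counts == k) for k in range(4)] + [np.sum(counts >= 4)])
p = np.append(poisson.pmf(range(4), lam_hat), poisson.sf(3, lam_hat))
E = n * p                            # estimated expected frequencies E_i

# ddof=1 shifts the reference distribution from chi2(k-1) to chi2(k-m-1)
stat, pval = chisquare(O, f_exp=E, ddof=1)
print(stat, pval)
```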

Chi-Square Independence Tests

Testing associations in contingency tables

r × s Contingency Table Test

Hypotheses

H_0: row and column variables are independent
H_0: p_{ij} = p_{i\cdot} p_{\cdot j} for all i, j
H_1: the variables are associated

Assumptions

Random sample of size n
Each observation is classified by two categorical variables
Expected frequencies E_{ij} \geq 5 (approximately)

Test Statistic

\chi^2 = \sum_{i=1}^r \sum_{j=1}^s \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2((r-1)(s-1))

Expected Frequencies

E_{ij} = \frac{n_{i\cdot} n_{\cdot j}}{n}
Row totals: n_{i\cdot} = \sum_{j=1}^s O_{ij}
Column totals: n_{\cdot j} = \sum_{i=1}^r O_{ij}
Grand total: n = \sum_{i=1}^r \sum_{j=1}^s O_{ij}

Rejection Regions

Reject H_0 if \chi^2 > \chi^2_{\alpha}((r-1)(s-1))

Special Cases

2×2 table (cell counts a, b, c, d): \chi^2 = \frac{n(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
Yates' continuity correction: \chi^2 = \frac{n(|ad - bc| - n/2)^2}{(a+b)(c+d)(a+c)(b+d)}
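
In practice scipy.stats.chi2_contingency carries out the whole computation: expected frequencies, degrees of freedom, and p-value. A sketch with a hypothetical 2×3 table:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x3 contingency table of observed counts O_ij
table = np.array([[30, 45, 25],
                  [20, 30, 50]])

# E_ij = n_i. * n_.j / n and df = (r-1)(s-1) are computed internally;
# for 2x2 tables the default correction=True applies Yates' correction
stat, pval, df, expected = chi2_contingency(table)
print(stat, pval, df)
print(expected)
```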

Kolmogorov-Smirnov Tests

Distribution-free tests using empirical distribution functions

One-Sample K-S Test

Hypotheses

H_0: F(x) = F_0(x) for all x
H_1: F(x) \neq F_0(x) for some x

Assumptions

Random sample X_1, \ldots, X_n from a continuous distribution
F_0(x) is completely specified (no unknown parameters)
Sample size sufficiently large for the asymptotic approximation

Test Statistic

D_n = \sup_x |F_n(x) - F_0(x)|

Rejection Regions

Reject H_0 if D_n > D_{n,\alpha}
Large sample: D_n > \frac{K_\alpha}{\sqrt{n}}, where K_{0.05} \approx 1.36

Asymptotic Distribution

\lim_{n \to \infty} P(\sqrt{n}\, D_n \leq x) = K(x) = 1 - 2\sum_{j=1}^\infty (-1)^{j-1} e^{-2j^2 x^2}

Empirical Distribution Function

F_n(x) = \frac{1}{n} \sum_{i=1}^n I(X_i \leq x)
Step function that increases by 1/n at each observation
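
A sketch using scipy.stats.kstest against a fully specified N(0, 1); the sample is simulated. Note that if the parameters of F_0 were instead estimated from the same data, the plain K-S critical values would be conservative (Lilliefors' test is the usual remedy):

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
x = rng.normal(loc=0.0, scale=1.0, size=50)   # simulated sample

# F0 completely specified: standard normal with known mu = 0, sigma = 1
res = kstest(x, 'norm', args=(0.0, 1.0))
print(res.statistic, res.pvalue)              # D_n and its p-value
```
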
Two-Sample K-S Test

Hypotheses

H_0: F_1(x) = F_2(x) for all x
H_1: F_1(x) \neq F_2(x) for some x

Assumptions

Two independent samples from continuous distributions
Sample sizes m and n (not necessarily equal)
No unknown parameters in either distribution

Test Statistic

D_{m,n} = \sup_x |F_{1,m}(x) - F_{2,n}(x)|

Rejection Regions

Reject H_0 if D_{m,n} > D_{m,n,\alpha}
Large samples: D_{m,n} > K_\alpha \sqrt{\frac{m+n}{mn}}
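
The two-sample version is a one-liner with scipy.stats.ks_2samp; the unequal-size samples here are simulated:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=60)   # sample of size m
y = rng.normal(0.3, 1.0, size=45)   # sample of size n, shifted location

res = ks_2samp(x, y)                # D_{m,n} = sup |F_{1,m}(x) - F_{2,n}(x)|
print(res.statistic, res.pvalue)
```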

Run Tests for Randomness

Testing randomness and detecting patterns in sequences

Run Test for Randomness

Basic Definition

Run: a maximal sequence of consecutive identical symbols

Test Statistic

R = total number of runs in the sequence

Applications

Testing randomness of binary sequences
Detecting trends in time series
Two-sample distribution comparison

Key Formulas

Expected Number of Runs
E[R] = \frac{2n_1 n_2}{n_1 + n_2} + 1

Expected runs under the randomness hypothesis, where n_1 and n_2 are the counts of the two symbol types

Variance of Runs
Var(R) = \frac{2n_1 n_2 (2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}

Variance of run count statistic

Large Sample Approximation
Z = \frac{R - E[R]}{\sqrt{Var(R)}} \sim N(0,1)

Normal approximation for large samples

Two-Sample Application
Combine the samples, label each observation 0/1 by its sample of origin, and count runs

Use for comparing two independent samples

Rejection Regions

Two-sided: R \leq R_{\alpha/2} or R \geq R_{1-\alpha/2}
Pattern detection: R \leq R_\alpha (too few runs)
Large sample: |Z| > z_{\alpha/2}
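
SciPy has no built-in run test, so here is a minimal self-contained sketch of the large-sample version built from the formulas above; the binary sequence is hypothetical:

```python
import numpy as np
from scipy.stats import norm

def runs_test(seq):
    """Two-sided large-sample run test for randomness of a binary sequence."""
    seq = np.asarray(seq)
    n1 = int(np.sum(seq == seq[0]))            # count of one symbol
    n2 = len(seq) - n1                         # count of the other symbol
    R = 1 + int(np.sum(seq[1:] != seq[:-1]))   # runs = 1 + number of switches

    ER = 2 * n1 * n2 / (n1 + n2) + 1           # E[R]
    VR = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
          / ((n1 + n2) ** 2 * (n1 + n2 - 1)))  # Var(R)
    Z = (R - ER) / np.sqrt(VR)
    return R, Z, 2 * norm.sf(abs(Z))           # two-sided p-value

# Hypothetical 0/1 sequence (e.g., above/below the sample median)
print(runs_test([1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0]))
```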

📋 Practical Guidelines

Essential guidelines for applying nonparametric tests effectively

Sample Size Requirements
Minimum sample sizes for valid nonparametric tests

Sign Test

Requirement: n ≥ 5 for exact test, n ≥ 20 for normal approximation

Notes: Exclude tied observations from sample size

Wilcoxon Rank Sum

Requirement: m, n ≥ 8 for normal approximation

Notes: Use exact tables for smaller samples

Chi-Square Tests

Requirement: All expected frequencies ≥ 5

Notes: Combine categories if necessary

K-S Test

Requirement: n ≥ 35 for asymptotic approximation

Notes: Use exact tables for smaller samples

Test Selection Criteria
Guidelines for choosing appropriate nonparametric tests

Single sample median

Data Type: Continuous, ordinal

Recommended: Sign Test

Alternative: Wilcoxon Signed Rank (if symmetric)

Two independent samples

Data Type: Continuous, ordinal

Recommended: Wilcoxon Rank Sum

Alternative: Kolmogorov-Smirnov

Paired samples

Data Type: Continuous, ordinal

Recommended: Wilcoxon Signed Rank

Alternative: Sign Test

Distribution fitting

Data Type: Continuous

Recommended: Kolmogorov-Smirnov

Alternative: Chi-Square Goodness-of-Fit

📊 How to Use These Formulas Effectively

Master nonparametric testing with these formula application strategies and best practices

Check Assumptions

Verify independence, randomness, and measurement scale requirements before applying tests.

Choose Appropriate Test

Select tests based on data type, sample size, and research question for optimal power.

Handle Ties Properly

Apply appropriate tie corrections and consider alternative tests when ties are extensive.
