
Sufficient & Complete Statistics

Master the fundamental concepts of sufficient and complete statistics, their theoretical foundations, and applications in optimal statistical inference.

Learning Objectives
Master the concepts of sufficient statistics and their role in statistical inference
Understand the Fisher-Neyman Factorization Theorem and its applications
Learn about complete statistics and their importance in optimal estimation
Apply the Rao-Blackwell and Lehmann-Scheffé theorems in practice
Explore the relationship between sufficiency and completeness
Understand Basu's theorem and independence properties

Essential Definitions

Sufficient Statistic

A statistic T(X̃) that captures all the information about θ carried by the sample: given T = t, the conditional distribution of X̃ does not depend on θ.

P(X̃ = x̃ | T = t; θ) is independent of θ
Complete Statistic

A statistic T for which the only function of T whose expectation is zero for every θ is the zero function (with probability 1).

E_θ[φ(T)] = 0 ∀θ ⇒ P_θ(φ(T) = 0) = 1 ∀θ
Factorization Theorem

T(X̃) is sufficient for θ if and only if the joint density can be factored as p(x̃;θ) = g(T(x̃);θ)h(x̃).

p(x̃;θ) = g(T(x̃);θ) × h(x̃)

Sufficient Statistics

Concept of Sufficient Statistics
A sufficient statistic captures all the information about the parameter contained in the sample

Key Properties:

Simplifies inference without loss of information about the parameter
Reduces data dimensionality while preserving statistical properties
Forms the foundation for optimal estimation theory

Examples:

Binomial B(n,p): T = ΣXᵢ (total successes) is sufficient for p

Normal N(μ,σ²): T = (ΣXᵢ, ΣXᵢ²) is sufficient for (μ,σ²)

Poisson P(λ): T = ΣXᵢ (total count) is sufficient for λ

Uniform U(0,θ): T = X₍ₙ₎ (maximum order statistic) is sufficient for θ
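The Bernoulli/binomial case can be illustrated directly by simulation: once we condition on T = ΣXᵢ, the distribution of the sample no longer depends on p. Below is a minimal sketch (the sample size n = 5, conditioning value t = 2, the two p values, and the random seed are illustrative choices, not from the text):

```python
# Illustration of sufficiency: given T = sum(X_i), the conditional distribution of the
# Bernoulli sample is the same no matter which p generated the data.
import numpy as np

rng = np.random.default_rng(0)
n, t, reps = 5, 2, 200_000

for p in (0.3, 0.7):
    samples = rng.binomial(1, p, size=(reps, n))
    cond = samples[samples.sum(axis=1) == t]          # keep only samples with T = t
    # Given T = t, each coordinate equals 1 with probability t/n, regardless of p.
    print(f"p={p}:  P(X1=1 | T={t}) ≈ {cond[:, 0].mean():.3f}   (theory: t/n = {t/n})")
```

Both runs report P(X₁ = 1 | T = 2) ≈ 0.4 = t/n, whichever p generated the data.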

Fisher-Neyman Factorization Theorem
The fundamental criterion for identifying sufficient statistics

Theorem Statement:

T(X̃) is sufficient for θ if and only if the joint density/mass function can be written as:

p(x̃;θ) = g(T(x̃);θ) × h(x̃)

Components:

1. g(T(x̃);θ): depends on data only through T(x̃) and on parameter θ
2. h(x̃): depends on data x̃ but is independent of parameter θ

Detailed Examples:

Normal N(μ,σ²)
Joint Density:
p(x̃;μ,σ²) = (2πσ²)^(-n/2) exp{-Σ(xᵢ-μ)²/(2σ²)}
Factorization:
g(T;μ,σ²) = (2πσ²)^(-n/2) exp{μΣxᵢ/σ² - nμ²/(2σ²) - Σxᵢ²/(2σ²)}, h(x̃) = 1
Sufficient Statistic:
T = (Σxᵢ, Σxᵢ²)
Poisson P(λ)
Joint Density:
p(x̃;λ) = λ^(Σxᵢ) e^(-nλ) / Π(xᵢ!)
Factorization:
g(T;λ) = λ^T e^(-nλ), h(x̃) = 1/Π(xᵢ!)
Sufficient Statistic:
T = Σxᵢ
Uniform U(0,θ)
Joint Density:
p(x̃;θ) = θ^(-n) I{0 ≤ x₍₁₎ ≤ x₍ₙ₎ ≤ θ}
Factorization:
g(T;θ) = θ^(-n) I{T ≤ θ}, h(x̃) = I{x₍₁₎ ≥ 0}
Sufficient Statistic:
T = X₍ₙ₎
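As a sanity check on the factorization idea, the Poisson case above can be verified numerically: the product g(T;λ)·h(x̃) reproduces the joint pmf. A small sketch with arbitrary illustrative values (λ = 2.5 and the sample of size 4 are assumptions for the example, not from the text):

```python
# Numeric check that the Poisson factorization reproduces the joint pmf:
# p(x; lambda) = g(T; lambda) * h(x).
from math import exp, factorial, prod

lam = 2.5
x = [1, 0, 3, 2]                                  # arbitrary sample of size n = 4
n, T = len(x), sum(x)

joint = prod(lam**xi * exp(-lam) / factorial(xi) for xi in x)   # product of the Poisson pmfs
g = lam**T * exp(-n * lam)                        # depends on the data only through T
h = 1.0 / prod(factorial(xi) for xi in x)         # does not involve lambda

print(joint, g * h)                               # the two values agree up to rounding
```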
Rao-Blackwell Theorem
Demonstrates how sufficient statistics improve estimation efficiency

Theorem Statement:

If T is sufficient for θ and φ(X̃) is an unbiased estimator of g(θ), then:

ĝ(T) = E[φ(X̃)|T] is also unbiased for g(θ) with Var_θ(ĝ(T)) ≤ Var_θ(φ(X̃))

Key Implications:

Sufficient statistics allow variance reduction without bias
Optimal unbiased estimators must be functions of sufficient statistics
Provides systematic method for improving estimators

Practical Example:

Setup: Bernoulli sample X₁,...,Xₙ ~ B(1,p)
Original Estimator: φ(X̃) = X₁ with Var(X₁) = p(1-p)
Sufficient Statistic: T = ΣXᵢ is sufficient for p
Improved Estimator: ĝ(T) = E[X₁|T] = T/n = X̄ with Var(X̄) = p(1-p)/n ≤ Var(X₁)
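A hedged simulation sketch of this improvement (the choices n = 10, p = 0.3, the replication count, and the seed are illustrative): conditioning X₁ on T = ΣXᵢ yields X̄, whose variance is roughly p(1-p)/n instead of p(1-p).

```python
# Rao-Blackwell improvement in the Bernoulli example: start from phi = X1 and
# condition on T = sum(X_i) to get X-bar, which has much smaller variance.
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 10, 0.3, 100_000

samples = rng.binomial(1, p, size=(reps, n))
phi = samples[:, 0]                 # original unbiased estimator: X1
improved = samples.mean(axis=1)     # E[X1 | T] = T/n = X-bar

print("Var(X1)    ≈", phi.var(), "   theory:", p * (1 - p))
print("Var(X-bar) ≈", improved.var(), "   theory:", p * (1 - p) / n)
```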

Complete Statistics

Complete Statistics
Statistics for which any function with zero expectation for every θ must be the zero function (almost surely)

Intuitive Understanding:

Completeness ensures that unbiased estimators based on the statistic are unique: two functions of T with the same expectation for every θ must agree almost surely

If E_θ[φ(T)] = 0 for all θ ∈ Θ, then P_θ(φ(T) = 0) = 1 for all θ ∈ Θ

Statistical Significance:

Guarantees uniqueness of UMVUE when combined with sufficiency
Eliminates non-zero functions with zero expectation
Essential for Lehmann-Scheffé theorem applications
Examples of Complete Statistics
Binomial B(n,p), 0 < p < 1
Statistic: T ~ B(n,p)
Proof Outline:

If E_p[φ(T)] = Σφ(k)C(n,k)p^k(1-p)^(n-k) = 0 for all p ∈ (0,1), dividing by (1-p)^n and substituting θ = p/(1-p) gives the polynomial identity Σφ(k)C(n,k)θ^k = 0 for all θ > 0. A polynomial that vanishes for all θ > 0 has all coefficients equal to zero, so φ(k) = 0 for k = 0,1,...,n.

Conclusion: T is complete
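The polynomial argument can also be viewed in finite-dimensional terms: for a fixed small n, evaluating E_p[φ(T)] at n+1 distinct values of p produces a linear system Mφ = 0 whose matrix has full rank, forcing φ = 0. A short numerical illustration of the proof outline (n = 4 and the chosen p values are arbitrary; this illustrates the argument, it does not replace it):

```python
# For B(n, p), the map phi -> (E_p[phi(T)] at n+1 distinct p's) has an invertible matrix,
# so the only phi with zero expectation everywhere is phi = 0.
import numpy as np
from math import comb

n = 4
ps = np.linspace(0.1, 0.9, n + 1)                        # any n+1 distinct p's in (0,1)
M = np.array([[comb(n, k) * p**k * (1 - p)**(n - k)      # M[i, k] = P_{p_i}(T = k)
               for k in range(n + 1)] for p in ps])

print("rank of M:", np.linalg.matrix_rank(M))            # full rank n+1, so M phi = 0 forces phi = 0
```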
Normal N(μ,σ²), μ ∈ ℝ, σ > 0
Statistic: T = (ΣXᵢ, ΣXᵢ²)
Proof Outline:

The two-parameter normal family is a full-rank exponential family with natural statistic T = (ΣXᵢ, ΣXᵢ²), so the standard completeness theorem for exponential families applies: if E[φ(T)] = 0 for all (μ,σ²), then φ must be zero almost everywhere.

Conclusion: T is complete
Uniform U(0,θ), θ > 0
Statistic: T = X₍ₙ₎ with density p_T(t;θ) = nt^(n-1)/θ^n, 0 < t < θ
Proof Outline:

If E_θ[φ(T)] = ∫₀^θ φ(t)(nt^(n-1)/θ^n)dt = 0 for all θ > 0, then multiplying by θ^n/n gives ∫₀^θ φ(t)t^(n-1)dt = 0 for all θ > 0; differentiating with respect to θ gives φ(θ)θ^(n-1) = 0, so φ(t) = 0 for almost all t > 0.

Conclusion: T is complete
Non-Complete Example: Normal N(0,σ²), σ > 0
Statistic: X₁
Counterexample:

φ(X₁) = X₁ has E[φ(X₁)] = 0 for all σ², but P(X₁ = 0) = 0 ≠ 1

Conclusion: X₁ is not complete (the family of distributions {N(0,σ²): σ > 0} induced on X₁ is not complete)
Lehmann-Scheffé Theorem
Core theorem for constructing unique UMVUE using sufficient complete statistics

Theorem Statement:

If S is a sufficient complete statistic for θ and φ(X̃) is an unbiased estimator of g(θ), then:

ĝ = E[φ(X̃)|S] is the unique UMVUE of g(θ)

Corollary:

If h(S) is a function of sufficient complete statistic S with E_θ[h(S)] = g(θ), then h(S) is the unique UMVUE of g(θ)

Applications:

Normal N(μ,σ²)
Sufficient Complete: S = (ΣXᵢ, ΣXᵢ²)
Parameter: μ
UMVUE: X̄ = (ΣXᵢ)/n
Verification: E[X̄] = μ and X̄ is a function of S
Normal N(μ,σ²)
Sufficient Complete: S = (ΣXᵢ, ΣXᵢ²)
Parameter: σ²
UMVUE: S² = Σ(Xᵢ-X̄)²/(n-1)
Verification: E[S²] = σ² and S² is a function of S
Poisson P(λ)
Sufficient Complete: S = ΣXᵢ
Parameter: λ
UMVUE: X̄ = S/n
Verification: E[X̄] = λ and X̄ is a function of S
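A quick simulation sketch of what "uniformly minimum variance" buys in the normal case: the sample median is also unbiased for μ (by the symmetry of the normal distribution), but the UMVUE X̄ has smaller variance. The values μ = 1, σ = 2, n = 15, and the seed are illustrative assumptions:

```python
# Compare two unbiased estimators of a normal mean: the UMVUE X-bar and the sample
# median.  Both are unbiased; the UMVUE has the smaller variance, as Lehmann-Scheffe predicts.
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 1.0, 2.0, 15, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
med = np.median(samples, axis=1)

print("E[X-bar]  ≈", xbar.mean(), "  Var(X-bar)  ≈", xbar.var())
print("E[median] ≈", med.mean(), "  Var(median) ≈", med.var())
```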
Basu's Theorem
Establishes independence between sufficient complete statistics and ancillary statistics

Theorem Statement:

If T is a sufficient complete statistic for θ and V is an ancillary statistic (distribution independent of θ), then T and V are independent for all θ ∈ Θ

Key Concepts:

Ancillary statistic: distribution does not depend on θ
Independence follows from sufficiency and completeness
Useful for proving independence in specific problems

Example Application:

Setup: Normal N(μ,σ²) with sample X₁,...,Xₙ
Sufficient Complete: (X̄, S²) is sufficient complete for (μ,σ²)
Ancillary Statistic: sample skewness Skew = √n Σ(Xᵢ-X̄)³ / (Σ(Xᵢ-X̄)²)^(3/2)
Reasoning: Skewness distribution is invariant under location-scale transformations Yᵢ = (Xᵢ-μ)/σ
Conclusion: (X̄, S²) and Skew are independent by Basu's theorem
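A simulation sketch consistent with this conclusion (μ, σ, n, and the seed are arbitrary choices): the sample correlations between (X̄, S²) and the skewness come out near zero. Zero correlation does not by itself prove independence; the simulation only illustrates what Basu's theorem guarantees.

```python
# For normal samples, (X-bar, S^2) is independent of the sample skewness (Basu's theorem),
# so their empirical correlations should be close to zero.
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.5, 1.5, 20, 50_000

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
centered = x - xbar[:, None]
skew = np.sqrt(n) * (centered**3).sum(axis=1) / (centered**2).sum(axis=1) ** 1.5

print("corr(X-bar, Skew) ≈", np.corrcoef(xbar, skew)[0, 1])
print("corr(S^2,   Skew) ≈", np.corrcoef(s2, skew)[0, 1])
```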
Sufficiency vs Completeness: Key Differences
Aspect | Sufficient Statistics | Complete Statistics
Core Purpose | Captures all parameter information in the sample | Ensures uniqueness of unbiased functions of the statistic
Mathematical Criterion | Factorization theorem: p(x̃;θ) = g(T(x̃);θ)h(x̃) | E_θ[φ(T)] = 0 ∀θ ⟹ φ(T) = 0 a.s.
Statistical Role | Data reduction without information loss | Uniqueness guarantee for optimal estimators
Mutual Implication | Not necessarily complete (e.g., U(θ-1/2, θ+1/2)) | Not necessarily sufficient (depends on the family)
Combined Power | With completeness: enables the Lehmann-Scheffé theorem | With sufficiency: yields the unique UMVUE
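The U(θ-1/2, θ+1/2) entry can be made concrete with a standard counterexample, sketched here (it uses the known expected range (n-1)/(n+1) of a uniform sample on an interval of length 1): the sufficient statistic T = (X₍₁₎, X₍ₙ₎) is not complete because the centered sample range has zero expectation for every θ without being the zero function.

```latex
% Sketch: T = (X_{(1)}, X_{(n)}) is sufficient for U(\theta-1/2, \theta+1/2) but not complete.
% The range R = X_{(n)} - X_{(1)} is ancillary with E_\theta[R] = (n-1)/(n+1) for every \theta, so
\varphi(T) \;=\; X_{(n)} - X_{(1)} - \frac{n-1}{n+1}
\quad\text{satisfies}\quad
E_\theta[\varphi(T)] = 0 \;\;\forall\theta,
\qquad\text{yet}\qquad
P_\theta\bigl(\varphi(T) = 0\bigr) = 0 \neq 1.
```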

Practical Applications

UMVUE Construction
Use sufficient complete statistics to find unique optimal unbiased estimators (a worked sketch follows the steps below)
1. Identify sufficient complete statistic using factorization theorem
2. Find any unbiased estimator for the parameter
3. Apply Lehmann-Scheffé theorem: E[estimator|sufficient complete] = UMVUE
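A worked sketch of these three steps for an example not covered above (the target g(λ) = e^(-λ) = P(X = 0) for Poisson data, with n = 12 and λ = 1.7 chosen arbitrarily): the crude unbiased estimator I{X₁ = 0} is Rao-Blackwellized into (1 - 1/n)^S, which by Lehmann-Scheffé is the unique UMVUE.

```python
# UMVUE construction for g(lambda) = exp(-lambda) with Poisson data:
#   1. S = sum(X_i) is sufficient and complete.
#   2. phi = I{X1 = 0} is unbiased for exp(-lambda).
#   3. E[phi | S] = (1 - 1/n)^S, since X1 | S = s ~ B(s, 1/n); this is the unique UMVUE.
import numpy as np

rng = np.random.default_rng(4)
lam, n, reps = 1.7, 12, 200_000

x = rng.poisson(lam, size=(reps, n))
S = x.sum(axis=1)
crude = (x[:, 0] == 0).astype(float)     # step 2: unbiased but noisy
umvue = (1 - 1 / n) ** S                 # step 3: Lehmann-Scheffe result

print("target exp(-lambda)     =", np.exp(-lam))
print("mean of crude estimator ≈", crude.mean(), "  var ≈", crude.var())
print("mean of UMVUE           ≈", umvue.mean(), "  var ≈", umvue.var())
```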
Variance Improvement
Apply Rao-Blackwell theorem to reduce estimator variance
1. Start with any unbiased estimator
2. Find sufficient statistic for the parameter
3. Compute conditional expectation given sufficient statistic
4. Result has smaller or equal variance
Independence Testing
Use Basu's theorem to establish independence properties
1. Identify sufficient complete statistic
2. Verify ancillary property of other statistic
3. Apply Basu's theorem to conclude independence
4. Use independence for further inference