Advanced Theory
4-6 Hours

Sufficient & Complete Statistics

Master the theory of sufficient and complete statistics for optimal estimation

Learning Objectives
  • Master the concepts of sufficient statistics and their role in statistical inference
  • Understand the Fisher-Neyman Factorization Theorem and its applications
  • Learn about complete statistics and their importance in optimal estimation
  • Apply the Rao-Blackwell and Lehmann-Scheffé theorems in practice
  • Explore the relationship between sufficiency and completeness
  • Understand Basu's theorem and independence properties

Essential Definitions

Core concepts in sufficient and complete statistics

Sufficient Statistic

A statistic T(X̃) that contains all information about θ contained in the sample. Given T=t, the conditional distribution of X̃ is independent of θ.

P(\tilde{X} = \tilde{x} \mid T = t; \theta) \text{ is independent of } \theta
Complete Statistic

A statistic T for which the only function of T with expectation zero for every θ is the zero function (with probability 1).

E_\theta[\phi(T)] = 0 \; \forall \theta \Rightarrow P_\theta(\phi(T) = 0) = 1 \; \forall \theta
Factorization Theorem

T(X̃) is sufficient for θ if and only if the joint density can be factored as p(x̃;θ) = g(T(x̃);θ)h(x̃).

p(\tilde{x};\theta) = g(T(\tilde{x});\theta) \times h(\tilde{x})

Sufficient Statistics

Statistics that capture all parameter information from the sample

Concept of Sufficient Statistics
A sufficient statistic captures all the information about the parameter contained in the sample

Key Properties:

Simplifies inference without loss of information about the parameter
Reduces data dimensionality while preserving statistical properties
Forms the foundation for optimal estimation theory

Examples:

Binomial B(n,p): T = \sum X_i is sufficient for p

Normal N(μ,σ²): T = (\sum X_i, \sum X_i^2) is sufficient for (\mu, \sigma^2)

Poisson P(λ): T = \sum X_i is sufficient for \lambda

Uniform U(0,θ): T = X_{(n)} is sufficient for \theta
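
The claim behind these examples can be checked empirically. The sketch below (assuming NumPy is available; the sample size, the values p = 0.2 and 0.7, and the seed are arbitrary illustrative choices) conditions Bernoulli samples on T = ΣXᵢ and shows that the conditional distribution of the full sample pattern is the same for different p, which is exactly what sufficiency of T asserts.

import numpy as np

# Minimal sketch: for Bernoulli(p) samples, the conditional distribution of the
# sample pattern given T = sum(X_i) should not depend on p (sufficiency of T).
rng = np.random.default_rng(0)
n, t, reps = 3, 1, 200_000

for p in (0.2, 0.7):
    samples = rng.binomial(1, p, size=(reps, n))
    keep = samples[samples.sum(axis=1) == t]          # condition on T = t
    patterns, counts = np.unique(keep, axis=0, return_counts=True)
    freq = counts / counts.sum()
    print(f"p = {p}: conditional frequencies of patterns with T = {t}: {np.round(freq, 3)}")
# Both runs give roughly (1/3, 1/3, 1/3): the conditional law of the sample
# given T is free of p, exactly what sufficiency of T = sum(X_i) asserts.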

Fisher-Neyman Factorization Theorem
The fundamental criterion for identifying sufficient statistics

Theorem Statement:

T(X̃) is sufficient for θ if and only if the joint density/mass function can be written as:

p(\tilde{x};\theta) = g(T(\tilde{x});\theta) \times h(\tilde{x})

Components:

1. g(T(x̃);θ): depends on the data only through T(x̃) and on the parameter θ
2. h(x̃): depends on the data x̃ but is independent of the parameter θ

Detailed Examples:

Normal N(μ,σ²)
Joint Density:
p(\tilde{x};\mu,\sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\{-\sum(x_i-\mu)^2/(2\sigma^2)\}
Factorization:
g(T;\mu,\sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\{\mu\sum x_i/\sigma^2 - n\mu^2/(2\sigma^2) - \sum x_i^2/(2\sigma^2)\}, \quad h(\tilde{x}) = 1
Sufficient Statistic:
T = (\sum x_i, \sum x_i^2)
Poisson P(λ)
Joint Density:
p(\tilde{x};\lambda) = \lambda^{\sum x_i} e^{-n\lambda} / \prod (x_i!)
Factorization:
g(T;\lambda) = \lambda^{T} e^{-n\lambda}, \quad h(\tilde{x}) = 1/\prod (x_i!)
Sufficient Statistic:
T = \sum x_i
Uniform U(0,θ)
Joint Density:
p(\tilde{x};\theta) = \theta^{-n} I\{0 \leq x_{(1)} \leq x_{(n)} \leq \theta\}
Factorization:
g(T;\theta) = \theta^{-n} I\{T \leq \theta\}, \quad h(\tilde{x}) = I\{0 \leq x_{(1)} \leq T\}
Sufficient Statistic:
T = X_{(n)}
Example: Finding Sufficient Statistic for Poisson Distribution

Problem:

Given a random sample X_1, \ldots, X_n \sim \text{Poisson}(\lambda), use the Factorization Theorem to find a sufficient statistic for \lambda.

Solution:

  1. Write the joint probability mass function:
    p(\mathbf{x}; \lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}
  2. Simplify the product:
    = \frac{\lambda^{\sum_{i=1}^n x_i} e^{-n\lambda}}{\prod_{i=1}^n x_i!}
  3. Factor into g and h:
    = \underbrace{\lambda^{\sum x_i} e^{-n\lambda}}_{g(T(\mathbf{x}); \lambda)} \times \underbrace{\frac{1}{\prod x_i!}}_{h(\mathbf{x})}
    where T(\mathbf{x}) = \sum_{i=1}^n x_i
  4. Conclusion: Since the joint PMF factors as g(T(\mathbf{x}); \lambda) h(\mathbf{x}), where g depends on the data only through T = \sum X_i and h is independent of \lambda, by the Factorization Theorem, T = \sum_{i=1}^n X_i is sufficient for \lambda.

Key Insight:

For Poisson distributions, the sum of observations contains all information about \lambda. The individual values and their factorial terms don't provide additional information beyond the sum.
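
As a quick numerical cross-check of the factorization (not part of the original derivation; SciPy is assumed available and the two data vectors are arbitrary), the likelihood ratio of two Poisson samples with the same total should not change with λ, because the λ-dependent factor g(Σxᵢ; λ) cancels:

import numpy as np
from scipy.stats import poisson

# Sketch: two samples with the same total. By the factorization
# p(x; lam) = g(sum x; lam) * h(x), their likelihood ratio equals
# h(x)/h(y) and therefore should not change with lambda.
x = np.array([0, 2, 5, 1])   # sum = 8 (arbitrary illustrative data)
y = np.array([2, 2, 2, 2])   # sum = 8 as well

for lam in (0.5, 1.0, 3.0, 10.0):
    ratio = poisson.pmf(x, lam).prod() / poisson.pmf(y, lam).prod()
    print(f"lambda = {lam:5.1f}  likelihood ratio = {ratio:.6f}")
# The ratio is the same for every lambda: all dependence on lambda enters
# through g(sum x_i; lambda), which cancels because the sums agree.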

Rao-Blackwell Theorem
Demonstrates how sufficient statistics improve estimation efficiency

Theorem Statement:

If T is sufficient for θ and φ(X̃) is an unbiased estimator of g(θ), then:

\hat{g}(T) = E[\phi(\tilde{X}) \mid T]

is also unbiased for g(θ) with

\text{Var}_\theta(\hat{g}(T)) \leq \text{Var}_\theta(\phi(\tilde{X}))

Proof:

  1. Step 1 (Verify Unbiasedness): We first show that \hat{\theta}^* = E[\hat{\theta} \mid T] is unbiased. Because T is sufficient, the conditional distribution of \hat{\theta} given T does not involve \theta, so \hat{\theta}^* is a genuine statistic. By the law of iterated expectations:
    E[\hat{\theta}^*] = E[E[\hat{\theta} \mid T]] = E[\hat{\theta}]
    Since \hat{\theta} is unbiased for \theta, we have E[\hat{\theta}] = \theta, thus:
    E[\hat{\theta}^*] = \theta
  2. Step 2 (Law of Total Variance): Recall the variance decomposition formula:
    \text{Var}(X) = E[\text{Var}(X \mid Y)] + \text{Var}(E[X \mid Y])
    Applying this to \hat{\theta} conditioned on T:
    \text{Var}(\hat{\theta}) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(E[\hat{\theta} \mid T])
  3. Step 3 (Substitute Improved Estimator): Recognize that by definition:
    E[\hat{\theta} \mid T] = \hat{\theta}^*
    Substituting into the variance decomposition:
    \text{Var}(\hat{\theta}) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(\hat{\theta}^*)
  4. Step 4 (Non-negativity of Conditional Variance): By fundamental properties of variance, conditional variance is always non-negative:
    \text{Var}(\hat{\theta} \mid T) \geq 0 \quad \text{for all } T
    Taking expectations on both sides:
    E[\text{Var}(\hat{\theta} \mid T)] \geq 0
  5. Step 5 (Derive Variance Inequality): From Step 3, rearrange to isolate \text{Var}(\hat{\theta}^*):
    \text{Var}(\hat{\theta}^*) = \text{Var}(\hat{\theta}) - E[\text{Var}(\hat{\theta} \mid T)]
    Since E[\text{Var}(\hat{\theta} \mid T)] \geq 0 from Step 4:
    \text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})
  6. Step 6 (Characterize Equality): Equality holds when:
    E[\text{Var}(\hat{\theta} \mid T)] = 0
    Since \text{Var}(\hat{\theta} \mid T) \geq 0, this requires:
    \text{Var}(\hat{\theta} \mid T) = 0 \quad \text{almost surely}
  7. Step 7 (Zero Variance Implies Constant): A random variable with zero conditional variance is constant given the conditioning variable:
    \text{Var}(\hat{\theta} \mid T) = 0 \quad \Rightarrow \quad \hat{\theta} = E[\hat{\theta} \mid T] = \hat{\theta}^*
    This means \hat{\theta} is already a function of the sufficient statistic T alone.
  8. Step 8 (Conclusion): We have proven:
    E[\hat{\theta}^*] = \theta \quad \text{and} \quad \text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})
    with equality if and only if \hat{\theta} is already a function of T alone. \blacksquare

Key Implications:

Sufficient statistics allow variance reduction without bias
Optimal unbiased estimators must be functions of sufficient statistics
Provides systematic method for improving estimators

Practical Example:

Setup: Binomial B(1,p) with sample X₁,...,Xₙ
Original Estimator: φ(X̃) = X₁ with Var(X₁) = p(1-p)
Sufficient Statistic: T = ΣXᵢ is sufficient for p
Improved Estimator: ĝ(T) = E[X₁|T] = T/n = X̄ with Var(X̄) = p(1-p)/n ≤ Var(X₁)
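
A small Monte Carlo sketch of this Bernoulli example (NumPy assumed; n, p, and the seed are arbitrary choices) confirms that both estimators are unbiased while the Rao-Blackwellized version has roughly 1/n of the variance:

import numpy as np

# Sketch of the Bernoulli example: the crude unbiased estimator X_1 versus its
# Rao-Blackwellized version E[X_1 | T] = T/n = X_bar.
rng = np.random.default_rng(1)
n, p, reps = 20, 0.3, 100_000

samples = rng.binomial(1, p, size=(reps, n))
crude = samples[:, 0]                 # phi(X) = X_1
improved = samples.mean(axis=1)       # E[X_1 | T] = T/n

print("E[X_1]     ≈", crude.mean(),    " (target p =", p, ")")
print("E[X_bar]   ≈", improved.mean())
print("Var(X_1)   ≈", crude.var(),     " vs p(1-p)   =", p * (1 - p))
print("Var(X_bar) ≈", improved.var(),  " vs p(1-p)/n =", p * (1 - p) / n)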
Example: Improving Estimator via Rao-Blackwell

Problem:

For X_1, \ldots, X_n \sim \text{Exponential} with mean \theta = 1/\lambda, start with the crude unbiased estimator \hat{\theta}_1 = X_1. Use Rao-Blackwell to improve it with the sufficient statistic T = \sum X_i.

Solution:

  1. Verify unbiasedness of the initial estimator:
    E[X_1] = \int_0^\infty x \cdot \frac{1}{\theta} e^{-x/\theta} \, dx = \theta
    so \hat{\theta}_1 = X_1 is unbiased, but it uses only one observation and has variance \text{Var}(X_1) = \theta^2.
  2. Identify the sufficient statistic: T = \sum_{i=1}^n X_i \sim \Gamma(n, 1/\theta), which is sufficient (and complete) for \theta.
  3. Apply Rao-Blackwell:
    \hat{\theta}^* = E[X_1 \mid T]
    By symmetry, X_1, \ldots, X_n are exchangeable given T:
    E[X_i \mid T] = E[X_j \mid T] \text{ for all } i, j
  4. Use linearity:
    n \cdot E[X_1 \mid T] = E\left[\sum_{i=1}^n X_i \mid T\right] = T
    so the improved estimator is:
    \hat{\theta}^* = \frac{T}{n} = \bar{X}
    which is also the MLE of \theta.
  5. Variance comparison:
    \text{Var}(X_1) = \theta^2, \qquad \text{Var}(\bar{X}) = \frac{\theta^2}{n}
    so \bar{X} attains the Cramér-Rao lower bound for \theta.
  6. Remark on the rate \lambda = 1/\theta: no unbiased estimator of \lambda exists based on X_1 alone (and 1/\bar{X} is biased); the UMVUE of \lambda is (n-1)/T, the unbiased function of the complete sufficient statistic given by Lehmann-Scheffé.

Key Insight:

Rao-Blackwell turns a crude unbiased estimator that uses a single observation into the efficient estimator \bar{X}, which attains the Cramér-Rao lower bound. Always condition on a sufficient statistic to improve an unbiased estimator.
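
The conditional-expectation step can also be checked by simulation. In the sketch below (NumPy assumed; the mean values, sample size, and seed are arbitrary), X₁/T has mean close to 1/n whatever the true mean θ is, which is the empirical counterpart of E[X₁ | T] = T/n, and the variances match θ² and θ²/n:

import numpy as np

# Sketch for the exponential example: check that E[X_1 | T] = T/n by verifying
# that X_1 / T has mean 1/n whatever the true mean theta is, and compare the
# variances of the crude estimator X_1 and the improved estimator X_bar.
rng = np.random.default_rng(2)
n, reps = 10, 200_000

for theta in (0.5, 4.0):                       # true mean 1/lambda
    x = rng.exponential(theta, size=(reps, n))
    T = x.sum(axis=1)
    print(f"theta = {theta}:")
    print("  mean of X_1 / T ≈", (x[:, 0] / T).mean(), " (expect 1/n =", 1 / n, ")")
    print("  Var(X_1)        ≈", x[:, 0].var(),        " (theory theta^2   =", theta**2, ")")
    print("  Var(X_bar)      ≈", (T / n).var(),        " (theory theta^2/n =", theta**2 / n, ")")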

Complete Statistics

Statistics that ensure uniqueness of unbiased estimators

Complete Statistics
Statistics for which any function with zero expectation for all θ must be the zero function

Intuitive Understanding:

Completeness ensures uniqueness of unbiased estimators based on the statistic

E_\theta[\phi(T)] = 0 \text{ for all } \theta \in \Theta \implies P_\theta(\phi(T) = 0) = 1 \text{ for all } \theta \in \Theta

Statistical Significance:

Guarantees uniqueness of UMVUE when combined with sufficiency
Eliminates non-zero functions with zero expectation
Essential for Lehmann-Scheffé theorem applications
Examples of Complete Statistics
Binomial B(n,p), 0 < p < 1
Statistic: T ~ B(n,p)
Proof Outline:
If E_p[\phi(T)] = \sum \phi(k)\binom{n}{k}p^k(1-p)^{n-k} = 0 for all p \in (0,1), substituting \theta = p/(1-p) gives the polynomial \sum \phi(k)\binom{n}{k}\theta^k = 0. Since the polynomial is zero for all \theta > 0, all coefficients must be zero, so \phi(k) = 0.
Conclusion: T is complete
Normal N(μ,σ²), μ ∈ ℝ, σ > 0
Statistic: T = (ΣXᵢ, ΣXᵢ²)
Proof Outline:
N(μ,σ²) with μ ∈ ℝ, σ > 0 is a full-rank two-parameter exponential family, and the natural sufficient statistic of a full-rank exponential family is complete; hence T = (ΣXᵢ, ΣXᵢ²) is complete.
Conclusion: T is complete
Uniform U(0,θ), θ > 0
Statistic: T = X₍ₙ₎ with density p_T(t;θ) = nt^{n-1}/θ^n, 0 < t < θ
Proof Outline:
If E_\theta[\phi(T)] = \int_0^\theta \phi(t)\frac{nt^{n-1}}{\theta^n}dt = 0 for all \theta > 0, then \int_0^\theta \phi(t)\,nt^{n-1}\,dt = 0 for all \theta > 0; differentiating with respect to \theta gives \phi(\theta)\,n\theta^{n-1} = 0, so \phi(t) = 0 for almost all t > 0.
Conclusion: T is complete
Non-Complete Example: Normal N(0,σ²), σ > 0
Statistic: X₁
Counterexample:

φ(X₁) = X₁ has E[φ(X₁)] = 0 for all σ², but P(X₁ = 0) = 0 ≠ 1

Conclusion: X₁ is not complete (the family of distributions of X₁ is not complete)
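
The counterexample is easy to see numerically. This sketch (NumPy assumed; the σ values and seed are arbitrary) shows that φ(X₁) = X₁ averages to roughly zero under every σ even though it is never the zero statistic:

import numpy as np

# Sketch of the non-completeness counterexample: phi(X_1) = X_1 has expectation 0
# under N(0, sigma^2) for every sigma, yet P(X_1 = 0) = 0, so phi is not the
# zero function and X_1 cannot be complete for this family.
rng = np.random.default_rng(3)
for sigma in (0.5, 1.0, 5.0):
    x1 = rng.normal(0.0, sigma, size=500_000)
    print(f"sigma = {sigma}: E[phi(X_1)] ≈ {x1.mean():+.4f},  P(phi(X_1) = 0) = {np.mean(x1 == 0):.4f}")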
Example: Proving Completeness for Binomial Distribution

Problem:

Show that T = \sum_{i=1}^n X_i is complete for p when X_1, \ldots, X_n \sim \text{Binomial}(1, p) with 0 < p < 1.

Solution:

  1. Set up the expectation condition: Suppose E_p[\phi(T)] = 0 for all p \in (0, 1). We need to show P_p(\phi(T) = 0) = 1.
  2. Write the expectation explicitly:
    E_p[\phi(T)] = \sum_{k=0}^n \phi(k) \binom{n}{k} p^k (1-p)^{n-k} = 0
    for all p \in (0, 1).
  3. Substitute parameter transformation: Let \theta = \frac{p}{1-p}, so p = \frac{\theta}{1+\theta} and 1-p = \frac{1}{1+\theta}.
  4. Rewrite the expectation:
    E_p[\phi(T)] = \sum_{k=0}^n \phi(k) \binom{n}{k} \left(\frac{\theta}{1+\theta}\right)^k \left(\frac{1}{1+\theta}\right)^{n-k}
    = \frac{1}{(1+\theta)^n} \sum_{k=0}^n \phi(k) \binom{n}{k} \theta^k = 0
  5. Since the denominator is positive:
    \sum_{k=0}^n \phi(k) \binom{n}{k} \theta^k = 0
    for all \theta > 0.
  6. Polynomial argument: The left-hand side is a polynomial in \theta of degree at most n. If this polynomial is zero for all \theta > 0, then all its coefficients must be zero:
    \phi(k) \binom{n}{k} = 0 \quad \text{for all } k = 0, 1, \ldots, n
  7. Since binomial coefficients are positive:
    \phi(k) = 0 \quad \text{for all } k = 0, 1, \ldots, n
  8. Conclusion: Therefore P_p(\phi(T) = 0) = 1 for all p \in (0, 1), which proves that T is complete.

Key Insight:

The completeness proof relies on the fact that a polynomial that is identically zero must have all zero coefficients. This technique works for many discrete exponential family distributions.
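
The polynomial step can also be checked symbolically. The sketch below (SymPy assumed; n = 4 is an arbitrary illustrative choice) builds the polynomial Σ φ(k) C(n,k) θ^k and confirms that forcing it to vanish identically forces every φ(k) to zero:

import sympy as sp

# Symbolic sketch of the polynomial argument (SymPy assumed available).
# E_p[phi(T)] = 0 for all p becomes sum_k phi(k) C(n,k) theta^k = 0 for all theta > 0;
# a polynomial vanishing identically must have all coefficients zero.
n = 4
theta = sp.symbols('theta', positive=True)
phi = sp.symbols(f'phi0:{n + 1}')                 # phi(0), ..., phi(n)

poly = sp.expand(sum(phi[k] * sp.binomial(n, k) * theta**k for k in range(n + 1)))
coeffs = sp.Poly(poly, theta).all_coeffs()[::-1]  # coefficient of theta^k sits at index k
print(coeffs)                                     # [phi0, 4*phi1, 6*phi2, 4*phi3, phi4]
print(sp.solve(coeffs, phi))                      # forces every phi(k) to be 0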

Lehmann-Scheffé Theorem
Core theorem for constructing unique UMVUE using sufficient complete statistics

Theorem Statement:

If S is a sufficient complete statistic for θ and φ(X̃) is an unbiased estimator of g(θ), then:

\hat{g} = E[\phi(\tilde{X}) \mid S] \text{ is the unique UMVUE of } g(\theta)

Proof:

  1. Step 1 (Strategy): The proof is written for estimating \theta itself; the general case g(\theta) is identical. Let \hat{\theta} = E[\phi(\tilde{X}) \mid T] be the candidate estimator, where T is the sufficient complete statistic, and suppose \tilde{\theta} is any other unbiased estimator of \theta. We will show that \text{Var}(\hat{\theta}) \leq \text{Var}(\tilde{\theta}), with equality only when \tilde{\theta} coincides with \hat{\theta}.
  2. Step 2 (Apply Rao-Blackwell): By the Rao-Blackwell theorem, define:
    \tilde{\theta}^* = E[\tilde{\theta} \mid T]
    Then \tilde{\theta}^* is also unbiased and:
    \text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})
  3. Step 3 (Function of Sufficient Statistic): Since \tilde{\theta}^* = E[\tilde{\theta} \mid T], it is a function of T alone, say:
    \tilde{\theta}^* = h(T)
    for some function h.
  4. Step 4 (Both are Unbiased Functions of T): Writing \hat{\theta} = g(T) (it too is a function of T), we have two unbiased estimators based on T:
    E[\hat{\theta}] = E[g(T)] = \theta
    E[\tilde{\theta}^*] = E[h(T)] = \theta
  5. Step 5 (Use Completeness): Consider their difference:
    E[\hat{\theta} - \tilde{\theta}^*] = E[g(T) - h(T)] = \theta - \theta = 0
    Since T is complete and g(T) - h(T) is a function of T with expectation zero for every \theta:
    P(g(T) - h(T) = 0) = 1
    Therefore: \hat{\theta} = \tilde{\theta}^* almost surely.
  6. Step 6 (Conclude Minimum Variance): Since \hat{\theta} = \tilde{\theta}^* and \text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta}):
    \text{Var}(\hat{\theta}) = \text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})
    This holds for any unbiased estimator \tilde{\theta}, so \hat{\theta} has minimum variance among all unbiased estimators.
  7. Step 7 (Uniqueness of UMVUE): If there were another UMVUE \tilde{\theta}', the same argument shows:
    E[\tilde{\theta}' \mid T] = \hat{\theta}
    and since \tilde{\theta}' already attains the minimum variance, equality in Rao-Blackwell forces \tilde{\theta}' to be a function of T; by completeness \tilde{\theta}' = \hat{\theta} almost surely. Thus the UMVUE is unique. \blacksquare

Corollary:

If h(S) is a function of sufficient complete statistic S with E_θ[h(S)] = g(θ), then h(S) is the unique UMVUE of g(θ)

Applications:

Normal N(μ,σ²)
Sufficient Complete: S = (\sum X_i, \sum X_i^2)
Parameter: \mu
UMVUE: \bar{X} = \frac{\sum X_i}{n}
Verification: E[\bar{X}] = \mu and \bar{X} is a function of S
Normal N(μ,σ²)
Sufficient Complete: S = (\sum X_i, \sum X_i^2)
Parameter: \sigma^2
UMVUE: S^2 = \frac{\sum(X_i - \bar{X})^2}{n-1}
Verification: E[S^2] = \sigma^2 and S^2 is a function of S
Poisson P(λ)
Sufficient Complete: S = \sum X_i
Parameter: \lambda
UMVUE: \bar{X} = \frac{S}{n}
Verification: E[\bar{X}] = \lambda and \bar{X} is a function of S
Example: Constructing UMVUE for Normal Variance

Problem:

For X_1, \ldots, X_n \sim N(\mu, \sigma^2) with both parameters unknown, find the UMVUE for \sigma^2.

Solution:

  1. Identify sufficient complete statistic: For the normal distribution, T = (\sum X_i, \sum X_i^2) is sufficient and complete for (\mu, \sigma^2).
  2. Find an unbiased estimator: Consider \hat{\sigma}^2_1 = X_1^2 - X_1 X_2. We check:
    E[X_1^2] = \text{Var}(X_1) + [E(X_1)]^2 = \sigma^2 + \mu^2
    E[X_1 X_2] = E[X_1]E[X_2] = \mu^2
    Therefore: E[\hat{\sigma}^2_1] = (\sigma^2 + \mu^2) - \mu^2 = \sigma^2 (unbiased)
  3. Apply Lehmann-Scheffé: The UMVUE is:
    \hat{\sigma}^2_{UMVUE} = E[\hat{\sigma}^2_1 \mid T]
  4. Compute the conditional expectation: By symmetry of the observations given T:
    E[X_1^2 \mid T] = \frac{1}{n} \sum_{i=1}^n E[X_i^2 \mid T] = \frac{1}{n} E\left[\sum X_i^2 \mid T\right] = \frac{1}{n} \sum X_i^2
  5. Similarly for cross terms:
    E[X_1 X_2 \mid T] = \frac{1}{n(n-1)} \sum_{i \neq j} E[X_i X_j \mid T] = \frac{1}{n(n-1)} \left[\left(\sum X_i\right)^2 - \sum X_i^2\right]
  6. Combine results:
    \hat{\sigma}^2_{UMVUE} = \frac{1}{n} \sum X_i^2 - \frac{1}{n(n-1)} \left[\left(\sum X_i\right)^2 - \sum X_i^2\right]
    = \frac{1}{n-1} \left[\sum X_i^2 - \frac{1}{n}\left(\sum X_i\right)^2\right]
    = \frac{1}{n-1} \sum (X_i - \bar{X})^2 = S^2
  7. Verification: We know E[S^2] = \sigma^2 and S^2 is a function of the sufficient complete statistic T. Therefore, S^2 is the unique UMVUE for \sigma^2.

Key Insight:

The sample variance S^2 is not just an unbiased estimator; it is the unique UMVUE. This demonstrates the power of the Lehmann-Scheffé theorem in identifying optimal estimators.
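
A Monte Carlo sketch (NumPy assumed; μ, σ², n, and the seed are arbitrary) makes the improvement concrete: both the crude estimator X₁² - X₁X₂ and S² average to σ², but the UMVUE has a far smaller variance:

import numpy as np

# Sketch comparing the crude unbiased estimator X_1^2 - X_1*X_2 with the UMVUE S^2.
rng = np.random.default_rng(4)
mu, sigma2, n, reps = 2.0, 3.0, 15, 200_000

x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
crude = x[:, 0]**2 - x[:, 0] * x[:, 1]        # unbiased but noisy
s2 = x.var(axis=1, ddof=1)                    # sample variance, the UMVUE

print("E[crude]   ≈", crude.mean(), "  E[S^2]     ≈", s2.mean(), " (target sigma^2 =", sigma2, ")")
print("Var(crude) ≈", crude.var(),  "  Var(S^2)   ≈", s2.var())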

Basu's Theorem
Establishes independence between sufficient complete statistics and ancillary statistics

Theorem Statement:

If T is a sufficient complete statistic for θ and V is an ancillary statistic (distribution independent of θ), then T and V are independent for all θ ∈ Θ

Key Concepts:

Ancillary statistic: distribution does not depend on θ
Independence follows from sufficiency and completeness
Useful for proving independence in specific problems

Example Application:

Setup: Normal N(μ,σ²) with sample X₁,...,Xₙ
Sufficient Complete: (\bar{X}, S^2) is sufficient and complete for (\mu, \sigma^2)
Sample skewness: \text{Skew} = \frac{\sqrt{n} \sum(X_i - \bar{X})^3}{(\sum(X_i - \bar{X})^2)^{3/2}}
Reasoning: Skewness distribution is invariant under location-scale transformations Yᵢ = (Xᵢ-μ)/σ
Conclusion: (X̄, S²) and Skew are independent by Basu's theorem
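
Independence is hard to verify exhaustively by simulation, but a cheap necessary consequence can be checked (NumPy and SciPy assumed; the normal parameters, sample size, and seed are arbitrary): the empirical correlations between the sample skewness and each of X̄ and S² should be near zero.

import numpy as np
from scipy.stats import skew

# Sketch of the Basu example: across many normal samples, the sample skewness
# should be independent of (X_bar, S^2); here we only check the (weaker)
# consequence that their empirical correlations are near zero.
rng = np.random.default_rng(5)
n, reps = 30, 50_000

x = rng.normal(1.0, 2.0, size=(reps, n))
xbar, s2 = x.mean(axis=1), x.var(axis=1, ddof=1)
g1 = skew(x, axis=1)                          # sample skewness of each replication

print("corr(X_bar, skew) ≈", np.corrcoef(xbar, g1)[0, 1])
print("corr(S^2,  skew) ≈", np.corrcoef(s2, g1)[0, 1])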

Rigorous Theorem Proofs

Step-by-step mathematical derivations of fundamental theorems

Fisher-Neyman Factorization Theorem
Fundamental Criterion for Sufficiency

Let X have pdf/pmf f(x|θ). A statistic T(X) is sufficient for θ if and only if f(x|θ) = g(T(x)|θ)h(x) for some functions g and h.

Theorem Statement

f(x|\theta) = g(T(x)|\theta) h(x) \quad \forall x \in \mathcal{X}, \theta \in \Theta

This theorem allows us to find sufficient statistics by simple inspection of the density function.

Proof Steps (Discrete Case)

1. Discrete Case - Sufficiency (⇒)

Assume the factorization holds. We show T is sufficient by computing the conditional probability.

P(X=x \mid T=t) = \frac{P(X=x, T=t)}{P(T=t)}

2. Substitute Factorization

If T(x) ≠ t, the probability is 0. If T(x) = t, substitute f(x|θ) = g(t|θ)h(x).

P(X=x \mid T=t) = \frac{g(t\mid\theta)h(x)}{\sum_{y:T(y)=t} g(t\mid\theta)h(y)}

3. Cancel Parameter Dependence

The term g(t|θ) factors out of the sum and cancels with the numerator.

P(X=x \mid T=t) = \frac{g(t\mid\theta)h(x)}{g(t\mid\theta)\sum_{y:T(y)=t} h(y)} = \frac{h(x)}{\sum_{y:T(y)=t} h(y)}

4. Conclusion for Sufficiency

The result depends only on x and h(x), not on θ. Thus, T is sufficient.

P(X=x \mid T=t) \text{ is independent of } \theta

5. Discrete Case - Necessity (⇐)

Assume T is sufficient. Then P(X=x|T=t) is independent of θ. Let this be k(x,t).

f(x\mid\theta) = P(X=x \mid \theta) = P(X=x, T=T(x) \mid \theta)

6. Construct Functions

Write the joint probability as conditional × marginal. Define g(t|θ) = P(T=t|θ) and h(x) = P(X=x|T=T(x)).

f(x\mid\theta) = P(T=T(x)\mid\theta)\, P(X=x \mid T=T(x)) = g(T(x)\mid\theta)h(x)

Example Application

For Poisson(\lambda), f(\mathbf{x}|\lambda) = e^{-n\lambda}\lambda^{\sum x_i}/\prod x_i! = \underbrace{[e^{-n\lambda}\lambda^{\sum x_i}]}_{g(T|\lambda)} \times \underbrace{[1/\prod x_i!]}_{h(\mathbf{x})}. Thus T = \sum X_i is sufficient.
Basu's Theorem
Independence of Statistics

If T is a complete sufficient statistic for θ, and V is an ancillary statistic, then T and V are independent.

Theorem Statement

T \text{ complete sufficient}, V \text{ ancillary} \implies T \perp V

Powerful tool for proving independence without finding joint distributions.

Proof Steps

1. Define Conditional Probability

Let A be any event involving V (e.g., V ∈ B). Let η(t) = P(V ∈ B | T=t).

P(V \in B \mid T=t) = \eta(t)

2. Use Sufficiency

Since T is sufficient, the conditional distribution of X given T is independent of θ. Since V is a function of X, its conditional distribution given T is also independent of θ.

\eta(t) \text{ does not depend on } \theta

3. Compute Expectation

Consider the expectation of η(T) over T. By the law of iterated expectations:

E[\eta(T)] = E[P(V \in B \mid T)] = P(V \in B)

4. Use Ancillarity

Since V is ancillary, P(V ∈ B) is a constant c independent of θ.

E[\eta(T)] = c

5. Construct Zero-Mean Function

Consider the function g(T) = η(T) - c. Its expectation is E[η(T) - c] = c - c = 0 for all θ.

E_\theta[\eta(T) - c] = 0 \quad \forall \theta

6. Apply Completeness

Since T is complete, g(T) must be zero almost surely. Thus η(T) = c a.s.

P(V \in B \mid T) = P(V \in B) \implies T \perp V

Example Application

In N(\mu, \sigma^2) with \sigma known, \bar{X} is complete sufficient for \mu and S^2 is ancillary for \mu. Thus \bar{X} \perp S^2.
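
The same conclusion can be probed numerically. In this sketch (NumPy assumed; μ, σ, n, and the seed are arbitrary, with σ treated as known), S² behaves the same on samples with small and large X̄, and the empirical correlation between X̄ and S² is near zero:

import numpy as np

# Sketch for the X_bar ⊥ S^2 application (sigma treated as known): compare the
# behaviour of S^2 on the two halves split by X_bar; independence predicts that
# the conditional behaviour of S^2 does not change with X_bar.
rng = np.random.default_rng(6)
n, reps, mu, sigma = 25, 100_000, 0.0, 1.0

x = rng.normal(mu, sigma, size=(reps, n))
xbar, s2 = x.mean(axis=1), x.var(axis=1, ddof=1)

low = xbar < np.median(xbar)
print("E[S^2 | X_bar below median] ≈", s2[low].mean())
print("E[S^2 | X_bar above median] ≈", s2[~low].mean())
print("corr(X_bar, S^2) ≈", np.corrcoef(xbar, s2)[0, 1])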
Sufficiency vs Completeness: Key Differences
Aspect | Sufficient Statistics | Complete Statistics
Core Purpose | Captures all parameter information from the sample data | Ensures uniqueness of unbiased functions
Mathematical Criterion | Factorization theorem: p(\tilde{x};\theta) = g(T(\tilde{x});\theta)h(\tilde{x}) | Functions with zero expectation are zero: E[\phi(T)] = 0 \implies \phi(T) = 0 \text{ a.s.}
Statistical Role | Data reduction without information loss | Uniqueness guarantee for optimal estimators
Relationship | Not necessarily complete (e.g., U(\theta-1/2, \theta+1/2)) | Not necessarily sufficient (depends on the family)
Combined Power | With completeness: enables the Lehmann-Scheffé theorem | With sufficiency: constructs the unique UMVUE

Practical Applications

How to apply sufficient and complete statistics in practice

UMVUE Construction
Use sufficient complete statistics to find unique optimal unbiased estimators
1
Identify sufficient complete statistic using factorization theorem
2
Find any unbiased estimator for the parameter
3
Apply Lehmann-Scheffé theorem: E[estimator|sufficient complete] = UMVUE (a worked sketch follows this list)
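
As an illustration of the recipe (not taken from the course text; NumPy assumed, with n, λ, and the seed chosen arbitrarily), consider estimating g(λ) = e^{-λ} = P(X = 0) for a Poisson sample: T = ΣXᵢ is sufficient and complete, the indicator 1{X₁ = 0} is a crude unbiased estimator, and conditioning on T gives the UMVUE ((n-1)/n)^T, since X₁ | T = t ~ Binomial(t, 1/n).

import numpy as np

# Illustrative run of the three-step recipe for Poisson(lambda), target g(lambda) = e^{-lambda}:
# 1) T = sum(X_i) is sufficient and complete;
# 2) phi = 1{X_1 = 0} is unbiased for e^{-lambda};
# 3) E[phi | T] = ((n-1)/n)^T is the UMVUE (conditioning: X_1 | T=t ~ Binomial(t, 1/n)).
rng = np.random.default_rng(7)
n, lam, reps = 8, 1.3, 200_000

x = rng.poisson(lam, size=(reps, n))
T = x.sum(axis=1)
crude = (x[:, 0] == 0).astype(float)          # step 2: crude unbiased estimator
umvue = ((n - 1) / n) ** T                    # step 3: Rao-Blackwell / Lehmann-Scheffé

print("target e^{-lambda} =", np.exp(-lam))
print("E[crude]   ≈", crude.mean(), "  E[UMVUE]   ≈", umvue.mean())
print("Var(crude) ≈", crude.var(),  "  Var(UMVUE) ≈", umvue.var())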
Variance Improvement
Apply Rao-Blackwell theorem to reduce estimator variance
1
Start with any unbiased estimator
2
Find sufficient statistic for the parameter
3
Compute conditional expectation given sufficient statistic
4
Result has smaller or equal variance
Independence Testing
Use Basu's theorem to establish independence properties
1
Identify sufficient complete statistic
2
Verify ancillary property of other statistic
3
Apply Basu's theorem to conclude independence
4
Use independence for further inference

Frequently Asked Questions

Common questions about sufficient and complete statistics

What is the intuitive understanding of Sufficient Statistics?
Intuitively, a sufficient statistic is like "lossless compression". It compresses the raw data into a simpler value (the statistic), but in this process, no information about the unknown parameter θ is lost. If you have the sufficient statistic, the original data is redundant for inferring θ.
Key Point: Lossless compression of data information
How to determine if a statistic is sufficient?
The most commonly used method is the Factorization Theorem. If the joint density function can be factored into two parts: one containing only the statistic T and the parameter θ, and another containing only the sample x (independent of θ), then T is sufficient. The definition method (conditional distribution independent of θ) is usually difficult to verify directly.
f(x|\theta) = g(T(x)|\theta)h(x)
What is a Complete Statistic and why is it important?
Completeness is a uniqueness requirement: if a function g(T) has expectation 0 for all parameter values, then the function itself must be 0 almost surely. Completeness ensures uniqueness in inference based on sufficient statistics, which is used in the Lehmann-Scheffé theorem to determine UMVUE (Uniformly Minimum Variance Unbiased Estimator).
E[g(T)] = 0 \; \forall\theta \implies P(g(T)=0)=1
Comparison: Sufficiency ensures no information loss, completeness ensures estimator uniqueness
Is a sufficient statistic unique?
No. If T is a sufficient statistic, then any one-to-one function of T is also sufficient. For example, if ΣXᵢ is sufficient, then X̄ is also sufficient. We usually focus on the Minimal Sufficient Statistic, which has the highest degree of compression among all sufficient statistics.
Key Point: Minimal sufficient is the most compressed form
What is an Ancillary Statistic?
An ancillary statistic is a statistic whose distribution does not depend on the parameter θ. Although it contains no information about θ itself, it is often combined with sufficient statistics (as in Basu's theorem) to prove independence, or used as auxiliary information in conditional inference.
Example: The sample variance S^2 is ancillary for \mu in N(\mu, 1)
What are the practical applications of Basu's Theorem?
Basu's theorem is very powerful. It states: if T is a complete sufficient statistic and V is an ancillary statistic, then T and V are independent. This conclusion simplifies many derivations, such as proving that the sample mean and sample variance are independent under normal distribution, without needing to compute complex joint distributions.
Key Point: Complete Sufficient ⊥ Ancillary
What does the Rao-Blackwell Theorem tell us?
The Rao-Blackwell theorem shows that taking the conditional expectation given a sufficient statistic can improve any unbiased estimator. Specifically, if δ(X) is an unbiased estimator of θ, then E[δ(X)|T] is still unbiased, but its variance is never greater than that of δ(X). This is the payoff that sufficiency delivers.
\text{Var}(E[\delta|T]) \leq \text{Var}(\delta)
What is the relationship between Lehmann-Scheffé Theorem and UMVUE?
This theorem combines Rao-Blackwell and completeness, providing a clear method to find UMVUE (Uniformly Minimum Variance Unbiased Estimator): if an unbiased estimator depends only on a complete sufficient statistic, then it is the UMVUE. This is one of the most important results in parameter estimation theory.
Key Point: Complete Sufficient → Uniqueness of UMVUE