
Point Estimation Theory

Master parameter estimation methods and their optimality properties

Estimation Methods

Method of Moments (MOM)
Equate sample moments to population moments

The Method of Moments estimates parameters by equating sample moments to the corresponding population moments and solving the resulting equations for the unknown parameters.

$$\mu_k = E[X^k], \qquad a_{n,k} = \frac{1}{n}\sum_{i=1}^n X_i^k$$

Population Moment

$$\mu_k = E[X^k]$$

Sample Moment

$$a_{n,k} = \frac{1}{n}\sum_{i=1}^n X_i^k$$
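A minimal sketch of the method in code (illustrative only; it assumes NumPy and a Gamma model, neither of which comes from this page): match the first two sample moments to $E[X] = k/\lambda$ and $E[X^2] = k(k+1)/\lambda^2$ and solve for the parameters.

```python
import numpy as np

# Illustrative MOM sketch: estimate the shape k and rate lambda of a
# Gamma(k, lambda) distribution by matching the first two sample moments.
rng = np.random.default_rng(0)
k_true, rate_true = 3.0, 2.0
x = rng.gamma(shape=k_true, scale=1.0 / rate_true, size=5_000)

m1 = x.mean()                # first sample moment  a_{n,1}
m2 = (x ** 2).mean()         # second sample moment a_{n,2}
var_hat = m2 - m1 ** 2       # implied variance k / lambda^2

rate_mom = m1 / var_hat      # lambda_hat = m1 / (m2 - m1^2)
k_mom = m1 ** 2 / var_hat    # k_hat      = m1^2 / (m2 - m1^2)
print(f"MOM estimates: k ≈ {k_mom:.3f}, rate ≈ {rate_mom:.3f}")
```
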
Maximum Likelihood Estimation (MLE)
Find parameters that maximize the probability of observed data

MLE finds the parameter value that makes the observed data most likely. It is the gold standard for point estimation because, under regularity conditions, it enjoys optimal asymptotic properties (consistency, asymptotic normality, and asymptotic efficiency).

$$L(\theta; x) = \prod_{i=1}^n f(x_i; \theta), \qquad \ell(\theta) = \log L(\theta)$$

Likelihood Function

$$L(\theta) = \prod_{i=1}^n f(x_i;\theta)$$

Score Function

$S(\theta) = \frac{\partial \ell}{\partial \theta}$; the MLE solves the likelihood equation $S(\theta) = 0$
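Below is a hedged numerical-MLE sketch (not from the page; it assumes NumPy/SciPy and again uses a Gamma model, whose shape parameter has no closed-form MLE). It simply minimizes the negative log-likelihood; the MOM estimates from the previous sketch would make reasonable starting values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Illustrative numerical MLE for Gamma(k, rate): maximize the log-likelihood
# sum_i [ k*log(rate) - log Gamma(k) + (k-1)*log(x_i) - rate*x_i ].
rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=0.5, size=2_000)   # true k = 3, rate = 2

def neg_log_lik(params):
    log_k, log_rate = params                       # optimize on the log scale
    k, rate = np.exp(log_k), np.exp(log_rate)      # to keep k, rate > 0
    return -np.sum(k * np.log(rate) - gammaln(k)
                   + (k - 1) * np.log(x) - rate * x)

res = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
k_mle, rate_mle = np.exp(res.x)
print(f"MLE: k ≈ {k_mle:.3f}, rate ≈ {rate_mle:.3f}")
```
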

Evaluation Criteria

Key Properties of Estimators

Unbiasedness

$$E[\hat{\theta}] = \theta$$

Efficiency

$$\text{Var}(\hat{\theta}) \text{ is minimal}$$

Consistency

$$\hat{\theta}_n \xrightarrow{P} \theta$$

Mean Squared Error

$$\text{MSE} = E[(\hat{\theta} - \theta)^2]$$

Bias-Variance Decomposition:

$$\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$$
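The decomposition is easy to verify by simulation. A small sketch (illustrative; it assumes NumPy and uses the biased variance estimator $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ under a normal model as the test case):

```python
import numpy as np

# Check MSE = Var + Bias^2 for the biased variance estimator under
# N(0, sigma^2 = 4) with n = 10, using many Monte Carlo replicates.
rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = samples.var(axis=1, ddof=0)           # divides by n (the MLE, biased)

mse = np.mean((est - sigma2) ** 2)
bias = est.mean() - sigma2                  # theoretical bias: -sigma^2 / n
var = est.var()
print(f"MSE ≈ {mse:.4f},  Var + Bias^2 ≈ {var + bias**2:.4f}")
```
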

Cramér-Rao Lower Bound

The CRLB Theorem

For any unbiased estimator $\hat{\theta}$ of $\theta$, the variance satisfies:

$$\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}$$

Fisher Information

$$I(\theta) = E\left[\left(\frac{\partial \log f}{\partial \theta}\right)^2\right]$$

Efficiency

$$e(\hat{\theta}) = \frac{1/(nI(\theta))}{\text{Var}(\hat{\theta})}$$
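As a concrete (illustrative, not from the page) reading of this ratio: under $N(\mu, \sigma^2)$, $I(\mu) = 1/\sigma^2$, so the CRLB for $\mu$ is $\sigma^2/n$. The sample mean attains it ($e = 1$), while the sample median has asymptotic efficiency $2/\pi \approx 0.64$. A quick Monte Carlo sketch assuming NumPy:

```python
import numpy as np

# Efficiency of two estimators of the normal mean: CRLB = sigma^2 / n.
rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 1.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
crlb = sigma**2 / n
print("efficiency of mean:  ", crlb / samples.mean(axis=1).var())      # ≈ 1
print("efficiency of median:", crlb / np.median(samples, axis=1).var())  # ≈ 0.64-0.7
```
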

Theorems & Proofs

Cramér-Rao Inequality Proof
For unbiased estimators, the variance is bounded below by the reciprocal of the Fisher information

Theorem:

$$\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}$$

Proof:

1. Define the score function $S(\theta) = \frac{\partial}{\partial \theta} \log L(\theta)$, and note that $E[S(\theta)] = 0$ and $\text{Var}(S(\theta)) = nI(\theta)$.

2. Since $\hat{\theta}$ is unbiased, differentiating $E[\hat{\theta}] = \theta$ under the integral sign gives $E[\hat{\theta} \cdot S(\theta)] = 1$, and hence $\text{Cov}(\hat{\theta}, S(\theta)) = 1$ because $E[S(\theta)] = 0$.

3. Apply the Cauchy-Schwarz inequality to this covariance: $1 = \text{Cov}(\hat{\theta}, S)^2 \leq \text{Var}(\hat{\theta}) \cdot \text{Var}(S) = \text{Var}(\hat{\theta}) \cdot nI(\theta)$.

4. Rearranging gives $\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}$. $\blacksquare$
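The two ingredients the proof leans on, $E[S(\theta)] = 0$ and $\text{Var}(S(\theta)) = nI(\theta)$, can be sanity-checked numerically. A sketch (illustrative; it assumes NumPy and uses a Bernoulli($p$) sample, for which $I(p) = 1/\big(p(1-p)\big)$):

```python
import numpy as np

# Numerical check of E[S(p)] = 0 and Var(S(p)) = n * I(p) = n / (p(1-p))
# for a Bernoulli(p) sample of size n.
rng = np.random.default_rng(4)
p, n, reps = 0.3, 50, 200_000

x_sum = rng.binomial(n, p, size=reps)           # sufficient statistic sum X_i
score = x_sum / p - (n - x_sum) / (1 - p)       # S(p) = d/dp log L(p)

print("E[S]     ≈", score.mean())               # close to 0
print("Var[S]   ≈", score.var())                # close to n / (p(1-p))
print("n * I(p) =", n / (p * (1 - p)))
```
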

Rao-Blackwell Theorem
Improving estimators using sufficient statistics

Theorem:

Let $\hat{\theta}$ be an unbiased estimator of $\theta$ and $T$ a sufficient statistic. Define:

$$\hat{\theta}^* = E[\hat{\theta} \mid T]$$

Then $\hat{\theta}^*$ is also unbiased and $\text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})$.

Proof:

1. Because $T$ is sufficient, $E[\hat{\theta} \mid T]$ does not depend on $\theta$, so $\hat{\theta}^*$ is a genuine statistic. By the law of iterated expectations, $E[\hat{\theta}^*] = E[E[\hat{\theta} \mid T]] = E[\hat{\theta}] = \theta$.

2. By the law of total variance, $\text{Var}(\hat{\theta}) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(E[\hat{\theta} \mid T]) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(\hat{\theta}^*)$.

3. Since $E[\text{Var}(\hat{\theta} \mid T)] \geq 0$, we have $\text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})$. $\blacksquare$
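A classic illustration of the theorem in action (not from the page; it assumes NumPy): for a Poisson($\lambda$) sample, start from the crude unbiased estimator $\mathbb{1}\{X_1 = 0\}$ of $e^{-\lambda}$ and condition on the sufficient statistic $T = \sum X_i$, which gives $E[\mathbb{1}\{X_1 = 0\} \mid T] = ((n-1)/n)^T$.

```python
import numpy as np

# Rao-Blackwellization for g(lambda) = e^{-lambda} under Poisson(lambda):
# crude estimator 1{X_1 = 0} vs. the conditional expectation ((n-1)/n)^T.
rng = np.random.default_rng(5)
lam, n, reps = 2.0, 20, 200_000

x = rng.poisson(lam, size=(reps, n))
crude = (x[:, 0] == 0).astype(float)      # 1{X_1 = 0}
t = x.sum(axis=1)                          # sufficient statistic T
improved = ((n - 1) / n) ** t              # E[crude | T]

print("target e^{-lambda}:", np.exp(-lam))
print("crude:    mean", crude.mean(), " var", crude.var())
print("improved: mean", improved.mean(), " var", improved.var())
```

Both estimators are unbiased, but the conditioned version has a much smaller variance, exactly as the theorem predicts.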

Lehmann-Scheffé Theorem
Completeness + Sufficiency + Unbiasedness yields unique UMVUE

Theorem:

Let $T$ be a complete sufficient statistic for $\theta$. If $\hat{\theta} = g(T)$ is an unbiased estimator that is a function of $T$ alone, then $\hat{\theta}$ is the unique UMVUE.

Proof:

1. For any other unbiased estimator $\tilde{\theta}$, apply Rao-Blackwell: $\tilde{\theta}^* = E[\tilde{\theta} \mid T] = h(T)$ is unbiased, depends only on $T$, and satisfies $\text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})$.

2. Both $g(T)$ and $h(T)$ are unbiased, so $E[g(T) - h(T)] = 0$ for all $\theta$; by completeness of $T$, $g(T) = h(T)$ almost surely.

3. Therefore $\text{Var}(\hat{\theta}) = \text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})$ for every unbiased estimator $\tilde{\theta}$, so $\hat{\theta}$ has minimum variance among all unbiased estimators and is unique. $\blacksquare$
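A standard example of how the theorem is applied (illustrative, not from the page): for a Poisson($\lambda$) sample, $T = \sum_{i=1}^n X_i$ is complete and sufficient, and $\bar{X} = T/n$ is an unbiased function of $T$, so it is the unique UMVUE of $\lambda$:

$$E[\bar{X}] = \lambda \quad \text{and} \quad \bar{X} = g(T) = \frac{T}{n} \;\Longrightarrow\; \bar{X} \text{ is the UMVUE of } \lambda.$$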

Examples

Example 1: Exponential Distribution MOM

Problem: Given a sample from $\text{Exp}(\lambda)$, find the Method of Moments estimator for $\lambda$.

Solution:

Population first moment: $\mu_1 = E[X] = \frac{1}{\lambda}$

Sample first moment: $a_{n,1} = \bar{X}$

Set equal: $\frac{1}{\lambda} = \bar{X}$

$$\hat{\lambda}_{\text{MOM}} = \frac{1}{\bar{X}}$$
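A short numerical check (illustrative; assumes NumPy, which parameterizes the exponential by the scale $1/\lambda$):

```python
import numpy as np

# Check the MOM estimator 1 / X_bar for Exp(lambda).
rng = np.random.default_rng(6)
lam_true = 2.5
x = rng.exponential(scale=1.0 / lam_true, size=10_000)   # scale = 1 / lambda
print("lambda_hat_MOM =", 1.0 / x.mean())                 # close to 2.5
```
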
Example 2: Normal Distribution MLE

Problem: Find the MLE of $\mu$ and $\sigma^2$ for a sample from $N(\mu, \sigma^2)$.

Solution:

Log-likelihood: $\ell = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(x_i-\mu)^2$

Differentiate w.r.t. $\mu$: $\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum(x_i - \mu) = 0$

Solve: $\hat{\mu} = \bar{X}$

Differentiate w.r.t. $\sigma^2$ and solve: $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$
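A quick numerical check of both MLEs (illustrative; assumes NumPy):

```python
import numpy as np

# MLEs for N(mu, sigma^2): sample mean and the variance that divides by n.
rng = np.random.default_rng(7)
x = rng.normal(loc=1.0, scale=2.0, size=10_000)   # mu = 1, sigma = 2

mu_mle = x.mean()                                  # X_bar
sigma2_mle = np.mean((x - mu_mle) ** 2)            # divides by n, not n - 1
print(f"mu_hat = {mu_mle:.3f}, sigma2_hat = {sigma2_mle:.3f}")
```
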

Example 3: Poisson MLE and CRLB

Problem: Given $X_1, \ldots, X_n \sim P(\lambda)$ i.i.d., find the MLE of $\lambda$ and verify that it achieves the CRLB.

Solution:

Log-likelihood: $\ell(\lambda) = \left(\sum x_i\right) \log \lambda - n\lambda - \text{const}$

Score: $S(\lambda) = \frac{\sum x_i}{\lambda} - n = 0$

MLE: $\hat{\lambda} = \bar{X}$

Fisher information: $I(\lambda) = 1/\lambda$

CRLB: $\text{Var}(\hat{\lambda}) \geq \frac{\lambda}{n}$

Actual variance: $\text{Var}(\bar{X}) = \frac{\lambda}{n}$, so the CRLB is achieved exactly and $\bar{X}$ is efficient.
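A Monte Carlo check of the last step (illustrative; assumes NumPy): simulate many samples of size $n$, compute $\bar{X}$ for each, and compare its empirical variance with $\lambda/n$.

```python
import numpy as np

# Empirical Var(X_bar) for Poisson(lambda) vs. the CRLB lambda / n.
rng = np.random.default_rng(8)
lam, n, reps = 4.0, 30, 200_000

lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)   # MLE per replicate
print("empirical Var(lambda_hat):", lam_hat.var())
print("CRLB lambda / n:          ", lam / n)
```
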

Practice Quiz

Test your understanding with 10 multiple-choice questions

1. For a normal distribution $X \sim N(\mu,\sigma^2)$ with known $\sigma^2$, the Fisher information $I(\mu)$ for the population mean $\mu$ is:

2. For a sample $X_1, X_2, \ldots, X_n \sim B(1,p)$, the UMVUE of $p$ is:

3. For a uniform distribution $U(a,b)$, the method of moments estimator for the parameter $a$ is:

4. Given that the MLE of $\sigma^2$ for $N(\mu,\sigma^2)$ is $\hat{\sigma}^2 = S_n^2 = \frac{1}{n}\sum(X_i-\bar{X})^2$, the MLE of $\sigma$ is:

5. For a Poisson distribution $P(\lambda)$ with sample $X_1, X_2, \ldots, X_n$, the Cramér-Rao lower bound for unbiased estimators of $\lambda$ is:

6. Which statistic is a complete sufficient statistic for the normal distribution $N(\mu,\sigma^2)$?

7. The correct decomposition of Mean Squared Error (MSE) is:

8. In the linear regression model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, the least squares estimator of $\beta_1$ is:

9. For the uniform distribution $U(0,\theta)$ with MLE $\hat{\theta} = X_{(n)}$ (the maximum order statistic), which consistency property holds?

10. An estimator $\hat{g}$ that attains the Cramér-Rao lower bound is called:

Frequently Asked Questions

When should I use MLE vs. Method of Moments?

Use MLE when you need optimal asymptotic properties and can compute the likelihood. Use Method of Moments for quick estimates, complex likelihoods, or as starting values for iterative MLE. MLE is generally preferred for its efficiency and invariance property.

What does it mean for an estimator to be "efficient"?

An estimator is efficient if it achieves the Cramér-Rao lower bound: $\text{Var}(\hat{\theta}) = 1/(nI(\theta))$. This means no other unbiased estimator has lower variance. MLE is asymptotically efficient under regularity conditions.

Why is sample variance divided by n-1 instead of n?

Dividing by $n-1$ makes the estimator unbiased: $E[S^2] = \sigma^2$. We "lose one degree of freedom" because we estimate the mean $\bar{X}$ from the same data. The MLE uses $n$ (and is therefore biased), but the bias vanishes for large samples.

What's the difference between consistency and unbiasedness?

Unbiasedness ($E[\hat{\theta}] = \theta$) is a finite-sample property: on average across repeated samples of size $n$, the estimate equals the true value. Consistency ($\hat{\theta}_n \to \theta$ in probability) is an asymptotic property: as $n \to \infty$, the estimate converges to the true value.

How do I compute Fisher information?

Two equivalent methods: (1) $I(\theta) = E[(\partial \log f/\partial \theta)^2]$, the expected squared score, or (2) $I(\theta) = -E[\partial^2 \log f/\partial \theta^2]$, the negative expected Hessian. Method (2) is often easier. For $n$ i.i.d. observations, the total information is $nI(\theta)$.
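For example, for the Poisson($\lambda$) model used earlier on this page, both routes give the same answer:

$$\frac{\partial^2}{\partial \lambda^2} \log f(x;\lambda) = -\frac{x}{\lambda^2} \;\Rightarrow\; I(\lambda) = E\!\left[\frac{X}{\lambda^2}\right] = \frac{1}{\lambda}, \qquad E\!\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \frac{\text{Var}(X)}{\lambda^2} = \frac{1}{\lambda}.$$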
