
Point Estimation Theory

Master parameter estimation methods and their optimality properties

Estimation Methods

Method of Moments (MOM)
Equate sample moments to population moments

The Method of Moments estimates parameters by equating sample moments to the corresponding population moments and solving the resulting equations for the unknown parameters.

$$\mu_k = E[X^k], \qquad a_{n,k} = \frac{1}{n}\sum_{i=1}^n X_i^k$$

Population Moment

$$\mu_k = E[X^k]$$

Sample Moment

$$a_{n,k} = \frac{1}{n}\sum_{i=1}^n X_i^k$$
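A minimal sketch of the method in code (illustrative only; it assumes NumPy and a Gamma model, neither of which comes from this page): match the first two sample moments to $E[X] = k/\lambda$ and $E[X^2] = k(k+1)/\lambda^2$ and solve for the parameters.

```python
import numpy as np

# Illustrative MOM sketch: estimate the shape k and rate lambda of a
# Gamma(k, lambda) distribution by matching the first two sample moments.
rng = np.random.default_rng(0)
k_true, rate_true = 3.0, 2.0
x = rng.gamma(shape=k_true, scale=1.0 / rate_true, size=5_000)

m1 = x.mean()                # first sample moment  a_{n,1}
m2 = (x ** 2).mean()         # second sample moment a_{n,2}
var_hat = m2 - m1 ** 2       # implied variance k / lambda^2

rate_mom = m1 / var_hat      # lambda_hat = m1 / (m2 - m1^2)
k_mom = m1 ** 2 / var_hat    # k_hat      = m1^2 / (m2 - m1^2)
print(f"MOM estimates: k ≈ {k_mom:.3f}, rate ≈ {rate_mom:.3f}")
```
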
Maximum Likelihood Estimation (MLE)
Find parameters that maximize the probability of observed data

MLE finds the parameter value that makes the observed data most likely. It is the gold standard for point estimation because, under regularity conditions, it enjoys optimal asymptotic properties (consistency, asymptotic normality, and asymptotic efficiency).

$$L(\theta; x) = \prod_{i=1}^n f(x_i; \theta), \qquad \ell(\theta) = \log L(\theta)$$

Likelihood Function

$$L(\theta) = \prod_{i=1}^n f(x_i;\theta)$$

Score Function

$S(\theta) = \frac{\partial \ell}{\partial \theta}$; the MLE solves the likelihood equation $S(\theta) = 0$
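Below is a hedged numerical-MLE sketch (not from the page; it assumes NumPy/SciPy and again uses a Gamma model, whose shape parameter has no closed-form MLE). It simply minimizes the negative log-likelihood; the MOM estimates from the previous sketch would make reasonable starting values.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

# Illustrative numerical MLE for Gamma(k, rate): maximize the log-likelihood
# sum_i [ k*log(rate) - log Gamma(k) + (k-1)*log(x_i) - rate*x_i ].
rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=0.5, size=2_000)   # true k = 3, rate = 2

def neg_log_lik(params):
    log_k, log_rate = params                       # optimize on the log scale
    k, rate = np.exp(log_k), np.exp(log_rate)      # to keep k, rate > 0
    return -np.sum(k * np.log(rate) - gammaln(k)
                   + (k - 1) * np.log(x) - rate * x)

res = minimize(neg_log_lik, x0=[0.0, 0.0], method="Nelder-Mead")
k_mle, rate_mle = np.exp(res.x)
print(f"MLE: k ≈ {k_mle:.3f}, rate ≈ {rate_mle:.3f}")
```
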

Evaluation Criteria

Key Properties of Estimators

Unbiasedness

$$E[\hat{\theta}] = \theta$$

Efficiency

$$\text{Var}(\hat{\theta}) \text{ is minimal}$$

Consistency

$$\hat{\theta}_n \xrightarrow{P} \theta$$

Mean Squared Error

$$\text{MSE} = E[(\hat{\theta} - \theta)^2]$$

Bias-Variance Decomposition:

$$\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2$$
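The decomposition is easy to verify by simulation. A small sketch (illustrative; it assumes NumPy and uses the biased variance estimator $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$ under a normal model as the test case):

```python
import numpy as np

# Check MSE = Var + Bias^2 for the biased variance estimator under
# N(0, sigma^2 = 4) with n = 10, using many Monte Carlo replicates.
rng = np.random.default_rng(2)
sigma2, n, reps = 4.0, 10, 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
est = samples.var(axis=1, ddof=0)           # divides by n (the MLE, biased)

mse = np.mean((est - sigma2) ** 2)
bias = est.mean() - sigma2                  # theoretical bias: -sigma^2 / n
var = est.var()
print(f"MSE ≈ {mse:.4f},  Var + Bias^2 ≈ {var + bias**2:.4f}")
```
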

Cramér-Rao Lower Bound

The CRLB Theorem

For any unbiased estimator $\hat{\theta}$ of $\theta$, the variance satisfies:

$$\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}$$

Fisher Information

$$I(\theta) = E\left[\left(\frac{\partial \log f}{\partial \theta}\right)^2\right]$$

Efficiency

$$e(\hat{\theta}) = \frac{1/(nI(\theta))}{\text{Var}(\hat{\theta})}$$
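As a concrete (illustrative, not from the page) reading of this ratio: under $N(\mu, \sigma^2)$, $I(\mu) = 1/\sigma^2$, so the CRLB for $\mu$ is $\sigma^2/n$. The sample mean attains it ($e = 1$), while the sample median has asymptotic efficiency $2/\pi \approx 0.64$. A quick Monte Carlo sketch assuming NumPy:

```python
import numpy as np

# Efficiency of two estimators of the normal mean: CRLB = sigma^2 / n.
rng = np.random.default_rng(3)
mu, sigma, n, reps = 0.0, 1.0, 25, 100_000

samples = rng.normal(mu, sigma, size=(reps, n))
crlb = sigma**2 / n
print("efficiency of mean:  ", crlb / samples.mean(axis=1).var())      # ≈ 1
print("efficiency of median:", crlb / np.median(samples, axis=1).var())  # ≈ 0.64-0.7
```
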

Theorems & Proofs

Cramér-Rao Inequality Proof
For unbiased estimators, the variance is bounded below by the reciprocal of the Fisher information

Theorem:

$$\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}$$

Proof:

1. Define the score function $S(\theta) = \frac{\partial}{\partial \theta} \log L(\theta)$, and note that $E[S(\theta)] = 0$ and $\text{Var}(S(\theta)) = nI(\theta)$.

2. Since $\hat{\theta}$ is unbiased, differentiating $E[\hat{\theta}] = \theta$ under the integral sign gives $E[\hat{\theta} \cdot S(\theta)] = 1$, and hence $\text{Cov}(\hat{\theta}, S(\theta)) = 1$ because $E[S(\theta)] = 0$.

3. Apply the Cauchy-Schwarz inequality to this covariance: $1 = \text{Cov}(\hat{\theta}, S)^2 \leq \text{Var}(\hat{\theta}) \cdot \text{Var}(S) = \text{Var}(\hat{\theta}) \cdot nI(\theta)$.

4. Rearranging gives $\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}$. $\blacksquare$
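The two ingredients the proof leans on, $E[S(\theta)] = 0$ and $\text{Var}(S(\theta)) = nI(\theta)$, can be sanity-checked numerically. A sketch (illustrative; it assumes NumPy and uses a Bernoulli($p$) sample, for which $I(p) = 1/\big(p(1-p)\big)$):

```python
import numpy as np

# Numerical check of E[S(p)] = 0 and Var(S(p)) = n * I(p) = n / (p(1-p))
# for a Bernoulli(p) sample of size n.
rng = np.random.default_rng(4)
p, n, reps = 0.3, 50, 200_000

x_sum = rng.binomial(n, p, size=reps)           # sufficient statistic sum X_i
score = x_sum / p - (n - x_sum) / (1 - p)       # S(p) = d/dp log L(p)

print("E[S]     ≈", score.mean())               # close to 0
print("Var[S]   ≈", score.var())                # close to n / (p(1-p))
print("n * I(p) =", n / (p * (1 - p)))
```
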

Rao-Blackwell Theorem
Improving estimators using sufficient statistics

Theorem:

Let $\hat{\theta}$ be an unbiased estimator of $\theta$ and $T$ a sufficient statistic. Define:

$$\hat{\theta}^* = E[\hat{\theta} \mid T]$$

Then $\hat{\theta}^*$ is also unbiased and $\text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})$.

Proof:

1. Because $T$ is sufficient, $E[\hat{\theta} \mid T]$ does not depend on $\theta$, so $\hat{\theta}^*$ is a genuine statistic. By the law of iterated expectations, $E[\hat{\theta}^*] = E[E[\hat{\theta} \mid T]] = E[\hat{\theta}] = \theta$.

2. By the law of total variance, $\text{Var}(\hat{\theta}) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(E[\hat{\theta} \mid T]) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(\hat{\theta}^*)$.

3. Since $E[\text{Var}(\hat{\theta} \mid T)] \geq 0$, we have $\text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})$. $\blacksquare$
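A classic illustration of the theorem in action (not from the page; it assumes NumPy): for a Poisson($\lambda$) sample, start from the crude unbiased estimator $\mathbb{1}\{X_1 = 0\}$ of $e^{-\lambda}$ and condition on the sufficient statistic $T = \sum X_i$, which gives $E[\mathbb{1}\{X_1 = 0\} \mid T] = ((n-1)/n)^T$.

```python
import numpy as np

# Rao-Blackwellization for g(lambda) = e^{-lambda} under Poisson(lambda):
# crude estimator 1{X_1 = 0} vs. the conditional expectation ((n-1)/n)^T.
rng = np.random.default_rng(5)
lam, n, reps = 2.0, 20, 200_000

x = rng.poisson(lam, size=(reps, n))
crude = (x[:, 0] == 0).astype(float)      # 1{X_1 = 0}
t = x.sum(axis=1)                          # sufficient statistic T
improved = ((n - 1) / n) ** t              # E[crude | T]

print("target e^{-lambda}:", np.exp(-lam))
print("crude:    mean", crude.mean(), " var", crude.var())
print("improved: mean", improved.mean(), " var", improved.var())
```

Both estimators are unbiased, but the conditioned version has a much smaller variance, exactly as the theorem predicts.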

Lehmann-Scheffé Theorem
Completeness + Sufficiency + Unbiasedness yields unique UMVUE

Theorem:

Let $T$ be a complete sufficient statistic for $\theta$. If $\hat{\theta} = g(T)$ is an unbiased estimator that is a function of $T$ alone, then $\hat{\theta}$ is the unique UMVUE.

Proof:

1. For any other unbiased estimator $\tilde{\theta}$, apply Rao-Blackwell: $\tilde{\theta}^* = E[\tilde{\theta} \mid T] = h(T)$ is unbiased, depends only on $T$, and satisfies $\text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})$.

2. Both $g(T)$ and $h(T)$ are unbiased, so $E[g(T) - h(T)] = 0$ for all $\theta$; by completeness of $T$, $g(T) = h(T)$ almost surely.

3. Therefore $\text{Var}(\hat{\theta}) = \text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})$ for every unbiased estimator $\tilde{\theta}$, so $\hat{\theta}$ has minimum variance among all unbiased estimators and is unique. $\blacksquare$
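A standard example of how the theorem is applied (illustrative, not from the page): for a Poisson($\lambda$) sample, $T = \sum_{i=1}^n X_i$ is complete and sufficient, and $\bar{X} = T/n$ is an unbiased function of $T$, so it is the unique UMVUE of $\lambda$:

$$E[\bar{X}] = \lambda \quad \text{and} \quad \bar{X} = g(T) = \frac{T}{n} \;\Longrightarrow\; \bar{X} \text{ is the UMVUE of } \lambda.$$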

Examples

Example 1: Exponential Distribution MOM

Problem: Given a sample from $\text{Exp}(\lambda)$, find the Method of Moments estimator for $\lambda$.

Solution:

Population first moment: $\mu_1 = E[X] = \frac{1}{\lambda}$

Sample first moment: $a_{n,1} = \bar{X}$

Set equal: $\frac{1}{\lambda} = \bar{X}$

$$\hat{\lambda}_{\text{MOM}} = \frac{1}{\bar{X}}$$
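A short numerical check (illustrative; assumes NumPy, which parameterizes the exponential by the scale $1/\lambda$):

```python
import numpy as np

# Check the MOM estimator 1 / X_bar for Exp(lambda).
rng = np.random.default_rng(6)
lam_true = 2.5
x = rng.exponential(scale=1.0 / lam_true, size=10_000)   # scale = 1 / lambda
print("lambda_hat_MOM =", 1.0 / x.mean())                 # close to 2.5
```
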
Example 2: Normal Distribution MLE

Problem: Find the MLE of $\mu$ and $\sigma^2$ for a sample from $N(\mu, \sigma^2)$.

Solution:

Log-likelihood: $\ell = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(x_i-\mu)^2$

Differentiate w.r.t. $\mu$: $\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum(x_i - \mu) = 0$

Solve: $\hat{\mu} = \bar{X}$

Differentiate w.r.t. $\sigma^2$ and solve: $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$
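A quick numerical check of both MLEs (illustrative; assumes NumPy):

```python
import numpy as np

# MLEs for N(mu, sigma^2): sample mean and the variance that divides by n.
rng = np.random.default_rng(7)
x = rng.normal(loc=1.0, scale=2.0, size=10_000)   # mu = 1, sigma = 2

mu_mle = x.mean()                                  # X_bar
sigma2_mle = np.mean((x - mu_mle) ** 2)            # divides by n, not n - 1
print(f"mu_hat = {mu_mle:.3f}, sigma2_hat = {sigma2_mle:.3f}")
```
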

Example 3: Poisson MLE and CRLB

Problem: Given $X_1, \ldots, X_n \sim P(\lambda)$ i.i.d., find the MLE of $\lambda$ and verify that it achieves the CRLB.

Solution:

Log-likelihood: $\ell(\lambda) = \left(\sum x_i\right) \log \lambda - n\lambda - \text{const}$

Score: $S(\lambda) = \frac{\sum x_i}{\lambda} - n = 0$

MLE: $\hat{\lambda} = \bar{X}$

Fisher information: $I(\lambda) = 1/\lambda$

CRLB: $\text{Var}(\hat{\lambda}) \geq \frac{\lambda}{n}$

Actual variance: $\text{Var}(\bar{X}) = \frac{\lambda}{n}$, so the CRLB is achieved exactly and $\bar{X}$ is efficient.
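A Monte Carlo check of the last step (illustrative; assumes NumPy): simulate many samples of size $n$, compute $\bar{X}$ for each, and compare its empirical variance with $\lambda/n$.

```python
import numpy as np

# Empirical Var(X_bar) for Poisson(lambda) vs. the CRLB lambda / n.
rng = np.random.default_rng(8)
lam, n, reps = 4.0, 30, 200_000

lam_hat = rng.poisson(lam, size=(reps, n)).mean(axis=1)   # MLE per replicate
print("empirical Var(lambda_hat):", lam_hat.var())
print("CRLB lambda / n:          ", lam / n)
```
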

Practice Quiz

Test your understanding with 10 multiple-choice questions

1. For a normal distribution $X \sim N(\mu,\sigma^2)$ with known $\sigma^2$, the Fisher information $I(\mu)$ for the population mean $\mu$ is:

2. For a sample $X_1, X_2, \ldots, X_n \sim B(1,p)$, the UMVUE of $p$ is:

3. For a uniform distribution $U(a,b)$, the method of moments estimator for the parameter $a$ is:

4. Given that the MLE of $\sigma^2$ for $N(\mu,\sigma^2)$ is $\hat{\sigma}^2 = S_n^2 = \frac{1}{n}\sum(X_i-\bar{X})^2$, the MLE of $\sigma$ is:

5. For a Poisson distribution $P(\lambda)$ with sample $X_1, X_2, \ldots, X_n$, the Cramér-Rao lower bound for unbiased estimators of $\lambda$ is:

6. Which statistic is a complete sufficient statistic for the normal distribution $N(\mu,\sigma^2)$?

7. The correct decomposition of Mean Squared Error (MSE) is:

8. In the linear regression model $Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, the least squares estimator of $\beta_1$ is:

9. For the uniform distribution $U(0,\theta)$ with MLE $\hat{\theta} = X_{(n)}$ (the maximum order statistic), which consistency property holds?

10. An estimator $\hat{g}$ that attains the Cramér-Rao lower bound is called:

Frequently Asked Questions

When should I use MLE vs. Method of Moments?

Use MLE when you need optimal asymptotic properties and can compute the likelihood. Use Method of Moments for quick estimates, complex likelihoods, or as starting values for iterative MLE. MLE is generally preferred for its efficiency and invariance property.

What does it mean for an estimator to be "efficient"?

An estimator is efficient if it achieves the Cramér-Rao lower bound: $\text{Var}(\hat{\theta}) = 1/(nI(\theta))$. This means no other unbiased estimator has lower variance. MLE is asymptotically efficient under regularity conditions.

Why is sample variance divided by n-1 instead of n?

Dividing by $n-1$ makes the estimator unbiased: $E[S^2] = \sigma^2$. We "lose one degree of freedom" because we estimate the mean $\bar{X}$ from the same data. The MLE uses $n$ (and is therefore biased), but the bias vanishes for large samples.

What's the difference between consistency and unbiasedness?

Unbiasedness ($E[\hat{\theta}] = \theta$) is a finite-sample property: on average across repeated samples of size $n$, the estimate equals the true value. Consistency ($\hat{\theta}_n \to \theta$ in probability) is an asymptotic property: as $n \to \infty$, the estimate converges to the true value.

How do I compute Fisher information?

Two equivalent methods: (1) $I(\theta) = E[(\partial \log f/\partial \theta)^2]$, the expected squared score, or (2) $I(\theta) = -E[\partial^2 \log f/\partial \theta^2]$, the negative expected Hessian. Method (2) is often easier. For $n$ i.i.d. observations, the total information is $nI(\theta)$.
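For example, for the Poisson($\lambda$) model used earlier on this page, both routes give the same answer:

$$\frac{\partial^2}{\partial \lambda^2} \log f(x;\lambda) = -\frac{x}{\lambda^2} \;\Rightarrow\; I(\lambda) = E\!\left[\frac{X}{\lambda^2}\right] = \frac{1}{\lambda}, \qquad E\!\left[\left(\frac{X}{\lambda} - 1\right)^2\right] = \frac{\text{Var}(X)}{\lambda^2} = \frac{1}{\lambda}.$$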
