Point Estimation | Comprehensive Guide with FAQ

Learning Objectives

What you'll master in point estimation theory

Master fundamental concepts of point estimation theory and evaluation criteria
Understand the Method of Moments and Maximum Likelihood Estimation
Learn Uniformly Minimum Variance Unbiased Estimators (UMVUE) construction
Apply Cramér-Rao inequality and Fisher information in efficiency analysis
Analyze estimator properties: unbiasedness, efficiency, consistency
Solve practical estimation problems in statistical inference

Estimation Methods

Two fundamental approaches to parameter estimation: Method of Moments and Maximum Likelihood

Method of Moments (MOM)

Equate sample moments to population moments

The Method of Moments estimates parameters by setting sample moments equal to population moments and solving for parameters.

\mu_k = E[X^k], \quad a_{n,k} = \frac{1}{n}\sum_{i=1}^n X_i^k

Population Moment

\mu_k = E[X^k]

Sample Moment

a_{n,k} = \frac{1}{n}\sum_{i=1}^n X_i^k

Estimation Equation

\mu_k(\theta) = a_{n,k}

Consistency

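Consistent under regularity conditions (the required moments exist and the map from parameters to moments is continuously invertible)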

When to Use:

Quick initial estimates when moments are easy to calculate
Starting values for iterative MLE algorithms
Distributions where likelihood is complex

Example: Exponential Distribution MOM

Problem:

Given a sample from $\text{Exp}(\lambda)$ , find the Method of Moments estimator for $\lambda$ .

Solution:

Population first moment: $\mu_1 = E[X] = \frac{1}{\lambda}$
Sample first moment: $a_{n,1} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$
Set equal: $\frac{1}{\lambda} = \bar{X}$
Solve for parameter:
$\hat{\lambda}_{\text{MOM}} = \frac{1}{\bar{X}}$

Key Insight:

MOM is intuitive: match observed sample characteristics to theoretical population characteristics. For exponential, the sample mean estimates $1/\lambda$ , so invert to get $\hat{\lambda}$ .
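The inversion step can be checked numerically. The sketch below is a minimal illustration, not part of the original example: it assumes a true rate of 2.0 and a sample size of 10,000 (both chosen only for demonstration), and the helper name `mom_exponential_rate` is hypothetical.

```python
import random
import statistics

def mom_exponential_rate(sample):
    """Method of Moments estimate of lambda: invert the sample mean."""
    return 1.0 / statistics.fmean(sample)

rng = random.Random(0)      # fixed seed so the run is repeatable
true_rate = 2.0             # illustrative value, not taken from the text
sample = [rng.expovariate(true_rate) for _ in range(10_000)]

rate_hat = mom_exponential_rate(sample)
print(f"MOM estimate of lambda: {rate_hat:.3f} (true value {true_rate})")
```

With a sample this large the estimate should land close to the assumed rate; smaller samples will scatter more widely around it.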

Maximum Likelihood Estimation (MLE)

Find parameters that maximize the probability of observed data

MLE finds the parameter value that makes the observed data most likely. It's the gold standard for point estimation due to optimal asymptotic properties.

L(\theta; x) = \prod_{i=1}^n f(x_i; \theta), \quad \ell(\theta) = \log L(\theta)

Likelihood Function

L(\theta) = \prod_{i=1}^n f(x_i;\theta)

Log-Likelihood

\ell(\theta) = \sum_{i=1}^n \log f(x_i;\theta)

Score Function

S(\theta) = \frac{\partial \ell}{\partial \theta}, \quad \text{with the MLE solving } S(\hat{\theta}) = 0

Invariance

$g(\hat{\theta})$ is MLE of $g(\theta)$

Asymptotic Properties:

Consistency: $\hat{\theta}_n \xrightarrow{P} \theta_0$
Asymptotic Normality: $\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, I^{-1}(\theta_0))$
Efficiency: Asymptotically achieves the Cramér-Rao lower bound

Example: Normal Distribution MLE

Problem:

Find the MLE of $\mu$ and $\sigma^2$ for sample from $N(\mu, \sigma^2)$ .

Solution:

Log-likelihood:
$\ell = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum(x_i-\mu)^2$
Differentiate w.r.t. $\mu$ : $\frac{\partial \ell}{\partial \mu} = \frac{1}{\sigma^2}\sum(x_i - \mu) = 0$
Solve: $\hat{\mu} = \bar{X}$
Differentiate w.r.t. $\sigma^2$ and solve: $\hat{\sigma}^2 = \frac{1}{n}\sum(X_i - \bar{X})^2$

Key Insight:

MLE for $\mu$ is unbiased, but $\hat{\sigma}^2$ is biased (uses $n$ not $n-1$ ). Bias vanishes as $n \to \infty$ .
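As a rough numerical check of these closed forms, the sketch below computes both estimators on a small made-up data set and confirms that nudging either parameter away from the MLE lowers the log-likelihood. The data values and the function names `normal_mle` and `normal_loglik` are illustrative assumptions.

```python
import math
import statistics

def normal_mle(sample):
    """Closed-form MLEs: sample mean and the divide-by-n variance."""
    n = len(sample)
    mu_hat = statistics.fmean(sample)
    sigma2_hat = sum((x - mu_hat) ** 2 for x in sample) / n
    return mu_hat, sigma2_hat

def normal_loglik(sample, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(sample)
    ss = sum((x - mu) ** 2 for x in sample)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

data = [4.1, 5.3, 3.8, 6.0, 5.2, 4.6]   # made-up observations
mu_hat, sigma2_hat = normal_mle(data)
best = normal_loglik(data, mu_hat, sigma2_hat)

# Perturbing either parameter should never increase the log-likelihood.
for mu, s2 in [(mu_hat + 0.1, sigma2_hat), (mu_hat - 0.1, sigma2_hat),
               (mu_hat, sigma2_hat * 1.1), (mu_hat, sigma2_hat * 0.9)]:
    print(f"mu={mu:.3f}, sigma2={s2:.3f}: drop = {best - normal_loglik(data, mu, s2):.4f}")
```

Each printed drop should be positive, which is what a maximum at $(\hat{\mu}, \hat{\sigma}^2)$ implies.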

Evaluation Criteria

How to judge the quality of estimators

Key Properties of Estimators

Unbiasedness

E[\hat{\theta}] = \theta

On average, estimator equals true value

Efficiency

\text{Var}(\hat{\theta}) \text{ is minimal}

Smallest variance among unbiased estimators

Consistency

\hat{\theta}_n \xrightarrow{P} \theta

Converges to true value as $n \to \infty$

Mean Squared Error

\text{MSE} = E[(\hat{\theta} - \theta)^2]

Combines bias and variance

Bias-Variance Decomposition:

\text{MSE}(\hat{\theta}) = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2
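
The decomposition follows by adding and subtracting $E[\hat{\theta}]$ inside the square. A short derivation, writing $\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta$ :

```latex
\begin{aligned}
\text{MSE}(\hat{\theta})
  &= E\left[\left(\hat{\theta} - E[\hat{\theta}] + E[\hat{\theta}] - \theta\right)^2\right] \\
  &= E\left[(\hat{\theta} - E[\hat{\theta}])^2\right]
     + 2\,(E[\hat{\theta}] - \theta)\,E\left[\hat{\theta} - E[\hat{\theta}]\right]
     + (E[\hat{\theta}] - \theta)^2 \\
  &= \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2
\end{aligned}
```

The cross term vanishes because $E[\hat{\theta} - E[\hat{\theta}]] = 0$ .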

Example: Sample Variance Bias

Problem:

Compare $S_n^2 = \frac{1}{n}\sum(X_i-\bar{X})^2$ vs $S^2 = \frac{1}{n-1}\sum(X_i-\bar{X})^2$ for estimating $\sigma^2$ .

Analysis:

MLE estimator $S_n^2$ : $E[S_n^2] = \frac{n-1}{n}\sigma^2$ (biased)
Unbiased estimator $S^2$ : $E[S^2] = \sigma^2$
Bias of $S_n^2$ : $\text{Bias} = -\frac{\sigma^2}{n} \to 0$ as $n \to \infty$
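Since $S_n^2 = \frac{n-1}{n} S^2$ , the variances differ by a constant factor: $\text{Var}(S_n^2) = \left(\frac{n-1}{n}\right)^2 \text{Var}(S^2)$ , so $S_n^2$ has slightly smaller variance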
For large $n$ , difference negligible

Key Insight:

MLE may be biased in finite samples but asymptotically unbiased. Use $n-1$ for exact unbiasedness, $n$ for the MLE; both estimators are consistent.
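
The bias can also be seen by simulation. The sketch below is a minimal Monte Carlo illustration under assumed settings (normal data, $n = 5$ , $\sigma^2 = 4$ , 20,000 replications); none of these values come from the text.

```python
import random

rng = random.Random(1)              # fixed seed for repeatability
n, sigma2, reps = 5, 4.0, 20_000    # illustrative settings
sigma = sigma2 ** 0.5

def sum_sq_dev(x):
    """Sum of squared deviations from the sample mean."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x)

total_mle = 0.0
total_unbiased = 0.0
for _ in range(reps):
    x = [rng.gauss(0.0, sigma) for _ in range(n)]
    ss = sum_sq_dev(x)
    total_mle += ss / n             # S_n^2, the MLE
    total_unbiased += ss / (n - 1)  # S^2, the unbiased version

mean_mle = total_mle / reps
mean_unbiased = total_unbiased / reps
print(f"average S_n^2: {mean_mle:.3f}  (theory: {(n - 1) / n * sigma2:.3f})")
print(f"average S^2:   {mean_unbiased:.3f}  (theory: {sigma2:.3f})")
```

The average of $S_n^2$ should sit near $\frac{n-1}{n}\sigma^2$ while the average of $S^2$ should sit near $\sigma^2$ , up to Monte Carlo noise.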

Cramér-Rao Lower Bound

Fundamental limit on estimator variance

The CRLB Theorem

For any unbiased estimator $\hat{\theta}$ of $\theta$ , the variance satisfies:

\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}

where $I(\theta)$ is the Fisher information

Fisher Information

I(\theta) = E\left[\left(\frac{\partial \log f}{\partial \theta}\right)^2\right]

Alternative Form

I(\theta) = -E\left[\frac{\partial^2 \log f}{\partial \theta^2}\right]

Efficiency

e(\hat{\theta}) = \frac{1/(nI(\theta))}{\text{Var}(\hat{\theta})}

Efficient Estimator

Achieves CRLB: $e = 1$

Example: Fisher Information for Exponential

Problem:

Find the Fisher information and CRLB for $\lambda$ in $\text{Exp}(\lambda)$ .

Solution:

PDF: $f(x;\lambda) = \lambda e^{-\lambda x}$
Log-likelihood: $\log f = \log\lambda - \lambda x$
Score: $\frac{\partial \log f}{\partial \lambda} = \frac{1}{\lambda} - x$
Fisher information:
$I(\lambda) = E\left[\left(\frac{1}{\lambda} - X\right)^2\right] = \frac{1}{\lambda^2}$
CRLB: $\text{Var}(\hat{\lambda}) \geq \frac{\lambda^2}{n}$

Key Insight:

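The MLE $\hat{\lambda} = 1/\bar{X}$ attains this bound only asymptotically. In finite samples it is biased, $E[1/\bar{X}] = \frac{n\lambda}{n-1}$ , and $\text{Var}(1/\bar{X}) = \frac{n^2\lambda^2}{(n-1)^2(n-2)}$ for $n > 2$ , which approaches $\lambda^2/n$ as $n \to \infty$ .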
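
The Fisher information computed above can be sanity-checked by averaging the squared score over simulated draws. This is only a sketch: the rate 2.0, the number of draws, and the sample size $n = 25$ used for the bound are assumed values, and `score_exponential` is a hypothetical helper name.

```python
import random

def score_exponential(x, rate):
    """Per-observation score: d/d(rate) of log(rate) - rate * x."""
    return 1.0 / rate - x

rng = random.Random(2)      # fixed seed for repeatability
rate = 2.0                  # illustrative true value
draws = [rng.expovariate(rate) for _ in range(50_000)]

# Monte Carlo estimate of I(lambda) = E[score^2]
fisher_mc = sum(score_exponential(x, rate) ** 2 for x in draws) / len(draws)
fisher_exact = 1.0 / rate ** 2

n = 25                              # illustrative sample size
crlb = 1.0 / (n * fisher_exact)     # equals rate**2 / n
print(f"I(lambda): Monte Carlo {fisher_mc:.4f}, exact {fisher_exact:.4f}")
print(f"CRLB for n = {n}: {crlb:.4f}")
```

The agreement reflects the identity $E[(1/\lambda - X)^2] = \text{Var}(X) = 1/\lambda^2$ , which holds because $E[X] = 1/\lambda$ .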

Rigorous Theorem Proofs

Step-by-step mathematical derivations of fundamental estimation theorems

Proof: Cramér-Rao Inequality

For unbiased estimators, variance is bounded below by Fisher information

Theorem Statement:

Let $\hat{\theta}(X_1,\ldots,X_n)$ be an unbiased estimator of $\theta$ . Under regularity conditions, the variance satisfies:

\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)}

where $I(\theta) = E\left[\left(\frac{\partial \log f}{\partial \theta}\right)^2\right]$ is the Fisher information.

Proof:

Step 1: Define the score function $S(\theta) = \frac{\partial}{\partial \theta} \log L(\theta)$
$S(\theta) = \sum_{i=1}^n \frac{\partial}{\partial \theta} \log f(X_i; \theta)$
Note: $E[S(\theta)] = 0$ (score has zero mean)
Step 2: Since $\hat{\theta}$ is unbiased, $E[\hat{\theta}] = \theta$ . Differentiate both sides:
$\frac{\partial}{\partial \theta} \int \hat{\theta}(x) L(\theta;x) dx = 1$
$\int \hat{\theta}(x) \frac{\partial L}{\partial \theta} dx = 1$
$\int \hat{\theta}(x) \frac{\partial \log L}{\partial \theta} L(\theta;x) dx = 1$
Therefore: $E[\hat{\theta} \cdot S(\theta)] = 1$
Step 3: Apply Cauchy-Schwarz inequality:
$[E[\hat{\theta} \cdot S]]^2 \leq E[\hat{\theta}^2] \cdot E[S^2]$
Substituting $E[\hat{\theta} \cdot S] = 1$ :
$1 \leq E[\hat{\theta}^2] \cdot E[S^2]$
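Step 4: Since $E[\hat{\theta}] = \theta$ :
$E[\hat{\theta}^2] = \text{Var}(\hat{\theta}) + \theta^2$
And $E[S^2] = nI(\theta)$ (Fisher information for $n$ observations), so Step 3 only gives $1 \leq (\text{Var}(\hat{\theta}) + \theta^2) \cdot nI(\theta)$ , which is weaker than the claimed bound.
Step 5: Since $E[S] = 0$ , we can sharpen this by working with $\text{Cov}(\hat{\theta}, S)$ :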
$1 = E[\hat{\theta} \cdot S] - E[\hat{\theta}]E[S] = \text{Cov}(\hat{\theta}, S)$
By Cauchy-Schwarz for covariance:
$[\text{Cov}(\hat{\theta}, S)]^2 \leq \text{Var}(\hat{\theta}) \cdot \text{Var}(S)$
$1 \leq \text{Var}(\hat{\theta}) \cdot nI(\theta)$
Step 6: Rearranging gives the Cramér-Rao bound:
$\text{Var}(\hat{\theta}) \geq \frac{1}{nI(\theta)} \quad \blacksquare$

Regularity Conditions:

Support of $f(x;\theta)$ does not depend on $\theta$
Can interchange differentiation and integration
Fisher information $I(\theta) > 0$ and finite
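
The key identity in the proof, $\text{Cov}(\hat{\theta}, S) = 1$ , can be illustrated by simulation. The sketch below assumes a model not used elsewhere in this section: normal data with unknown mean $\mu$ and known $\sigma$ , where $\hat{\theta} = \bar{X}$ and $S(\mu) = \sum (x_i - \mu)/\sigma^2$ . All numeric settings are illustrative.

```python
import random

rng = random.Random(3)                       # fixed seed for repeatability
mu, sigma, n, reps = 1.0, 1.5, 10, 20_000    # illustrative settings

est, score = [], []
for _ in range(reps):
    x = [rng.gauss(mu, sigma) for _ in range(n)]
    est.append(sum(x) / n)                                # unbiased estimator of mu
    score.append(sum(xi - mu for xi in x) / sigma ** 2)   # score at the true mu

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Empirical covariance (divide by number of replications)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

cov_est_score = cov(est, score)
var_est = cov(est, est)
var_score = cov(score, score)
print(f"Cov(estimator, score) = {cov_est_score:.3f}  (theory: 1)")
print(f"Var(score) = {var_score:.3f}  (theory n*I = {n / sigma ** 2:.3f})")
print(f"Var(estimator) = {var_est:.4f}  (CRLB = {sigma ** 2 / n:.4f})")
```

In this particular model the estimator is a linear function of the score, so the Cauchy-Schwarz step holds with equality and the variance matches the bound.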

Detailed Example: Poisson MLE with Complete Derivation

Problem:

Given $X_1, \ldots, X_n \sim \text{Poisson}(\lambda)$ i.i.d., find the MLE of $\lambda$ and verify it achieves the CRLB.

Solution:

Likelihood function:
$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}$
$= \frac{\lambda^{\sum x_i} e^{-n\lambda}}{\prod x_i!}$
Log-likelihood:
$\ell(\lambda) = \log L(\lambda) = \sum_{i=1}^n x_i \log \lambda - n\lambda - \sum \log(x_i!)$
$= \left(\sum x_i\right) \log \lambda - n\lambda - \text{const}$
Score function (first derivative):
$S(\lambda) = \frac{\partial \ell}{\partial \lambda} = \frac{\sum x_i}{\lambda} - n$
Set score to zero:
$\frac{\sum x_i}{\hat{\lambda}} - n = 0$
$\hat{\lambda} = \frac{\sum x_i}{n} = \bar{X}$
Verify second derivative (maximum):
$\frac{\partial^2 \ell}{\partial \lambda^2} = -\frac{\sum x_i}{\lambda^2} < 0$ (whenever $\sum x_i > 0$ )
Therefore $\hat{\lambda} = \bar{X}$ is indeed a maximum.
Fisher Information:
$I(\lambda) = -E\left[\frac{\partial^2 \log f}{\partial \lambda^2}\right]$
$= -E\left[-\frac{X}{\lambda^2}\right] = \frac{E[X]}{\lambda^2} = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}$
For $n$ observations: $nI(\lambda) = \frac{n}{\lambda}$
CRLB:
$\text{Var}(\hat{\lambda}) \geq \frac{1}{nI(\lambda)} = \frac{\lambda}{n}$
Actual variance of MLE:
$\text{Var}(\bar{X}) = \frac{\text{Var}(X_1)}{n} = \frac{\lambda}{n}$
The MLE achieves the CRLB exactly (efficient estimator)!

Key Insight:

The sample mean $\bar{X}$ is the MLE for Poisson $\lambda$ , and it is efficient (achieves the CRLB) at every sample size. Exact finite-sample efficiency of this kind is special to certain models; in general the MLE attains the bound only asymptotically.
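
The closed-form result can be cross-checked against a brute-force maximization of the log-likelihood. The counts below are made-up data and the grid range is an arbitrary choice for illustration; `poisson_loglik` is a hypothetical helper name.

```python
import math

def poisson_loglik(lam, counts):
    """Log-likelihood of i.i.d. Poisson(lam) counts; lgamma(x + 1) = log(x!)."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)

counts = [2, 0, 3, 1, 4, 2, 1, 3]       # made-up observations
lam_hat = sum(counts) / len(counts)     # closed-form MLE: the sample mean

# Brute-force search over a grid of candidate values from 0.5 to 4.49
grid = [0.5 + 0.01 * k for k in range(400)]
lam_grid = max(grid, key=lambda lam: poisson_loglik(lam, counts))

print(f"closed-form MLE: {lam_hat:.2f}, grid maximizer: {lam_grid:.2f}")
print(f"CRLB evaluated at the estimate: {lam_hat / len(counts):.4f}")
```

The grid maximizer should agree with the sample mean up to the grid spacing.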

Proof: MLE Asymptotic Normality

Under regularity conditions, MLE is asymptotically normal with optimal variance

Theorem Statement:

Let $\hat{\theta}_n$ be the MLE of $\theta$ based on $n$ i.i.d. observations. Under regularity conditions:

\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N(0, I^{-1}(\theta_0))

where $I(\theta)$ is the Fisher information and $\theta_0$ is the true parameter value.

Proof:

Step 1 (Score Equation): The MLE $\hat{\theta}_n$ satisfies the score equation:
$S_n(\hat{\theta}_n) = \sum_{i=1}^n \frac{\partial \log f(X_i; \hat{\theta}_n)}{\partial \theta} = 0$
Step 2 (Taylor Expansion): Expand the score around the true value $\theta_0$ :
$S_n(\hat{\theta}_n) = S_n(\theta_0) + S_n'(\theta^*)(\hat{\theta}_n - \theta_0)$
where $\theta^*$ lies between $\hat{\theta}_n$ and $\theta_0$ , and:
$S_n'(\theta) = \sum_{i=1}^n \frac{\partial^2 \log f(X_i; \theta)}{\partial \theta^2}$
Step 3 (Rearrange for MLE): Since $S_n(\hat{\theta}_n) = 0$ :
$0 = S_n(\theta_0) + S_n'(\theta^*)(\hat{\theta}_n - \theta_0)$
Solving for $\hat{\theta}_n - \theta_0$ :
$\hat{\theta}_n - \theta_0 = -\frac{S_n(\theta_0)}{S_n'(\theta^*)}$
Step 4 (Normalize Both Sides): Multiply by $\sqrt{n}$ :
$\sqrt{n}(\hat{\theta}_n - \theta_0) = -\frac{S_n(\theta_0)/\sqrt{n}}{S_n'(\theta^*)/n}$
Step 5 (Apply CLT to Numerator): By the Central Limit Theorem, the score at $\theta_0$ satisfies:
$\frac{S_n(\theta_0)}{\sqrt{n}} = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\partial \log f(X_i; \theta_0)}{\partial \theta} \xrightarrow{d} N(0, I(\theta_0))$
since $E[\partial \log f/\partial \theta] = 0$ and $\text{Var}(\partial \log f/\partial \theta) = I(\theta_0)$ when evaluated at the true value $\theta_0$ .
Step 6 (Apply LLN to Denominator): By the Law of Large Numbers:
$\frac{S_n'(\theta^*)}{n} = \frac{1}{n} \sum_{i=1}^n \frac{\partial^2 \log f(X_i; \theta^*)}{\partial \theta^2}$
By consistency of MLE, $\theta^* \to \theta_0$ , so:
$\frac{S_n'(\theta^*)}{n} \xrightarrow{P} E\left[\frac{\partial^2 \log f}{\partial \theta^2}\right] = -I(\theta_0)$
Step 7 (Combine via Slutsky's Theorem): By Slutsky's theorem:
$\sqrt{n}(\hat{\theta}_n - \theta_0) = -\frac{S_n(\theta_0)/\sqrt{n}}{S_n'(\theta^*)/n} \xrightarrow{d} \frac{N(0, I(\theta_0))}{-I(\theta_0)}$
Step 8 (Conclude Asymptotic Normality): Simplifying the limiting distribution:
$\sqrt{n}(\hat{\theta}_n - \theta_0) \xrightarrow{d} N\left(0, \frac{I(\theta_0)}{I^2(\theta_0)}\right) = N(0, I^{-1}(\theta_0)) \quad \blacksquare$
This shows MLE achieves the Cramér-Rao lower bound asymptotically.

Regularity Conditions:

True parameter $\theta_0$ is an interior point of parameter space
Likelihood is three times differentiable with respect to $\theta$
Fisher information $0 < I(\theta) < \infty$ for all $\theta$
Interchange of differentiation and integration is valid
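
A simulation can make the theorem concrete. The sketch below uses the exponential model, where the MLE is $\hat{\lambda} = 1/\bar{X}$ and $I(\lambda) = 1/\lambda^2$ , and checks that $\sqrt{n}(\hat{\lambda} - \lambda)$ has variance close to $\lambda^2$ . The rate, sample size, and replication count are assumed values for illustration.

```python
import random

rng = random.Random(4)                  # fixed seed for repeatability
rate, n, reps = 2.0, 200, 3_000         # illustrative settings

z = []
for _ in range(reps):
    total = sum(rng.expovariate(rate) for _ in range(n))
    rate_hat = n / total                        # MLE: 1 / sample mean
    z.append(n ** 0.5 * (rate_hat - rate))      # normalized estimation error

mean_z = sum(z) / reps
var_z = sum((zi - mean_z) ** 2 for zi in z) / reps
inv_fisher = rate ** 2                  # 1 / I(lambda) for the exponential model

print(f"mean of sqrt(n)(MLE - truth): {mean_z:.3f}  (limit: 0)")
print(f"variance: {var_z:.3f}  (limit 1/I = {inv_fisher:.3f})")
```

At finite $n$ the mean is slightly positive because $1/\bar{X}$ is biased upward; that offset shrinks as $n$ grows.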

Rao-Blackwell Theorem

Improving estimators using sufficient statistics

Theorem Statement:

Let $\hat{\theta}$ be an unbiased estimator and $T$ a sufficient statistic. Define:

\hat{\theta}^* = E[\hat{\theta} \mid T]

Then $\hat{\theta}^*$ is also unbiased and $\text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})$ .

Proof:

Step 1 (Verify Unbiasedness): We first show that $\hat{\theta}^* = E[\hat{\theta} \mid T]$ is unbiased. Using the tower property of conditional expectation:
$E[\hat{\theta}^*] = E[E[\hat{\theta} \mid T]]$
By the law of iterated expectations:
$E[E[\hat{\theta} \mid T]] = E[\hat{\theta}]$
Since $\hat{\theta}$ is unbiased for $\theta$ , we have $E[\hat{\theta}] = \theta$ , thus:
$E[\hat{\theta}^*] = \theta$
Step 2 (Law of Total Variance): Recall the variance decomposition formula:
$\text{Var}(X) = E[\text{Var}(X \mid Y)] + \text{Var}(E[X \mid Y])$
Applying this to $\hat{\theta}$ conditioned on $T$ :
$\text{Var}(\hat{\theta}) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(E[\hat{\theta} \mid T])$
Step 3 (Substitute Improved Estimator): Recognize that by definition:
$E[\hat{\theta} \mid T] = \hat{\theta}^*$
Substituting into the variance decomposition:
$\text{Var}(\hat{\theta}) = E[\text{Var}(\hat{\theta} \mid T)] + \text{Var}(\hat{\theta}^*)$
Step 4 (Non-negativity of Conditional Variance): By fundamental properties of variance, conditional variance is always non-negative:
$\text{Var}(\hat{\theta} \mid T) \geq 0 \quad \text{for all } T$
Taking expectations on both sides:
$E[\text{Var}(\hat{\theta} \mid T)] \geq 0$
Step 5 (Derive Variance Inequality): From Step 3, rearrange to isolate $\text{Var}(\hat{\theta}^*)$ :
$\text{Var}(\hat{\theta}^*) = \text{Var}(\hat{\theta}) - E[\text{Var}(\hat{\theta} \mid T)]$
Since $E[\text{Var}(\hat{\theta} \mid T)] \geq 0$ from Step 4:
$\text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})$
Step 6 (Characterize Equality): Equality holds when:
$E[\text{Var}(\hat{\theta} \mid T)] = 0$
Since $\text{Var}(\hat{\theta} \mid T) \geq 0$ , this requires:
$\text{Var}(\hat{\theta} \mid T) = 0 \quad \text{almost surely}$
Step 7 (Zero Variance Implies Constant): A random variable with zero conditional variance is constant (given the conditioning variable):
$\text{Var}(\hat{\theta} \mid T) = 0 \quad \Rightarrow \quad \hat{\theta} = E[\hat{\theta} \mid T] = \hat{\theta}^*$
This means $\hat{\theta}$ is already a function of the sufficient statistic $T$ alone.
Step 8 (Conclusion): We have proven:
$E[\hat{\theta}^*] = \theta \quad \text{and} \quad \text{Var}(\hat{\theta}^*) \leq \text{Var}(\hat{\theta})$
with equality if and only if $\hat{\theta}$ is already based on $T$ alone. $\quad \blacksquare$

Practical Use:

Start with any unbiased estimator $\hat{\theta}$ , condition on a sufficient statistic $T$ to get $\hat{\theta}^*$ with lower (or equal) variance. This process is called Rao-Blackwellization.
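
Rao-Blackwellization can be carried out exactly in a small discrete case. The sketch below assumes a Bernoulli model that is not part of this section: $X_1, \ldots, X_n \sim \text{Bernoulli}(p)$ , crude estimator $X_1$ , sufficient statistic $T = \sum X_i$ . It enumerates all outcomes with exact fractions; $n = 4$ and $p = 3/10$ are illustrative.

```python
from fractions import Fraction
from itertools import product

n = 4
p = Fraction(3, 10)     # illustrative value of the success probability

prob_t = {}     # P(T = t)
num = {}        # E[X_1 * 1{T = t}]
for xs in product((0, 1), repeat=n):
    t = sum(xs)
    pr = p ** t * (1 - p) ** (n - t)    # probability of this outcome
    prob_t[t] = prob_t.get(t, 0) + pr
    num[t] = num.get(t, 0) + xs[0] * pr

# Rao-Blackwellized estimator: E[X_1 | T = t], which works out to t / n
cond_mean = {t: num[t] / prob_t[t] for t in prob_t}

var_crude = p * (1 - p)     # Var(X_1)
mean_rb = sum(cond_mean[t] * prob_t[t] for t in prob_t)
var_rb = sum((cond_mean[t] - mean_rb) ** 2 * prob_t[t] for t in prob_t)

print("E[X_1 | T = t]:", {t: str(v) for t, v in sorted(cond_mean.items())})
print(f"mean {mean_rb}, variance {var_rb} versus crude variance {var_crude}")
```

The conditional mean does not involve $p$ , as sufficiency requires, and the improved estimator is the sample proportion $T/n$ with variance $p(1-p)/n$ .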

Example: Improving Estimator via Rao-Blackwell

Problem:

For $X_1, \ldots, X_n \sim \text{Exponential}(\lambda)$ , estimate the mean $\theta = 1/\lambda$ . Start with $\hat{\theta}_1 = X_1$ (unbiased) and use Rao-Blackwell to improve it with sufficient statistic $T = \sum X_i$ .

Solution:

Verify unbiasedness of initial estimator:
$E[X_1] = \int_0^\infty x \lambda e^{-\lambda x} dx = \frac{1}{\lambda} = \theta$
Note that $1/X_1$ is not a valid starting point for $\lambda$ itself: $E[1/X_1] = \int_0^\infty \frac{1}{x} \lambda e^{-\lambda x} dx$ diverges, so $1/X_1$ is not unbiased.
Identify sufficient statistic: $T = \sum_{i=1}^n X_i \sim \Gamma(n, \lambda)$
Apply Rao-Blackwell:
$\hat{\theta}^* = E[X_1 \mid T]$
By symmetry, $X_1, \ldots, X_n$ are exchangeable given $T$ :
$E[X_i \mid T] = E[X_j \mid T] \text{ for all } i,j$
Use linearity:
$n \cdot E[X_1 \mid T] = E\left[\sum_{i=1}^n X_i \mid T\right] = T$
The improved estimator is:
$\hat{\theta}^* = \frac{T}{n} = \frac{1}{n}\sum X_i = \bar{X}$
This is the MLE of $1/\lambda$ (by invariance, since $\hat{\lambda} = 1/\bar{X}$ )!
Variance comparison:
$\text{Var}(X_1) = \frac{1}{\lambda^2}$
$\text{Var}(\bar{X}) = \frac{1}{n\lambda^2} \quad \text{(achieves the CRLB for } 1/\lambda \text{)}$

Key Insight:

Rao-Blackwell transforms a crude unbiased estimator (a single observation) into an efficient estimator (the MLE of the mean), cutting the variance by a factor of $n$ . Conditioning on a sufficient statistic never increases variance.

Proof: Lehmann-Scheffé Theorem

Completeness + Sufficiency + Unbiasedness yields unique UMVUE

Theorem Statement:

Let $T$ be a complete sufficient statistic for $\theta$ . If $\hat{\theta} = g(T)$ is an unbiased estimator based solely on $T$ , then $\hat{\theta}$ is the unique UMVUE (Uniformly Minimum Variance Unbiased Estimator) of $\theta$ .

Proof:

Step 1 (Strategy): Suppose $\tilde{\theta}$ is any other unbiased estimator of $\theta$ . We will show that $\text{Var}(\hat{\theta}) \leq \text{Var}(\tilde{\theta})$ with equality only when $\tilde{\theta} = \hat{\theta}$ .
Step 2 (Apply Rao-Blackwell): By Rao-Blackwell theorem, define:
$\tilde{\theta}^* = E[\tilde{\theta} \mid T]$
Then $\tilde{\theta}^*$ is also unbiased and:
$\text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})$
Step 3 (Function of Sufficient Statistic): Since $\tilde{\theta}^* = E[\tilde{\theta} \mid T]$ , it is a function of $T$ alone, say:
$\tilde{\theta}^* = h(T)$
for some function $h$ .
Step 4 (Both are Unbiased Functions of T): We now have two unbiased estimators based on $T$ :
$E[\hat{\theta}] = E[g(T)] = \theta$
$E[\tilde{\theta}^*] = E[h(T)] = \theta$
Step 5 (Use Completeness): Consider their difference:
$E[\hat{\theta} - \tilde{\theta}^*] = E[g(T) - h(T)] = \theta - \theta = 0$
Since $T$ is complete and $g(T) - h(T)$ is a function of $T$ with expectation zero:
$P(g(T) - h(T) = 0) = 1$
Therefore: $\hat{\theta} = \tilde{\theta}^*$ almost surely.
Step 6 (Conclude Uniqueness): Since $\hat{\theta} = \tilde{\theta}^*$ and $\text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})$ :
$\text{Var}(\hat{\theta}) = \text{Var}(\tilde{\theta}^*) \leq \text{Var}(\tilde{\theta})$
This holds for any unbiased estimator $\tilde{\theta}$ , so $\hat{\theta}$ has minimum variance among all unbiased estimators.
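Step 7 (Uniqueness of UMVUE): If there were another UMVUE $\tilde{\theta}'$ , Steps 2-5 show:
$E[\tilde{\theta}' \mid T] = \hat{\theta}$
Since both are UMVUEs, $\text{Var}(\tilde{\theta}') = \text{Var}(\hat{\theta}) = \text{Var}(E[\tilde{\theta}' \mid T])$ , so equality holds in Rao-Blackwell and $\tilde{\theta}'$ must itself be a function of $T$ . Hence $\tilde{\theta}' = E[\tilde{\theta}' \mid T] = \hat{\theta}$ almost surely. Thus the UMVUE is unique. $\quad \blacksquare$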

Key Concepts:

Completeness: $E[g(T)] = 0 \text{ for all } \theta \Rightarrow P(g(T) = 0) = 1$
Sufficiency: $T$ contains all information about $\theta$
UMVUE Recipe: Find complete sufficient statistic $\to$ Find unbiased function of it
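
As a worked instance of the recipe, consider $X_1, \ldots, X_n \sim \text{Exp}(\lambda)$ with $n > 2$ . The statistic $T = \sum X_i \sim \Gamma(n, \lambda)$ is complete and sufficient (a standard exponential-family fact, taken here as given), so it only remains to find an unbiased function of $T$ :

```latex
\begin{aligned}
E\left[\frac{1}{T}\right]
  &= \int_0^\infty \frac{1}{t} \cdot \frac{\lambda^n t^{n-1} e^{-\lambda t}}{\Gamma(n)}\, dt
   = \frac{\lambda^n}{\Gamma(n)} \cdot \frac{\Gamma(n-1)}{\lambda^{n-1}}
   = \frac{\lambda}{n-1} \\
\hat{\lambda}_{\text{UMVUE}} &= \frac{n-1}{T}, \qquad E\left[\hat{\lambda}_{\text{UMVUE}}\right] = \lambda \\
\text{Var}\left(\hat{\lambda}_{\text{UMVUE}}\right) &= \frac{\lambda^2}{n-2} > \frac{\lambda^2}{n} = \text{CRLB}
\end{aligned}
```

So the UMVUE here is $(n-1)/T$ rather than the MLE $n/T$ , and it has the smallest variance among unbiased estimators without attaining the Cramér-Rao bound.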

Frequently Asked Questions

Common questions about point estimation

When should I use MLE vs. Method of Moments?

Use MLE when you need optimal asymptotic properties and can compute the likelihood. Use Method of Moments for quick estimates, complex likelihoods, or as starting values for iterative MLE. MLE is generally preferred for its efficiency and invariance property, but MOM is simpler and often provides good initial estimates.

What does it mean for an estimator to be "efficient"?

An estimator is efficient if it achieves the Cramér-Rao lower bound: $\text{Var}(\hat{\theta}) = 1/(nI(\theta))$ . This means no other unbiased estimator has lower variance. MLE is asymptotically efficient under regularity conditions. Efficiency matters because lower variance means more precise estimates from the same sample size.

Why is sample variance divided by n-1 instead of n?

Dividing by $n-1$ makes the estimator unbiased: $E[S^2] = \sigma^2$ . We "lose one degree of freedom" because we estimate the mean $\bar{X}$ from the same data. The MLE uses $n$ (biased), but its bias $-\sigma^2/n$ vanishes as $n \to \infty$ . For small samples, use $n-1$ for unbiasedness.

What's the difference between consistency and unbiasedness?

Unbiasedness ( $E[\hat{\theta}] = \theta$ ) is a finite-sample property: on average across repeated samples of size $n$ , the estimate equals the true value. Consistency ( $\hat{\theta}_n \to \theta$ ) is an asymptotic property: as $n \to \infty$ , the estimate converges to the true value. An estimator can be biased but consistent (like MLE for $\sigma^2$ ).

How do I compute Fisher information?

Two equivalent methods: (1) $I(\theta) = E[(\partial \log f/\partial \theta)^2]$ - expected squared score, or (2) $I(\theta) = -E[\partial^2 \log f/\partial \theta^2]$ - negative expected Hessian. Often method (2) is easier. For $n$ i.i.d. observations, total information is $nI(\theta)$ . Fisher information measures how much information the data contains about $\theta$ .

Can a biased estimator ever be better than an unbiased one?

Yes! By the bias-variance tradeoff, a slightly biased estimator with much lower variance can have smaller MSE: $\text{MSE} = \text{Var} + \text{Bias}^2$ . Examples include ridge regression and the James-Stein estimator. However, for large samples, consistency becomes more important than finite-sample bias. MLE sacrifices exact unbiasedness for asymptotic optimality.

What is the invariance property of MLE?

If $\hat{\theta}$ is the MLE of $\theta$ , then $g(\hat{\theta})$ is the MLE of $g(\theta)$ for any function $g$ . This is powerful: if $\hat{\lambda} = 1/\bar{X}$ is MLE for exponential $\lambda$ , then $1/\hat{\lambda} = \bar{X}$ is MLE for mean $1/\lambda$ . Method of Moments doesn't have this property.

How large does n need to be for asymptotic properties to hold?

There is no universal rule: it depends on the distribution and the parameter. As rough rules of thumb, asymptotics often work well for near-normal data at around $n \approx 30$ , while skewed distributions may need $n > 100$ and heavy-tailed distributions $n > 200$ or more. Always verify with simulation or use exact finite-sample methods (e.g., t-distribution) when $n$ is small.
