Master parameter estimation methods and their optimality properties
Three fundamental approaches to parameter estimation
The Method of Moments estimates parameters by setting sample moments equal to population moments and solving for parameters.
Population Moment
μ_k(θ) = E[X^k], the k-th moment implied by the model
Sample Moment
m_k = (1/n) Σᵢ Xᵢ^k, the k-th moment of the observed data
Estimation Equation
Set μ_k(θ̂) = m_k for k = 1, …, p and solve for the p unknown parameters
Consistency
Consistent under regularity conditions
Problem:
Given a sample X_1, …, X_n from Exponential(λ), find the Method of Moments estimator for λ.
Solution:
The first population moment is E[X] = 1/λ. Setting it equal to the first sample moment gives X̄ = 1/λ̂, so λ̂_MOM = 1/X̄.
Key Insight:
MOM is intuitive: match observed sample characteristics to theoretical population characteristics. For the exponential distribution, the sample mean estimates 1/λ, so invert it to get λ̂ = 1/X̄.
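A minimal simulation sketch of this estimator in Python (NumPy assumed; the true rate λ = 2 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
lam_true = 2.0                                       # illustrative true rate
x = rng.exponential(scale=1 / lam_true, size=1000)   # sample from Exponential(lambda)

# Method of Moments: E[X] = 1/lambda, so match it to the sample mean and invert.
lam_mom = 1 / x.mean()
print(f"MOM estimate of lambda: {lam_mom:.3f} (true value {lam_true})")
```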
Maximum likelihood estimation (MLE) finds the parameter value that makes the observed data most likely. It is the gold standard for point estimation due to its optimal asymptotic properties.
Likelihood Function
L(θ) = ∏ᵢ f(xᵢ; θ), the joint density viewed as a function of θ
Log-Likelihood
ℓ(θ) = Σᵢ log f(xᵢ; θ)
Score Function
S(θ) = ∂ℓ(θ)/∂θ; the MLE solves S(θ̂) = 0
Invariance
If θ̂ is the MLE of θ, then g(θ̂) is the MLE of g(θ)
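When the score equation has no closed form, the log-likelihood can be maximized numerically. A minimal sketch with SciPy, assuming a Gamma(shape, scale) model purely for illustration:

```python
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=500)   # illustrative data

# Negative log-likelihood for Gamma(shape=a, scale=s); minimize it to get the MLE.
def neg_log_lik(params):
    a, s = params
    if a <= 0 or s <= 0:                        # keep the optimizer in the valid region
        return np.inf
    return -np.sum(stats.gamma.logpdf(x, a, scale=s))

res = optimize.minimize(neg_log_lik, x0=[1.0, 1.0], method="Nelder-Mead")
print("MLE (shape, scale):", res.x)
```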
Problem:
Find the MLE of μ and σ² for a sample X_1, …, X_n from N(μ, σ²).
Solution:
Maximizing ℓ(μ, σ²) = -(n/2) log(2πσ²) - (1/(2σ²)) Σᵢ (Xᵢ - μ)² gives μ̂ = X̄ and σ̂² = (1/n) Σᵢ (Xᵢ - X̄)².
Key Insight:
The MLE for μ is unbiased, but σ̂² is biased (it divides by n, not n-1). The bias vanishes as n → ∞.
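A quick numerical check of these formulas (NumPy assumed; ddof=0 gives the MLE σ̂², ddof=1 the unbiased s²):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=5.0, scale=3.0, size=200)   # illustrative N(mu=5, sigma=3) sample

mu_hat = x.mean()                 # MLE of mu
sigma2_mle = x.var(ddof=0)        # MLE of sigma^2: divides by n (biased)
s2_unbiased = x.var(ddof=1)       # sample variance: divides by n-1 (unbiased)

print(f"mu_hat = {mu_hat:.3f}, sigma2_MLE = {sigma2_mle:.3f}, s2 = {s2_unbiased:.3f}")
```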
How to judge the quality of estimators
Unbiasedness
E[θ̂] = θ: on average, the estimator equals the true value
Efficiency
Smallest variance among unbiased estimators
Consistency
θ̂ → θ in probability: converges to the true value as n → ∞
Mean Squared Error
MSE(θ̂) = E[(θ̂ - θ)²]: combines bias and variance
Bias-Variance Decomposition:
MSE(θ̂) = Bias(θ̂)² + Var(θ̂), where Bias(θ̂) = E[θ̂] - θ.
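A one-line derivation of the decomposition, obtained by adding and subtracting E[θ̂] inside the square; the cross term vanishes because E[θ̂ - E[θ̂]] = 0:

```latex
\begin{aligned}
\mathrm{MSE}(\hat\theta)
  &= \mathbb{E}\!\left[(\hat\theta - \theta)^2\right]
   = \mathbb{E}\!\left[\big(\hat\theta - \mathbb{E}[\hat\theta] + \mathbb{E}[\hat\theta] - \theta\big)^2\right] \\
  &= \underbrace{\mathbb{E}\!\left[(\hat\theta - \mathbb{E}[\hat\theta])^2\right]}_{\mathrm{Var}(\hat\theta)}
   + \underbrace{\big(\mathbb{E}[\hat\theta] - \theta\big)^2}_{\mathrm{Bias}(\hat\theta)^2}
   + 2\big(\mathbb{E}[\hat\theta] - \theta\big)\underbrace{\mathbb{E}\!\left[\hat\theta - \mathbb{E}[\hat\theta]\right]}_{=\,0}.
\end{aligned}
```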
Problem:
Compare s² = (1/(n-1)) Σᵢ (Xᵢ - X̄)² vs σ̂² = (1/n) Σᵢ (Xᵢ - X̄)² for estimating σ².
Analysis:
E[s²] = σ², so s² is unbiased; E[σ̂²] = ((n-1)/n) σ², so σ̂² has bias -σ²/n. Both are consistent, and since σ̂² = ((n-1)/n) s², the MLE has slightly smaller variance.
Key Insight:
MLE may be biased in finite samples but is asymptotically unbiased. Use s² when exact unbiasedness matters; the MLE σ̂² remains consistent and asymptotically efficient.
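A small Monte Carlo sketch of this comparison (the choices n = 10 and σ² = 4 are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 10, 4.0, 100_000
samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))

s2 = samples.var(axis=1, ddof=1)          # unbiased estimator (divides by n-1)
sigma2_mle = samples.var(axis=1, ddof=0)  # MLE (divides by n)

for name, est in [("s^2 (n-1)", s2), ("MLE (n)", sigma2_mle)]:
    bias = est.mean() - sigma2
    mse = np.mean((est - sigma2) ** 2)
    print(f"{name}: bias ~ {bias:+.3f}, MSE ~ {mse:.3f}")
```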
Fundamental limit on estimator variance
For any unbiased estimator θ̂ of θ, the variance satisfies:
Var(θ̂) ≥ 1 / (n I(θ))
where I(θ) is the Fisher information per observation.
Fisher Information
I(θ) = E[(∂ log f(X; θ) / ∂θ)²]
Alternative Form
I(θ) = -E[∂² log f(X; θ) / ∂θ²]
Efficiency
e(θ̂) = [1 / (n I(θ))] / Var(θ̂), the ratio of the CRLB to the actual variance
Efficient Estimator
Achieves the CRLB: Var(θ̂) = 1 / (n I(θ))
Problem:
Find the Fisher information and CRLB for μ in N(μ, σ²) with σ² known.
Solution:
log f(x; μ) = -(1/2) log(2πσ²) - (x - μ)²/(2σ²), so ∂ log f/∂μ = (x - μ)/σ² and I(μ) = E[(X - μ)²]/σ⁴ = 1/σ². The CRLB is therefore 1/(n I(μ)) = σ²/n.
Key Insight:
The sample mean gives μ̂ = X̄ with variance σ²/n, achieving the CRLB (an efficient estimator).
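A simulation sketch checking that Var(X̄) matches the bound σ²/n (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, n, reps = 1.0, 4.0, 25, 100_000

# Each row is one sample of size n; the row mean is one realization of X_bar.
xbars = rng.normal(mu, np.sqrt(sigma2), size=(reps, n)).mean(axis=1)
print(f"Var(X_bar) ~ {xbars.var():.4f}  vs  CRLB sigma^2/n = {sigma2 / n:.4f}")
```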
Step-by-step mathematical derivations of fundamental estimation theorems
Theorem Statement:
Let θ̂ be an unbiased estimator of θ. Under regularity conditions, the variance satisfies:
Var(θ̂) ≥ 1 / (n I(θ))
where I(θ) is the Fisher information per observation.
Proof:
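A sketch of the standard argument via the Cauchy-Schwarz inequality, writing S = ∂ log L(θ; X_1, …, X_n)/∂θ for the score of the full sample (so E[S] = 0 and Var(S) = n I(θ)):

```latex
% Unbiasedness plus differentiation under the integral sign give Cov(\hat\theta, S) = 1:
\operatorname{Cov}(\hat\theta, S)
  = \mathbb{E}[\hat\theta\, S]
  = \int \hat\theta(x)\,\frac{\partial}{\partial\theta} f(x;\theta)\,dx
  = \frac{\partial}{\partial\theta}\,\mathbb{E}[\hat\theta]
  = \frac{\partial}{\partial\theta}\,\theta = 1.

% Cauchy--Schwarz then bounds the variance:
1 = \operatorname{Cov}(\hat\theta, S)^2
  \le \operatorname{Var}(\hat\theta)\,\operatorname{Var}(S)
  = \operatorname{Var}(\hat\theta)\, n I(\theta)
\;\Longrightarrow\;
\operatorname{Var}(\hat\theta) \ge \frac{1}{n I(\theta)}.
```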
Regularity Conditions:
The support of f(x; θ) does not depend on θ, differentiation under the integral sign is permitted, and 0 < I(θ) < ∞.
Problem:
Given X_1, …, X_n ~ Poisson(λ) i.i.d., find the MLE of λ and verify it achieves the CRLB.
Solution:
ℓ(λ) = Σᵢ (Xᵢ log λ - λ - log Xᵢ!); setting ℓ'(λ) = Σᵢ Xᵢ/λ - n = 0 gives λ̂ = X̄. Since I(λ) = 1/λ, the CRLB is λ/n, which equals Var(X̄) = λ/n, so the bound is attained.
Key Insight:
The sample mean X̄ is the MLE of the Poisson rate λ, and it is efficient (it achieves the CRLB). This demonstrates why MLE is optimal: it attains the theoretical lower bound on variance.
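A quick simulation sketch of this result (λ = 3 and n = 50 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
lam, n, reps = 3.0, 50, 100_000

lam_hats = rng.poisson(lam, size=(reps, n)).mean(axis=1)   # MLE = sample mean
print(f"Var(lambda_hat) ~ {lam_hats.var():.4f}  vs  CRLB lambda/n = {lam / n:.4f}")
```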
Theorem Statement:
Let θ̂_n be the MLE of θ based on n i.i.d. observations. Under regularity conditions:
√n (θ̂_n - θ₀) →d N(0, 1/I(θ₀))
where I(θ) is the Fisher information and θ₀ is the true parameter value.
Proof:
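A heuristic sketch of the usual argument: expand the score equation ℓ'_n(θ̂_n) = 0 around θ₀, then apply the CLT to the score and the law of large numbers to its derivative:

```latex
0 = \ell_n'(\hat\theta_n)
  \approx \ell_n'(\theta_0) + \ell_n''(\theta_0)\,(\hat\theta_n - \theta_0)
\;\Longrightarrow\;
\sqrt{n}\,(\hat\theta_n - \theta_0)
  \approx \frac{n^{-1/2}\,\ell_n'(\theta_0)}{-\,n^{-1}\,\ell_n''(\theta_0)}.

% CLT:  n^{-1/2}\,\ell_n'(\theta_0) \xrightarrow{d} N(0, I(\theta_0))   (the score has mean 0, variance I(\theta_0) per observation)
% LLN:  -\,n^{-1}\,\ell_n''(\theta_0) \xrightarrow{p} I(\theta_0)
% Slutsky's theorem then gives \sqrt{n}(\hat\theta_n - \theta_0) \xrightarrow{d} N\!\big(0,\, I(\theta_0)^{-1}\big).
```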
Regularity Conditions:
The model is identifiable, θ₀ lies in the interior of the parameter space, the support does not depend on θ, the log-density is sufficiently smooth in θ, and 0 < I(θ₀) < ∞.
Theorem Statement:
Let θ̂ be an unbiased estimator of θ and T a sufficient statistic. Define:
θ̃ = E[θ̂ | T]
Then θ̃ is also unbiased and Var(θ̃) ≤ Var(θ̂).
Proof:
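A sketch of the proof using iterated expectations and the law of total variance (sufficiency of T ensures E[θ̂ | T] does not depend on θ, so θ̃ is a legitimate estimator):

```latex
% Unbiasedness:
\mathbb{E}[\tilde\theta] = \mathbb{E}\big[\mathbb{E}[\hat\theta \mid T]\big] = \mathbb{E}[\hat\theta] = \theta.

% Variance reduction (law of total variance):
\operatorname{Var}(\hat\theta)
  = \mathbb{E}\big[\operatorname{Var}(\hat\theta \mid T)\big]
  + \operatorname{Var}\big(\mathbb{E}[\hat\theta \mid T]\big)
  \ge \operatorname{Var}(\tilde\theta),
\quad\text{since } \mathbb{E}\big[\operatorname{Var}(\hat\theta \mid T)\big] \ge 0.
```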
Practical Use:
Start with any unbiased estimator θ̂, then condition on a sufficient statistic T to get θ̃ = E[θ̂ | T] with lower (or equal) variance. This process is called Rao-Blackwellization.
Problem:
Starting from a crude unbiased estimator θ̂ of a parameter, use Rao-Blackwell to improve it by conditioning on a sufficient statistic T.
Solution:
Compute θ̃ = E[θ̂ | T]. By the theorem, θ̃ is unbiased and Var(θ̃) ≤ Var(θ̂); if T is also complete, Lehmann-Scheffé (below) shows θ̃ is the UMVUE.
Key Insight:
Rao-Blackwell can turn a crude unbiased estimator (even one with very large or infinite variance!) into an efficient estimator such as the MLE. Always condition on sufficient statistics to improve estimators.
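A simulation sketch of Rao-Blackwellization on a classic textbook case, assumed here purely for illustration: for Poisson(λ) data, estimate θ = P(X = 0) = e^(-λ). The crude unbiased estimator is the indicator 1{X_1 = 0}; conditioning on the sufficient statistic T = Σᵢ Xᵢ gives ((n-1)/n)^T, since X_1 | T = t ~ Binomial(t, 1/n):

```python
import numpy as np

rng = np.random.default_rng(6)
lam, n, reps = 1.5, 20, 100_000
x = rng.poisson(lam, size=(reps, n))

crude = (x[:, 0] == 0).astype(float)   # unbiased but noisy: uses only X_1
T = x.sum(axis=1)                      # sufficient statistic
rao_blackwell = ((n - 1) / n) ** T     # E[crude | T], by the binomial thinning argument

theta = np.exp(-lam)                   # true P(X = 0)
for name, est in [("crude", crude), ("Rao-Blackwellized", rao_blackwell)]:
    print(f"{name}: mean ~ {est.mean():.4f} (target {theta:.4f}), variance ~ {est.var():.5f}")
```

Both estimators should center on e^(-λ), but the Rao-Blackwellized one has markedly smaller variance.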
Theorem Statement:
Let T be a complete sufficient statistic for θ. If g(T) is an unbiased estimator of θ based solely on T, then g(T) is the unique UMVUE (Uniformly Minimum Variance Unbiased Estimator) of θ.
Proof:
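A sketch of the standard two-step argument, combining Rao-Blackwell with completeness:

```latex
% Uniqueness: if g_1(T) and g_2(T) are both unbiased for \theta, then
\mathbb{E}_\theta\!\big[g_1(T) - g_2(T)\big] = 0 \ \ \forall\,\theta
\;\overset{\text{completeness}}{\Longrightarrow}\;
g_1(T) = g_2(T) \ \text{a.s.}

% Minimal variance: for any unbiased estimator \hat\theta, Rao--Blackwell gives an unbiased
% estimator \mathbb{E}[\hat\theta \mid T] that is a function of T, hence equals g(T) a.s., so
\operatorname{Var}\big(g(T)\big)
  = \operatorname{Var}\big(\mathbb{E}[\hat\theta \mid T]\big)
  \le \operatorname{Var}(\hat\theta)
\quad\text{for every unbiased } \hat\theta.
```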
Key Concepts:
Sufficiency: the conditional distribution of the data given T does not depend on θ. Completeness: E_θ[g(T)] = 0 for all θ implies g(T) = 0 almost surely. UMVUE: an unbiased estimator whose variance is smallest among all unbiased estimators, uniformly in θ.
Common questions about point estimation
Use MLE when you need optimal asymptotic properties and can compute the likelihood. Use Method of Moments for quick estimates, complex likelihoods, or as starting values for iterative MLE. MLE is generally preferred for its efficiency and invariance property, but MOM is simpler and often provides good initial estimates.
An estimator is efficient if it achieves the Cramér-Rao lower bound: Var(θ̂) = 1/(n I(θ)). This means no other unbiased estimator has lower variance. MLE is asymptotically efficient under regularity conditions. Efficiency matters because lower variance means more precise estimates from the same sample size.
Dividing by n-1 makes the sample variance unbiased: E[s²] = σ². We "lose one degree of freedom" because we estimate the mean from the same data. The MLE divides by n (and is biased), but the bias vanishes for large samples. For small samples, divide by n-1 for unbiasedness.
Unbiasedness (E[θ̂] = θ) is a finite-sample property: on average across repeated samples of size n, the estimate equals the true value. Consistency (θ̂ → θ in probability) is an asymptotic property: as n → ∞, the estimate converges to the true value. An estimator can be biased but consistent (like the MLE σ̂² for σ²).
Two equivalent methods: (1) I(θ) = E[(∂ log f(X; θ)/∂θ)²], the expected squared score, or (2) I(θ) = -E[∂² log f(X; θ)/∂θ²], the negative expected Hessian. Often method (2) is easier. For n i.i.d. observations, the total information is n I(θ). Fisher information measures how much information the data contain about θ.
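A symbolic check that the two forms agree, using SymPy on the Bernoulli(p) model (chosen here because the expectation reduces to a two-point sum):

```python
import sympy as sp

p = sp.symbols("p", positive=True)
x = sp.symbols("x")
log_f = x * sp.log(p) + (1 - x) * sp.log(1 - p)   # log-pmf of Bernoulli(p)

score = sp.diff(log_f, p)
hessian = sp.diff(log_f, p, 2)

# Expectation over x in {0, 1} with weights (1-p, p).
def expect(expr):
    return sp.simplify((1 - p) * expr.subs(x, 0) + p * expr.subs(x, 1))

I_score = expect(score**2)   # E[(d/dp log f)^2]
I_hess = expect(-hessian)    # -E[d^2/dp^2 log f]
print(I_score, I_hess)       # both equal 1/(p*(1-p))
```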
Yes! By the bias-variance tradeoff, a slightly biased estimator with much lower variance can have smaller MSE, since MSE = Bias² + Variance. Examples include ridge regression and the James-Stein estimator. However, for large samples, consistency becomes more important than finite-sample bias. MLE sacrifices exact unbiasedness for asymptotic optimality.
If θ̂ is the MLE of θ, then g(θ̂) is the MLE of g(θ) for any function g. This is powerful: if λ̂ is the MLE of the exponential rate λ, then 1/λ̂ is the MLE of the mean 1/λ. Method of Moments doesn't have this property.
There is no universal rule; it depends on the distribution and parameter. For normal distributions, asymptotics work well even for small n. Skewed distributions require larger samples, and heavy-tailed distributions larger still. Always verify with simulation or use exact finite-sample methods (e.g., the t-distribution) when n is small.