
Statistical Inference & Estimation

Master the theoretical foundations of parameter estimation in time series: mean and autocovariance estimation, consistency, asymptotic distributions, and white noise diagnostics.

Estimation Theory
Asymptotic Analysis
White Noise Testing

Mean Estimation Theory

Foundations of parameter estimation and consistency analysis

Core Estimation Problem

Fundamental Principle

For AR, MA, and ARMA models, all parameters are uniquely determined by the autocovariance function. Therefore, the key to parameter identification is accurate estimation of $\gamma_k$.

Step 1

Estimate the sample mean $\bar{X}_N$

Step 2

Estimate the autocovariance $\hat{\gamma}_k$

Step 3

Identify model parameters

Consistency Theorem

Theorem (Consistency of Sample Mean)

For a stationary sequence with $\gamma_m \to 0$ as $m \to \infty$, the sample mean $\bar{X}_N$ is a consistent estimator of the population mean $\mu$.

$$\bar{X}_N = \frac{1}{N}\sum_{t=1}^N X_t \xrightarrow{P} \mu$$

Proof Outline (5 Steps)

1

Mean Square Error Decomposition

$$E(\bar{X}_N - \mu)^2 = \frac{1}{N^2}\sum_{k=1}^N\sum_{j=1}^N \gamma_{|k-j|}$$

2

Index Transformation

Set $m = k - j$ and transform the double sum to $\frac{1}{N^2}\sum_{m=-(N-1)}^{N-1}(N-|m|)\gamma_m$

3

Upper Bound

Use $\frac{N-|m|}{N^2} \leq \frac{1}{N}$ to get the bound $\frac{1}{N}\sum_{m=-(N-1)}^{N-1}|\gamma_m|$

4

Cesàro Convergence

When $\gamma_m \to 0$, the Cesàro average $\frac{1}{N}\sum|\gamma_m| \to 0$

5

Probability Convergence

Apply Chebyshev's inequality: $P(|\bar{X}_N-\mu| > \epsilon) \leq \frac{E(\bar{X}_N-\mu)^2}{\epsilon^2} \to 0$

Strong Consistency

For strictly stationary and ergodic sequences, the sample mean is strongly consistent:

$$\bar{X}_N \xrightarrow{a.s.} \mu \quad (N \to \infty)$$

This is a consequence of the ergodic theorem: time averages converge to ensemble averages.

Central Limit Theorem & Asymptotic Distribution

CLT for Linear Stationary Processes

Theorem Statement

For a linear stationary process $X_t = \mu + \sum_{k=-\infty}^{\infty}\psi_k\epsilon_{t-k}$ with:

  • $\sum \psi_k^2 < \infty$ (square-summable coefficients)
  • Spectral density $f(\lambda)$ continuous at $\lambda = 0$ with $f(0) \neq 0$

the sample mean satisfies

$$\sqrt{N}(\bar{X}_N - \mu) \xrightarrow{d} N(0,\, 2\pi f(0))$$
Asymptotic Variance Calculation

Method 1: Direct computation

$$N\,E(\bar{X}_N-\mu)^2 \to \sum_{m=-\infty}^{\infty}\gamma_m = 2\pi f(0)$$

The limiting variance captures all temporal dependence through the autocovariance sum.

Spectral Representation

Method 2: Using Wold coefficients

$$2\pi f(0) = \sigma^2\left(\sum_{k=-\infty}^{\infty}\psi_k\right)^2$$

This shows the connection between spectral density and the MA(∞) representation.

Practical Applications

Confidence Intervals

$$\mu \in \bar{X}_N \pm 1.96\sqrt{\frac{2\pi f(0)}{N}}$$

Hypothesis Testing

Test $H_0: \mu = \mu_0$ using the normal approximation

Forecast Intervals

Quantify prediction uncertainty
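
A minimal numerical sketch of the confidence-interval construction above, assuming the long-run variance $2\pi f(0)$ is estimated by a truncated sum of sample autocovariances; the simulated AR(1) data, the truncation lag M, and the function name are illustrative assumptions, not part of the original material:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: a stationary AR(1) series (an assumption for this demo)
N, phi = 500, 0.5
eps = rng.normal(size=N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = phi * x[t - 1] + eps[t]

def gamma_hat(y, k):
    """Sample autocovariance at lag k with divisor N."""
    n = len(y)
    yc = y - y.mean()
    return np.dot(yc[: n - k], yc[k:]) / n

# Long-run variance 2*pi*f(0) ~ sum over m of gamma_m, truncated at lag M (M is an assumption)
M = 20
lrv = gamma_hat(x, 0) + 2 * sum(gamma_hat(x, k) for k in range(1, M + 1))

xbar = x.mean()
se = np.sqrt(lrv / N)
print(f"mean = {xbar:.4f}, 95% CI = [{xbar - 1.96 * se:.4f}, {xbar + 1.96 * se:.4f}]")
```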

Convergence Speed & Law of Iterated Logarithm

Law of Iterated Logarithm (LIL)

Theoretical Foundation

The LIL provides a more precise characterization of convergence than the CLT. While the CLT gives the rate $O(1/\sqrt{N})$, the LIL gives the exact fluctuation bounds:

$$\text{Convergence rate: } O\left(\sqrt{\frac{2\ln\ln N}{N}}\right)$$

This is a refinement of the CLT: it describes not just the limiting distribution, but the "worst-case" behavior of the sample mean.

LIL Theorem for Linear Stationary Sequences

Conditions

  1. $X_t = \mu + \sum_{k=-\infty}^{\infty}\psi_k\epsilon_{t-k}$ with $\sum_k |\psi_k| < \infty$
  2. $f(\lambda)$ continuous at $\lambda = 0$, and $E|\epsilon_t|^r < \infty$ for some $r > 2$

Result

$$\limsup_{N\to\infty} \sqrt{\frac{N}{2\ln\ln N}}\,(\bar{X}_N - \mu) = \sqrt{2\pi f(0)} \quad a.s.$$

$$\liminf_{N\to\infty} \sqrt{\frac{N}{2\ln\ln N}}\,(\bar{X}_N - \mu) = -\sqrt{2\pi f(0)} \quad a.s.$$

Interpretation

  • The sample mean fluctuates infinitely often between the bounds $\pm\sqrt{2\pi f(0)}\cdot\sqrt{\frac{2\ln\ln N}{N}}$
  • Critical: $\sqrt{N}(\bar{X}_N-\mu)$ itself does not converge almost surely; it oscillates within these precise bounds
  • Consequently, $\bar{X}_N - \mu = O\left(\sqrt{\frac{\ln\ln N}{N}}\right)$ almost surely
Practical Value

The LIL tells us the minimum sample size needed for a given estimation precision:

To achieve error tolerance ϵ\epsilon, we need approximately:

$$N \approx \frac{2\pi f(0) \cdot 2\ln\ln N}{\epsilon^2}$$
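
Because $N$ appears on both sides, this relation can be solved numerically, for example by a simple fixed-point iteration. A small sketch with illustrative values for $2\pi f(0)$ and $\epsilon$ (both assumptions):

```python
import numpy as np

def lil_sample_size(two_pi_f0, eps, n0=100.0, iters=50):
    """Fixed-point iteration for N ~ 2*pi*f(0) * 2*ln(ln N) / eps^2."""
    n = float(n0)
    for _ in range(iters):
        n = two_pi_f0 * 2.0 * np.log(np.log(n)) / eps**2
    return int(np.ceil(n))

# Illustrative values: long-run variance 2*pi*f(0) = 4 and tolerance eps = 0.1
print(lil_sample_size(4.0, 0.1))
```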

AR(2) Mean Calculation & Simulation

AR(2) Model Analysis

Model Definition

Consider an AR(2) model with complex characteristic roots:

$$A(z) = (1-\rho e^{i\theta} z)(1-\rho e^{-i\theta} z)$$

The AR(2) process is:

$$X_t = 2\rho\cos\theta \cdot X_{t-1} - \rho^2 X_{t-2} + \epsilon_t$$

where $\epsilon_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)$

Sample Mean Relationship

Taking the average of both sides over $t = 1, \ldots, N$:

$$\bar{X}_N \approx \frac{1}{A(1)}\bar{\epsilon}_N = \frac{1}{1-2\rho\cos\theta+\rho^2}\bar{\epsilon}_N$$

Key insight: The sample mean of the AR(2) series is approximately proportional to the white noise sample mean, with proportionality constant $1/A(1)$.

Simulation Results

Two parameter configurations were simulated with M=1000 replications:

Configuration 1

  • $\rho = 1/1.1$ (close to unit root)
  • $\theta = 2.34$
  • Higher variance expected

Configuration 2

  • $\rho = 1/4$ (more stable)
  • $\theta = 2.34$
  • Lower variance expected
| N | 10 | 20 | 40 | 100 | 400 | 1000 |
|---|---|---|---|---|---|---|
| Ave($\bar{X}_N$) | -0.0055 | -0.0032 | -0.0029 | -0.0009 | -0.0008 | 0.0001 |
| Ave($\bar{\epsilon}_N$) | -0.0168 | -0.0135 | -0.0060 | -0.0037 | -0.0024 | 0.0003 |
| Std($\bar{X}_N$) | 0.1922 | 0.1068 | 0.0616 | 0.0347 | 0.0154 | 0.0102 |
| Std($\bar{\epsilon}_N$) | 0.3511 | 0.2351 | 0.1575 | 0.0967 | 0.0464 | 0.0312 |
Theoretical Validation

  • Standard deviations decrease at rate $O(1/\sqrt{N})$, confirming the CLT
  • Std($\bar{\epsilon}_N$) > Std($\bar{X}_N$) shows the AR smoothing effect
  • Sample means converge to 0 (the true mean), validating consistency
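
A sketch of this Monte Carlo experiment, assuming Configuration 2 ($\rho = 1/4$, $\theta = 2.34$) and a burn-in of 200 observations; the text does not state which configuration produced the table, so the exact numbers will differ:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)

def simulate_ar2_means(rho, theta, N, M=1000, sigma=1.0, burn=200):
    """Monte Carlo sample means of X_t and eps_t for an AR(2) with complex roots."""
    a1, a2 = 2 * rho * np.cos(theta), -rho**2      # X_t = a1*X_{t-1} + a2*X_{t-2} + eps_t
    xbars, ebars = np.empty(M), np.empty(M)
    for m in range(M):
        eps = rng.normal(0.0, sigma, size=N + burn)
        x = lfilter([1.0], [1.0, -a1, -a2], eps)   # apply the AR(2) recursion
        xbars[m] = x[burn:].mean()
        ebars[m] = eps[burn:].mean()
    return xbars, ebars

# Configuration 2 (rho = 1/4, theta = 2.34) is used here as an illustrative assumption
for N in (10, 20, 40, 100, 400, 1000):
    xb, eb = simulate_ar2_means(rho=0.25, theta=2.34, N=N)
    print(f"N={N:4d}  Ave(Xbar)={xb.mean():8.4f}  Std(Xbar)={xb.std():6.4f}  "
          f"Ave(ebar)={eb.mean():8.4f}  Std(ebar)={eb.std():6.4f}")
```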

Autocovariance Function Estimation

Sample Autocovariance

Basic Definitions

Sample Autocovariance Function

$$\hat{\gamma}_k = \frac{1}{N}\sum_{j=1}^{N-k}(X_j-\bar{X}_N)(X_{j+k}-\bar{X}_N), \quad 0 \leq k \leq N-1$$

Sample Autocorrelation Function

$$\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0}, \quad |k| \leq N-1$$
Why Divide by N instead of N-k?

Critical Reason: Dividing by N ensures positive definiteness of the sample autocovariance matrix.

  1. Dividing by N-k might seem "more unbiased" for individual lags
  2. However, it can produce non-positive-definite covariance matrices
  3. The N divisor guarantees all eigenvalues ≥ 0, which is essential for valid inference (see the computational sketch below)
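
A minimal implementation of these estimators with the $N$ divisor; the function names and the demo data are illustrative assumptions:

```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat_0 .. gamma_hat_max_lag, divisor N (not N - k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[: n - k], xc[k:]) / n for k in range(max_lag + 1)])

def sample_acf(x, max_lag):
    """Sample autocorrelations rho_hat_k = gamma_hat_k / gamma_hat_0."""
    g = sample_acvf(x, max_lag)
    return g / g[0]

# Usage on illustrative data
rng = np.random.default_rng(1)
print(sample_acf(rng.normal(size=200), max_lag=5))
```
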
Positive Definiteness Theorem

Theorem

If sample observations $X_1, X_2, \ldots, X_N$ are not all equal, then the sample autocovariance matrix $\hat{\Gamma}_N = (\hat{\gamma}_{k-j})_{k,j=1,\ldots,N}$ is positive definite.

Proof (Constructive)

Step 1: Define $Y_j = X_j - \bar{X}_N$ (centered observations)

Step 2: Construct the matrix $A$ whose rows are successively shifted copies of $(Y_1, \ldots, Y_N)$:

"A=(0Y1Y2YN1YN00Y1YN2YN1000Y1Y20000Y1)" {"A = \begin{pmatrix} 0 & Y_1 & Y_2 & \cdots & Y_{N-1} & Y_N \\ 0 & 0 & Y_1 & \cdots & Y_{N-2} & Y_{N-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & Y_1 & Y_2 \\ 0 & 0 & 0 & \cdots & 0 & Y_1 \end{pmatrix}"}

Step 3: Show that $\hat{\Gamma}_N = \frac{1}{N}AA^T$

Step 4: Since the $Y_j$ are not all zero, $A$ has full rank $N$

Conclusion: $AA^T$ is positive definite, so $\hat{\Gamma}_N$ is positive definite
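
A quick numerical illustration (not a proof) of the theorem, assuming an arbitrary simulated series: it builds $\hat{\Gamma}_N$ from the $N$-divisor autocovariances and inspects its smallest eigenvalue.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(2)
x = rng.normal(size=50)              # illustrative series (not all values equal)
n = len(x)
xc = x - x.mean()

# gamma_hat_k with divisor N, for k = 0 .. N-1
gamma = np.array([np.dot(xc[: n - k], xc[k:]) / n for k in range(n)])
Gamma = toeplitz(gamma)              # the matrix (gamma_hat_{|k-j|})

print(np.linalg.eigvalsh(Gamma).min())   # smallest eigenvalue: positive, as the theorem asserts
```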

Consistency Analysis
Theorem 1: Asymptotic Unbiasedness

For a stationary process:

$$\lim_{N\to\infty} E\hat{\gamma}_k = \gamma_k$$

The estimator is asymptotically unbiased, though it may have finite-sample bias.

Theorem 2: Strong Consistency

If $\{X_t\}$ is a strictly stationary and ergodic sequence:

$$\lim_{N\to\infty} \hat{\gamma}_k = \gamma_k \quad a.s.$$

$$\lim_{N\to\infty} \hat{\rho}_k = \rho_k \quad a.s.$$

Ergodicity enables time averages to replace ensemble averages with probability 1.

Asymptotic Distribution Theory

General Asymptotic Normality

Key Parameters

Fourth Moment

$$\mu_4 = E\epsilon_t^4$$

Normalized excess kurtosis

$$M_0 = \frac{1}{\sigma^2}(\mu_4 - \sigma^4)^{1/2}$$
Asymptotic Distribution

Under the spectral density condition $\int_{-\pi}^{\pi} f(\lambda)^2\, d\lambda < \infty$:

$$\sqrt{N}(\hat{\gamma}_0-\gamma_0,\, \hat{\gamma}_1-\gamma_1,\, \ldots,\, \hat{\gamma}_h-\gamma_h) \xrightarrow{d} (\xi_0, \xi_1, \ldots, \xi_h)$$

$$\sqrt{N}(\hat{\rho}_1-\rho_1,\, \hat{\rho}_2-\rho_2,\, \ldots,\, \hat{\rho}_h-\rho_h) \xrightarrow{d} (R_1, R_2, \ldots, R_h)$$

where $\xi_j$ and $R_j$ are defined through weighted sums of i.i.d. $N(0,1)$ random variables.

MA(q) Specific Case

For $m > q$ in an MA(q) model:

$$\sqrt{N}\hat{\rho}_m \xrightarrow{d} N(0,\, 1+2\rho_1^2+\cdots+2\rho_q^2)$$

This provides the basis for white noise testing and model order selection.
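
A small helper, under the assumptions above, for the widened band used to judge $\hat{\rho}_m$ beyond lag $q$; the MA(2) values in the usage line are illustrative:

```python
import numpy as np

def ma_q_band(rho, N, z=1.96):
    """Approximate 95% band for rho_hat_m, m > q, given the first q autocorrelations of an MA(q)."""
    var = 1.0 + 2.0 * np.sum(np.asarray(rho, dtype=float) ** 2)
    return z * np.sqrt(var / N)

# Illustrative: an MA(2) with rho_1 = 0.4, rho_2 = 0.2, observed with N = 200
print(ma_q_band([0.4, 0.2], N=200))
```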

AR(1) Specific Case

For an AR(1) model with autocorrelations $\rho_m = a^m$:

$$\sqrt{N}(\hat{\rho}_m - \rho_m) \xrightarrow{d} N(0, V_m)$$

$$V_m = \frac{(1+a^2)(1-a^{2m})}{1-a^2} - 2ma^{2m}$$
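
A sketch of this variance formula as a function, useful for approximate standard errors of $\hat{\rho}_m$; the coefficient $a$ and sample size $N$ below are illustrative assumptions:

```python
import numpy as np

def ar1_acf_avar(a, m):
    """Asymptotic variance V_m of sqrt(N) * (rho_hat_m - a^m) for an AR(1) with coefficient a."""
    return (1 + a**2) * (1 - a**(2 * m)) / (1 - a**2) - 2 * m * a**(2 * m)

# Illustrative: approximate standard errors of rho_hat_m for a = 0.7, N = 200
a, N = 0.7, 200
for m in (1, 2, 5, 10):
    print(m, round(np.sqrt(ar1_acf_avar(a, m) / N), 4))
```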

Worked Example: MA(1) Parameter Estimation

Complete Estimation Procedure

Problem Setup

Given an MA(1) model with $N = 100$ observations and estimated parameter $\hat{\theta} = 0.6$. Construct a 95% confidence interval for the true parameter $\theta$.

Model Specification

$$X_t = \epsilon_t + \theta\epsilon_{t-1}, \quad \epsilon_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)$$
Step-by-Step Solution
1

Determine Asymptotic Variance

For MA(1), the asymptotic variance of MLE is:

$$\text{Avar}(\sqrt{N}\hat{\theta}) = \frac{1-\theta^2}{1+\theta^2+\theta^4}$$
2

Plug-in Estimation

Substitute $\hat{\theta} = 0.6$:

$$\widehat{\text{Avar}} = \frac{1-0.6^2}{1+0.6^2+0.6^4} = \frac{0.64}{1.4896} \approx 0.4297$$

$$\widehat{\text{SE}}(\hat{\theta}) = \sqrt{\frac{0.4297}{100}} \approx 0.0656$$
3

Construct Confidence Interval

Using normal approximation (95% → z = 1.96):

$$\text{CI} = 0.6 \pm 1.96 \times 0.0656 = 0.6 \pm 0.1286 = [0.471,\, 0.729]$$
4

Interpretation

With 95% confidence, the true MA parameter $\theta$ lies in $[0.471, 0.729]$. Since the interval does not contain 0, there is strong evidence that the MA component is significant.
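
The same calculation as a short script, assuming the asymptotic-variance formula quoted in Step 1; the function name is illustrative:

```python
import numpy as np
from scipy.stats import norm

def ma1_ci(theta_hat, N, level=0.95):
    """Confidence interval for the MA(1) parameter using the asymptotic variance from Step 1."""
    avar = (1 - theta_hat**2) / (1 + theta_hat**2 + theta_hat**4)
    se = np.sqrt(avar / N)
    z = norm.ppf(0.5 + level / 2)
    return theta_hat - z * se, theta_hat + z * se

print(ma1_ci(0.6, 100))   # roughly (0.471, 0.729), matching the worked example
```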

Alternative: Bootstrap Approach

For small samples or non-normal innovations, bootstrap confidence intervals may be more accurate:

  1. Estimate the model and obtain residuals $\hat{\epsilon}_t$
  2. Resample the residuals with replacement: $\epsilon_t^*$
  3. Generate a bootstrap series: $X_t^* = \epsilon_t^* + \hat{\theta}\epsilon_{t-1}^*$
  4. Re-estimate the model on $X_t^*$ to get $\hat{\theta}^*$
  5. Repeat B = 1000 times and use percentiles of $\{\hat{\theta}^*_b\}$ (a sketch follows below)
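
A sketch of this residual bootstrap, assuming a zero-mean MA(1) and using statsmodels' ARIMA routine for the repeated re-estimation; the seed, the simulated data, and the reduced number of replications in the usage line are illustrative assumptions:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)

def ma1_bootstrap_ci(x, B=1000, level=0.95):
    """Residual bootstrap for the MA(1) parameter (sketch; assumes a zero-mean MA(1))."""
    fit = ARIMA(x, order=(0, 0, 1), trend="n").fit()
    theta_hat = fit.params[0]
    resid = np.asarray(fit.resid)
    resid = resid - resid.mean()                 # recentre residuals before resampling
    thetas = np.empty(B)
    for b in range(B):
        e = rng.choice(resid, size=len(x) + 1, replace=True)
        xb = e[1:] + theta_hat * e[:-1]          # bootstrap series X*_t = e*_t + theta_hat * e*_{t-1}
        thetas[b] = ARIMA(xb, order=(0, 0, 1), trend="n").fit().params[0]
    lo, hi = np.percentile(thetas, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return theta_hat, (lo, hi)

# Illustrative data generated from an MA(1) with theta = 0.6; B reduced to keep the demo fast
e = rng.normal(size=201)
x = e[1:] + 0.6 * e[:-1]
print(ma1_bootstrap_ci(x, B=200))
```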

Model Diagnostics & Residual Analysis

Comprehensive procedures for assessing model adequacy

Three-Stage Diagnostic Framework
1
Residual Calculation

Compute standardized residuals:

$$e_t = \frac{X_t - \hat{X}_{t|t-1}}{\sqrt{\hat{\sigma}_t^2}}$$

where $\hat{X}_{t|t-1}$ is the one-step-ahead forecast.

2
Graphical Analysis
  • Time series plot of residuals
  • ACF/PACF of residuals
  • QQ-plot for normality
  • Histogram + density estimate
  • Residuals vs. fitted values
3
Formal Tests
  • Ljung-Box test (H₀: white noise)
  • Jarque-Bera test (H₀: normality)
  • ARCH test (H₀: homoscedasticity)
  • Runs test (H₀: randomness)
Ljung-Box Q-Statistic

Modified version of Box-Pierce with better small-sample properties:

$$Q_{LB}(m) = N(N+2)\sum_{k=1}^m \frac{\hat{\rho}_k^2}{N-k} \sim \chi^2(m-p-q)$$

The degrees of freedom are adjusted for the estimated ARMA(p, q) parameters.
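
In practice this test is usually run with a library routine. A brief sketch using statsmodels' acorr_ljungbox, where the white-noise input stands in for the residuals of a fitted model and model_df = 2 assumes an ARMA(1,1):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox

# Illustrative residuals: white noise stands in for the residuals of a fitted ARMA(1,1)
rng = np.random.default_rng(4)
resid = rng.normal(size=200)

# model_df = p + q = 2 adjusts the chi-square degrees of freedom for the estimated parameters
lb = acorr_ljungbox(resid, lags=[10, 15, 20], model_df=2)
print(lb[["lb_stat", "lb_pvalue"]])
```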

Decision Rules

Model is Adequate if:

  • ✓ Residual ACF within confidence bands
  • ✓ Ljung-Box p-value > 0.05
  • ✓ QQ-plot approximately linear
  • ✓ No obvious patterns in residual plot

Model Needs Revision if:

  • ✗ Multiple ACF lags significant
  • ✗ Ljung-Box p-value < 0.05
  • ✗ Heavy tails in QQ-plot
  • ✗ Systematic patterns/heteroscedasticity
Practical Workflow Example

Scenario: Quarterly sales data (N=80)

After differencing and seasonal adjustment, you fit an ARMA(1,1) model. Estimated parameters: φ=0.7, θ=0.4, σ²=2.5. Assess model adequacy.

Step 1: Compute and Plot Residuals

Generate standardized residuals e_t and create time series plot. ✓ No obvious trends or volatility clustering observed.

Step 2: ACF Analysis

Compute sample ACF up to lag 20. Confidence bands: ±1.96/√80 ≈ ±0.219. ✓ All lags within bands except lag 12 (ρ̂₁₂ = 0.23), possibly spurious.

Step 3: Ljung-Box Test

Test up to lag m=15 (adjusted df = 15-2=13):

$$Q_{LB}(15) = 80 \times 82 \sum_{k=1}^{15} \frac{\hat{\rho}_k^2}{80-k} = 16.7$$

Critical value: χ²(13, 0.95) ≈ 22.36. Since 16.7 < 22.36, fail to reject H₀. ✓

Step 4: Normality Check

QQ-plot shows good alignment with theoretical quantiles except slight heaviness in right tail. Jarque-Bera test p-value = 0.08. ✓ Acceptable at 5% level.

Conclusion

ARMA(1,1) model appears adequate. All diagnostic tests support white noise assumption for residuals. Proceed with forecasting and inference.

White Noise Testing

Diagnostic tests for model adequacy

Chi-Square (Portmanteau) Test

Test Statistic

Under the white noise null hypothesis:

$$X^2(m) = N(\hat{\rho}_1^2 + \hat{\rho}_2^2 + \cdots + \hat{\rho}_m^2) \sim \chi^2(m)$$

Reject white noise if $X^2(m) > \chi^2_{m,1-\alpha}$, where $\alpha$ is the significance level.
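
A hand-rolled version of this test, assuming simulated white-noise input for the demonstration; the function and variable names are illustrative:

```python
import numpy as np
from scipy.stats import chi2

def portmanteau_test(x, m, alpha=0.05):
    """X^2(m) = N * sum of rho_hat_k^2, compared against the chi2(m) critical value."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / n
    rho = np.array([np.dot(xc[: n - k], xc[k:]) / n / gamma0 for k in range(1, m + 1)])
    stat = n * np.sum(rho**2)
    crit = chi2.ppf(1 - alpha, df=m)
    return stat, crit, stat > crit      # True in the last slot means reject white noise

rng = np.random.default_rng(5)
print(portmanteau_test(rng.normal(size=300), m=10))
```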

Advantages of Chi-Square Test

Joint Testing

Tests multiple lags simultaneously

Higher Power

More efficient than individual tests

Error Control

Automatic Type I error control

Parameter Selection

Choosing m: The number of lags to test

  • Typical choice: $m \leq 10$ in practice
  • Too large an m reduces test power (higher-lag autocorrelations decay quickly toward 0)
  • Too small an m may miss important dependencies
ACF Confidence Interval Method

Individual Testing

Under white noise assumption, for each lag k:

$$P(\sqrt{N}|\hat{\rho}_k| > 1.96) \approx 0.05$$

95% Confidence Interval:

$$|\hat{\rho}_k| \leq \frac{1.96}{\sqrt{N}}$$
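
A small sketch that flags the lags violating this band, assuming simulated white-noise data for illustration:

```python
import numpy as np

def acf_band_violations(x, max_lag):
    """List the lags whose sample ACF falls outside the +/- 1.96 / sqrt(N) white-noise band."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / n
    band = 1.96 / np.sqrt(n)
    violations = []
    for k in range(1, max_lag + 1):
        rho_k = np.dot(xc[: n - k], xc[k:]) / n / gamma0
        if abs(rho_k) > band:
            violations.append((k, round(rho_k, 3)))
    return band, violations

rng = np.random.default_rng(6)
print(acf_band_violations(rng.normal(size=400), max_lag=20))
```
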
Multiple Testing Issue

When testing m lags simultaneously:

• Even if truly white noise, ~5% of lags will fall outside bounds

• For m=20 lags, expect ~1 false rejection

• Need to consider the overall pattern, not individual violations

Practical Recommendation
1
Use chi-square test for overall assessment of white noise
2
Plot ACF with confidence bands to identify problematic lags
3
Look for systematic patterns, not isolated exceedances

Practical Guidelines & Pitfalls

Common Estimation Pitfalls

Insufficient Sample Size

Asymptotic results (CLT, consistency) rely on $N \to \infty$. For $N < 50$, estimates can be heavily biased. Recommendation: Use bootstrap methods or small-sample corrections (e.g., AICc) for short series.

Ignoring Non-Stationarity

Applying standard estimation to non-stationary data (trends, unit roots) yields spurious results. Recommendation: Always perform unit root tests (ADF, KPSS) and difference the data if necessary before estimation.

Over-Parameterization

Fitting high-order ARMA models to capture noise leads to high variance and poor forecasting. Recommendation: Adhere to the principle of parsimony and use AIC/BIC for model selection.

Best Practices Checklist

Visual Inspection First

Plot the time series, ACF, and PACF before any modeling. Look for outliers, seasonality, and trends.

Residual Diagnostics

Never accept a model without checking residuals for whiteness (Ljung-Box) and normality (QQ-plot).

Compare Multiple Models

Don't stop at the first "good" model. Compare 2-3 candidates using Information Criteria and out-of-sample validation.

Report Uncertainty

Always provide confidence intervals for parameters and prediction intervals for forecasts.

Practice Quiz
10 Questions

  1. What is Maximum Likelihood Estimation (MLE)?
  2. For ARMA models, what is the relationship between MLE and OLS?
  3. What are the key asymptotic properties of MLE for ARMA models?
  4. What is the Wald test used for?
  5. What should residuals look like if the model is adequate?
  6. How do AIC and BIC differ in their penalty for model complexity?
  7. What is overfitting in time series models?
  8. What is the Likelihood Ratio (LR) test used for?
  9. What does a significant Ljung-Box Q statistic indicate?
  10. How do you construct a forecast interval for ARMA models?

Frequently Asked Questions

Why do we divide by N instead of N-k when estimating autocovariance?

Dividing by N (rather than N-k) ensures that the sample autocovariance matrix is positive definite, which is crucial for statistical inference. While N-k might seem more 'unbiased' for large k, it can lead to non-positive-definite covariance matrices, breaking the mathematical properties needed for estimation and hypothesis testing.

What is the difference between consistency and strong consistency?

Consistency means the estimator converges to the true value in probability ($\bar{X}_N \xrightarrow{P} \mu$), while strong consistency means almost sure convergence ($\bar{X}_N \to \mu$ a.s.). Strong consistency is a stronger condition that requires ergodicity of the sequence. In practice, for strictly stationary ergodic sequences, we have strong consistency, which provides stronger guarantees about estimation accuracy.

How does spectral density f(0) affect convergence rate?

The spectral density at frequency zero, f(0), determines the asymptotic variance of the sample mean. It captures the 'long-run variance' of the process. Higher f(0) means stronger long-term dependence and slower convergence. The asymptotic variance is 2πf(0)/N, so processes with f(0)=0 (which don't exist in practice) would converge infinitely fast, while those with large f(0) converge more slowly.

When should I use chi-square test vs individual ACF confidence intervals?

The chi-square test (Portmanteau test) is more powerful for detecting overall departure from white noise because it jointly tests multiple lags. Individual ACF tests are useful for identifying specific problematic lags but suffer from multiple testing issues. For model diagnostics, use both: chi-square for overall assessment and ACF plot for identifying which lags are problematic.

What does the Law of Iterated Logarithm tell us that CLT doesn't?

While CLT gives the rate O(1/√N), the LIL provides the exact bounds for fluctuations: the sample mean oscillates infinitely often between ±√(2πf(0))·√(2 ln ln N/N). This gives us the 'worst-case' behavior and shows that √N(Xbar-μ) doesn't converge but oscillates within precise bounds. It's like knowing not just the average error, but the maximum likely deviation.
