
Statistical Inference & Estimation

Master the theoretical foundations of parameter estimation in time series: mean and autocovariance estimation, consistency, asymptotic distributions, and white noise diagnostics.

Estimation Theory
Asymptotic Analysis
White Noise Testing
Learning Objectives
Understand consistency and strong consistency of sample mean estimators
Apply Central Limit Theorem and Law of Iterated Logarithm to time series
Estimate autocovariance functions and assess their convergence properties
Derive asymptotic distributions for sample ACF and perform white noise tests
Interpret simulation results and validate theoretical convergence rates
Construct confidence intervals for AR and MA model parameters

Mean Estimation Theory

Foundations of parameter estimation and consistency analysis

Core Estimation Problem

Fundamental Principle

For AR, MA, and ARMA models, all parameters are uniquely determined by the autocovariance function. Therefore, the key to parameter identification is accurate estimation of \gamma_k.

Step 1: Estimate the sample mean \bar{X}_N

Step 2: Estimate the autocovariance \hat{\gamma}_k

Step 3: Identify model parameters

Consistency Theorem

Theorem (Consistency of Sample Mean)

For a stationary sequence with \gamma_m \to 0 as m \to \infty, the sample mean \bar{X}_N is a consistent estimator of the population mean \mu.

\bar{X}_N = \frac{1}{N}\sum_{t=1}^N X_t \xrightarrow{P} \mu

Proof Outline (5 Steps)

Step 1: Mean Square Error Decomposition

E(\bar{X}_N - \mu)^2 = \frac{1}{N^2}\sum_{k=1}^N\sum_{j=1}^N \gamma_{|k-j|}

Step 2: Index Transformation

Set m = k - j and rewrite the sum as \frac{1}{N^2}\sum_{m=-(N-1)}^{N-1}(N-|m|)\gamma_m

Step 3: Upper Bound

Use \frac{N-|m|}{N^2} \leq \frac{1}{N} to obtain the bound \frac{1}{N}\sum_{m=-(N-1)}^{N-1}|\gamma_m|

Step 4: Cesàro Convergence

When \gamma_m \to 0, the Cesàro average \frac{1}{N}\sum|\gamma_m| \to 0

Step 5: Probability Convergence

Apply Chebyshev's inequality: P(|\bar{X}_N-\mu| > \epsilon) \leq \frac{E(\bar{X}_N-\mu)^2}{\epsilon^2} \to 0

Strong Consistency

For strictly stationary and ergodic sequences, the sample mean is strongly consistent:

\bar{X}_N \xrightarrow{a.s.} \mu \quad (N \to \infty)

This is a consequence of the ergodic theorem: time averages converge to ensemble averages.
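A minimal simulation sketch of this convergence (assuming, purely for illustration, an AR(1) process with coefficient 0.5, mean 2, Gaussian innovations, and an arbitrary burn-in length):

```python
import numpy as np

def simulate_ar1(n, phi=0.5, mu=2.0, sigma=1.0, burn=500, rng=None):
    """Simulate a stationary AR(1): (X_t - mu) = phi*(X_{t-1} - mu) + eps_t."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.normal(0.0, sigma, size=n + burn)
    x = np.zeros(n + burn)
    for t in range(1, n + burn):
        x[t] = phi * x[t - 1] + eps[t]
    return mu + x[burn:]  # drop burn-in so the retained segment is approximately stationary

rng = np.random.default_rng(0)
for n in (100, 1_000, 10_000, 100_000):
    xbar = simulate_ar1(n, rng=rng).mean()
    print(f"N={n:>6d}  sample mean = {xbar: .4f}  |error| = {abs(xbar - 2.0):.4f}")
```

The error shrinks roughly like 1/\sqrt{N}, in line with the central limit theory discussed next.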

Central Limit Theorem & Asymptotic Distribution

CLT for Linear Stationary Processes

Theorem Statement

For a linear stationary process X_t = \mu + \sum_{k=-\infty}^{\infty}\psi_k\epsilon_{t-k} with:

  • \sum_k \psi_k^2 < \infty (square-summable coefficients)
  • spectral density f(\lambda) continuous at \lambda=0 with f(0) \neq 0

the sample mean satisfies

\sqrt{N}(\bar{X}_N - \mu) \xrightarrow{d} N(0, 2\pi f(0))
Asymptotic Variance Calculation

Method 1: Direct computation

N E(\bar{X}_N-\mu)^2 \to \sum_{m=-\infty}^{\infty}\gamma_m = 2\pi f(0)

The limiting variance captures all temporal dependence through the autocovariance sum.
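As a numerical sanity check (a sketch assuming an AR(1) process with coefficient \phi = 0.5, for which \sum_m \gamma_m = \sigma^2/(1-\phi)^2; sample size and replication count are arbitrary), the theoretical long-run variance can be compared with N \cdot \mathrm{Var}(\bar{X}_N) from Monte Carlo:

```python
import numpy as np
from scipy.signal import lfilter

phi, sigma = 0.5, 1.0
long_run_var = sigma**2 / (1 - phi)**2      # sum_m gamma_m = 2*pi*f(0) for this AR(1)

rng = np.random.default_rng(1)
N, reps, burn = 2_000, 2_000, 200
eps = rng.normal(0.0, sigma, size=(reps, N + burn))
x = lfilter([1.0], [1.0, -phi], eps, axis=-1)[:, burn:]   # AR(1) recursion, burn-in dropped
sample_means = x.mean(axis=1)

print("theoretical 2*pi*f(0):   ", long_run_var)          # 4.0 here
print("Monte Carlo N*Var(Xbar): ", N * sample_means.var())
```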

Spectral Representation

Method 2: Using Wold coefficients

2\pi f(0) = \sigma^2\left(\sum_{k=-\infty}^{\infty}\psi_k\right)^2

This shows the connection between spectral density and the MA(∞) representation.

Practical Applications

Confidence Intervals

\mu \in \bar{X}_N \pm 1.96\sqrt{\frac{2\pi f(0)}{N}}

Hypothesis Testing

Test H_0: \mu = \mu_0 using the normal approximation

Forecast Intervals

Quantify prediction uncertainty
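In practice 2\pi f(0) is unknown and must be estimated. A minimal sketch of the confidence-interval formula above (using a Bartlett-kernel long-run variance estimate and a common rule-of-thumb truncation lag, both of which are choices made here, not prescribed by the text):

```python
import numpy as np

def long_run_variance(x, max_lag=None):
    """Bartlett-kernel estimate of 2*pi*f(0) = sum_m gamma_m."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if max_lag is None:
        max_lag = int(4 * (n / 100.0) ** (2.0 / 9.0))     # a common rule-of-thumb truncation
    xc = x - x.mean()
    gamma = np.array([np.dot(xc[: n - k], xc[k:]) / n for k in range(max_lag + 1)])
    weights = 1.0 - np.arange(1, max_lag + 1) / (max_lag + 1.0)   # Bartlett weights
    return gamma[0] + 2.0 * np.sum(weights * gamma[1:])

def mean_ci(x, z=1.96):
    """CI for mu: Xbar +/- z * sqrt(2*pi*f_hat(0) / N)."""
    x = np.asarray(x, dtype=float)
    half = z * np.sqrt(long_run_variance(x) / len(x))
    return x.mean() - half, x.mean() + half

# usage on an observed series `series`:
# lo, hi = mean_ci(series)
```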

Convergence Speed & Law of Iterated Logarithm

Law of Iterated Logarithm (LIL)

Theoretical Foundation

The LIL provides a more precise characterization of convergence than the CLT. While the CLT gives the rate O(1/\sqrt{N}), the LIL gives the exact fluctuation bounds:

\text{Convergence rate: } O\left(\sqrt{\frac{2\ln\ln N}{N}}\right)

This is a refinement of the CLT: it describes not just the limiting distribution, but the "worst-case" behavior of the sample mean.

LIL Theorem for Linear Stationary Sequences

Conditions

  1. X_t = \mu + \sum_{k=-\infty}^{\infty}\psi_k\epsilon_{t-k} with \sum_k|\psi_k| < \infty
  2. The spectral density f(\lambda) is continuous at \lambda = 0, and E|\epsilon_t|^r < \infty for some r > 2

Result

\limsup_{N\to\infty} \sqrt{\frac{N}{2\ln\ln N}}(\bar{X}_N - \mu) = \sqrt{2\pi f(0)}, \quad a.s.

\liminf_{N\to\infty} \sqrt{\frac{N}{2\ln\ln N}}(\bar{X}_N - \mu) = -\sqrt{2\pi f(0)}, \quad a.s.

Interpretation

  • The sample mean fluctuates infinitely often between the bounds \pm\sqrt{2\pi f(0)}\cdot\sqrt{\frac{2\ln\ln N}{N}}
  • Critical: \sqrt{N}(\bar{X}_N-\mu) itself does not converge — it oscillates within precise bounds
  • From this: (\bar{X}_N - \mu) = O\left(\sqrt{\frac{\ln\ln N}{N}}\right)
Practical Value

The LIL tells us the minimum sample size needed for a given estimation precision:

To achieve error tolerance ϵ\epsilon, we need approximately:

N \approx \frac{2\pi f(0) \cdot 2\ln\ln N}{\epsilon^2}
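Because N appears on both sides, the relation can be solved by a simple fixed-point iteration. A sketch (assuming the long-run variance 2\pi f(0) is known or already estimated; the function name and starting value are illustrative):

```python
import math

def required_sample_size(long_run_var, eps, n0=100.0, iters=50):
    """Iterate N <- 2*pi*f(0) * 2*ln(ln N) / eps^2 until it settles."""
    n = float(n0)
    for _ in range(iters):
        n = max(long_run_var * 2.0 * math.log(math.log(n)) / eps**2, 3.0)  # keep ln(ln n) defined
    return math.ceil(n)

# e.g. long-run variance 4.0 and tolerance eps = 0.05:
print(required_sample_size(4.0, 0.05))   # roughly 7,000 observations
```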

AR(2) Mean Calculation & Simulation

AR(2) Model Analysis

Model Definition

Consider an AR(2) model with complex characteristic roots:

A(z) = (1-\rho e^{i\theta} z)(1-\rho e^{-i\theta} z)

The AR(2) process is:

X_t = 2\rho\cos\theta \cdot X_{t-1} - \rho^2 X_{t-2} + \epsilon_t

where \epsilon_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)

Sample Mean Relationship

Taking the average of both sides over t = 1, \ldots, N:

\bar{X}_N \approx \frac{1}{A(1)}\bar{\epsilon}_N = \frac{1}{1-2\rho\cos\theta+\rho^2}\bar{\epsilon}_N

Key insight: The sample mean of the AR(2) series is approximately proportional to the white-noise sample mean, with proportionality constant 1/A(1).

Simulation Results

Two parameter configurations were simulated with M=1000 replications:

Configuration 1

  • \rho = 1/1.1 (close to unit root)
  • \theta = 2.34
  • Higher variance expected

Configuration 2

  • \rho = 1/4 (more stable)
  • \theta = 2.34
  • Lower variance expected
| N | 10 | 20 | 40 | 100 | 400 | 1000 |
|---|---|---|---|---|---|---|
| Ave(\bar{X}_N) | -0.0055 | -0.0032 | -0.0029 | -0.0009 | -0.0008 | 0.0001 |
| Ave(\bar{\epsilon}_N) | -0.0168 | -0.0135 | -0.0060 | -0.0037 | -0.0024 | 0.0003 |
| Std(\bar{X}_N) | 0.1922 | 0.1068 | 0.0616 | 0.0347 | 0.0154 | 0.0102 |
| Std(\bar{\epsilon}_N) | 0.3511 | 0.2351 | 0.1575 | 0.0967 | 0.0464 | 0.0312 |
Theoretical Validation

  • Standard deviation decreases at rate O(1/\sqrt{N}), confirming the CLT
  • Std(\bar{\epsilon}_N) > Std(\bar{X}_N), showing the AR smoothing effect
  • Sample means converge to 0 (the true mean), validating consistency
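A sketch of the Monte Carlo exercise (shown here for configuration 2, \rho = 1/4 and \theta = 2.34; the seed and burn-in length are arbitrary choices, so the printed numbers will differ somewhat from the table above):

```python
import numpy as np
from scipy.signal import lfilter

rho, theta, sigma, M, burn = 0.25, 2.34, 1.0, 1000, 200
a1, a2 = 2 * rho * np.cos(theta), -rho**2       # X_t = a1*X_{t-1} + a2*X_{t-2} + eps_t

rng = np.random.default_rng(42)
for N in (10, 20, 40, 100, 400, 1000):
    eps = rng.normal(0.0, sigma, size=(M, N + burn))
    x = lfilter([1.0], [1.0, -a1, -a2], eps, axis=-1)[:, burn:]   # AR(2) recursion, burn-in dropped
    xbar, ebar = x.mean(axis=1), eps[:, burn:].mean(axis=1)
    print(f"N={N:>4d}  Ave(Xbar)={xbar.mean(): .4f}  Std(Xbar)={xbar.std():.4f}  "
          f"Ave(ebar)={ebar.mean(): .4f}  Std(ebar)={ebar.std():.4f}")
```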

Autocovariance Function Estimation

Sample Autocovariance

Basic Definitions

Sample Autocovariance Function

\hat{\gamma}_k = \frac{1}{N}\sum_{j=1}^{N-k}(X_j-\bar{X}_N)(X_{j+k}-\bar{X}_N), \quad 0 \leq k \leq N-1

Sample Autocorrelation Function

\hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0}, \quad |k| \leq N-1
Why Divide by N instead of N-k?

Critical Reason: Dividing by N ensures positive definiteness of the sample autocovariance matrix.

  1. Dividing by N-k might seem "more unbiased" for individual lags
  2. However, it can produce non-positive-definite covariance matrices
  3. The N divisor guarantees all eigenvalues ≥ 0, which is essential for valid inference
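A small numerical illustration of this point (a sketch; the helper names and the sample size are arbitrary): build the Toeplitz matrix (\hat{\gamma}_{|k-j|}) with each divisor and inspect its smallest eigenvalue.

```python
import numpy as np

def sample_autocov(x, k, divisor="N"):
    """gamma_hat_k with either the N or the N-k divisor."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    s = np.dot(xc[: n - k], xc[k:])
    return s / n if divisor == "N" else s / (n - k)

def autocov_matrix(x, divisor="N"):
    """Toeplitz matrix (gamma_hat_{|k-j|})_{k,j=1..N}."""
    n = len(x)
    g = np.array([sample_autocov(x, k, divisor) for k in range(n)])
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return g[idx]

rng = np.random.default_rng(3)
x = rng.normal(size=30)
for div in ("N", "N-k"):
    smallest = np.linalg.eigvalsh(autocov_matrix(x, div)).min()
    print(f"divisor {div:>3s}: smallest eigenvalue = {smallest: .4e}")
# The N divisor keeps all eigenvalues >= 0; the N-k divisor can produce negative ones.
```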
Positive Definiteness Theorem

Theorem

If the sample observations X_1, X_2, \ldots, X_N are not all equal, then the sample autocovariance matrix \hat{\Gamma}_N = (\hat{\gamma}_{|k-j|})_{k,j=1,\ldots,N} is positive definite.

Proof (Constructive)

Step 1: Define Y_j = X_j - \bar{X}_N (centered observations)

Step 2: Construct the lower triangular matrix A:

A = \begin{pmatrix} 0 & Y_1 & Y_2 & \cdots & Y_{N-1} & Y_N \\ 0 & 0 & Y_1 & \cdots & Y_{N-2} & Y_{N-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & Y_1 & Y_2 \\ 0 & 0 & 0 & \cdots & 0 & Y_1 \end{pmatrix}

Step 3: Show that \hat{\Gamma}_N = \frac{1}{N}AA^T

Step 4: Since the Y_j are not all zero, A has full rank N

Conclusion: AA^T is positive definite, so \hat{\Gamma}_N is positive definite

Consistency Analysis
Theorem 1: Asymptotic Unbiasedness

For a stationary process:

\lim_{N\to\infty} E\hat{\gamma}_k = \gamma_k

The estimator is asymptotically unbiased, though it may have finite-sample bias.

Theorem 2: Strong Consistency

If \{X_t\} is a strictly stationary and ergodic sequence:

\lim_{N\to\infty} \hat{\gamma}_k = \gamma_k, \quad a.s.

\lim_{N\to\infty} \hat{\rho}_k = \rho_k, \quad a.s.

Ergodicity enables time averages to replace ensemble averages with probability 1.

Asymptotic Distribution Theory

General Asymptotic Normality

Key Parameters

Fourth Moment

\mu_4 = E\epsilon_t^4

Normalized excess kurtosis

M_0 = \frac{1}{\sigma^2}(\mu_4 - \sigma^4)^{1/2}
Asymptotic Distribution

Under the spectral density condition \int_{-\pi}^{\pi} f(\lambda)^2\, d\lambda < \infty:

\sqrt{N}(\hat{\gamma}_0-\gamma_0, \hat{\gamma}_1-\gamma_1, \ldots, \hat{\gamma}_h-\gamma_h) \xrightarrow{d} (\xi_0, \xi_1, \ldots, \xi_h)

\sqrt{N}(\hat{\rho}_1-\rho_1, \hat{\rho}_2-\rho_2, \ldots, \hat{\rho}_h-\rho_h) \xrightarrow{d} (R_1, R_2, \ldots, R_h)

where \xi_j and R_j are defined through weighted sums of i.i.d. N(0,1) random variables.

MA(q) Specific Case

For m > q in an MA(q) model:

\sqrt{N}\hat{\rho}_m \xrightarrow{d} N(0, 1+2\rho_1^2+\cdots+2\rho_q^2)

This provides basis for white noise testing and model order selection.

AR(1) Specific Case

For an AR(1) process with \rho_m = a^m:

\sqrt{N}(\hat{\rho}_m - \rho_m) \xrightarrow{d} N(0, V_m)

V_m = \frac{(1+a^2)(1-a^{2m})}{1-a^2} - 2ma^{2m}
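The formula can be checked by simulation; a minimal sketch (AR(1) coefficient a = 0.6 and lag m = 2, with sample size, burn-in, and replication count chosen arbitrarily for illustration):

```python
import numpy as np
from scipy.signal import lfilter

def bartlett_var_ar1(a, m):
    """Asymptotic variance V_m of sqrt(N)*(rho_hat_m - a^m) for an AR(1) process."""
    return (1 + a**2) * (1 - a**(2 * m)) / (1 - a**2) - 2 * m * a**(2 * m)

a, m, N, reps, burn = 0.6, 2, 2000, 2000, 200
rng = np.random.default_rng(7)
eps = rng.normal(size=(reps, N + burn))
x = lfilter([1.0], [1.0, -a], eps, axis=-1)[:, burn:]      # simulate AR(1) paths
xc = x - x.mean(axis=1, keepdims=True)
rho_hat_m = np.sum(xc[:, :-m] * xc[:, m:], axis=1) / np.sum(xc**2, axis=1)

print("theoretical V_m:            ", bartlett_var_ar1(a, m))
print("simulated  N*Var(rho_hat_m):", N * rho_hat_m.var())
```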

Worked Example: MA(1) Parameter Estimation

Complete Estimation Procedure

Problem Setup

Given an MA(1) model with N = 100 observations and an estimated parameter \hat{\theta} = 0.6, construct a 95% confidence interval for the true parameter \theta.

Model Specification

X_t = \epsilon_t + \theta\epsilon_{t-1}, \quad \epsilon_t \stackrel{i.i.d.}{\sim} N(0,\sigma^2)
Step-by-Step Solution
Step 1: Determine the Asymptotic Variance

For MA(1), the asymptotic variance of the MLE is:

\text{Avar}(\sqrt{N}\hat{\theta}) = \frac{1-\theta^2}{1+\theta^2+\theta^4}

Step 2: Plug-in Estimation

Substitute \hat{\theta} = 0.6:

\widehat{\text{Avar}} = \frac{1-0.6^2}{1+0.6^2+0.6^4} = \frac{0.64}{1.4896} \approx 0.4297

\widehat{\text{SE}}(\hat{\theta}) = \sqrt{\frac{0.4297}{100}} \approx 0.0656

Step 3: Construct the Confidence Interval

Using the normal approximation (95% → z = 1.96):

\text{CI} = 0.6 \pm 1.96 \times 0.0656 = 0.6 \pm 0.1286 = [0.471, 0.729]

Step 4: Interpretation

With 95% confidence, the true MA parameter \theta lies in [0.471, 0.729]. Since the interval does not contain 0, there is strong evidence that the MA component is significant.

Alternative: Bootstrap Approach

For small samples or non-normal innovations, bootstrap confidence intervals may be more accurate:

  1. Estimate the model and obtain residuals \hat{\epsilon}_t
  2. Resample residuals with replacement: \epsilon_t^*
  3. Generate a bootstrap series: X_t^* = \epsilon_t^* + \hat{\theta}\epsilon_{t-1}^*
  4. Re-estimate the model on X_t^* to get \hat{\theta}^*
  5. Repeat B=1000 times and use the percentiles of \{\hat{\theta}^*_b\}
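A sketch of this residual bootstrap. For simplicity it re-estimates \theta with a moment estimator based on \hat{\rho}_1 = \theta/(1+\theta^2) rather than the MLE of the worked example; the function names and that substitution are choices made only for this illustration.

```python
import numpy as np

def ma1_moment_estimate(x):
    """Invertible-root moment estimator of theta from rho_hat_1 = theta/(1+theta^2)."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    r1 = np.dot(xc[:-1], xc[1:]) / np.dot(xc, xc)
    r1 = np.clip(r1, -0.499, 0.499)               # keep the implied MA root invertible
    return (1 - np.sqrt(1 - 4 * r1**2)) / (2 * r1)

def ma1_residuals(x, theta):
    """Recover innovations recursively: eps_t = X_t - theta*eps_{t-1} (eps_0 := X_0)."""
    x = np.asarray(x, dtype=float)
    eps = np.zeros_like(x)
    eps[0] = x[0]
    for t in range(1, len(x)):
        eps[t] = x[t] - theta * eps[t - 1]
    return eps

def ma1_bootstrap_ci(x, B=1000, alpha=0.05, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    theta_hat = ma1_moment_estimate(x)
    eps_hat = ma1_residuals(x, theta_hat)
    boot = np.empty(B)
    for b in range(B):
        e_star = rng.choice(eps_hat, size=len(x) + 1, replace=True)   # step 2: resample
        x_star = e_star[1:] + theta_hat * e_star[:-1]                 # step 3: bootstrap series
        boot[b] = ma1_moment_estimate(x_star)                         # step 4: re-estimate
    return np.quantile(boot, [alpha / 2, 1 - alpha / 2])              # step 5: percentile interval

# usage on an observed series `series`:
# lo, hi = ma1_bootstrap_ci(series)
```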

Model Diagnostics & Residual Analysis

Comprehensive procedures for assessing model adequacy

Three-Stage Diagnostic Framework
Stage 1: Residual Calculation

Compute standardized residuals:

e_t = \frac{X_t - \hat{X}_{t|t-1}}{\sqrt{\hat{\sigma}_t^2}}

where \hat{X}_{t|t-1} is the one-step-ahead forecast.

Stage 2: Graphical Analysis
  • Time series plot of residuals
  • ACF/PACF of residuals
  • QQ-plot for normality
  • Histogram + density estimate
  • Residuals vs. fitted values
Stage 3: Formal Tests
  • Ljung-Box test (H₀: white noise)
  • Jarque-Bera test (H₀: normality)
  • ARCH test (H₀: homoscedasticity)
  • Runs test (H₀: randomness)
Ljung-Box Q-Statistic

Modified version of Box-Pierce with better small-sample properties:

Q_{LB}(m) = N(N+2)\sum_{k=1}^m \frac{\hat{\rho}_k^2}{N-k} \sim \chi^2(m-p-q)

Degrees of freedom adjusted for estimated ARMA(p,q) parameters.
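A direct implementation sketch of this statistic (the function name and the use of scipy's chi-square survival function for the p-value are illustrative choices):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(resid, m, fitted_params=0):
    """Ljung-Box Q statistic and p-value; fitted_params = p + q for an ARMA(p,q) fit."""
    e = np.asarray(resid, dtype=float)
    n = len(e)
    ec = e - e.mean()
    denom = np.dot(ec, ec)
    rho = np.array([np.dot(ec[: n - k], ec[k:]) / denom for k in range(1, m + 1)])
    q = n * (n + 2) * np.sum(rho**2 / (n - np.arange(1, m + 1)))
    return q, chi2.sf(q, m - fitted_params)

# usage on ARMA(1,1) residuals `e` (so p + q = 2):
# q_stat, p_value = ljung_box(e, m=15, fitted_params=2)
```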

Decision Rules

Model is Adequate if:

  • ✓ Residual ACF within confidence bands
  • ✓ Ljung-Box p-value > 0.05
  • ✓ QQ-plot approximately linear
  • ✓ No obvious patterns in residual plot

Model Needs Revision if:

  • ✗ Multiple ACF lags significant
  • ✗ Ljung-Box p-value < 0.05
  • ✗ Heavy tails in QQ-plot
  • ✗ Systematic patterns/heteroscedasticity
Practical Workflow Example

Scenario: Quarterly sales data (N=80)

After differencing and seasonal adjustment, you fit an ARMA(1,1) model. Estimated parameters: φ=0.7, θ=0.4, σ²=2.5. Assess model adequacy.

Step 1: Compute and Plot Residuals

Generate standardized residuals e_t and create time series plot. ✓ No obvious trends or volatility clustering observed.

Step 2: ACF Analysis

Compute sample ACF up to lag 20. Confidence bands: ±1.96/√80 ≈ ±0.219. ✓ All lags within bands except lag 12 (ρ̂₁₂ = 0.23), possibly spurious.

Step 3: Ljung-Box Test

Test up to lag m=15 (adjusted df = 15-2=13):

Q_{LB}(15) = 80 \times 82 \sum_{k=1}^{15} \frac{\hat{\rho}_k^2}{80-k} = 16.7

Critical value: χ²(13, 0.95) ≈ 22.36. Since 16.7 < 22.36, fail to reject H₀. ✓

Step 4: Normality Check

QQ-plot shows good alignment with theoretical quantiles except slight heaviness in right tail. Jarque-Bera test p-value = 0.08. ✓ Acceptable at 5% level.

Conclusion

ARMA(1,1) model appears adequate. All diagnostic tests support white noise assumption for residuals. Proceed with forecasting and inference.

White Noise Testing

Diagnostic tests for model adequacy

Chi-Square (Portmanteau) Test

Test Statistic

Under the white noise null hypothesis:

X^2(m) = N(\hat{\rho}_1^2 + \hat{\rho}_2^2 + \cdots + \hat{\rho}_m^2) \sim \chi^2(m)

Reject white noise if X^2(m) > \chi^2_{m,1-\alpha}, where \alpha is the significance level.
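A sketch of the test as stated (the critical value comes from scipy's chi-square quantile function; the simulated Gaussian series and the function name are purely for demonstration):

```python
import numpy as np
from scipy.stats import chi2

def portmanteau_test(x, m, alpha=0.05):
    """Chi-square statistic X^2(m) = N * sum of squared sample autocorrelations."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    rho = np.array([np.dot(xc[: n - k], xc[k:]) / denom for k in range(1, m + 1)])
    stat = n * np.sum(rho**2)
    crit = chi2.ppf(1 - alpha, m)
    return stat, crit, stat > crit      # reject white noise if the last entry is True

rng = np.random.default_rng(11)
print(portmanteau_test(rng.normal(size=500), m=10))   # true white noise: rejects ~5% of the time
```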

Advantages of Chi-Square Test

Joint Testing

Tests multiple lags simultaneously

Higher Power

More efficient than individual tests

Error Control

Automatic Type I error control

Parameter Selection

Choosing m: The number of lags to test

  • Typical choice: m \leq 10 in practice
  • Too large m reduces test power (autocorrelations quickly → 0)
  • Too small m may miss important dependencies
ACF Confidence Interval Method

Individual Testing

Under white noise assumption, for each lag k:

P(\sqrt{N}|\hat{\rho}_k| > 1.96) \approx 0.05

95% Confidence Interval:

|\hat{\rho}_k| \leq \frac{1.96}{\sqrt{N}}
Multiple Testing Issue

When testing m lags simultaneously:

• Even if truly white noise, ~5% of lags will fall outside bounds

• For m=20 lags, expect ~1 false rejection

• Need to consider the overall pattern, not individual violations
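The multiple-testing effect is easy to see by simulation. This sketch (sample size, number of lags, and seed are arbitrary) counts how many of m = 20 sample autocorrelations of truly white-noise series fall outside the ±1.96/√N bands on average — roughly 5% of m, i.e. about one lag:

```python
import numpy as np

rng = np.random.default_rng(5)
N, m, reps = 500, 20, 1000
band = 1.96 / np.sqrt(N)

exceed = 0
for _ in range(reps):
    x = rng.normal(size=N)                 # true white noise
    xc = x - x.mean()
    denom = np.dot(xc, xc)
    rho = np.array([np.dot(xc[: N - k], xc[k:]) / denom for k in range(1, m + 1)])
    exceed += np.sum(np.abs(rho) > band)

print("average exceedances per series:", exceed / reps)   # roughly 0.05 * m = 1
```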

Practical Recommendation
  1. Use the chi-square test for an overall assessment of white noise
  2. Plot the ACF with confidence bands to identify problematic lags
  3. Look for systematic patterns, not isolated exceedances

Practical Guidelines & Pitfalls

Common Estimation Pitfalls

Insufficient Sample Size

Asymptotic results (CLT, consistency) rely on N \to \infty. For N < 50, estimates can be heavily biased. Recommendation: Use bootstrap methods or small-sample corrections (e.g., AICc) for short series.

Ignoring Non-Stationarity

Applying standard estimation to non-stationary data (trends, unit roots) yields spurious results. Recommendation: Always perform unit root tests (ADF, KPSS) and difference the data if necessary before estimation.

Over-Parameterization

Fitting high-order ARMA models to capture noise leads to high variance and poor forecasting. Recommendation: Adhere to the Principle of Parsimony. Use AIC/BIC for model selection.

Best Practices Checklist

Visual Inspection First

Plot the time series, ACF, and PACF before any modeling. Look for outliers, seasonality, and trends.

Residual Diagnostics

Never accept a model without checking residuals for whiteness (Ljung-Box) and normality (QQ-plot).

Compare Multiple Models

Don't stop at the first "good" model. Compare 2-3 candidates using Information Criteria and out-of-sample validation.

Report Uncertainty

Always provide confidence intervals for parameters and prediction intervals for forecasts.

Frequently Asked Questions

1. Why do we divide by N instead of N-k when estimating autocovariance?

Dividing by N (rather than N-k) ensures that the sample autocovariance matrix is positive definite, which is crucial for statistical inference. While N-k might seem more 'unbiased' for large k, it can lead to non-positive-definite covariance matrices, breaking the mathematical properties needed for estimation and hypothesis testing.

2. What is the difference between consistency and strong consistency?

Consistency means the estimator converges to the true value in probability (Xbar_n →^p μ), while strong consistency means almost sure convergence (Xbar_n → μ a.s.). Strong consistency is a stronger condition that requires ergodicity of the sequence. In practice, for strictly stationary ergodic sequences, we have strong consistency, which provides stronger guarantees about estimation accuracy.

3. How does the spectral density f(0) affect convergence speed?

The spectral density at frequency zero, f(0), determines the asymptotic variance of the sample mean. It captures the 'long-run variance' of the process. Higher f(0) means stronger long-term dependence and slower convergence. The asymptotic variance is 2πf(0)/N, so processes with small f(0) converge quickly, while those with large f(0) converge more slowly (the CLT stated above excludes the degenerate case f(0) = 0).

4. When should I use the chi-square test vs individual ACF confidence intervals?

The chi-square test (Portmanteau test) is more powerful for detecting overall departure from white noise because it jointly tests multiple lags. Individual ACF tests are useful for identifying specific problematic lags but suffer from multiple testing issues. For model diagnostics, use both: chi-square for overall assessment and ACF plot for identifying which lags are problematic.

5. What does the Law of Iterated Logarithm tell us that the CLT doesn't?

While CLT gives the rate O(1/√N), the LIL provides the exact bounds for fluctuations: the sample mean oscillates infinitely often between ±√(2πf(0))·√(2 ln ln N/N). This gives us the 'worst-case' behavior and shows that √N(Xbar-μ) doesn't converge but oscillates within precise bounds. It's like knowing not just the average error, but the maximum likely deviation.

Chapter Summary

Key Theoretical Results

Mean Estimation

  • Consistency under γ_m → 0
  • Strong consistency for ergodic sequences
  • CLT with variance 2πf(0)

Convergence Rates

  • CLT: O(1/√N)
  • LIL: O(√(ln ln N / N))
  • Oscillation bounds: ±√(2πf(0))

Autocovariance

  • Positive definiteness with N-divisor
  • Asymptotic unbiasedness
  • Strong consistency (ergodic case)

White Noise Tests

  • Chi-square (Portmanteau) test
  • ACF confidence bands: ±1.96/√N
  • Multiple testing considerations

Practical Skills Acquired

Compute and interpret sample ACF/PACF
Construct asymptotic confidence intervals
Perform chi-square white noise tests
Interpret simulation convergence results
Assess estimation precision requirements
Diagnose model adequacy using residuals

Further Reading

Time Series Analysis
Hamilton (1994)

Chapters 3-4 cover asymptotic theory for stationary processes with detailed proofs of convergence results and limit distributions.

Time Series: Theory and Methods
Brockwell & Davis (1991)

Chapter 7 provides rigorous treatment of estimation theory, including spectral density estimation and asymptotic distribution theory.

Introduction to Time Series and Forecasting
Brockwell & Davis (2002)

More accessible introduction with practical examples. Section 5.2 covers parameter estimation with worked examples.

Forecasting and Time Series: An Applied Approach
Bowerman & O'Connell (1993)

Emphasizes practical model diagnostics and white noise testing procedures with business applications.