
Autoregressive Models

AR Processes, Difference Equations & Spectral Analysis

2-3 hours
Advanced
Time Series

Learning Objectives

What You'll Learn
Master the theory and practice of autoregressive time series models

Master lag operator algebra and difference equation solutions

Understand stationarity conditions via characteristic polynomial roots

Derive and apply Yule-Walker equations for parameter estimation

Analyze spectral density and frequency domain properties

Use ACF/PACF for model identification and diagnostics

Implement AIC/BIC model selection criteria

Apply AR models to real-world econometric and financial data

Essential Definitions

Core concepts and mathematical foundations

Autoregressive Model AR(p)

Formal Definition

A stochastic process \{X_t\} is an autoregressive process of order p if it satisfies:

X_t = \sum_{j=1}^{p} a_j X_{t-j} + \epsilon_t, \quad t \in \mathbb{Z}

where \{\epsilon_t\} \sim WN(0, \sigma^2) is white noise and a_1, \ldots, a_p are real coefficients with a_p \neq 0.

Interpretation

The current value is a linear combination of past values plus an innovation term. This models persistence and momentum in time series data.

Key Property

AR models exhibit short memory: influence of past values decays exponentially over time, controlled by the characteristic roots.
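
To make the definition concrete, here is a minimal simulation sketch (assuming Python with NumPy; the coefficients, sample size, and burn-in length are arbitrary illustrative choices):

import numpy as np

def simulate_ar(coeffs, n, sigma=1.0, burn_in=500, seed=0):
    """Simulate an AR(p) process X_t = sum_j a_j X_{t-j} + eps_t with Gaussian white noise."""
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    x = np.zeros(n + burn_in)
    eps = rng.normal(0.0, sigma, size=n + burn_in)
    for t in range(p, n + burn_in):
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + eps[t]   # a_1 X_{t-1} + ... + a_p X_{t-p} + eps_t
    return x[burn_in:]                                     # drop the start-up transient

x = simulate_ar([0.7, -0.1], n=1000)                       # an AR(2) example

Because the process has short memory, the burn-in period discards the influence of the zero initial values.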

Lag Operator (B)

Operator Definition

The lag operator (backshift operator) B is defined by:

B X_t = X_{t-1}
B^n X_t = X_{t-n}
Properties
  • Linearity: B(aX_t + bY_t) = aB X_t + bB Y_t
  • Constants: B c = c for any constant c
  • Composition: B^n B^m = B^{n+m}
  • Polynomial Action: For \psi(z) = \sum c_j z^j, \psi(B)X_t = \sum c_j X_{t-j}
AR(p) in Operator Form

Using the lag operator, AR(p) becomes:

A(B) X_t = \epsilon_t

where A(z) = 1 - \sum_{j=1}^p a_j z^j is the characteristic polynomial.
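
As a sanity check on the operator form, applying the lag polynomial A(B) to a simulated AR series should recover the innovations; a rough sketch assuming NumPy (the AR(2) coefficients are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
a = np.array([0.7, -0.1])                      # AR(2) coefficients (arbitrary example)
eps = rng.standard_normal(2000)
x = np.zeros_like(eps)
for t in range(2, len(x)):                     # simulate X_t = 0.7 X_{t-1} - 0.1 X_{t-2} + eps_t
    x[t] = a[0] * x[t - 1] + a[1] * x[t - 2] + eps[t]

# Apply A(B) = 1 - 0.7B + 0.1B^2 by convolving with the coefficient sequence (1, -a_1, -a_2)
filtered = np.convolve(x, np.r_[1.0, -a], mode="valid")
print(np.allclose(filtered, eps[2:]))          # True: filtering by A(B) recovers the innovations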

Stationarity Condition (Minimum Phase)

Critical Requirement

An AR(p) process is weakly stationary if and only if:

\text{All roots of } A(z) = 0 \text{ satisfy } |z| > 1

Equivalently, all roots lie outside the unit circle in the complex plane.

|z| > 1

Stationary: Solutions decay exponentially to zero, ensuring bounded variance.

|z| = 1

Unit Root: Persistent oscillations, non-stationary (requires differencing).

|z| < 1

Explosive: Solutions grow exponentially, variance → ∞.

Key Theorems & Proofs

Rigorous mathematical foundations with step-by-step derivations

Theorem 1: AR(p) Stationarity Criterion
Necessary and sufficient condition for weak stationarity

Theorem Statement:

The AR(p) process X_t = \sum_{j=1}^p a_j X_{t-j} + \epsilon_t is weakly stationary if and only if all roots of the characteristic equation A(z) = 0 satisfy |z| > 1.

Proof:

  1. Setup: Consider the homogeneous difference equation A(B)X_t = 0. The characteristic equation is 1 - \sum_{j=1}^p a_j z^j = 0.

    Let z_1, \ldots, z_k denote the distinct roots with multiplicities r(1), \ldots, r(k), where \sum r(j) = p.

  2. General solution form: The solution to the homogeneous equation is:
    X_t^{(h)} = \sum_{j=1}^k \sum_{m=0}^{r(j)-1} C_{j,m} t^m z_j^{-t}

    where C_{j,m} are constants determined by initial conditions.

  3. Stationarity requires convergence: For the particular solution to exist and be stationary, we need:
    \lim_{t \to \infty} X_t^{(h)} = 0

    This ensures the process doesn't explode or maintain unit root oscillations.

  4. Convergence condition: The term t^m z_j^{-t} \to 0 as t \to \infty if and only if |z_j^{-1}| < 1, i.e., |z_j| > 1.

    Polynomial growth t^m is dominated by exponential decay |z_j|^{-t} when |z_j| > 1.

  5. Wold representation existence: When all |z_j| > 1, we can invert the operator:
    X_t = A(B)^{-1} \epsilon_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}

    with \sum |\psi_j| < \infty (absolutely summable), guaranteeing finite variance.

  6. Conclusion: The process is weakly stationary with:
    E[X_t] = 0, \quad \text{Var}(X_t) = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2 < \infty

Practical Implication:

Before fitting an AR model, verify that estimated characteristic roots lie outside the unit circle. If not, the data likely requires differencing (ARIMA modeling) or has structural breaks.
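
One way to carry out this check, assuming NumPy (note that np.roots expects coefficients ordered from the highest power down, so A(z) is passed reversed):

import numpy as np

def is_stationary(ar_coeffs):
    """Check whether all roots of A(z) = 1 - a_1 z - ... - a_p z^p lie outside the unit circle."""
    a = np.asarray(ar_coeffs, dtype=float)
    poly = np.r_[-a[::-1], 1.0]          # coefficients of A(z), highest power first
    roots = np.roots(poly)
    return np.all(np.abs(roots) > 1.0), roots

print(is_stationary([0.8]))              # AR(1): root at 1.25 -> stationary
print(is_stationary([1.5, -0.75]))       # AR(2) with complex roots, |z| ≈ 1.155 -> stationary
print(is_stationary([1.0]))              # unit root -> not stationary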

Theorem 2: Yule-Walker Equations
Autocovariance structure and parameter estimation

Theorem Statement:

For a stationary AR(p) process, the autocovariance function satisfies:

\gamma_k = \sum_{j=1}^p a_j \gamma_{k-j}, \quad k \geq 1

with variance:

\sigma^2 = \gamma_0 - \sum_{j=1}^p a_j \gamma_j

Proof:

  1. Start from AR definition: Multiply both sides by X_{t-k}:
    X_t X_{t-k} = \sum_{j=1}^p a_j X_{t-j} X_{t-k} + \epsilon_t X_{t-k}
  2. Take expectations: Using linearity of expectation:
    E[X_t X_{t-k}] = \sum_{j=1}^p a_j E[X_{t-j} X_{t-k}] + E[\epsilon_t X_{t-k}]
  3. Apply white noise orthogonality: For k \geq 1, \epsilon_t is uncorrelated with X_{t-k}:
    E[\epsilon_t X_{t-k}] = E[\epsilon_t] E[X_{t-k}] = 0

    since \epsilon_t is uncorrelated with all past X values.

  4. Substitute autocovariance: By definition \gamma_k = E[X_t X_{t-k}]:
    \gamma_k = \sum_{j=1}^p a_j \gamma_{k-j}, \quad k \geq 1
  5. Variance equation (k = 0 case): When k = 0, E[\epsilon_t X_t] \neq 0:
    \gamma_0 = \sum_{j=1}^p a_j \gamma_j + E[\epsilon_t X_t]

    Since X_t = \sum a_j X_{t-j} + \epsilon_t, we have E[\epsilon_t X_t] = \sigma^2.

  6. Final variance formula: Rearranging:
    \sigma^2 = \gamma_0 - \sum_{j=1}^p a_j \gamma_j
Matrix Formulation

For k = 1, \ldots, p, the Yule-Walker system in matrix form:

\begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{p-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_p \end{bmatrix}

The coefficient matrix \Gamma_p is a Toeplitz matrix (constant diagonals), guaranteed positive definite for stationary processes.
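
A sketch of solving this system numerically from given autocovariances, assuming NumPy and SciPy's toeplitz helper; the numerical γ values at the end are illustrative placeholders only:

import numpy as np
from scipy.linalg import toeplitz

def yule_walker_solve(gamma):
    """Solve Gamma_p a = gamma_p for the AR coefficients and innovation variance.

    gamma: array [gamma_0, gamma_1, ..., gamma_p] of autocovariances.
    """
    gamma = np.asarray(gamma, dtype=float)
    p = len(gamma) - 1
    Gamma = toeplitz(gamma[:p])              # p x p Toeplitz matrix with gamma_0 on the diagonal
    a = np.linalg.solve(Gamma, gamma[1:])    # AR coefficients a_1, ..., a_p
    sigma2 = gamma[0] - a @ gamma[1:]        # sigma^2 = gamma_0 - sum_j a_j gamma_j
    return a, sigma2

a_hat, s2 = yule_walker_solve([2.78, 2.22, 1.78])   # autocovariances close to an AR(1) with phi = 0.8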

Theorem 3: PACF Truncation Property
Diagnostic signature for AR model order

Theorem Statement:

For an AR(p) process, the partial autocorrelation function (PACF) satisfies:

\phi_{kk} = \begin{cases} a_p & k = p \\ 0 & k > p \end{cases}

For lags k < p the PACF is generally nonzero, but the defining signature is the cutoff after lag p.

Proof Sketch:

  1. PACF definition: \phi_{kk} is the coefficient on X_{t-k} in the best linear prediction of X_t from X_{t-1}, \ldots, X_{t-k}.
  2. For k = p: The best linear predictor based on lags 1 through p is the AR equation itself, so the coefficient on lag p is exactly a_p, giving \phi_{pp} = a_p.
  3. For k > p: Lags beyond p add no predictive power (their contribution is already captured by lags 1 to p), so the extra coefficient is zero and \phi_{kk} = 0.

Model Identification Rule:

Plot the sample PACF with 95% confidence bands \pm 1.96/\sqrt{n}. The order p is identified as the last lag at which the PACF exceeds the bands.
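
A minimal sketch of this rule, assuming the statsmodels library is available for the sample PACF (the helper name identify_ar_order is just for illustration):

import numpy as np
from statsmodels.tsa.stattools import pacf   # assuming statsmodels is available

def identify_ar_order(x, max_lag=20):
    """Suggest an AR order from the last lag whose sample PACF exceeds the 95% band."""
    phi_kk = pacf(x, nlags=max_lag)[1:]          # sample PACF at lags 1..max_lag (lag-0 entry dropped)
    band = 1.96 / np.sqrt(len(x))
    significant = np.flatnonzero(np.abs(phi_kk) > band)
    return int(significant[-1]) + 1 if significant.size else 0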

Worked Examples

Step-by-step solutions for model analysis and parameter estimation

Example 1: Complete Analysis of AR(1) Process
Fundamental
Deriving properties for X_t = 0.8 X_{t-1} + \epsilon_t

Problem:

Consider the process X_t = 0.8 X_{t-1} + \epsilon_t where \epsilon_t \sim WN(0, 1). Determine stationarity, calculate the mean, variance, and the first 3 values of the ACF. Find the spectral density.

1
Stationarity Check

The characteristic polynomial is A(z) = 1 - 0.8z.

Root: 1 - 0.8z = 0 \implies z = 1/0.8 = 1.25.

Since |z| = 1.25 > 1, the root lies outside the unit circle.

The process is weakly stationary.

2
Moments Calculation

Mean

E[X_t] = 0.8 E[X_{t-1}] + E[\epsilon_t]
\mu = 0.8\mu + 0 \implies 0.2\mu = 0 \implies \mu = 0

Variance

\text{Var}(X_t) = 0.8^2 \text{Var}(X_{t-1}) + \text{Var}(\epsilon_t)
\gamma_0 = 0.64\gamma_0 + 1 \implies 0.36\gamma_0 = 1
\gamma_0 = 1/0.36 \approx 2.778

3
Autocorrelation Function (ACF)

For AR(1), \rho_k = \phi^k = 0.8^k.

Lag 1: \rho_1 = 0.8^1 = 0.8
Lag 2: \rho_2 = 0.8^2 = 0.64
Lag 3: \rho_3 = 0.8^3 = 0.512

Note the geometric decay, characteristic of AR(1) processes.

4
Spectral Density

Using the formula f(\lambda) = \frac{\sigma^2}{2\pi} \frac{1}{|1 - \phi e^{-i\lambda}|^2}:

f(\lambda) = \frac{1}{2\pi} \frac{1}{|1 - 0.8 e^{-i\lambda}|^2} = \frac{1}{2\pi(1 + 0.8^2 - 2(0.8)\cos\lambda)}
f(\lambda) = \frac{1}{2\pi(1.64 - 1.6\cos\lambda)}

Analysis: The denominator is minimized when \cos\lambda = 1 (i.e., \lambda = 0), which maximizes f(\lambda). The process therefore acts as a "low-pass" filter, with most of its power concentrated at low frequencies (smooth trends).
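
These values are easy to confirm by simulation; a rough check assuming NumPy (sample moments only approximate the theoretical quantities):

import numpy as np

rng = np.random.default_rng(42)
n, phi = 100_000, 0.8
x = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

print(x.var())                                   # ~2.778 = 1 / (1 - 0.8^2)
for k in (1, 2, 3):
    print(k, np.corrcoef(x[:-k], x[k:])[0, 1])   # ~0.8, 0.64, 0.512

lam = np.pi / 3                                  # spectral density at an arbitrary frequency
print(1.0 / (2 * np.pi * (1.64 - 1.6 * np.cos(lam))))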

Example 2: AR(2) with Real Roots
Intermediate
Analyzing X_t = 0.7X_{t-1} - 0.1X_{t-2} + \epsilon_t

Problem:

For the model X_t - 0.7X_{t-1} + 0.1X_{t-2} = \epsilon_t with \sigma^2 = 1, find the characteristic roots, check stationarity, and calculate \gamma_0, \gamma_1.

1. Roots Calculation

Characteristic equation: 1 - 0.7z + 0.1z^2 = 0

Using the quadratic formula for 0.1z^2 - 0.7z + 1 = 0:

z = \frac{0.7 \pm \sqrt{0.49 - 0.4}}{0.2} = \frac{0.7 \pm 0.3}{0.2}
z_1 = 1.0/0.2 = 5
z_2 = 0.4/0.2 = 2

Both |z_1| > 1 and |z_2| > 1. The process is stationary.

2. Yule-Walker System

The Yule-Walker equations for AR(2):

\gamma_0 = 0.7\gamma_1 - 0.1\gamma_2 + \sigma^2
\gamma_1 = 0.7\gamma_0 - 0.1\gamma_1
\gamma_2 = 0.7\gamma_1 - 0.1\gamma_0

3. Solving for Autocovariances

From eq (2): \gamma_1(1 + 0.1) = 0.7\gamma_0 \implies 1.1\gamma_1 = 0.7\gamma_0 \implies \rho_1 = \frac{0.7}{1.1} \approx 0.636

Substitute \gamma_1 and \gamma_2 into eq (1):

\gamma_0 = 0.7(0.636\gamma_0) - 0.1(0.7(0.636\gamma_0) - 0.1\gamma_0) + 1

Solving this linear equation yields:

\gamma_0 \approx 1.70, \quad \gamma_1 \approx 1.08
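
The same three Yule-Walker equations can be solved directly as a linear system; a minimal sketch assuming NumPy:

import numpy as np

a1, a2, sigma2 = 0.7, -0.1, 1.0
# Unknowns (gamma_0, gamma_1, gamma_2); rewrite the Yule-Walker equations as A x = b
A = np.array([[1.0, -a1, -a2],       # gamma_0 - a1*gamma_1 - a2*gamma_2 = sigma^2
              [-a1, 1.0 - a2, 0.0],  # gamma_1 - a1*gamma_0 - a2*gamma_1 = 0
              [-a2, -a1, 1.0]])      # gamma_2 - a1*gamma_1 - a2*gamma_0 = 0
b = np.array([sigma2, 0.0, 0.0])
gamma0, gamma1, gamma2 = np.linalg.solve(A, b)
print(gamma0, gamma1, gamma2)        # approximately 1.70, 1.08, 0.59
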
Example 3: AR(2) with Complex Roots (Pseudo-Cyclic)
Advanced
Analyzing X_t = 1.5X_{t-1} - 0.75X_{t-2} + \epsilon_t

Problem:

Investigate the behavior of X_t = 1.5X_{t-1} - 0.75X_{t-2} + \epsilon_t. Determine the period of the pseudo-cycles.

1. Characteristic Roots

1 - 1.5z + 0.75z^2 = 0

Discriminant \Delta = 2.25 - 3 = -0.75 < 0. The roots are complex.

z = \frac{1.5 \pm i\sqrt{0.75}}{1.5} = 1 \pm i\frac{\sqrt{3}}{3}

Modulus |z| = \sqrt{1 + 1/3} = \sqrt{4/3} \approx 1.15 > 1. Stationary.

2. Cycle Analysis

The ACF exhibits damped cosine waves: \rho_k \approx A r^k \cos(k\lambda_0 + \phi).

The frequency \lambda_0 is determined by the argument of the inverse roots z^{-1}.

\cos \lambda_0 = \frac{a_1}{2\sqrt{-a_2}} = \frac{1.5}{2\sqrt{0.75}} = \frac{1.5}{2(0.866)} \approx 0.866
\lambda_0 = \arccos(0.866) = \pi/6

Period: P = 2\pi / \lambda_0 = 12.

The process exhibits pseudo-cycles of length 12.
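
A quick numerical check of the roots, frequency, and period, assuming NumPy:

import numpy as np

a1, a2 = 1.5, -0.75
roots = np.roots([-a2, -a1, 1.0])           # roots of A(z) = 1 - 1.5z + 0.75z^2
print(np.abs(roots))                        # both ≈ 1.155 > 1, so stationary

lam0 = np.arccos(a1 / (2 * np.sqrt(-a2)))   # argument of the inverse roots
print(lam0, 2 * np.pi / lam0)               # ≈ pi/6 and period ≈ 12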

Example 4: Yule-Walker Parameter Estimation
Practical
Estimating AR(2) coefficients from sample data

Problem:

A time series yields sample autocorrelations \hat{\rho}_1 = 0.8 and \hat{\rho}_2 = 0.5. Estimate the parameters \hat{a}_1, \hat{a}_2 for an AR(2) model.

1. Setup Yule-Walker System

For AR(2), the normalized Yule-Walker equations (using correlations) are:

\begin{cases} \rho_1 = a_1 + a_2 \rho_1 \\ \rho_2 = a_1 \rho_1 + a_2 \end{cases}

2. Substitute Sample Values

\begin{cases} 0.8 = \hat{a}_1 + 0.8 \hat{a}_2 \\ 0.5 = 0.8 \hat{a}_1 + \hat{a}_2 \end{cases}

3. Solve Linear System

From eq (1): \hat{a}_1 = 0.8 - 0.8\hat{a}_2. Substitute into eq (2):

0.5 = 0.8(0.8 - 0.8\hat{a}_2) + \hat{a}_2
0.5 = 0.64 - 0.64\hat{a}_2 + \hat{a}_2
0.5 - 0.64 = 0.36\hat{a}_2 \implies -0.14 = 0.36\hat{a}_2
\hat{a}_2 = -0.14/0.36 \approx -0.389

Now find \hat{a}_1:

\hat{a}_1 = 0.8 - 0.8(-0.389) = 0.8 + 0.311 = 1.111

Estimated Model:

X_t = 1.11 X_{t-1} - 0.39 X_{t-2} + \epsilon_t

Check stationarity: \hat{a}_1 + \hat{a}_2 = 0.72 < 1, \hat{a}_2 - \hat{a}_1 = -1.5 < 1, |\hat{a}_2| < 1. This is a valid stationary model.
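
The same estimates drop out of the 2×2 normalized Yule-Walker system in a few lines, assuming NumPy:

import numpy as np

rho1, rho2 = 0.8, 0.5
R = np.array([[1.0, rho1],
              [rho1, 1.0]])
a_hat = np.linalg.solve(R, np.array([rho1, rho2]))
print(a_hat)            # approximately [ 1.111, -0.389]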

Spectral Density Analysis

Understanding AR processes in the frequency domain

The Spectral Density Function

The spectral density f(\lambda) describes how the variance of the time series is distributed across frequencies \lambda \in [-\pi, \pi].

General Formula for AR(p):

f(\lambda) = \frac{\sigma^2}{2\pi} \frac{1}{|A(e^{-i\lambda})|^2} = \frac{\sigma^2}{2\pi} \frac{1}{|1 - \sum_{j=1}^p a_j e^{-ij\lambda}|^2}
Key Insights:
  • Peaks: Occur at frequencies where |A(e^{-i\lambda})| is small (roots close to the unit circle).
  • Low Frequency: Positive AR coefficients (e.g., a_1 > 0) boost low frequencies (trends).
  • High Frequency: Negative AR coefficients (e.g., a_1 < 0) boost high frequencies (rapid oscillation).
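
The general formula above is straightforward to evaluate numerically; a sketch assuming NumPy (the AR(2) coefficients are the pseudo-cyclic example from earlier):

import numpy as np

def ar_spectral_density(a, sigma2, lam):
    """f(lambda) = sigma^2 / (2*pi * |1 - sum_j a_j exp(-i j lambda)|^2)."""
    a = np.asarray(a, dtype=float)
    lam = np.atleast_1d(lam)
    j = np.arange(1, len(a) + 1)
    A = 1.0 - np.exp(-1j * np.outer(lam, j)) @ a   # A(e^{-i lambda}) at each frequency
    return sigma2 / (2 * np.pi * np.abs(A) ** 2)

lam = np.linspace(0, np.pi, 200)
f = ar_spectral_density([1.5, -0.75], 1.0, lam)    # spectrum peaks near lambda = pi/6 for this AR(2)
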
Frequency Domain Signatures
AR(1) Spectra

Positive φ: Peak at \lambda = 0. Looks like "red noise" (smooth).
Negative φ: Peak at \lambda = \pi. Looks like "blue noise" (jagged).

AR(2) Pseudo-Cyclic Spectra

With complex roots re^{\pm i\lambda_0}, the spectrum has a peak near \lambda_0.
The closer r is to 1, the sharper the peak (stronger periodicity).

"The spectral density is the Fourier transform of the autocovariance function. They contain equivalent information but offer different perspectives."

Model Diagnostics & Validation

Ensuring model adequacy through residual analysis and order selection

Order Selection Criteria
PACF Method

Look for the lag p where the PACF cuts off.
Rule: |\phi_{kk}| > 1.96/\sqrt{n} for k \le p and \approx 0 for k > p.

AIC (Akaike)
AIC(p) = \ln \hat{\sigma}_p^2 + \frac{2p}{n}

Penalizes complexity. Good for prediction, tends to overestimate order slightly.

BIC (Bayesian)
BIC(p) = \ln \hat{\sigma}_p^2 + \frac{p \ln n}{n}

Stronger penalty. Consistent estimator of the true order p as n \to \infty.
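
A sketch of the selection loop, assuming NumPy and SciPy; here \hat{\sigma}_p^2 is the innovation variance from a Yule-Walker fit of order p:

import numpy as np
from scipy.linalg import toeplitz

def fit_ar_yw(x, p):
    """Fit an AR(p) by Yule-Walker; return coefficients and innovation variance."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = len(x)
    gamma = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(p + 1)])
    a = np.linalg.solve(toeplitz(gamma[:p]), gamma[1:])
    return a, gamma[0] - a @ gamma[1:]

def select_order(x, max_p=10):
    """Return the orders minimizing AIC and BIC over 1..max_p."""
    n = len(x)
    aic, bic = {}, {}
    for p in range(1, max_p + 1):
        _, s2 = fit_ar_yw(x, p)
        aic[p] = np.log(s2) + 2 * p / n
        bic[p] = np.log(s2) + p * np.log(n) / n
    return min(aic, key=aic.get), min(bic, key=bic.get)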

Residual Whiteness Test

If the model is adequate, the residuals \hat{\epsilon}_t should behave like white noise.

  • ACF of Residuals: Should be within the bounds \pm 1.96/\sqrt{n} for all lags k > 0.
  • Ljung-Box Test: Q = n(n+2)\sum_{k=1}^m \frac{\hat{\rho}_k^2(\hat{\epsilon})}{n-k}, compared against a \chi^2 distribution (a minimal implementation is sketched below).
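
A minimal implementation of the Q statistic, assuming NumPy and SciPy (the degrees of freedom are m minus the number of fitted AR parameters):

import numpy as np
from scipy.stats import chi2

def ljung_box(resid, m, fitted_params=0):
    """Ljung-Box Q statistic and p-value for residual autocorrelations up to lag m."""
    e = np.asarray(resid, dtype=float) - np.mean(resid)
    n = len(e)
    denom = np.dot(e, e)
    rho = np.array([np.dot(e[:n - k], e[k:]) / denom for k in range(1, m + 1)])
    Q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, m + 1)))
    p_value = chi2.sf(Q, df=m - fitted_params)   # small p-value -> residuals are not white noise
    return Q, p_value
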
Common Pitfalls
  • Overfitting: Including too many lags increases variance of estimates.
  • Underfitting: Missing lags leaves serial correlation in residuals (biased estimates).
  • Unit Roots: Applying AR to non-stationary data yields spurious results. Always difference first if needed.

Advanced Extensions

From heteroskedasticity to multivariate systems and deep learning

ARCH & GARCH

Standard AR models assume constant variance (\sigma^2). Financial data often exhibit volatility clustering.

GARCH(1,1) Model:

\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2

Used extensively in risk management (Value-at-Risk) and option pricing.

Vector AR (VAR)

Extends AR to k variables: \mathbf{X}_t is a vector and the \mathbf{A}_j are matrices.

VAR(p) Model:

\mathbf{X}_t = \sum_{j=1}^p \mathbf{A}_j \mathbf{X}_{t-j} + \mathbf{\epsilon}_t

Captures feedback loops between variables (e.g., GDP, Inflation, Interest Rates).

Deep Autoregression

Modern approaches replace linear combinations with non-linear neural networks.

Neural AR:

X_t = f_{NN}(X_{t-1}, \ldots, X_{t-p}) + \epsilon_t

Includes RNNs, LSTMs, and Transformers (e.g., Temporal Fusion Transformer).

Levinson-Durbin Recursion

Efficient recursive algorithm for solving Yule-Walker equations

Computational Efficiency

Solving the Yule-Walker system \Gamma_p \mathbf{a} = \mathbf{\gamma}_p directly via matrix inversion (Gaussian elimination) requires O(p^3) operations.

The Levinson Advantage:

By exploiting the Toeplitz structure of \Gamma_p, the Levinson-Durbin algorithm reduces the complexity to O(p^2).

This is crucial for fitting high-order AR models (e.g., in speech processing or geophysical signal analysis).

The Algorithm

Initialization:

\phi_{11} = \rho_1, \quad \sigma_1^2 = \gamma_0(1 - \phi_{11}^2)

Recursion (for k = 2 to p):

\phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j} \rho_j}
\phi_{kj} = \phi_{k-1,j} - \phi_{kk} \phi_{k-1,k-j}
\sigma_k^2 = \sigma_{k-1}^2 (1 - \phi_{kk}^2)

Note: \phi_{kk} is the partial autocorrelation at lag k.
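
A direct transcription of the recursion, assuming NumPy, with the autocorrelations \rho_1, \ldots, \rho_p supplied as input:

import numpy as np

def levinson_durbin(rho, gamma0, p):
    """Solve the Yule-Walker equations recursively; returns AR coefficients,
    innovation variance, and the PACF (phi_kk for k = 1..p)."""
    rho = np.asarray(rho, dtype=float)          # rho[0] = rho_1, ..., rho[p-1] = rho_p
    phi = np.zeros(p + 1)
    pacf = np.zeros(p + 1)
    phi[1] = pacf[1] = rho[0]
    sigma2 = gamma0 * (1 - phi[1] ** 2)
    for k in range(2, p + 1):
        prev = phi[1:k].copy()
        num = rho[k - 1] - np.dot(prev, rho[:k - 1][::-1])
        den = 1.0 - np.dot(prev, rho[:k - 1])
        phi_kk = num / den
        phi[1:k] = prev - phi_kk * prev[::-1]   # phi_{k,j} = phi_{k-1,j} - phi_kk * phi_{k-1,k-j}
        phi[k] = pacf[k] = phi_kk
        sigma2 *= (1 - phi_kk ** 2)
    return phi[1:], sigma2, pacf[1:]

# Example 4 revisited: rho = [0.8, 0.5] gives coefficients approximately [1.111, -0.389]
coef, s2, pacf_vals = levinson_durbin([0.8, 0.5], gamma0=1.0, p=2)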

Forecasting with AR Models

Optimal linear prediction and mean squared error analysis

One-Step Ahead Prediction

The optimal predictor (minimizing MSE) of X_{n+1} given the history \mathcal{F}_n is the conditional expectation:

\hat{X}_{n+1} = E[X_{n+1} | X_n, X_{n-1}, \ldots] = \sum_{j=1}^p a_j X_{n+1-j}

The prediction error is simply the white noise term \epsilon_{n+1}, with variance \sigma^2.

Multi-Step Ahead Prediction

For an h-step ahead forecast \hat{X}_{n+h}, we iterate the AR equation, replacing future values with their forecasts:

\hat{X}_{n+h} = a_1 \hat{X}_{n+h-1} + \cdots + a_p \hat{X}_{n+h-p}

where \hat{X}_{n+k} = X_{n+k} if k \le 0 (observed values).
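
A sketch of this iteration, assuming NumPy, fitted coefficients a, and at least the last p observed values (the numbers in the example call are hypothetical):

import numpy as np

def ar_forecast(a, history, h):
    """Iterate the AR recursion h steps ahead, feeding forecasts back in as inputs."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    buf = list(np.asarray(history, dtype=float)[-p:])   # last p observed values, in time order
    forecasts = []
    for _ in range(h):
        x_hat = np.dot(a, buf[::-1])    # a_1 * most recent + ... + a_p * oldest in the buffer
        forecasts.append(x_hat)
        buf = buf[1:] + [x_hat]         # slide the window: forecasts replace unknown future values
    return np.array(forecasts)

print(ar_forecast([1.11, -0.39], history=[0.5, 1.2], h=3))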

Forecast Error Variance

The error e_n(h) = X_{n+h} - \hat{X}_{n+h} can be written using the MA(\infty) representation:

e_n(h) = \sum_{j=0}^{h-1} \psi_j \epsilon_{n+h-j}

MSE: \sigma^2(h) = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2
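
The \psi weights follow recursively from the AR coefficients (\psi_0 = 1 and \psi_j = \sum_{i=1}^{\min(j,p)} a_i \psi_{j-i}), so the h-step MSE can be computed directly; a sketch assuming NumPy:

import numpy as np

def forecast_mse(a, sigma2, h):
    """h-step forecast error variance sigma^2 * sum_{j<h} psi_j^2 for a stationary AR(p)."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    psi = np.zeros(h)
    psi[0] = 1.0
    for j in range(1, h):
        psi[j] = sum(a[i] * psi[j - 1 - i] for i in range(min(j, p)))
    return sigma2 * np.sum(psi ** 2)

print(forecast_mse([0.8], sigma2=1.0, h=3))   # 1 + 0.64 + 0.4096 = 2.0496 for the AR(1) example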

Convergence

As h \to \infty, the forecast converges to the process mean (0), and the forecast error variance converges to the process variance \gamma_0. This reflects the "short memory" of stationary AR processes.

Historical Development

The evolution of autoregressive modeling

1927: G.U. Yule

Sunspot Modeling

Yule introduced the autoregressive model to explain the cyclical but irregular behavior of sunspot numbers, proposing a 'disturbed pendulum' mechanism.

1938: Herman Wold

Wold Decomposition

Proved that any stationary process can be decomposed into a deterministic part and a stochastic part (infinite moving average), linking AR and MA representations.

1940s: Mann & Wald

Statistical Inference

Developed the asymptotic theory for parameter estimation in AR processes, establishing the consistency and normality of least squares estimators.

1960s: Box & Jenkins

ARIMA Methodology

Systematized the iterative cycle of identification, estimation, and diagnostic checking, making ARMA/ARIMA models the standard for forecasting.

1980s: Sims & Granger

VAR & Cointegration

Extended AR to multivariate systems (VAR) and non-stationary cointegrated systems, revolutionizing macroeconometrics.

Frequently Asked Questions

Why must AR characteristic roots lie outside the unit circle?

Roots outside the unit circle (> 1 in absolute value) ensure exponential decay of the homogeneous solution, guaranteeing stationarity. If a root lies on or inside the unit circle, the process becomes non-stationary with unbounded variance.

What is the intuition behind the Yule-Walker equations?

They express a fundamental consistency relationship: in a stationary AR process, the autocovariance structure must satisfy the same linear relationship as the process itself. This allows us to estimate AR coefficients directly from sample autocovariances.

How does the PACF help identify AR model order?

For an AR(p) process, the PACF truncates after lag p—i.e., φ_kk = 0 for k > p. This diagnostic property allows model order selection: the last significant PACF lag indicates the appropriate AR order. This contrasts with the ACF, which decays gradually.

Can an AR(p) model be represented as an MA(∞) process?

Yes, via the Wold decomposition. When all characteristic roots lie outside the unit circle, we can invert A(B) to get X_t = A⁻¹(B)ε_t = ∑ψ_jε_{t-j}. The ψ_j coefficients decay exponentially, ensuring the infinite sum converges.

What makes the Levinson-Durbin algorithm efficient?

It exploits the Toeplitz structure of the autocovariance matrix, reducing computation from O(p³) to O(p²). The algorithm recursively builds solutions for orders 1 through p, computing PACF coefficients as byproducts—essential for large-scale time series analysis.

Chapter Summary

Core Concepts

  • Stationarity: Requires all characteristic roots to lie outside the unit circle.
  • Yule-Walker: Links model parameters to autocovariances, enabling estimation.
  • PACF: The primary tool for identifying AR order (cuts off after lag p).

Practical Application

  • Spectral Analysis: Reveals dominant cycles and frequency components.
  • Diagnostics: Residuals must be white noise (Ljung-Box test).
  • Extensions: Basis for ARIMA, GARCH, and VAR models.

Further Reading

Recommended resources for deeper study

Time Series: Theory and Methods
Brockwell & Davis (1991)

Chapter 3 provides the definitive mathematical treatment of ARMA processes and difference equations.

Time Series Analysis
James D. Hamilton (1994)

Chapters 1-3 offer excellent intuition on stationarity, lag operators, and dynamic multipliers.

Forecasting: Principles and Practice
Hyndman & Athanasopoulos (2021)

A modern, practical guide to using AR models for forecasting with R/Python examples.

Analysis of Financial Time Series
Ruey S. Tsay (2010)

Focuses on applications in finance, including AR-GARCH models and volatility analysis.