
Autoregressive Models

AR Processes, Difference Equations & Spectral Analysis

2-3 hours
Advanced
Time Series

Essential Definitions

Core concepts and mathematical foundations

Autoregressive Model AR(p)

Formal Definition

A stochastic process \{X_t\} is an autoregressive process of order p if it satisfies:

X_t = \sum_{j=1}^{p} a_j X_{t-j} + \epsilon_t, \quad t \in \mathbb{Z}

where \{\epsilon_t\} \sim WN(0, \sigma^2) is white noise and a_1, \ldots, a_p are real coefficients with a_p \neq 0.

Interpretation

The current value is a linear combination of past values plus an innovation term. This models persistence and momentum in time series data.

Key Property

AR models exhibit short memory: influence of past values decays exponentially over time, controlled by the characteristic roots.

Lag Operator (B)

Operator Definition

The lag operator (backshift operator) B is defined by:

B X_t = X_{t-1}, \qquad B^n X_t = X_{t-n}
Properties
  • Linearity: B(aX_t + bY_t) = aB X_t + bB Y_t
  • Constants: B c = c for any constant c
  • Composition: B^n B^m = B^{n+m}
  • Polynomial Action: For \psi(z) = \sum c_j z^j, \psi(B) X_t = \sum c_j X_{t-j}
AR(p) in Operator Form

Using the lag operator, AR(p) becomes:

A(B) X_t = \epsilon_t

where A(z) = 1 - \sum_{j=1}^p a_j z^j is the characteristic polynomial.

Stationarity Condition (Minimum Phase)

Critical Requirement

An AR(p) process is weakly stationary if and only if:

\text{All roots of } A(z) = 0 \text{ satisfy } |z| > 1

Equivalently, all roots lie outside the unit circle in the complex plane.

|z| > 1

Stationary: Solutions decay exponentially to zero, ensuring bounded variance.

|z| = 1

Unit Root: Persistent oscillations, non-stationary (requires differencing).

|z| < 1

Explosive: Solutions grow exponentially, variance → ∞.
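
As a quick numerical check of this condition, the sketch below (assuming NumPy is available; the AR(2) coefficients are hypothetical) computes the roots of A(z) and classifies them:

```python
import numpy as np

def ar_char_roots(a):
    """Roots of A(z) = 1 - a_1 z - ... - a_p z^p for AR coefficients a = [a_1, ..., a_p]."""
    coeffs = np.concatenate(([1.0], -np.asarray(a, dtype=float)))  # constant term first
    return np.roots(coeffs[::-1])                                  # np.roots wants highest degree first

a = [1.5, -0.75]                       # hypothetical AR(2) coefficients
roots = ar_char_roots(a)
print("roots:", roots, "moduli:", np.abs(roots))
if np.all(np.abs(roots) > 1):
    print("All roots outside the unit circle: weakly stationary")
else:
    print("Some root on or inside the unit circle: non-stationary or explosive")
```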

Key Theorems & Proofs

Rigorous mathematical foundations with step-by-step derivations

Theorem 1: AR(p) Stationarity Criterion
Necessary and sufficient condition for weak stationarity

Theorem Statement:

The AR(p) process X_t = \sum_{j=1}^p a_j X_{t-j} + \epsilon_t is weakly stationary if and only if all roots of the characteristic equation A(z) = 0 satisfy |z| > 1.

Proof:

  1. Setup: Consider the homogeneous difference equation A(B)X_t = 0. The characteristic equation is 1 - \sum_{j=1}^p a_j z^j = 0.

    Let z_1, \ldots, z_k denote the distinct roots with multiplicities r(1), \ldots, r(k), where \sum r(j) = p.

  2. General solution form: The solution to the homogeneous equation is:
    X_t^{(h)} = \sum_{j=1}^k \sum_{m=0}^{r(j)-1} C_{j,m} t^m z_j^{-t}

    where C_{j,m} are constants determined by initial conditions.

  3. Stationarity requires convergence: For a stationary solution to exist, the homogeneous component must die out:
    \lim_{t \to \infty} X_t^{(h)} = 0

    This ensures the process doesn't explode or maintain unit root oscillations.

  4. Convergence condition: The term t^m z_j^{-t} \to 0 as t \to \infty if and only if |z_j^{-1}| < 1, i.e., |z_j| > 1.

    Polynomial growth t^m is dominated by exponential decay |z_j|^{-t} when |z_j| > 1.

  5. Wold representation existence: When all |z_j| > 1, we can invert the operator:
    X_t = A(B)^{-1} \epsilon_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}

    with \sum |\psi_j| < \infty (absolutely summable), guaranteeing finite variance.

  6. Conclusion: The process is weakly stationary with:
    E[X_t] = 0, \quad \text{Var}(X_t) = \sigma^2 \sum_{j=0}^{\infty} \psi_j^2 < \infty

Practical Implication:

Before fitting an AR model, verify that estimated characteristic roots lie outside the unit circle. If not, the data likely requires differencing (ARIMA modeling) or has structural breaks.

Theorem 2: Yule-Walker Equations
Autocovariance structure and parameter estimation

Theorem Statement:

For a stationary AR(p) process, the autocovariance function satisfies:

\gamma_k = \sum_{j=1}^p a_j \gamma_{k-j}, \quad k \geq 1

with variance:

\sigma^2 = \gamma_0 - \sum_{j=1}^p a_j \gamma_j

Proof:

  1. Start from the AR definition: Multiply both sides by X_{t-k}:
    X_t X_{t-k} = \sum_{j=1}^p a_j X_{t-j} X_{t-k} + \epsilon_t X_{t-k}
  2. Take expectations: Using linearity of expectation:
    E[X_t X_{t-k}] = \sum_{j=1}^p a_j E[X_{t-j} X_{t-k}] + E[\epsilon_t X_{t-k}]
  3. Apply white noise orthogonality: For k \geq 1, \epsilon_t is uncorrelated with X_{t-k}:
    E[\epsilon_t X_{t-k}] = 0

    since X_{t-k} depends only on innovations up to time t-k, which are uncorrelated with \epsilon_t.

  4. Substitute autocovariances: By definition \gamma_k = E[X_t X_{t-k}]:
    \gamma_k = \sum_{j=1}^p a_j \gamma_{k-j}, \quad k \geq 1
  5. Variance equation (k = 0 case): When k = 0, E[\epsilon_t X_t] \neq 0:
    \gamma_0 = \sum_{j=1}^p a_j \gamma_j + E[\epsilon_t X_t]

    Since X_t = \sum a_j X_{t-j} + \epsilon_t, we have E[\epsilon_t X_t] = \sigma^2.

  6. Final variance formula: Rearranging:
    \sigma^2 = \gamma_0 - \sum_{j=1}^p a_j \gamma_j
Matrix Formulation

For k = 1, \ldots, p, the Yule-Walker system in matrix form:

\begin{bmatrix} \gamma_0 & \gamma_1 & \cdots & \gamma_{p-1} \\ \gamma_1 & \gamma_0 & \cdots & \gamma_{p-2} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{p-1} & \gamma_{p-2} & \cdots & \gamma_0 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_p \end{bmatrix} = \begin{bmatrix} \gamma_1 \\ \gamma_2 \\ \vdots \\ \gamma_p \end{bmatrix}

The coefficient matrix \Gamma_p is a Toeplitz matrix (constant diagonals), guaranteed positive definite for stationary processes.
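
A minimal sketch of solving this Toeplitz system numerically, assuming NumPy and SciPy are available (the autocovariances below are placeholder values, not taken from the text):

```python
import numpy as np
from scipy.linalg import toeplitz

# Placeholder autocovariances gamma_0, gamma_1, gamma_2 for an AR(2) fit
gamma = np.array([1.70, 1.08, 0.59])
p = len(gamma) - 1

Gamma_p = toeplitz(gamma[:p])          # p x p Toeplitz matrix built from gamma_0, ..., gamma_{p-1}
rhs = gamma[1:p + 1]                   # right-hand side (gamma_1, ..., gamma_p)

a_hat = np.linalg.solve(Gamma_p, rhs)  # Yule-Walker estimates a_1, ..., a_p
sigma2_hat = gamma[0] - a_hat @ rhs    # innovation variance from the k = 0 equation
print(a_hat, sigma2_hat)               # roughly [0.7, -0.1] and 1.0 for these inputs
```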

Theorem 3: PACF Truncation Property
Diagnostic signature for AR model order

Theorem Statement:

For an AR(p) process, the partial autocorrelation function (PACF) satisfies:

\phi_{pp} = a_p \neq 0, \qquad \phi_{kk} = 0 \text{ for all } k > p

Proof Sketch:

  1. PACF definition: \phi_{kk} is the last coefficient in the best linear regression of X_t on X_{t-1}, \ldots, X_{t-k}.
  2. For k = p: That regression is exactly the AR equation \sum_{j=1}^p a_j X_{t-j}, so the last coefficient is \phi_{pp} = a_p \neq 0.
  3. For k > p: Lags beyond p add no predictive power (already captured by lags 1 to p), thus \phi_{kk} = 0.

Model Identification Rule:

Plot the sample PACF with 95% confidence bands \pm 1.96/\sqrt{n}. The order p is identified as the last lag at which the PACF exceeds the bands before remaining within them.
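
A sketch of this identification rule in code, assuming statsmodels is installed; the series is simulated here purely for illustration:

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

rng = np.random.default_rng(0)
n = 500
x = np.zeros(n)
for t in range(2, n):                       # simulate an AR(2) just to have data
    x[t] = 0.7 * x[t - 1] - 0.1 * x[t - 2] + rng.standard_normal()

phi_kk = pacf(x, nlags=10)                  # phi_kk[k] is the lag-k sample PACF
band = 1.96 / np.sqrt(n)                    # approximate 95% confidence band
significant = np.where(np.abs(phi_kk[1:]) > band)[0] + 1
print("sample PACF:", np.round(phi_kk[1:], 3))
print("lags exceeding the band:", significant)   # the last such lag suggests the AR order
```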

Worked Examples

Step-by-step solutions for model analysis and parameter estimation

Example 1: Complete Analysis of AR(1) Process
Fundamental
Deriving properties for X_t = 0.8 X_{t-1} + \epsilon_t

Problem:

Consider the process X_t = 0.8 X_{t-1} + \epsilon_t where \epsilon_t \sim WN(0, 1). Determine stationarity, calculate the mean, variance, and the first 3 values of the ACF. Find the spectral density.

1
Stationarity Check

The characteristic polynomial is A(z) = 1 - 0.8z.

Root: 1 - 0.8z = 0 \implies z = 1/0.8 = 1.25.

Since |z| = 1.25 > 1, the root lies outside the unit circle.

The process is weakly stationary.

2
Moments Calculation

Mean

E[X_t] = 0.8 E[X_{t-1}] + E[\epsilon_t] \implies \mu = 0.8\mu + 0 \implies 0.2\mu = 0 \implies \mu = 0

Variance

\text{Var}(X_t) = 0.8^2 \text{Var}(X_{t-1}) + \text{Var}(\epsilon_t) \implies \gamma_0 = 0.64\gamma_0 + 1 \implies 0.36\gamma_0 = 1 \implies \gamma_0 = 1/0.36 \approx 2.778

3
Autocorrelation Function (ACF)

For AR(1), \rho_k = \phi^k = 0.8^k.

Lag 1: \rho_1 = 0.8^1 = 0.8
Lag 2: \rho_2 = 0.8^2 = 0.64
Lag 3: \rho_3 = 0.8^3 = 0.512

Note the geometric decay, characteristic of AR(1) processes.

4
Spectral Density

Using the formula f(\lambda) = \frac{\sigma^2}{2\pi} \frac{1}{|1 - \phi e^{-i\lambda}|^2}:

f(\lambda) = \frac{1}{2\pi} \frac{1}{|1 - 0.8 e^{-i\lambda}|^2} = \frac{1}{2\pi(1 + 0.8^2 - 2(0.8)\cos\lambda)} = \frac{1}{2\pi(1.64 - 1.6\cos\lambda)}

Analysis: The denominator is minimized when \cos\lambda = 1 (i.e., \lambda = 0), maximizing f(\lambda). The process acts as a "low-pass" filter: the spectrum is dominated by low frequencies (smooth trends).
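
These theoretical values can be checked with a short simulation (a sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(42)
n, phi = 200_000, 0.8
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.standard_normal()
x = x[1000:]                                  # discard burn-in

print("sample variance:", round(x.var(), 3))  # theory: 1/0.36 ≈ 2.778
for k in (1, 2, 3):
    rho_k = np.corrcoef(x[:-k], x[k:])[0, 1]
    print(f"rho_{k}:", round(rho_k, 3))       # theory: 0.8, 0.64, 0.512
```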

Example 2: AR(2) with Real Roots
Intermediate
Analyzing X_t = 0.7X_{t-1} - 0.1X_{t-2} + \epsilon_t

Problem:

For the model X_t - 0.7X_{t-1} + 0.1X_{t-2} = \epsilon_t with \sigma^2 = 1, find the characteristic roots, check stationarity, and calculate \gamma_0, \gamma_1.

1. Roots Calculation

Characteristic equation: 1 - 0.7z + 0.1z^2 = 0

Using the quadratic formula for 0.1z^2 - 0.7z + 1 = 0:

z = \frac{0.7 \pm \sqrt{0.49 - 0.4}}{0.2} = \frac{0.7 \pm 0.3}{0.2}
z_1 = 1.0/0.2 = 5, \qquad z_2 = 0.4/0.2 = 2

Both |z_1| > 1 and |z_2| > 1. The process is stationary.

2. Yule-Walker System

The Yule-Walker equations for AR(2):

\gamma_0 = 0.7\gamma_1 - 0.1\gamma_2 + \sigma^2 \qquad (1)
\gamma_1 = 0.7\gamma_0 - 0.1\gamma_1 \qquad (2)
\gamma_2 = 0.7\gamma_1 - 0.1\gamma_0 \qquad (3)

3. Solving for Autocovariances

From eq (2): \gamma_1(1 + 0.1) = 0.7\gamma_0 \implies 1.1\gamma_1 = 0.7\gamma_0 \implies \rho_1 = \frac{0.7}{1.1} \approx 0.636

Substitute \gamma_1 = 0.636\gamma_0 and \gamma_2 (from eq (3)) into eq (1):

\gamma_0 = 0.7(0.636\gamma_0) - 0.1(0.7(0.636\gamma_0) - 0.1\gamma_0) + 1 = 0.411\gamma_0 + 1

Solving yields:

\gamma_0 = 1/0.589 \approx 1.70, \quad \gamma_1 = 0.636\gamma_0 \approx 1.08
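
The same result follows from solving the three Yule-Walker equations as a linear system in (\gamma_0, \gamma_1, \gamma_2); a sketch with NumPy:

```python
import numpy as np

# Unknowns (gamma_0, gamma_1, gamma_2) for X_t = 0.7 X_{t-1} - 0.1 X_{t-2} + eps_t, sigma^2 = 1
A = np.array([
    [1.0, -0.7,  0.1],   # gamma_0 - 0.7 gamma_1 + 0.1 gamma_2 = 1
    [-0.7, 1.1,  0.0],   # gamma_1 = 0.7 gamma_0 - 0.1 gamma_1
    [0.1, -0.7,  1.0],   # gamma_2 = 0.7 gamma_1 - 0.1 gamma_0
])
b = np.array([1.0, 0.0, 0.0])
print(np.linalg.solve(A, b))   # ≈ [1.70, 1.08, 0.59]
```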
Example 3: AR(2) with Complex Roots (Pseudo-Cyclic)
Advanced
Analyzing X_t = 1.5X_{t-1} - 0.75X_{t-2} + \epsilon_t

Problem:

Investigate the behavior of X_t = 1.5X_{t-1} - 0.75X_{t-2} + \epsilon_t. Determine the period of the pseudo-cycles.

1. Characteristic Roots

1 - 1.5z + 0.75z^2 = 0

Discriminant \Delta = 2.25 - 3 = -0.75 < 0. The roots are complex.

z = \frac{1.5 \pm i\sqrt{0.75}}{1.5} = 1 \pm i\frac{\sqrt{3}}{3}

Modulus |z| = \sqrt{1 + 1/3} = \sqrt{4/3} \approx 1.15 > 1. Stationary.

2. Cycle Analysis

The ACF exhibits damped cosine waves: \rho_k \approx A r^k \cos(k\lambda_0 + \phi).

The frequency \lambda_0 is the argument of the inverse roots z^{-1}.

\cos\lambda_0 = \frac{a_1}{2\sqrt{-a_2}} = \frac{1.5}{2\sqrt{0.75}} = \frac{1.5}{2(0.866)} \approx 0.866, \qquad \lambda_0 = \arccos(0.866) = \pi/6

Period: P = 2\pi/\lambda_0 = 12.

The process exhibits pseudo-cycles of length 12.
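
A quick numerical confirmation of the cycle frequency (a sketch, assuming NumPy):

```python
import numpy as np

# Characteristic polynomial 1 - 1.5 z + 0.75 z^2, coefficients from highest degree down
roots = np.roots([0.75, -1.5, 1.0])
print("roots:", roots, "moduli:", np.abs(roots))   # ≈ 1 ± 0.577i, |z| ≈ 1.155

lam0 = abs(np.angle(1 / roots[0]))                 # argument of an inverse root
print("lambda_0 ≈", round(lam0, 4))                # ≈ pi/6 ≈ 0.5236
print("period ≈", round(2 * np.pi / lam0, 2))      # ≈ 12
```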

Example 4: Yule-Walker Parameter Estimation
Practical
Estimating AR(2) coefficients from sample data

Problem:

A time series yields sample autocorrelations \hat{\rho}_1 = 0.8 and \hat{\rho}_2 = 0.5. Estimate the parameters \hat{a}_1, \hat{a}_2 for an AR(2) model.

1. Setup Yule-Walker System

For AR(2), the normalized Yule-Walker equations (using correlations) are:

\begin{cases} \rho_1 = a_1 + a_2 \rho_1 \\ \rho_2 = a_1 \rho_1 + a_2 \end{cases}

2. Substitute Sample Values

\begin{cases} 0.8 = \hat{a}_1 + 0.8\hat{a}_2 \\ 0.5 = 0.8\hat{a}_1 + \hat{a}_2 \end{cases}

3. Solve Linear System

From eq 1: \hat{a}_1 = 0.8 - 0.8\hat{a}_2. Substitute into eq 2:

0.5 = 0.8(0.8 - 0.8\hat{a}_2) + \hat{a}_2 = 0.64 - 0.64\hat{a}_2 + \hat{a}_2 \implies 0.36\hat{a}_2 = -0.14 \implies \hat{a}_2 \approx -0.389

Now find \hat{a}_1:

\hat{a}_1 = 0.8 - 0.8(-0.389) = 0.8 + 0.311 = 1.111

Estimated Model:

X_t = 1.11 X_{t-1} - 0.39 X_{t-2} + \epsilon_t

Check stationarity: \hat{a}_1 + \hat{a}_2 = 0.72 < 1, \hat{a}_2 - \hat{a}_1 = -1.5 < 1, |\hat{a}_2| < 1. Valid stationary model.
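
For comparison, the same 2x2 system can be solved directly in code (a sketch with NumPy, using the sample correlations from the problem statement):

```python
import numpy as np

rho1, rho2 = 0.8, 0.5
R = np.array([[1.0, rho1],          # normalized Yule-Walker system R a = r
              [rho1, 1.0]])
r = np.array([rho1, rho2])
a1_hat, a2_hat = np.linalg.solve(R, r)
print(round(a1_hat, 3), round(a2_hat, 3))   # ≈ 1.111, -0.389
```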

Spectral Density Analysis

Understanding AR processes in the frequency domain

The Spectral Density Function

The spectral density f(\lambda) describes how the variance of the time series is distributed across frequencies \lambda \in [-\pi, \pi].

General Formula for AR(p):

f(\lambda) = \frac{\sigma^2}{2\pi} \frac{1}{|A(e^{-i\lambda})|^2} = \frac{\sigma^2}{2\pi} \frac{1}{|1 - \sum_{j=1}^p a_j e^{-ij\lambda}|^2}
Key Insights:
  • Peaks: Occur at frequencies where |A(e^{-i\lambda})| is small (roots close to the unit circle).
  • Low Frequency: Positive AR coefficients (e.g., a_1 > 0) boost low frequencies (trends).
  • High Frequency: Negative AR coefficients (e.g., a_1 < 0) boost high frequencies (rapid oscillation); a numerical sketch follows this list.
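
To see these effects numerically, the formula can be evaluated on a frequency grid, as in the following sketch (assuming NumPy; the coefficients are hypothetical):

```python
import numpy as np

def ar_spectral_density(a, sigma2=1.0, n_freq=512):
    """Evaluate f(lambda) = sigma^2 / (2 pi |A(e^{-i lambda})|^2) on [0, pi]."""
    lam = np.linspace(0, np.pi, n_freq)
    j = np.arange(1, len(a) + 1)
    A = 1 - np.exp(-1j * np.outer(lam, j)) @ np.asarray(a, dtype=float)   # A(e^{-i lambda})
    return lam, sigma2 / (2 * np.pi * np.abs(A) ** 2)

lam, f = ar_spectral_density([0.8])            # AR(1) with a positive coefficient
print("peak frequency:", lam[np.argmax(f)])    # 0.0 -> low-frequency ("red noise") behaviour
```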
Frequency Domain Signatures
AR(1) Spectra

Positive \phi: Peak at \lambda = 0. Looks like "red noise" (smooth).
Negative \phi: Peak at \lambda = \pi. Looks like "blue noise" (jagged).

AR(2) Pseudo-Cyclic Spectra

With complex roots whose inverses are re^{\pm i\lambda_0}, the spectrum has a peak near \lambda_0.
The closer r is to 1, the sharper the peak (stronger periodicity).

"The spectral density is the Fourier transform of the autocovariance function. They contain equivalent information but offer different perspectives."

Model Diagnostics & Validation

Ensuring model adequacy through residual analysis and order selection

Order Selection Criteria
PACF Method

Look for the lag p where the PACF cuts off.
Rule: |\phi_{kk}| > 1.96/\sqrt{n} for k \le p and \approx 0 for k > p.

AIC (Akaike)
AIC(p) = \ln \hat{\sigma}_p^2 + \frac{2p}{n}

Penalizes complexity. Good for prediction, tends to overestimate order slightly.

BIC (Bayesian)
BIC(p) = \ln \hat{\sigma}_p^2 + \frac{p \ln n}{n}

Stronger penalty. A consistent estimator of the true order p as n \to \infty.
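
A minimal order-selection loop built directly on these formulas (a sketch, assuming NumPy; \hat{\sigma}_p^2 is taken from a least-squares AR fit at each order, and the series is simulated only for illustration):

```python
import numpy as np

def fit_ar_ls(x, p):
    """Least-squares AR(p) fit; returns coefficients and residual variance."""
    Y = x[p:]
    X = np.column_stack([x[p - j:len(x) - j] for j in range(1, p + 1)])
    a = np.linalg.lstsq(X, Y, rcond=None)[0]
    return a, np.var(Y - X @ a)

rng = np.random.default_rng(1)
n = 400
x = np.zeros(n)
for t in range(2, n):                         # simulated AR(2) data for illustration
    x[t] = 0.7 * x[t - 1] - 0.1 * x[t - 2] + rng.standard_normal()

for p in range(1, 7):
    _, s2 = fit_ar_ls(x, p)
    aic = np.log(s2) + 2 * p / n
    bic = np.log(s2) + p * np.log(n) / n
    print(p, round(aic, 4), round(bic, 4))    # choose the order minimizing AIC or BIC
```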

Residual Whiteness Test

If the model is adequate, the residuals \hat{\epsilon}_t should behave like white noise.

  • ACF of Residuals: Should lie within the bounds \pm 1.96/\sqrt{n} for all lags k > 0.
  • Ljung-Box Test: Q = n(n+2)\sum_{k=1}^m \frac{\hat{\rho}_k^2(\hat{\epsilon})}{n-k}, compared against a \chi^2 quantile; a large Q indicates remaining serial correlation. A computational sketch follows below.
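
A direct computation of the Q statistic from its definition (a sketch, assuming NumPy and SciPy; the "residuals" here are simulated white noise, so Q should be unremarkable):

```python
import numpy as np
from scipy.stats import chi2

def ljung_box(resid, m, n_params=0):
    """Ljung-Box Q over m lags; under whiteness Q ~ chi^2 with m - n_params degrees of freedom."""
    n = len(resid)
    r = resid - resid.mean()
    denom = np.sum(r ** 2)
    rho = np.array([np.sum(r[k:] * r[:-k]) / denom for k in range(1, m + 1)])
    Q = n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, m + 1)))
    return Q, chi2.sf(Q, df=m - n_params)

resid = np.random.default_rng(2).standard_normal(300)   # stand-in for fitted-model residuals
Q, p_value = ljung_box(resid, m=20, n_params=2)         # n_params = fitted AR order
print(round(Q, 2), round(p_value, 3))                   # large p-value: no evidence against whiteness
```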
Common Pitfalls
  • Overfitting: Including too many lags increases variance of estimates.
  • Underfitting: Missing lags leaves serial correlation in residuals (biased estimates).
  • Unit Roots: Applying AR to non-stationary data yields spurious results. Always difference first if needed.

Advanced Extensions

From heteroskedasticity to multivariate systems and deep learning

ARCH & GARCH

Standard AR models assume constant variance (\sigma^2). Financial data often exhibits volatility clustering.

GARCH(1,1) Model:

\sigma_t^2 = \alpha_0 + \alpha_1 \epsilon_{t-1}^2 + \beta_1 \sigma_{t-1}^2

Used extensively in risk management (Value-at-Risk) and option pricing.

Vector AR (VAR)

Extends AR to k variables: \mathbf{X}_t is a vector and the \mathbf{A}_j are matrices.

VAR(p) Model:

\mathbf{X}_t = \sum_{j=1}^p \mathbf{A}_j \mathbf{X}_{t-j} + \boldsymbol{\epsilon}_t

Captures feedback loops between variables (e.g., GDP, Inflation, Interest Rates).

Deep Autoregression

Modern approaches replace linear combinations with non-linear neural networks.

Neural AR:

X_t = f_{NN}(X_{t-1}, \ldots, X_{t-p}) + \epsilon_t

Includes RNNs, LSTMs, and Transformers (e.g., Temporal Fusion Transformer).

Levinson-Durbin Recursion

Efficient recursive algorithm for solving Yule-Walker equations

Computational Efficiency

Solving the Yule-Walker system \Gamma_p \mathbf{a} = \boldsymbol{\gamma}_p directly via matrix inversion (Gaussian elimination) requires O(p^3) operations.

The Levinson Advantage:

By exploiting the Toeplitz structure of \Gamma_p, the Levinson-Durbin algorithm reduces the complexity to O(p^2).

This is crucial for fitting high-order AR models (e.g., in speech processing or geophysical signal analysis).

The Algorithm

Initialization:

\phi_{11} = \rho_1, \quad \sigma_1^2 = \gamma_0(1 - \phi_{11}^2)

Recursion (for k = 2 to p):

\phi_{kk} = \frac{\rho_k - \sum_{j=1}^{k-1} \phi_{k-1,j}\, \rho_{k-j}}{1 - \sum_{j=1}^{k-1} \phi_{k-1,j}\, \rho_j}, \qquad \phi_{kj} = \phi_{k-1,j} - \phi_{kk}\, \phi_{k-1,k-j} \quad (j = 1, \ldots, k-1), \qquad \sigma_k^2 = \sigma_{k-1}^2 (1 - \phi_{kk}^2)

Note: \phi_{kk} is the partial autocorrelation at lag k.
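
A direct translation of the recursion into code (a sketch with NumPy, taking \rho_1, \ldots, \rho_p and \gamma_0 as inputs):

```python
import numpy as np

def levinson_durbin(rho, gamma0=1.0):
    """Levinson-Durbin recursion.
    rho: autocorrelations [rho_1, ..., rho_p]. Returns (AR coefficients, PACF values, sigma_p^2)."""
    rho = np.asarray(rho, dtype=float)
    p = len(rho)
    phi = np.zeros((p + 1, p + 1))
    phi[1, 1] = rho[0]                                   # initialization: phi_11 = rho_1
    sigma2 = gamma0 * (1 - rho[0] ** 2)                  # sigma_1^2 = gamma_0 (1 - phi_11^2)
    for k in range(2, p + 1):
        num = rho[k - 1] - np.sum(phi[k - 1, 1:k] * rho[k - 2::-1])
        den = 1 - np.sum(phi[k - 1, 1:k] * rho[:k - 1])
        phi[k, k] = num / den                            # PACF at lag k
        phi[k, 1:k] = phi[k - 1, 1:k] - phi[k, k] * phi[k - 1, 1:k][::-1]
        sigma2 *= 1 - phi[k, k] ** 2
    return phi[p, 1:p + 1], np.diag(phi)[1:], sigma2

a_hat, pacf_vals, s2 = levinson_durbin([0.8, 0.5])       # the correlations from Example 4 above
print(a_hat)                                             # ≈ [1.111, -0.389], matching Example 4
```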

Forecasting with AR Models

Optimal linear prediction and mean squared error analysis

One-Step Ahead Prediction

The optimal predictor (minimizing MSE) of X_{n+1} given the history \mathcal{F}_n is the conditional expectation:

\hat{X}_{n+1} = E[X_{n+1} \mid X_n, X_{n-1}, \ldots] = \sum_{j=1}^p a_j X_{n+1-j}

The prediction error is simply the white noise term \epsilon_{n+1}, with variance \sigma^2.

Multi-Step Ahead Prediction

For the h-step-ahead forecast \hat{X}_{n+h}, we iterate the AR equation, replacing future values with their forecasts:

\hat{X}_{n+h} = a_1 \hat{X}_{n+h-1} + \cdots + a_p \hat{X}_{n+h-p}

where \hat{X}_{n+k} = X_{n+k} if k \le 0 (observed values).
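
A recursive implementation of this iteration (a sketch with NumPy; the history and coefficients below are hypothetical placeholders):

```python
import numpy as np

def ar_forecast(x, a, h):
    """h-step-ahead forecasts for an AR(p) model with coefficients a = [a_1, ..., a_p]."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    hist = list(x[-p:])                          # last p observations, most recent last
    preds = []
    for _ in range(h):
        x_hat = float(np.dot(a, hist[::-1]))     # a_1 * newest + ... + a_p * oldest
        preds.append(x_hat)
        hist = hist[1:] + [x_hat]                # forecasts stand in for unknown future values
    return np.array(preds)

history = np.array([0.2, -0.5, 1.3, 0.9])        # hypothetical observed series
print(ar_forecast(history, a=[0.7, -0.1], h=5))
```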

Forecast Error Variance

The error e_n(h) = X_{n+h} - \hat{X}_{n+h} can be written using the MA(\infty) representation:

e_n(h) = \sum_{j=0}^{h-1} \psi_j \epsilon_{n+h-j}

MSE: \sigma^2(h) = \sigma^2 \sum_{j=0}^{h-1} \psi_j^2

Convergence

As h \to \infty, the forecast converges to the process mean (0), and the forecast error variance converges to the process variance \gamma_0. This reflects the "short memory" of stationary AR processes.

Practice Quiz
  1. What is the general form of an AR(p) model?
  2. For an AR(1) process X_t = \phi X_{t-1} + \varepsilon_t to be stationary, what condition must \phi satisfy?
  3. What do the Yule-Walker equations allow us to estimate?
  4. For an AR(p) process, what is the behavior of the PACF?
  5. What is the characteristic polynomial of an AR(2) process X_t = \phi_1 X_{t-1} + \phi_2 X_{t-2} + \varepsilon_t?
  6. For AR(1) with \phi = 0.5, what is the autocorrelation at lag k?
  7. What is the variance of an AR(1) process X_t = \phi X_{t-1} + \varepsilon_t with |\phi| < 1?
  8. What is the lag operator B defined as?
  9. Can an AR(p) process be represented as an infinite MA process?
  10. What is the relationship between AIC and model complexity?

Frequently Asked Questions

Why must AR characteristic roots lie outside the unit circle?

Roots outside the unit circle (> 1 in absolute value) ensure exponential decay of the homogeneous solution, guaranteeing stationarity. If a root lies on or inside the unit circle, the process becomes non-stationary with unbounded variance.

What is the intuition behind the Yule-Walker equations?

They express a fundamental consistency relationship: in a stationary AR process, the autocovariance structure must satisfy the same linear relationship as the process itself. This allows us to estimate AR coefficients directly from sample autocovariances.

How does the PACF help identify AR model order?

For an AR(p) process, the PACF truncates after lag p—i.e., φ_kk = 0 for k > p. This diagnostic property allows model order selection: the last significant PACF lag indicates the appropriate AR order. This contrasts with the ACF, which decays gradually.

Can an AR(p) model be represented as an MA(∞) process?

Yes, via the Wold decomposition. When all characteristic roots lie outside the unit circle, we can invert A(B) to get X_t = A⁻¹(B)ε_t = ∑ψ_jε_{t-j}. The ψ_j coefficients decay exponentially, ensuring the infinite sum converges.

What makes the Levinson-Durbin algorithm efficient?

It exploits the Toeplitz structure of the autocovariance matrix, reducing computation from O(p³) to O(p²). The algorithm recursively builds solutions for orders 1 through p, computing PACF coefficients as byproducts—essential for large-scale time series analysis.

What is the difference between ACF and PACF?

ACF measures total correlation (direct + indirect) at each lag. PACF measures only the direct correlation after removing effects of intermediate lags. For AR(p): ACF decays, PACF cuts off after lag p. This pattern is crucial for model identification.
