AR Processes, Difference Equations & Spectral Analysis
Core concepts and mathematical foundations
A stochastic process {X_t} is an autoregressive process of order p, written AR(p), if it satisfies:
X_t = φ_1 X_{t-1} + φ_2 X_{t-2} + … + φ_p X_{t-p} + ε_t
where ε_t ~ WN(0, σ²) is white noise and φ_1, …, φ_p are real coefficients with φ_p ≠ 0.
The current value is a linear combination of past values plus an innovation term. This models persistence and momentum in time series data.
AR models exhibit short memory: influence of past values decays exponentially over time, controlled by the characteristic roots.
The lag operator (backshift operator) B is defined by:
B X_t = X_{t-1},  so that  B^k X_t = X_{t-k}.
Using the lag operator, the AR(p) equation becomes:
Φ(B) X_t = ε_t
where Φ(z) = 1 − φ_1 z − φ_2 z² − … − φ_p z^p is the characteristic polynomial.
An AR(p) process is weakly stationary if and only if:
Φ(z) ≠ 0 for all |z| ≤ 1.
Equivalently, all roots of Φ(z) = 0 lie outside the unit circle in the complex plane.
Stationary (|z_i| > 1 for every root): Solutions decay exponentially to zero, ensuring bounded variance.
Unit Root (some |z_i| = 1): Persistent oscillations, non-stationary (requires differencing).
Explosive (some |z_i| < 1): Solutions grow exponentially, variance → ∞.
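To make the root condition operational, here is a minimal sketch (Python with NumPy; the function names and example coefficient vectors are illustrative, not from the text) that computes the characteristic roots and flags a model as stationary only when every root lies outside the unit circle.

```python
import numpy as np

def ar_roots(phi):
    """Roots of the characteristic polynomial Φ(z) = 1 − φ_1 z − ... − φ_p z^p.

    np.roots expects coefficients from the highest power of z down to the
    constant term, so we pass [−φ_p, ..., −φ_1, 1].
    """
    phi = np.asarray(phi, dtype=float)
    return np.roots(np.concatenate((-phi[::-1], [1.0])))

def is_stationary(phi):
    """An AR(p) model is stationary iff every characteristic root has |z| > 1."""
    return bool(np.all(np.abs(ar_roots(phi)) > 1.0))

# Illustrative coefficient vectors (my own choices, not from the text):
print(is_stationary([0.5]))        # AR(1), root z = 2          -> True
print(is_stationary([0.5, 0.3]))   # AR(2), both roots outside  -> True
print(is_stationary([1.0]))        # unit root                  -> False
```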
Rigorous mathematical foundations with step-by-step derivations
Theorem Statement:
The AR(p) process is weakly stationary if and only if all roots z_i of the characteristic equation Φ(z) = 0 satisfy |z_i| > 1.
Proof:
Let z_1, …, z_r denote the distinct roots of Φ(z) = 0 with multiplicities m_1, …, m_r, where m_1 + … + m_r = p.
The general solution of the homogeneous difference equation X_t − φ_1 X_{t-1} − … − φ_p X_{t-p} = 0 is
X_t^(h) = ∑_{i=1}^{r} ∑_{j=0}^{m_i−1} c_{ij} t^j z_i^{−t},
where the c_{ij} are constants determined by initial conditions.
When |z_i| > 1 for every root, each term t^j |z_i|^{−t} → 0 as t → ∞: polynomial growth is dominated by exponential decay. This ensures the process doesn't explode or maintain unit-root oscillations.
Inverting Φ(B) then yields the MA(∞) representation
X_t = ∑_{j=0}^{∞} ψ_j ε_{t-j},  with ∑_{j=0}^{∞} |ψ_j| < ∞ (absolutely summable), guaranteeing finite variance.
Practical Implication:
Before fitting an AR model, verify that estimated characteristic roots lie outside the unit circle. If not, the data likely requires differencing (ARIMA modeling) or has structural breaks.
Theorem Statement:
For a stationary AR(p) process, the autocovariance function γ(k) satisfies the Yule-Walker equations:
γ(k) = φ_1 γ(k−1) + φ_2 γ(k−2) + … + φ_p γ(k−p),  k ≥ 1,
with variance:
γ(0) = φ_1 γ(1) + φ_2 γ(2) + … + φ_p γ(p) + σ².
Proof:
Multiply the AR(p) equation by X_{t−k} (k ≥ 1) and take expectations. The innovation term drops out, E[ε_t X_{t−k}] = 0, since ε_t is independent of all past values.
Since the process is stationary with E[X_t] = 0, we have E[X_{t−j} X_{t−k}] = γ(k−j), which gives γ(k) = φ_1 γ(k−1) + … + φ_p γ(k−p) for k ≥ 1. Taking k = 0 and using E[ε_t X_t] = σ² gives the variance equation.
For k = 1, …, p, the Yule-Walker system in matrix form:
Γ_p φ = γ_p,  with Γ_p = [γ(i−j)]_{i,j=1,…,p},  φ = (φ_1, …, φ_p)′,  γ_p = (γ(1), …, γ(p))′.
The coefficient matrix is a Toeplitz matrix (constant diagonals), guaranteed positive definite for stationary processes.
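As a concrete illustration of the matrix form, the following sketch (Python with NumPy/SciPy; the function name and the AR(1) check values are my own) builds Γ_p from a vector of autocovariances and solves for the coefficients and the innovation variance.

```python
import numpy as np
from scipy.linalg import toeplitz

def yule_walker(gamma):
    """Solve Γ_p φ = γ_p given autocovariances gamma = [γ(0), γ(1), ..., γ(p)].

    Returns (phi, sigma2): AR coefficients φ_1..φ_p and innovation variance
    σ² = γ(0) − ∑ φ_j γ(j).
    """
    gamma = np.asarray(gamma, dtype=float)
    p = len(gamma) - 1
    Gamma = toeplitz(gamma[:p])              # Γ_p, a p×p symmetric Toeplitz matrix
    phi = np.linalg.solve(Gamma, gamma[1:])
    sigma2 = gamma[0] - phi @ gamma[1:]
    return phi, sigma2

# Illustrative check: autocovariances of an AR(1) with φ = 0.6, σ² = 1,
# for which γ(k) = 0.6**k / (1 − 0.36).
g = [0.6**k / (1 - 0.36) for k in range(3)]
print(yule_walker(g))   # recovers φ ≈ (0.6, 0.0) and σ² ≈ 1
```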
Theorem Statement:
For an AR(p) process, the partial autocorrelation function (PACF) satisfies:
φ_pp = φ_p ≠ 0  and  φ_kk = 0 for all k > p.
Proof Sketch:
For k > p, the best linear predictor of X_t based on X_{t−1}, …, X_{t−k} uses only the first p lags; the coefficient on X_{t−k}, which is exactly the partial autocorrelation φ_kk, is therefore zero.
Model Identification Rule:
Plot the sample PACF with 95% confidence bands ±1.96/√n. The order p is identified as the last lag at which the PACF exceeds the bands before remaining within them.
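A sketch of this identification rule (Python; it relies on statsmodels' pacf for the sample PACF, and the simulated AR(2) series and its coefficients are my own illustration):

```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def identify_ar_order(x, max_lag=20):
    """Return the last lag whose sample PACF exceeds the ±1.96/√n band."""
    x = np.asarray(x, dtype=float)
    band = 1.96 / np.sqrt(len(x))
    pacf_vals = pacf(x, nlags=max_lag)                     # pacf_vals[0] is lag 0 (= 1)
    significant = np.where(np.abs(pacf_vals[1:]) > band)[0] + 1
    return int(significant.max()) if significant.size else 0

# Illustrative use on a simulated AR(2) series:
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(2, 2000):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + rng.standard_normal()
print(identify_ar_order(x))   # usually 2; spurious exceedances at higher lags occur ~5% per lag
```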
Step-by-step solutions for model analysis and parameter estimation
Problem:
Consider the AR(1) process X_t = φ X_{t-1} + ε_t with 0 < φ < 1, where ε_t ~ WN(0, σ²). Determine stationarity, calculate the mean, variance, and the first 3 values of the ACF. Find the spectral density.
The characteristic polynomial is Φ(z) = 1 − φz.
Root: z = 1/φ.
Since |φ| < 1, the root satisfies |z| = 1/|φ| > 1 and lies outside the unit circle, so the process is stationary.
Mean: taking expectations, E[X_t] = φ E[X_{t-1}]; by stationarity E[X_t] = E[X_{t-1}] = μ, so μ(1 − φ) = 0 and μ = 0.
Variance: γ(0) = φ² γ(0) + σ², hence γ(0) = σ² / (1 − φ²).
For AR(1), ρ(k) = φ^k, so ρ(1) = φ, ρ(2) = φ², ρ(3) = φ³.
Note the geometric decay, characteristic of AR(1) processes.
Using the formula f(ω) = σ² / (2π |Φ(e^{−iω})|²):
f(ω) = σ² / (2π (1 − 2φ cos ω + φ²)),  −π ≤ ω ≤ π.
Analysis: With φ > 0, the denominator is minimized when ω = 0 (i.e., cos ω = 1), maximizing f(ω). This indicates a "low-pass" filter effect: variance is concentrated at low frequencies (smooth trends).
Problem:
For the AR(2) model X_t = φ_1 X_{t-1} + φ_2 X_{t-2} + ε_t with ε_t ~ WN(0, σ²), find the characteristic roots, check stationarity, and calculate ρ(1) and ρ(2).
Characteristic equation: Φ(z) = 1 − φ_1 z − φ_2 z² = 0.
Using the quadratic formula for z:
z_{1,2} = (−φ_1 ± √(φ_1² + 4φ_2)) / (2φ_2).
Both |z_1| > 1 and |z_2| > 1. The process is stationary.
The Yule-Walker equations for AR(2), written in terms of autocorrelations:
(1)  ρ(1) = φ_1 + φ_2 ρ(1)
(2)  ρ(2) = φ_1 ρ(1) + φ_2
From eq (1): ρ(1)(1 − φ_2) = φ_1, so ρ(1) = φ_1 / (1 − φ_2).
Substituting ρ(1) into eq (2) yields:
ρ(2) = φ_1² / (1 − φ_2) + φ_2.
Problem:
Investigate the behavior of an AR(2) process X_t = φ_1 X_{t-1} + φ_2 X_{t-2} + ε_t whose coefficients yield complex characteristic roots. Determine the period of the pseudo-cycles.
Discriminant φ_1² + 4φ_2 < 0, so the roots form a complex-conjugate pair z, z̄.
Modulus |z| = 1/√(−φ_2) > 1. Stationary.
The ACF exhibits damped cosine waves: ρ(k) = c |z|^{−k} cos(kθ + ϕ), with constants c and ϕ fixed by ρ(0) = 1 and ρ(1).
The frequency is determined by the argument of the roots: θ = arg(z), with cos θ = φ_1 / (2√(−φ_2)).
Period: T = 2π/θ; here θ = π/6, giving T = 2π/(π/6) = 12.
The process exhibits pseudo-cycles of length 12.
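A short numerical sketch of the same calculation (Python with NumPy; the coefficients φ_1 = 1.5, φ_2 = −0.75 are illustrative values chosen to produce a 12-step cycle, not necessarily those of the original exercise):

```python
import numpy as np

# Illustrative AR(2) coefficients (my own choice, not from the text)
phi1, phi2 = 1.5, -0.75

# Roots of Φ(z) = 1 − φ_1 z − φ_2 z² (highest power first for np.roots)
roots = np.roots([-phi2, -phi1, 1.0])
z = roots[0]

modulus = abs(z)                 # 1/√(−φ_2) ≈ 1.155 > 1  -> stationary
theta = abs(np.angle(z))         # argument of the complex root
period = 2 * np.pi / theta       # pseudo-cycle length

print(f"|z| = {modulus:.3f}, θ = {theta:.4f} rad, period ≈ {period:.1f}")
# |z| ≈ 1.155, θ ≈ π/6 ≈ 0.5236, period ≈ 12.0
```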
Problem:
A time series yields sample autocorrelations r_1 and r_2. Estimate the parameters of an AR(2) model.
For AR(2), the normalized Yule-Walker equations (using correlations) are:
(1)  r_1 = φ̂_1 + φ̂_2 r_1
(2)  r_2 = φ̂_1 r_1 + φ̂_2
From eq (1): φ̂_1 = r_1 (1 − φ̂_2). Substitute into eq (2):
r_2 = r_1² (1 − φ̂_2) + φ̂_2  ⇒  φ̂_2 = (r_2 − r_1²) / (1 − r_1²).
Now find φ̂_1:
φ̂_1 = r_1 (1 − r_2) / (1 − r_1²).
Estimated Model: X_t = φ̂_1 X_{t-1} + φ̂_2 X_{t-2} + ε_t, with φ̂_1 and φ̂_2 evaluated at the observed r_1 and r_2.
Check stationarity: φ̂_1 + φ̂_2 < 1, φ̂_2 − φ̂_1 < 1, |φ̂_2| < 1. If all three hold, this is a valid stationary model.
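These closed-form estimators translate directly into code. A minimal sketch (Python; the inputs r_1 = 0.6, r_2 = 0.4 are placeholder values, not the figures from the original exercise):

```python
def ar2_yule_walker(r1, r2):
    """Method-of-moments AR(2) estimates from sample autocorrelations r1, r2."""
    phi2 = (r2 - r1**2) / (1 - r1**2)
    phi1 = r1 * (1 - r2) / (1 - r1**2)
    # Stationarity (triangle) conditions for an AR(2) model
    stationary = (phi1 + phi2 < 1) and (phi2 - phi1 < 1) and (abs(phi2) < 1)
    return phi1, phi2, stationary

print(ar2_yule_walker(0.6, 0.4))   # illustrative inputs -> (0.5625, 0.0625, True)
```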
Understanding AR processes in the frequency domain
The spectral density f(ω) describes how the variance of the time series is distributed across frequencies ω ∈ [−π, π].
General Formula for AR(p):
f(ω) = σ² / (2π |Φ(e^{−iω})|²) = σ² / (2π |1 − φ_1 e^{−iω} − … − φ_p e^{−ipω}|²).
Positive φ (AR(1)): Peak at ω = 0. Looks like "red noise" (smooth).
Negative φ (AR(1)): Peak at ω = π. Looks like "blue noise" (jagged).
With complex characteristic roots of argument ±θ (as in the AR(2) example above), the spectrum has a peak near ω = θ.
The closer the root modulus is to 1, the sharper the peak (stronger periodicity).
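To see these spectral shapes numerically, here is a sketch of the general formula (Python with NumPy; the frequency grid, function name, and example AR(2) coefficients are my own):

```python
import numpy as np

def ar_spectral_density(phi, sigma2=1.0, n_freq=512):
    """Spectral density f(ω) = σ² / (2π |1 − ∑ φ_j e^{−ijω}|²) evaluated on [0, π]."""
    phi = np.asarray(phi, dtype=float)
    omega = np.linspace(0, np.pi, n_freq)
    j = np.arange(1, len(phi) + 1)
    transfer = 1 - np.exp(-1j * np.outer(omega, j)) @ phi   # Φ(e^{−iω})
    return omega, sigma2 / (2 * np.pi * np.abs(transfer) ** 2)

# Illustrative AR(2) with complex roots of argument π/6 (coefficients are my own):
omega, f = ar_spectral_density([1.5, -0.75])
print(f"peak near ω = {omega[np.argmax(f)]:.3f} (root argument θ = π/6 ≈ 0.524)")
```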
Ensuring model adequacy through residual analysis and order selection
Look for the lag where PACF cuts off.
Rule: φ_pp = φ_p ≠ 0 and φ_kk = 0 for k > p.
AIC(k) = n ln(σ̂_k²) + 2k. Penalizes complexity. Good for prediction, but tends to overestimate the order slightly.
BIC(k) = n ln(σ̂_k²) + k ln(n). Stronger penalty. A consistent estimator of the true order as n → ∞. (A sketch of order selection with both criteria follows below.)
If the model is adequate, residuals should behave like white noise.
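A compact sketch tying order selection and residual checking together (Python with NumPy only; the least-squares fit, the AIC/BIC expressions above, and the ±1.96/√n band are standard, while the function names and simulated data are my own):

```python
import numpy as np

def fit_ar_ols(x, p):
    """Fit AR(p) by least squares; return coefficients, residuals, and σ̂²."""
    X = np.column_stack([x[p - j:len(x) - j] for j in range(1, p + 1)])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ phi
    return phi, resid, resid.var()

def select_order(x, max_p=10):
    """Choose the AR order minimizing AIC and BIC."""
    n = len(x)
    aic, bic = {}, {}
    for p in range(1, max_p + 1):
        _, _, s2 = fit_ar_ols(x, p)
        aic[p] = n * np.log(s2) + 2 * p
        bic[p] = n * np.log(s2) + p * np.log(n)
    return min(aic, key=aic.get), min(bic, key=bic.get)

def residuals_look_white(resid, n_lags=10):
    """Crude whiteness check: residual ACF within ±1.96/√n at all tested lags."""
    n = len(resid)
    r = resid - resid.mean()
    acf = np.array([np.dot(r[:-k], r[k:]) / np.dot(r, r) for k in range(1, n_lags + 1)])
    return bool(np.all(np.abs(acf) < 1.96 / np.sqrt(n)))

# Illustrative use on a simulated AR(2) series (coefficients are my own):
rng = np.random.default_rng(1)
x = np.zeros(3000)
for t in range(2, 3000):
    x[t] = 0.5 * x[t - 1] + 0.3 * x[t - 2] + rng.standard_normal()
p_aic, p_bic = select_order(x)
phi, resid, _ = fit_ar_ols(x, p_bic)
print(p_aic, p_bic, residuals_look_white(resid))
```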
From heteroskedasticity to multivariate systems and deep learning
Standard AR models assume constant conditional variance (Var(ε_t) = σ²). Financial data often exhibits volatility clustering.
GARCH(1,1) Model:
X_t = σ_t z_t,  z_t ~ iid(0, 1)
σ_t² = α_0 + α_1 X_{t-1}² + β_1 σ_{t-1}²
Used extensively in risk management (Value-at-Risk) and option pricing.
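A minimal simulation sketch of the GARCH(1,1) recursion (Python with NumPy; the parameter values are illustrative, chosen only so that α_1 + β_1 < 1):

```python
import numpy as np

def simulate_garch11(n, alpha0=0.1, alpha1=0.1, beta1=0.85, seed=0):
    """Simulate GARCH(1,1): X_t = σ_t z_t, σ_t² = α0 + α1 X_{t-1}² + β1 σ_{t-1}²."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    sigma2 = np.full(n, alpha0 / (1 - alpha1 - beta1))   # start at the unconditional variance
    for t in range(1, n):
        sigma2[t] = alpha0 + alpha1 * x[t - 1] ** 2 + beta1 * sigma2[t - 1]
        x[t] = np.sqrt(sigma2[t]) * rng.standard_normal()
    return x, sigma2

x, sigma2 = simulate_garch11(5000)
# Volatility clustering: squared returns are autocorrelated even though returns are not.
```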
Extends AR to k variables. X_t is a k×1 vector, and the Φ_i are k×k coefficient matrices.
VAR(p) Model:
X_t = Φ_1 X_{t-1} + Φ_2 X_{t-2} + … + Φ_p X_{t-p} + ε_t
Captures feedback loops between variables (e.g., GDP, Inflation, Interest Rates).
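A tiny simulation sketch of a bivariate VAR(1) (Python with NumPy; the coefficient matrix is my own illustrative choice, with eigenvalues inside the unit circle so the system is stable):

```python
import numpy as np

# Illustrative bivariate VAR(1): X_t = Φ_1 X_{t-1} + ε_t (Φ_1 is my own choice)
Phi1 = np.array([[0.5, 0.1],
                 [0.2, 0.4]])
rng = np.random.default_rng(0)
X = np.zeros((500, 2))
for t in range(1, 500):
    X[t] = Phi1 @ X[t - 1] + rng.standard_normal(2)
# The off-diagonal entries of Φ_1 create the feedback between the two series.
```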
Modern approaches replace the linear combination with non-linear neural networks.
Neural AR:
X_t = g(X_{t-1}, …, X_{t-p}) + ε_t,  where g is a learned neural network.
Includes RNNs, LSTMs, and Transformers (e.g., Temporal Fusion Transformer).
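As a sketch of the functional form only (Python with NumPy; the network weights are random and untrained, and all names are my own), a one-hidden-layer neural AR(p) predictor replaces the linear combination with a non-linear map g:

```python
import numpy as np

def neural_ar_predict(lags, W1, b1, W2, b2):
    """One-step prediction X̂_t = g(X_{t-1}, ..., X_{t-p}) with a one-hidden-layer network.

    In practice the weights are learned by minimizing squared one-step error;
    this sketch only shows the functional form that replaces φ_1 X_{t-1} + ... + φ_p X_{t-p}.
    """
    hidden = np.tanh(W1 @ lags + b1)
    return float(W2 @ hidden + b2)

# Illustrative shapes for p = 3 lags and 8 hidden units (random, untrained weights):
rng = np.random.default_rng(0)
p, h = 3, 8
W1, b1 = rng.standard_normal((h, p)), np.zeros(h)
W2, b2 = rng.standard_normal(h), 0.0
print(neural_ar_predict(np.array([0.4, 0.1, -0.2]), W1, b1, W2, b2))
```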
Efficient recursive algorithm for solving Yule-Walker equations
Solving the Yule-Walker system Γ_p φ = γ_p directly via matrix inversion (Gaussian elimination) requires O(p³) operations.
The Levinson Advantage:
By exploiting the Toeplitz structure of Γ_p, the Levinson-Durbin algorithm reduces the complexity to O(p²).
This is crucial for fitting high-order AR models (e.g., in speech processing or geophysical signal analysis).
Initialization:
φ_{1,1} = ρ(1),  σ_1² = γ(0)(1 − ρ(1)²).
Recursion (for k = 2 to p):
φ_{k,k} = [γ(k) − ∑_{j=1}^{k−1} φ_{k−1,j} γ(k−j)] / σ_{k−1}²
φ_{k,j} = φ_{k−1,j} − φ_{k,k} φ_{k−1,k−j},   j = 1, …, k−1
σ_k² = σ_{k−1}² (1 − φ_{k,k}²)
Note: φ_{k,k} is the partial autocorrelation at lag k.
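The recursion translates almost line-for-line into code. A self-contained sketch (Python with NumPy; function and variable names are my own), checked against an AR(1) whose autocovariances are known in closed form:

```python
import numpy as np

def levinson_durbin(gamma):
    """Levinson-Durbin recursion on autocovariances gamma = [γ(0), ..., γ(p)].

    Returns (phi, sigma2, pacf):
      phi    - AR(p) coefficients φ_{p,1..p}
      sigma2 - innovation variance σ_p²
      pacf   - partial autocorrelations φ_{1,1}, ..., φ_{p,p}
    """
    gamma = np.asarray(gamma, dtype=float)
    p = len(gamma) - 1
    phi = np.zeros(p)
    pacf = np.zeros(p)

    # Initialization (order 1)
    phi[0] = pacf[0] = gamma[1] / gamma[0]
    sigma2 = gamma[0] * (1 - phi[0] ** 2)

    # Recursion (orders k = 2, ..., p)
    for k in range(2, p + 1):
        kappa = (gamma[k] - phi[:k - 1] @ gamma[1:k][::-1]) / sigma2
        phi[:k - 1] -= kappa * phi[:k - 1][::-1]   # φ_{k,j} = φ_{k−1,j} − κ φ_{k−1,k−j}
        phi[k - 1] = pacf[k - 1] = kappa
        sigma2 *= 1 - kappa ** 2
    return phi, sigma2, pacf

# Illustrative check against AR(1) autocovariances γ(k) = 0.6**k / (1 − 0.36):
g = [0.6 ** k / (1 - 0.36) for k in range(4)]
print(levinson_durbin(g))   # phi ≈ (0.6, 0, 0), σ² ≈ 1, pacf ≈ (0.6, 0, 0)
```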
Optimal linear prediction and mean squared error analysis
The optimal predictor (minimizing MSE) of X_{t+1} given the history {X_t, X_{t-1}, …} is the conditional expectation:
X̂_t(1) = E[X_{t+1} | X_t, X_{t-1}, …] = φ_1 X_t + φ_2 X_{t-1} + … + φ_p X_{t-p+1}.
The one-step prediction error is simply the white noise term ε_{t+1}, with variance σ².
For the h-step ahead forecast X̂_t(h), we iterate the AR equation, replacing unknown future values with their forecasts:
X̂_t(h) = φ_1 X̂_t(h−1) + φ_2 X̂_t(h−2) + … + φ_p X̂_t(h−p),
where X̂_t(h−j) = X_{t+h−j} if h − j ≤ 0 (observed values).
The h-step error can be written using the MA(∞) representation:
X_{t+h} − X̂_t(h) = ∑_{j=0}^{h−1} ψ_j ε_{t+h−j}.
MSE: E[(X_{t+h} − X̂_t(h))²] = σ² ∑_{j=0}^{h−1} ψ_j².
As h → ∞, the forecast converges to the process mean (0), and the forecast error variance converges to the process variance γ(0). This reflects the "short memory" of stationary AR processes.
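A sketch of the recursive forecast and its MSE (Python with NumPy; the ψ-weights are computed by the standard recursion ψ_0 = 1, ψ_j = ∑_i φ_i ψ_{j−i}, and the example coefficients and data are my own):

```python
import numpy as np

def ar_forecast(x, phi, h):
    """h-step ahead forecasts X̂_t(1), ..., X̂_t(h) for an AR(p) with coefficients phi."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    hist = list(x[-p:])                      # last p observed values, oldest first
    forecasts = []
    for _ in range(h):
        f = float(np.dot(phi, hist[::-1]))   # φ_1·(most recent) + ... + φ_p·(oldest)
        forecasts.append(f)
        hist = hist[1:] + [f]                # forecasts replace unknown future values
    return np.array(forecasts)

def forecast_mse(phi, sigma2, h):
    """MSE(h) = σ² ∑_{j=0}^{h−1} ψ_j², with ψ_0 = 1, ψ_j = ∑_{i=1}^{min(j,p)} φ_i ψ_{j−i}."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    psi = np.zeros(h)
    psi[0] = 1.0
    for j in range(1, h):
        psi[j] = sum(phi[i] * psi[j - 1 - i] for i in range(min(j, p)))
    return sigma2 * np.cumsum(psi ** 2)      # MSE(1), ..., MSE(h)

# Illustrative AR(2) (coefficients and data of my own choosing):
phi, sigma2 = [0.5, 0.3], 1.0
x = np.array([0.2, -0.1, 0.4, 0.8])
print(ar_forecast(x, phi, 3))
print(forecast_mse(phi, sigma2, 3))   # -> [1.0, 1.25, ...] since ψ_1 = 0.5
```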
Roots outside the unit circle (> 1 in absolute value) ensure exponential decay of the homogeneous solution, guaranteeing stationarity. If a root lies on or inside the unit circle, the process becomes non-stationary with unbounded variance.
They express a fundamental consistency relationship: in a stationary AR process, the autocovariance structure must satisfy the same linear relationship as the process itself. This allows us to estimate AR coefficients directly from sample autocovariances.
For an AR(p) process, the PACF truncates after lag p—i.e., φ_kk = 0 for k > p. This diagnostic property allows model order selection: the last significant PACF lag indicates the appropriate AR order. This contrasts with the ACF, which decays gradually.
Yes, via the Wold decomposition. When all characteristic roots lie outside the unit circle, we can invert Φ(B) to get X_t = Φ(B)⁻¹ε_t = ∑ψ_jε_{t-j}. The ψ_j coefficients decay exponentially, ensuring the infinite sum converges.
It exploits the Toeplitz structure of the autocovariance matrix, reducing computation from O(p³) to O(p²). The algorithm recursively builds solutions for orders 1 through p, computing PACF coefficients as byproducts—essential for large-scale time series analysis.
ACF measures total correlation (direct + indirect) at each lag. PACF measures only the direct correlation after removing effects of intermediate lags. For AR(p): ACF decays, PACF cuts off after lag p. This pattern is crucial for model identification.