Module 4

ARMA Models: Merging AR and MA

Discover the power of combining autoregressive and moving average components. Master parameter parsimony, model identification, and optimal forecasting for complex time series.

3 Hours Reading
Advanced Level
Numerical Optimization

Learning Objectives

What You'll Master
Combining the best of both AR and MA worlds

Define ARMA(p,q) processes and understand why they're parsimonious

Derive stationarity and invertibility conditions for ARMA models

Analyze ARMA(1,1) in detail including ACF and parameter redundancy

Identify ARMA models using EACF, AIC, and BIC criteria

Estimate parameters using Maximum Likelihood Estimation

Forecast with ARMA models using recursive algorithms

Apply ARMA models to real-world economic data

ARMA Foundations

Understanding the mixed autoregressive-moving average structure

The ARMA(p,q) Model

Definition

An ARMA(p,q) process is a stationary time series $\{X_t\}$ that satisfies

$$X_t - \phi_1 X_{t-1} - \dots - \phi_p X_{t-p} = \epsilon_t + \theta_1\epsilon_{t-1} + \dots + \theta_q\epsilon_{t-q}$$

where $\{\epsilon_t\}$ is white noise with mean zero and variance $\sigma^2$.

Using the backshift operator $B$, we can write this compactly as:

$$\phi(B)X_t = \theta(B)\epsilon_t$$

where $\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p$ and $\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q$.
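As a concrete illustration, the sketch below simulates an ARMA(2,1) process with $\phi_1=0.5$, $\phi_2=-0.3$, $\theta_1=0.7$ (the same parameters reused in the Wold-coefficient example later in this module), assuming statsmodels is available. Note that ArmaProcess expects the coefficients of $\phi(B)$ and $\theta(B)$ themselves, so the AR signs are negated.

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

# phi(B) = 1 - 0.5B + 0.3B^2  (phi_1 = 0.5, phi_2 = -0.3), theta(B) = 1 + 0.7B
ar = np.array([1.0, -0.5, 0.3])   # coefficients of phi(B)
ma = np.array([1.0, 0.7])         # coefficients of theta(B)

process = ArmaProcess(ar, ma)
print("stationary:", process.isstationary)   # all roots of phi(z) outside the unit circle
print("invertible:", process.isinvertible)   # all roots of theta(z) outside the unit circle

x = process.generate_sample(nsample=500)     # one simulated path of length 500
print(x[:5])
```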

Why ARMA? The Parsimony Principle

A key advantage of ARMA models is parameter parsimony. Many processes that would require high-order AR or MA models can be represented with far fewer parameters using ARMA.

Example:

ARMA(1,1) uses only 2 parameters, yet its AR(∞) and MA(∞) representations have infinitely many nonzero coefficients, so matching its behavior with a pure AR or MA model would require a high order. Fewer parameters reduce overfitting risk and improve forecast accuracy.

Stationarity & Invertibility

For an ARMA(p,q) process:

  • Stationary if all roots of $\phi(z)=0$ lie outside the unit circle.
  • Invertible if all roots of $\theta(z)=0$ lie outside the unit circle.

Both conditions are required for a well-posed ARMA model.

The Common Roots Problem

Parameter Redundancy

If $\phi(B)$ and $\theta(B)$ have a common factor, the model is over-parameterized. For example, if both contain the factor $(1-cB)$ (a shared root at $z = 1/c$), we can cancel it:

$$\frac{\theta(B)}{\phi(B)} = \frac{(1-cB)\tilde{\theta}(B)}{(1-cB)\tilde{\phi}(B)} = \frac{\tilde{\theta}(B)}{\tilde{\phi}(B)}$$

This produces a lower-order ARMA model with exactly the same statistical properties. To avoid the problem, check that $\phi(z)$ and $\theta(z)$ share no common roots and fit the lowest-order model that is adequate.
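A quick way to detect (near-)redundancy in a fitted model is to compare the roots of the two polynomials numerically. A minimal sketch with numpy, using illustrative degree-one polynomials that share the factor $(1-0.5B)$:

```python
import numpy as np

def roots_of(coeffs):
    """Roots of 1 + c1*z + ... + ck*z^k given [c1, ..., ck] (np.roots wants highest degree first)."""
    return np.roots(list(reversed(coeffs)) + [1.0])

phi   = [-0.5]   # phi(B)   = 1 - 0.5B  -> root at z = 2
theta = [-0.5]   # theta(B) = 1 - 0.5B  -> root at z = 2, redundant with the AR factor

ar_roots, ma_roots = roots_of(phi), roots_of(theta)
for r in ar_roots:
    if any(abs(r - m) < 1e-6 for m in ma_roots):
        print("common root near z =", r, "-> cancel the shared factor and reduce the model order")
```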

Theoretical Properties

ACF and PACF Behavior

Unlike a pure AR process (whose PACF cuts off) or a pure MA process (whose ACF cuts off), an ARMA process has an ACF and a PACF that both tail off.

Key Signatures:

  • ACF: Exponential decay or damped sine wave
  • PACF: Exponential decay or damped sine wave
  • No clear cut-off point in either

This makes visual identification more challenging than for pure AR or MA models.

The Psi-Weights Representation

Any stationary and invertible ARMA(p,q) process can be written as an MA(∞) process:

$$X_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j} = \psi(B)\epsilon_t$$

where $\psi(B) = \theta(B)/\phi(B)$. The $\psi_j$ coefficients are called the impulse response function.

These weights show how a shock at time t affects future values.

ARMA(1,1) Deep Dive

Detailed analysis of the simplest mixed model

Model Structure & Properties

Basic Form

$$X_t = \phi X_{t-1} + \epsilon_t + \theta\epsilon_{t-1}$$

where $|\phi| < 1$ (stationarity) and $|\theta| < 1$ (invertibility).

Autocovariance Derivation

Multiply both sides by $X_{t-k}$ and take expectations:

For $k=0$:

$$\gamma_0 = \phi\gamma_1 + \sigma^2(1 + \phi\theta + \theta^2)$$

For $k=1$:

$$\gamma_1 = \phi\gamma_0 + \sigma^2\theta$$

For $k \geq 2$:

$$\gamma_k = \phi\gamma_{k-1}$$
ACF Formula

Solving these equations and normalizing by $\gamma_0$:

$$\rho_1 = \frac{(1+\phi\theta)(\phi+\theta)}{1 + 2\phi\theta + \theta^2}$$
$$\rho_k = \phi^{k-1}\rho_1, \quad k \geq 2$$

Note: the ACF decays exponentially after lag 1, with the initial value depending on both $\phi$ and $\theta$.
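The closed form is easy to evaluate numerically; a minimal sketch for illustrative values of $\phi$ and $\theta$ (the result can be cross-checked against statsmodels' arma_acf):

```python
import numpy as np

def arma11_acf(phi, theta, nlags=10):
    """Theoretical ACF of X_t = phi*X_{t-1} + eps_t + theta*eps_{t-1}."""
    rho = np.ones(nlags + 1)                 # rho_0 = 1
    rho[1] = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
    for k in range(2, nlags + 1):
        rho[k] = phi * rho[k - 1]            # exponential decay after lag 1
    return rho

print(arma11_acf(phi=0.6, theta=0.3, nlags=5))
```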

Parameter Redundancy Example

The Problem

Consider an ARMA(1,1) with $\phi = 0.5$ and $\theta = -0.5$:

$$(1 - 0.5B)X_t = (1 - 0.5B)\epsilon_t$$

This is problematic because both polynomials have a root at $z = 2$: the common factor $(1-0.5B)$ cancels and the model simplifies to

$$X_t = \epsilon_t$$

So the ARMA(1,1) reduces to white noise! This demonstrates why we must check an estimated ARMA model for (near-)common roots of $\phi(z)$ and $\theta(z)$ and drop redundant parameters before interpreting the fit.

Model Identification & Estimation

Extended Autocorrelation Function (EACF)

The EACF is a powerful tool for identifying ARMA(p,q) orders. It's based on the idea that after removing the AR effects, the residuals should exhibit MA(q) behavior.

Procedure:

  1. Fit AR(k) models for k = 0, 1, 2, ...
  2. Compute ACF of residuals from each fit
  3. Construct EACF table: rows are AR order, columns are MA lag
  4. Look for a triangular wedge of 'O's (statistically insignificant correlations); its upper-left vertex indicates (p,q)

Pattern: the vertex of the 'O' wedge sits at row p and column q, so the row number suggests the AR order and the column number the MA order.
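statsmodels does not ship an EACF routine, but the core idea (fit AR(k), then look at the residual ACF) can be sketched as follows. This is a simplified illustration rather than the full iterated-regression EACF algorithm, and y is assumed to be a 1-D numpy array.

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from statsmodels.tsa.stattools import acf

def simplified_eacf(y, max_p=5, max_q=5):
    """EACF-style table: 'O' where the residual ACF of an AR(p) fit is insignificant, else 'X'."""
    bound = 1.96 / np.sqrt(len(y))                     # rough 95% band for a white-noise ACF
    table = []
    for p in range(max_p + 1):
        resid = y - y.mean() if p == 0 else AutoReg(y, lags=p).fit().resid
        r = acf(resid, nlags=max_q, fft=True)[1:]      # residual ACF at lags 1..max_q
        table.append(["O" if abs(v) < bound else "X" for v in r])
    return table

# for row in simplified_eacf(y): print(" ".join(row))  # look for a wedge of O's with vertex at (p, q)
```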

Forecasting with ARMA Models

Constructing optimal forecasts using the recursive algorithm

Recursive Forecast Equations

The h-step Ahead Forecast

For an ARMA(p,q) model, the h-step ahead forecast $\hat{X}_{n+h|n}$ (the forecast made at time $n$ for time $n+h$) is:

$$\hat{X}_{n+h|n} = \sum_{i=1}^p \phi_i \hat{X}_{n+h-i|n} + \sum_{j=h}^q \theta_j \hat{\epsilon}_{n+h-j}$$

with the convention that:

  • $\hat{X}_{n+k|n} = X_{n+k}$ for $k \leq 0$ (use observed values)
  • $\hat{\epsilon}_{n+k} = 0$ for $k > 0$ (future errors have expectation zero)
  • $\hat{\epsilon}_{k}$ for $k \leq n$ is the residual from fitting the model
AR Component Contribution

The AR part depends on past values of the series. For h = 1, the last p observations enter directly; for larger horizons, previously computed forecasts replace the values that are not yet observed, so the recursion builds on itself.

Example (ARMA(1,1)):

$$\hat{X}_{n+1|n} = \phi X_n + \theta\hat{\epsilon}_n$$
$$\hat{X}_{n+2|n} = \phi \hat{X}_{n+1|n}$$

MA Component Contribution

The MA part uses past residuals (innovations), which are available only up to time n. Once the forecast horizon exceeds the MA order q, this contribution vanishes.

Key Insight:

For $h > q$, the MA component makes no direct contribution (all $\hat{\epsilon}_{n+h-j} = 0$ or drop out of the sum), and forecasts depend only on the AR structure.
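The recursion is straightforward to code; below is a minimal sketch for ARMA(1,1), taking the last observation and the last in-sample residual as inputs (how these are obtained from a fitted model is left open here):

```python
def arma11_forecast(phi, theta, x_n, eps_n, h):
    """h-step forecasts of X_t = phi*X_{t-1} + eps_t + theta*eps_{t-1}, made at time n."""
    forecasts, prev = [], x_n
    for step in range(1, h + 1):
        # the MA term contributes only at step 1; later innovations have expectation zero
        xhat = phi * prev + (theta * eps_n if step == 1 else 0.0)
        forecasts.append(xhat)
        prev = xhat                           # feed the forecast back into the AR recursion
    return forecasts

print(arma11_forecast(phi=0.6, theta=0.3, x_n=1.5, eps_n=0.4, h=5))
# the values decay geometrically toward the (zero) process mean as h grows
```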

Long-term Forecast Behavior

Convergence to the Mean

As $h \to \infty$, the ARMA forecast converges to the unconditional mean of the process (usually assumed to be zero for centered data):

$$\lim_{h \to \infty} \hat{X}_{n+h|n} = E[X_t] = \mu$$

This convergence is exponential at a rate determined by the AR roots. The closer the roots are to the unit circle, the slower the convergence.

Practical Application: GDP Growth Modeling

Applying ARMA to quarterly economic data

Scenario: Quarterly GDP Growth Rate

After removing trend and seasonal components from quarterly GDP data, we are left with the cyclical component $\{Y_t\}$. This represents short-term fluctuations around the trend. Economic theory suggests that shocks have both immediate (MA) and persistent (AR) effects.

1. Data Inspection

Plot ACF and PACF of the cyclical component.

  • ACF: Decays slowly (no clear cut-off).
  • PACF: Decays slowly (no clear cut-off).

Conclusion: Consider ARMA models.

2. Model Selection

Fit ARMA(p,q) for $p, q \in \{0, 1, 2\}$ and compute AIC/BIC.

Best Model (BIC):

ARMA(2,1) with BIC = -142.3

ARMA(2,1) balances fit and parsimony.
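In practice this grid search is a few lines with statsmodels; a sketch assuming the detrended cyclical component is stored in a 1-D array y (some candidate orders may trigger convergence warnings, which are skipped here):

```python
import itertools
from statsmodels.tsa.arima.model import ARIMA

def select_arma(y, max_p=2, max_q=2, criterion="bic"):
    """Fit ARMA(p,q) over a grid and return (score, order, results) for the best model."""
    best = None
    for p, q in itertools.product(range(max_p + 1), range(max_q + 1)):
        try:
            res = ARIMA(y, order=(p, 0, q)).fit()
        except Exception:
            continue                                  # skip orders that fail to converge
        score = res.bic if criterion == "bic" else res.aic
        if best is None or score < best[0]:
            best = (score, (p, q), res)
    return best

# score, order, res = select_arma(y)
# print(order, round(score, 1))
```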

3. Parameter Estimation

Using MLE, we estimate:

$$\hat{\phi}_1 = 0.62, \quad \hat{\phi}_2 = 0.25, \quad \hat{\theta}_1 = -0.38$$

All roots outside unit circle → stationary & invertible ✓

4. Diagnostics

Check residuals for white noise:

  • Ljung-Box: p = 0.42 > 0.05 ✓
  • ACF of residuals: All within bounds ✓

Model fits well!
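A sketch of these residual checks, assuming `res` is the fitted ARMA(2,1) results object from statsmodels:

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.stattools import acf

resid = res.resid                                       # in-sample residuals of the fitted model

# Ljung-Box: a p-value above 0.05 gives no evidence against white-noise residuals
print(acorr_ljungbox(resid, lags=[10], return_df=True))

# Residual ACF: every lag should stay inside the approximate 95% band
bound = 1.96 / np.sqrt(len(resid))
print(np.all(np.abs(acf(resid, nlags=20, fft=True)[1:]) < bound))
```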

Further Reading

Time Series Analysis and Its Applications
Shumway & Stoffer (2017)

Chapter 3 provides detailed coverage of ARMA models, including spectral analysis and advanced estimation techniques using state-space methods.

Forecasting: Principles and Practice
Hyndman & Athanasopoulos (2021)

Freely available online textbook with excellent practical guidance on ARMA model selection and forecasting using R's forecast package.

Time Series Analysis: Forecasting and Control
Box, Jenkins, Reinsel & Ljung (2015)

The classic reference that introduced the Box-Jenkins methodology. Essential for understanding the philosophical foundation of ARMA modeling.

The Analysis of Time Series
Chatfield (2003)

An accessible introduction with a focus on practical problem-solving. Strong coverage of model diagnostics and residual analysis.

Wold Representation & MA(∞) Form

The infinite moving average representation and coefficient computation

The MA(∞) Representation

From Operator Form to Wold Form

Starting from the ARMA equation $\phi(B)X_t = \theta(B)\epsilon_t$, we can invert the AR operator (assuming stationarity):

$$X_t = \phi^{-1}(B)\theta(B)\epsilon_t = \Psi(B)\epsilon_t = \sum_{j=0}^{\infty} \psi_j \epsilon_{t-j}$$

where $\Psi(B) = \sum_{j=0}^{\infty} \psi_j B^j$ is the Wold coefficient polynomial.

Recursive Formula for Wold Coefficients

The coefficients $\psi_j$ can be computed recursively:

$$\psi_0 = 1$$
$$\psi_j = \sum_{k=1}^{\min(j,p)} \phi_k \psi_{j-k} + \theta_j, \quad j \geq 1$$

where $\theta_j = 0$ for $j > q$. This formula is crucial for practical computation.

Programming Implementation

Example: computing the first 5 Wold coefficients for ARMA(2,1) with $\phi_1 = 0.5$, $\phi_2 = -0.3$, $\theta_1 = 0.7$:

$$\psi_0 = 1$$
$$\psi_1 = 0.5 \cdot 1 + 0.7 = 1.2$$
$$\psi_2 = 0.5 \cdot 1.2 - 0.3 \cdot 1 = 0.3$$
$$\psi_3 = 0.5 \cdot 0.3 - 0.3 \cdot 1.2 = -0.21$$
$$\psi_4 = 0.5 \cdot (-0.21) - 0.3 \cdot 0.3 = -0.195$$
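The recursion translates directly into code; the sketch below reproduces the hand computation above (statsmodels' arma2ma provides an equivalent calculation):

```python
import numpy as np

def psi_weights(phi, theta, nweights):
    """Wold coefficients via psi_0 = 1, psi_j = sum_k phi_k * psi_{j-k} + theta_j."""
    psi = np.zeros(nweights)
    psi[0] = 1.0
    for j in range(1, nweights):
        ar_part = sum(phi[k - 1] * psi[j - k] for k in range(1, min(j, len(phi)) + 1))
        ma_part = theta[j - 1] if j <= len(theta) else 0.0
        psi[j] = ar_part + ma_part
    return psi

print(psi_weights(phi=[0.5, -0.3], theta=[0.7], nweights=5))   # [1.0, 1.2, 0.3, -0.21, -0.195]
```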

Extended Yule-Walker Equations

Autocovariance function structure for ARMA processes

The General ARMA Autocovariance Function

Core Equation

For an ARMA(p,q) process, the autocovariance function satisfies:

$$\gamma_k = \sum_{j=1}^p \phi_j \gamma_{k-j} + \sigma^2 \sum_{j=0}^q \theta_j \psi_{j-k}, \quad k \geq 0$$

where $\psi_j$ are the Wold coefficients (with $\psi_m = 0$ for $m < 0$), $\theta_0 = 1$, and $\gamma_{-j} = \gamma_j$.

Special Case: k > q

When $k > q$, we have $\psi_{j-k} = 0$ for all $j \leq q$, so the equation simplifies to:

$$\gamma_k = \sum_{j=1}^p \phi_j \gamma_{k-j}$$

This is identical to the pure AR(p) Yule-Walker equation! The MA part only affects lags up to q.

Matrix Form for Parameter Estimation

For lags $k = q+1, \ldots, q+p$, we can write:

$$\Gamma_{p,q} \, \boldsymbol{\phi} = \boldsymbol{\gamma}$$

where $\Gamma_{p,q} = (\gamma_{q+i-j})_{i,j=1,\ldots,p}$ and $\boldsymbol{\gamma} = (\gamma_{q+1}, \ldots, \gamma_{q+p})^T$.

Two-Stage Parameter Identification Process
1. Solve for AR Parameters

Use the Y-W equations with sample autocovariances to estimate φ₁, ..., φₚ.

2. Construct Auxiliary Series

Compute Yₜ = φ(B)Xₜ, which should behave like an MA(q) process.

3. Estimate MA Parameters

Apply MA estimation methods to Yₜ to get θ₁, ..., θ_q.

4. Refine & Validate

Use MLE for final parameter refinement and check residuals.

Spectral Analysis of ARMA Models

Frequency domain representation and rational spectral density

The Rational Spectral Density Function

Definition

The spectral density of an ARMA(p,q) process has an elegant rational form:

$$f(\lambda) = \frac{\sigma^2}{2\pi} \left|\frac{\theta(e^{i\lambda})}{\phi(e^{i\lambda})}\right|^2$$

where $\lambda \in [-\pi, \pi]$ is the frequency. This ratio of polynomials evaluated on the unit circle characterizes the process completely in the frequency domain.
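The rational form can be evaluated directly on a frequency grid; a minimal numpy sketch with illustrative ARMA(1,1) parameters:

```python
import numpy as np

def arma_spectral_density(phi, theta, sigma2=1.0, n_freq=512):
    """f(lambda) = sigma^2/(2*pi) * |theta(e^{i*lambda})|^2 / |phi(e^{i*lambda})|^2."""
    lam = np.linspace(-np.pi, np.pi, n_freq)
    z = np.exp(1j * lam)
    phi_val = 1 - sum(p * z ** (k + 1) for k, p in enumerate(phi))      # phi evaluated on the unit circle
    theta_val = 1 + sum(t * z ** (k + 1) for k, t in enumerate(theta))  # theta evaluated on the unit circle
    return lam, sigma2 / (2 * np.pi) * np.abs(theta_val / phi_val) ** 2

lam, f = arma_spectral_density(phi=[0.6], theta=[0.3])
print(lam[np.argmax(f)])   # for phi > 0 the power concentrates near frequency 0 (a low-frequency peak)
```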

AR Component Influence
  • Denominator poles: The roots of $\phi(z)=0$ create spectral peaks.
  • Near-unit-circle roots: Produce sharp resonances at specific frequencies.
  • Interpretation: AR components amplify certain frequencies, creating oscillatory behavior.

MA Component Influence

  • Numerator zeros: The roots of $\theta(z)=0$ create spectral troughs.
  • Near-unit-circle roots: Produce sharp notches that filter out frequencies.
  • Interpretation: MA components attenuate certain frequencies, smoothing the series.

Comprehensive Case Studies

Real-world ARMA model applications with complete workflows

Case Study 1: ARMA(4,2) Spectral Characteristics

Model Specification

Consider an ARMA(4,2) process with the following parameters:

AR Parameters:

$$\phi_1 = -0.9, \quad \phi_2 = -1.4, \quad \phi_3 = -0.7, \quad \phi_4 = -0.6$$

MA Parameters:

$$\theta_1 = 0.5, \quad \theta_2 = -0.4$$
Root Analysis

AR Polynomial Roots:

  • |z₁| = 1.23 ✓
  • |z₂| = 1.18 ✓
  • |z₃| = 1.45 ✓
  • |z₄| = 1.67 ✓

Stationary ✓

MA Root Check

MA Polynomial Roots:

  • |z₁| = 2.25 ✓
  • |z₂| = 1.11 ✓

Invertible ✓

Spectral Peaks

Dominant Frequencies:

  • λ = 1.5 (strong)
  • λ = 2.2 (moderate)

Indicates cyclic behavior at these frequencies.
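Root conditions like the ones tabulated above are easy to verify numerically; a sketch computing the root moduli of $\phi(z)$ and $\theta(z)$ for this case study's parameters (stationarity and invertibility hold when every modulus exceeds 1):

```python
import numpy as np

phi = [-0.9, -1.4, -0.7, -0.6]      # phi(z)   = 1 + 0.9z + 1.4z^2 + 0.7z^3 + 0.6z^4
theta = [0.5, -0.4]                 # theta(z) = 1 + 0.5z - 0.4z^2

ar_poly = [-p for p in reversed(phi)] + [1.0]   # highest-degree coefficient first for np.roots
ma_poly = list(reversed(theta)) + [1.0]

print("AR root moduli:", np.abs(np.roots(ar_poly)))   # all > 1 -> stationary
print("MA root moduli:", np.abs(np.roots(ma_poly)))   # all > 1 -> invertible
```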

Case Study 2: Model Reconstruction from Sample Autocovariances

Initial Data

Suppose we have estimated the following sample autocovariances from 500 observations:

$$(\hat{\gamma}_0, \hat{\gamma}_1, \hat{\gamma}_2, \hat{\gamma}_3, \hat{\gamma}_4) = (4.61, -1.06, 0.29, 0.69, -0.12)$$
Step 1: Solve for AR Parameters

Using the matrix equation $\Gamma_{2,2} \boldsymbol{\phi} = \boldsymbol{\gamma}$:

$$\begin{pmatrix} 0.29 & -1.06 \\ 0.69 & 0.29 \end{pmatrix} \begin{pmatrix} \phi_1 \\ \phi_2 \end{pmatrix} = \begin{pmatrix} 0.69 \\ -0.12 \end{pmatrix}$$

Solution: $\hat{\phi}_1 = 0.0894$, $\hat{\phi}_2 = -0.6265$
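The 2×2 system can be solved directly with numpy, reproducing these estimates:

```python
import numpy as np

Gamma = np.array([[0.29, -1.06],     # entries gamma_{q+i-j} with q = 2
                  [0.69,  0.29]])
rhs = np.array([0.69, -0.12])        # (gamma_3, gamma_4)

phi_hat = np.linalg.solve(Gamma, rhs)
print(phi_hat)                       # approximately [ 0.0894, -0.6265]
```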

Step 2: Iterative MA Parameter Estimation

Construct $Y_t = X_t - 0.0894X_{t-1} + 0.6265X_{t-2}$ and estimate its MA structure:

Iteration k    θ₁         θ₂        σ²
6              -0.3171    0.7375    4.4379
12             -0.3281    0.7986    4.0981
20             -0.3333    0.8122    4.0299
51             -0.3334    0.8158    4.0119

Convergence is achieved after 51 iterations. Final model: $X_t = 0.0894X_{t-1} - 0.6265X_{t-2} + \epsilon_t - 0.3334\epsilon_{t-1} + 0.8158\epsilon_{t-2}$

ARIMA Extension & Best Practices

From ARMA to ARIMA: Handling non-stationary series

ARIMA(p,d,q): The Integrated ARMA Model

Definition

When the original series $\{X_t\}$ is non-stationary, we apply differencing $d$ times to achieve stationarity:

$$\phi(B)(1-B)^d X_t = \theta(B)\epsilon_t$$

The differenced series $Y_t = (1-B)^d X_t$ follows an ARMA(p,q) model. This is called an ARIMA(p,d,q) model.

Common Differencing Orders
  • d=1: Removes linear trend
  • d=2: Removes quadratic trend
  • d≥3: Rarely needed in practice

Overdifferencing can introduce spurious ACF structure.

Unified Framework

ARIMA unifies many classical models:

  • ARIMA(0,1,0) = random walk
  • ARIMA(0,1,1) = simple exponential smoothing
  • ARIMA(0,2,2) = Holt's linear method
Forecasting Adjustment

After forecasting $Y_t$, we must "un-difference" (integrate) to recover forecasts for $X_t$. Inverting $(1-B)^d$ gives the recursion

$$\hat{X}_{n+h|n} = \hat{Y}_{n+h|n} + \sum_{i=1}^d (-1)^{i+1} \binom{d}{i} \hat{X}_{n+h-i|n}$$

where $\hat{X}_{n+k|n} = X_{n+k}$ for $k \leq 0$. For $d=1$ this is simply $\hat{X}_{n+h|n} = \hat{Y}_{n+h|n} + \hat{X}_{n+h-1|n}$.
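In code the integration step is a short recursion; a minimal sketch for general $d$, where the forecasts of the differenced series and the last $d$ observed values are assumed to be given:

```python
from math import comb

def undifference(y_forecasts, x_last, d):
    """Recover forecasts of X from forecasts of Y = (1-B)^d X, given the last d observed X values."""
    history = list(x_last[-d:])                 # most recent observations, oldest first
    out = []
    for y_hat in y_forecasts:
        x_hat = y_hat + sum((-1) ** (i + 1) * comb(d, i) * history[-i] for i in range(1, d + 1))
        out.append(x_hat)
        history.append(x_hat)                   # forecasts feed back into the recursion
    return out

print(undifference([0.2, 0.1, 0.05], x_last=[10.0], d=1))   # [10.2, 10.3, 10.35]
```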
Complete ARMA/ARIMA Modeling Workflow
Model Identification
  1. Stationarity Tests: ADF test, KPSS test. If non-stationary, difference until stationary.
  2. ACF/PACF Analysis: ACF cuts off at q → MA(q); PACF cuts off at p → AR(p); both tail off → ARMA(p,q).
  3. Model Selection: Fit multiple candidates and compare using AIC, BIC.

Estimation & Validation

  4. Parameter Estimation: Method of Moments (Yule-Walker), MLE, or Conditional Least Squares.
  5. Diagnostics: Ljung-Box test on residuals, check for normality, plot the residual ACF.
  6. Validation: Out-of-sample forecast evaluation, rolling-window cross-validation.
⚡ Pro Tips
  • Start simple: Try AR(1), MA(1), ARMA(1,1) before higher orders.
  • Enforce constraints: Always check stationarity and invertibility of estimated parameters.
  • Avoid overfitting: Prefer lower-order models with similar AIC/BIC.
  • Transform if needed: Log-transform to stabilize variance, seasonal differencing for seasonality.
Special Case: IMA(1,1) Model Analysis

Model Definition & Expansion

The IMA(1,1) model is widely used in business and economics:

$$X_t = X_{t-1} + \epsilon_t - \theta\epsilon_{t-1}$$

Assuming observations start at time $-m$ (before which all values are 0), we can recursively expand:

$$X_t = \epsilon_t + (1-\theta)\epsilon_{t-1} + \cdots + (1-\theta)\epsilon_{-m} - \theta\epsilon_{-m-1}$$
Variance Structure

Computing the variance of $X_t$:

$$\text{Var}(X_t) = [1 + \theta^2 + (1-\theta)^2(t+m)]\sigma^2$$

As $t \to \infty$, $\text{Var}(X_t) \to \infty$, confirming non-stationarity.

Correlation Properties

For large $m$ and moderate $k$:

$$\text{Corr}(X_t, X_{t+k}) \approx \sqrt{\frac{t+m}{t+m+k}}$$

High positive correlations across multiple lags reflect persistent trends common in economic data.
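A quick Monte Carlo check of the variance formula (parameter values are purely illustrative):

```python
import numpy as np

theta, sigma, m, t, n_paths = 0.4, 1.0, 50, 30, 100_000
rng = np.random.default_rng(0)

# innovations eps_{-m-1}, ..., eps_t for each path; X is taken to be 0 before time -m
eps = rng.normal(0.0, sigma, size=(n_paths, m + t + 2))
x = np.zeros(n_paths)
for j in range(1, m + t + 2):                   # times -m, -m+1, ..., t
    x = x + eps[:, j] - theta * eps[:, j - 1]   # X_s = X_{s-1} + eps_s - theta*eps_{s-1}

print("simulated Var(X_t):", round(x.var(), 2))
print("formula:           ", (1 + theta**2 + (1 - theta)**2 * (t + m)) * sigma**2)
```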

Seasonal ARIMA Models

Modeling periodic patterns in time series data

SARIMA(p,d,q)×(P,D,Q)ₛ Framework

Model Definition

A Seasonal ARIMA model combines non-seasonal and seasonal components:

$$\phi_p(B)\Phi_P(B^s)(1-B)^d(1-B^s)^D X_t = \theta_q(B)\Theta_Q(B^s)\epsilon_t$$

Non-seasonal components:

  • p: AR order
  • d: differencing order
  • q: MA order

Seasonal components:

  • P: Seasonal AR order
  • D: Seasonal differencing order
  • Q: Seasonal MA order
  • s: Seasonal period
Seasonal MA Example

For monthly data (s=12), a simple seasonal MA(1)₁₂ model:

$$X_t = \epsilon_t + \Theta \epsilon_{t-12}$$

ACF is non-zero only at lag 12, capturing annual cyclicality.

Multiplicative Structure

For the multiplicative model $X_t = (1-\theta B)(1-\Theta B^{12})\epsilon_t$, expanding the product gives:

$$X_t = \epsilon_t - \theta\epsilon_{t-1} - \Theta\epsilon_{t-12} + \theta\Theta\epsilon_{t-13}$$

Creates dependencies at lags 1, 12, and 13.

Practical Seasonal Modeling Workflow
1. Identify Seasonality

Plot the time series and its ACF. Look for repeating patterns at regular intervals (e.g., every 12 months).

2. Apply Seasonal Differencing

Compute $\nabla_s X_t = (1-B^s)X_t$ to remove the seasonal trend. Check whether one round of differencing is enough.

3. Apply Non-seasonal Differencing

If a trend remains after seasonal differencing, apply $\nabla^d$. Typically d = 0 or 1.

4. Identify P and Q

Examine the ACF/PACF at seasonal lags (s, 2s, 3s, ...). Spikes suggest P or Q orders.

5. Identify p and q

Examine the ACF/PACF at non-seasonal lags (1, 2, 3, ...) for the non-seasonal components.

6. Estimate & Diagnose

Fit the model using MLE and check the residuals: they should be white noise with no remaining seasonal pattern (see the sketch below).
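With statsmodels the whole seasonal workflow reduces to specifying the two order tuples; a sketch for monthly data, where the series y and the chosen orders are illustrative assumptions:

```python
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox

# y: a monthly series (1-D array or pandas Series); orders chosen via the steps above
model = SARIMAX(y, order=(1, 0, 1), seasonal_order=(0, 1, 1, 12))
res = model.fit(disp=False)

print(res.summary())
print(acorr_ljungbox(res.resid, lags=[12, 24], return_df=True))   # residuals should resemble white noise
```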

Frequently Asked Questions

When should I use ARMA instead of pure AR or pure MA?

Use ARMA when both ACF and PACF tail off without a clear cut-off point. The principle of parsimony suggests that if you can achieve the same fit with fewer parameters using ARMA(p,q) instead of a high-order AR or MA model, you should prefer ARMA. For example, ARMA(1,1) can often mimic the behavior of AR(∞) or MA(∞) with just 2 parameters.

What is parameter redundancy and why is it a problem?

Parameter redundancy occurs when the AR and MA polynomials share common roots. This makes the model non-identifiable - multiple parameter sets produce identical time series. For example, if φ(B) and θ(B) both contain the factor (1 - 0.5B) (a common root at z = 2), we can cancel it and obtain a simpler model with identical properties. Always check for common roots and prefer the lowest-order model that fits adequately.

How do I choose p and q for an ARMA model?

Start with examining the ACF and PACF plots. If both tail off, consider ARMA. Use information criteria like AIC or BIC to compare models with different (p,q) orders. The Extended ACF (EACF) can also help identify the orders. Finally, check residual diagnostics - a good model should leave white noise residuals.

Can ARMA models handle non-stationary data?

No. ARMA models require stationary data. If your data has trends or changing variance, you need to either (1) difference the data first (leading to ARIMA models), (2) detrend using regression, or (3) apply transformations like log or Box-Cox. Only after achieving stationarity can you fit an ARMA model.

Why is ARMA forecasting more complex than AR?

Because ARMA forecasts depend on both past values AND past errors (innovations). Past errors are unobserved, so we must estimate them recursively using the Innovations Algorithm. This makes the forecast computation iterative, unlike AR models where we can directly use past observations. However, modern software handles this complexity automatically.

Chapter Summary

Core Concepts

  • ARMA(p,q): Combines AR and MA for parsimonious modeling.
  • Both ACF & PACF tail off: Distinct from pure AR or MA.
  • Parameter Redundancy: Avoid common roots in AR and MA polynomials.

Practical Skills

  • Identification: Use EACF for initial guidance.
  • Selection: Compare models using AIC/BIC.
  • Estimation: MLE with numerical optimization.