Discover the power of combining autoregressive and moving average components. Master parameter parsimony, model identification, and optimal forecasting for complex time series.
Define ARMA(p,q) processes and understand why they're parsimonious
Derive stationarity and invertibility conditions for ARMA models
Analyze ARMA(1,1) in detail including ACF and parameter redundancy
Identify ARMA models using EACF, AIC, and BIC criteria
Estimate parameters using Maximum Likelihood Estimation
Forecast with ARMA models using recursive algorithms
Apply ARMA models to real-world economic data
Understanding the mixed autoregressive-moving average structure
An ARMA(p,q) process is a stationary time series {Xₜ} that satisfies:
Xₜ = φ₁Xₜ₋₁ + ... + φₚXₜ₋ₚ + Zₜ + θ₁Zₜ₋₁ + ... + θ_qZₜ₋q,  where {Zₜ} ~ WN(0, σ²).
Using the backshift operator B (defined by BXₜ = Xₜ₋₁), we can write this compactly as:
φ(B)Xₜ = θ(B)Zₜ
where φ(B) = 1 − φ₁B − ... − φₚBᵖ and θ(B) = 1 + θ₁B + ... + θ_qB^q.
A key advantage of ARMA models is parameter parsimony. Many processes that would require high-order AR or MA models can be represented with far fewer parameters using ARMA.
Example:
ARMA(1,1) uses 2 parameters but can mimic an MA(∞) or AR(∞) process. This reduces overfitting risk and improves forecast accuracy.
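To see the parsimony claim in action, here is a minimal Python sketch (statsmodels; the simulated parameter values are illustrative assumptions, not taken from this section) that fits both an ARMA(1,1) and a longer AR(6) to data generated from an ARMA(1,1) process and compares their BIC:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess
from statsmodels.tsa.arima.model import ARIMA

# Assumed illustrative ARMA(1,1): X_t = 0.7 X_{t-1} + Z_t + 0.4 Z_{t-1}
np.random.seed(0)
arma11 = ArmaProcess(ar=[1, -0.7], ma=[1, 0.4])   # lag-polynomial coefficients
x = arma11.generate_sample(nsample=500)

# Parsimonious ARMA(1,1) versus a long AR(6) approximation
fit_arma = ARIMA(x, order=(1, 0, 1)).fit()
fit_ar6 = ARIMA(x, order=(6, 0, 0)).fit()

print(f"ARMA(1,1): 2 ARMA parameters, BIC = {fit_arma.bic:.1f}")
print(f"AR(6):     6 AR parameters,  BIC = {fit_ar6.bic:.1f}")
```

The two fits are typically comparable in likelihood, but BIC penalizes the extra AR parameters, favoring the mixed model.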
For an ARMA(p,q) process:
- Stationarity: all roots of φ(z) = 0 lie outside the unit circle (|z| > 1).
- Invertibility: all roots of θ(z) = 0 lie outside the unit circle (|z| > 1).
Both conditions are required for a well-posed ARMA model.
If φ(B) and θ(B) have a common factor, the model is over-parameterized. For example, if both polynomials have a root at the same point B = r, we can cancel the common factor (1 − B/r) from both sides of φ(B)Xₜ = θ(B)Zₜ.
This produces a simpler ARMA model with the same statistical properties. Always check for common (or nearly common) roots, and enforce stationarity and invertibility constraints, to avoid this issue.
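In practice, a quick redundancy check is to compare the roots of the two lag polynomials. The sketch below (plain numpy; the coefficients are made up for illustration) factors φ(B) = (1 − 0.5B)(1 − 0.7B) and θ(B) = 1 − 0.5B and flags the shared root at B = 2:

```python
import numpy as np

# Assumed illustrative coefficients: phi(B) = 1 - 1.2B + 0.35B^2, theta(B) = 1 - 0.5B
phi_poly = [1.0, -1.2, 0.35]    # coefficients of 1, B, B^2
theta_poly = [1.0, -0.5]

# np.roots expects the highest-degree coefficient first, so reverse the lag order
ar_roots = np.roots(phi_poly[::-1])
ma_roots = np.roots(theta_poly[::-1])

print("AR roots:", ar_roots)    # 2.0 and ~1.43
print("MA roots:", ma_roots)    # 2.0 -> shared root means a redundant common factor

common = [r for r in ar_roots if np.any(np.isclose(r, ma_roots, atol=1e-6))]
print("Common roots (parameter redundancy):", common)
```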
Unlike AR (PACF cuts off) or MA (ACF cuts off), ARMA models have a characteristic where both ACF and PACF tail off.
Key Signatures:
- ACF: tails off (exponential decay and/or damped sine waves governed by the AR roots), with no sharp cut-off.
- PACF: also tails off rather than cutting off at a fixed lag.
This makes visual identification more challenging than for pure AR or MA models.
Any stationary and invertible ARMA(p,q) process can be written as an MA(∞) process:
Xₜ = Σⱼ₌₀^∞ ψⱼZₜ₋ⱼ = ψ(B)Zₜ
where ψ(B) = θ(B)/φ(B) and ψ₀ = 1. The coefficients ψⱼ are called the impulse response function.
These weights show how a shock at time t affects future values.
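statsmodels can expand an ARMA model into these ψ-weights directly; in the sketch below the ARMA(1,1) parameter values are assumptions chosen only for illustration:

```python
from statsmodels.tsa.arima_process import arma2ma

# Assumed illustrative ARMA(1,1): X_t = 0.7 X_{t-1} + Z_t + 0.4 Z_{t-1}
ar = [1, -0.7]   # phi(B) = 1 - 0.7B, entered as lag-polynomial coefficients
ma = [1, 0.4]    # theta(B) = 1 + 0.4B

psi = arma2ma(ar, ma, lags=10)   # psi_0, ..., psi_9
for j, w in enumerate(psi):
    print(f"psi_{j} = {w:.4f}")  # here psi_j = (0.7 + 0.4) * 0.7**(j-1) for j >= 1
```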
Detailed analysis of the simplest mixed model
The ARMA(1,1) model is
Xₜ = φXₜ₋₁ + Zₜ + θZₜ₋₁,  i.e.  (1 − φB)Xₜ = (1 + θB)Zₜ,
where |φ| < 1 (stationarity) and |θ| < 1 (invertibility).
Multiply both sides by Xₜ₋ₖ and take expectations:
For k = 0:  γ(0) = φγ(1) + σ²[1 + θ(φ + θ)]
For k = 1:  γ(1) = φγ(0) + θσ²
For k ≥ 2:  γ(k) = φγ(k−1)
Solving the Yule-Walker equations:
γ(0) = σ²(1 + 2φθ + θ²)/(1 − φ²),  ρ(1) = (φ + θ)(1 + φθ)/(1 + 2φθ + θ²),  ρ(k) = φᵏ⁻¹ρ(1) for k ≥ 2.
Note: the ACF decays exponentially after lag 1, with the initial value ρ(1) depending on both φ and θ.
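The closed-form ACF can be checked numerically against statsmodels' theoretical ACF; the values φ = 0.7 and θ = 0.4 below are assumed for illustration:

```python
import numpy as np
from statsmodels.tsa.arima_process import ArmaProcess

phi, theta = 0.7, 0.4                      # assumed illustrative parameters
proc = ArmaProcess(ar=[1, -phi], ma=[1, theta])

acf_num = proc.acf(lags=6)                 # theoretical rho(0), ..., rho(5)

# Closed-form ACF derived above
rho1 = (phi + theta) * (1 + phi * theta) / (1 + 2 * phi * theta + theta**2)
acf_closed = np.array([1.0] + [rho1 * phi**(k - 1) for k in range(1, 6)])

print(np.allclose(acf_num, acf_closed))    # True: the two agree
```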
Consider an ARMA(1,1) model in which the MA parameter exactly cancels the AR parameter, θ = −φ:
(1 − φB)Xₜ = (1 − φB)Zₜ
This is problematic because both polynomials have a root at the same point, B = 1/φ. Cancelling the common factor (1 − φB), the model simplifies to:
Xₜ = Zₜ
So the ARMA(1,1) reduces to white noise! This demonstrates why we must check for common roots in addition to enforcing the stationarity (|φ| < 1) and invertibility (|θ| < 1) constraints.
The EACF is a powerful tool for identifying ARMA(p,q) orders. It's based on the idea that after removing the AR effects, the residuals should exhibit MA(q) behavior.
Procedure:
1. For each candidate AR order, fit an (iteratively refined) autoregression and compute the residuals.
2. Compute the sample ACF of those residuals.
3. Build a table whose rows index the AR order p and whose columns index the MA order q, marking each cell 'X' if the corresponding residual autocorrelation is significant and 'O' if it is not.
Pattern: The first 'O' in each row indicates a potential q, and the row number indicates p.
Constructing optimal forecasts using the recursive algorithm
For an ARMA(p,q) model, the h-step ahead forecast (forecast made at time n for time n+h) is computed recursively as:
X̂ₙ(h) = φ₁X̂ₙ(h−1) + ... + φₚX̂ₙ(h−p) + Σⱼ₌ₕ^q θⱼẐₙ₊ₕ₋ⱼ
with the convention that:
- X̂ₙ(k) = Xₙ₊ₖ for k ≤ 0 (known past values are used as they are);
- Ẑₜ is the observed residual for t ≤ n, and Ẑₜ = 0 for t > n (future innovations are predicted by their mean, zero), so the MA sum is empty when h > q.
The AR part depends on past and previously predicted values. For h = 1 we use the last p observed values directly; for 1 < h ≤ p a mix of observed and already-forecast values enters; and for h > p the AR terms are all forecasts, used recursively.
Example (AR(1) Component): X̂ₙ(h) = φX̂ₙ(h−1) = φʰXₙ, which decays geometrically toward the mean.
The MA part uses past residuals (innovations). These are only available for lags up to n. Beyond the MA order q, the contribution vanishes.
Key Insight:
For h > q, the MA component makes no direct contribution (every innovation Zₙ₊ₕ₋ⱼ with j ≤ q lies in the future and is forecast as zero), and forecasts depend only on the AR structure.
As h → ∞, the ARMA forecast converges to the unconditional mean of the process (usually assumed to be zero for centered data): X̂ₙ(h) → μ = 0.
This convergence is exponential at a rate determined by the AR roots. The closer the roots are to the unit circle, the slower the convergence.
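Here is a minimal sketch of this recursion for a centered ARMA(1,1), assuming illustrative values for φ, θ, the last observation, and the last residual (none of these numbers come from the text):

```python
# Recursive h-step forecasts for a centered ARMA(1,1):
#   Xhat_n(h) = phi * Xhat_n(h-1) + theta * Zhat_{n+h-1}
# where Zhat_t = 0 for t > n and Zhat_n is the last in-sample residual.
phi, theta = 0.7, 0.4        # assumed parameters
x_n, z_n = 2.0, 0.5          # assumed last observation and last residual

forecasts = []
prev = x_n                   # Xhat_n(0) = X_n
for h in range(1, 11):
    ma_part = theta * z_n if h == 1 else 0.0   # MA term vanishes for h > q = 1
    prev = phi * prev + ma_part
    forecasts.append(prev)

print(forecasts)             # decays geometrically toward the mean 0
```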
Applying ARMA to quarterly economic data
After removing trend and seasonal components from quarterly GDP data, we are left with the cyclical component. This represents short-term fluctuations around the trend. Economic theory suggests that shocks have both immediate (MA) and persistent (AR) effects.
Plot ACF and PACF of the cyclical component.
Conclusion: Both the ACF and PACF tail off without a clear cut-off, so consider ARMA models.
Fit ARMA(p,q) over a small grid of candidate orders and compute AIC/BIC for each.
Best Model (BIC):
ARMA(2,1) with BIC = -142.3
ARMA(2,1) balances fit and parsimony.
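A sketch of this kind of order search with statsmodels; since the GDP series itself is not reproduced here, a simulated stand-in series is used and the candidate grid is an assumption:

```python
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

# Stand-in for the detrended, deseasonalized cyclical component
# (simulated here, since the actual GDP series is not included in the text)
np.random.seed(1)
cycle = ArmaProcess(ar=[1, -1.1, 0.4], ma=[1, 0.3]).generate_sample(nsample=200)

results = {}
for p, q in itertools.product(range(4), range(4)):
    try:
        fit = ARIMA(cycle, order=(p, 0, q)).fit()
        results[(p, q)] = (fit.aic, fit.bic)
    except Exception:
        continue   # skip orders that fail to converge

best = min(results, key=lambda pq: results[pq][1])
print("Best (p, q) by BIC:", best)
```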
Using MLE, we estimate the ARMA(2,1) parameters.
All roots outside unit circle → stationary & invertible ✓
Check the residuals for white noise: inspect the residual ACF and apply the Ljung-Box test.
Model fits well!
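For instance, the Ljung-Box check might look like the following sketch (again with a simulated stand-in series, since the actual data is not included):

```python
import numpy as np
from statsmodels.stats.diagnostic import acorr_ljungbox
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.arima_process import ArmaProcess

# Simulated stand-in series and an ARMA(2,1) fit
np.random.seed(2)
x = ArmaProcess(ar=[1, -1.1, 0.4], ma=[1, 0.3]).generate_sample(nsample=200)
fit = ARIMA(x, order=(2, 0, 1)).fit()

# Ljung-Box test on the residuals at several lags;
# large p-values are consistent with white-noise residuals
lb = acorr_ljungbox(fit.resid, lags=[4, 8, 12], return_df=True)
print(lb[["lb_stat", "lb_pvalue"]])
```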
Chapter 3 provides detailed coverage of ARMA models, including spectral analysis and advanced estimation techniques using state-space methods.
Freely available online textbook with excellent practical guidance on ARMA model selection and forecasting using R's forecast package.
The classic reference that introduced the Box-Jenkins methodology. Essential for understanding the philosophical foundation of ARMA modeling.
An accessible introduction with a focus on practical problem-solving. Strong coverage of model diagnostics and residual analysis.
The infinite moving average representation and coefficient computation
Starting from the ARMA equation φ(B)Xₜ = θ(B)Zₜ, we can invert the AR operator (assuming stationarity):
Xₜ = [θ(B)/φ(B)]Zₜ = ψ(B)Zₜ
where ψ(B) = Σⱼ₌₀^∞ ψⱼBʲ is the Wold coefficient polynomial.
The coefficients ψⱼ can be computed recursively:
ψⱼ = θⱼ + φ₁ψⱼ₋₁ + φ₂ψⱼ₋₂ + ... + φₚψⱼ₋ₚ,  j = 0, 1, 2, ...
where θ₀ = 1, θⱼ = 0 for j > q, and ψⱼ = 0 for j < 0. This formula is crucial for practical computation.
Example: for an ARMA(2,1) model the first 5 Wold coefficients are ψ₀ = 1, ψ₁ = θ₁ + φ₁, ψ₂ = φ₁ψ₁ + φ₂, ψ₃ = φ₁ψ₂ + φ₂ψ₁, ψ₄ = φ₁ψ₃ + φ₂ψ₂.
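The recursion is straightforward to code; this sketch computes ψ-weights for an ARMA(2,1) with assumed illustrative parameters and cross-checks them against statsmodels' arma2ma:

```python
import numpy as np
from statsmodels.tsa.arima_process import arma2ma

def wold_coefficients(phi, theta, n):
    """Compute psi_0, ..., psi_{n-1} via psi_j = theta_j + sum_i phi_i * psi_{j-i}."""
    theta_full = [1.0] + list(theta)               # theta_0 = 1
    psi = np.zeros(n)
    for j in range(n):
        th = theta_full[j] if j < len(theta_full) else 0.0
        ar_part = sum(phi[i] * psi[j - 1 - i] for i in range(len(phi)) if j - 1 - i >= 0)
        psi[j] = th + ar_part
    return psi

phi, theta = [1.2, -0.5], [0.4]                    # assumed ARMA(2,1) parameters
psi = wold_coefficients(phi, theta, 5)
print(psi)
print(np.allclose(psi, arma2ma([1, -1.2, 0.5], [1, 0.4], lags=5)))   # True
```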
Autocovariance function structure for ARMA processes
For an ARMA(p,q) process, the autocovariance function satisfies:
γ(k) − φ₁γ(k−1) − ... − φₚγ(k−p) = σ² Σⱼ₌ₖ^q θⱼψⱼ₋ₖ,  0 ≤ k ≤ q,
where ψⱼ are the Wold coefficients and θ₀ = 1.
When k > q, the sum on the right-hand side is empty (all θⱼ = 0 for j > q), so the equation simplifies to:
γ(k) = φ₁γ(k−1) + ... + φₚγ(k−p),  k > q.
This is identical to the pure AR(p) Yule-Walker equation! The MA part only affects lags up to q.
For lags k = q+1, ..., q+p, we can write these equations in matrix form:
Γφ = γ
where Γ is the p×p matrix with entries Γᵢⱼ = γ(q + i − j), φ = (φ₁, ..., φₚ)ᵀ, and γ = (γ(q+1), ..., γ(q+p))ᵀ.
Use the extended Yule-Walker equations with sample autocovariances to estimate φ₁, ..., φₚ (a numerical sketch of this step follows the list below).
Compute Yₜ = φ̂(B)Xₜ, which should behave approximately like an MA(q) process.
Apply MA estimation methods to Yₜ to get θ₁, ..., θ_q.
Use MLE for final parameter refinement and check residuals.
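A sketch of the first stage, solving Γφ = γ with numpy; the sample autocovariances and the orders p = 2, q = 1 below are assumptions for illustration only:

```python
import numpy as np

# Assumed illustrative sample autocovariances gamma_hat[k] for k = 0, 1, 2, 3
gamma_hat = np.array([5.0, 3.2, 1.9, 1.0])
p, q = 2, 1

# Build the p x p matrix Gamma with entries gamma(q + i - j) and the RHS vector
Gamma = np.array([[gamma_hat[abs(q + i - j)] for j in range(p)] for i in range(p)])
rhs = np.array([gamma_hat[q + 1 + i] for i in range(p)])

phi_hat = np.linalg.solve(Gamma, rhs)
print("Extended Yule-Walker estimates of (phi_1, phi_2):", phi_hat)
```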
Frequency domain representation and rational spectral density
The spectral density of an ARMA(p,q) process has an elegant rational form:
f(ω) = (σ²/2π) · |θ(e^(−iω))|² / |φ(e^(−iω))|²
where ω ∈ [−π, π] is the frequency. This ratio of polynomials evaluated on the unit circle characterizes the process completely in the frequency domain.
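This form is easy to evaluate numerically; the following numpy sketch (with assumed illustrative ARMA(2,1) parameters) computes f(ω) on a frequency grid and locates its peak:

```python
import numpy as np

def arma_spectral_density(phi, theta, sigma2, omegas):
    """f(omega) = sigma^2/(2*pi) * |theta(e^{-i w})|^2 / |phi(e^{-i w})|^2."""
    z = np.exp(-1j * omegas)
    # Evaluate the lag polynomials 1 - phi_1 z - ... and 1 + theta_1 z + ...
    phi_poly = 1 - sum(p * z**(k + 1) for k, p in enumerate(phi))
    theta_poly = 1 + sum(t * z**(k + 1) for k, t in enumerate(theta))
    return sigma2 / (2 * np.pi) * np.abs(theta_poly)**2 / np.abs(phi_poly)**2

omegas = np.linspace(0, np.pi, 200)
f = arma_spectral_density(phi=[1.2, -0.5], theta=[0.4], sigma2=1.0, omegas=omegas)
print("Peak frequency:", omegas[np.argmax(f)])   # AR roots near the unit circle create peaks
```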
Real-world ARMA model applications with complete workflows
Consider an ARMA(4,2) process with the following parameters:
AR Parameters:
MA Parameters:
AR Polynomial Roots:
Stationary ✓
MA Polynomial Roots:
Invertible ✓
Dominant Frequencies:
Indicates cyclic behavior at these frequencies.
Suppose we have estimated the following sample autocovariances from 500 observations:
Using the matrix equation Γφ = γ:
Solution:
Construct Yₜ = φ̂(B)Xₜ and estimate its MA structure (for example with the Innovations Algorithm):
| Iteration k | |||
|---|---|---|---|
| 6 | -0.3171 | 0.7375 | 4.4379 |
| 12 | -0.3281 | 0.7986 | 4.0981 |
| 20 | -0.3333 | 0.8122 | 4.0299 |
| 51 | -0.3334 | 0.8158 | 4.0119 |
Convergence is achieved after 51 iterations; the last row of the table gives the final parameter estimates.
From ARMA to ARIMA: Handling non-stationary series
When the original series is non-stationary, we apply differencing d times to achieve stationarity:
Yₜ = ∇ᵈXₜ = (1 − B)ᵈXₜ
The differenced series Yₜ follows an ARMA(p,q) model. The original series Xₜ is then said to follow an ARIMA(p,d,q) model.
Overdifferencing can introduce spurious ACF structure.
ARIMA unifies many classical models: ARIMA(p,0,q) is simply ARMA(p,q), ARIMA(0,1,0) is a random walk, and ARIMA(0,1,1) is the IMA(1,1) model underlying simple exponential smoothing.
After forecasting the differenced series Yₜ, we must "un-difference" (integrate) to get forecasts for Xₜ. For d = 1:
X̂ₙ(h) = Xₙ + Ŷₙ(1) + Ŷₙ(2) + ... + Ŷₙ(h)
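For d = 1 the integration step is just a cumulative sum added to the last observed level, as in this sketch (the forecast values and the last observation are made-up placeholders):

```python
import numpy as np

x_last = 103.4                                  # assumed last observed value X_n
y_forecasts = np.array([0.8, 0.5, 0.3, 0.2])    # assumed forecasts of the differenced series

# Un-difference (d = 1): Xhat_n(h) = X_n + sum of Yhat_n(1), ..., Yhat_n(h)
x_forecasts = x_last + np.cumsum(y_forecasts)
print(x_forecasts)    # [104.2, 104.7, 105.0, 105.2]
```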
The IMA(1,1) model, i.e. ARIMA(0,1,1), is widely used in business and economics:
Xₜ = Xₜ₋₁ + Zₜ + θZₜ₋₁
Assuming observations start at time −m (before which all values are 0), we can recursively expand:
Xₜ = Zₜ + (1 + θ)Zₜ₋₁ + (1 + θ)Zₜ₋₂ + ... + (1 + θ)Z₋ₘ
Computing the variance of Xₜ:
Var(Xₜ) = [1 + (1 + θ)²(t + m)]σ²
As t + m → ∞, Var(Xₜ) → ∞, confirming non-stationarity.
For large m and moderate lag k:
Corr(Xₜ, Xₜ₋ₖ) ≈ √((t + m − k)/(t + m)) ≈ 1
High positive correlations across multiple lags reflect persistent trends common in economic data.
Modeling periodic patterns in time series data
A Seasonal ARIMA model, SARIMA(p,d,q)×(P,D,Q)ₛ, combines non-seasonal and seasonal components:
φ(B)Φ(Bˢ)∇ᵈ∇ₛᴰXₜ = θ(B)Θ(Bˢ)Zₜ
Non-seasonal components: the polynomials φ(B) and θ(B) of orders p and q, and the differencing operator ∇ᵈ = (1−B)ᵈ.
Seasonal components: the polynomials Φ(Bˢ) and Θ(Bˢ) of orders P and Q, and the seasonal differencing operator ∇ₛᴰ = (1−Bˢ)ᴰ, where s is the seasonal period.
For monthly data (s=12), a simple seasonal MA(1)₁₂ model is Xₜ = (1−ΘB¹²)Zₜ = Zₜ − ΘZₜ₋₁₂.
ACF is non-zero only at lag 12, capturing annual cyclicality.
Expanding (1-θB)(1-ΘB¹²):
Xₜ = Zₜ − θZₜ₋₁ − ΘZₜ₋₁₂ + θΘZₜ₋₁₃
Creates dependencies at lags 1, 12, and 13.
Plot time series and ACF. Look for repeating patterns at regular intervals (e.g., every 12 months).
Compute ∇ₛXₜ = (1-Bˢ)Xₜ to remove the seasonal trend. Check if one differencing is enough.
If trend remains after seasonal differencing, apply ∇ᵈ. Typically d=0 or 1.
Examine ACF/PACF at seasonal lags (s, 2s, 3s, ...). Spikes suggest P or Q orders.
Examine ACF/PACF at non-seasonal lags (1, 2, 3, ...) for non-seasonal components.
Fit the model using MLE. Check residuals: they should be white noise with no remaining seasonal pattern (a fitting sketch follows below).
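A sketch of fitting such a model with statsmodels; the series and the particular orders below are placeholders, not values taken from this section:

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX
from statsmodels.stats.diagnostic import acorr_ljungbox

# Simulated stand-in for a monthly series with an annual pattern
# (both the data and the model orders below are illustrative assumptions)
np.random.seed(3)
t = np.arange(144)
y = 10 + 2 * np.sin(2 * np.pi * t / 12) + np.random.normal(scale=0.5, size=144)

model = SARIMAX(y, order=(1, 0, 1), seasonal_order=(0, 1, 1, 12))
res = model.fit(disp=False)

print(res.summary())
# Large Ljung-Box p-values indicate no remaining autocorrelation in the residuals
print(acorr_ljungbox(res.resid, lags=[12, 24], return_df=True))
```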
Use ARMA when both ACF and PACF tail off without a clear cut-off point. The principle of parsimony suggests that if you can achieve the same fit with fewer parameters using ARMA(p,q) instead of a high-order AR or MA model, you should prefer ARMA. For example, ARMA(1,1) can often mimic the behavior of AR(∞) or MA(∞) with just 2 parameters.
Parameter redundancy occurs when the AR and MA polynomials share common roots. This makes the model non-identifiable: multiple parameter sets produce the same process. For example, if φ(B) and θ(B) share a common root, we can cancel the corresponding factor and obtain a simpler model. Always check for common roots and enforce stationarity/invertibility to avoid this issue.
Start with examining the ACF and PACF plots. If both tail off, consider ARMA. Use information criteria like AIC or BIC to compare models with different (p,q) orders. The Extended ACF (EACF) can also help identify the orders. Finally, check residual diagnostics - a good model should leave white noise residuals.
No. ARMA models require stationary data. If your data has trends or changing variance, you need to either (1) difference the data first (leading to ARIMA models), (2) detrend using regression, or (3) apply transformations like log or Box-Cox. Only after achieving stationarity can you fit an ARMA model.
Because ARMA forecasts depend on both past values AND past errors (innovations). Past errors are unobserved, so we must estimate them recursively using the Innovations Algorithm. This makes the forecast computation iterative, unlike AR models where we can directly use past observations. However, modern software handles this complexity automatically.