Mathematical foundations of time series: stationarity, autocovariance structure, and spectral analysis
Rigorous mathematical foundations of stationary processes
A stochastic process {Xₜ : t ∈ ℤ} is weakly stationary if:
Conditions:
1. E[Xₜ] = μ for all t (constant mean)
2. E[Xₜ²] < ∞ for all t (finite second moments)
3. Cov(Xₜ, Xₜ₊ₖ) = γ(k) for all t and k (the autocovariance depends only on the lag k, not on t)
Weak stationarity ensures the process has constant statistical structure over time, making it amenable to forecasting and modeling.
For a stationary process with mean μ, the autocovariance function at lag k is:
γ(k) = Cov(Xₜ, Xₜ₊ₖ) = E[(Xₜ − μ)(Xₜ₊ₖ − μ)]
Properties:
1. γ(0) = Var(Xₜ) ≥ 0
2. γ(k) = γ(−k) (symmetry)
3. |γ(k)| ≤ γ(0) for all k
The autocovariance function captures the linear dependence structure between observations at different time points.
The autocorrelation function (ACF) is the normalized autocovariance:
ρ(k) = γ(k) / γ(0)
Properties:
1. ρ(0) = 1
2. |ρ(k)| ≤ 1 for all k
3. ρ(k) = ρ(−k)
The ACF provides a scale-free measure of temporal dependence, crucial for identifying model structure in ARMA processes.
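As a quick illustration, here is a minimal Python sketch of the standard sample-ACF estimator (NumPy is assumed; the helper name sample_acf and the white-noise example are purely illustrative):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat(k) = gamma_hat(k) / gamma_hat(0), where
    gamma_hat(k) = (1/n) * sum_t (x_t - xbar)(x_{t+k} - xbar)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.dot(xc, xc) / n                      # gamma_hat(0): sample variance
    return np.array([np.dot(xc[:n - k], xc[k:]) / (n * gamma0)
                     for k in range(max_lag + 1)])

# White noise should have sample ACF close to 0 for every lag k >= 1
rng = np.random.default_rng(0)
print(sample_acf(rng.standard_normal(1000), max_lag=5).round(3))
```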
A sequence {εₜ} is white noise WN(μ, σ²) if:
1. E[εₜ] = μ for all t
2. Var(εₜ) = σ² for all t
3. Cov(εₜ, εₛ) = 0 for all t ≠ s
Types:
1. Uncorrelated white noise: only the three conditions above (no correlation at non-zero lags)
2. Independent white noise: the εₜ are i.i.d.
3. Gaussian white noise: the εₜ are i.i.d. N(μ, σ²)
White noise is the building block for linear time series models (MA, AR, ARMA). Independent white noise is stronger than uncorrelated white noise.
Step-by-step mathematical derivations of fundamental theorems
Affine transformations maintain stationary structure
Let {Xₜ} be stationary with mean μ and autocovariance function γ_X(k). Define Yₜ = aXₜ + b for constants a and b. Then {Yₜ} is stationary with mean aμ + b and autocovariance function γ_Y(k) = a²γ_X(k).
Show that the second moment of Yₜ exists and is finite: E[Yₜ²] = a²E[Xₜ²] + 2abE[Xₜ] + b² < ∞.
Verify that the mean does not depend on time t: E[Yₜ] = aE[Xₜ] + b = aμ + b.
Compute the autocovariance function at lag k: γ_Y(k) = Cov(aXₜ + b, aXₜ₊ₖ + b) = a²Cov(Xₜ, Xₜ₊ₖ) = a²γ_X(k).
Confirm the autocovariance depends solely on the lag k, not on t.
Setting a = 1/√(γ_X(0)) and b = −μ/√(γ_X(0)) gives the standardized (zero-mean, unit-variance) process.
All three stationarity conditions are satisfied, completing the proof.
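A minimal numerical check of this result, assuming NumPy; the AR(1) coefficient 0.6, the constants a and b, and the helper name sample_acov are illustrative choices for this sketch:

```python
import numpy as np

def sample_acov(x, max_lag):
    """Sample autocovariance gamma_hat(k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[:n - k], xc[k:]) / n for k in range(max_lag + 1)])

rng = np.random.default_rng(1)

# Simulate a stationary AR(1): X_t = 0.6 X_{t-1} + eps_t (burn-in discarded)
eps = rng.standard_normal(21_000)
x = np.zeros_like(eps)
for t in range(1, len(eps)):
    x[t] = 0.6 * x[t - 1] + eps[t]
x = x[1_000:]

a, b = 2.5, -3.0
y = a * x + b                                     # affine transformation Y_t = a X_t + b

print(sample_acov(y, 3).round(3))                 # gamma_Y(k)
print((a**2 * sample_acov(x, 3)).round(3))        # a^2 * gamma_X(k): matches up to noise
```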
Essential mathematical properties that all ACFs must satisfy
The autocovariance function satisfies:
1. γ(0) ≥ 0 and γ(k) = γ(−k) (symmetry)
2. |γ(k)| ≤ γ(0) for all k
3. Nonnegative definiteness: Σᵢ Σⱼ aᵢaⱼγ(tᵢ − tⱼ) ≥ 0 for any coefficients a₁,...,aₙ and times t₁,...,tₙ
Covariance is symmetric in its arguments by definition.
Consider arbitrary coefficients a₁,...,aₙ and times t₁,...,tₙ, and expand E[(Σᵢ aᵢ(Xₜᵢ − μ))²].
By linearity of expectation, E can be moved inside the double sum: E[(Σᵢ aᵢ(Xₜᵢ − μ))²] = Σᵢ Σⱼ aᵢaⱼγ(tᵢ − tⱼ).
Since the expectation of a square is non-negative, Σᵢ Σⱼ aᵢaⱼγ(tᵢ − tⱼ) ≥ 0.
Apply the Cauchy-Schwarz inequality to the covariance: |γ(k)|² = |Cov(Xₜ, Xₜ₊ₖ)|² ≤ Var(Xₜ)Var(Xₜ₊ₖ) = γ(0)².
Taking square roots gives |γ(k)| ≤ γ(0).
Any sequence satisfying these three properties can be realized as the autocovariance of some stationary process. This profound result connects time-domain analysis to spectral theory via the spectral representation theorem.
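The nonnegative-definiteness property can be checked numerically for a candidate autocovariance sequence by testing whether the associated Toeplitz matrix is positive semidefinite. A sketch, assuming NumPy; the example sequences are illustrative:

```python
import numpy as np

def is_nonneg_definite(gamma, tol=1e-10):
    """Check whether the symmetric Toeplitz matrix [gamma(|i - j|)] is positive semidefinite."""
    n = len(gamma)
    G = np.array([[gamma[abs(i - j)] for j in range(n)] for i in range(n)])
    return bool(np.all(np.linalg.eigvalsh(G) >= -tol))

valid = 0.8 ** np.arange(10)            # gamma(k) = 0.8^|k|, the ACF shape of an AR(1)
invalid = [1.0, 0.9, -0.9]              # symmetric and bounded by gamma(0), yet not valid

print(is_nonneg_definite(valid))        # True
print(is_nonneg_definite(invalid))      # False: no stationary process has this ACF
```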
Orthogonal processes combine to form stationary processes
If {Xₜ} and {Yₜ} are stationary with Cov(Xₜ, Yₛ) = 0 for all t and s, then Zₜ = Xₜ + Yₜ is stationary with γ_Z(k) = γ_X(k) + γ_Y(k).
Use Cauchy-Schwarz to bound cross-product term.
Expectation is linear.
Use bilinearity of covariance.
Expand into four terms: Cov(Zₜ, Zₜ₊ₖ) = γ_X(k) + Cov(Xₜ, Yₜ₊ₖ) + Cov(Yₜ, Xₜ₊ₖ) + γ_Y(k).
Cross-covariance vanishes by assumption.
Result depends only on lag k, confirming stationarity.
This theorem justifies additive decompositions like (trend + seasonal + noise). If components are uncorrelated, we can analyze each separately and simply add their autocovariances.
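A small simulation illustrating the additivity of autocovariances for uncorrelated components (NumPy assumed; the MA(1) and white-noise components are illustrative choices):

```python
import numpy as np

def sample_acov(x, max_lag):
    """Sample autocovariance gamma_hat(k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[:n - k], xc[k:]) / n for k in range(max_lag + 1)])

rng = np.random.default_rng(2)
n = 100_000

# Two independent (hence uncorrelated) stationary components
eps = rng.standard_normal(n + 1)
x = eps[1:] + 0.7 * eps[:-1]       # MA(1): gamma_X = (1.49, 0.7, 0, ...)
y = rng.standard_normal(n)         # white noise: gamma_Y = (1, 0, 0, ...)
z = x + y

print(sample_acov(z, 2).round(3))                        # gamma_Z(k)
print((sample_acov(x, 2) + sample_acov(y, 2)).round(3))  # gamma_X(k) + gamma_Y(k)
```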
Complete step-by-step solutions with rigorous calculations
Problem:
Let Θ ~ Uniform(0, 2π) and define Xₜ = A cos(λt + Θ), where A and λ are constants. Prove that {Xₜ} is stationary and find its autocovariance function.
Solution:
The mean: E[Xₜ] = (A/2π) ∫₀^2π cos(λt + θ) dθ. Using the substitution u = λt + θ, the integral of cos u over a full period is zero, so E[Xₜ] = 0.
The autocovariance: γ(k) = E[XₜXₜ₊ₖ] = (A²/2π) ∫₀^2π cos(λt + θ) cos(λ(t + k) + θ) dθ.
Apply the product-to-sum formula: cos(λt + θ) cos(λ(t + k) + θ) = ½[cos(λ(2t + k) + 2θ) + cos(λk)].
First term integrates to zero (over the full period): (A²/4π) ∫₀^2π cos(λ(2t + k) + 2θ) dθ = 0.
Second term: (A²/4π) ∫₀^2π cos(λk) dθ = (A²/2) cos(λk).
Therefore γ(k) = (A²/2) cos(λk). This depends only on the lag k, confirming stationarity. The variance is γ(0) = A²/2.
Key Insight:
The uniformly distributed random phase ensures stationarity despite the deterministic cosine structure. The ACF oscillates forever without decaying, which is typical of purely periodic signals.
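A Monte Carlo check of this derivation, assuming NumPy and the illustrative values A = 2 and λ = 0.5; averaging over many draws of Θ approximates the ensemble expectation:

```python
import numpy as np

rng = np.random.default_rng(3)
A, lam = 2.0, 0.5                    # amplitude and frequency (assumed values)
k = np.arange(0, 8)

# Ensemble estimate of gamma(k) = E[X_t X_{t+k}] (mean is 0), averaging over Theta
theta = rng.uniform(0.0, 2.0 * np.pi, size=100_000)
t = 10.0                             # any fixed t; the result should not depend on it
x_t = A * np.cos(lam * t + theta)
x_tk = A * np.cos(lam * (t + k[:, None]) + theta)
gamma_mc = (x_t * x_tk).mean(axis=1)

print(gamma_mc.round(3))
print((A**2 / 2 * np.cos(lam * k)).round(3))   # theoretical (A^2/2) cos(lambda k)
```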
Problem:
Let {εₜ} ~ WN(0, σ²) and define the MA(q) process Xₜ = θ₀εₜ + θ₁εₜ₋₁ + ⋯ + θ_qεₜ₋q. Prove stationarity and derive the autocovariance function.
Solution:
The mean is E[Xₜ] = Σⱼ θⱼE[εₜ₋ⱼ] = 0. For the autocovariance, γ(k) = E[XₜXₜ₊ₖ] = Σᵢ Σⱼ θᵢθⱼ E[εₜ₋ᵢεₜ₊ₖ₋ⱼ].
Since white noise: E[εₜ₋ᵢεₜ₊ₖ₋ⱼ] = σ² when the two time indices coincide and 0 otherwise.
Non-zero only when t − i = t + k − j, i.e., j = i + k.
For 0 ≤ k ≤ q: γ(k) = σ² Σᵢ₌₀^(q−k) θᵢθᵢ₊ₖ.
For k > q: γ(k) = 0 (no overlapping innovations).
By symmetry: γ(−k) = γ(k).
Key Insight:
MA(q) processes are always stationary (finite weights guarantee finite variance). The ACF cuts off after lag q, a diagnostic signature used in model identification. The variance is γ(0) = σ² Σⱼ₌₀^q θⱼ².
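A sketch comparing the theoretical MA(q) autocovariance with the sample autocovariance of a simulated realization (NumPy assumed; the coefficients θ = (1, 0.6, −0.4, 0.3) and σ² = 1 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
theta = np.array([1.0, 0.6, -0.4, 0.3])     # theta_0, ..., theta_q with q = 3
sigma2 = 1.0
q = len(theta) - 1

def gamma_ma(k):
    """Theoretical MA(q) autocovariance: sigma^2 * sum_i theta_i * theta_{i+k}; 0 for k > q."""
    if k > q:
        return 0.0
    return sigma2 * np.dot(theta[:q + 1 - k], theta[k:])

# Simulate X_t = sum_j theta_j eps_{t-j} and compare the sample autocovariance
n = 200_000
eps = rng.standard_normal(n + q)
x = np.convolve(eps, theta, mode="valid")    # length-n MA(q) realization

xc = x - x.mean()
for k in range(q + 3):
    g_hat = np.dot(xc[:len(xc) - k], xc[k:]) / len(xc)
    print(k, round(g_hat, 3), round(float(gamma_ma(k)), 3))   # cuts off after lag q
```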
Problem:
Let {Xₜ} be stationary with autocovariance γ_X(k), and apply the rectangular (moving-average) filter Yₜ = (1/(2m+1)) Σⱼ₌₋ₘ^m Xₜ₋ⱼ. Show that {Yₜ} is stationary and find its autocovariance function.
Solution:
Expanding the covariance of the two averages: γ_Y(k) = (1/(2m+1)²) Σᵢ₌₋ₘ^m Σⱼ₌₋ₘ^m γ_X(k + i − j).
With finite support, group the terms by h = i − j: γ_Y(k) = (1/(2m+1)²) Σₕ₌₋₂ₘ^2m N(h) γ_X(k + h),
where N(h) = 2m + 1 − |h| counts pairs (i, j) such that i − j = h with −m ≤ i, j ≤ m.
Simplified form: γ_Y(k) = Σₕ₌₋₂ₘ^2m [(2m + 1 − |h|)/(2m + 1)²] γ_X(k + h).
Interpretation:
The rectangular filter smooths the original process by averaging nearby values. The filtered ACF is a weighted average of the original ACF, with weights forming a triangular kernel. A larger window (larger m) increases smoothing but reduces responsiveness to changes.
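A numerical check of the triangular-kernel formula, assuming NumPy, an illustrative AR(1) input with φ = 0.6 (so γ_X(k) = φ^|k|/(1 − φ²) for unit innovation variance), and window half-width m = 2:

```python
import numpy as np

def sample_acov(x, max_lag):
    """Sample autocovariance gamma_hat(k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([np.dot(xc[:n - k], xc[k:]) / n for k in range(max_lag + 1)])

rng = np.random.default_rng(5)
phi, m = 0.6, 2                               # AR(1) coefficient, window half-width

# Stationary AR(1) input with known gamma_X(k) = phi^|k| / (1 - phi^2) (sigma^2 = 1)
n = 200_000
eps = rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]
x = x[2_000:]

def gamma_x(j):
    """Theoretical AR(1) autocovariance (sigma^2 = 1)."""
    return phi ** abs(j) / (1 - phi ** 2)

# Rectangular (moving-average) filter of length 2m + 1
w = np.ones(2 * m + 1) / (2 * m + 1)
y = np.convolve(x, w, mode="valid")

# Triangular-kernel formula vs the sample autocovariance of the filtered series, at lag k = 1
k = 1
formula = sum((2 * m + 1 - abs(h)) / (2 * m + 1) ** 2 * gamma_x(k + h)
              for h in range(-2 * m, 2 * m + 1))
print(round(formula, 3), round(float(sample_acov(y, k)[k]), 3))
```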
Problem:
For the AR(1) process Xₜ = φXₜ₋₁ + εₜ, where |φ| < 1 and {εₜ} ~ WN(0, σ²), find the spectral density function.
Solution:
For the linear process representation Xₜ = Σⱼ₌₀^∞ φʲεₜ₋ⱼ, the transfer function is H(λ) = Σⱼ₌₀^∞ φʲe^(-ijλ) = 1/(1 − φe^(-iλ)), so
f(λ) = (σ²/2π)|H(λ)|² = σ² / [2π(1 − 2φ cos λ + φ²)], λ ∈ [−π, π].
Frequency Domain Interpretation:
If φ > 0, the spectral density peaks at λ = 0 (low frequencies dominate). If φ < 0, it peaks at λ = π (high frequencies dominate). This explains why φ < 0 produces oscillatory behavior.
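A sketch comparing this theoretical spectral density with the periodogram of a simulated AR(1) series (NumPy assumed; φ = 0.7 and σ² = 1 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
phi, sigma2, n = 0.7, 1.0, 8192

# Simulate AR(1): X_t = phi X_{t-1} + eps_t
eps = np.sqrt(sigma2) * rng.standard_normal(n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + eps[t]

# Periodogram I(lambda_j) = |sum_t x_t e^{-i t lambda_j}|^2 / (2 pi n) at Fourier frequencies
lam = 2 * np.pi * np.fft.rfftfreq(n)               # angular frequencies in [0, pi]
I = np.abs(np.fft.rfft(x - x.mean()))**2 / (2 * np.pi * n)

# Theoretical AR(1) spectral density f(lambda) = sigma^2 / (2 pi (1 - 2 phi cos(lambda) + phi^2))
f = sigma2 / (2 * np.pi * (1 - 2 * phi * np.cos(lam) + phi**2))

print(f[1:6].round(3))
print(I[1:6].round(3))   # the raw periodogram is noisy but fluctuates around f(lambda)
```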
Deeper exploration of linear processes, spectral theory, and ergodicity
A process is a linear process if it can be written as:
Xₜ = μ + Σⱼ ψⱼεₜ₋ⱼ (sum over all j ∈ ℤ),
where {εₜ} ~ WN(0, σ²) and Σⱼ |ψⱼ| < ∞ (absolutely summable coefficients).
If ψⱼ = 0 for j < 0:
The current value depends only on current and past innovations (causal / one-sided).
For a linear process, the autocovariance function is γ(k) = σ² Σⱼ ψⱼψⱼ₊ₖ.
Absolute summability of the ψⱼ ensures absolute convergence of the series defining Xₜ and of the autocovariances.
Any zero-mean stationary process can be uniquely decomposed as:
Xₜ = Σⱼ₌₀^∞ ψⱼεₜ₋ⱼ + Vₜ,
where {εₜ} is white noise, {Vₜ} is deterministic (perfectly predictable from its own past), and the two components are uncorrelated. This separates the stochastic and deterministic components.
Wold decomposition shows that purely nondeterministic processes (Vₜ=0) can be represented as infinite MA. This includes all ARMA processes, making linear representation fundamental to time series modeling.
The autocovariance function and spectral density form a Fourier transform pair:
Forward (ACF → Spectral Density): f(λ) = (1/2π) Σₖ γ(k)e^(-ikλ) (sum over all k), λ ∈ [−π, π]
Inverse (Spectral Density → ACF): γ(k) = ∫₋π^π e^(ikλ) f(λ) dλ
f(λ) describes how the variance of the process is distributed across frequencies λ ∈ [−π, π]: γ(0) = Var(Xₜ) = ∫₋π^π f(λ) dλ.
If Yₜ = Σⱼ hⱼXₜ₋ⱼ is the output of a linear filter with transfer function H(λ) = Σⱼ hⱼe^(-ijλ), then:
f_Y(λ) = |H(λ)|² f_X(λ)
This elegant result shows filtering modifies the spectral density by the squared magnitude of the frequency response.
Spectral analysis enables signal extraction: design filters to pass desired frequencies and attenuate others. Used extensively in communications (bandpass filters), seismology (earthquake signal isolation), and economics (trend-cycle decomposition).
A stationary process is ergodic for the mean if:
(1/n) Σₜ₌₁ⁿ Xₜ → μ almost surely as n → ∞.
This means time averages converge to ensemble averages almost surely. A single long realization contains the same information as infinitely many independent short realizations.
If the ACF satisfies:
(1/n) Σₖ₌₀^(n−1) γ(k) → 0 as n → ∞,
then the process is ergodic for the mean. This holds if γ(k) → 0 as k → ∞ (mixing condition).
With ergodicity, we can estimate:
the mean μ̂ = (1/n) Σₜ₌₁ⁿ Xₜ and the autocovariances γ̂(k) = (1/n) Σₜ₌₁^(n−k) (Xₜ − μ̂)(Xₜ₊ₖ − μ̂)
from a single realization, which is crucial for real-world data analysis.
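A single-realization sketch of these estimates, assuming NumPy and an illustrative AR(1) with mean 5 and coefficient 0.8 (for which γ(1) = φ/(1 − φ²) ≈ 2.22 with unit innovation variance):

```python
import numpy as np

rng = np.random.default_rng(7)
phi, mu, n = 0.8, 5.0, 200_000

# One long realization of a mean-mu AR(1): (X_t - mu) = phi (X_{t-1} - mu) + eps_t
eps = rng.standard_normal(n)
x = np.full(n, mu)
for t in range(1, n):
    x[t] = mu + phi * (x[t - 1] - mu) + eps[t]

# Time averages from a SINGLE realization approximate the ensemble quantities
print(round(x.mean(), 3))                       # ~ mu = 5.0
xc = x - x.mean()
gamma1_hat = np.dot(xc[:-1], xc[1:]) / n
print(round(gamma1_hat, 3))                     # ~ phi / (1 - phi^2) = 2.222
```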
For a stationary ergodic sequence {Xₜ} and any measurable function g with E|g(X₁)| < ∞:
(1/n) Σₜ₌₁ⁿ g(Xₜ) → E[g(X₁)] almost surely as n → ∞.
This generalizes the law of large numbers to dependent sequences, enabling consistent estimation of expectations of any functional of the process.
Without ergodicity: We'd need multiple independent time series realizations to estimate population moments.
With ergodicity: A single sufficiently long series provides consistent estimates. This is the foundation for all empirical time series analysis—we almost never have multiple realizations, yet ergodicity validates using one long series for estimation and inference.
Before applying stationary process models, verify stationarity using these methods:
Augmented Dickey-Fuller (ADF) test: tests the null hypothesis that the process has a unit root (non-stationary).
In the regression ΔXₜ = α + γXₜ₋₁ + Σᵢ δᵢΔXₜ₋ᵢ + εₜ, test H₀: γ = 0 (unit root) vs H₁: γ < 0. Reject → stationary.
KPSS test: tests the null hypothesis that the process is stationary (opposite of ADF).
Decomposes Xₜ = ξt + rₜ + εₜ (trend + random walk + error), where rₜ = rₜ₋₁ + uₜ and uₜ ~ WN(0, σᵤ²). Test H₀: σᵤ² = 0 vs H₁: σᵤ² > 0. Reject → non-stationary.
Phillips-Perron (PP) test: similar to ADF but robust to heteroskedasticity and serial correlation. Uses a non-parametric correction to the test statistic.
If tests indicate non-stationarity: difference the series (ΔXₜ = Xₜ − Xₜ₋₁), remove deterministic trends (detrending), or apply a variance-stabilizing transformation (e.g., logarithms).
After transformation, re-test stationarity before modeling.
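A sketch of how these tests are typically run in Python, assuming the statsmodels package is available (the series and options shown are illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(8)
n = 1_000
white_noise = rng.standard_normal(n)                 # stationary
random_walk = np.cumsum(rng.standard_normal(n))      # unit-root (non-stationary)

for name, series in [("white noise", white_noise), ("random walk", random_walk)]:
    adf_p = adfuller(series, autolag="AIC")[1]               # H0: unit root
    kpss_p = kpss(series, regression="c", nlags="auto")[1]   # H0: stationary
    print(f"{name:12s}  ADF p = {adf_p:.3f}   KPSS p = {kpss_p:.3f}")

# Expected pattern: white noise -> small ADF p (reject unit root), large KPSS p (keep stationarity)
#                   random walk -> large ADF p,                    small KPSS p
```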
Where stationary process theory meets practice
Asset returns (log-returns) are often modeled as stationary processes, unlike prices which are non-stationary (random walks).
Application:
Volatility modeling (ARCH/GARCH) assumes stationarity of the squared returns series to forecast risk (VaR).
Noise in communication channels is modeled as stationary random processes (often Gaussian white noise).
Application:
Wiener filters use stationarity assumptions to optimally separate signal from noise, minimizing mean square error.
Climate indices (e.g., SOI, NAO) and seismic background noise are analyzed as stationary series after detrending.
Application:
Spectral analysis identifies dominant cycles (e.g., El Niño periodicity) in environmental data series.
Weak (second-order) stationarity only requires constant mean and autocovariance depending on lags, involving first two moments. Strict stationarity requires all finite-dimensional distributions to be invariant under time shifts. Strict stationarity implies weak stationarity if second moments exist, but not vice versa. Gaussian processes are an exception: weakly stationary Gaussian processes are also strictly stationary.
For any coefficients a₁,...,aₙ and times t₁,...,tₙ, the variance of the linear combination Σᵢ aᵢXₜᵢ must be non-negative. Expanding this variance: Var(Σᵢ aᵢXₜᵢ) = Σᵢ Σⱼ aᵢaⱼγ(tᵢ-tⱼ) = aᵀΓa ≥ 0. This algebraic property is fundamental and ensures the covariance matrix is positive semidefinite.
Yes! Random walk Xₜ = Σᵢ₌₁ᵗ εᵢ (where εᵢ is white noise) is non-stationary since Var(Xₜ) = tσ² grows with time. However, its increments ΔXₜ = Xₜ - Xₜ₋₁ = εₜ are stationary. This distinction is crucial: ARIMA models difference non-stationary series to achieve stationarity.
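A short simulation of this distinction, assuming NumPy (the number of paths and the path length are illustrative): the cross-sectional variance of a random walk grows linearly in t, while the variance of its increments stays constant:

```python
import numpy as np

rng = np.random.default_rng(9)
n_paths, n = 5_000, 400

# Many independent random-walk paths X_t = sum_{i <= t} eps_i
eps = rng.standard_normal((n_paths, n))
x = np.cumsum(eps, axis=1)

# Variance across paths grows linearly in t (non-stationary) ...
print(x[:, [49, 199, 399]].var(axis=0).round(1))   # ~ 50, 200, 400

# ... while the first differences are just the white-noise increments (stationary)
dx = np.diff(x, axis=1)
print(dx[:, [49, 199, 398]].var(axis=0).round(2))  # ~ 1, 1, 1
```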
If {Xₜ} is stationary and we apply a linear filter Yₜ = Σⱼ hⱼXₜ₋ⱼ with absolutely summable coefficients Σ|hⱼ| < ∞, then {Yₜ} is also stationary. The filtered process has autocovariance γY(k) = ΣⱼΣᵢ hⱼhᵢγX(k+j-i). Linear filters preserve stationarity while modifying the spectral characteristics.
ACF (autocorrelation function) measures direct and indirect correlations at each lag. PACF (partial autocorrelation function) measures only the direct correlation after removing intermediate lag effects. For MA(q): ACF cuts off after lag q, PACF decays. For AR(p): PACF cuts off after lag p, ACF decays. This diagnostic property helps identify model order.
The spectral density f(λ) represents how variance is distributed across frequencies. It is the Fourier transform of the autocovariance; equivalently, γ(k) = ∫₋π^π e^(ikλ) f(λ) dλ. While the ACF captures time-domain dependence, the spectral density reveals frequency-domain structure. Peaks in f(λ) indicate dominant cyclical components. This duality is essential for filter design and signal extraction.
An ergodic process allows time averages to converge to ensemble averages. Specifically, if {Xₜ} is ergodic for the mean, then (1/n)Σₜ₌₁ⁿ Xₜ → E[X₁] almost surely as n → ∞. This is crucial for statistical inference: with ergodicity, a single long realization provides information about population moments. Without ergodicity, we'd need multiple independent realizations.
Common tests include: (1) Augmented Dickey-Fuller (ADF) test for unit roots (null: non-stationary), (2) KPSS test (null: stationary), (3) Phillips-Perron test (robust to heteroskedasticity), (4) Visual inspection: plot ACF (should decay) and check if mean/variance appear constant. Use differencing or detrending if tests reject stationarity.
The evolution of stationary process theory
G.U. Yule modeled sunspot numbers using autoregressive (AR) schemes. Independently, E. Slutsky showed that moving averages of random events could generate cyclic-like behavior, challenging the idea that economic cycles must have deterministic causes.
Khinchin established the rigorous mathematical foundation for stationary processes, defining the correlation function and proving the spectral representation theorem (Wiener-Khinchin theorem).
In his thesis 'A Study in the Analysis of Stationary Time Series', Wold proved that any stationary process can be decomposed into a deterministic part and a purely non-deterministic (linear) part.
Kolmogorov solved the fundamental problem of linear prediction for stationary sequences, deriving the formula for the mean square prediction error in terms of the spectral density.
"Stationarity is the assumption that allows us to learn from the past to predict the future. Without it, the past is just a sequence of unique events."
Recommended textbooks for deeper study
Brockwell & Davis, Time Series: Theory and Methods. The definitive reference for rigorous mathematical theory of time series. Essential for understanding the proofs and Hilbert space foundations presented in this course.
Hamilton, Time Series Analysis. The standard text for econometrics. Excellent coverage of stationarity, unit roots, and vector autoregressions (VAR) with economic applications.
Shumway & Stoffer, Time Series Analysis and Its Applications. A modern, accessible approach with extensive R examples. Balances theory with practical implementation and real-world data analysis.
Box & Jenkins, Time Series Analysis: Forecasting and Control. The classic engineering text that introduced the ARIMA methodology (Box-Jenkins method). Focuses on model identification, estimation, and diagnostic checking.