
Stationary Processes

Mathematical foundations of time series: stationarity, autocovariance structure, and spectral analysis

Learning Objectives
What you'll master in this comprehensive module
  • Understand the rigorous definitions of weak and strict stationarity
  • Master autocovariance and autocorrelation function properties
  • Analyze white noise processes and their fundamental role
  • Apply linear transformation theorems to stationary sequences
  • Implement linear filtering techniques for signal processing
  • Explore spectral density and frequency domain analysis
  • Understand ergodic theorems and long-run behavior

Essential Definitions

Rigorous mathematical foundations of stationary processes

Weak Stationary Process

A stochastic process {Xₜ : t ∈ ℤ} is weakly stationary if:

\text{Cov}(X_t, X_s) = \gamma_{|t-s|} = E[(X_t - \mu)(X_s - \mu)]

Conditions:

  • Second moment exists: E[Xₜ²] < ∞ for all t
  • Constant mean: E[Xₜ] = μ for all t
  • Autocovariance depends only on lag: Cov(Xₜ, Xₛ) = γ(t-s)
Key Insight

Weak stationarity ensures the process has constant statistical structure over time, making it amenable to forecasting and modeling.

Autocovariance Function

For a stationary process with mean μ, the autocovariance function at lag k is:

\gamma_k = \text{Cov}(X_t, X_{t+k}) = E[(X_t - \mu)(X_{t+k} - \mu)]

Properties:

  • Symmetry: γₖ = γ₋ₖ
  • Non-negative definiteness: Γₙ is positive semidefinite
  • Bound: |γₖ| ≤ γ₀ (variance is maximum)
  • γ₀ = Var(Xₜ)
Key Insight

The autocovariance function captures the linear dependence structure between observations at different time points.

Autocorrelation Function (ACF)

The autocorrelation function is the normalized autocovariance:

\rho_k = \frac{\gamma_k}{\gamma_0} = \text{Corr}(X_t, X_{t+k})

Properties:

  • ρ₀ = 1 (perfect correlation at lag 0)
  • |ρₖ| ≤ 1 for all k
  • ρₖ = ρ₋ₖ (symmetric)
  • Dimensionless measure
Key Insight

ACF provides a scale-free measure of temporal dependence, crucial for identifying model structure in ARMA processes.

White Noise Process

A sequence {εₜ} is white noise WN(μ, σ²) if:

E[\varepsilon_t] = \mu, \quad \text{Cov}(\varepsilon_t, \varepsilon_s) = \sigma^2 \delta_{ts}

Types:

  • Independent White Noise: {εₜ} are i.i.d.
  • Normal White Noise: εₜ ~ N(μ, σ²) i.i.d.
  • Uncorrelated White Noise: Only covariance = 0
Key Insight

White noise is the building block for linear time series models (MA, AR, ARMA). Independent white noise is stronger than uncorrelated white noise.
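
To connect these definitions to data, here is a minimal Python sketch (NumPy assumed; the helper name sample_acf is ours, not a library function) that estimates the ACF from a sample and checks that simulated white noise has autocorrelations near zero at every nonzero lag:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat(k), k = 0..max_lag (biased 1/n covariance estimator)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.sum(xc * xc) / n                     # sample gamma(0) = variance
    acf = [1.0]
    for k in range(1, max_lag + 1):
        acf.append(np.sum(xc[: n - k] * xc[k:]) / n / gamma0)
    return np.array(acf)

# White noise: the sample ACF should be near zero at every lag k >= 1.
rng = np.random.default_rng(0)
eps = rng.normal(0.0, 1.0, size=5000)
print(np.round(sample_acf(eps, 5), 3))
```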

Rigorous Theorem Proofs

Step-by-step mathematical derivations of fundamental theorems

Theorem 1: Linear Transformation Preserves Stationarity
Fundamental Property

Affine transformations maintain stationary structure

Theorem Statement

Let \{X_t\} be stationary with mean \mu and ACF \gamma_X(k). Define Y_t = aX_t + b. Then \{Y_t\} is stationary with mean a\mu + b and ACF \gamma_Y(k) = a^2\gamma_X(k).

Proof

Step 1: Second Moment Verification

Show that the second moment of Y_t exists and is finite.

E[Y_t^2] = E[(aX_t + b)^2] = a^2E[X_t^2] + 2abE[X_t] + b^2 = a^2E[X_t^2] + 2ab\mu + b^2 < \infty

Step 2: Constant Mean Property

Verify that the mean does not depend on time t.

E[Y_t] = E[aX_t + b] = aE[X_t] + b = a\mu + b

Step 3: Autocovariance Derivation

Compute the autocovariance function at lag k.

\gamma_Y(k) = E[(Y_t - E[Y_t])(Y_{t+k} - E[Y_{t+k}])] = E[a(X_t-\mu) \cdot a(X_{t+k}-\mu)] = a^2\gamma_X(k)

Step 4: Lag-Dependence Only

Confirm the autocovariance depends solely on lag k, not on t.

\gamma_Y(k) = a^2\gamma_X(k) \text{ is a function of } k \text{ alone}

Step 5: Standardization Application

Setting a = 1/\sqrt{\gamma_X(0)} and b = -\mu/\sqrt{\gamma_X(0)} gives the standardized process.

Z_t = \frac{X_t - \mu}{\sqrt{\gamma_X(0)}} \text{ with } E[Z_t] = 0, \ \text{Var}(Z_t) = 1

Step 6: Conclusion

All three stationarity conditions are satisfied, completing the proof.

\therefore \{Y_t\} \text{ is stationary.} \quad \blacksquare
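
As an informal numerical illustration of Theorem 1 (not part of the proof), the sketch below simulates an assumed stationary MA(1) input in Python (NumPy assumed; sample_acvf is our illustrative helper), applies Y_t = aX_t + b, and checks that the sample autocovariance scales by a² while the standardized process from Step 5 has mean near 0 and variance near 1:

```python
import numpy as np

def sample_acvf(v, k):
    """Biased sample autocovariance at lag k."""
    vc = v - v.mean()
    return np.sum(vc[: len(v) - k] * vc[k:]) / len(v)

rng = np.random.default_rng(1)
eps = rng.normal(size=200_001)
x = 2.0 + eps[1:] + 0.5 * eps[:-1]       # assumed stationary input: MA(1) with mean 2

a, b = 3.0, 5.0
y = a * x + b                             # affine transformation Y_t = a X_t + b
print(round(y.mean(), 3), round(a * x.mean() + b, 3))                    # mean a*mu + b
print(round(sample_acvf(y, 1), 3), round(a**2 * sample_acvf(x, 1), 3))   # gamma_Y = a^2 gamma_X

z = (x - x.mean()) / np.sqrt(sample_acvf(x, 0))   # standardization from Step 5
print(round(z.mean(), 3), round(z.var(), 3))      # approximately 0 and 1
```
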
Theorem 2: Properties of Autocovariance Function
Structural Constraints

Essential mathematical properties that all ACFs must satisfy

Theorem Statement

The autocovariance function \gamma_k satisfies:

  1. Symmetry: \gamma_k = \gamma_{-k}
  2. Non-negative definiteness: \mathbf{a}^T\Gamma_n\mathbf{a} \geq 0 for any \mathbf{a}
  3. Boundedness: |\gamma_k| \leq \gamma_0

Proofs

Step 1: Symmetry Proof

Covariance is symmetric in its arguments by definition.

\gamma_k = \text{Cov}(X_t,X_{t+k}) = \text{Cov}(X_{t+k},X_t) = \gamma_{-k} \quad \blacksquare

Step 2: Non-negative Definiteness Setup

Consider arbitrary coefficients a_1,\ldots,a_n and times t_1,\ldots,t_n.

\mathbf{a}^T\Gamma_n\mathbf{a} = \sum_{i,j} a_ia_j\gamma_{t_i-t_j} = \sum_{i,j} a_ia_jE[(X_{t_i}-\mu)(X_{t_j}-\mu)]

Step 3: Exchange Sum and Expectation

By linearity of expectation, the double sum of expectations equals the expectation of the product of sums, which is a perfect square.

= E\left[\sum_i a_i(X_{t_i}-\mu) \sum_j a_j(X_{t_j}-\mu)\right] = E\left[\left(\sum_i a_i(X_{t_i}-\mu)\right)^2\right]

Step 4: Non-negativity Conclusion

The expectation of a square is always non-negative.

E\left[\left(\sum_i a_i(X_{t_i}-\mu)\right)^2\right] \geq 0 \quad \blacksquare

Step 5: Boundedness via Cauchy-Schwarz

Apply the Cauchy-Schwarz inequality to covariances.

[\text{Cov}(X_t,X_{t+k})]^2 \leq \text{Var}(X_t)\text{Var}(X_{t+k}) = \gamma_0^2

Step 6: Final Bound

Taking square roots gives the desired inequality.

|\gamma_k| \leq \gamma_0 \text{ for all } k \quad \blacksquare
Herglotz's Theorem (Converse)

Any sequence satisfying these three properties can be realized as the autocovariance of some stationary process. This profound result connects time-domain analysis to spectral theory via the spectral representation theorem.

Theorem 3: Sum of Orthogonal Stationary Processes
Decomposition Theory

Orthogonal processes combine to form stationary processes

Theorem Statement

If \{X_t\} and \{Y_t\} are stationary with \text{Cov}(X_t,Y_s) = 0 for all t,s, then Z_t = X_t + Y_t is stationary with \gamma_Z(k) = \gamma_X(k) + \gamma_Y(k).

Proof

Step 1: Second Moment Finiteness

Use Cauchy-Schwarz to bound the cross-product term.

E[Z_t^2] = E[X_t^2] + 2E[X_tY_t] + E[Y_t^2] \leq E[X_t^2] + 2\sqrt{E[X_t^2]E[Y_t^2]} + E[Y_t^2] < \infty

Step 2: Constant Mean

Expectation is linear.

E[Z_t] = E[X_t] + E[Y_t] = \mu_X + \mu_Y = \mu_Z

Step 3: Expand Autocovariance

Use bilinearity of covariance.

\gamma_Z(k) = E[(X_t+Y_t-\mu_Z)(X_{t+k}+Y_{t+k}-\mu_Z)]

Step 4: Distribute Products

Expand into four terms.

= E[(X_t-\mu_X)(X_{t+k}-\mu_X)] + E[(Y_t-\mu_Y)(Y_{t+k}-\mu_Y)] + E[(X_t-\mu_X)(Y_{t+k}-\mu_Y)] + E[(Y_t-\mu_Y)(X_{t+k}-\mu_X)]

Step 5: Apply Orthogonality

Both cross-covariances vanish by assumption.

\text{Cov}(X_t,Y_{t+k}) = \text{Cov}(Y_t,X_{t+k}) = 0 \implies \gamma_Z(k) = \gamma_X(k) + \gamma_Y(k)

Step 6: Lag-Dependence Verified

The result depends only on lag k, confirming stationarity.

\gamma_Z(k) \text{ is a function of } k \text{ alone} \quad \blacksquare
Decomposition Application

This theorem justifies additive decompositions like X_t = T_t + S_t + \varepsilon_t (trend + seasonal + noise). If components are uncorrelated, we can analyze each separately and simply add their autocovariances.
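
A quick simulation-based sanity check of Theorem 3 (Python with NumPy assumed; the components, an MA(1) plus an independent scaled white noise, are our own example): the sample autocovariance of the sum should be close to the sum of the component autocovariances.

```python
import numpy as np

def sample_acvf(v, k):
    """Biased sample autocovariance at lag k."""
    vc = v - v.mean()
    return np.sum(vc[: len(v) - k] * vc[k:]) / len(v)

rng = np.random.default_rng(2)
n = 500_000
e1, e2 = rng.normal(size=n + 1), rng.normal(size=n)
x = e1[1:] + 0.5 * e1[:-1]     # MA(1): gamma_X(0) = 1.25, gamma_X(1) = 0.5
y = 2.0 * e2                   # independent white noise: gamma_Y(0) = 4, gamma_Y(1) = 0
z = x + y

for k in (0, 1):
    print(k, round(sample_acvf(z, k), 3), round(sample_acvf(x, k) + sample_acvf(y, k), 3))
```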

Detailed Worked Examples

Complete step-by-step solutions with rigorous calculations

Example 1: Harmonic Process with Random Phase

Problem:

Let U \sim \text{Uniform}(-\pi, \pi) and define X_t = b\cos(at + U) where a, b are constants. Prove that \{X_t\} is stationary and find its autocovariance function.

Solution:

  1. Calculate Mean:
    E[X_t] = E[b\cos(at + U)] = b\int_{-\pi}^{\pi} \cos(at + u) \frac{1}{2\pi} \, du

    Using the substitution v = at + u:

    = \frac{b}{2\pi}[\sin(at + u)]_{-\pi}^{\pi} = \frac{b}{2\pi}[\sin(at+\pi) - \sin(at-\pi)]
    = \frac{b}{2\pi}[-\sin(at) - (-\sin(at))] = 0
  2. Compute Autocovariance: Since E[X_t] = 0, we have \gamma(k) = E[X_t X_{t+k}]
    E[X_t X_{t+k}] = E[b\cos(at+U) \cdot b\cos(a(t+k)+U)]
    = b^2 \int_{-\pi}^{\pi} \cos(at+u)\cos(a(t+k)+u) \frac{du}{2\pi}

    Apply the product-to-sum formula: \cos A \cos B = \frac{1}{2}[\cos(A+B) + \cos(A-B)]

    = \frac{b^2}{4\pi} \int_{-\pi}^{\pi} [\cos(2at+ak+2u) + \cos(ak)] \, du
  3. Evaluate Integral:

    The first term integrates to zero (over a full period):

    \int_{-\pi}^{\pi} \cos(2at+ak+2u) \, du = \frac{1}{2}[\sin(2at+ak+2u)]_{-\pi}^{\pi} = 0

    The second term:

    \int_{-\pi}^{\pi} \cos(ak) \, du = 2\pi\cos(ak)
  4. Final Result:
    \gamma(k) = \frac{b^2}{4\pi} \cdot 2\pi \cos(ak) = \frac{b^2}{2}\cos(ak)

    This depends only on lag k, confirming stationarity. The variance is \gamma(0) = b^2/2.

Key Insight:

A uniformly distributed random phase U ensures stationarity despite the deterministic cosine structure. The ACF \rho(k) = \cos(ak) oscillates forever without decaying, which is typical of pure periodic signals.
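
To see the result numerically, the following sketch (NumPy assumed; the parameter values a = 0.7, b = 2 and the lag k = 3 are arbitrary choices) estimates E[X_t X_{t+k}] by Monte Carlo over the random phase U and compares it with the derived (b²/2)cos(ak):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.7, 2.0
n_rep, k, t = 200_000, 3, 10          # replications, lag to check, arbitrary time index

U = rng.uniform(-np.pi, np.pi, size=n_rep)
x_t  = b * np.cos(a * t + U)
x_tk = b * np.cos(a * (t + k) + U)

print("Monte Carlo E[X_t X_{t+k}]:", round(np.mean(x_t * x_tk), 3))
print("Theory (b^2/2) cos(ak)    :", round(0.5 * b**2 * np.cos(a * k), 3))
```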

Example 2: Finite Moving Average MA(q)

Problem:

Let \{\varepsilon_t\} \sim WN(0, \sigma^2) and define X_t = \sum_{j=0}^q a_j \varepsilon_{t-j}. Prove stationarity and derive the autocovariance function.

Solution:

  1. Verify Second Moment:
    E[X_t^2] = E\left[\left(\sum_{j=0}^q a_j\varepsilon_{t-j}\right)^2\right] = \sum_{j=0}^q \sum_{i=0}^q a_j a_i E[\varepsilon_{t-j}\varepsilon_{t-i}]

    Since white noise: E[\varepsilon_s\varepsilon_r] = \sigma^2\delta_{sr}

    = \sum_{j=0}^q a_j^2 \sigma^2 = \sigma^2 \sum_{j=0}^q a_j^2 < \infty
  2. Check Mean:
    E[X_t] = E\left[\sum_{j=0}^q a_j\varepsilon_{t-j}\right] = \sum_{j=0}^q a_j E[\varepsilon_{t-j}] = 0
  3. Derive Autocovariance for lag k \geq 0:
    \gamma(k) = E[X_t X_{t+k}] = E\left[\sum_{j=0}^q a_j\varepsilon_{t-j} \sum_{i=0}^q a_i\varepsilon_{t+k-i}\right]
    = \sum_{j=0}^q \sum_{i=0}^q a_j a_i E[\varepsilon_{t-j}\varepsilon_{t+k-i}]

    Non-zero only when t-j = t+k-i, i.e., i = j+k

  4. Simplify for different lags:

    For 0 \leq k \leq q:

    \gamma(k) = \sigma^2 \sum_{j=0}^{q-k} a_j a_{j+k}

    For |k| > q:

    \gamma(k) = 0 \quad \text{(ACF cuts off!)}

    By symmetry: \gamma(-k) = \gamma(k)

Key Insight:

MA(q) processes are always stationary (finite weights guarantee finite variance). The ACF cuts off after lag q, a diagnostic signature used in model identification. The variance is \gamma(0) = \sigma^2\sum_{j=0}^q a_j^2.
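
The derived formula is easy to verify by simulation. The sketch below (NumPy assumed; the MA(2) coefficients are an arbitrary example of ours) compares the theoretical \gamma(k) = \sigma^2\sum_j a_j a_{j+k} with sample autocovariances, including the cut-off at |k| > q:

```python
import numpy as np

def ma_autocov(coeffs, sigma2, k):
    """Theoretical gamma(k) = sigma^2 * sum_j a_j a_{j+|k|} for an MA(q) process."""
    a = np.asarray(coeffs, dtype=float)
    k = abs(k)
    if k >= len(a):
        return 0.0
    return sigma2 * np.sum(a[: len(a) - k] * a[k:])

coeffs, sigma2 = [1.0, 0.5, -0.3], 1.0            # an arbitrary MA(2) example
rng = np.random.default_rng(4)
eps = rng.normal(0.0, np.sqrt(sigma2), size=300_000)
x = np.convolve(eps, coeffs, mode="valid")        # X_t = sum_j a_j eps_{t-j}

n = len(x)
xc = x - x.mean()
for k in range(4):                                # gamma(3) should be ~0 (cut-off)
    gamma_hat = np.sum(xc[: n - k] * xc[k:]) / n
    print(k, round(gamma_hat, 3), round(ma_autocov(coeffs, sigma2, k), 3))
```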

Example 3: Rectangular Window Filter

Problem:

Apply a rectangular window filter to a stationary process \{X_t\} with autocovariance \gamma_X(k):

Y_t = \frac{1}{2M+1}\sum_{j=-M}^M X_{t-j}

Find the autocovariance of \{Y_t\}.

Solution:

  1. Identify filter coefficients: h_j = \frac{1}{2M+1} for |j| \leq M, else h_j = 0
  2. Apply the general filtering formula:
    \gamma_Y(k) = \sum_{l=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} h_l h_j \gamma_X(k+l-j)

    With the finitely supported h_j:

    = \sum_{l=-M}^M \sum_{j=-M}^M \frac{1}{(2M+1)^2} \gamma_X(k+l-j)
  3. Change variables: Let m = l-j, so m \in [-2M, 2M]
    \gamma_Y(k) = \frac{1}{(2M+1)^2} \sum_{m=-2M}^{2M} w_m \gamma_X(k+m)

    where w_m counts the pairs (l,j) such that l-j = m with |l|,|j| \leq M

  4. Determine weights w_m: For |m| \leq 2M, w_m = 2M+1-|m|, a triangular weight that decreases linearly to zero as |m| grows

    Simplified form:

    \gamma_Y(k) = \frac{1}{2M+1}\sum_{m=-2M}^{2M} \left(1 - \frac{|m|}{2M+1}\right) \gamma_X(k+m)

Interpretation:

The rectangular filter smooths the original process by averaging nearby values. The filtered ACF is a weighted average of the original ACF, with weights forming a triangular kernel. Larger M increases smoothing but reduces responsiveness to changes.
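
The double-sum filtering formula can be implemented directly. In the sketch below (NumPy assumed), filtered_autocov is our illustrative helper and the input ACF \gamma_X(k) = 0.8^{|k|} is an assumed example, not something derived in this module:

```python
import numpy as np

def filtered_autocov(h, gamma_x, k):
    """gamma_Y(k) = sum_l sum_j h_l h_j gamma_X(k + l - j) for a finite symmetric filter h.
    gamma_x is a callable giving the input autocovariance at an integer lag."""
    idx = np.arange(len(h)) - (len(h) - 1) // 2   # support j = -M..M for odd-length h
    total = 0.0
    for l, hl in zip(idx, h):
        for j, hj in zip(idx, h):
            total += hl * hj * gamma_x(k + l - j)
    return total

M = 2
h = np.full(2 * M + 1, 1.0 / (2 * M + 1))          # rectangular window weights
gamma_x = lambda k: 0.8 ** abs(k)                   # assumed example input ACF
print([round(filtered_autocov(h, gamma_x, k), 4) for k in range(4)])
```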

Example 4: Spectral Density of MA(1)

Problem:

For X_t = \varepsilon_t + \theta\varepsilon_{t-1} where \{\varepsilon_t\} \sim WN(0,\sigma^2), find the spectral density function.

Solution:

  1. Find ACF: From the MA(1) structure with a_0 = 1, a_1 = \theta
    \gamma(0) = \sigma^2(1 + \theta^2)
    \gamma(1) = \gamma(-1) = \sigma^2\theta
    \gamma(k) = 0 \text{ for } |k| \geq 2
  2. Apply spectral density formula:

    For a linear process X_t = \sum_j a_j \varepsilon_{t-j}:

    f(\lambda) = \frac{\sigma^2}{2\pi}\left|\sum_{j} a_j e^{-ij\lambda}\right|^2
  3. Compute transfer function:
    H(e^{-i\lambda}) = \sum_{j=0}^1 a_j e^{-ij\lambda} = 1 + \theta e^{-i\lambda}
  4. Calculate modulus squared:
    |H(e^{-i\lambda})|^2 = (1 + \theta e^{-i\lambda})(1 + \theta e^{i\lambda})
    = 1 + \theta e^{-i\lambda} + \theta e^{i\lambda} + \theta^2
    = 1 + \theta^2 + 2\theta\cos(\lambda)
  5. Final spectral density:
    f(\lambda) = \frac{\sigma^2}{2\pi}(1 + \theta^2 + 2\theta\cos\lambda), \quad \lambda \in [-\pi, \pi]

Frequency Domain Interpretation:

If \theta > 0, the spectral density peaks at \lambda = 0 (low frequencies dominate). If \theta < 0, it peaks at \lambda = \pm\pi (high frequencies dominate). This explains why \theta < 0 produces oscillatory behavior.
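
A small numeric check of this interpretation (NumPy assumed; \theta = \pm 0.6 and \sigma^2 = 1 are arbitrary choices): evaluating f(\lambda) at \lambda = 0 and \lambda = \pi shows where the spectral mass concentrates for positive versus negative \theta.

```python
import numpy as np

def ma1_spectral_density(lam, theta, sigma2=1.0):
    """Spectral density of X_t = eps_t + theta * eps_{t-1}."""
    return sigma2 / (2 * np.pi) * (1 + theta**2 + 2 * theta * np.cos(lam))

for theta in (0.6, -0.6):
    f0  = ma1_spectral_density(0.0, theta)
    fpi = ma1_spectral_density(np.pi, theta)
    print(f"theta = {theta:+.1f}:  f(0) = {f0:.3f}   f(pi) = {fpi:.3f}")
```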

Advanced Topics

Deeper exploration of linear processes, spectral theory, and ergodicity

Linear Processes and Wold Decomposition

Linear Process Definition

A process \{X_t\} is a linear process if it can be written as:

X_t = \sum_{j=-\infty}^{\infty} \psi_j \varepsilon_{t-j}

where \{\varepsilon_t\} \sim WN(0,\sigma^2) and \sum_{j=-\infty}^{\infty} |\psi_j| < \infty (absolutely summable coefficients).

Causal Representation

If \psi_j = 0 for j < 0:

X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}

Current value depends only on current and past innovations (causal / one-sided).

Autocovariance Formula

For a linear process with E[X_t] = 0:

\gamma(k) = \sigma^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+k}

Absolutely summable \psi_j ensures absolute convergence of \gamma(k).

Wold Decomposition Theorem

Any zero-mean stationary process \{X_t\} can be uniquely decomposed as:

X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} + V_t

where \{\varepsilon_t\} is white noise, \{V_t\} is deterministic (perfectly predictable from its past), and the two components are uncorrelated. This separates the stochastic and deterministic components.

Practical Significance

Wold decomposition shows that purely nondeterministic processes (Vₜ = 0) can be represented as an infinite-order MA. This includes all stationary ARMA processes, making the linear representation fundamental to time series modeling.
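
For a causal linear process, the autocovariance formula can be evaluated by truncating the weight sequence. A commonly cited example, stated here as an assumption since AR models are only covered in the next module, takes \psi_j = \phi^j, for which \gamma(k) = \sigma^2\phi^k/(1-\phi^2) (Python with NumPy assumed):

```python
import numpy as np

# Causal weights psi_j = phi**j (an assumed example; this is the MA(infinity)
# form of an AR(1)). Then gamma(k) = sigma^2 * sum_j psi_j psi_{j+k}
#                                  = sigma^2 * phi^k / (1 - phi^2).
phi, sigma2, k = 0.7, 1.0, 3
psi = phi ** np.arange(500)                 # truncate the absolutely summable weights
gamma_trunc = sigma2 * np.sum(psi[: len(psi) - k] * psi[k:])
gamma_exact = sigma2 * phi**k / (1 - phi**2)
print(round(gamma_trunc, 6), round(gamma_exact, 6))
```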

Spectral Density and Frequency Domain

Spectral Representation

The autocovariance function \gamma(k) and spectral density f(\lambda) form a Fourier transform pair:

Forward (ACF → Spectral Density):

f(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k) e^{-ik\lambda}, \quad \lambda \in [-\pi,\pi]

Inverse (Spectral Density → ACF):

\gamma(k) = \int_{-\pi}^{\pi} e^{ik\lambda} f(\lambda) \, d\lambda
Properties
  • Non-negative: f(\lambda) \geq 0
  • Even function: f(-\lambda) = f(\lambda)
  • Real-valued for real processes
  • Integrates to variance: \int_{-\pi}^{\pi} f(\lambda) \, d\lambda = \gamma(0)
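
A brief numerical check of the Fourier pair for the MA(1) autocovariance from Example 4 (NumPy assumed): build f(\lambda) from \gamma(k), then recover \gamma(k) by numerically integrating e^{ik\lambda} f(\lambda) over [-\pi, \pi].

```python
import numpy as np

# MA(1) autocovariances from Example 4: gamma(0), gamma(1); zero beyond lag 1.
sigma2, theta = 1.0, 0.6
g0, g1 = sigma2 * (1 + theta**2), sigma2 * theta

# Forward transform: f(lambda) = (1/2pi) * (g0 + 2*g1*cos(lambda))
lam = np.linspace(-np.pi, np.pi, 4096, endpoint=False)   # uniform grid over one period
dlam = lam[1] - lam[0]
f = (g0 + 2 * g1 * np.cos(lam)) / (2 * np.pi)

# Inverse transform: gamma(k) = integral of cos(k*lambda) f(lambda) d lambda (f is even)
for k in (0, 1, 2):
    gamma_k = np.sum(np.cos(k * lam) * f) * dlam
    print(k, round(gamma_k, 4))          # expect 1.36, 0.60, 0.00
```
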
Interpretation

f(\lambda) describes how variance is distributed across frequencies \lambda:

  • Peak at \lambda = 0: low-frequency (trend) dominance
  • Peak at \lambda = \pm\pi: high-frequency (oscillation) dominance
  • Flat spectrum: white noise (all frequencies equally)
Linear Filter in Frequency Domain

If Y_t = \sum_j h_j X_{t-j} with transfer function H(e^{-i\lambda}) = \sum_j h_j e^{-ij\lambda}, then:

f_Y(\lambda) = |H(e^{-i\lambda})|^2 f_X(\lambda)

This elegant result shows filtering modifies the spectral density by the squared magnitude of the frequency response.
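
The sketch below checks this relation for white-noise input, where f_X(\lambda) = \sigma^2/(2\pi), by computing f_Y two ways: through |H|^2 f_X and through the Fourier sum of the filtered autocovariances \gamma_Y(k) = \sigma^2\sum_j h_j h_{j+k}. The filter h = (0.25, 0.5, 0.25) and the frequency are arbitrary examples (NumPy assumed).

```python
import numpy as np

sigma2 = 1.0
h = np.array([0.25, 0.5, 0.25])            # an arbitrary example filter
lam = 0.9                                   # an arbitrary frequency in [-pi, pi]

# Route 1: f_Y(lambda) = |H(e^{-i lambda})|^2 * f_X(lambda), with f_X = sigma2/(2 pi)
H = np.sum(h * np.exp(-1j * np.arange(len(h)) * lam))
route1 = np.abs(H) ** 2 * sigma2 / (2 * np.pi)

# Route 2: Fourier sum of gamma_Y(k) = sigma2 * sum_j h_j h_{j+k} (white-noise input)
gamma = {k: sigma2 * np.sum(h[: len(h) - k] * h[k:]) for k in range(len(h))}
route2 = (gamma[0] + 2 * sum(gamma[k] * np.cos(k * lam) for k in range(1, len(h)))) / (2 * np.pi)

print(round(route1, 6), round(route2, 6))   # the two routes agree
```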

Engineering Applications

Spectral analysis enables signal extraction: design filters to pass desired frequencies and attenuate others. Used extensively in communications (bandpass filters), seismology (earthquake signal isolation), and economics (trend-cycle decomposition).

Ergodicity and Long-Run Behavior

Ergodicity Definition

A stationary process \{X_t\} is ergodic for the mean if:

\frac{1}{n}\sum_{t=1}^n X_t \xrightarrow{\text{a.s.}} E[X_1] = \mu \quad \text{as } n \to \infty

This means time averages converge to ensemble averages almost surely. A single long realization contains the same information as infinitely many independent short realizations.

Sufficient Condition

If the ACF satisfies:

\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n |\gamma(k)| = 0

then the process is ergodic for the mean. This holds if \gamma(k) \to 0 as k \to \infty (mixing condition).

Statistical Inference

With ergodicity, we can estimate:

  • Mean: \hat{\mu} = n^{-1}\sum_t X_t
  • Variance: \hat{\gamma}(0) = n^{-1}\sum_t (X_t-\hat{\mu})^2
  • Autocovariance at lag k: \hat{\gamma}(k) = n^{-1}\sum_t (X_t-\hat{\mu})(X_{t+k}-\hat{\mu})

from a single realization, which is crucial for real-world data analysis.

Ergodic Theorem for Stationary Sequences

For a stationary ergodic sequence and any measurable function g with E[|g(X_t)|] < \infty:

\frac{1}{n}\sum_{t=1}^n g(X_t) \xrightarrow{\text{a.s.}} E[g(X_1)] \quad \text{as } n \to \infty

This generalizes the law of large numbers to dependent sequences, enabling consistent estimation of expectations of any functional of the process.
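
A minimal demonstration (NumPy assumed; the MA(1) example is our own): time averages of X_t and X_t^2 computed from one long realization land close to the ensemble moments \mu = 0 and \gamma(0) = \sigma^2(1+\theta^2).

```python
import numpy as np

rng = np.random.default_rng(5)
theta, sigma2, n = 0.6, 1.0, 1_000_000
eps = rng.normal(0.0, np.sqrt(sigma2), size=n + 1)
x = eps[1:] + theta * eps[:-1]              # one long MA(1) realization

print("time-average mean   :", round(x.mean(), 4), "   ensemble value:", 0.0)
print("time-average of X^2 :", round(np.mean(x**2), 4), "   ensemble value:", sigma2 * (1 + theta**2))
```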

Why Ergodicity Matters

Without ergodicity: We'd need multiple independent time series realizations to estimate population moments.

With ergodicity: A single sufficiently long series provides consistent estimates. This is the foundation for all empirical time series analysis—we almost never have multiple realizations, yet ergodicity validates using one long series for estimation and inference.

Practical Stationarity Testing

Diagnostic Tools

Before applying stationary process models, verify stationarity using these methods:

1. Visual Inspection
  • Plot the series: does mean/variance appear constant?
  • Check ACF plot: should decay towards zero
  • Look for trends or seasonal patterns
2. Augmented Dickey-Fuller (ADF) Test

Tests null hypothesis: process has unit root (non-stationary)

\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{j=1}^p \delta_j \Delta X_{t-j} + \varepsilon_t

Test H_0: \gamma = 0 vs H_1: \gamma < 0. Reject H_0 → stationary.

3. KPSS Test

Tests null hypothesis: process is stationary (opposite of ADF)

Decomposes X_t = \xi t + r_t + \varepsilon_t (trend + random walk + error). Test H_0: \text{Var}(r_t) = 0. Reject H_0 → non-stationary.

4. Phillips-Perron Test

Similar to ADF but robust to heteroskedasticity and serial correlation. Uses non-parametric correction to the test statistic.
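
A minimal sketch of the ADF and KPSS tests using statsmodels (assumed installed); applied to a simulated random walk and to its first difference, the two tests should point in opposite directions for the two series. Note that KPSS reports p-values at its lookup-table bounds (with a warning) when the statistic falls outside the table.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(6)
random_walk = np.cumsum(rng.normal(size=1000))   # unit-root (non-stationary) series
differenced = np.diff(random_walk)               # its stationary increments

for name, series in [("random walk", random_walk), ("differenced", differenced)]:
    adf_p = adfuller(series, autolag="AIC")[1]               # H0: unit root
    kpss_p = kpss(series, regression="c", nlags="auto")[1]   # H0: stationary
    print(f"{name:12s}  ADF p-value: {adf_p:.3f}   KPSS p-value: {kpss_p:.3f}")
```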

Making Data Stationary

If tests indicate non-stationarity:

  • Differencing: \nabla X_t = X_t - X_{t-1} removes trends
  • Log transformation: stabilizes variance
  • Detrending: remove deterministic trend
  • Seasonal differencing: \nabla_s X_t = X_t - X_{t-s} for seasonality

After transformation, re-test stationarity before modeling.
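
Building on the list above, a short sketch of the common transforms on a toy series (NumPy assumed; the simulated series and the seasonal period s = 12 are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(240)
# Toy positive series with an exponential trend, yearly seasonality, and noise
y = np.exp(0.01 * t + 0.3 * np.sin(2 * np.pi * t / 12) + 0.1 * rng.normal(size=t.size))

log_y     = np.log(y)                                          # stabilize variance
detrended = log_y - np.polyval(np.polyfit(t, log_y, 1), t)     # remove a linear trend
diff1     = np.diff(log_y)                                     # first difference
sdiff12   = log_y[12:] - log_y[:-12]                           # seasonal difference, s = 12
print(round(diff1.mean(), 3), round(sdiff12.mean(), 3))
```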

Real-World Applications

Where stationary processes theory meets practice

Quantitative Finance

Asset returns (log-returns) are often modeled as stationary processes, unlike prices which are non-stationary (random walks).

Application:

Volatility modeling (ARCH/GARCH) assumes stationarity of the squared returns series to forecast risk (VaR).

Signal Processing

Noise in communication channels is modeled as stationary random processes (often Gaussian white noise).

Application:

Wiener filters use stationarity assumptions to optimally separate signal from noise, minimizing mean square error.

Geophysics & Meteorology

Climate indices (e.g., SOI, NAO) and seismic background noise are analyzed as stationary series after detrending.

Application:

Spectral analysis identifies dominant cycles (e.g., El Niño periodicity) in environmental data series.

Frequently Asked Questions

What's the difference between weak and strict stationarity?

Weak (second-order) stationarity only requires a constant mean and an autocovariance that depends only on the lag, so it involves just the first two moments. Strict stationarity requires all finite-dimensional distributions to be invariant under time shifts. Strict stationarity implies weak stationarity if second moments exist, but not vice versa. Gaussian processes are an exception: weakly stationary Gaussian processes are also strictly stationary.

Why is the autocovariance function non-negative definite?

For any coefficients a₁,...,aₙ and times t₁,...,tₙ, the variance of the linear combination Σᵢ aᵢXₜᵢ must be non-negative. Expanding this variance: Var(Σᵢ aᵢXₜᵢ) = Σᵢ Σⱼ aᵢaⱼγ(tᵢ-tⱼ) = aᵀΓa ≥ 0. This algebraic property is fundamental and ensures the covariance matrix is positive semidefinite.

Can a non-stationary process have stationary increments?

Yes! Random walk Xₜ = Σᵢ₌₁ᵗ εᵢ (where εᵢ is white noise) is non-stationary since Var(Xₜ) = tσ² grows with time. However, its increments ΔXₜ = Xₜ - Xₜ₋₁ = εₜ are stationary. This distinction is crucial: ARIMA models difference non-stationary series to achieve stationarity.

How does linear filtering affect stationarity?

If {Xₜ} is stationary and we apply a linear filter Yₜ = Σⱼ hⱼXₜ₋ⱼ with absolutely summable coefficients Σ|hⱼ| < ∞, then {Yₜ} is also stationary. The filtered process has autocovariance γY(k) = ΣⱼΣᵢ hⱼhᵢγX(k+j-i). Linear filters preserve stationarity while modifying the spectral characteristics.

What is the relationship between ACF and PACF?

ACF (autocorrelation function) measures direct and indirect correlations at each lag. PACF (partial autocorrelation function) measures only the direct correlation after removing intermediate lag effects. For MA(q): ACF cuts off after lag q, PACF decays. For AR(p): PACF cuts off after lag p, ACF decays. This diagnostic property helps identify model order.

Why do we need the spectral density function?

The spectral density f(λ) represents how variance is distributed across frequencies. It forms a Fourier pair with the autocovariance: γₖ = ∫₋π^π e^{ikλ} f(λ) dλ. While the ACF captures time-domain dependence, the spectral density reveals frequency-domain structure. Peaks in f(λ) indicate dominant cyclical components. This duality is essential for filter design and signal extraction.

What does ergodicity mean for time series?

An ergodic process allows time averages to converge to ensemble averages. Specifically, if {Xₜ} is ergodic for the mean, then (1/n)Σₜ₌₁ⁿ Xₜ → E[X₁] almost surely as n → ∞. This is crucial for statistical inference: with ergodicity, a single long realization provides information about population moments. Without ergodicity, we'd need multiple independent realizations.

How do we test for stationarity in practice?

Common tests include: (1) Augmented Dickey-Fuller (ADF) test for unit roots (null: non-stationary), (2) KPSS test (null: stationary), (3) Phillips-Perron test (robust to heteroskedasticity), (4) Visual inspection: plot ACF (should decay) and check if mean/variance appear constant. Use differencing or detrending if tests reject stationarity.

Historical Development

The evolution of stationary process theory

1927: Yule & Slutsky

Birth of Time Series Analysis

G.U. Yule modeled sunspot numbers using autoregressive (AR) schemes. Independently, E. Slutsky showed that moving averages of random events could generate cyclic-like behavior, challenging the idea that economic cycles must have deterministic causes.

1934: A.Y. Khinchin

Correlation Theory

Khinchin established the rigorous mathematical foundation for stationary processes, defining the correlation function and proving the spectral representation theorem (Wiener-Khinchin theorem).

1938: Herman Wold

Wold Decomposition

In his thesis 'A Study in the Analysis of Stationary Time Series', Wold proved that any stationary process can be decomposed into a deterministic part and a purely non-deterministic (linear) part.

1941: A.N. Kolmogorov

Prediction Theory

Kolmogorov solved the fundamental problem of linear prediction for stationary sequences, deriving the formula for the mean square prediction error in terms of the spectral density.

Chapter Summary

Core Concepts

  • Stationarity: Statistical properties (mean, variance, ACF) are time-invariant.
  • ACF: Measures linear dependence over time; key for model identification.
  • White Noise: The fundamental building block; uncorrelated, zero mean, constant variance.

Advanced Theory

  • Spectral Density: Frequency domain equivalent of ACF; reveals cyclical structure.
  • Linear Processes: Output of linear filters applied to white noise (Wold's Theorem).
  • Ergodicity: Justifies using single time series realizations for statistical inference.

"Stationarity is the assumption that allows us to learn from the past to predict the future. Without it, the past is just a sequence of unique events."

Further Reading

Recommended textbooks for deeper study

Time Series: Theory and Methods
Brockwell & Davis (1991)

The definitive reference for rigorous mathematical theory of time series. Essential for understanding the proofs and Hilbert space foundations presented in this course.

Time Series Analysis
James D. Hamilton (1994)

The standard text for econometrics. Excellent coverage of stationarity, unit roots, and vector autoregressions (VAR) with economic applications.

Time Series Analysis and Its Applications
Shumway & Stoffer (2017)

A modern, accessible approach with extensive R examples. Balances theory with practical implementation and real-world data analysis.

Time Series Analysis: Forecasting and Control
Box, Jenkins, Reinsel & Ljung (2015)

The classic engineering text that introduced the ARIMA methodology (Box-Jenkins method). Focuses on model identification, estimation, and diagnostic checking.

Next Module: ARMA Models