
Stationary Processes

Mathematical foundations of time series: stationarity, autocovariance structure, and spectral analysis

Learning Objectives
What you'll master in this comprehensive module
  • Understand the rigorous definitions of weak and strict stationarity
  • Master autocovariance and autocorrelation function properties
  • Analyze white noise processes and their fundamental role
  • Apply linear transformation theorems to stationary sequences
  • Implement linear filtering techniques for signal processing
  • Explore spectral density and frequency domain analysis
  • Understand ergodic theorems and long-run behavior

Essential Definitions

Rigorous mathematical foundations of stationary processes

Weak Stationary Process

A stochastic process {Xₜ : t ∈ ℤ} is weakly stationary if:

\text{Cov}(X_t, X_s) = \gamma_{|t-s|} = E[(X_t - \mu)(X_s - \mu)]

Conditions:

  • Second moment exists: E[Xₜ²] < ∞ for all t
  • Constant mean: E[Xₜ] = μ for all t
  • Autocovariance depends only on lag: Cov(Xₜ, Xₛ) = γ(t-s)
Key Insight

Weak stationarity ensures the process has constant statistical structure over time, making it amenable to forecasting and modeling.

Autocovariance Function

For a stationary process with mean μ, the autocovariance function at lag k is:

\gamma_k = \text{Cov}(X_t, X_{t+k}) = E[(X_t - \mu)(X_{t+k} - \mu)]

Properties:

  • Symmetry: γₖ = γ₋ₖ
  • Non-negative definiteness: Γₙ is positive semidefinite
  • Bound: |γₖ| ≤ γ₀ (variance is maximum)
  • γ₀ = Var(Xₜ)
Key Insight

The autocovariance function captures the linear dependence structure between observations at different time points.

Autocorrelation Function (ACF)

The autocorrelation function is the normalized autocovariance:

\rho_k = \frac{\gamma_k}{\gamma_0} = \text{Corr}(X_t, X_{t+k})

Properties:

  • ρ₀ = 1 (perfect correlation at lag 0)
  • |ρₖ| ≤ 1 for all k
  • ρₖ = ρ₋ₖ (symmetric)
  • Dimensionless measure
Key Insight

ACF provides a scale-free measure of temporal dependence, crucial for identifying model structure in ARMA processes.

White Noise Process

A sequence {εₜ} is white noise WN(μ, σ²) if:

E[\varepsilon_t] = \mu, \quad \text{Cov}(\varepsilon_t, \varepsilon_s) = \sigma^2 \delta_{ts}

Types:

  • Independent White Noise: {εₜ} are i.i.d.
  • Normal White Noise: εₜ ~ N(μ, σ²) i.i.d.
  • Uncorrelated White Noise: Only covariance = 0
Key Insight

White noise is the building block for linear time series models (MA, AR, ARMA). Independent white noise is stronger than uncorrelated white noise.
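
To connect these definitions to data, here is a minimal Python sketch (NumPy assumed; the helper name sample_acf is ours, not a library function) that estimates the ACF from a sample and checks that simulated white noise has autocorrelations near zero at every nonzero lag:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho_hat(k), k = 0..max_lag (biased 1/n covariance estimator)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    gamma0 = np.sum(xc * xc) / n                     # sample gamma(0) = variance
    acf = [1.0]
    for k in range(1, max_lag + 1):
        acf.append(np.sum(xc[: n - k] * xc[k:]) / n / gamma0)
    return np.array(acf)

# White noise: the sample ACF should be near zero at every lag k >= 1.
rng = np.random.default_rng(0)
eps = rng.normal(0.0, 1.0, size=5000)
print(np.round(sample_acf(eps, 5), 3))
```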

Rigorous Theorem Proofs

Step-by-step mathematical derivations of fundamental theorems

Theorem 1: Linear Transformation Preserves Stationarity
Fundamental Property

Affine transformations maintain stationary structure

Theorem Statement

Let \{X_t\} be stationary with mean \mu and ACF \gamma_X(k). Define Y_t = aX_t + b. Then \{Y_t\} is stationary with mean a\mu + b and ACF \gamma_Y(k) = a^2\gamma_X(k).

Proof

Step 1: Second Moment Verification

Show that the second moment of Y_t exists and is finite.

E[Y_t^2] = E[(aX_t + b)^2] = a^2E[X_t^2] + 2abE[X_t] + b^2 = a^2E[X_t^2] + 2ab\mu + b^2 < \infty

Step 2: Constant Mean Property

Verify that the mean does not depend on time t.

E[Y_t] = E[aX_t + b] = aE[X_t] + b = a\mu + b

Step 3: Autocovariance Derivation

Compute the autocovariance function at lag k.

\gamma_Y(k) = E[(Y_t - E[Y_t])(Y_{t+k} - E[Y_{t+k}])] = E[a(X_t-\mu) \cdot a(X_{t+k}-\mu)] = a^2\gamma_X(k)

Step 4: Lag-Dependence Only

Confirm the autocovariance depends solely on lag k, not on t.

\gamma_Y(k) = a^2\gamma_X(k) \text{ is a function of } k \text{ alone}

Step 5: Standardization Application

Setting a = 1/\sqrt{\gamma_X(0)} and b = -\mu/\sqrt{\gamma_X(0)} gives the standardized process.

Z_t = \frac{X_t - \mu}{\sqrt{\gamma_X(0)}} \text{ with } E[Z_t] = 0, \ \text{Var}(Z_t) = 1

Step 6: Conclusion

All three stationarity conditions are satisfied, completing the proof.

\therefore \{Y_t\} \text{ is stationary.} \quad \blacksquare
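
As an informal numerical illustration of Theorem 1 (not part of the proof), the sketch below simulates an assumed stationary MA(1) input in Python (NumPy assumed; sample_acvf is our illustrative helper), applies Y_t = aX_t + b, and checks that the sample autocovariance scales by a² while the standardized process from Step 5 has mean near 0 and variance near 1:

```python
import numpy as np

def sample_acvf(v, k):
    """Biased sample autocovariance at lag k."""
    vc = v - v.mean()
    return np.sum(vc[: len(v) - k] * vc[k:]) / len(v)

rng = np.random.default_rng(1)
eps = rng.normal(size=200_001)
x = 2.0 + eps[1:] + 0.5 * eps[:-1]       # assumed stationary input: MA(1) with mean 2

a, b = 3.0, 5.0
y = a * x + b                             # affine transformation Y_t = a X_t + b
print(round(y.mean(), 3), round(a * x.mean() + b, 3))                    # mean a*mu + b
print(round(sample_acvf(y, 1), 3), round(a**2 * sample_acvf(x, 1), 3))   # gamma_Y = a^2 gamma_X

z = (x - x.mean()) / np.sqrt(sample_acvf(x, 0))   # standardization from Step 5
print(round(z.mean(), 3), round(z.var(), 3))      # approximately 0 and 1
```
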
Theorem 2: Properties of Autocovariance Function
Structural Constraints

Essential mathematical properties that all ACFs must satisfy

Theorem Statement

The autocovariance function \gamma_k satisfies:

  1. Symmetry: \gamma_k = \gamma_{-k}
  2. Non-negative definiteness: \mathbf{a}^T\Gamma_n\mathbf{a} \geq 0 for any \mathbf{a}
  3. Boundedness: |\gamma_k| \leq \gamma_0

Proofs

Step 1: Symmetry Proof

Covariance is symmetric in its arguments by definition.

\gamma_k = \text{Cov}(X_t,X_{t+k}) = \text{Cov}(X_{t+k},X_t) = \gamma_{-k} \quad \blacksquare

Step 2: Non-negative Definiteness Setup

Consider arbitrary coefficients a_1,\ldots,a_n and times t_1,\ldots,t_n.

\mathbf{a}^T\Gamma_n\mathbf{a} = \sum_{i,j} a_ia_j\gamma_{t_i-t_j} = \sum_{i,j} a_ia_jE[(X_{t_i}-\mu)(X_{t_j}-\mu)]

Step 3: Exchange Sum and Expectation

By linearity of expectation, the double sum of expectations equals the expectation of the product of sums, which is a perfect square.

= E\left[\sum_i a_i(X_{t_i}-\mu) \sum_j a_j(X_{t_j}-\mu)\right] = E\left[\left(\sum_i a_i(X_{t_i}-\mu)\right)^2\right]

Step 4: Non-negativity Conclusion

The expectation of a square is always non-negative.

E\left[\left(\sum_i a_i(X_{t_i}-\mu)\right)^2\right] \geq 0 \quad \blacksquare

Step 5: Boundedness via Cauchy-Schwarz

Apply the Cauchy-Schwarz inequality to covariances.

[\text{Cov}(X_t,X_{t+k})]^2 \leq \text{Var}(X_t)\text{Var}(X_{t+k}) = \gamma_0^2

Step 6: Final Bound

Taking square roots gives the desired inequality.

|\gamma_k| \leq \gamma_0 \text{ for all } k \quad \blacksquare
Herglotz's Theorem (Converse)

Any sequence satisfying these three properties can be realized as the autocovariance of some stationary process. This profound result connects time-domain analysis to spectral theory via the spectral representation theorem.

Theorem 3: Sum of Orthogonal Stationary Processes
Decomposition Theory

Orthogonal processes combine to form stationary processes

Theorem Statement

If \{X_t\} and \{Y_t\} are stationary with \text{Cov}(X_t,Y_s) = 0 for all t,s, then Z_t = X_t + Y_t is stationary with \gamma_Z(k) = \gamma_X(k) + \gamma_Y(k).

Proof

Step 1: Second Moment Finiteness

Use Cauchy-Schwarz to bound the cross-product term.

E[Z_t^2] = E[X_t^2] + 2E[X_tY_t] + E[Y_t^2] \leq E[X_t^2] + 2\sqrt{E[X_t^2]E[Y_t^2]} + E[Y_t^2] < \infty

Step 2: Constant Mean

Expectation is linear.

E[Z_t] = E[X_t] + E[Y_t] = \mu_X + \mu_Y = \mu_Z

Step 3: Expand Autocovariance

Use bilinearity of covariance.

\gamma_Z(k) = E[(X_t+Y_t-\mu_Z)(X_{t+k}+Y_{t+k}-\mu_Z)]

Step 4: Distribute Products

Expand into four terms.

= E[(X_t-\mu_X)(X_{t+k}-\mu_X)] + E[(Y_t-\mu_Y)(Y_{t+k}-\mu_Y)] + E[(X_t-\mu_X)(Y_{t+k}-\mu_Y)] + E[(Y_t-\mu_Y)(X_{t+k}-\mu_X)]

Step 5: Apply Orthogonality

Both cross-covariances vanish by assumption.

\text{Cov}(X_t,Y_{t+k}) = \text{Cov}(Y_t,X_{t+k}) = 0 \implies \gamma_Z(k) = \gamma_X(k) + \gamma_Y(k)

Step 6: Lag-Dependence Verified

The result depends only on lag k, confirming stationarity.

\gamma_Z(k) \text{ is a function of } k \text{ alone} \quad \blacksquare
Decomposition Application

This theorem justifies additive decompositions like X_t = T_t + S_t + \varepsilon_t (trend + seasonal + noise). If components are uncorrelated, we can analyze each separately and simply add their autocovariances.
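
A quick simulation-based sanity check of Theorem 3 (Python with NumPy assumed; the components, an MA(1) plus an independent scaled white noise, are our own example): the sample autocovariance of the sum should be close to the sum of the component autocovariances.

```python
import numpy as np

def sample_acvf(v, k):
    """Biased sample autocovariance at lag k."""
    vc = v - v.mean()
    return np.sum(vc[: len(v) - k] * vc[k:]) / len(v)

rng = np.random.default_rng(2)
n = 500_000
e1, e2 = rng.normal(size=n + 1), rng.normal(size=n)
x = e1[1:] + 0.5 * e1[:-1]     # MA(1): gamma_X(0) = 1.25, gamma_X(1) = 0.5
y = 2.0 * e2                   # independent white noise: gamma_Y(0) = 4, gamma_Y(1) = 0
z = x + y

for k in (0, 1):
    print(k, round(sample_acvf(z, k), 3), round(sample_acvf(x, k) + sample_acvf(y, k), 3))
```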

Detailed Worked Examples

Complete step-by-step solutions with rigorous calculations

Example 1: Harmonic Process with Random Phase

Problem:

Let U \sim \text{Uniform}(-\pi, \pi) and define X_t = b\cos(at + U) where a, b are constants. Prove that \{X_t\} is stationary and find its autocovariance function.

Solution:

  1. Calculate Mean:
    E[X_t] = E[b\cos(at + U)] = b\int_{-\pi}^{\pi} \cos(at + u) \frac{1}{2\pi} \, du

    Using the substitution v = at + u:

    = \frac{b}{2\pi}[\sin(at + u)]_{-\pi}^{\pi} = \frac{b}{2\pi}[\sin(at+\pi) - \sin(at-\pi)]
    = \frac{b}{2\pi}[-\sin(at) - (-\sin(at))] = 0
  2. Compute Autocovariance: Since E[X_t] = 0, we have \gamma(k) = E[X_t X_{t+k}]
    E[X_t X_{t+k}] = E[b\cos(at+U) \cdot b\cos(a(t+k)+U)]
    = b^2 \int_{-\pi}^{\pi} \cos(at+u)\cos(a(t+k)+u) \frac{du}{2\pi}

    Apply the product-to-sum formula: \cos A \cos B = \frac{1}{2}[\cos(A+B) + \cos(A-B)]

    = \frac{b^2}{4\pi} \int_{-\pi}^{\pi} [\cos(2at+ak+2u) + \cos(ak)] \, du
  3. Evaluate Integral:

    The first term integrates to zero (over a full period):

    \int_{-\pi}^{\pi} \cos(2at+ak+2u) \, du = \frac{1}{2}[\sin(2at+ak+2u)]_{-\pi}^{\pi} = 0

    The second term:

    \int_{-\pi}^{\pi} \cos(ak) \, du = 2\pi\cos(ak)
  4. Final Result:
    \gamma(k) = \frac{b^2}{4\pi} \cdot 2\pi \cos(ak) = \frac{b^2}{2}\cos(ak)

    This depends only on lag k, confirming stationarity. The variance is \gamma(0) = b^2/2.

Key Insight:

A uniformly distributed random phase U ensures stationarity despite the deterministic cosine structure. The ACF \rho(k) = \cos(ak) oscillates forever without decaying, which is typical of pure periodic signals.
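
To see the result numerically, the following sketch (NumPy assumed; the parameter values a = 0.7, b = 2 and the lag k = 3 are arbitrary choices) estimates E[X_t X_{t+k}] by Monte Carlo over the random phase U and compares it with the derived (b²/2)cos(ak):

```python
import numpy as np

rng = np.random.default_rng(3)
a, b = 0.7, 2.0
n_rep, k, t = 200_000, 3, 10          # replications, lag to check, arbitrary time index

U = rng.uniform(-np.pi, np.pi, size=n_rep)
x_t  = b * np.cos(a * t + U)
x_tk = b * np.cos(a * (t + k) + U)

print("Monte Carlo E[X_t X_{t+k}]:", round(np.mean(x_t * x_tk), 3))
print("Theory (b^2/2) cos(ak)    :", round(0.5 * b**2 * np.cos(a * k), 3))
```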

Example 2: Finite Moving Average MA(q)

Problem:

Let \{\varepsilon_t\} \sim WN(0, \sigma^2) and define X_t = \sum_{j=0}^q a_j \varepsilon_{t-j}. Prove stationarity and derive the autocovariance function.

Solution:

  1. Verify Second Moment:
    E[X_t^2] = E\left[\left(\sum_{j=0}^q a_j\varepsilon_{t-j}\right)^2\right] = \sum_{j=0}^q \sum_{i=0}^q a_j a_i E[\varepsilon_{t-j}\varepsilon_{t-i}]

    Since white noise: E[\varepsilon_s\varepsilon_r] = \sigma^2\delta_{sr}

    = \sum_{j=0}^q a_j^2 \sigma^2 = \sigma^2 \sum_{j=0}^q a_j^2 < \infty
  2. Check Mean:
    E[X_t] = E\left[\sum_{j=0}^q a_j\varepsilon_{t-j}\right] = \sum_{j=0}^q a_j E[\varepsilon_{t-j}] = 0
  3. Derive Autocovariance for lag k \geq 0:
    \gamma(k) = E[X_t X_{t+k}] = E\left[\sum_{j=0}^q a_j\varepsilon_{t-j} \sum_{i=0}^q a_i\varepsilon_{t+k-i}\right]
    = \sum_{j=0}^q \sum_{i=0}^q a_j a_i E[\varepsilon_{t-j}\varepsilon_{t+k-i}]

    Non-zero only when t-j = t+k-i, i.e., i = j+k

  4. Simplify for different lags:

    For 0 \leq k \leq q:

    \gamma(k) = \sigma^2 \sum_{j=0}^{q-k} a_j a_{j+k}

    For |k| > q:

    \gamma(k) = 0 \quad \text{(ACF cuts off!)}

    By symmetry: \gamma(-k) = \gamma(k)

Key Insight:

MA(q) processes are always stationary (finite weights guarantee finite variance). The ACF cuts off after lag q, a diagnostic signature used in model identification. The variance is \gamma(0) = \sigma^2\sum_{j=0}^q a_j^2.
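
The derived formula is easy to verify by simulation. The sketch below (NumPy assumed; the MA(2) coefficients are an arbitrary example of ours) compares the theoretical \gamma(k) = \sigma^2\sum_j a_j a_{j+k} with sample autocovariances, including the cut-off at |k| > q:

```python
import numpy as np

def ma_autocov(coeffs, sigma2, k):
    """Theoretical gamma(k) = sigma^2 * sum_j a_j a_{j+|k|} for an MA(q) process."""
    a = np.asarray(coeffs, dtype=float)
    k = abs(k)
    if k >= len(a):
        return 0.0
    return sigma2 * np.sum(a[: len(a) - k] * a[k:])

coeffs, sigma2 = [1.0, 0.5, -0.3], 1.0            # an arbitrary MA(2) example
rng = np.random.default_rng(4)
eps = rng.normal(0.0, np.sqrt(sigma2), size=300_000)
x = np.convolve(eps, coeffs, mode="valid")        # X_t = sum_j a_j eps_{t-j}

n = len(x)
xc = x - x.mean()
for k in range(4):                                # gamma(3) should be ~0 (cut-off)
    gamma_hat = np.sum(xc[: n - k] * xc[k:]) / n
    print(k, round(gamma_hat, 3), round(ma_autocov(coeffs, sigma2, k), 3))
```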

Example 3: Rectangular Window Filter

Problem:

Apply a rectangular window filter to a stationary process \{X_t\} with autocovariance \gamma_X(k):

Y_t = \frac{1}{2M+1}\sum_{j=-M}^M X_{t-j}

Find the autocovariance of \{Y_t\}.

Solution:

  1. Identify filter coefficients: h_j = \frac{1}{2M+1} for |j| \leq M, else h_j = 0
  2. Apply the general filtering formula:
    \gamma_Y(k) = \sum_{l=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} h_l h_j \gamma_X(k+l-j)

    With the finitely supported h_j:

    = \sum_{l=-M}^M \sum_{j=-M}^M \frac{1}{(2M+1)^2} \gamma_X(k+l-j)
  3. Change variables: Let m = l-j, so m \in [-2M, 2M]
    \gamma_Y(k) = \frac{1}{(2M+1)^2} \sum_{m=-2M}^{2M} w_m \gamma_X(k+m)

    where w_m counts the pairs (l,j) such that l-j = m with |l|,|j| \leq M

  4. Determine weights w_m: For |m| \leq 2M, w_m = 2M+1-|m|, a triangular weight that decreases linearly to zero as |m| grows

    Simplified form:

    \gamma_Y(k) = \frac{1}{2M+1}\sum_{m=-2M}^{2M} \left(1 - \frac{|m|}{2M+1}\right) \gamma_X(k+m)

Interpretation:

The rectangular filter smooths the original process by averaging nearby values. The filtered ACF is a weighted average of the original ACF, with weights forming a triangular kernel. Larger M increases smoothing but reduces responsiveness to changes.
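
The double-sum filtering formula can be implemented directly. In the sketch below (NumPy assumed), filtered_autocov is our illustrative helper and the input ACF \gamma_X(k) = 0.8^{|k|} is an assumed example, not something derived in this module:

```python
import numpy as np

def filtered_autocov(h, gamma_x, k):
    """gamma_Y(k) = sum_l sum_j h_l h_j gamma_X(k + l - j) for a finite symmetric filter h.
    gamma_x is a callable giving the input autocovariance at an integer lag."""
    idx = np.arange(len(h)) - (len(h) - 1) // 2   # support j = -M..M for odd-length h
    total = 0.0
    for l, hl in zip(idx, h):
        for j, hj in zip(idx, h):
            total += hl * hj * gamma_x(k + l - j)
    return total

M = 2
h = np.full(2 * M + 1, 1.0 / (2 * M + 1))          # rectangular window weights
gamma_x = lambda k: 0.8 ** abs(k)                   # assumed example input ACF
print([round(filtered_autocov(h, gamma_x, k), 4) for k in range(4)])
```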

Example 4: Spectral Density of MA(1)

Problem:

For X_t = \varepsilon_t + \theta\varepsilon_{t-1} where \{\varepsilon_t\} \sim WN(0,\sigma^2), find the spectral density function.

Solution:

  1. Find ACF: From the MA(1) structure with a_0 = 1, a_1 = \theta
    \gamma(0) = \sigma^2(1 + \theta^2)
    \gamma(1) = \gamma(-1) = \sigma^2\theta
    \gamma(k) = 0 \text{ for } |k| \geq 2
  2. Apply spectral density formula:

    For a linear process X_t = \sum_j a_j \varepsilon_{t-j}:

    f(\lambda) = \frac{\sigma^2}{2\pi}\left|\sum_{j} a_j e^{-ij\lambda}\right|^2
  3. Compute transfer function:
    H(e^{-i\lambda}) = \sum_{j=0}^1 a_j e^{-ij\lambda} = 1 + \theta e^{-i\lambda}
  4. Calculate modulus squared:
    |H(e^{-i\lambda})|^2 = (1 + \theta e^{-i\lambda})(1 + \theta e^{i\lambda})
    = 1 + \theta e^{-i\lambda} + \theta e^{i\lambda} + \theta^2
    = 1 + \theta^2 + 2\theta\cos(\lambda)
  5. Final spectral density:
    f(\lambda) = \frac{\sigma^2}{2\pi}(1 + \theta^2 + 2\theta\cos\lambda), \quad \lambda \in [-\pi, \pi]

Frequency Domain Interpretation:

If \theta > 0, the spectral density peaks at \lambda = 0 (low frequencies dominate). If \theta < 0, it peaks at \lambda = \pm\pi (high frequencies dominate). This explains why \theta < 0 produces oscillatory behavior.
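
A small numeric check of this interpretation (NumPy assumed; \theta = \pm 0.6 and \sigma^2 = 1 are arbitrary choices): evaluating f(\lambda) at \lambda = 0 and \lambda = \pi shows where the spectral mass concentrates for positive versus negative \theta.

```python
import numpy as np

def ma1_spectral_density(lam, theta, sigma2=1.0):
    """Spectral density of X_t = eps_t + theta * eps_{t-1}."""
    return sigma2 / (2 * np.pi) * (1 + theta**2 + 2 * theta * np.cos(lam))

for theta in (0.6, -0.6):
    f0  = ma1_spectral_density(0.0, theta)
    fpi = ma1_spectral_density(np.pi, theta)
    print(f"theta = {theta:+.1f}:  f(0) = {f0:.3f}   f(pi) = {fpi:.3f}")
```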

Advanced Topics

Deeper exploration of linear processes, spectral theory, and ergodicity

Linear Processes and Wold Decomposition

Linear Process Definition

A process \{X_t\} is a linear process if it can be written as:

X_t = \sum_{j=-\infty}^{\infty} \psi_j \varepsilon_{t-j}

where \{\varepsilon_t\} \sim WN(0,\sigma^2) and \sum_{j=-\infty}^{\infty} |\psi_j| < \infty (absolutely summable coefficients).

Causal Representation

If \psi_j = 0 for j < 0:

X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j}

Current value depends only on current and past innovations (causal / one-sided).

Autocovariance Formula

For a linear process with E[X_t] = 0:

\gamma(k) = \sigma^2 \sum_{j=-\infty}^{\infty} \psi_j \psi_{j+k}

Absolutely summable \psi_j ensures absolute convergence of \gamma(k).

Wold Decomposition Theorem

Any zero-mean stationary process \{X_t\} can be uniquely decomposed as:

X_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} + V_t

where \{\varepsilon_t\} is white noise, \{V_t\} is deterministic (perfectly predictable from its past), and the two components are uncorrelated. This separates the stochastic and deterministic components.

Practical Significance

Wold decomposition shows that purely nondeterministic processes (Vₜ = 0) can be represented as an infinite-order MA. This includes all stationary ARMA processes, making the linear representation fundamental to time series modeling.
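
For a causal linear process, the autocovariance formula can be evaluated by truncating the weight sequence. A commonly cited example, stated here as an assumption since AR models are only covered in the next module, takes \psi_j = \phi^j, for which \gamma(k) = \sigma^2\phi^k/(1-\phi^2) (Python with NumPy assumed):

```python
import numpy as np

# Causal weights psi_j = phi**j (an assumed example; this is the MA(infinity)
# form of an AR(1)). Then gamma(k) = sigma^2 * sum_j psi_j psi_{j+k}
#                                  = sigma^2 * phi^k / (1 - phi^2).
phi, sigma2, k = 0.7, 1.0, 3
psi = phi ** np.arange(500)                 # truncate the absolutely summable weights
gamma_trunc = sigma2 * np.sum(psi[: len(psi) - k] * psi[k:])
gamma_exact = sigma2 * phi**k / (1 - phi**2)
print(round(gamma_trunc, 6), round(gamma_exact, 6))
```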

Spectral Density and Frequency Domain

Spectral Representation

The autocovariance function \gamma(k) and spectral density f(\lambda) form a Fourier transform pair:

Forward (ACF → Spectral Density):

f(\lambda) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k) e^{-ik\lambda}, \quad \lambda \in [-\pi,\pi]

Inverse (Spectral Density → ACF):

\gamma(k) = \int_{-\pi}^{\pi} e^{ik\lambda} f(\lambda) \, d\lambda
Properties
  • Non-negative: f(\lambda) \geq 0
  • Even function: f(-\lambda) = f(\lambda)
  • Real-valued for real processes
  • Integrates to variance: \int_{-\pi}^{\pi} f(\lambda) \, d\lambda = \gamma(0)
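
A brief numerical check of the Fourier pair for the MA(1) autocovariance from Example 4 (NumPy assumed): build f(\lambda) from \gamma(k), then recover \gamma(k) by numerically integrating e^{ik\lambda} f(\lambda) over [-\pi, \pi].

```python
import numpy as np

# MA(1) autocovariances from Example 4: gamma(0), gamma(1); zero beyond lag 1.
sigma2, theta = 1.0, 0.6
g0, g1 = sigma2 * (1 + theta**2), sigma2 * theta

# Forward transform: f(lambda) = (1/2pi) * (g0 + 2*g1*cos(lambda))
lam = np.linspace(-np.pi, np.pi, 4096, endpoint=False)   # uniform grid over one period
dlam = lam[1] - lam[0]
f = (g0 + 2 * g1 * np.cos(lam)) / (2 * np.pi)

# Inverse transform: gamma(k) = integral of cos(k*lambda) f(lambda) d lambda (f is even)
for k in (0, 1, 2):
    gamma_k = np.sum(np.cos(k * lam) * f) * dlam
    print(k, round(gamma_k, 4))          # expect 1.36, 0.60, 0.00
```
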
Interpretation

f(\lambda) describes how variance is distributed across frequencies \lambda:

  • Peak at \lambda = 0: low-frequency (trend) dominance
  • Peak at \lambda = \pm\pi: high-frequency (oscillation) dominance
  • Flat spectrum: white noise (all frequencies equally)
Linear Filter in Frequency Domain

If Y_t = \sum_j h_j X_{t-j} with transfer function H(e^{-i\lambda}) = \sum_j h_j e^{-ij\lambda}, then:

f_Y(\lambda) = |H(e^{-i\lambda})|^2 f_X(\lambda)

This elegant result shows filtering modifies the spectral density by the squared magnitude of the frequency response.
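
The sketch below checks this relation for white-noise input, where f_X(\lambda) = \sigma^2/(2\pi), by computing f_Y two ways: through |H|^2 f_X and through the Fourier sum of the filtered autocovariances \gamma_Y(k) = \sigma^2\sum_j h_j h_{j+k}. The filter h = (0.25, 0.5, 0.25) and the frequency are arbitrary examples (NumPy assumed).

```python
import numpy as np

sigma2 = 1.0
h = np.array([0.25, 0.5, 0.25])            # an arbitrary example filter
lam = 0.9                                   # an arbitrary frequency in [-pi, pi]

# Route 1: f_Y(lambda) = |H(e^{-i lambda})|^2 * f_X(lambda), with f_X = sigma2/(2 pi)
H = np.sum(h * np.exp(-1j * np.arange(len(h)) * lam))
route1 = np.abs(H) ** 2 * sigma2 / (2 * np.pi)

# Route 2: Fourier sum of gamma_Y(k) = sigma2 * sum_j h_j h_{j+k} (white-noise input)
gamma = {k: sigma2 * np.sum(h[: len(h) - k] * h[k:]) for k in range(len(h))}
route2 = (gamma[0] + 2 * sum(gamma[k] * np.cos(k * lam) for k in range(1, len(h)))) / (2 * np.pi)

print(round(route1, 6), round(route2, 6))   # the two routes agree
```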

Engineering Applications

Spectral analysis enables signal extraction: design filters to pass desired frequencies and attenuate others. Used extensively in communications (bandpass filters), seismology (earthquake signal isolation), and economics (trend-cycle decomposition).

Ergodicity and Long-Run Behavior

Ergodicity Definition

A stationary process \{X_t\} is ergodic for the mean if:

\frac{1}{n}\sum_{t=1}^n X_t \xrightarrow{\text{a.s.}} E[X_1] = \mu \quad \text{as } n \to \infty

This means time averages converge to ensemble averages almost surely. A single long realization contains the same information as infinitely many independent short realizations.

Sufficient Condition

If the ACF satisfies:

\lim_{n\to\infty} \frac{1}{n}\sum_{k=1}^n |\gamma(k)| = 0

then the process is ergodic for the mean. This holds if \gamma(k) \to 0 as k \to \infty (mixing condition).

Statistical Inference

With ergodicity, we can estimate:

  • Mean: \hat{\mu} = n^{-1}\sum_t X_t
  • Variance: \hat{\gamma}(0) = n^{-1}\sum_t (X_t-\hat{\mu})^2
  • Autocovariance at lag k: \hat{\gamma}(k) = n^{-1}\sum_t (X_t-\hat{\mu})(X_{t+k}-\hat{\mu})

from a single realization, which is crucial for real-world data analysis.

Ergodic Theorem for Stationary Sequences

For a stationary ergodic sequence and any measurable function g with E[|g(X_t)|] < \infty:

\frac{1}{n}\sum_{t=1}^n g(X_t) \xrightarrow{\text{a.s.}} E[g(X_1)] \quad \text{as } n \to \infty

This generalizes the law of large numbers to dependent sequences, enabling consistent estimation of expectations of any functional of the process.
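
A minimal demonstration (NumPy assumed; the MA(1) example is our own): time averages of X_t and X_t^2 computed from one long realization land close to the ensemble moments \mu = 0 and \gamma(0) = \sigma^2(1+\theta^2).

```python
import numpy as np

rng = np.random.default_rng(5)
theta, sigma2, n = 0.6, 1.0, 1_000_000
eps = rng.normal(0.0, np.sqrt(sigma2), size=n + 1)
x = eps[1:] + theta * eps[:-1]              # one long MA(1) realization

print("time-average mean   :", round(x.mean(), 4), "   ensemble value:", 0.0)
print("time-average of X^2 :", round(np.mean(x**2), 4), "   ensemble value:", sigma2 * (1 + theta**2))
```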

Why Ergodicity Matters

Without ergodicity: We'd need multiple independent time series realizations to estimate population moments.

With ergodicity: A single sufficiently long series provides consistent estimates. This is the foundation for all empirical time series analysis—we almost never have multiple realizations, yet ergodicity validates using one long series for estimation and inference.

Practical Stationarity Testing

Diagnostic Tools

Before applying stationary process models, verify stationarity using these methods:

1. Visual Inspection
  • Plot the series: does mean/variance appear constant?
  • Check ACF plot: should decay towards zero
  • Look for trends or seasonal patterns
2. Augmented Dickey-Fuller (ADF) Test

Tests null hypothesis: process has unit root (non-stationary)

\Delta X_t = \alpha + \beta t + \gamma X_{t-1} + \sum_{j=1}^p \delta_j \Delta X_{t-j} + \varepsilon_t

Test H_0: \gamma = 0 vs H_1: \gamma < 0. Reject H_0 → stationary.

3. KPSS Test

Tests null hypothesis: process is stationary (opposite of ADF)

Decomposes X_t = \xi t + r_t + \varepsilon_t (trend + random walk + error). Test H_0: \text{Var}(r_t) = 0. Reject H_0 → non-stationary.

4. Phillips-Perron Test

Similar to ADF but robust to heteroskedasticity and serial correlation. Uses non-parametric correction to the test statistic.
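
A minimal sketch of the ADF and KPSS tests using statsmodels (assumed installed); applied to a simulated random walk and to its first difference, the two tests should point in opposite directions for the two series. Note that KPSS reports p-values at its lookup-table bounds (with a warning) when the statistic falls outside the table.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller, kpss

rng = np.random.default_rng(6)
random_walk = np.cumsum(rng.normal(size=1000))   # unit-root (non-stationary) series
differenced = np.diff(random_walk)               # its stationary increments

for name, series in [("random walk", random_walk), ("differenced", differenced)]:
    adf_p = adfuller(series, autolag="AIC")[1]               # H0: unit root
    kpss_p = kpss(series, regression="c", nlags="auto")[1]   # H0: stationary
    print(f"{name:12s}  ADF p-value: {adf_p:.3f}   KPSS p-value: {kpss_p:.3f}")
```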

Making Data Stationary

If tests indicate non-stationarity:

  • Differencing: \nabla X_t = X_t - X_{t-1} removes trends
  • Log transformation: stabilizes variance
  • Detrending: remove deterministic trend
  • Seasonal differencing: \nabla_s X_t = X_t - X_{t-s} for seasonality

After transformation, re-test stationarity before modeling.
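
Building on the list above, a short sketch of the common transforms on a toy series (NumPy assumed; the simulated series and the seasonal period s = 12 are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(240)
# Toy positive series with an exponential trend, yearly seasonality, and noise
y = np.exp(0.01 * t + 0.3 * np.sin(2 * np.pi * t / 12) + 0.1 * rng.normal(size=t.size))

log_y     = np.log(y)                                          # stabilize variance
detrended = log_y - np.polyval(np.polyfit(t, log_y, 1), t)     # remove a linear trend
diff1     = np.diff(log_y)                                     # first difference
sdiff12   = log_y[12:] - log_y[:-12]                           # seasonal difference, s = 12
print(round(diff1.mean(), 3), round(sdiff12.mean(), 3))
```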

Real-World Applications

Where stationary processes theory meets practice

Quantitative Finance

Asset returns (log-returns) are often modeled as stationary processes, unlike prices which are non-stationary (random walks).

Application:

Volatility modeling (ARCH/GARCH) assumes stationarity of the squared returns series to forecast risk (VaR).

Signal Processing

Noise in communication channels is modeled as stationary random processes (often Gaussian white noise).

Application:

Wiener filters use stationarity assumptions to optimally separate signal from noise, minimizing mean square error.

Geophysics & Meteorology

Climate indices (e.g., SOI, NAO) and seismic background noise are analyzed as stationary series after detrending.

Application:

Spectral analysis identifies dominant cycles (e.g., El Niño periodicity) in environmental data series.

Frequently Asked Questions

What's the difference between weak and strict stationarity?

Weak (second-order) stationarity only requires a constant mean and an autocovariance that depends only on the lag, so it involves just the first two moments. Strict stationarity requires all finite-dimensional distributions to be invariant under time shifts. Strict stationarity implies weak stationarity if second moments exist, but not vice versa. Gaussian processes are an exception: weakly stationary Gaussian processes are also strictly stationary.

Why is the autocovariance function non-negative definite?

For any coefficients a₁,...,aₙ and times t₁,...,tₙ, the variance of the linear combination Σᵢ aᵢXₜᵢ must be non-negative. Expanding this variance: Var(Σᵢ aᵢXₜᵢ) = Σᵢ Σⱼ aᵢaⱼγ(tᵢ-tⱼ) = aᵀΓa ≥ 0. This algebraic property is fundamental and ensures the covariance matrix is positive semidefinite.

Can a non-stationary process have stationary increments?

Yes! Random walk Xₜ = Σᵢ₌₁ᵗ εᵢ (where εᵢ is white noise) is non-stationary since Var(Xₜ) = tσ² grows with time. However, its increments ΔXₜ = Xₜ - Xₜ₋₁ = εₜ are stationary. This distinction is crucial: ARIMA models difference non-stationary series to achieve stationarity.

How does linear filtering affect stationarity?

If {Xₜ} is stationary and we apply a linear filter Yₜ = Σⱼ hⱼXₜ₋ⱼ with absolutely summable coefficients Σ|hⱼ| < ∞, then {Yₜ} is also stationary. The filtered process has autocovariance γY(k) = ΣⱼΣᵢ hⱼhᵢγX(k+j-i). Linear filters preserve stationarity while modifying the spectral characteristics.

What is the relationship between ACF and PACF?

ACF (autocorrelation function) measures direct and indirect correlations at each lag. PACF (partial autocorrelation function) measures only the direct correlation after removing intermediate lag effects. For MA(q): ACF cuts off after lag q, PACF decays. For AR(p): PACF cuts off after lag p, ACF decays. This diagnostic property helps identify model order.

Why do we need the spectral density function?

The spectral density f(λ) represents how variance is distributed across frequencies. It forms a Fourier pair with the autocovariance: γₖ = ∫₋π^π e^{ikλ} f(λ) dλ. While the ACF captures time-domain dependence, the spectral density reveals frequency-domain structure. Peaks in f(λ) indicate dominant cyclical components. This duality is essential for filter design and signal extraction.

What does ergodicity mean for time series?

An ergodic process allows time averages to converge to ensemble averages. Specifically, if {Xₜ} is ergodic for the mean, then (1/n)Σₜ₌₁ⁿ Xₜ → E[X₁] almost surely as n → ∞. This is crucial for statistical inference: with ergodicity, a single long realization provides information about population moments. Without ergodicity, we'd need multiple independent realizations.

How do we test for stationarity in practice?

Common tests include: (1) Augmented Dickey-Fuller (ADF) test for unit roots (null: non-stationary), (2) KPSS test (null: stationary), (3) Phillips-Perron test (robust to heteroskedasticity), (4) Visual inspection: plot ACF (should decay) and check if mean/variance appear constant. Use differencing or detrending if tests reject stationarity.

Historical Development

The evolution of stationary process theory

1927: Yule & Slutsky

Birth of Time Series Analysis

G.U. Yule modeled sunspot numbers using autoregressive (AR) schemes. Independently, E. Slutsky showed that moving averages of random events could generate cyclic-like behavior, challenging the idea that economic cycles must have deterministic causes.

1934: A.Y. Khinchin

Correlation Theory

Khinchin established the rigorous mathematical foundation for stationary processes, defining the correlation function and proving the spectral representation theorem (Wiener-Khinchin theorem).

1938: Herman Wold

Wold Decomposition

In his thesis 'A Study in the Analysis of Stationary Time Series', Wold proved that any stationary process can be decomposed into a deterministic part and a purely non-deterministic (linear) part.

1941: A.N. Kolmogorov

Prediction Theory

Kolmogorov solved the fundamental problem of linear prediction for stationary sequences, deriving the formula for the mean square prediction error in terms of the spectral density.

Chapter Summary

Core Concepts

  • Stationarity: Statistical properties (mean, variance, ACF) are time-invariant.
  • ACF: Measures linear dependence over time; key for model identification.
  • White Noise: The fundamental building block; uncorrelated, zero mean, constant variance.

Advanced Theory

  • Spectral Density: Frequency domain equivalent of ACF; reveals cyclical structure.
  • Linear Processes: Output of linear filters applied to white noise (Wold's Theorem).
  • Ergodicity: Justifies using single time series realizations for statistical inference.

"Stationarity is the assumption that allows us to learn from the past to predict the future. Without it, the past is just a sequence of unique events."

Further Reading

Recommended textbooks for deeper study

Time Series: Theory and Methods
Brockwell & Davis (1991)

The definitive reference for rigorous mathematical theory of time series. Essential for understanding the proofs and Hilbert space foundations presented in this course.

Time Series Analysis
James D. Hamilton (1994)

The standard text for econometrics. Excellent coverage of stationarity, unit roots, and vector autoregressions (VAR) with economic applications.

Time Series Analysis and Its Applications
Shumway & Stoffer (2017)

A modern, accessible approach with extensive R examples. Balances theory with practical implementation and real-world data analysis.

Time Series Analysis: Forecasting and Control
Box, Jenkins, Reinsel & Ljung (2015)

The classic engineering text that introduced the ARIMA methodology (Box-Jenkins method). Focuses on model identification, estimation, and diagnostic checking.

Next Module: ARMA Models