Core Topic

Numerical Characteristics & Characteristic Functions

Master the numerical characteristics of random variables including expectation, variance, moments, and characteristic functions. Essential tools for describing and analyzing probability distributions.

Intermediate Level
15 Lessons
10-12 Hours
Learning Objectives
Essential concepts you'll master in numerical characteristics and characteristic functions
  • Master mathematical expectation and its fundamental properties
  • Understand variance, covariance, and correlation coefficient
  • Learn moment theory and distribution characterization
  • Master characteristic functions and their applications
  • Understand multivariate normal distribution properties

Mathematical Expectation

The fundamental measure of central tendency for random variables

Definition and Essence

Core Definition

Definition:

Mathematical expectation is the 'weighted average' of a random variable's values, with the corresponding probabilities as weights; it reflects the central tendency of the random variable.

Convergence Condition:

The sum or integral must converge absolutely, so that its value does not depend on the order of summation or integration:

Discrete Case
E\xi = \sum_{k=1}^{\infty} x_k p_k

provided \sum_{k=1}^{\infty} |x_k| p_k < \infty

Continuous Case
E\xi = \int_{-\infty}^{\infty} x\, p(x)\, dx

provided \int_{-\infty}^{\infty} |x| p(x) dx < \infty

General Form Case
E\xi = \int_{-\infty}^{\infty} x\, dF(x)

Stieltjes integral unifies discrete (summation) and continuous (integration) cases
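
The definitions translate directly into computation. Below is a minimal sketch (assuming numpy and scipy are available; the Poisson and exponential parameters are arbitrary illustrations) that approximates Eξ as a probability-weighted sum in the discrete case and as an integral in the continuous case.

```python
import numpy as np
from scipy import stats, integrate

# Discrete case: E[xi] = sum over k of x_k * p_k, truncated at a large k
lam = 3.0
k = np.arange(0, 100)
discrete_mean = np.sum(k * stats.poisson.pmf(k, lam))            # ≈ lam = 3

# Continuous case: E[xi] = integral of x * p(x) dx for Exp(rate = 2)
rate = 2.0
pdf = lambda x: rate * np.exp(-rate * x)
continuous_mean, _ = integrate.quad(lambda x: x * pdf(x), 0, np.inf)  # ≈ 1/rate = 0.5

print(discrete_mean, continuous_mean)
```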

Fundamental Properties

Key Properties

Monotonicity

If a ≤ ξ ≤ b, then a ≤ Eξ ≤ b; if ξ ≤ η, then Eξ ≤ Eη

Importance: Preserves ordering relationships

Linearity
E\left(\sum_{i=1}^n c_i \xi_i + b\right) = \sum_{i=1}^n c_i E\xi_i + b

Linear combinations preserve expectation (independence not required)

Importance: Most fundamental property for calculations

Independence Property
\text{If } \xi_1, \ldots, \xi_n \text{ are independent, then } E(\xi_1 \cdots \xi_n) = E\xi_1 \cdots E\xi_n

Product of independent variables equals product of expectations
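
A quick Monte Carlo check of these two properties (a sketch assuming numpy; the distributions and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
xi = rng.exponential(scale=2.0, size=1_000_000)        # E[xi]  = 2
eta = rng.normal(loc=1.0, scale=3.0, size=1_000_000)   # E[eta] = 1, independent of xi

# Linearity: E[3*xi - 2*eta + 5] = 3*2 - 2*1 + 5 = 9
print(np.mean(3 * xi - 2 * eta + 5))

# Independence: E[xi * eta] = E[xi] * E[eta] = 2
print(np.mean(xi * eta), np.mean(xi) * np.mean(eta))
```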

Important Inequalities

Markov Inequality
P(|\xi| \geq \varepsilon) \leq \frac{E|\xi|}{\varepsilon}

Condition: for ε > 0

Application: Bounds probability using first moment

Cauchy-Schwarz Inequality
|E(\xi\eta)|^2 \leq E\xi^2 \cdot E\eta^2

Condition: equality iff P(η = t₀ξ) = 1 for some constant t₀

Application: Fundamental inequality connecting second moments

Jensen Inequality
\text{If } g(x) \text{ is convex, then } g(E\xi) \leq E g(\xi)

Condition: for strictly convex g, equality holds iff P(ξ = Eξ) = 1

Application: Relates function of expectation to expectation of function
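
These inequalities are easy to illustrate numerically. Here is a hedged sketch (numpy assumed; Exp(1) is chosen only as a convenient example) checking Markov's bound and Jensen with g(x) = x².

```python
import numpy as np

rng = np.random.default_rng(1)
xi = rng.exponential(scale=1.0, size=1_000_000)   # E|xi| = 1

# Markov: P(|xi| >= 3) <= E|xi| / 3
eps = 3.0
print(np.mean(np.abs(xi) >= eps), np.mean(np.abs(xi)) / eps)   # ≈ 0.05 <= 0.33

# Jensen with the convex g(x) = x^2: g(E[xi]) <= E[g(xi)]
print(np.mean(xi) ** 2, np.mean(xi ** 2))                      # ≈ 1 <= 2
```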

Common Distribution Expectations
| Distribution | Parameters | Expectation E[ξ] |
| --- | --- | --- |
| Bernoulli Ber(p) | p: success probability | p |
| Binomial B(n,p) | n: trials, p: success probability | np |
| Poisson P(λ) | λ: rate parameter | λ |
| Geometric Geo(p) | p: success probability | 1/p |
| Uniform U[a,b] | a, b: interval endpoints | (a+b)/2 |
| Exponential Exp(λ) | λ: rate parameter | 1/λ |
| Normal N(μ,σ²) | μ: mean, σ²: variance | μ |
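
These entries can be cross-checked with scipy.stats; the sketch below (the parameter values are arbitrary choices, not from the text) prints each built-in mean next to the formula it should equal.

```python
from scipy import stats

print(stats.bernoulli.mean(p=0.3))          # p                  -> 0.3
print(stats.binom.mean(n=10, p=0.3))        # n*p                -> 3.0
print(stats.poisson.mean(mu=2.0))           # lambda             -> 2.0
print(stats.geom.mean(p=0.25))              # 1/p                -> 4.0
print(stats.uniform.mean(loc=1, scale=4))   # (a+b)/2 on [1, 5]  -> 3.0
print(stats.expon.mean(scale=0.5))          # 1/lambda, lambda=2 -> 0.5
print(stats.norm.mean(loc=-1, scale=2))     # mu                 -> -1.0
```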

Variance & Covariance

Measuring dispersion and joint variability between random variables

Variance: Measuring Dispersion

Definition and Formula

Definition:

Variance measures how much a random variable deviates from its mean, defined as the expectation of squared deviation:

\text{Var}\,\xi = E(\xi - E\xi)^2 = E\xi^2 - (E\xi)^2
Key Point:

The computational formula E[ξ²] - (E[ξ])² avoids direct calculation of deviations

Properties:
  • Var ξ = 0 ⟺ P(ξ = c) = 1 for some constant c (degenerate distribution)
  • Translation invariance: Var(ξ + b) = Var ξ (constants don't affect dispersion)
  • Scaling property: Var(cξ) = c²Var ξ (dispersion scales quadratically)
  • Independence property: If ξ₁,...,ξₙ independent, then Var(Σξᵢ) = ΣVar(ξᵢ)
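
A short simulation sketch of these properties (numpy assumed; the gamma/normal choices are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(2)
xi = rng.gamma(shape=2.0, scale=1.5, size=1_000_000)   # Var(xi) = 2 * 1.5**2 = 4.5
eta = rng.normal(size=1_000_000)                        # Var(eta) = 1, independent of xi

print(np.var(xi + 7), np.var(xi))                    # translation invariance
print(np.var(3 * xi), 9 * np.var(xi))                # quadratic scaling
print(np.var(xi + eta), np.var(xi) + np.var(eta))    # additivity under independence
print(np.mean(xi**2) - np.mean(xi)**2)               # computational formula = Var(xi)
```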

Standard Deviation

Definition:

σ(ξ) = √Var(ξ), having the same units as the random variable

Advantage:

More interpretable than variance due to matching units

Chebyshev Inequality

Statement:

P(|\xi - E\xi| \geq \varepsilon) \leq \frac{\text{Var}\,\xi}{\varepsilon^2} \quad \text{for } \varepsilon > 0
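
A numerical illustration of the bound (a sketch assuming numpy; U[0,12] is an arbitrary example whose exact tail probability is 1/6):

```python
import numpy as np

rng = np.random.default_rng(3)
xi = rng.uniform(0.0, 12.0, size=1_000_000)    # E[xi] = 6, Var(xi) = 144/12 = 12

eps = 5.0
empirical = np.mean(np.abs(xi - 6.0) >= eps)   # exact value is 2/12 ≈ 0.167
bound = 12.0 / eps**2                          # Chebyshev bound = 0.48
print(empirical, bound)                        # empirical tail <= bound
```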
Covariance: Measuring Joint Variability
Definition:
\text{Cov}(\xi,\eta) = E[(\xi - E\xi)(\eta - E\eta)] = E(\xi\eta) - E\xi \cdot E\eta

Measures how two variables 'jointly deviate' from their respective means

Properties

Symmetry: Cov(ξ,η) = Cov(η,ξ)
Linearity: Cov(aξ+b, cη+d) = ac·Cov(ξ,η)
Distributivity: Cov(Σξᵢ, Σηⱼ) = ΣᵢΣⱼCov(ξᵢ,ηⱼ)
Variance relationship: Var(ξ+η) = Var(ξ) + Var(η) + 2Cov(ξ,η)
Correlation Coefficient: Standardized Dependence
Formula:
r_{\xi\eta} = \frac{\text{Cov}(\xi,\eta)}{\sqrt{\text{Var}\,\xi \cdot \text{Var}\,\eta}}

Purpose: Eliminates scale effects from covariance

Properties

Bounded: |r_{ξη}| ≤ 1
Perfect positive correlation: r_{ξη} = 1 ⟺ P((ξ-Eξ)/√Varξ = (η-Eη)/√Varη) = 1
Perfect negative correlation: r_{ξη} = -1 ⟺ P((ξ-Eξ)/√Varξ = -(η-Eη)/√Varη) = 1
Uncorrelated: r_{ξη} = 0 ⟺ Cov(ξ,η) = 0 ⟺ E(ξη) = Eξ·Eη

Independence vs Uncorrelatedness

General Rule:

ξ and η independent ⇒ uncorrelated (converse not generally true)

Counterexample:

ξ = cos θ, η = sin θ where θ ~ U[0,2π]: uncorrelated but not independent

Special Case:

For bivariate normal: uncorrelated ⟺ independent
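
The counterexample is easy to simulate; the sketch below (numpy assumed) shows a correlation near zero even though ξ² + η² = 1 forces complete functional dependence.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = rng.uniform(0.0, 2.0 * np.pi, size=1_000_000)
xi, eta = np.cos(theta), np.sin(theta)

print(np.corrcoef(xi, eta)[0, 1])   # ≈ 0: uncorrelated
print(np.mean(xi**2 + eta**2))      # exactly 1: knowing xi pins eta up to sign,
                                    # so xi and eta are not independent
```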

Common Distribution Variances
| Distribution | Variance Var(ξ) | Note |
| --- | --- | --- |
| Bernoulli Ber(p) | p(1-p) | Maximum at p = 1/2 |
| Binomial B(n,p) | np(1-p) | n times the Bernoulli variance |
| Poisson P(λ) | λ | Mean equals variance |
| Geometric Geo(p) | (1-p)/p² | Decreases with higher success probability |
| Uniform U[a,b] | (b-a)²/12 | Depends only on interval width |
| Exponential Exp(λ) | 1/λ² | Variance is the square of the mean |
| Normal N(μ,σ²) | σ² | Direct parameter specification |

Moment Theory

Unified framework for describing distribution characteristics

Moments: Unified Framework for Distribution Characteristics

Raw Moments

m_k = E\xi^k

k-th power expectation about origin

Examples:
  • m₁ = Eξ (mean)
  • m₂ = Eξ² (second moment)

Central Moments

c_k = E(\xi - E\xi)^k

k-th power expectation about mean

Examples:
  • c₁ = 0 (always)
  • c₂ = Var ξ (variance)

Absolute Moments

M_\alpha = E|\xi|^\alpha

Property: If Mₙ < ∞, then Mₖ < ∞ for 0 < k ≤ n

Shape Characteristics from Moments

Skewness Coefficient

\gamma_1 = \frac{c_3}{c_2^{3/2}}

Measures asymmetry of distribution

Values:
  • γ₁ > 0: right-skewed (positive skew)
  • γ₁ < 0: left-skewed (negative skew)
  • γ₁ = 0: symmetric (e.g., normal distribution)

Kurtosis Coefficient

\gamma_2 = \frac{c_4}{c_2^2} - 3

Measures 'peakedness' relative to normal distribution

Values:
  • γ₂ > 0: leptokurtic (more peaked than normal)
  • γ₂ < 0: platykurtic (flatter than normal)
  • γ₂ = 0: mesokurtic (normal peakedness)
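
Both coefficients can be estimated from sample central moments. A sketch (numpy/scipy assumed; Exp(1) is chosen because its skewness is 2 and its excess kurtosis is 6):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
xi = rng.exponential(size=1_000_000)

c2 = np.mean((xi - xi.mean())**2)
c3 = np.mean((xi - xi.mean())**3)
c4 = np.mean((xi - xi.mean())**4)

print(c3 / c2**1.5, stats.skew(xi))        # gamma_1 ≈ 2
print(c4 / c2**2 - 3, stats.kurtosis(xi))  # gamma_2 ≈ 6 (excess kurtosis)
```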

Characteristic Functions

The most powerful tool for analyzing probability distributions

Definition and Fundamental Properties
Definition:
f(t) = E e^{it\xi}, \quad t \in \mathbb{R}, \text{ where } i^2 = -1

Essence: the Fourier–Stieltjes transform of the distribution of ξ; it always exists, with no extra convergence condition required

Key Advantage: One-to-one correspondence with distribution functions

Mathematical Forms

Discrete Case
f(t) = \sum_{k=1}^{\infty} e^{itx_k} p_k
Continuous Case
f(t) = \int_{-\infty}^{\infty} e^{itx} p(x)\, dx
General Case
f(t) = \int_{-\infty}^{\infty} e^{itx}\, dF(x)
Key Properties

Fundamental Properties

Boundedness and Conjugacy

|f(t)| ≤ f(0) = 1, f(-t) = f̄(t) (conjugate)

Uniform Continuity

f(t) is uniformly continuous on ℝ

Non-negative Definiteness

ΣᵢΣⱼ f(tᵢ-tⱼ)λᵢλ̄ⱼ ≥ 0 for any tᵢ ∈ ℝ, λᵢ ∈ ℂ

Independence Property
\text{If } \xi_1, \ldots, \xi_n \text{ are independent and } \eta = \sum_{i=1}^n \xi_i, \text{ then } f_\eta(t) = \prod_{i=1}^n f_{\xi_i}(t)

Significance: CF of sum = product of CFs (simplifies distribution calculations)
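
The product rule can be checked against the empirical characteristic function. A sketch (numpy assumed; the Poisson parameters are arbitrary, and Poisson(2) + Poisson(3) is known to be Poisson(5)):

```python
import numpy as np

rng = np.random.default_rng(6)
xi1 = rng.poisson(lam=2.0, size=500_000)
xi2 = rng.poisson(lam=3.0, size=500_000)    # independent of xi1
t = 0.7

ecf = lambda sample: np.mean(np.exp(1j * t * sample))  # empirical CF at t
print(ecf(xi1 + xi2))                       # CF of the sum, estimated directly
print(ecf(xi1) * ecf(xi2))                  # product of the individual CFs
print(np.exp(5.0 * (np.exp(1j * t) - 1)))   # exact CF of Poisson(5)
```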

Moment Generation
\text{If } E|\xi|^n < \infty, \text{ then } f^{(k)}(0) = i^k E\xi^k \text{ for } 0 \leq k \leq n

Application: Extract moments through derivatives at origin
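
A symbolic sketch (sympy assumed) applying this to the Poisson CF: dividing the k-th derivative at 0 by i^k recovers E[ξ] = λ and E[ξ²] = λ + λ².

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lambda', positive=True)
f = sp.exp(lam * (sp.exp(sp.I * t) - 1))   # Poisson(lambda) characteristic function

m1 = sp.simplify(sp.diff(f, t, 1).subs(t, 0) / sp.I)       # E[xi]   -> lambda
m2 = sp.simplify(sp.diff(f, t, 2).subs(t, 0) / sp.I**2)    # E[xi^2] -> lambda*(lambda + 1)
print(m1, m2)
```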

Linear Transformation
\text{If } \eta = a\xi + b, \text{ then } f_\eta(t) = e^{itb} f_\xi(at)

Common Characteristic Functions
| Distribution | Characteristic Function f(t) | Parameters |
| --- | --- | --- |
| Degenerate P(ξ=c)=1 | e^{itc} | c: constant |
| Bernoulli Ber(p) | pe^{it} + (1-p) | p: success probability |
| Binomial B(n,p) | (pe^{it} + (1-p))^n | n: trials, p: probability |
| Poisson P(λ) | e^{λ(e^{it}-1)} | λ: rate parameter |
| Uniform U[a,b] | (e^{itb} - e^{ita}) / (it(b-a)) | a, b: interval endpoints |
| Exponential Exp(λ) | (1 - it/λ)^{-1} | λ: rate parameter |
| Normal N(μ,σ²) | e^{itμ - σ²t²/2} | μ: mean, σ²: variance |
| Cauchy(a,b) | e^{ita - b\lvert t\rvert} | a: location, b: scale |
Fundamental Theorems

Inversion Formula

Statement:

For continuity points x₁ < x₂ of distribution function F(x):

F(x_2) - F(x_1) = \lim_{T \to \infty} \frac{1}{2\pi} \int_{-T}^{T} \frac{e^{-itx_1} - e^{-itx_2}}{it} f(t)\, dt
Significance:

Recovers distribution function from characteristic function

Uniqueness Theorem

Statement:

Characteristic function uniquely determines distribution

f_1(t) = f_2(t) \text{ for all } t \in \mathbb{R} \;\Rightarrow\; F_1(x) = F_2(x) \text{ for all } x \in \mathbb{R}
Significance:

Foundation for distribution identification via characteristic functions

Fourier Inversion

Statement:

If ∫₋∞^∞ |f(t)| dt < ∞, then ξ is continuous with density:

p(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-itx} f(t)\, dt
Significance:

Direct recovery of probability density from characteristic function
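
The inversion integral can be evaluated numerically. The sketch below (numpy/scipy assumed) recovers the standard normal density from f(t) = e^{-t²/2} at a few points, truncating the integral because this CF decays rapidly.

```python
import numpy as np
from scipy import integrate

cf = lambda t: np.exp(-t**2 / 2.0)   # characteristic function of N(0, 1)

def density_from_cf(x):
    integrand = lambda t: np.real(np.exp(-1j * t * x) * cf(t)) / (2 * np.pi)
    value, _ = integrate.quad(integrand, -50, 50)   # finite range is enough here
    return value

for x in (0.0, 1.0, 2.0):
    print(density_from_cf(x), np.exp(-x**2 / 2) / np.sqrt(2 * np.pi))  # should match
```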

Multivariate Normal Distribution

The cornerstone of multivariate probability theory

Definition and Structure
Definition:

An n-dimensional random vector ξ = (ξ₁,...,ξₙ)' has a multivariate normal distribution if its characteristic function is:

f(t) = \exp\left\{ i\, t'a - \frac{1}{2}\, t'\Sigma t \right\}, \quad t \in \mathbb{R}^n

Notation: ξ ~ N(a, Σ)

Parameters

a = (a₁,...,aₙ)' = Eξ (mean vector)
Σ = E[(ξ-a)(ξ-a)'] (n×n non-negative definite covariance matrix)
Fundamental Properties

Marginal Normality

Any subset of components has multivariate normal distribution

\text{If } \xi \sim N(a, \Sigma), \text{ then } \xi_{(k)} \sim N(a_{(k)}, \Sigma_{(k)})

Subscript (k) denotes corresponding subvector/submatrix

Linear Transformation Invariance

Linear combinations preserve multivariate normality

\text{If } \xi \sim N(a, \Sigma) \text{ and } \eta = C\xi \text{ for an } m \times n \text{ matrix } C, \text{ then } \eta \sim N(Ca, C\Sigma C')
Independence-Uncorrelatedness Equivalence

Components are independent if and only if uncorrelated

\xi_i \text{ and } \xi_j \text{ independent} \;\Leftrightarrow\; \text{Cov}(\xi_i, \xi_j) = 0

Significance: Unique property not shared by other multivariate distributions

Conditional Distribution

Setup:

Partition ξ = (ξ₁', ξ₂')' with corresponding mean and covariance partitions:

a = (a_1', a_2')'
\Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}
Result:

Given ξ₁ = x₁, the conditional distribution is ξ₂|ξ₁ = x₁ ~ N(a₂·₁, Σ₂₂·₁) where:

a_{2 \cdot 1} = a_2 + \Sigma_{21}\Sigma_{11}^{-1}(x_1 - a_1)
\Sigma_{22 \cdot 1} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}
Interpretation:
  • Conditional mean: linear regression of ξ₂ on ξ₁
  • Conditional variance: independent of x₁ value
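
The two formulas are a few lines of linear algebra. Here is a numpy sketch for an illustrative 3-dimensional normal (the mean vector and covariance matrix are made-up example values), conditioning the last two components on the first.

```python
import numpy as np

a = np.array([1.0, 0.0, 2.0])                 # mean vector
Sigma = np.array([[2.0, 0.6, 0.4],
                  [0.6, 1.0, 0.3],
                  [0.4, 0.3, 1.5]])           # covariance matrix (positive definite)

idx1, idx2 = [0], [1, 2]                      # observe block 1, condition block 2 on it
S11 = Sigma[np.ix_(idx1, idx1)]
S12 = Sigma[np.ix_(idx1, idx2)]
S21 = Sigma[np.ix_(idx2, idx1)]
S22 = Sigma[np.ix_(idx2, idx2)]

x1 = np.array([2.0])                          # observed value of the first block
cond_mean = a[idx2] + S21 @ np.linalg.inv(S11) @ (x1 - a[idx1])
cond_cov = S22 - S21 @ np.linalg.inv(S11) @ S12

print(cond_mean)   # a_{2.1}: shifts linearly with x1
print(cond_cov)    # Sigma_{22.1}: does not depend on x1
```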

Worked Examples

Step-by-step solutions to typical problems on numerical characteristics

Poisson Distribution Second Moment
Problem:

For ξ ~ P(λ), find E[ξ²]

Solution Steps:
  1. Step 1: Use known properties - E[ξ] = λ, Var(ξ) = λ
  2. Step 2: Apply variance formula - Var(ξ) = E[ξ²] - (E[ξ])²
  3. Step 3: Substitute values - λ = E[ξ²] - λ²
  4. Step 4: Solve for E[ξ²] - E[ξ²] = λ + λ² = λ(1 + λ)
Key Point:

Use the computational variance formula rather than direct calculation
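
A quick simulation check (numpy assumed; λ = 4 is an arbitrary test value, so E[ξ²] should be 4 + 16 = 20):

```python
import numpy as np

rng = np.random.default_rng(7)
xi = rng.poisson(lam=4.0, size=1_000_000)
print(np.mean(xi**2))   # ≈ 20 = lam + lam**2
```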

Linear Combination Variance
Problem:

If ξ and η are independent with E[ξ]=2, Var(ξ)=1, E[η]=3, Var(η)=2, find Var(2ξ - η + 1)

Solution Steps:
  1. Step 1: Apply variance properties - Var(aX) = a²Var(X), Var(X+c) = Var(X)
  2. Step 2: Use independence - Var(X+Y) = Var(X) + Var(Y) for independent X, Y
  3. Step 3: Calculate - Var(2ξ - η + 1) = Var(2ξ) + Var(-η) + Var(1)
  4. Step 4: Substitute - = 2²Var(ξ) + (-1)²Var(η) + 0 = 4(1) + 1(2) = 6
Key Point:

Constants don't affect variance; independence allows additive variance rule
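
A simulation sketch of the answer (numpy assumed; the example only fixes means and variances, so normal distributions are used here purely as a convenient choice):

```python
import numpy as np

rng = np.random.default_rng(8)
xi = rng.normal(loc=2.0, scale=1.0, size=1_000_000)             # E = 2, Var = 1
eta = rng.normal(loc=3.0, scale=np.sqrt(2.0), size=1_000_000)   # E = 3, Var = 2, independent
print(np.var(2 * xi - eta + 1))   # ≈ 6
```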

Characteristic Function Identification
Problem:

If characteristic function is f(t) = e^{2it - 3t²/2}, identify the distribution

Solution Steps:
  1. Step 1: Compare with normal CF template - f(t) = e^{itμ - σ²t²/2}
  2. Step 2: Match coefficients - itμ = 2it, so μ = 2
  3. Step 3: Match variance term - σ²t²/2 = 3t²/2, so σ² = 3
  4. Step 4: Conclude - ξ ~ N(2, 3)
Key Point:

Normal distribution characteristic function has distinctive exponential quadratic form

Study Tips & Best Practices

Computational Strategy:

  • Use computational formulas: Var(X) = E[X²] - (E[X])² is often easier than the definition
  • Leverage linearity: E[aX + bY + c] = aE[X] + bE[Y] + c always holds
  • Check uncorrelatedness: E[XY] = E[X]E[Y] follows from independence but does not by itself prove it
  • Characteristic functions: Products for sums, exponentials for linear transforms

Common Mistakes to Avoid:

  • Variance additivity: Var(ξ+η) = Var ξ + Var η requires independence (or at least uncorrelatedness)
  • Uncorrelatedness vs independence: r = 0 does not imply independence
  • Moment existence: Check convergence conditions first
  • Characteristic function uniqueness: Different functions mean different distributions
