The fundamental measure of central tendency for random variables
Mathematical expectation is the 'weighted average' of a random variable's values, where weights are corresponding probabilities, reflecting the central tendency of the random variable.
Must satisfy the 'absolute convergence' condition so the value does not depend on the order of summation/integration:
Discrete case: Eξ = \sum_{k=1}^{\infty} x_k p_k, provided \sum_{k=1}^{\infty} |x_k| p_k < \infty
Continuous case: Eξ = \int_{-\infty}^{\infty} x p(x) dx, provided \int_{-\infty}^{\infty} |x| p(x) dx < \infty
The Stieltjes integral Eξ = \int_{-\infty}^{\infty} x dF(x) unifies the discrete (summation) and continuous (integration) cases
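As a quick numerical illustration of the two defining formulas (a minimal sketch; the Poisson and exponential examples and their parameters are chosen here only for concreteness):

```python
# Minimal sketch: expectation as a probability-weighted average.
# Discrete case: E[xi] = sum_k x_k * p_k; continuous case: integral of x * p(x) dx.
import numpy as np
from scipy import integrate, stats

# Discrete: Poisson(lambda = 2), summed far into the tail.
lam = 2.0
k = np.arange(0, 200)
discrete_mean = np.sum(k * stats.poisson.pmf(k, lam))    # ~= lambda

# Continuous: Exponential(lambda = 3), density p(x) = 3*exp(-3x) on [0, inf).
rate = 3.0
continuous_mean, _ = integrate.quad(lambda x: x * rate * np.exp(-rate * x), 0, np.inf)

print(discrete_mean)    # ~= 2.0
print(continuous_mean)  # ~= 1/3
```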
If a ≤ ξ ≤ b, then a ≤ Eξ ≤ b; if ξ ≤ η, then Eξ ≤ Eη
Importance: Preserves ordering relationships
Linearity: E[aξ + bη] = a·Eξ + b·Eη (independence not required)
Importance: Most fundamental property for calculations
Multiplicativity: if ξ and η are independent, then E[ξη] = Eξ · Eη
Markov's inequality: P(|ξ| ≥ ε) ≤ E|ξ| / ε
Condition: holds for any ε > 0
Application: Bounds tail probability using only the first moment
Cauchy-Schwarz inequality: (E[ξη])² ≤ E[ξ²] · E[η²]
Condition: equality iff P(η = t₀ξ) = 1 for some constant t₀
Application: Fundamental inequality connecting second moments
Jensen's inequality: g(Eξ) ≤ E[g(ξ)] for convex g
Condition: strict convexity: equality iff P(ξ = Eξ) = 1
Application: Relates a function of the expectation to the expectation of the function
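The three inequalities above can be sanity-checked by simulation; a minimal sketch, with exponential samples chosen arbitrarily for illustration:

```python
# Monte Carlo sanity check of Markov, Cauchy-Schwarz and Jensen.
import numpy as np

rng = np.random.default_rng(0)
xi = rng.exponential(scale=1.0, size=100_000)   # E[xi] = 1
eta = rng.exponential(scale=2.0, size=100_000)  # independent of xi

# Markov: P(xi >= eps) <= E[xi] / eps
eps = 3.0
print(np.mean(xi >= eps), "<=", xi.mean() / eps)

# Cauchy-Schwarz: (E[xi*eta])^2 <= E[xi^2] * E[eta^2]
print(np.mean(xi * eta) ** 2, "<=", np.mean(xi**2) * np.mean(eta**2))

# Jensen with the strictly convex g(x) = x^2: g(E[xi]) <= E[g(xi)]
print(xi.mean() ** 2, "<=", np.mean(xi**2))
```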
| Distribution | Parameters | Expectation E[ξ] |
|---|---|---|
| Bernoulli Ber(p) | p: success probability | p |
| Binomial B(n,p) | n: trials, p: success probability | np |
| Poisson P(λ) | λ: rate parameter | λ |
| Geometric Geo(p) | p: success probability | 1/p |
| Uniform U[a,b] | a, b: interval endpoints | (a+b)/2 |
| Exponential Exp(λ) | λ: rate parameter | 1/λ |
| Normal N(μ,σ²) | μ: mean, σ²: variance | μ |
Measuring dispersion and joint variability between random variables
Variance measures how much a random variable deviates from its mean, defined as the expectation of the squared deviation: Var(ξ) = E[(ξ - Eξ)²]
Computational formula: Var(ξ) = E[ξ²] - (E[ξ])², which avoids computing the deviations from the mean directly
σ(ξ) = √Var(ξ), having the same units as the random variable
More interpretable than variance due to matching units
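A minimal sketch showing that the definition and the computational formula agree numerically (uniform samples on [0, 1] are an arbitrary choice):

```python
# Check that E[(xi - E xi)^2] and E[xi^2] - (E xi)^2 give the same value.
import numpy as np

rng = np.random.default_rng(1)
xi = rng.uniform(0.0, 1.0, size=100_000)

mean = xi.mean()
var_definition = np.mean((xi - mean) ** 2)       # E[(xi - E xi)^2]
var_computational = np.mean(xi**2) - mean**2     # E[xi^2] - (E xi)^2

print(var_definition, var_computational)         # both ~= 1/12
print(np.sqrt(var_computational))                # standard deviation, ~= 0.2887
```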
Covariance Cov(ξ, η) = E[(ξ - Eξ)(η - Eη)] measures how two variables 'jointly deviate' from their respective means
Correlation coefficient ρ = Cov(ξ, η) / (σ(ξ)·σ(η)): eliminates scale effects from covariance; |ρ| ≤ 1
ξ and η independent ⇒ uncorrelated (converse not generally true)
ξ = cos θ, η = sin θ where θ ~ U[0,2π]: uncorrelated but not independent
For bivariate normal: uncorrelated ⟺ independent
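A small simulation of the cos θ / sin θ counterexample above (a sketch; the sample size is arbitrary):

```python
# xi = cos(theta), eta = sin(theta), theta ~ U[0, 2*pi]:
# uncorrelated yet clearly dependent (xi^2 + eta^2 = 1).
import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200_000)
xi, eta = np.cos(theta), np.sin(theta)

# Covariance ~ 0: no linear relationship.
print(np.cov(xi, eta)[0, 1])                 # ~= 0

# But knowing xi pins eta down to +/- sqrt(1 - xi^2): a deterministic constraint.
print(np.max(np.abs(xi**2 + eta**2 - 1)))    # ~= 0 up to float error
```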
| Distribution | Variance Var(ξ) | Note |
|---|---|---|
| Bernoulli Ber(p) | p(1-p) | Maximum at p = 1/2 |
| Binomial B(n,p) | np(1-p) | n times the Bernoulli variance |
| Poisson P(λ) | λ | Mean equals variance |
| Geometric Geo(p) | (1-p)/p² | Decreases with higher success probability |
| Uniform U[a,b] | (b-a)²/12 | Depends only on interval width |
| Exponential Exp(λ) | 1/λ² | Variance is the square of the mean |
| Normal N(μ,σ²) | σ² | Direct parameter specification |
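The entries of the expectation and variance tables can be cross-checked with scipy.stats; a sketch with arbitrary parameter values (scipy's geom uses the trials-to-first-success convention, matching E = 1/p and Var = (1-p)/p² here):

```python
# Cross-check of the expectation and variance tables using scipy.stats.
from scipy import stats

n, p, lam, a, b, mu, sigma2 = 10, 0.3, 2.0, 1.0, 5.0, 1.5, 4.0

dists = {
    "Bernoulli": stats.bernoulli(p),
    "Binomial": stats.binom(n, p),
    "Poisson": stats.poisson(lam),
    "Geometric": stats.geom(p),
    "Uniform": stats.uniform(loc=a, scale=b - a),
    "Exponential": stats.expon(scale=1.0 / lam),
    "Normal": stats.norm(mu, sigma2 ** 0.5),
}
for name, d in dists.items():
    print(f"{name:12s} mean={d.mean():.4f} var={d.var():.4f}")
```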
Unified framework for distribution characteristics using moments
Raw moments (about the origin): mₖ = E[ξᵏ]
Central moments (about the mean): cₖ = E[(ξ - Eξ)ᵏ]
Property: If the n-th absolute moment Mₙ = E|ξ|ⁿ < ∞, then Mₖ = E|ξ|ᵏ < ∞ for 0 < k ≤ n
Skewness γ₁ = c₃ / σ³: measures asymmetry of the distribution
Kurtosis (excess) γ₂ = c₄ / σ⁴ - 3: measures 'peakedness'/tail heaviness relative to the normal distribution
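A sketch computing raw moments, central moments, skewness and excess kurtosis from samples; the exponential example is chosen only because its values (skewness 2, excess kurtosis 6) are easy to check:

```python
# Sample-based moments, skewness and excess kurtosis for Exp(1).
import numpy as np

rng = np.random.default_rng(3)
xi = rng.exponential(scale=1.0, size=500_000)

def raw_moment(x, k):      # m_k = E[xi^k]
    return np.mean(x**k)

def central_moment(x, k):  # c_k = E[(xi - E xi)^k]
    return np.mean((x - x.mean()) ** k)

sigma = np.sqrt(central_moment(xi, 2))
skewness = central_moment(xi, 3) / sigma**3
excess_kurtosis = central_moment(xi, 4) / sigma**4 - 3

print(raw_moment(xi, 1), raw_moment(xi, 2))   # ~= 1, 2 for Exp(1)
print(skewness, excess_kurtosis)              # ~= 2, 6
```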
Fourier transforms for probability distributions
Essence: f(t) = E[e^{itξ}], the Fourier-Stieltjes transform of the distribution; since |e^{itξ}| = 1, it always exists with no extra convergence conditions
Key Advantage: One-to-one correspondence with distribution functions
|f(t)| ≤ f(0) = 1, f(-t) = f̄(t) (conjugate)
f(t) is uniformly continuous on ℝ
Non-negative definiteness: ΣᵢΣⱼ f(tᵢ-tⱼ)λᵢλ̄ⱼ ≥ 0 for any tᵢ ∈ ℝ, λᵢ ∈ ℂ
Significance: the CF of a sum of independent random variables is the product of their CFs (simplifies distribution calculations)
Application: Extract moments through derivatives at the origin: E[ξᵏ] = f⁽ᵏ⁾(0) / iᵏ when the k-th moment exists
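A minimal numerical sketch of moment extraction: differentiating the Poisson CF at t = 0 with finite differences (the distribution, λ and step size are arbitrary illustration choices):

```python
# Moments from numerical derivatives of a characteristic function at t = 0.
import numpy as np

lam = 2.0
def cf(t):  # CF of Poisson(lambda): exp(lambda * (e^{it} - 1))
    return np.exp(lam * (np.exp(1j * t) - 1.0))

h = 1e-4
# E[xi]   = f'(0) / i       (central difference)
first = (cf(h) - cf(-h)) / (2 * h) / 1j
# E[xi^2] = f''(0) / i^2 = -f''(0)
second = -(cf(h) - 2 * cf(0.0) + cf(-h)) / h**2

print(first.real)   # ~= lambda = 2
print(second.real)  # ~= lambda + lambda^2 = 6
```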
| Distribution | Characteristic Function f(t) | Parameters |
|---|---|---|
| Degenerate P(ξ=c)=1 | e^{ict} | c: constant |
| Bernoulli Ber(p) | 1 - p + pe^{it} | p: success probability |
| Binomial B(n,p) | (1 - p + pe^{it})ⁿ | n: trials, p: probability |
| Poisson P(λ) | e^{λ(e^{it} - 1)} | λ: rate parameter |
| Uniform U[a,b] | (e^{itb} - e^{ita}) / (it(b-a)) | a, b: interval endpoints |
| Exponential Exp(λ) | λ / (λ - it) | λ: rate parameter |
| Normal N(μ,σ²) | e^{iμt - σ²t²/2} | μ: mean, σ²: variance |
| Cauchy Cauchy(a,b) | e^{iat - b\|t\|} | a: location, b: scale |
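Rows of the table can be verified by Monte Carlo, comparing the empirical mean of e^{itξ} with the closed form; a sketch with arbitrary t and parameters:

```python
# Empirical CF vs closed-form CF for Poisson and Normal.
import numpy as np

rng = np.random.default_rng(4)
t = 0.7   # arbitrary test point

# Poisson(lambda = 2): f(t) = exp(lambda * (e^{it} - 1))
lam = 2.0
xi = rng.poisson(lam, size=400_000)
print(np.mean(np.exp(1j * t * xi)), np.exp(lam * (np.exp(1j * t) - 1)))

# Normal(mu = 1, sigma^2 = 4): f(t) = exp(i*mu*t - sigma^2 * t^2 / 2)
mu, sigma2 = 1.0, 4.0
eta = rng.normal(mu, np.sqrt(sigma2), size=400_000)
print(np.mean(np.exp(1j * t * eta)), np.exp(1j * mu * t - sigma2 * t**2 / 2))
```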
For continuity points x₁ < x₂ of the distribution function F(x):
F(x₂) - F(x₁) = lim_{T→∞} (1/2π) ∫_{-T}^{T} [(e^{-itx₁} - e^{-itx₂}) / (it)] f(t) dt
Recovers the distribution function from the characteristic function
Characteristic function uniquely determines distribution
Foundation for distribution identification via characteristic functions
If ∫₋∞^∞ |f(t)| dt < ∞, then ξ is continuous with density p(x) = (1/2π) ∫₋∞^∞ e^{-itx} f(t) dt
Direct recovery of the probability density from the characteristic function
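A sketch of the density-recovery formula applied to the standard normal CF, truncating the integral at a finite T (the truncation point is an assumption that works here because the Gaussian CF decays fast):

```python
# Recover p(x) from f(t) via p(x) = (1/2*pi) * integral of exp(-i*t*x) * f(t) dt.
import numpy as np
from scipy import integrate, stats

def cf(t):  # CF of N(0, 1)
    return np.exp(-t**2 / 2)

def density_from_cf(x, T=40.0):
    # Real part of the integrand; the imaginary part integrates to zero by symmetry.
    integrand = lambda t: np.cos(t * x) * cf(t)
    val, _ = integrate.quad(integrand, -T, T)
    return val / (2 * np.pi)

for x in (0.0, 1.0, 2.0):
    print(density_from_cf(x), stats.norm.pdf(x))   # should agree closely
```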
Properties and conditional distributions of multivariate normal
n-dimensional random vector ξ = (ξ₁,...,ξₙ)' has a multivariate normal distribution if its characteristic function is f(t) = exp(i t'a - ½ t'Σt), where a ∈ ℝⁿ and Σ is a symmetric non-negative definite n×n matrix
Notation: ξ ~ N(a, Σ)
Any subset of components has multivariate normal distribution
Subscript (k) denotes corresponding subvector/submatrix
Linear combinations preserve multivariate normality
Components are independent if and only if uncorrelated
Significance: Unique property not shared by other multivariate distributions
Partition ξ = (ξ₁', ξ₂')' with corresponding partitions a = (a₁', a₂')' and Σ = (Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂)
Given ξ₁ = x₁, the conditional distribution is ξ₂ | ξ₁ = x₁ ~ N(a₂·₁, Σ₂₂·₁), where a₂·₁ = a₂ + Σ₂₁Σ₁₁⁻¹(x₁ - a₁) and Σ₂₂·₁ = Σ₂₂ - Σ₂₁Σ₁₁⁻¹Σ₁₂
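A small numpy sketch of these conditional formulas for a 3-dimensional example with an arbitrary positive definite covariance matrix:

```python
# Conditional mean a_{2.1} and covariance Sigma_{22.1} of a partitioned MVN.
import numpy as np

a = np.array([1.0, 2.0, 0.0])                 # mean, split into a1 (first 1) and a2 (last 2)
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

k = 1                                         # dimension of xi_1
a1, a2 = a[:k], a[k:]
S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S21, S22 = Sigma[k:, :k], Sigma[k:, k:]

x1 = np.array([1.8])                          # observed value of xi_1
a_cond = a2 + S21 @ np.linalg.solve(S11, x1 - a1)    # a_{2.1}
S_cond = S22 - S21 @ np.linalg.solve(S11, S12)       # Sigma_{22.1}

print(a_cond)
print(S_cond)
```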
Step-by-step solutions to digital characteristics problems
For ξ ~ P(λ), find E[ξ²]
Solution: use the computational variance formula rather than summing the series directly: E[ξ²] = Var(ξ) + (Eξ)² = λ + λ²
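A quick Monte Carlo check of the result (λ = 2 chosen arbitrarily):

```python
# Verify E[xi^2] = lambda + lambda^2 for a Poisson variable by simulation.
import numpy as np

lam = 2.0
xi = np.random.default_rng(5).poisson(lam, size=1_000_000)
print(np.mean(xi.astype(float) ** 2), lam + lam**2)   # both ~= 6
```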
If ξ and η are independent with E[ξ]=2, Var(ξ)=1, E[η]=3, Var(η)=2, find Var(2ξ - η + 1)
Solution: the added constant does not affect the variance, and independence makes variances additive: Var(2ξ - η + 1) = 4·Var(ξ) + Var(η) = 4·1 + 2 = 6
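A quick check using concrete independent variables with the stated moments (normal distributions are an arbitrary choice; only the means and variances matter):

```python
# Simulate Var(2*xi - eta + 1) for independent xi, eta with Var=1 and Var=2.
import numpy as np

rng = np.random.default_rng(6)
xi = rng.normal(2.0, 1.0, size=1_000_000)             # E=2, Var=1
eta = rng.normal(3.0, np.sqrt(2.0), size=1_000_000)   # E=3, Var=2
print(np.var(2 * xi - eta + 1))                       # ~= 4*1 + 2 = 6
```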
If characteristic function is f(t) = e^{2it - 3t²/2}, identify the distribution
Solution: the exponential-quadratic form is distinctive of the normal CF e^{iμt - σ²t²/2}; matching gives μ = 2 and σ² = 3, so by the uniqueness theorem ξ ~ N(2, 3)
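A quick check that the empirical CF of N(2, 3) samples matches the given f(t) at a couple of test points (sketch; t values arbitrary):

```python
# Empirical CF of N(2, 3) vs f(t) = exp(2it - 3t^2/2).
import numpy as np

rng = np.random.default_rng(7)
xi = rng.normal(2.0, np.sqrt(3.0), size=400_000)
for t in (0.3, 1.0):
    print(np.mean(np.exp(1j * t * xi)), np.exp(2j * t - 1.5 * t**2))
```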
Expectation E[X] measures the center or average value of a distribution (first moment), while variance Var(X) = E[(X-μ)²] measures dispersion or spread around the mean (second central moment). A distribution can have the same mean but different variances, indicating different levels of uncertainty.
Characteristic functions f(t) = E[e^{itX}] always exist for any distribution (bounded by 1), while moment generating functions M(t) = E[e^{tX}] may not exist if moments don't exist (e.g., Cauchy distribution). CF uniquely determines the distribution and is especially powerful for proving limit theorems in probability theory.
ρ_{XY} = 0 means X and Y are uncorrelated: Cov(X,Y) = 0, or equivalently E[XY] = E[X]E[Y]. This implies no linear relationship, but doesn't rule out nonlinear dependence. Independence always implies uncorrelatedness, but uncorrelated doesn't imply independence (except for the bivariate normal distribution).
Skewness (γ₁) measures asymmetry: γ₁ > 0 (right-skewed, long right tail), γ₁ < 0 (left-skewed), γ₁ = 0 (symmetric). Kurtosis (γ₂) measures tail heaviness relative to normal: γ₂ > 0 (heavy tails, more outliers), γ₂ < 0 (light tails), γ₂ = 0 (normal-like tails). These help identify appropriate distributions for modeling.
No, variance is always non-negative: Var(X) = E[(X-μ)²] ≥ 0 since it's an expectation of a squared term. Var(X) = 0 if and only if X is constant with probability 1. Standard deviation σ = √Var(X) is also non-negative and has the same units as X.