The fundamental measure of central tendency for random variables
Mathematical expectation is the 'weighted average' of a random variable's values, where weights are corresponding probabilities, reflecting the central tendency of the random variable.
Must satisfy the 'absolute convergence' condition so the value does not depend on the order of summation/integration:
Discrete case: Eξ = \sum_{k=1}^{\infty} x_k p_k, provided \sum_{k=1}^{\infty} |x_k| p_k < \infty
Continuous case: Eξ = \int_{-\infty}^{\infty} x p(x) dx, provided \int_{-\infty}^{\infty} |x| p(x) dx < \infty
The Stieltjes integral Eξ = \int_{-\infty}^{\infty} x dF(x) unifies the discrete (summation) and continuous (integration) cases
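As a quick numerical illustration of the two defining formulas (a minimal sketch; the Poisson and exponential examples and their parameters are chosen here only for concreteness):

```python
# Minimal sketch: expectation as a probability-weighted average.
# Discrete case: E[xi] = sum_k x_k * p_k; continuous case: integral of x * p(x) dx.
import numpy as np
from scipy import integrate, stats

# Discrete: Poisson(lambda = 2), summed far into the tail.
lam = 2.0
k = np.arange(0, 200)
discrete_mean = np.sum(k * stats.poisson.pmf(k, lam))    # ~= lambda

# Continuous: Exponential(lambda = 3), density p(x) = 3*exp(-3x) on [0, inf).
rate = 3.0
continuous_mean, _ = integrate.quad(lambda x: x * rate * np.exp(-rate * x), 0, np.inf)

print(discrete_mean)    # ~= 2.0
print(continuous_mean)  # ~= 1/3
```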
If a ≤ ξ ≤ b, then a ≤ Eξ ≤ b; if ξ ≤ η, then Eξ ≤ Eη
Importance: Preserves ordering relationships
Linearity: E[aξ + bη] = a·Eξ + b·Eη (independence not required)
Importance: Most fundamental property for calculations
Multiplicativity: if ξ and η are independent, then E[ξη] = Eξ · Eη
Markov's inequality: P(|ξ| ≥ ε) ≤ E|ξ| / ε
Condition: holds for any ε > 0
Application: Bounds tail probability using only the first moment
Cauchy-Schwarz inequality: (E[ξη])² ≤ E[ξ²] · E[η²]
Condition: equality iff P(η = t₀ξ) = 1 for some constant t₀
Application: Fundamental inequality connecting second moments
Jensen's inequality: g(Eξ) ≤ E[g(ξ)] for convex g
Condition: strict convexity: equality iff P(ξ = Eξ) = 1
Application: Relates a function of the expectation to the expectation of the function
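The three inequalities above can be sanity-checked by simulation; a minimal sketch, with exponential samples chosen arbitrarily for illustration:

```python
# Monte Carlo sanity check of Markov, Cauchy-Schwarz and Jensen.
import numpy as np

rng = np.random.default_rng(0)
xi = rng.exponential(scale=1.0, size=100_000)   # E[xi] = 1
eta = rng.exponential(scale=2.0, size=100_000)  # independent of xi

# Markov: P(xi >= eps) <= E[xi] / eps
eps = 3.0
print(np.mean(xi >= eps), "<=", xi.mean() / eps)

# Cauchy-Schwarz: (E[xi*eta])^2 <= E[xi^2] * E[eta^2]
print(np.mean(xi * eta) ** 2, "<=", np.mean(xi**2) * np.mean(eta**2))

# Jensen with the strictly convex g(x) = x^2: g(E[xi]) <= E[g(xi)]
print(xi.mean() ** 2, "<=", np.mean(xi**2))
```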
| Distribution | Parameters | Expectation E[ξ] |
|---|---|---|
| Bernoulli Ber(p) | p: success probability | p |
| Binomial B(n,p) | n: trials, p: success probability | np |
| Poisson P(λ) | λ: rate parameter | λ |
| Geometric Geo(p) | p: success probability | 1/p |
| Uniform U[a,b] | a, b: interval endpoints | (a+b)/2 |
| Exponential Exp(λ) | λ: rate parameter | 1/λ |
| Normal N(μ,σ²) | μ: mean, σ²: variance | μ |
Measuring dispersion and joint variability between random variables
Variance measures how much a random variable deviates from its mean, defined as the expectation of the squared deviation: Var(ξ) = E[(ξ - Eξ)²]
Computational formula: Var(ξ) = E[ξ²] - (E[ξ])², which avoids computing the deviations from the mean directly
σ(ξ) = √Var(ξ), having the same units as the random variable
More interpretable than variance due to matching units
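A minimal sketch showing that the definition and the computational formula agree numerically (uniform samples on [0, 1] are an arbitrary choice):

```python
# Check that E[(xi - E xi)^2] and E[xi^2] - (E xi)^2 give the same value.
import numpy as np

rng = np.random.default_rng(1)
xi = rng.uniform(0.0, 1.0, size=100_000)

mean = xi.mean()
var_definition = np.mean((xi - mean) ** 2)       # E[(xi - E xi)^2]
var_computational = np.mean(xi**2) - mean**2     # E[xi^2] - (E xi)^2

print(var_definition, var_computational)         # both ~= 1/12
print(np.sqrt(var_computational))                # standard deviation, ~= 0.2887
```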
Covariance Cov(ξ, η) = E[(ξ - Eξ)(η - Eη)] measures how two variables 'jointly deviate' from their respective means
Correlation coefficient ρ = Cov(ξ, η) / (σ(ξ)·σ(η)): eliminates scale effects from covariance; |ρ| ≤ 1
ξ and η independent ⇒ uncorrelated (converse not generally true)
ξ = cos θ, η = sin θ where θ ~ U[0,2π]: uncorrelated but not independent
For bivariate normal: uncorrelated ⟺ independent
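A small simulation of the cos θ / sin θ counterexample above (a sketch; the sample size is arbitrary):

```python
# xi = cos(theta), eta = sin(theta), theta ~ U[0, 2*pi]:
# uncorrelated yet clearly dependent (xi^2 + eta^2 = 1).
import numpy as np

rng = np.random.default_rng(2)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200_000)
xi, eta = np.cos(theta), np.sin(theta)

# Covariance ~ 0: no linear relationship.
print(np.cov(xi, eta)[0, 1])                 # ~= 0

# But knowing xi pins eta down to +/- sqrt(1 - xi^2): a deterministic constraint.
print(np.max(np.abs(xi**2 + eta**2 - 1)))    # ~= 0 up to float error
```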
| Distribution | Variance Var(ξ) | Note |
|---|---|---|
| Bernoulli Ber(p) | p(1-p) | Maximum at p = 1/2 |
| Binomial B(n,p) | np(1-p) | n times the Bernoulli variance |
| Poisson P(λ) | λ | Mean equals variance |
| Geometric Geo(p) | (1-p)/p² | Decreases with higher success probability |
| Uniform U[a,b] | (b-a)²/12 | Depends only on interval width |
| Exponential Exp(λ) | 1/λ² | Variance is the square of the mean |
| Normal N(μ,σ²) | σ² | Direct parameter specification |
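The entries of the expectation and variance tables can be cross-checked with scipy.stats; a sketch with arbitrary parameter values (scipy's geom uses the trials-to-first-success convention, matching E = 1/p and Var = (1-p)/p² here):

```python
# Cross-check of the expectation and variance tables using scipy.stats.
from scipy import stats

n, p, lam, a, b, mu, sigma2 = 10, 0.3, 2.0, 1.0, 5.0, 1.5, 4.0

dists = {
    "Bernoulli": stats.bernoulli(p),
    "Binomial": stats.binom(n, p),
    "Poisson": stats.poisson(lam),
    "Geometric": stats.geom(p),
    "Uniform": stats.uniform(loc=a, scale=b - a),
    "Exponential": stats.expon(scale=1.0 / lam),
    "Normal": stats.norm(mu, sigma2 ** 0.5),
}
for name, d in dists.items():
    print(f"{name:12s} mean={d.mean():.4f} var={d.var():.4f}")
```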
Unified framework for distribution characteristics using moments
Raw moments (about the origin): mₖ = E[ξᵏ]
Central moments (about the mean): cₖ = E[(ξ - Eξ)ᵏ]
Property: If the n-th absolute moment Mₙ = E|ξ|ⁿ < ∞, then Mₖ = E|ξ|ᵏ < ∞ for 0 < k ≤ n
Skewness γ₁ = c₃ / σ³: measures asymmetry of the distribution
Kurtosis (excess) γ₂ = c₄ / σ⁴ - 3: measures 'peakedness'/tail heaviness relative to the normal distribution
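A sketch computing raw moments, central moments, skewness and excess kurtosis from samples; the exponential example is chosen only because its values (skewness 2, excess kurtosis 6) are easy to check:

```python
# Sample-based moments, skewness and excess kurtosis for Exp(1).
import numpy as np

rng = np.random.default_rng(3)
xi = rng.exponential(scale=1.0, size=500_000)

def raw_moment(x, k):      # m_k = E[xi^k]
    return np.mean(x**k)

def central_moment(x, k):  # c_k = E[(xi - E xi)^k]
    return np.mean((x - x.mean()) ** k)

sigma = np.sqrt(central_moment(xi, 2))
skewness = central_moment(xi, 3) / sigma**3
excess_kurtosis = central_moment(xi, 4) / sigma**4 - 3

print(raw_moment(xi, 1), raw_moment(xi, 2))   # ~= 1, 2 for Exp(1)
print(skewness, excess_kurtosis)              # ~= 2, 6
```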
Fourier transforms for probability distributions
Essence: f(t) = E[e^{itξ}], the Fourier-Stieltjes transform of the distribution; since |e^{itξ}| = 1, it always exists with no extra convergence conditions
Key Advantage: One-to-one correspondence with distribution functions
|f(t)| ≤ f(0) = 1, f(-t) = f̄(t) (conjugate)
f(t) is uniformly continuous on ℝ
Non-negative definiteness: ΣᵢΣⱼ f(tᵢ-tⱼ)λᵢλ̄ⱼ ≥ 0 for any tᵢ ∈ ℝ, λᵢ ∈ ℂ
Significance: the CF of a sum of independent random variables is the product of their CFs (simplifies distribution calculations)
Application: Extract moments through derivatives at the origin: E[ξᵏ] = f⁽ᵏ⁾(0) / iᵏ when the k-th moment exists
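A minimal numerical sketch of moment extraction: differentiating the Poisson CF at t = 0 with finite differences (the distribution, λ and step size are arbitrary illustration choices):

```python
# Moments from numerical derivatives of a characteristic function at t = 0.
import numpy as np

lam = 2.0
def cf(t):  # CF of Poisson(lambda): exp(lambda * (e^{it} - 1))
    return np.exp(lam * (np.exp(1j * t) - 1.0))

h = 1e-4
# E[xi]   = f'(0) / i       (central difference)
first = (cf(h) - cf(-h)) / (2 * h) / 1j
# E[xi^2] = f''(0) / i^2 = -f''(0)
second = -(cf(h) - 2 * cf(0.0) + cf(-h)) / h**2

print(first.real)   # ~= lambda = 2
print(second.real)  # ~= lambda + lambda^2 = 6
```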
| Distribution | Characteristic Function f(t) | Parameters |
|---|---|---|
| Degenerate P(ξ=c)=1 | e^{ict} | c: constant |
| Bernoulli Ber(p) | 1 - p + pe^{it} | p: success probability |
| Binomial B(n,p) | (1 - p + pe^{it})ⁿ | n: trials, p: probability |
| Poisson P(λ) | e^{λ(e^{it} - 1)} | λ: rate parameter |
| Uniform U[a,b] | (e^{itb} - e^{ita}) / (it(b-a)) | a, b: interval endpoints |
| Exponential Exp(λ) | λ / (λ - it) | λ: rate parameter |
| Normal N(μ,σ²) | e^{iμt - σ²t²/2} | μ: mean, σ²: variance |
| Cauchy Cauchy(a,b) | e^{iat - b\|t\|} | a: location, b: scale |
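Rows of the table can be verified by Monte Carlo, comparing the empirical mean of e^{itξ} with the closed form; a sketch with arbitrary t and parameters:

```python
# Empirical CF vs closed-form CF for Poisson and Normal.
import numpy as np

rng = np.random.default_rng(4)
t = 0.7   # arbitrary test point

# Poisson(lambda = 2): f(t) = exp(lambda * (e^{it} - 1))
lam = 2.0
xi = rng.poisson(lam, size=400_000)
print(np.mean(np.exp(1j * t * xi)), np.exp(lam * (np.exp(1j * t) - 1)))

# Normal(mu = 1, sigma^2 = 4): f(t) = exp(i*mu*t - sigma^2 * t^2 / 2)
mu, sigma2 = 1.0, 4.0
eta = rng.normal(mu, np.sqrt(sigma2), size=400_000)
print(np.mean(np.exp(1j * t * eta)), np.exp(1j * mu * t - sigma2 * t**2 / 2))
```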
For continuity points x₁ < x₂ of the distribution function F(x):
F(x₂) - F(x₁) = lim_{T→∞} (1/2π) ∫_{-T}^{T} [(e^{-itx₁} - e^{-itx₂}) / (it)] f(t) dt
Recovers the distribution function from the characteristic function
Characteristic function uniquely determines distribution
Foundation for distribution identification via characteristic functions
If ∫₋∞^∞ |f(t)| dt < ∞, then ξ is continuous with density p(x) = (1/2π) ∫₋∞^∞ e^{-itx} f(t) dt
Direct recovery of the probability density from the characteristic function
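A sketch of the density-recovery formula applied to the standard normal CF, truncating the integral at a finite T (the truncation point is an assumption that works here because the Gaussian CF decays fast):

```python
# Recover p(x) from f(t) via p(x) = (1/2*pi) * integral of exp(-i*t*x) * f(t) dt.
import numpy as np
from scipy import integrate, stats

def cf(t):  # CF of N(0, 1)
    return np.exp(-t**2 / 2)

def density_from_cf(x, T=40.0):
    # Real part of the integrand; the imaginary part integrates to zero by symmetry.
    integrand = lambda t: np.cos(t * x) * cf(t)
    val, _ = integrate.quad(integrand, -T, T)
    return val / (2 * np.pi)

for x in (0.0, 1.0, 2.0):
    print(density_from_cf(x), stats.norm.pdf(x))   # should agree closely
```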
Properties and conditional distributions of multivariate normal
n-dimensional random vector ξ = (ξ₁,...,ξₙ)' has a multivariate normal distribution if its characteristic function is f(t) = exp(i t'a - ½ t'Σt), where a ∈ ℝⁿ and Σ is a symmetric non-negative definite n×n matrix
Notation: ξ ~ N(a, Σ)
Any subset of components has multivariate normal distribution
Subscript (k) denotes corresponding subvector/submatrix
Linear combinations preserve multivariate normality
Components are independent if and only if uncorrelated
Significance: Unique property not shared by other multivariate distributions
Partition ξ = (ξ₁', ξ₂')' with corresponding partitions a = (a₁', a₂')' and Σ = (Σ₁₁ Σ₁₂; Σ₂₁ Σ₂₂)
Given ξ₁ = x₁, the conditional distribution is ξ₂ | ξ₁ = x₁ ~ N(a₂·₁, Σ₂₂·₁), where a₂·₁ = a₂ + Σ₂₁Σ₁₁⁻¹(x₁ - a₁) and Σ₂₂·₁ = Σ₂₂ - Σ₂₁Σ₁₁⁻¹Σ₁₂
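A small numpy sketch of these conditional formulas for a 3-dimensional example with an arbitrary positive definite covariance matrix:

```python
# Conditional mean a_{2.1} and covariance Sigma_{22.1} of a partitioned MVN.
import numpy as np

a = np.array([1.0, 2.0, 0.0])                 # mean, split into a1 (first 1) and a2 (last 2)
Sigma = np.array([[2.0, 0.6, 0.3],
                  [0.6, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])

k = 1                                         # dimension of xi_1
a1, a2 = a[:k], a[k:]
S11, S12 = Sigma[:k, :k], Sigma[:k, k:]
S21, S22 = Sigma[k:, :k], Sigma[k:, k:]

x1 = np.array([1.8])                          # observed value of xi_1
a_cond = a2 + S21 @ np.linalg.solve(S11, x1 - a1)    # a_{2.1}
S_cond = S22 - S21 @ np.linalg.solve(S11, S12)       # Sigma_{22.1}

print(a_cond)
print(S_cond)
```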
Step-by-step solutions to digital characteristics problems
For ξ ~ P(λ), find E[ξ²]
Solution: use the computational variance formula rather than summing the series directly: E[ξ²] = Var(ξ) + (Eξ)² = λ + λ²
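A quick Monte Carlo check of the result (λ = 2 chosen arbitrarily):

```python
# Verify E[xi^2] = lambda + lambda^2 for a Poisson variable by simulation.
import numpy as np

lam = 2.0
xi = np.random.default_rng(5).poisson(lam, size=1_000_000)
print(np.mean(xi.astype(float) ** 2), lam + lam**2)   # both ~= 6
```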
If ξ and η are independent with E[ξ]=2, Var(ξ)=1, E[η]=3, Var(η)=2, find Var(2ξ - η + 1)
Solution: the added constant does not affect the variance, and independence makes variances additive: Var(2ξ - η + 1) = 4·Var(ξ) + Var(η) = 4·1 + 2 = 6
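A quick check using concrete independent variables with the stated moments (normal distributions are an arbitrary choice; only the means and variances matter):

```python
# Simulate Var(2*xi - eta + 1) for independent xi, eta with Var=1 and Var=2.
import numpy as np

rng = np.random.default_rng(6)
xi = rng.normal(2.0, 1.0, size=1_000_000)             # E=2, Var=1
eta = rng.normal(3.0, np.sqrt(2.0), size=1_000_000)   # E=3, Var=2
print(np.var(2 * xi - eta + 1))                       # ~= 4*1 + 2 = 6
```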
If characteristic function is f(t) = e^{2it - 3t²/2}, identify the distribution
Solution: the exponential-quadratic form is distinctive of the normal CF e^{iμt - σ²t²/2}; matching gives μ = 2 and σ² = 3, so by the uniqueness theorem ξ ~ N(2, 3)
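A quick check that the empirical CF of N(2, 3) samples matches the given f(t) at a couple of test points (sketch; t values arbitrary):

```python
# Empirical CF of N(2, 3) vs f(t) = exp(2it - 3t^2/2).
import numpy as np

rng = np.random.default_rng(7)
xi = rng.normal(2.0, np.sqrt(3.0), size=400_000)
for t in (0.3, 1.0):
    print(np.mean(np.exp(1j * t * xi)), np.exp(2j * t - 1.5 * t**2))
```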
Expectation E[X] measures the center or average value of a distribution (first moment), while variance Var(X) = E[(X-μ)²] measures dispersion or spread around the mean (second central moment). A distribution can have the same mean but different variances, indicating different levels of uncertainty.
Characteristic functions f(t) = E[e^{itX}] always exist for any distribution (bounded by 1), while moment generating functions M(t) = E[e^{tX}] may not exist if moments don't exist (e.g., Cauchy distribution). CF uniquely determines the distribution and is especially powerful for proving limit theorems in probability theory.
ρ_{XY} = 0 means X and Y are uncorrelated: Cov(X,Y) = 0, or equivalently E[XY] = E[X]E[Y]. This implies no linear relationship, but doesn't rule out nonlinear dependence. Independence always implies uncorrelatedness, but uncorrelated doesn't imply independence (except for the bivariate normal distribution).
Skewness (γ₁) measures asymmetry: γ₁ > 0 (right-skewed, long right tail), γ₁ < 0 (left-skewed), γ₁ = 0 (symmetric). Kurtosis (γ₂) measures tail heaviness relative to normal: γ₂ > 0 (heavy tails, more outliers), γ₂ < 0 (light tails), γ₂ = 0 (normal-like tails). These help identify appropriate distributions for modeling.
No, variance is always non-negative: Var(X) = E[(X-μ)²] ≥ 0 since it's an expectation of a squared term. Var(X) = 0 if and only if X is constant with probability 1. Standard deviation σ = √Var(X) is also non-negative and has the same units as X.