
Random Variables & Distributions

Master the mathematical foundation of probability through random variables. Learn discrete and continuous distributions, joint distributions, independence, and sampling distributions essential for statistical inference.

Intermediate Level · 12 Lessons · 8–10 Hours
Learning Objectives
Essential concepts you'll master in random variables and distributions
  • Understand the definition and classification of random variables
  • Master distribution functions and their properties
  • Learn common discrete and continuous distributions
  • Analyze multidimensional random variables and independence
  • Apply functions of random variables and sampling distributions

Random Variable Fundamentals

Understanding the mathematical foundation and classification of random variables

Definition & Classification

Random Variable Definition

Definition:

Let (Ω, ℱ, P) be a probability space. A single-valued real function ξ(ω) is called a random variable if, for any Borel set B, ξ⁻¹(B) = {ω : ξ(ω) ∈ B} ∈ ℱ

Key Point:

A random variable maps each sample point ω in the sample space Ω to a real number, turning the outcomes of a random experiment into quantities we can calculate with

Examples:
  • Coin toss: Ω = {Head, Tail}, define ξ(Head) = 1, ξ(Tail) = 0
  • Dice roll: Ω = {1,2,3,4,5,6}, ξ(ω) = ω (identity mapping)
  • Lifetime of a bulb: ξ(ω) = actual lifetime in hours

Discrete Random Variables

Definition:

A random variable that takes on a finite or countably infinite number of values

Characterization:

Described by a distribution sequence (probability mass function): P(ξ = xᵢ) = p(xᵢ)

Properties:
  • Non-negativity: p(xᵢ) ≥ 0 for all i
  • Normalization: Σᵢ p(xᵢ) = 1 (checked in the sketch below)
Examples:
  • Number of heads in coin tosses
  • Number of defective items in a batch
  • Number of customers arriving per hour
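To make the two properties concrete, here is a minimal Python sketch (plain Python, no libraries; the distribution below, the number of heads in two fair coin tosses, is an illustrative example, not from the text):

```python
# Distribution sequence of a discrete random variable:
# number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

# Non-negativity: every p(x_i) >= 0
assert all(p >= 0 for p in pmf.values())

# Normalization: the probabilities sum to 1
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# P(xi = 1), read directly from the distribution sequence
print(pmf[1])  # 0.5
```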

Continuous Random Variables

Definition:

A random variable whose values fill an interval and which has a non-negative, integrable probability density function p(x)

Characterization:

Described by a probability density function (PDF): F(x) = ∫₋∞ˣ p(t) dt

Properties:
  • Non-negativity: p(x) ≥ 0
  • Normalization: ∫₋∞^∞ p(x) dx = 1
  • Point probability: P(ξ = c) = 0 for any real number c
Key Insight:

For continuous random variables: P(a < ξ ≤ b) = P(a ≤ ξ ≤ b) = P(a ≤ ξ < b) = P(a < ξ < b) = ∫ₐᵇ p(x) dx
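A quick numerical illustration, sketched in Python with scipy (an assumed dependency, since the page itself gives no code): integrating the standard normal PDF over (a, b) agrees with the CDF difference, and endpoint inclusion cannot change the value because each single point has probability zero.

```python
from scipy import stats
from scipy.integrate import quad

# Interval probability for a continuous variable: integrate the PDF.
a, b = -1.0, 2.0
integral, _ = quad(stats.norm.pdf, a, b)        # integral of p(x) over (a, b)
cdf_diff = stats.norm.cdf(b) - stats.norm.cdf(a)
print(round(integral, 6), round(cdf_diff, 6))   # both ≈ 0.818595
```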

Distribution Functions

The fundamental tool for describing probability distributions

Cumulative Distribution Function (CDF)
Definition:

For any random variable ξ, F(x) = P(ξ ≤ x), x ∈ ℝ, is called the distribution function of ξ

Essential Properties

Monotonicity

If a ≤ b, then F(a) ≤ F(b)

The probability never decreases as we move right on the real line

Limit Properties

F(-∞) = 0, F(+∞) = 1

Probability approaches 0 at negative infinity and 1 at positive infinity

Right Continuity

F(x+0) = F(x) for all x

The function is continuous from the right at every point

Relationships with PMF/PDF

Discrete Case
F(x) = Σ_{xᵢ≤x} p(xᵢ)

Step function with jumps at discrete values

Continuous Case
F(x) = ∫₋∞ˣ p(t) dt

Smooth function with F'(x) = p(x) at continuity points
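Both relationships can be checked numerically. The following Python sketch assumes scipy is available; the distributions chosen are illustrative.

```python
import numpy as np
from scipy import stats

# Continuous case: the CDF's derivative recovers the PDF, F'(x) = p(x).
x, h = 0.7, 1e-6
num_deriv = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)
print(np.isclose(num_deriv, stats.norm.pdf(x)))  # True

# Discrete case: the CDF is a step function, F(x) = sum of p(x_i) for x_i <= x.
X = stats.binom(5, 0.4)
print(np.isclose(X.cdf(2), sum(X.pmf(k) for k in range(3))))  # True
```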

Common Discrete Distributions

Essential discrete probability distributions and their applications

Bernoulli Distribution
Ber(p)
Parameters:

p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = p^k(1-p)^{1-k}, k ∈ {0, 1}
Application:

Single trial success/failure experiment

Key Properties:
  • E[X] = p
  • Var(X) = p(1-p)
  • Special case of binomial with n = 1
Binomial Distribution
B(n,p)
Parameters:

n ∈ ℕ (trials), p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = C_n^k p^k(1-p)^{n-k}, k = 0, 1, ..., n
Application:

Number of successes in n independent Bernoulli trials

Key Properties:
  • E[X] = np
  • Var(X) = np(1-p)
  • Additive: for independent variables, B(n₁,p) + B(n₂,p) = B(n₁+n₂,p) (see the sketch below)
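The additive property can be verified numerically: the PMF of a sum of independent variables is the convolution of their PMFs. A sketch using numpy and scipy (assumed dependencies; the parameter values are illustrative):

```python
import numpy as np
from scipy import stats

n1, n2, p = 4, 6, 0.3
pmf1 = stats.binom.pmf(np.arange(n1 + 1), n1, p)
pmf2 = stats.binom.pmf(np.arange(n2 + 1), n2, p)

# PMF of a sum of independent variables = convolution of their PMFs
pmf_sum = np.convolve(pmf1, pmf2)
pmf_direct = stats.binom.pmf(np.arange(n1 + n2 + 1), n1 + n2, p)
print(np.allclose(pmf_sum, pmf_direct))  # True: B(4,p) + B(6,p) = B(10,p)
```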
Poisson Distribution
P(λ)
Parameters:

λ > 0 (rate parameter)

Probability Mass Function:
P(X = k) = λᵏe⁻λ/k!, k = 0, 1, 2, ...
Application:

Rare events occurrence (defects, arrivals, accidents)

Key Properties:
  • E[X] = Var(X) = λ
  • Additive: for independent variables, P(λ₁) + P(λ₂) = P(λ₁+λ₂)
  • Approximates the binomial when n is large, p is small, and np = λ (see the sketch below)
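A sketch of the Poisson approximation, assuming scipy; the specific n and p are illustrative values chosen so that λ = np = 2:

```python
import numpy as np
from scipy import stats

# Poisson approximation to the binomial: n large, p small, np = λ.
n, p = 1000, 0.002
lam = n * p
k = np.arange(10)
binom_pmf = stats.binom.pmf(k, n, p)
poisson_pmf = stats.poisson.pmf(k, lam)
print(np.max(np.abs(binom_pmf - poisson_pmf)))  # small, on the order of 1e-4
```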
Geometric Distribution
Geo(p)
Parameters:

p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = (1-p)^{k-1}p, k = 1, 2, 3, ...
Application:

Number of trials until first success

Key Properties:
  • E[X] = 1/p
  • Var(X) = (1-p)/p²
  • Memoryless: P(X > s+t | X > s) = P(X > t) (checked in the sketch below)
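The memoryless property can be checked with survival functions. The following assumes scipy, whose geom has support 1, 2, 3, ..., matching the PMF above; s and t are arbitrary illustrative values.

```python
from scipy import stats

# Memorylessness of the geometric distribution via the survival function:
# P(X > s+t | X > s) = P(X > t), since P(X > k) = (1-p)^k.
p, s, t = 0.3, 4, 7
lhs = stats.geom.sf(s + t, p) / stats.geom.sf(s, p)
rhs = stats.geom.sf(t, p)
print(abs(lhs - rhs) < 1e-12)  # True
```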
Hypergeometric Distribution
H(n,M,N)
Parameters:

n (sample size), M (success states), N (population size)

Probability Mass Function:
P(X = k) = C_M^k C_{N-M}^{n-k} / C_N^n
Application:

Sampling without replacement (defective items, card draws)

Key Properties:
  • E[X] = n(M/N)
  • Approximates B(n, M/N) when N → ∞ (see the sketch below)
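A sketch of the binomial approximation, assuming scipy; note that scipy orders the hypergeometric parameters differently from the H(n, M, N) notation above:

```python
import numpy as np
from scipy import stats

# H(n, M, N) approaches B(n, M/N) as N grows with M/N held fixed.
n = 10
for N in (50, 500, 5000):
    M = N // 5                                # keep M/N = 0.2
    k = np.arange(n + 1)
    # scipy args: (k, population size, success states, sample size)
    hyper = stats.hypergeom.pmf(k, N, M, n)
    binom = stats.binom.pmf(k, n, M / N)
    print(N, np.max(np.abs(hyper - binom)))   # max gap shrinks as N grows
```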

Common Continuous Distributions

Fundamental continuous probability distributions and their properties

Uniform Distribution
U(a,b)
Parameters:

a, b ∈ ℝ, a < b

Application:

Equal probability over an interval (random timing, rounding errors)

Key Properties:
  • E[X] = (a+b)/2
  • Var(X) = (b-a)²/12
  • P(c < X < c+l) = l/(b-a) for any subinterval (c, c+l) of (a, b)
Probability Density Function:
p(x) = 1/(b-a) for a < x < b; p(x) = 0 otherwise
Normal Distribution
N(μ,σ²)
Parameters:

μ ∈ ℝ (mean), σ² > 0 (variance)

Application:

Natural phenomena (heights, measurement errors, test scores)

Key Properties:
  • E[X] = μ
  • Var(X) = σ²
  • Standardization: (X-μ)/σ ~ N(0,1)
  • Additive: for independent variables, N(μ₁,σ₁²) + N(μ₂,σ₂²) = N(μ₁+μ₂, σ₁²+σ₂²)
Probability Density Function:
p(x) = (1/(√(2π)σ)) e^{-(x-μ)²/(2σ²)}
Exponential Distribution
Exp(λ)
Parameters:

λ > 0 (rate parameter)

Application:

Lifetime modeling, waiting times between events

Key Properties:
  • E[X] = 1/λ
  • Var(X) = 1/λ²
  • Memoryless: P(X > s+t | X > s) = P(X > t)
Probability Density Function:
p(x) = λe⁻λˣ for x > 0; p(x) = 0 for x ≤ 0
Gamma Distribution
Γ(α,β)
Parameters:

α > 0 (shape), β > 0 (rate)

Application:

Sum of independent exponential variables, reliability modeling

Key Properties:
  • E[X] = α/β
  • Var(X) = α/β²
  • Additive: for independent variables, Γ(α₁,β) + Γ(α₂,β) = Γ(α₁+α₂,β)
Probability Density Function:
p(x) = (β^α/Γ(α)) x^{α-1} e^{-βx} for x > 0; p(x) = 0 for x ≤ 0
Chi-squared Distribution
χ²(n)
Parameters:

n ∈ ℕ (degrees of freedom)

Application:

Sum of squares of independent standard normal variables

Key Properties:
  • E[X] = n
  • Var(X) = 2n
  • Additive: for independent variables, χ²(n₁) + χ²(n₂) = χ²(n₁+n₂)
Probability Density Function:
p(x) = ((1/2)^{n/2}/Γ(n/2)) x^{n/2-1} e^{-x/2} for x > 0; p(x) = 0 for x ≤ 0

Multidimensional Random Variables

Joint distributions, independence, and conditional distributions

Joint Distributions
Discrete Case

Joint PMF: P(X = xᵢ, Y = yⱼ) = pᵢⱼ with Σᵢ Σⱼ pᵢⱼ = 1

Marginals: P(X = xᵢ) = Σⱼ pᵢⱼ, P(Y = yⱼ) = Σᵢ pᵢⱼ

Continuous Case

Joint PDF: p(x,y) with ∫∫ p(x,y) dx dy = 1

Marginals: p_X(x) = ∫ p(x,y) dy, p_Y(y) = ∫ p(x,y) dx

Independence
Definition:

Random variables X and Y are independent if F(x,y) = F_X(x)F_Y(y) for all x, y

Equivalent Conditions:
  • Discrete: pᵢⱼ = pᵢ· p·ⱼ for all i, j
  • Continuous: p(x,y) = p_X(x)p_Y(y) almost everywhere
Example:

In the bivariate normal distribution N(μ₁, μ₂, σ₁², σ₂², ρ), X and Y are independent iff ρ = 0
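A minimal numpy sketch of marginals and the independence factorization; the joint table is a hypothetical example constructed to be independent:

```python
import numpy as np

# Joint PMF of (X, Y) as a table: rows index x_i, columns index y_j.
p_x = np.array([0.3, 0.7])
p_y = np.array([0.2, 0.5, 0.3])
joint = np.outer(p_x, p_y)          # p_ij = p_i. * p.j  (independent case)

# Marginals: sum out the other variable
marg_x = joint.sum(axis=1)          # P(X = x_i) = Σ_j p_ij
marg_y = joint.sum(axis=0)          # P(Y = y_j) = Σ_i p_ij

# Independence: the joint factors into the product of its marginals
print(np.allclose(joint, np.outer(marg_x, marg_y)))  # True
```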

Conditional Distributions
Discrete:

P(Y = yⱼ | X = xᵢ) = pᵢⱼ/pᵢ· (when pᵢ· > 0)

Continuous:

p_{Y|X}(y|x) = p(x,y)/p_X(x) (when p_X(x) > 0)

Sampling Distributions

Essential distributions for statistical inference

Chi-squared Distribution
Definition:

If X₁, X₂, ..., Xₙ ~ N(0,1) independently, then Y = Σᵢ₌₁ⁿ Xᵢ² ~ χ²(n)

Properties:
  • Degrees of freedom: n
  • Additive: for independent variables, χ²(n₁) + χ²(n₂) = χ²(n₁+n₂)
  • E[Y] = n, Var(Y) = 2n (simulated in the sketch below)
Statistical Applications:
  • Goodness of fit tests
  • Variance testing
  • Independence testing
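A Monte Carlo check of the defining construction above, assuming numpy and scipy; n and the number of replications are illustrative choices:

```python
import numpy as np
from scipy import stats

# If X_1,...,X_n ~ N(0,1) independently, then Σ X_i² ~ χ²(n).
rng = np.random.default_rng(0)
n, reps = 5, 200_000
samples = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

# Empirical mean and variance should be close to n and 2n
print(samples.mean(), samples.var())   # ≈ 5 and ≈ 10

# Empirical 95th percentile vs. the χ²(5) quantile (~11.07)
print(np.quantile(samples, 0.95), stats.chi2.ppf(0.95, n))
```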
t-Distribution
Definition:

If X ~ N(0,1) and Y ~ χ²(n) independently, then T = X/√(Y/n) ~ t(n)

Properties:
  • Degrees of freedom: n
  • Symmetric about 0
  • Approaches N(0,1) as n → ∞
Statistical Applications:
  • Small sample inference
  • Confidence intervals for mean
  • Hypothesis testing
F-Distribution
Definition:

If X ~ χ²(m) and Y ~ χ²(n) independently, then F = (X/m)/(Y/n) ~ F(m,n)

Properties:
  • Two degrees of freedom: m, n
  • If F ~ F(m,n), then 1/F ~ F(n,m)
  • If T ~ t(n), then T² ~ F(1,n) (verified in the sketch below)
Statistical Applications:
  • Variance ratio testing
  • ANOVA
  • Regression analysis
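The relationship T² ~ F(1,n) can be verified through CDFs, since P(T² ≤ x) = P(-√x ≤ T ≤ √x) = 2F_t(√x) - 1. A sketch assuming scipy, with illustrative values of n and x:

```python
import numpy as np
from scipy import stats

# If T ~ t(n), then T² ~ F(1, n): compare the two CDFs at a point.
n, x = 8, 2.5
lhs = 2 * stats.t.cdf(np.sqrt(x), n) - 1   # P(T² ≤ x) via the t CDF
rhs = stats.f.cdf(x, 1, n)                  # P(F ≤ x) for F ~ F(1, n)
print(np.isclose(lhs, rhs))                 # True
```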

Worked Examples

Step-by-step solutions to typical random variable problems

Binomial Probability Calculation
Problem:

In 10 independent trials with success probability 0.3, find P(X = 4)

Solution Steps:
  1. Identify the distribution: X ~ B(10, 0.3)
  2. Apply the PMF formula: P(X = k) = C₁₀ᵏ (0.3)ᵏ (0.7)¹⁰⁻ᵏ
  3. Calculate the combination: C₁₀⁴ = 10!/(4!×6!) = 210
  4. Compute the probability: P(X = 4) = 210 × (0.3)⁴ × (0.7)⁶
  5. Final calculation: P(X = 4) = 210 × 0.0081 × 0.1176 ≈ 0.200
Key Point:

Binomial distribution applies to fixed number of independent trials with constant success probability
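The hand calculation above can be confirmed in one line with scipy (an assumed dependency):

```python
from scipy import stats

# X ~ B(10, 0.3): P(X = 4) directly from the binomial PMF
print(stats.binom.pmf(4, 10, 0.3))  # ≈ 0.2001, matching the result above
```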

Normal Distribution Standardization
Problem:

If X ~ N(100, 16), find P(96 < X < 108)

Solution Steps:
  1. Identify the parameters: μ = 100, σ² = 16, so σ = 4
  2. Standardize: Z = (X - 100)/4 ~ N(0,1)
  3. Transform the bounds: P(96 < X < 108) = P(-1 < Z < 2)
  4. Use the standard normal table: P(-1 < Z < 2) = Φ(2) - Φ(-1)
  5. Calculate: P(-1 < Z < 2) = 0.9772 - 0.1587 = 0.8185
Key Point:

Standardization allows us to use the standard normal table for any normal distribution
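Again a one-line confirmation with scipy (assumed available), passing μ and σ directly so no table lookup is needed:

```python
from scipy import stats

# X ~ N(100, 16), so loc = μ = 100 and scale = σ = 4
p = stats.norm.cdf(108, loc=100, scale=4) - stats.norm.cdf(96, loc=100, scale=4)
print(p)  # ≈ 0.8186, matching Φ(2) - Φ(-1)
```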

Exponential Distribution Memory Property
Problem:

For X ~ Exp(λ), prove that P(X > s+t|X > s) = P(X > t)

Solution Steps:
  1. Write the conditional probability: P(X > s+t | X > s) = P(X > s+t, X > s)/P(X > s)
  2. Simplify the numerator: since {X > s+t} ⊆ {X > s}, P(X > s+t, X > s) = P(X > s+t)
  3. Use the exponential tail probability (survival function): P(X > x) = e⁻λˣ for x > 0
  4. Substitute: P(X > s+t | X > s) = e⁻λ⁽ˢ⁺ᵗ⁾/e⁻λˢ = e⁻λᵗ
  5. Conclude: P(X > s+t | X > s) = e⁻λᵗ = P(X > t)
Key Point:

Memoryless property means past waiting time doesn't affect future waiting time
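The algebraic proof can also be confirmed numerically. Note that scipy parameterizes Exp(λ) by scale = 1/λ; the λ, s, t below are arbitrary illustrative values.

```python
from scipy import stats

# Memorylessness of X ~ Exp(λ), checked via survival functions
lam, s, t = 0.5, 2.0, 3.0
X = stats.expon(scale=1 / lam)
lhs = X.sf(s + t) / X.sf(s)    # P(X > s+t | X > s)
rhs = X.sf(t)                  # P(X > t) = e^(-λt)
print(abs(lhs - rhs) < 1e-12)  # True
```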

Study Tips & Best Practices

Problem-Solving Strategy:

  • Identify the distribution: Look for key characteristics (discrete/continuous, parameters)
  • Choose appropriate formulas: PMF for discrete, PDF for continuous, CDF for probabilities
  • Check boundary conditions: Ensure probabilities sum/integrate to 1
  • Use standardization: Transform to standard forms when possible

Common Mistakes to Avoid:

  • Confusing PMF and PDF: Remember P(X=x)=0 for continuous variables
  • Forgetting independence: Joint PMF/PDF must factor for independent variables
  • Wrong distribution choice: Match problem context to appropriate distribution
  • Parameter interpretation: Understand what each parameter represents physically
