MathIsimple

Random Variables & Distributions

Master the mathematical foundation of probability through random variables. Learn discrete and continuous distributions, joint distributions, independence, and sampling distributions essential for statistical inference.

Intermediate Level
12 Lessons
8-10 Hours
Learning Objectives
Essential concepts you'll master in random variables and distributions
  • Understand the definition and classification of random variables
  • Master distribution functions and their properties
  • Learn common discrete and continuous distributions
  • Analyze multidimensional random variables and independence
  • Apply functions of random variables and sampling distributions

Random Variable Fundamentals

Understanding the mathematical foundation and classification of random variables

Definition & Classification

Random Variable Definition

Definition:

Let (Ω, ℱ, P) be a probability space. A single-valued real function ξ(ω) defined on Ω is called a random variable if, for every Borel set B, ξ⁻¹(B) = {ω : ξ(ω) ∈ B} ∈ ℱ (that is, ξ is ℱ-measurable)

Key Point:

Random variables map sample points ω from the sample space Ω to real numbers, achieving 'quantification of random experiment results'

Examples:
  • Coin toss: Ω = {Head, Tail}, define ξ(Head) = 1, ξ(Tail) = 0
  • Dice roll: Ω = {1,2,3,4,5,6}, ξ(ω) = ω (identity mapping)
  • Lifetime of a bulb: ξ(ω) = actual lifetime in hours

Discrete Random Variables

Definition:

A random variable that takes on a finite or countably infinite number of values

Characterization:

Described by its distribution sequence (probability mass function, PMF): P(ξ = xᵢ) = p(xᵢ), i = 1, 2, ...

Properties:
  • Non-negativity: p(xᵢ) ≥ 0 for all i
  • Normalization: Σp(xᵢ) = 1
Examples:
  • Number of heads in coin tosses
  • Number of defective items in a batch
  • Number of customers arriving per hour

Continuous Random Variables

Definition:

A random variable whose values fill some interval, with a non-negative integrable probability density function p(x)

Characterization:

Described by a probability density function (PDF) p(x), from which the CDF is F(x) = ∫₋∞ˣ p(t)dt

Properties:
  • Non-negativity: p(x) ≥ 0
  • Normalization: ∫₋∞^∞ p(x)dx = 1
  • Point probability: P(ξ = c) = 0 for any real number c
Key Insight:

For continuous random variables: P(a < ξ ≤ b) = P(a ≤ ξ ≤ b) = P(a ≤ ξ < b) = P(a < ξ < b) = ∫ₐᵇ p(x)dx

Distribution Functions

The fundamental tool for describing probability distributions

Cumulative Distribution Function (CDF)
Definition:

For any random variable ξ, F(x) = P(ξ ≤ x) for x ∈ ℝ is called the distribution function of ξ

Essential Properties

Monotonicity

If a ≤ b, then F(a) ≤ F(b)

The probability never decreases as we move right on the real line

Limit Properties

F(-∞) = 0, F(+∞) = 1

Probability approaches 0 at negative infinity and 1 at positive infinity

Right Continuity

F(x+0) = F(x) for all x

The function is continuous from the right at every point

Relationships with PMF/PDF

Discrete Case
F(x) = Σ_{xᵢ≤x} p(xᵢ)

Step function with jumps at discrete values

Continuous Case
F(x) = ∫₋∞ˣ p(t)dt

Smooth function with F'(x) = p(x) at continuity points
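The discrete relationship can be sketched in a few lines of Python. The fair-die PMF below is an illustrative assumption, chosen only to show the step-function shape of F:

```python
# Build the CDF of a discrete random variable from its PMF
# (illustrative fair-die example; the values are assumptions for this sketch).
pmf = {k: 1/6 for k in range(1, 7)}

def cdf(x):
    """F(x) = sum of p(x_i) over all x_i <= x: a right-continuous step function."""
    return sum(p for xi, p in pmf.items() if xi <= x)

assert cdf(0.5) == 0.0                  # below the support: no mass yet
assert abs(cdf(3) - 0.5) < 1e-12        # the jump at x = 3 is included (right continuity)
assert cdf(3.9) == cdf(3)               # flat between support points
assert abs(cdf(6) - 1.0) < 1e-12        # all probability mass accumulated
```

The same function evaluated between support points returns the value at the previous jump, which is exactly the "step function with jumps at discrete values" described above.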

Common Discrete Distributions

Essential discrete probability distributions and their applications

Bernoulli Distribution
Ber(p)
Parameters:

p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = p^k(1-p)^{1-k}, k ∈ {0,1}
Application:

Single trial success/failure experiment

Key Properties:
  • E[X] = p
  • Var(X) = p(1-p)
  • Special case of binomial with n=1
Binomial Distribution
B(n,p)
Parameters:

n ∈ ℕ (trials), p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = C_n^k p^k(1-p)^{n-k}, k = 0,1,...,n
Application:

Number of successes in n independent Bernoulli trials

Key Properties:
  • E[X] = np
  • Var(X) = np(1-p)
  • Additive: if X ~ B(n₁,p) and Y ~ B(n₂,p) are independent, then X + Y ~ B(n₁+n₂,p)
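The PMF and moment formulas can be verified numerically; this is a minimal sketch using Python's standard-library `math.comb`, with illustrative parameters n = 10, p = 0.3:

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k) = C(n,k) p^k (1-p)^(n-k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.3  # illustrative parameters
probs = [binom_pmf(k, n, p) for k in range(n + 1)]
assert abs(sum(probs) - 1.0) < 1e-12                  # normalization
mean = sum(k * pk for k, pk in enumerate(probs))
assert abs(mean - n * p) < 1e-12                      # E[X] = np
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
assert abs(var - n * p * (1 - p)) < 1e-12             # Var(X) = np(1-p)
```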
Poisson Distribution
P(λ)
Parameters:

λ > 0 (rate parameter)

Probability Mass Function:
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, k = 0,1,2,...
Application:

Rare events occurrence (defects, arrivals, accidents)

Key Properties:
  • E[X] = Var(X) = λ
  • Additive: the sum of independent P(λ₁) and P(λ₂) variables follows P(λ₁+λ₂)
  • Approximates binomial when n large, p small, np = λ
Geometric Distribution
Geo(p)
Parameters:

p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = (1-p)^{k-1}p, k = 1,2,3,...
Application:

Number of trials until first success

Key Properties:
  • E[X] = 1/p
  • Var(X) = (1-p)/p²
  • Memoryless: P(X > s+t|X > s) = P(X > t)
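The memoryless identity follows from the tail probability P(X > k) = (1-p)^k (no success in the first k trials). A minimal numeric check, with illustrative integer values for s and t:

```python
def geom_tail(k, p):
    # P(X > k): no success in the first k trials
    return (1 - p) ** k

p, s, t = 0.3, 4, 6   # illustrative values; s and t are integers here
lhs = geom_tail(s + t, p) / geom_tail(s, p)   # P(X > s+t | X > s)
rhs = geom_tail(t, p)                          # P(X > t)
assert abs(lhs - rhs) < 1e-12
```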
Hypergeometric Distribution
H(n,M,N)
Parameters:

n (sample size), M (success states), N (population size)

Probability Mass Function:
P(X = k) = \frac{C_M^k C_{N-M}^{n-k}}{C_N^n}
Application:

Sampling without replacement (defective items, card draws)

Key Properties:
  • E[X] = n(M/N)
  • Approximately B(n, M/N) as N → ∞ with M/N fixed (sampling with and without replacement converge)
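A sketch of the PMF with `math.comb`; the population numbers (20 items, 5 defective, sample of 4) are illustrative assumptions:

```python
from math import comb

def hypergeom_pmf(k, n, M, N):
    # P(X = k) = C(M,k) C(N-M, n-k) / C(N, n)
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

N, M, n = 20, 5, 4   # e.g. 20 items, 5 defective, sample 4 without replacement
probs = [hypergeom_pmf(k, n, M, N) for k in range(n + 1)]
assert abs(sum(probs) - 1.0) < 1e-12               # normalization
mean = sum(k * pk for k, pk in enumerate(probs))
assert abs(mean - n * M / N) < 1e-12               # E[X] = n(M/N) = 1.0 here
```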

Common Continuous Distributions

Fundamental continuous probability distributions and their properties

Uniform Distribution
U(a,b)
Parameters:

a, b ∈ ℝ, a < b

Application:

Equal probability over an interval (random timing, rounding errors)

Key Properties:
  • E[X] = (a+b)/2
  • Var(X) = (b-a)²/12
  • P(c < X < c+l) = l/(b-a) for any subinterval (c, c+l) ⊆ (a, b)
Probability Density Function:
p(x) = \begin{cases}\frac{1}{b-a}, & a < x < b \\ 0, & \text{otherwise}\end{cases}
Normal Distribution
N(μ,σ²)
Parameters:

μ ∈ ℝ (mean), σ² > 0 (variance)

Application:

Natural phenomena (heights, measurement errors, test scores)

Key Properties:
  • E[X] = μ
  • Var(X) = σ²
  • Standardization: (X-μ)/σ ~ N(0,1)
  • Additive: sums of independent normal variables are normal (means and variances add)
Probability Density Function:
p(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}
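The normal CDF has no closed form, but the standard identity Φ(x) = (1 + erf(x/√2))/2 expresses it through the error function in Python's standard library. A minimal sketch:

```python
from math import erf, sqrt

def phi(x):
    # Standard normal CDF: Phi(x) = (1 + erf(x / sqrt(2))) / 2
    return 0.5 * (1 + erf(x / sqrt(2)))

assert abs(phi(0.0) - 0.5) < 1e-12              # symmetry about the mean
assert abs(phi(1.96) - 0.975) < 1e-4            # familiar two-sided 95% point
assert abs(phi(-1.0) - (1 - phi(1.0))) < 1e-12  # Phi(-x) = 1 - Phi(x)
```

By the standardization property, any N(μ, σ²) probability reduces to `phi` applied to (x - μ)/σ.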
Exponential Distribution
Exp(λ)
Parameters:

λ > 0 (rate parameter)

Application:

Lifetime modeling, waiting times between events

Key Properties:
  • E[X] = 1/λ
  • Var(X) = 1/λ²
  • Memoryless: P(X > s+t|X > s) = P(X > t)
Probability Density Function:
p(x) = \begin{cases}\lambda e^{-\lambda x}, & x > 0 \\ 0, & x \leq 0\end{cases}
Gamma Distribution
Γ(α,β)
Parameters:

α > 0 (shape), β > 0 (rate)

Application:

Sum of independent exponential variables, reliability modeling

Key Properties:
  • E[X] = α/β
  • Var(X) = α/β²
  • Additive: for independent variables with the same rate β, Γ(α₁,β) + Γ(α₂,β) gives Γ(α₁+α₂,β)
Probability Density Function:
p(x) = \begin{cases}\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}, & x > 0 \\ 0, & x \leq 0\end{cases}
Chi-squared Distribution
χ²(n)
Parameters:

n ∈ ℕ (degrees of freedom)

Application:

Sum of squares of independent standard normal variables

Key Properties:
  • E[X] = n
  • Var(X) = 2n
  • Additive: the sum of independent χ²(n₁) and χ²(n₂) variables is χ²(n₁+n₂)
Probability Density Function:
p(x) = \begin{cases}\frac{(1/2)^{n/2}}{\Gamma(n/2)}x^{n/2-1}e^{-x/2}, & x > 0 \\ 0, & x \leq 0\end{cases}

Multidimensional Random Variables

Joint distributions, independence, and conditional distributions

Joint Distributions
Discrete Case

Joint PMF: P(X = xᵢ, Y = yⱼ) = pᵢⱼ with Σᵢ Σⱼ pᵢⱼ = 1

Marginals: P(X = xᵢ) = Σⱼ pᵢⱼ, P(Y = yⱼ) = Σᵢ pᵢⱼ

Continuous Case

Joint PDF: p(x,y) with ∫∫ p(x,y)dxdy = 1

Marginals: p_X(x) = ∫ p(x,y)dy, p_Y(y) = ∫ p(x,y)dx
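The discrete case can be sketched with a small joint table; the 2×2 probabilities below are illustrative assumptions, chosen only to show how marginals are computed:

```python
# Hypothetical joint PMF of (X, Y): p[(x_i, y_j)] = P(X = x_i, Y = y_j)
joint = {
    (0, 0): 0.1, (0, 1): 0.2,
    (1, 0): 0.3, (1, 1): 0.4,
}
assert abs(sum(joint.values()) - 1.0) < 1e-12   # joint normalization

# Marginals: sum the joint PMF over the other variable
p_X = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_Y = {y: sum(p for (_, yj), p in joint.items() if yj == y) for y in (0, 1)}
assert abs(p_X[0] - 0.3) < 1e-12 and abs(p_X[1] - 0.7) < 1e-12
assert abs(p_Y[0] - 0.4) < 1e-12 and abs(p_Y[1] - 0.6) < 1e-12

# Note: this table is NOT independent, since p(0,0) = 0.1 != p_X[0] * p_Y[0] = 0.12
```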

Independence
Definition:

Random variables X and Y are independent if F(x,y) = F_X(x)F_Y(y) for all x,y

Equivalent Conditions:
  • Discrete: pᵢⱼ = pᵢ·p·ⱼ for all i,j
  • Continuous: p(x,y) = p_X(x)p_Y(y) almost everywhere
Example:

For the bivariate normal distribution N(μ₁,μ₂,σ₁²,σ₂²,ρ), X and Y are independent if and only if ρ = 0

Conditional Distributions
Discrete:

P(Y = yⱼ|X = xᵢ) = pᵢⱼ/pᵢ· (when pᵢ· > 0)

Continuous:

p_{Y|X}(y|x) = p(x,y)/p_X(x) (when p_X(x) > 0)
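The discrete formula can be sketched directly from a joint table; the 2×2 numbers here are an illustrative assumption:

```python
# Conditional PMF P(Y = y | X = x) = p(x, y) / p_X(x), illustrative numbers
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}
p_X = {0: 0.3, 1: 0.7}   # marginals of X computed from the same table

def cond_Y_given_X(y, x):
    return joint[(x, y)] / p_X[x]

# Each conditional distribution is itself a valid PMF (sums to 1):
for x in (0, 1):
    assert abs(cond_Y_given_X(0, x) + cond_Y_given_X(1, x) - 1.0) < 1e-12
assert abs(cond_Y_given_X(1, 0) - 2/3) < 1e-12   # 0.2 / 0.3
```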

Sampling Distributions

Essential distributions for statistical inference

Chi-squared Distribution
Definition:

If X₁, X₂, ..., Xₙ ~ N(0,1) independently, then Y = Σᵢ₌₁ⁿ Xᵢ² ~ χ²(n)

Properties:
  • Degrees of freedom: n
  • Additive property
  • E[Y] = n, Var(Y) = 2n
Statistical Applications:
  • Goodness of fit tests
  • Variance testing
  • Independence testing
t-Distribution
Definition:

If X ~ N(0,1), Y ~ χ²(n) independently, then T = X/√(Y/n) ~ t(n)

Properties:
  • Degrees of freedom: n
  • Symmetric about 0
  • Approaches N(0,1) as n → ∞
Statistical Applications:
  • Small sample inference
  • Confidence intervals for mean
  • Hypothesis testing
F-Distribution
Definition:

If X ~ χ²(m), Y ~ χ²(n) independently, then F = (X/m)/(Y/n) ~ F(m,n)

Properties:
  • Two degrees of freedom: m, n
  • Reciprocal: if X ~ F(m,n), then 1/X ~ F(n,m)
  • If T ~ t(n), then T² ~ F(1,n)
Statistical Applications:
  • Variance ratio testing
  • ANOVA
  • Regression analysis

Worked Examples

Step-by-step solutions to typical random variable problems

Binomial Probability Calculation
Problem:

In 10 independent trials with success probability 0.3, find P(X = 4)

Solution Steps:
  1. Step 1: Identify the distribution - X ~ B(10, 0.3)
  2. Step 2: Apply the PMF formula - P(X = k) = C₁₀ᵏ (0.3)ᵏ (0.7)¹⁰⁻ᵏ
  3. Step 3: Calculate the combination - C₁₀⁴ = 10!/(4!×6!) = 210
  4. Step 4: Compute the probability - P(X = 4) = 210 × (0.3)⁴ × (0.7)⁶
  5. Step 5: Final calculation - P(X = 4) = 210 × 0.0081 × 0.1176 ≈ 0.200
Key Point:

Binomial distribution applies to fixed number of independent trials with constant success probability
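The hand calculation above can be reproduced in one line of Python:

```python
from math import comb

# X ~ B(10, 0.3): P(X = 4) = C(10,4) (0.3)^4 (0.7)^6
c = comb(10, 4)
prob = c * 0.3**4 * 0.7**6
assert c == 210
assert abs(prob - 0.200) < 1e-3   # matches the hand calculation
```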

Normal Distribution Standardization
Problem:

If X ~ N(100, 16), find P(96 < X < 108)

Solution Steps:
  1. Step 1: Identify parameters - μ = 100, σ² = 16, so σ = 4
  2. Step 2: Standardize - Z = (X - 100)/4 ~ N(0,1)
  3. Step 3: Transform the bounds - P(96 < X < 108) = P(-1 < Z < 2)
  4. Step 4: Use the standard normal table - P(-1 < Z < 2) = Φ(2) - Φ(-1)
  5. Step 5: Calculate - P(-1 < Z < 2) = 0.9772 - 0.1587 = 0.8185
Key Point:

Standardization allows us to use the standard normal table for any normal distribution
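The table lookup can be checked with the standard identity Φ(z) = (1 + erf(z/√2))/2, using Python's standard library:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 100, 4           # X ~ N(100, 16)
prob = phi((108 - mu) / sigma) - phi((96 - mu) / sigma)   # P(-1 < Z < 2)
assert abs(prob - 0.8185) < 1e-3   # agrees with the table value
```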

Exponential Distribution Memory Property
Problem:

For X ~ Exp(λ), prove that P(X > s+t|X > s) = P(X > t)

Solution Steps:
  1. Step 1: Write the conditional probability - P(X > s+t|X > s) = P(X > s+t, X > s)/P(X > s)
  2. Step 2: Simplify the numerator - since {X > s+t} ⊆ {X > s}, P(X > s+t, X > s) = P(X > s+t)
  3. Step 3: Use the exponential tail - P(X > x) = e⁻λˣ for x > 0
  4. Step 4: Substitute - P(X > s+t|X > s) = e⁻λ⁽ˢ⁺ᵗ⁾/e⁻λˢ = e⁻λᵗ
  5. Step 5: Conclude - P(X > s+t|X > s) = e⁻λᵗ = P(X > t)
Key Point:

Memoryless property means past waiting time doesn't affect future waiting time
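The algebraic identity above can also be confirmed numerically; λ, s, and t below are illustrative values:

```python
from math import exp

def surv(x, lam):
    # P(X > x) = e^(-lambda x) for x > 0
    return exp(-lam * x)

lam, s, t = 0.5, 2.0, 3.0   # illustrative values
lhs = surv(s + t, lam) / surv(s, lam)   # P(X > s+t | X > s)
rhs = surv(t, lam)                      # P(X > t)
assert abs(lhs - rhs) < 1e-12           # memoryless property holds
```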

Study Tips & Best Practices

Problem-Solving Strategy:

  • Identify the distribution: Look for key characteristics (discrete/continuous, parameters)
  • Choose appropriate formulas: PMF for discrete, PDF for continuous, CDF for probabilities
  • Check boundary conditions: Ensure probabilities sum/integrate to 1
  • Use standardization: Transform to standard forms when possible

Common Mistakes to Avoid:

  • Confusing PMF and PDF: Remember P(X=x)=0 for continuous variables
  • Forgetting independence: Joint PMF/PDF must factor for independent variables
  • Wrong distribution choice: Match problem context to appropriate distribution
  • Parameter interpretation: Understand what each parameter represents physically
