
Random Variables & Distributions

Master the mathematical foundation of probability through random variables. Learn discrete and continuous distributions, joint distributions, independence, and sampling distributions essential for statistical inference.

Intermediate Level · 12 Lessons · 8–10 Hours
Learning Objectives
Essential concepts you'll master in random variables and distributions
  • Understand the definition and classification of random variables
  • Master distribution functions and their properties
  • Learn common discrete and continuous distributions
  • Analyze multidimensional random variables and independence
  • Apply functions of random variables and sampling distributions

Random Variable Fundamentals

Understanding the mathematical foundation and classification of random variables

Definition & Classification

Random Variable Definition

Definition:

Let (Ω, ℱ, P) be a probability space. A single-valued real function ξ(ω) is called a random variable if, for any Borel set B, ξ⁻¹(B) = {ω : ξ(ω) ∈ B} ∈ ℱ

Key Point:

A random variable maps each sample point ω in the sample space Ω to a real number, turning the outcomes of a random experiment into quantities we can calculate with

Examples:
  • Coin toss: Ω = {Head, Tail}, define ξ(Head) = 1, ξ(Tail) = 0
  • Dice roll: Ω = {1,2,3,4,5,6}, ξ(ω) = ω (identity mapping)
  • Lifetime of a bulb: ξ(ω) = actual lifetime in hours

Discrete Random Variables

Definition:

A random variable that takes on a finite or countably infinite number of values

Characterization:

Described by a distribution sequence (probability mass function): P(ξ = xᵢ) = p(xᵢ)

Properties:
  • Non-negativity: p(xᵢ) ≥ 0 for all i
  • Normalization: Σᵢ p(xᵢ) = 1 (checked in the sketch below)
Examples:
  • Number of heads in coin tosses
  • Number of defective items in a batch
  • Number of customers arriving per hour
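To make the two properties concrete, here is a minimal Python sketch (plain Python, no libraries; the distribution below, the number of heads in two fair coin tosses, is an illustrative example, not from the text):

```python
# Distribution sequence of a discrete random variable:
# number of heads in two fair coin tosses.
pmf = {0: 0.25, 1: 0.50, 2: 0.25}

# Non-negativity: every p(x_i) >= 0
assert all(p >= 0 for p in pmf.values())

# Normalization: the probabilities sum to 1
assert abs(sum(pmf.values()) - 1.0) < 1e-12

# P(xi = 1), read directly from the distribution sequence
print(pmf[1])  # 0.5
```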

Continuous Random Variables

Definition:

A random variable whose values fill an interval and which has a non-negative, integrable probability density function p(x)

Characterization:

Described by a probability density function (PDF): F(x) = ∫₋∞ˣ p(t) dt

Properties:
  • Non-negativity: p(x) ≥ 0
  • Normalization: ∫₋∞^∞ p(x) dx = 1
  • Point probability: P(ξ = c) = 0 for any real number c
Key Insight:

For continuous random variables: P(a < ξ ≤ b) = P(a ≤ ξ ≤ b) = P(a ≤ ξ < b) = P(a < ξ < b) = ∫ₐᵇ p(x) dx
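A quick numerical illustration, sketched in Python with scipy (an assumed dependency, since the page itself gives no code): integrating the standard normal PDF over (a, b) agrees with the CDF difference, and endpoint inclusion cannot change the value because each single point has probability zero.

```python
from scipy import stats
from scipy.integrate import quad

# Interval probability for a continuous variable: integrate the PDF.
a, b = -1.0, 2.0
integral, _ = quad(stats.norm.pdf, a, b)        # integral of p(x) over (a, b)
cdf_diff = stats.norm.cdf(b) - stats.norm.cdf(a)
print(round(integral, 6), round(cdf_diff, 6))   # both ≈ 0.818595
```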

Distribution Functions

The fundamental tool for describing probability distributions

Cumulative Distribution Function (CDF)
Definition:

For any random variable ξ, F(x) = P(ξ ≤ x), x ∈ ℝ, is called the distribution function of ξ

Essential Properties

Monotonicity

If a ≤ b, then F(a) ≤ F(b)

The probability never decreases as we move right on the real line

Limit Properties

F(-∞) = 0, F(+∞) = 1

Probability approaches 0 at negative infinity and 1 at positive infinity

Right Continuity

F(x+0) = F(x) for all x

The function is continuous from the right at every point

Relationships with PMF/PDF

Discrete Case
F(x) = Σ_{xᵢ≤x} p(xᵢ)

Step function with jumps at discrete values

Continuous Case
F(x) = ∫₋∞ˣ p(t) dt

Smooth function with F'(x) = p(x) at continuity points
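Both relationships can be checked numerically. The following Python sketch assumes scipy is available; the distributions chosen are illustrative.

```python
import numpy as np
from scipy import stats

# Continuous case: the CDF's derivative recovers the PDF, F'(x) = p(x).
x, h = 0.7, 1e-6
num_deriv = (stats.norm.cdf(x + h) - stats.norm.cdf(x - h)) / (2 * h)
print(np.isclose(num_deriv, stats.norm.pdf(x)))  # True

# Discrete case: the CDF is a step function, F(x) = sum of p(x_i) for x_i <= x.
X = stats.binom(5, 0.4)
print(np.isclose(X.cdf(2), sum(X.pmf(k) for k in range(3))))  # True
```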

Common Discrete Distributions

Essential discrete probability distributions and their applications

Bernoulli Distribution
Ber(p)
Parameters:

p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = p^k(1-p)^{1-k}, k ∈ {0, 1}
Application:

Single trial success/failure experiment

Key Properties:
  • E[X] = p
  • Var(X) = p(1-p)
  • Special case of binomial with n = 1
Binomial Distribution
B(n,p)
Parameters:

n ∈ ℕ (trials), p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = C_n^k p^k(1-p)^{n-k}, k = 0, 1, ..., n
Application:

Number of successes in n independent Bernoulli trials

Key Properties:
  • E[X] = np
  • Var(X) = np(1-p)
  • Additive: for independent variables, B(n₁,p) + B(n₂,p) = B(n₁+n₂,p) (see the sketch below)
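The additive property can be verified numerically: the PMF of a sum of independent variables is the convolution of their PMFs. A sketch using numpy and scipy (assumed dependencies; the parameter values are illustrative):

```python
import numpy as np
from scipy import stats

n1, n2, p = 4, 6, 0.3
pmf1 = stats.binom.pmf(np.arange(n1 + 1), n1, p)
pmf2 = stats.binom.pmf(np.arange(n2 + 1), n2, p)

# PMF of a sum of independent variables = convolution of their PMFs
pmf_sum = np.convolve(pmf1, pmf2)
pmf_direct = stats.binom.pmf(np.arange(n1 + n2 + 1), n1 + n2, p)
print(np.allclose(pmf_sum, pmf_direct))  # True: B(4,p) + B(6,p) = B(10,p)
```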
Poisson Distribution
P(λ)
Parameters:

λ > 0 (rate parameter)

Probability Mass Function:
P(X = k) = λᵏe⁻λ/k!, k = 0, 1, 2, ...
Application:

Rare events occurrence (defects, arrivals, accidents)

Key Properties:
  • E[X] = Var(X) = λ
  • Additive: for independent variables, P(λ₁) + P(λ₂) = P(λ₁+λ₂)
  • Approximates the binomial when n is large, p is small, and np = λ (see the sketch below)
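A sketch of the Poisson approximation, assuming scipy; the specific n and p are illustrative values chosen so that λ = np = 2:

```python
import numpy as np
from scipy import stats

# Poisson approximation to the binomial: n large, p small, np = λ.
n, p = 1000, 0.002
lam = n * p
k = np.arange(10)
binom_pmf = stats.binom.pmf(k, n, p)
poisson_pmf = stats.poisson.pmf(k, lam)
print(np.max(np.abs(binom_pmf - poisson_pmf)))  # small, on the order of 1e-4
```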
Geometric Distribution
Geo(p)
Parameters:

p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = (1-p)^{k-1}p, k = 1, 2, 3, ...
Application:

Number of trials until first success

Key Properties:
  • E[X] = 1/p
  • Var(X) = (1-p)/p²
  • Memoryless: P(X > s+t | X > s) = P(X > t) (checked in the sketch below)
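The memoryless property can be checked with survival functions. The following assumes scipy, whose geom has support 1, 2, 3, ..., matching the PMF above; s and t are arbitrary illustrative values.

```python
from scipy import stats

# Memorylessness of the geometric distribution via the survival function:
# P(X > s+t | X > s) = P(X > t), since P(X > k) = (1-p)^k.
p, s, t = 0.3, 4, 7
lhs = stats.geom.sf(s + t, p) / stats.geom.sf(s, p)
rhs = stats.geom.sf(t, p)
print(abs(lhs - rhs) < 1e-12)  # True
```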
Hypergeometric Distribution
H(n,M,N)
Parameters:

n (sample size), M (success states), N (population size)

Probability Mass Function:
P(X = k) = C_M^k C_{N-M}^{n-k} / C_N^n
Application:

Sampling without replacement (defective items, card draws)

Key Properties:
  • E[X] = n(M/N)
  • Approximates B(n, M/N) when N → ∞ (see the sketch below)
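A sketch of the binomial approximation, assuming scipy; note that scipy orders the hypergeometric parameters differently from the H(n, M, N) notation above:

```python
import numpy as np
from scipy import stats

# H(n, M, N) approaches B(n, M/N) as N grows with M/N held fixed.
n = 10
for N in (50, 500, 5000):
    M = N // 5                                # keep M/N = 0.2
    k = np.arange(n + 1)
    # scipy args: (k, population size, success states, sample size)
    hyper = stats.hypergeom.pmf(k, N, M, n)
    binom = stats.binom.pmf(k, n, M / N)
    print(N, np.max(np.abs(hyper - binom)))   # max gap shrinks as N grows
```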

Common Continuous Distributions

Fundamental continuous probability distributions and their properties

Uniform Distribution
U(a,b)
Parameters:

a, b ∈ ℝ, a < b

Application:

Equal probability over an interval (random timing, rounding errors)

Key Properties:
  • E[X] = (a+b)/2
  • Var(X) = (b-a)²/12
  • P(c < X < c+l) = l/(b-a) for any subinterval (c, c+l) of (a, b)
Probability Density Function:
p(x) = 1/(b-a) for a < x < b; p(x) = 0 otherwise
Normal Distribution
N(μ,σ²)
Parameters:

μ ∈ ℝ (mean), σ² > 0 (variance)

Application:

Natural phenomena (heights, measurement errors, test scores)

Key Properties:
  • E[X] = μ
  • Var(X) = σ²
  • Standardization: (X-μ)/σ ~ N(0,1)
  • Additive: for independent variables, N(μ₁,σ₁²) + N(μ₂,σ₂²) = N(μ₁+μ₂, σ₁²+σ₂²)
Probability Density Function:
p(x) = (1/(√(2π)σ)) e^{-(x-μ)²/(2σ²)}
Exponential Distribution
Exp(λ)
Parameters:

λ > 0 (rate parameter)

Application:

Lifetime modeling, waiting times between events

Key Properties:
  • E[X] = 1/λ
  • Var(X) = 1/λ²
  • Memoryless: P(X > s+t | X > s) = P(X > t)
Probability Density Function:
p(x) = λe⁻λˣ for x > 0; p(x) = 0 for x ≤ 0
Gamma Distribution
Γ(α,β)
Parameters:

α > 0 (shape), β > 0 (rate)

Application:

Sum of independent exponential variables, reliability modeling

Key Properties:
  • E[X] = α/β
  • Var(X) = α/β²
  • Additive: for independent variables, Γ(α₁,β) + Γ(α₂,β) = Γ(α₁+α₂,β)
Probability Density Function:
p(x) = (β^α/Γ(α)) x^{α-1} e^{-βx} for x > 0; p(x) = 0 for x ≤ 0
Chi-squared Distribution
χ²(n)
Parameters:

n ∈ ℕ (degrees of freedom)

Application:

Sum of squares of independent standard normal variables

Key Properties:
  • E[X] = n
  • Var(X) = 2n
  • Additive: for independent variables, χ²(n₁) + χ²(n₂) = χ²(n₁+n₂)
Probability Density Function:
p(x) = ((1/2)^{n/2}/Γ(n/2)) x^{n/2-1} e^{-x/2} for x > 0; p(x) = 0 for x ≤ 0

Multidimensional Random Variables

Joint distributions, independence, and conditional distributions

Joint Distributions
Discrete Case

Joint PMF: P(X = xᵢ, Y = yⱼ) = pᵢⱼ with Σᵢ Σⱼ pᵢⱼ = 1

Marginals: P(X = xᵢ) = Σⱼ pᵢⱼ, P(Y = yⱼ) = Σᵢ pᵢⱼ

Continuous Case

Joint PDF: p(x,y) with ∫∫ p(x,y) dx dy = 1

Marginals: p_X(x) = ∫ p(x,y) dy, p_Y(y) = ∫ p(x,y) dx

Independence
Definition:

Random variables X and Y are independent if F(x,y) = F_X(x)F_Y(y) for all x, y

Equivalent Conditions:
  • Discrete: pᵢⱼ = pᵢ· p·ⱼ for all i, j
  • Continuous: p(x,y) = p_X(x)p_Y(y) almost everywhere
Example:

In the bivariate normal distribution N(μ₁, μ₂, σ₁², σ₂², ρ), X and Y are independent iff ρ = 0
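A minimal numpy sketch of marginals and the independence factorization; the joint table is a hypothetical example constructed to be independent:

```python
import numpy as np

# Joint PMF of (X, Y) as a table: rows index x_i, columns index y_j.
p_x = np.array([0.3, 0.7])
p_y = np.array([0.2, 0.5, 0.3])
joint = np.outer(p_x, p_y)          # p_ij = p_i. * p.j  (independent case)

# Marginals: sum out the other variable
marg_x = joint.sum(axis=1)          # P(X = x_i) = Σ_j p_ij
marg_y = joint.sum(axis=0)          # P(Y = y_j) = Σ_i p_ij

# Independence: the joint factors into the product of its marginals
print(np.allclose(joint, np.outer(marg_x, marg_y)))  # True
```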

Conditional Distributions
Discrete:

P(Y = yⱼ | X = xᵢ) = pᵢⱼ/pᵢ· (when pᵢ· > 0)

Continuous:

p_{Y|X}(y|x) = p(x,y)/p_X(x) (when p_X(x) > 0)

Sampling Distributions

Essential distributions for statistical inference

Chi-squared Distribution
Definition:

If X₁, X₂, ..., Xₙ ~ N(0,1) independently, then Y = Σᵢ₌₁ⁿ Xᵢ² ~ χ²(n)

Properties:
  • Degrees of freedom: n
  • Additive: for independent variables, χ²(n₁) + χ²(n₂) = χ²(n₁+n₂)
  • E[Y] = n, Var(Y) = 2n (simulated in the sketch below)
Statistical Applications:
  • Goodness of fit tests
  • Variance testing
  • Independence testing
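A Monte Carlo check of the defining construction above, assuming numpy and scipy; n and the number of replications are illustrative choices:

```python
import numpy as np
from scipy import stats

# If X_1,...,X_n ~ N(0,1) independently, then Σ X_i² ~ χ²(n).
rng = np.random.default_rng(0)
n, reps = 5, 200_000
samples = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

# Empirical mean and variance should be close to n and 2n
print(samples.mean(), samples.var())   # ≈ 5 and ≈ 10

# Empirical 95th percentile vs. the χ²(5) quantile (~11.07)
print(np.quantile(samples, 0.95), stats.chi2.ppf(0.95, n))
```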
t-Distribution
Definition:

If X ~ N(0,1) and Y ~ χ²(n) independently, then T = X/√(Y/n) ~ t(n)

Properties:
  • Degrees of freedom: n
  • Symmetric about 0
  • Approaches N(0,1) as n → ∞
Statistical Applications:
  • Small sample inference
  • Confidence intervals for mean
  • Hypothesis testing
F-Distribution
Definition:

If X ~ χ²(m) and Y ~ χ²(n) independently, then F = (X/m)/(Y/n) ~ F(m,n)

Properties:
  • Two degrees of freedom: m, n
  • If F ~ F(m,n), then 1/F ~ F(n,m)
  • If T ~ t(n), then T² ~ F(1,n) (verified in the sketch below)
Statistical Applications:
  • Variance ratio testing
  • ANOVA
  • Regression analysis
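The relationship T² ~ F(1,n) can be verified through CDFs, since P(T² ≤ x) = P(-√x ≤ T ≤ √x) = 2F_t(√x) - 1. A sketch assuming scipy, with illustrative values of n and x:

```python
import numpy as np
from scipy import stats

# If T ~ t(n), then T² ~ F(1, n): compare the two CDFs at a point.
n, x = 8, 2.5
lhs = 2 * stats.t.cdf(np.sqrt(x), n) - 1   # P(T² ≤ x) via the t CDF
rhs = stats.f.cdf(x, 1, n)                  # P(F ≤ x) for F ~ F(1, n)
print(np.isclose(lhs, rhs))                 # True
```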

Worked Examples

Step-by-step solutions to typical random variable problems

Binomial Probability Calculation
Problem:

In 10 independent trials with success probability 0.3, find P(X = 4)

Solution Steps:
  1. Identify the distribution: X ~ B(10, 0.3)
  2. Apply the PMF formula: P(X = k) = C₁₀ᵏ (0.3)ᵏ (0.7)¹⁰⁻ᵏ
  3. Calculate the combination: C₁₀⁴ = 10!/(4!×6!) = 210
  4. Compute the probability: P(X = 4) = 210 × (0.3)⁴ × (0.7)⁶
  5. Final calculation: P(X = 4) = 210 × 0.0081 × 0.1176 ≈ 0.200
Key Point:

Binomial distribution applies to fixed number of independent trials with constant success probability
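The hand calculation above can be confirmed in one line with scipy (an assumed dependency):

```python
from scipy import stats

# X ~ B(10, 0.3): P(X = 4) directly from the binomial PMF
print(stats.binom.pmf(4, 10, 0.3))  # ≈ 0.2001, matching the result above
```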

Normal Distribution Standardization
Problem:

If X ~ N(100, 16), find P(96 < X < 108)

Solution Steps:
  1. Identify the parameters: μ = 100, σ² = 16, so σ = 4
  2. Standardize: Z = (X - 100)/4 ~ N(0,1)
  3. Transform the bounds: P(96 < X < 108) = P(-1 < Z < 2)
  4. Use the standard normal table: P(-1 < Z < 2) = Φ(2) - Φ(-1)
  5. Calculate: P(-1 < Z < 2) = 0.9772 - 0.1587 = 0.8185
Key Point:

Standardization allows us to use the standard normal table for any normal distribution
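Again a one-line confirmation with scipy (assumed available), passing μ and σ directly so no table lookup is needed:

```python
from scipy import stats

# X ~ N(100, 16), so loc = μ = 100 and scale = σ = 4
p = stats.norm.cdf(108, loc=100, scale=4) - stats.norm.cdf(96, loc=100, scale=4)
print(p)  # ≈ 0.8186, matching Φ(2) - Φ(-1)
```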

Exponential Distribution Memory Property
Problem:

For X ~ Exp(λ), prove that P(X > s+t|X > s) = P(X > t)

Solution Steps:
  1. Write the conditional probability: P(X > s+t | X > s) = P(X > s+t, X > s)/P(X > s)
  2. Simplify the numerator: since {X > s+t} ⊆ {X > s}, P(X > s+t, X > s) = P(X > s+t)
  3. Use the exponential tail probability (survival function): P(X > x) = e⁻λˣ for x > 0
  4. Substitute: P(X > s+t | X > s) = e⁻λ⁽ˢ⁺ᵗ⁾/e⁻λˢ = e⁻λᵗ
  5. Conclude: P(X > s+t | X > s) = e⁻λᵗ = P(X > t)
Key Point:

Memoryless property means past waiting time doesn't affect future waiting time
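The algebraic proof can also be confirmed numerically. Note that scipy parameterizes Exp(λ) by scale = 1/λ; the λ, s, t below are arbitrary illustrative values.

```python
from scipy import stats

# Memorylessness of X ~ Exp(λ), checked via survival functions
lam, s, t = 0.5, 2.0, 3.0
X = stats.expon(scale=1 / lam)
lhs = X.sf(s + t) / X.sf(s)    # P(X > s+t | X > s)
rhs = X.sf(t)                  # P(X > t) = e^(-λt)
print(abs(lhs - rhs) < 1e-12)  # True
```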

Study Tips & Best Practices

Problem-Solving Strategy:

  • Identify the distribution: Look for key characteristics (discrete/continuous, parameters)
  • Choose appropriate formulas: PMF for discrete, PDF for continuous, CDF for probabilities
  • Check boundary conditions: Ensure probabilities sum/integrate to 1
  • Use standardization: Transform to standard forms when possible

Common Mistakes to Avoid:

  • Confusing PMF and PDF: Remember P(X=x)=0 for continuous variables
  • Forgetting independence: Joint PMF/PDF must factor for independent variables
  • Wrong distribution choice: Match problem context to appropriate distribution
  • Parameter interpretation: Understand what each parameter represents physically
