
Random Variables & Distributions

Core Concepts

Random Variable Fundamentals

Understanding the mathematical foundation and classification of random variables

Definition & Classification

Random Variable Definition

X: \Omega \to \mathbb{R}, a measurable function mapping from the sample space to the real numbers

Key Point:

Random variable transforms random outcomes into numerical values for mathematical analysis

Properties:
  • Measurability: \{\omega: X(\omega) \leq x\} \in \mathcal{F} for all x \in \mathbb{R}
  • Range: can be finite, countably infinite, or uncountably infinite
  • Induced probability: P_X(A) = P(\{\omega: X(\omega) \in A\})
Examples:
  • Dice roll: X(outcome) = number shown
  • Coin tosses: X = number of heads in n tosses
  • Lifetime: X = time until device failure

Discrete vs Continuous

Classification based on the range of possible values

Characterization:

Discrete: countable range; Continuous: uncountable range (typically an interval)

Properties:
  • Discrete: described by a PMF P(X = x_i)
  • Continuous: described by a PDF p(x), with P(X = x) = 0 for every x
  • Mixed: a combination of discrete and continuous parts
Examples:
  • Discrete: number of customers, defects count
  • Continuous: temperature, waiting time, height
  • Mixed: insurance claim (0 with positive probability, continuous otherwise)

Distribution Functions

The fundamental tool for describing probability distributions

Cumulative Distribution Function

Definition

F(x) = P(X \leq x) for all x \in \mathbb{R}

Essential Properties

Monotonicity

F(x_1) \leq F(x_2) \text{ if } x_1 \leq x_2

CDF is non-decreasing

Right Continuity

\lim_{x \to a^+} F(x) = F(a)

Continuous from the right

Limits

\lim_{x \to -\infty} F(x) = 0, \quad \lim_{x \to +\infty} F(x) = 1

Bounds at infinity

Relationships with PMF/PDF

Discrete Case
F(x) = \sum_{x_i \leq x} P(X = x_i)

Sum of probabilities up to x

Continuous Case
F(x) = \int_{-\infty}^x p(t)\,dt, \quad p(x) = F'(x)

Integral of PDF; PDF is derivative of CDF
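
This relationship is easy to verify numerically. A minimal sketch (assuming NumPy and SciPy are available; the standard normal is used purely as an example):

```python
# Check F(x) = ∫_{-∞}^x p(t) dt for the standard normal N(0,1):
# compare the built-in CDF to a numerical integral of the PDF.
import numpy as np
from scipy import stats
from scipy.integrate import quad

x = 1.5
cdf_value = stats.norm.cdf(x)                   # F(x) = P(X <= x)
integral, _ = quad(stats.norm.pdf, -np.inf, x)  # integral of the PDF up to x
print(cdf_value, integral)                      # both ≈ 0.9332
```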

Common Discrete Distributions

Essential discrete probability distributions and their applications

Bernoulli Distribution
Ber(p)
Parameters:

p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = p^k (1-p)^{1-k}, \quad k \in \{0, 1\}
Application:

Single trial success/failure experiment

Key Properties:
  • E[X] = p
  • Var(X) = p(1-p)
  • Special case of the binomial with n = 1
Binomial Distribution
B(n,p)
Parameters:

n ∈ ℕ (trials), p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = C_n^k p^k (1-p)^{n-k}, \quad k = 0, 1, \ldots, n
Application:

Number of successes in n independent Bernoulli trials

Key Properties:
  • E[X] = np
  • Var(X) = np(1-p)
  • Additive: B(n₁,p) + B(n₂,p) = B(n₁+n₂,p) for independent summands
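
A quick sanity check of these formulas, sketched with scipy.stats.binom (parameter values are illustrative):

```python
# Verify that the B(n, p) PMF sums to 1 and that E[X] = np.
import numpy as np
from scipy import stats

n, p = 5, 0.4
k = np.arange(n + 1)
pmf = stats.binom.pmf(k, n, p)
print(pmf.sum())               # 1.0 (normalization)
print((k * pmf).sum(), n * p)  # both 2.0 (E[X] = np)
```
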
Poisson Distribution
P(λ)
Parameters:

λ > 0 (rate parameter)

Probability Mass Function:
P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \ldots
Application:

Rare events occurrence (defects, arrivals, accidents)

Key Properties:
  • E[X] = Var(X) = λ
  • Additive: P(λ₁) + P(λ₂) = P(λ₁+λ₂) for independent summands
  • Approximates the binomial when n is large, p is small, and np = λ (see the sketch below)
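
A sketch of the binomial-to-Poisson approximation (assuming SciPy; n and p are chosen so that np = 3):

```python
# For large n and small p with np = λ, B(n, p) probabilities are close
# to Poisson(λ) probabilities.
from scipy import stats

n, p = 1000, 0.003
lam = n * p  # λ = 3
for k in range(5):
    print(k, stats.binom.pmf(k, n, p), stats.poisson.pmf(k, lam))
# the two columns agree to roughly three decimal places
```
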
Geometric Distribution
Geo(p)
Parameters:

p ∈ (0,1) (success probability)

Probability Mass Function:
P(X = k) = (1-p)^{k-1} p, \quad k = 1, 2, 3, \ldots
Application:

Number of trials until first success

Key Properties:
  • E[X] = 1/p
  • Var(X) = (1-p)/p²
  • Memoryless: P(X > s+t | X > s) = P(X > t), checked numerically below
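
A numerical check of the memoryless property, assuming SciPy's geom (which uses the same support k = 1, 2, ... as above):

```python
# P(X > s+t | X > s) should equal P(X > t); sf(k) gives P(X > k).
from scipy import stats

p, s, t = 0.25, 3, 5
lhs = stats.geom.sf(s + t, p) / stats.geom.sf(s, p)  # P(X > s+t) / P(X > s)
rhs = stats.geom.sf(t, p)                            # P(X > t)
print(lhs, rhs)  # both equal (1-p)^t = 0.75^5 ≈ 0.2373
```
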
Hypergeometric Distribution
H(n,M,N)
Parameters:

n (sample size), M (success states), N (population size)

Probability Mass Function:
P(X = k) = \frac{C_M^k C_{N-M}^{n-k}}{C_N^n}
Application:

Sampling without replacement (defective items, card draws)

Key Properties:
  • E[X] = n(M/N)
  • Approximates B(n, M/N) when N → ∞ (see the sketch below)
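
A sketch of the binomial approximation for a large population. Note that SciPy's hypergeom takes its arguments in a different order (population size, success states, sample size) than the H(n, M, N) notation above:

```python
# Compare H(n, M, N) to B(n, M/N) when N is large.
from scipy import stats

N_pop, M_succ, n_sample = 10_000, 2_000, 10  # M/N = 0.2
for k in range(4):
    hg = stats.hypergeom.pmf(k, N_pop, M_succ, n_sample)
    bn = stats.binom.pmf(k, n_sample, M_succ / N_pop)
    print(k, hg, bn)  # nearly identical when N >> n
```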

Common Continuous Distributions

Fundamental continuous probability distributions and their properties

Uniform Distribution
U(a,b)
Parameters:

a, b ∈ ℝ, a < b

Application:

Equal probability over an interval (random timing, rounding errors)

Key Properties:
  • E[X] = (a+b)/2
  • Var(X) = (b-a)²/12
  • P(c < X < c+l) = l/(b-a) whenever (c, c+l) ⊆ (a, b)
Probability Density Function:
p(x) = \begin{cases}\frac{1}{b-a}, & a < x < b \\ 0, & \text{otherwise}\end{cases}
Normal Distribution
N(μ,σ²)
Parameters:

μ ∈ ℝ (mean), σ² > 0 (variance)

Application:

Natural phenomena (heights, measurement errors, test scores)

Key Properties:
  • E[X] = μ
  • Var(X) = σ²
  • Standardization: (X-μ)/σ ~ N(0,1)
  • Additive: sums of independent normals are normal
Probability Density Function:
p(x) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}
Exponential Distribution
Exp(λ)
Parameters:

λ > 0 (rate parameter)

Application:

Lifetime modeling, waiting times between events

Key Properties:
  • E[X] = 1/λ
  • Var(X) = 1/λ²
  • Memoryless: P(X > s+t | X > s) = P(X > t)
Probability Density Function:
p(x) = \begin{cases}\lambda e^{-\lambda x}, & x > 0 \\ 0, & x \leq 0\end{cases}
Gamma Distribution
Γ(α,β)
Parameters:

α > 0 (shape), β > 0 (rate)

Application:

Sum of independent exponential variables, reliability modeling

Key Properties:
  • E[X] = α/β
  • Var(X) = α/β²
  • Additive: Γ(α₁,β) + Γ(α₂,β) = Γ(α₁+α₂,β) for independent summands (see the simulation below)
Probability Density Function:
p(x) = \begin{cases}\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}, & x > 0 \\ 0, & x \leq 0\end{cases}
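
The additive property can be seen by simulation, since Γ(α, β) with integer α is the sum of α independent Exp(β) variables. A minimal sketch (assuming NumPy; note NumPy parameterizes the exponential by scale = 1/β):

```python
# Sum α independent Exp(β) draws and compare moments with Γ(α, β).
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 3, 2.0
sums = rng.exponential(scale=1/beta, size=(100_000, alpha)).sum(axis=1)
print(sums.mean(), alpha / beta)    # ≈ 1.5  (E[X] = α/β)
print(sums.var(), alpha / beta**2)  # ≈ 0.75 (Var(X) = α/β²)
```
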
Chi-squared Distribution
χ²(n)
Parameters:

n ∈ ℕ (degrees of freedom)

Application:

Sum of squares of independent standard normal variables

Key Properties:
  • E[X] = n
  • Var(X) = 2n
  • Additive: χ²(n₁) + χ²(n₂) = χ²(n₁+n₂) for independent summands
Probability Density Function:
p(x) = \begin{cases}\frac{(1/2)^{n/2}}{\Gamma(n/2)}x^{n/2-1}e^{-x/2}, & x > 0 \\ 0, & x \leq 0\end{cases}

Multidimensional Random Variables

Understanding joint distributions and independence of multiple random variables

Joint Distributions

Discrete Case

Definition:

Joint PMF: P(X = xᵢ, Y = yⱼ) = pᵢⱼ with Σᵢ Σⱼ pᵢⱼ = 1

Condition:
  • Non-negativity: pᵢⱼ ≥ 0
  • Normalization: Σᵢ Σⱼ pᵢⱼ = 1
Properties:
  • Marginal PMF: P(X = xᵢ) = Σⱼ pᵢⱼ = pᵢ·
  • Conditional PMF: P(Y = yⱼ | X = xᵢ) = pᵢⱼ / pᵢ·

Continuous Case

Definition:

Joint PDF: p(x,y) with ∬ p(x,y) dx dy = 1

Condition:
  • Non-negativity: p(x,y) ≥ 0
  • Normalization: ∬ p(x,y) dx dy = 1
Properties:
  • Marginal PDF: p_X(x) = ∫ p(x,y) dy
  • Conditional PDF: p_{Y|X}(y|x) = p(x,y) / p_X(x)
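
Both cases follow the same recipe: sum (or integrate) out the other variable for a marginal, then renormalize for a conditional. A small discrete sketch with a made-up 2×3 joint PMF table:

```python
# Rows index x values, columns index y values; entries are p_ij.
import numpy as np

joint = np.array([[0.10, 0.20, 0.10],
                  [0.15, 0.30, 0.15]])  # sums to 1
p_x = joint.sum(axis=1)                 # marginal of X: [0.4, 0.6]
p_y = joint.sum(axis=0)                 # marginal of Y: [0.25, 0.5, 0.25]
cond_y_given_x0 = joint[0] / p_x[0]     # P(Y = y_j | X = x_0)
print(p_x, p_y, cond_y_given_x0)
```
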
Independence

Definition

Definition:

Random variables X and Y are independent if their joint distribution factors into the product of the marginals

Condition:
  • Discrete: P(X=x, Y=y) = P(X=x)P(Y=y)
  • Continuous: p(x,y) = p_X(x)p_Y(y)
Properties:
  • E[XY] = E[X]E[Y]
  • Var(X+Y) = Var(X) + Var(Y)
  • Cov(X,Y) = 0
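
These consequences are easy to observe by simulation. A sketch with two independent samples (the distribution choices are arbitrary):

```python
# For independent X, Y: E[XY] ≈ E[X]E[Y] and Var(X+Y) ≈ Var(X) + Var(Y).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(2.0, 1.0, size=500_000)      # X ~ N(2, 1)
y = rng.exponential(1/3.0, size=500_000)    # Y ~ Exp(3), independent of X
print((x * y).mean(), x.mean() * y.mean())  # both ≈ 2/3
print((x + y).var(), x.var() + y.var())     # both ≈ 1 + 1/9
```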

Functions of Random Variables

Methods for determining the distribution of functions of random variables

Distribution of Functions

Discrete Case

Method:

P(Y = y) = Σ_{x: g(x) = y} P(X = x)

Steps:
  • Identify all x values that map to y
  • Sum the probabilities of those x values
Applications:
  • Linear transformations Y = aX + b
  • Square of a random variable Y = X²
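
A minimal sketch of the discrete method for Y = X² (the PMF values are illustrative):

```python
# Group the x values that map to each y and sum their probabilities.
from collections import defaultdict

pmf_x = {-1: 1/3, 0: 1/3, 1: 1/3}  # X uniform on {-1, 0, 1}
pmf_y = defaultdict(float)
for x, prob in pmf_x.items():
    pmf_y[x**2] += prob            # P(Y = y) = Σ_{x: g(x) = y} P(X = x)
print(dict(pmf_y))                 # {1: 2/3, 0: 1/3}
```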

Continuous Case (Monotonic)

Method:

p_Y(y) = p_X(h(y)) |h'(y)|, where x = h(y) is the inverse function

Steps:
  • Find the inverse function x = h(y)
  • Calculate the derivative h'(y)
  • Substitute into the formula
Applications:
  • Linear scaling
  • Log-normal distribution derivation
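
The log-normal case illustrates the formula: for Y = e^X with X ~ N(μ, σ²), the inverse is h(y) = ln y and |h'(y)| = 1/y. A sketch comparing the hand-derived density to SciPy's lognorm (which is parameterized as s = σ, scale = e^μ):

```python
# p_Y(y) = p_X(ln y) * (1/y) should match scipy.stats.lognorm exactly.
import numpy as np
from scipy import stats

mu, sigma, y = 0.5, 0.8, 2.0
manual = stats.norm.pdf(np.log(y), mu, sigma) / y
builtin = stats.lognorm.pdf(y, s=sigma, scale=np.exp(mu))
print(manual, builtin)  # identical values
```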

Sampling Distributions

Essential distributions for statistical inference

Chi-squared Distribution
Definition:

If X₁, X₂, ..., Xₙ ~ N(0,1) independently, then Y = Σᵢ₌₁ⁿ Xᵢ² ~ χ²(n)

Properties:
  • Degrees of freedom: n
  • Additive property
  • E[Y] = n, Var(Y) = 2n
Statistical Applications:
  • Goodness of fit tests
  • Variance testing
  • Independence testing
t-Distribution
Definition:

If X ~ N(0,1) and Y ~ χ²(n) are independent, then T = X/√(Y/n) ~ t(n)

Properties:
  • Degrees of freedom: n
  • Symmetric about 0
  • Approaches N(0,1) as n → ∞
Statistical Applications:
  • Small sample inference
  • Confidence intervals for mean
  • Hypothesis testing
F-Distribution
Definition:

If X ~ χ²(m) and Y ~ χ²(n) are independent, then F = (X/m)/(Y/n) ~ F(m,n)

Properties:
  • Two degrees of freedom: m, n
  • If F ~ F(m,n), then 1/F ~ F(n,m)
  • If T ~ t(n), then T² ~ F(1,n) (checked numerically below)
Statistical Applications:
  • Variance ratio testing
  • ANOVA
  • Regression analysis
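
The T² ~ F(1, n) relationship can be confirmed numerically (assuming SciPy): P(T² ≤ x) = P(-√x ≤ T ≤ √x), which should match the F(1, n) CDF.

```python
# Compare P(-√x ≤ T ≤ √x) for T ~ t(n) with the F(1, n) CDF at x.
import numpy as np
from scipy import stats

n, x = 7, 2.5
lhs = stats.t.cdf(np.sqrt(x), n) - stats.t.cdf(-np.sqrt(x), n)
rhs = stats.f.cdf(x, 1, n)
print(lhs, rhs)  # identical values
```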

Worked Examples

Step-by-step solutions to typical random variable problems

Binomial Probability Calculation
Problem:

In 10 independent trials with success probability 0.3, find P(X = 4)

Solution Steps:
  1. Step 1: Identify the distribution - X ~ B(10, 0.3)
  2. Step 2: Apply the PMF formula - P(X = k) = C₁₀ᵏ (0.3)ᵏ (0.7)¹⁰⁻ᵏ
  3. Step 3: Calculate the combination - C₁₀⁴ = 10!/(4!×6!) = 210
  4. Step 4: Compute the probability - P(X = 4) = 210 × (0.3)⁴ × (0.7)⁶
  5. Step 5: Final calculation - P(X = 4) = 210 × 0.0081 × 0.1176 ≈ 0.200
Key Point:

Binomial distribution applies to fixed number of independent trials with constant success probability
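
A one-line check of the result (assuming SciPy is installed):

```python
from scipy import stats
print(stats.binom.pmf(4, 10, 0.3))  # ≈ 0.2001, matching the hand calculation
```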

Normal Distribution Standardization
Problem:

If X ~ N(100, 16), find P(96 < X < 108)

Solution Steps:
  1. Step 1: Identify the parameters - μ = 100, σ² = 16, so σ = 4
  2. Step 2: Standardize - Z = (X - 100)/4 ~ N(0,1)
  3. Step 3: Transform the bounds - P(96 < X < 108) = P(-1 < Z < 2)
  4. Step 4: Use the standard normal table - P(-1 < Z < 2) = Φ(2) - Φ(-1)
  5. Step 5: Calculate - P(-1 < Z < 2) = 0.9772 - 0.1587 = 0.8185
Key Point:

Standardization allows us to use the standard normal table for any normal distribution
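
The same probability can be computed directly (assuming SciPy), without standardizing by hand:

```python
from scipy import stats
# X ~ N(100, 16) means loc = 100 and scale = σ = 4.
prob = stats.norm.cdf(108, loc=100, scale=4) - stats.norm.cdf(96, loc=100, scale=4)
print(prob)  # ≈ 0.8186 (the table value 0.8185 is rounded)
```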

Exponential Distribution Memory Property
Problem:

For X ~ Exp(λ), prove that P(X > s+t|X > s) = P(X > t)

Solution Steps:
  1. Step 1: Write the conditional probability - P(X > s+t | X > s) = P(X > s+t, X > s)/P(X > s)
  2. Step 2: Simplify the numerator - since {X > s+t} ⊆ {X > s}, P(X > s+t, X > s) = P(X > s+t)
  3. Step 3: Use the exponential survival function - P(X > x) = e^{-λx} for x > 0
  4. Step 4: Substitute - P(X > s+t | X > s) = e^{-λ(s+t)}/e^{-λs} = e^{-λt}
  5. Step 5: Conclude - P(X > s+t | X > s) = e^{-λt} = P(X > t)
Key Point:

Memoryless property means past waiting time doesn't affect future waiting time
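
A simulation sketch of the same property (assuming NumPy; the values of λ, s, t are arbitrary):

```python
# Estimate P(X > s+t | X > s) by counting and compare with e^{-λt}.
import numpy as np

rng = np.random.default_rng(2)
lam, s, t = 0.5, 1.0, 2.0
x = rng.exponential(1/lam, size=1_000_000)
cond = (x > s + t).sum() / (x > s).sum()  # conditional survival estimate
print(cond, np.exp(-lam * t))             # both ≈ e^{-1} ≈ 0.3679
```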

Practice Quiz
10 Questions

1. What is the key difference between discrete and continuous random variables?
2. For a continuous random variable X, what is P(X = x) for any specific value x?
3. Which property does NOT hold for a CDF F(x)?
4. If X ~ B(n, p), what is E[X]?
5. What distribution models the number of trials until the first success?
6. If X and Y are independent, which of the following is TRUE?
7. What is the memoryless property?
8. If Y = Σᵢ₌₁ⁿ Xᵢ², where the Xᵢ are independent N(0,1) variables, what is the distribution of Y?
9. For the transformation Y = aX + b, where X is continuous with PDF p_X(x), what is p_Y(y)?
10. What does E[XY] = E[X]E[Y] imply about X and Y?

Frequently Asked Questions

What is the difference between PMF and PDF?

PMF (Probability Mass Function) is for discrete random variables and gives exact probabilities: P(X = x). PDF (Probability Density Function) is for continuous random variables and gives probability density, where probabilities are found by integration over intervals: P(a < X < b) = ∫ₐᵇ p(x)dx. For continuous variables, P(X = x) = 0.

How do I know which distribution to use for a problem?

Match the problem context to distribution characteristics: Fixed trials with success/failure → Binomial; Time until first success → Geometric; Rare events over time/space → Poisson; Waiting time → Exponential; Measurement errors, natural phenomena → Normal. Look for key words and understand what each parameter represents.

What does independence mean for random variables?

Random variables X and Y are independent if knowing the value of one provides no information about the other. Mathematically: P(X=x, Y=y) = P(X=x)×P(Y=y) (discrete) or p(x,y) = pₓ(x)×pᵧ(y) (continuous). Independence implies uncorrelatedness (Cov(X,Y) = 0), but the converse is not generally true.

Why are sampling distributions (χ², t, F) important?

Sampling distributions describe the behavior of statistics computed from random samples. χ² distribution is used for variance testing and goodness-of-fit; t-distribution for small sample means and confidence intervals; F-distribution for comparing variances and ANOVA. They form the foundation of statistical inference.

How do I transform a random variable?

For discrete: P(Y=y) = Σ_{x:g(x)=y} P(X=x). For continuous monotonic transformations: find inverse x=h(y), then pᵧ(y) = pₓ(h(y))|h'(y)|. For non-monotonic or multivariate transformations, use Jacobian methods or CDF approach. Always check the domain of the transformed variable.
