
Common Distribution Families

Master probability distribution families and their applications in statistical inference

Learning Objectives
What you'll master in probability distribution theory
  • Master fundamental distribution families: binomial, Poisson, normal, uniform, exponential
  • Understand advanced distributions: gamma, chi-square, t-distribution, F-distribution
  • Learn exponential family theory and its applications in statistical inference
  • Explore distribution relationships and transformation properties
  • Apply distribution knowledge to real-world statistical problems
  • Recognize when to use specific distributions in statistical modeling

Fundamental Distributions

Core probability distributions essential for statistical modeling

Binomial Distribution
Models the number of successes in n independent Bernoulli trials

The binomial distribution B(n,p) describes the number of successes in a fixed number of independent trials.

P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k=0,1,\ldots,n

Parameters: n \in \{1,2,3,\ldots\} (number of trials), p \in (0,1) (success probability)

Mean

E[X] = np

Variance

\text{Var}(X) = np(1-p)

MGF

M_X(t) = (pe^t + 1-p)^n

Distribution Family

Exponential family

Applications:
  • Quality control: Number of defective items in a batch
  • Medical trials: Success rate of treatments across patients
  • Marketing: Response rates to advertising campaigns
Example: Quality Control Inspection

Problem:

A factory produces items with a 5% defect rate. If we inspect 20 randomly selected items, what is the probability of finding exactly 2 defective items? What is the expected number of defects?

Solution:

  1. Identify parameters: n=20 trials, p=0.05 defect probability
  2. Distribution: X \sim B(20, 0.05)
  3. Calculate probability for exactly k=2 defects:
    P(X=2) = \binom{20}{2}(0.05)^2(0.95)^{18}
  4. Compute: P(X=2) = 190 \times 0.0025 \times 0.3972 \approx 0.189
  5. Expected defects: E[X] = np = 20 \times 0.05 = 1

Key Insight:

The binomial distribution is appropriate when we have a fixed number of independent trials with constant success probability. Use the binomial formula directly for small n, or the normal approximation for large n.
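
As a quick numerical check of the worked example above, here is a minimal sketch in Python (assuming scipy is available; the variable names are illustrative):

    from scipy.stats import binom

    n, p = 20, 0.05               # trials and defect probability from the example
    print(binom.pmf(2, n, p))     # P(X = 2), approximately 0.189
    print(binom.mean(n, p))       # E[X] = np = 1.0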

Poisson Distribution
Models the number of rare events occurring in a fixed interval

The Poisson distribution P(\lambda) models the count of events occurring in a fixed time or space interval when events occur independently at a constant average rate.

P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k=0,1,2,\ldots

Parameter: \lambda > 0 (average rate per interval)

Mean

E[X] = \lambda

Variance

\text{Var}(X) = \lambda

Limit Property

B(n,p) \to P(np) \text{ as } n \to \infty

when p \to 0, np \to \lambda

Additive Property

X_1 + X_2 \sim P(\lambda_1 + \lambda_2)

for independent Poisson RVs

Applications:
  • Traffic analysis: Accidents per time period on highways
  • Telecommunications: Call arrivals at service centers
  • Biology: Mutation counts in DNA sequences
Example: Call Center Arrivals

Problem:

A call center receives an average of 4 calls per minute. Assuming calls arrive independently, what is the probability of receiving exactly 6 calls in the next minute? At most 2 calls?

Solution:

  1. Model: Calls follow X \sim P(\lambda = 4)
  2. Probability of exactly 6 calls:
    P(X=6) = \frac{4^6 e^{-4}}{6!} = \frac{4096 \times 0.0183}{720} \approx 0.104
  3. Probability of at most 2 calls:
    P(X \leq 2) = P(X=0) + P(X=1) + P(X=2)
  4. Computing: P(X \leq 2) = e^{-4}(1 + 4 + 8) = 13e^{-4} \approx 0.238

Key Insight:

Poisson is ideal for counting rare events in fixed intervals. The mean equals the variance (\lambda = 4), and we can sum individual probabilities for cumulative calculations.
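
The same call-center numbers can be checked with a short Python sketch (assuming scipy; names are illustrative):

    from scipy.stats import poisson

    lam = 4                        # average calls per minute
    print(poisson.pmf(6, lam))     # P(X = 6), approximately 0.104
    print(poisson.cdf(2, lam))     # P(X <= 2), approximately 0.238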

Normal Distribution
The most important continuous distribution in statistics

The normal (Gaussian) distribution N(\mu, \sigma^2) is fundamental to statistics, appearing naturally in many phenomena due to the Central Limit Theorem.

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}

Parameters: \mu \in \mathbb{R} (mean), \sigma > 0 (standard deviation)

Mean

E[X] = \mu

Variance

\text{Var}(X) = \sigma^2

Standard Normal

Z = \frac{X - \mu}{\sigma} \sim N(0,1)

68-95-99.7 Rule

68% within \mu \pm \sigma

95% within \mu \pm 2\sigma

99.7% within \mu \pm 3\sigma

Applications:
  • Natural phenomena: Heights, weights, measurement errors
  • Financial returns and risk modeling
  • Central Limit Theorem applications for sample means
Example: Standardizing Test Scores

Problem:

Test scores are normally distributed with mean \mu = 75 and standard deviation \sigma = 10. What percentage of students score above 90? What score separates the top 10% of students?

Solution:

  1. Distribution: X \sim N(75, 10^2)
  2. Standardize for P(X > 90):
    Z = \frac{90 - 75}{10} = 1.5
  3. From standard normal table: P(Z > 1.5) = 1 - \Phi(1.5) = 1 - 0.9332 = 0.0668
  4. About 6.68% score above 90
  5. For top 10%: Find x where P(X > x) = 0.10
  6. This means P(Z > z) = 0.10, so z = 1.28
  7. Convert back: x = \mu + z\sigma = 75 + 1.28(10) = 87.8

Key Insight:

Always standardize normal distributions using Z = (X - \mu)/\sigma to use standard normal tables. For percentiles, work backwards from Z-scores to original units.
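
A minimal Python sketch of the standardization and percentile steps (assuming scipy; the survival function sf gives upper-tail probabilities):

    from scipy.stats import norm

    mu, sigma = 75, 10
    print(norm.sf(90, loc=mu, scale=sigma))     # P(X > 90), approximately 0.0668
    print(norm.ppf(0.90, loc=mu, scale=sigma))  # 90th percentile, approximately 87.8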

Exponential Distribution
Models waiting times and lifetimes with the memoryless property

The exponential distribution \text{Exp}(\lambda) models the time until an event occurs in a Poisson process, characterized by the unique memoryless property.

f(x) = \lambda e^{-\lambda x}, \quad x > 0

Parameter: \lambda > 0 (rate parameter)

Mean

E[X] = \frac{1}{\lambda}

Variance

\text{Var}(X) = \frac{1}{\lambda^2}

Memoryless Property

P(X > s+t \mid X > s) = P(X > t)

Past doesn't affect future

Relation to Gamma

\text{Exp}(\lambda) = \Gamma(1, \lambda)
Applications:
  • Product lifetime and reliability engineering
  • Service times in queuing systems
  • Radioactive decay and failure time modeling
Example: Component Lifetime

Problem:

Electronic components have lifetimes that follow \text{Exp}(\lambda = 0.001), where the rate \lambda is measured in failures per hour. What is the probability a component lasts more than 1000 hours? If it has already lasted 500 hours, what's the probability it lasts another 500 hours?

Solution:

  1. Distribution: X \sim \text{Exp}(0.001), mean lifetime = 1/0.001 = 1000 hours
  2. Probability of lasting more than 1000 hours:
    P(X > 1000) = e^{-\lambda t} = e^{-0.001 \times 1000} = e^{-1} \approx 0.368
  3. Use memoryless property for conditional probability:
    P(X > 1000 \mid X > 500) = P(X > 500)
  4. Calculate: P(X > 500) = e^{-0.001 \times 500} = e^{-0.5} \approx 0.606

Key Insight:

The memoryless property means the component's past survival doesn't affect its future lifetime - a unique property of exponential distributions. This makes it ideal for modeling "wear-free" failures.
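
A short sketch illustrating the memoryless calculation (assuming scipy; note that scipy parameterizes the exponential by scale = 1/\lambda):

    from scipy.stats import expon

    rate = 0.001                        # failures per hour
    life = expon(scale=1/rate)          # scipy uses scale = 1/rate

    print(life.sf(1000))                    # P(X > 1000), approximately 0.368
    print(life.sf(1000) / life.sf(500))     # P(X > 1000 | X > 500)
    print(life.sf(500))                     # equals P(X > 500), approximately 0.607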

Advanced Distributions

Derived distributions essential for statistical inference and hypothesis testing

Gamma Distribution
Generalizes exponential distribution for sums of waiting times

The gamma distribution \Gamma(\alpha, \lambda) generalizes the exponential distribution; for integer \alpha it models the sum of \alpha independent exponential random variables. It is widely used in Bayesian statistics and queuing theory.

f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x}, \quad x > 0

Parameters: \alpha > 0 (shape), \lambda > 0 (rate). \Gamma(\alpha) is the gamma function.

Mean

E[X] = \frac{\alpha}{\lambda}

Variance

\text{Var}(X) = \frac{\alpha}{\lambda^2}

Additivity

\Gamma(\alpha_1, \lambda) + \Gamma(\alpha_2, \lambda) = \Gamma(\alpha_1+\alpha_2, \lambda)

Special Cases

\Gamma(1, \lambda) = \text{Exp}(\lambda)

\Gamma(n/2, 1/2) = \chi^2(n)

Applications:
  • Waiting time until the k-th event in a Poisson process
  • Bayesian inference: conjugate prior for Poisson rate
  • Rainfall modeling and insurance claim amounts
Example: Time Until Third Failure

Problem:

Machine failures occur at rate \lambda = 0.5 per day (exponentially distributed). What is the expected time until the third failure? What's the probability the third failure occurs within 10 days?

Solution:

  1. Time until 3rd failure: X \sim \Gamma(\alpha=3, \lambda=0.5)
  2. Expected time: E[X] = \alpha/\lambda = 3/0.5 = 6 days
  3. Probability within 10 days requires integrating the PDF:
    P(X \leq 10) = \int_0^{10} \frac{0.5^3}{\Gamma(3)} x^2 e^{-0.5x} \, dx
  4. Since \Gamma(3) = 2!, repeated integration by parts gives the Poisson tail P(X \leq 10) = 1 - \sum_{j=0}^{2} e^{-5}\frac{5^j}{j!}:
    P(X \leq 10) = 1 - e^{-5}(1 + 5 + 12.5) \approx 0.875

Key Insight:

Gamma distribution models the sum of independent exponential waiting times. Use the additivity property and CDF to calculate probabilities for multiple events.
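
A minimal check of the third-failure example (assuming scipy; again scipy uses scale = 1/rate):

    from scipy.stats import gamma

    alpha, rate = 3, 0.5
    X = gamma(a=alpha, scale=1/rate)

    print(X.mean())     # expected time, 6.0 days
    print(X.cdf(10))    # P(X <= 10), approximately 0.875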

Chi-Square Distribution
Sum of squared standard normal variables, fundamental in hypothesis testing

The chi-square distribution \chi^2(n) arises as the distribution of the sum of squares of n independent standard normal random variables.

f(x) = \frac{1}{2^{n/2}\Gamma(n/2)} x^{n/2-1} e^{-x/2}, \quad x > 0

Parameter: n \geq 1 (degrees of freedom)

Definition

\chi^2(n) = \sum_{i=1}^n Z_i^2

where Z_i \sim N(0,1)

Mean & Variance

E[X] = n, \quad \text{Var}(X) = 2n

Additivity

\chi^2(n_1) + \chi^2(n_2) = \chi^2(n_1+n_2)

Relation to Gamma

\chi^2(n) = \Gamma(n/2, 1/2)
Applications:
  • Goodness-of-fit testing for categorical data
  • Testing independence in contingency tables
  • Confidence intervals for variance in normal populations
Example: Sample Variance Distribution

Problem:

A sample of n=10 observations from N(\mu, \sigma^2=25) has sample variance S^2. Find the distribution of (n-1)S^2/\sigma^2 and calculate P(S^2 > 35).

Solution:

  1. Key theorem: For normal samples, \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)
  2. Here: \frac{9S^2}{25} \sim \chi^2(9)
  3. Want P(S^2 > 35):
    P(S^2 > 35) = P\left(\frac{9S^2}{25} > \frac{9 \times 35}{25}\right) = P(\chi^2(9) > 12.6)
  4. From chi-square table: P(\chi^2(9) > 12.6) \approx 0.18
  5. About 18% chance sample variance exceeds 35

Key Insight:

Sample variance from normal populations follows a scaled chi-square distribution. This forms the basis for variance testing and confidence intervals.
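
A sketch of the sample-variance calculation (assuming scipy; the threshold 12.6 comes from the scaling above):

    from scipy.stats import chi2

    n, sigma2 = 10, 25
    threshold = (n - 1) * 35 / sigma2      # 12.6
    print(chi2.sf(threshold, df=n - 1))    # P(S^2 > 35), approximately 0.18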

Student's t-Distribution
For small sample inference when population variance is unknown

The t-distribution t(n) is used for inference about means when the population standard deviation is unknown and sample size is small.

f(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}

Parameter: n \geq 1 (degrees of freedom)

Definition

T = \frac{X}{\sqrt{K/n}}

where X \sim N(0,1) and K \sim \chi^2(n) are independent

Properties

Symmetric around 0

E[T] = 0 (if n \geq 2)

\text{Var}(T) = \frac{n}{n-2} (if n \geq 3)

Limit Behavior

t(n) \to N(0,1) \text{ as } n \to \infty

Heavier Tails

More probability in tails than normal

Applications:
  • Confidence intervals for population mean when \sigma is unknown
  • One-sample and two-sample t-tests
  • Regression coefficient significance testing
Example: Small Sample Confidence Interval

Problem:

A sample of n=9 measurements has mean \bar{x}=50 and sample standard deviation s=6. Construct a 95% confidence interval for the population mean, assuming normality.

Solution:

  1. Since \sigma is unknown, use t-distribution with n-1=8 df
  2. For 95% CI, find t_{0.025}(8) from t-table: t_{0.025}(8) = 2.306
  3. Confidence interval formula:
    \bar{x} \pm t_{\alpha/2}(n-1) \cdot \frac{s}{\sqrt{n}}
  4. Standard error: \text{SE} = s/\sqrt{n} = 6/\sqrt{9} = 2
  5. Margin of error: \text{ME} = 2.306 \times 2 = 4.612
  6. 95% CI: (50 - 4.612, 50 + 4.612) = (45.39, 54.61)

Key Insight:

Use t-distribution instead of normal when sample size is small (n < 30) and \sigma is unknown. The wider interval accounts for additional uncertainty from estimating \sigma.
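
The same confidence interval, computed in a short Python sketch (assuming scipy/numpy):

    import numpy as np
    from scipy.stats import t

    n, xbar, s = 9, 50.0, 6.0
    tcrit = t.ppf(0.975, df=n - 1)      # approximately 2.306
    me = tcrit * s / np.sqrt(n)         # margin of error, approximately 4.61
    print(xbar - me, xbar + me)         # roughly (45.39, 54.61)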

F-Distribution
Ratio of chi-square variables for comparing variances

The F-distribution F(m,n) is the ratio of two independent chi-square variables, each divided by its degrees of freedom, and is fundamental in ANOVA and variance testing.

F = \frac{K_1/m}{K_2/n} \quad \text{where } K_1 \sim \chi^2(m), K_2 \sim \chi^2(n)

Parameters: m, n \geq 1 (degrees of freedom)

Mean

E[F] = \frac{n}{n-2}

for n > 2

Reciprocal Property

\frac{1}{F} \sim F(n,m)

Quantile Relation

F_{1-\alpha}(m,n) = \frac{1}{F_\alpha(n,m)}

Connection to t

t^2(n) \sim F(1,n)
Applications:
  • Testing equality of two population variances
  • ANOVA F-tests for comparing multiple means
  • Regression model significance testing
Example: Comparing Two Variances

Problem:

Two independent samples from normal populations have sample variances s_1^2 = 45 (n_1=10) and s_2^2 = 20 (n_2=15). Test if the population variances are equal at \alpha=0.05.

Solution:

  1. Test statistic: F = \frac{s_1^2}{s_2^2} = \frac{45}{20} = 2.25
  2. Under H_0: \sigma_1^2 = \sigma_2^2, F \sim F(9, 14)
  3. Critical values for two-tailed test at \alpha=0.05:
  4. Upper: F_{0.025}(9,14) \approx 3.21
  5. Lower: F_{0.975}(9,14) = 1/F_{0.025}(14,9) \approx 1/3.80 \approx 0.26
  6. Decision: Since 0.26 < 2.25 < 3.21, fail to reject H_0
  7. Conclusion: Insufficient evidence that variances differ

Key Insight:

The F-test compares the ratio of sample variances to an F-distribution. Use the reciprocal property to find the lower critical value. For one-tailed tests, always put the larger variance in the numerator.
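
A sketch of the two-variance F-test (assuming scipy; ppf gives the quantiles used as critical values):

    from scipy.stats import f

    s1_sq, n1 = 45, 10
    s2_sq, n2 = 20, 15
    F = s1_sq / s2_sq                        # 2.25
    lower = f.ppf(0.025, n1 - 1, n2 - 1)     # approximately 0.26
    upper = f.ppf(0.975, n1 - 1, n2 - 1)     # approximately 3.21
    print("reject H0" if (F < lower or F > upper) else "fail to reject H0")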

Exponential Family Theory

Unified framework connecting many important distributions

General Form

A distribution belongs to the exponential family if its density can be written as:

f(x;\theta) = c(\theta) \exp\left\{\sum_{j=1}^k Q_j(\theta) T_j(x)\right\} h(x)

c(\theta)

Normalizing constant depending only on \theta

Q_j(\theta)

Natural parameter functions

T_j(x)

Sufficient statistics

h(x)

Base measure (independent of \theta)

Examples:

Normal N(\mu, \sigma^2):

c(\mu,\sigma^2) \exp\left\{\frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}x^2\right\} \cdot \frac{1}{\sqrt{2\pi}}

T_1(x)=x, T_2(x)=x^2

Poisson P(\lambda):

e^{-\lambda} \exp\{\ln(\lambda) \cdot x\} \cdot \frac{1}{x!}

T(x)=x, natural parameter \eta=\ln(\lambda)

Key Properties:
  • Sufficient statistics have finite dimension
  • MLE has nice asymptotic properties
  • Conjugate priors exist for Bayesian inference
Example: Verifying Exponential Family

Problem:

Show that the binomial distribution B(n,p) belongs to the exponential family and identify its natural parameter and sufficient statistic.

Solution:

  1. Start with PMF: P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}
  2. Rewrite using logarithms:
    P(X=k) = \binom{n}{k}(1-p)^n \exp\left\{k \ln\left(\frac{p}{1-p}\right)\right\}
  3. Identify components:

    c(p) = (1-p)^n

    Q(p) = \ln(p/(1-p)) (natural parameter)

    T(k) = k (sufficient statistic)

    h(k) = \binom{n}{k}

  4. Therefore B(n,p) is in the exponential family with k as sufficient statistic

Key Insight:

Converting to exponential family form reveals the sufficient statistic and natural parameter. The natural parameter \eta = \ln(p/(1-p)) is the log-odds.
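
A small numerical check that the exponential-family factorization reproduces the binomial PMF (assuming numpy/scipy; n and p are arbitrary illustrative values):

    import numpy as np
    from scipy.special import comb

    n, p = 20, 0.3
    k = np.arange(n + 1)

    eta = np.log(p / (1 - p))                                  # natural parameter (log-odds)
    pmf_expfam = comb(n, k) * (1 - p)**n * np.exp(eta * k)     # h(k) * c(p) * exp{Q(p) T(k)}
    pmf_direct = comb(n, k) * p**k * (1 - p)**(n - k)

    print(np.allclose(pmf_expfam, pmf_direct))                 # True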

Distribution Relationships

Understanding how distributions connect and derive from each other

Gamma Family
\Gamma(1, \lambda) = \text{Exp}(\lambda)
\Gamma(n/2, 1/2) = \chi^2(n)
Additivity: \Gamma(\alpha_1,\lambda) + \Gamma(\alpha_2,\lambda) = \Gamma(\alpha_1+\alpha_2,\lambda)
Normal Connections
Z = (X-\mu)/\sigma \sim N(0,1)
\sum_{i=1}^n Z_i^2 \sim \chi^2(n)
CLT: \bar{X} \to N(\mu, \sigma^2/n) for large n
t and F Origins
t(n) = \frac{N(0,1)}{\sqrt{\chi^2(n)/n}}
F(m,n) = \frac{\chi^2(m)/m}{\chi^2(n)/n}
t^2(n) = F(1,n)
Discrete Limits
Poisson Approx: B(n,p) \to P(np) as n \to \infty, p \to 0
Normal Approx: B(n,p) \to N(np, np(1-p)) for large n
P(\lambda) \to N(\lambda, \lambda) for large \lambda
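
These relationships can be spot-checked by simulation. The sketch below (assuming numpy/scipy; the seed and sample sizes are arbitrary) builds t(n) from its definition and compares a tail probability of t^2 with F(1, n):

    import numpy as np
    from scipy.stats import f

    rng = np.random.default_rng(0)
    n = 5

    z = rng.standard_normal(100_000)            # N(0,1) draws
    k = rng.chisquare(n, 100_000)               # chi-square(n) draws
    t_samples = z / np.sqrt(k / n)              # t(n) by definition

    print(np.mean(t_samples**2 > 4.0))          # empirical P(t^2 > 4)
    print(f.sf(4.0, 1, n))                      # F(1, n) tail; the two should be close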

Rigorous Theorem Proofs

Step-by-step mathematical derivations of fundamental distribution theorems

Proof: Poisson Limit Theorem
Binomial converges to Poisson under rare event conditions

Theorem Statement:

Let X_n \sim B(n, p_n) where n \to \infty, p_n \to 0, and np_n \to \lambda for some constant \lambda > 0. Then:

P(X_n = k) \to \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{as } n \to \infty

That is, B(n, p_n) converges in distribution to P(\lambda).

Proof:

  1. Step 1 (Start with Binomial PMF): For X_n \sim B(n, p_n):
    P(X_n = k) = \binom{n}{k} p_n^k (1-p_n)^{n-k}
  2. Step 2 (Expand Binomial Coefficient): Write out the combination:
    P(X_n = k) = \frac{n!}{k!(n-k)!} p_n^k (1-p_n)^{n-k}
    = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!} p_n^k (1-p_n)^{n-k}
  3. Step 3 (Substitute p_n = \lambda/n + o(1/n)): Since np_n \to \lambda, we have p_n \sim \lambda/n:
    P(X_n = k) = \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \frac{(np_n)^k}{k!} \cdot (1-p_n)^n \cdot (1-p_n)^{-k}
  4. Step 4 (Take Limit of Each Factor): As n \to \infty:
    \frac{n(n-1)\cdots(n-k+1)}{n^k} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \to 1
    (Each factor approaches 1 for fixed k)
  5. Step 5 (Exponential Limit): For the (1-p_n)^n term:
    (1-p_n)^n = \left(1 - \frac{\lambda}{n} + o(1/n)\right)^n \to e^{-\lambda}
    Using the fundamental limit (1-x/n)^n \to e^{-x}.
  6. Step 6 (Remaining Term): The (1-p_n)^{-k} term:
    (1-p_n)^{-k} \to 1^{-k} = 1
    since p_n \to 0 and k is fixed.
  7. Step 7 (Parameter Convergence): Since np_n \to \lambda:
    (np_n)^k \to \lambda^k
  8. Step 8 (Combine All Limits): Putting everything together:
    P(X_n = k) \to 1 \cdot \frac{\lambda^k}{k!} \cdot e^{-\lambda} \cdot 1 = \frac{\lambda^k e^{-\lambda}}{k!} \quad \blacksquare

Practical Significance:

  • Provides approximation: B(n,p) \approx P(np) when n > 20, p < 0.05
  • Explains why Poisson models rare events in large populations
  • Justifies using simpler Poisson calculations instead of binomial
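
A quick numerical illustration of the approximation (assuming numpy/scipy; n and p chosen so that np = 4):

    import numpy as np
    from scipy.stats import binom, poisson

    n, p = 200, 0.02
    k = np.arange(11)
    diff = np.abs(binom.pmf(k, n, p) - poisson.pmf(k, n * p))
    print(diff.max())     # small, on the order of 1e-3
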
Proof: Chi-Square Distribution from Normal
Deriving chi-square as sum of squared standard normals using MGF

Theorem Statement:

Let Z_1, Z_2, \ldots, Z_n be independent N(0,1) random variables. Then:

X = \sum_{i=1}^n Z_i^2 \sim \chi^2(n)

where \chi^2(n) is the chi-square distribution with n degrees of freedom.

Proof:

  1. Step 1 (MGF of Single Squared Normal): First find the MGF of Y = Z^2 where Z \sim N(0,1):
    M_Y(t) = E[e^{tZ^2}] = \int_{-\infty}^{\infty} e^{tz^2} \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz
  2. Step 2 (Combine Exponents): Merge the exponential terms:
    M_Y(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2(1/2 - t)} \, dz
    This integral converges when 1/2 - t > 0, i.e., t < 1/2.
  3. Step 3 (Complete the Square): Rewrite as Gaussian integral:
    M_Y(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2/(1-2t)}} \, dz
    This is a normal density with variance \sigma^2 = 1/(1-2t), so:
    M_Y(t) = \frac{1}{\sqrt{2\pi}} \cdot \sqrt{2\pi} \cdot \frac{1}{\sqrt{1-2t}} = (1-2t)^{-1/2}
  4. Step 4 (MGF of Sum): For independent Z_1, \ldots, Z_n:
    M_X(t) = E\left[e^{t\sum Z_i^2}\right] = E\left[\prod_{i=1}^n e^{tZ_i^2}\right]
    By independence:
    M_X(t) = \prod_{i=1}^n E[e^{tZ_i^2}] = \prod_{i=1}^n (1-2t)^{-1/2}
  5. Step 5 (Simplify Product):
    M_X(t) = [(1-2t)^{-1/2}]^n = (1-2t)^{-n/2}
  6. Step 6 (Recognize Chi-Square MGF): The MGF (1-2t)^{-n/2} uniquely identifies the \chi^2(n) distribution.
  7. Step 7 (Verify with Gamma): Note that \chi^2(n) = \Gamma(n/2, 1/2), whose MGF is:
    M_{\Gamma}(t) = \left(1 - \frac{t}{1/2}\right)^{-n/2} = (1-2t)^{-n/2} \quad \checkmark
  8. Step 8 (Conclusion by MGF Uniqueness): Since MGFs uniquely determine distributions:
    \sum_{i=1}^n Z_i^2 \sim \chi^2(n) \quad \blacksquare

Key Implications:

  • Sample variance from normal data: (n-1)S^2/\sigma^2 \sim \chi^2(n-1)
  • Foundation for goodness-of-fit tests and contingency tables
  • Basis for deriving t and F distributions
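
A Monte Carlo sketch of the theorem (assuming numpy/scipy; the seed and sample size are arbitrary): sums of n squared standard normals should match \chi^2(n) in mean, variance, and distribution:

    import numpy as np
    from scipy.stats import chi2, kstest

    rng = np.random.default_rng(1)
    n = 4
    x = (rng.standard_normal((50_000, n)) ** 2).sum(axis=1)   # sums of n squared N(0,1)

    print(x.mean(), x.var())                  # close to n and 2n
    print(kstest(x, chi2(df=n).cdf).pvalue)   # typically not small: consistent with chi2(n)
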
Proof: Moment Generating Function Uniqueness
MGF uniquely determines the probability distribution

Theorem Statement:

If two random variables X and Y have moment generating functions M_X(t) and M_Y(t) that exist and are equal in an open interval containing 0, then X and Y have the same distribution.

M_X(t) = M_Y(t) \text{ for } |t| < \epsilon \quad \Rightarrow \quad F_X = F_Y

Proof (Sketch):

  1. Step 1 (MGF Defines All Moments): If M_X(t) exists in a neighborhood of 0, we can expand:
    M_X(t) = E[e^{tX}] = \sum_{k=0}^{\infty} \frac{t^k}{k!} E[X^k] = \sum_{k=0}^{\infty} \frac{t^k \mu_k}{k!}
    where \mu_k = E[X^k] are the moments.
  2. Step 2 (Extract Moments): By differentiating the MGF:
    M_X^{(k)}(0) = \frac{d^k M_X}{dt^k}\bigg|_{t=0} = E[X^k]
    Thus, the MGF encodes all moments.
  3. Step 3 (Moment Equality): If M_X(t) = M_Y(t) near 0:
    M_X^{(k)}(0) = M_Y^{(k)}(0) \quad \forall k \geq 0
    Therefore: E[X^k] = E[Y^k] for all k.
  4. Step 4 (Moment Sequence Determines Distribution): Under regularity conditions (Carleman's condition, under which the moments determine the distribution), if all moments match:
    \{E[X^k]\}_{k=0}^{\infty} = \{E[Y^k]\}_{k=0}^{\infty}
    then the distributions are identical.
  5. Step 5 (Analytic Uniqueness): More rigorously, the MGF is an analytic function. Two analytic functions equal on an open interval around 0 must be equal everywhere in their domain of analyticity.
  6. Step 6 (Inversion Formula): The distribution can be recovered from the MGF via Fourier/Laplace inversion:
    F_X(x) = \mathcal{L}^{-1}\{M_X(t)\}
    If MGFs are equal, inversions yield the same CDF.
  7. Step 7 (Conclusion): Therefore:
    M_X(t) = M_Y(t) \text{ in neighborhood of } 0 \quad \Rightarrow \quad P(X \leq x) = P(Y \leq x) \text{ for all } x \quad \blacksquare

Applications:

  • Proving distribution of sums: M_{X+Y}(t) = M_X(t)M_Y(t) for independent X, Y
  • Identifying distributions without deriving full PMF/PDF
  • Central Limit Theorem proof uses MGF convergence
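
The product rule for independent sums can be illustrated numerically. This is only a Monte Carlo sketch (assuming numpy; the distributions and the value of t are arbitrary choices for which both MGFs exist):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(scale=2.0, size=200_000)   # X ~ Exp(rate 0.5)
    y = rng.normal(1.0, 1.0, size=200_000)         # Y ~ N(1, 1), independent of X

    t = 0.2                                        # both MGFs exist here (t < 0.5)
    print(np.mean(np.exp(t * (x + y))))            # estimate of M_{X+Y}(t)
    print(np.mean(np.exp(t * x)) * np.mean(np.exp(t * y)))   # M_X(t) * M_Y(t); should agree closely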

Frequently Asked Questions

Common questions about probability distributions and their applications

When should I use binomial vs. Poisson distribution?

Use binomial when you have a fixed number of trials (n) with constant success probability (p). Use Poisson when counting rare events in a continuous interval with no upper limit. As a rule of thumb, if n > 20 and p < 0.05, Poisson approximates binomial well with \lambda = np.

Why is the normal distribution so important?

The normal distribution appears naturally due to the Central Limit Theorem: sample means from any distribution approach normality as sample size increases. It's mathematically tractable (closed-form formulas), symmetric, and completely determined by two parameters (\mu, \sigma^2). This makes it foundational for inference, hypothesis testing, and confidence intervals.

What's the difference between exponential and gamma distributions?

Exponential \text{Exp}(\lambda) models waiting time until the first event in a Poisson process. Gamma \Gamma(\alpha, \lambda) generalizes this to waiting time until the \alpha-th event. In fact, \text{Exp}(\lambda) = \Gamma(1, \lambda). Gamma has two parameters allowing more flexible shapes, while exponential has only rate \lambda.

When do I use t-distribution instead of normal?

Use t-distribution when: (1) sample size is small (typically n < 30), (2) population variance is unknown, and (3) you're estimating the population standard deviation from sample data. As degrees of freedom increase, t(n) \to N(0,1). For large samples, t and normal are nearly identical.

What is the memoryless property and which distributions have it?

The memoryless property states P(X > s+t \mid X > s) = P(X > t): the future doesn't depend on the past. Only exponential (continuous) and geometric (discrete) distributions have this property. It's ideal for modeling "wear-free" failures where components don't age. For systems that do wear out, use Weibull or gamma instead.

How do chi-square, t, and F distributions relate?

All derive from normal distributions. Chi-square: \chi^2(n) = \sum Z_i^2 (sum of squared normals). t-distribution: t(n) = Z/\sqrt{\chi^2(n)/n} (normal ÷ chi-square). F-distribution: F(m,n) = (\chi^2(m)/m)/(\chi^2(n)/n) (ratio of chi-squares). Also, t^2(n) = F(1,n). These relationships connect variance testing, mean testing, and ANOVA.

What is the 68-95-99.7 rule for normal distributions?

For normal N(\mu, \sigma^2): approximately 68% of data falls within \mu \pm \sigma, 95% within \mu \pm 2\sigma, and 99.7% within \mu \pm 3\sigma. This empirical rule helps quickly assess outliers and construct confidence intervals. Values beyond 3\sigma are rare (0.3% probability) and often investigated as potential anomalies.

What makes a distribution part of the exponential family?

A distribution belongs to the exponential family if its density can be written as f(x;\theta) = c(\theta)\exp\{\sum Q_j(\theta)T_j(x)\}h(x). Benefits include: sufficient statistics of finite dimension, nice MLE properties, and existence of conjugate priors for Bayesian inference. Common members: normal, exponential, gamma, chi-square, binomial, Poisson, and beta.

How do I choose the right distribution for my data?

Consider: (1) Data type: discrete (binomial, Poisson) vs. continuous (normal, exponential). (2) Support: bounded ([0,1] → beta) vs. unbounded (\mathbb{R} → normal). (3) Shape: symmetric (normal) vs. skewed (gamma, exponential). (4) Context: count data (Poisson), time-to-event (exponential), proportions (beta). Use goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) and Q-Q plots to validate.
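
A minimal sketch of that validation step (assuming numpy/scipy; the data here are simulated purely for illustration). Note that estimating the parameters from the same data makes the plain KS p-value optimistic:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    data = rng.normal(10, 2, size=200)                   # illustrative sample

    mu, sigma = data.mean(), data.std(ddof=1)
    print(stats.kstest(data, "norm", args=(mu, sigma)))  # KS test against the fitted normal

    (osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")
    print(r)                                             # Q-Q correlation near 1 suggests a good fit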

What's the relationship between sample size and distribution choice?

Small samples (n < 30): Use exact distributions (t for means, F for variances, exact binomial). Large samples (n \geq 30): Central Limit Theorem allows normal approximations for many statistics. The quality of normal approximation depends on the parent distribution's shape: symmetric distributions need smaller n, heavily skewed distributions need larger n (sometimes n > 50).