
Common Distribution Families

Master probability distribution families and their applications in statistical inference

Learning Objectives
What you'll master in probability distribution theory
  • Master fundamental distribution families: binomial, Poisson, normal, uniform, exponential
  • Understand advanced distributions: gamma, chi-square, t-distribution, F-distribution
  • Learn exponential family theory and its applications in statistical inference
  • Explore distribution relationships and transformation properties
  • Apply distribution knowledge to real-world statistical problems
  • Recognize when to use specific distributions in statistical modeling

Fundamental Distributions

Core probability distributions essential for statistical modeling

Binomial Distribution
Models the number of successes in n independent Bernoulli trials

The binomial distribution B(n,p) describes the number of successes in a fixed number of independent trials.

P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}, \quad k=0,1,\ldots,n

Parameters: n \in \{1,2,3,\ldots\} (number of trials), p \in (0,1) (success probability)

Mean

E[X] = np

Variance

\text{Var}(X) = np(1-p)

MGF

M_X(t) = (pe^t + 1-p)^n

Distribution Family

Exponential family

Applications:
  • Quality control: Number of defective items in a batch
  • Medical trials: Success rate of treatments across patients
  • Marketing: Response rates to advertising campaigns
Example: Quality Control Inspection

Problem:

A factory produces items with a 5% defect rate. If we inspect 20 randomly selected items, what is the probability of finding exactly 2 defective items? What is the expected number of defects?

Solution:

  1. Identify parameters: n=20 trials, p=0.05 defect probability
  2. Distribution: X \sim B(20, 0.05)
  3. Calculate probability for exactly k=2 defects:
    P(X=2) = \binom{20}{2}(0.05)^2(0.95)^{18}
  4. Compute: P(X=2) = 190 \times 0.0025 \times 0.3972 \approx 0.189
  5. Expected defects: E[X] = np = 20 \times 0.05 = 1

Key Insight:

The binomial distribution is appropriate when we have a fixed number of independent trials with constant success probability. Use the binomial formula directly for small n, or the normal approximation for large n.
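
As a quick numerical check of the worked example above, here is a minimal sketch in Python (assuming scipy is available; the variable names are illustrative):

    from scipy.stats import binom

    n, p = 20, 0.05               # trials and defect probability from the example
    print(binom.pmf(2, n, p))     # P(X = 2), approximately 0.189
    print(binom.mean(n, p))       # E[X] = np = 1.0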

Poisson Distribution
Models the number of rare events occurring in a fixed interval

The Poisson distribution P(\lambda) models the count of events occurring in a fixed time or space interval when events occur independently at a constant average rate.

P(X=k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k=0,1,2,\ldots

Parameter: \lambda > 0 (average rate per interval)

Mean

E[X] = \lambda

Variance

\text{Var}(X) = \lambda

Limit Property

B(n,p) \to P(np) \text{ as } n \to \infty

when p \to 0, np \to \lambda

Additive Property

X_1 + X_2 \sim P(\lambda_1 + \lambda_2)

for independent Poisson RVs

Applications:
  • Traffic analysis: Accidents per time period on highways
  • Telecommunications: Call arrivals at service centers
  • Biology: Mutation counts in DNA sequences
Example: Call Center Arrivals

Problem:

A call center receives an average of 4 calls per minute. Assuming calls arrive independently, what is the probability of receiving exactly 6 calls in the next minute? At most 2 calls?

Solution:

  1. Model: Calls follow X \sim P(\lambda = 4)
  2. Probability of exactly 6 calls:
    P(X=6) = \frac{4^6 e^{-4}}{6!} = \frac{4096 \times 0.0183}{720} \approx 0.104
  3. Probability of at most 2 calls:
    P(X \leq 2) = P(X=0) + P(X=1) + P(X=2)
  4. Computing: P(X \leq 2) = e^{-4}(1 + 4 + 8) = 13e^{-4} \approx 0.238

Key Insight:

Poisson is ideal for counting rare events in fixed intervals. The mean equals the variance (\lambda = 4), and we can sum individual probabilities for cumulative calculations.
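
The same call-center numbers can be checked with a short Python sketch (assuming scipy; names are illustrative):

    from scipy.stats import poisson

    lam = 4                        # average calls per minute
    print(poisson.pmf(6, lam))     # P(X = 6), approximately 0.104
    print(poisson.cdf(2, lam))     # P(X <= 2), approximately 0.238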

Normal Distribution
The most important continuous distribution in statistics

The normal (Gaussian) distribution N(\mu, \sigma^2) is fundamental to statistics, appearing naturally in many phenomena due to the Central Limit Theorem.

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \quad x \in \mathbb{R}

Parameters: \mu \in \mathbb{R} (mean), \sigma > 0 (standard deviation)

Mean

E[X] = \mu

Variance

\text{Var}(X) = \sigma^2

Standard Normal

Z = \frac{X - \mu}{\sigma} \sim N(0,1)

68-95-99.7 Rule

68% within \mu \pm \sigma

95% within \mu \pm 2\sigma

99.7% within \mu \pm 3\sigma

Applications:
  • Natural phenomena: Heights, weights, measurement errors
  • Financial returns and risk modeling
  • Central Limit Theorem applications for sample means
Example: Standardizing Test Scores

Problem:

Test scores are normally distributed with mean \mu = 75 and standard deviation \sigma = 10. What percentage of students score above 90? What score separates the top 10% of students?

Solution:

  1. Distribution: X \sim N(75, 10^2)
  2. Standardize for P(X > 90):
    Z = \frac{90 - 75}{10} = 1.5
  3. From standard normal table: P(Z > 1.5) = 1 - \Phi(1.5) = 1 - 0.9332 = 0.0668
  4. About 6.68% score above 90
  5. For top 10%: Find x where P(X > x) = 0.10
  6. This means P(Z > z) = 0.10, so z = 1.28
  7. Convert back: x = \mu + z\sigma = 75 + 1.28(10) = 87.8

Key Insight:

Always standardize normal distributions using Z = (X - \mu)/\sigma to use standard normal tables. For percentiles, work backwards from Z-scores to original units.
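
A minimal Python sketch of the standardization and percentile steps (assuming scipy; the survival function sf gives upper-tail probabilities):

    from scipy.stats import norm

    mu, sigma = 75, 10
    print(norm.sf(90, loc=mu, scale=sigma))     # P(X > 90), approximately 0.0668
    print(norm.ppf(0.90, loc=mu, scale=sigma))  # 90th percentile, approximately 87.8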

Exponential Distribution
Models waiting times and lifetimes with the memoryless property

The exponential distribution \text{Exp}(\lambda) models the time until an event occurs in a Poisson process, characterized by the unique memoryless property.

f(x) = \lambda e^{-\lambda x}, \quad x > 0

Parameter: \lambda > 0 (rate parameter)

Mean

E[X] = \frac{1}{\lambda}

Variance

\text{Var}(X) = \frac{1}{\lambda^2}

Memoryless Property

P(X > s+t \mid X > s) = P(X > t)

Past doesn't affect future

Relation to Gamma

\text{Exp}(\lambda) = \Gamma(1, \lambda)
Applications:
  • Product lifetime and reliability engineering
  • Service times in queuing systems
  • Radioactive decay and failure time modeling
Example: Component Lifetime

Problem:

Electronic components have lifetimes that follow \text{Exp}(\lambda = 0.001), where the rate \lambda is measured in failures per hour. What is the probability a component lasts more than 1000 hours? If it has already lasted 500 hours, what's the probability it lasts another 500 hours?

Solution:

  1. Distribution: X \sim \text{Exp}(0.001), mean lifetime = 1/0.001 = 1000 hours
  2. Probability of lasting more than 1000 hours:
    P(X > 1000) = e^{-\lambda t} = e^{-0.001 \times 1000} = e^{-1} \approx 0.368
  3. Use memoryless property for conditional probability:
    P(X > 1000 \mid X > 500) = P(X > 500)
  4. Calculate: P(X > 500) = e^{-0.001 \times 500} = e^{-0.5} \approx 0.606

Key Insight:

The memoryless property means the component's past survival doesn't affect its future lifetime - a unique property of exponential distributions. This makes it ideal for modeling "wear-free" failures.
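
A short sketch illustrating the memoryless calculation (assuming scipy; note that scipy parameterizes the exponential by scale = 1/\lambda):

    from scipy.stats import expon

    rate = 0.001                        # failures per hour
    life = expon(scale=1/rate)          # scipy uses scale = 1/rate

    print(life.sf(1000))                    # P(X > 1000), approximately 0.368
    print(life.sf(1000) / life.sf(500))     # P(X > 1000 | X > 500)
    print(life.sf(500))                     # equals P(X > 500), approximately 0.607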

Advanced Distributions

Derived distributions essential for statistical inference and hypothesis testing

Gamma Distribution
Generalizes exponential distribution for sums of waiting times

The gamma distribution \Gamma(\alpha, \lambda) generalizes the exponential distribution; for integer \alpha it models the sum of \alpha independent exponential random variables. It is widely used in Bayesian statistics and queuing theory.

f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha-1} e^{-\lambda x}, \quad x > 0

Parameters: \alpha > 0 (shape), \lambda > 0 (rate). \Gamma(\alpha) is the gamma function.

Mean

E[X] = \frac{\alpha}{\lambda}

Variance

\text{Var}(X) = \frac{\alpha}{\lambda^2}

Additivity

\Gamma(\alpha_1, \lambda) + \Gamma(\alpha_2, \lambda) = \Gamma(\alpha_1+\alpha_2, \lambda)

Special Cases

\Gamma(1, \lambda) = \text{Exp}(\lambda)

\Gamma(n/2, 1/2) = \chi^2(n)

Applications:
  • Waiting time until the k-th event in a Poisson process
  • Bayesian inference: conjugate prior for Poisson rate
  • Rainfall modeling and insurance claim amounts
Example: Time Until Third Failure

Problem:

Machine failures occur at rate \lambda = 0.5 per day (exponentially distributed). What is the expected time until the third failure? What's the probability the third failure occurs within 10 days?

Solution:

  1. Time until 3rd failure: X \sim \Gamma(\alpha=3, \lambda=0.5)
  2. Expected time: E[X] = \alpha/\lambda = 3/0.5 = 6 days
  3. Probability within 10 days requires integrating the PDF:
    P(X \leq 10) = \int_0^{10} \frac{0.5^3}{\Gamma(3)} x^2 e^{-0.5x} \, dx
  4. Since \Gamma(3) = 2!, repeated integration by parts gives the Poisson tail P(X \leq 10) = 1 - \sum_{j=0}^{2} e^{-5}\frac{5^j}{j!}:
    P(X \leq 10) = 1 - e^{-5}(1 + 5 + 12.5) \approx 0.875

Key Insight:

Gamma distribution models the sum of independent exponential waiting times. Use the additivity property and CDF to calculate probabilities for multiple events.
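
A minimal check of the third-failure example (assuming scipy; again scipy uses scale = 1/rate):

    from scipy.stats import gamma

    alpha, rate = 3, 0.5
    X = gamma(a=alpha, scale=1/rate)

    print(X.mean())     # expected time, 6.0 days
    print(X.cdf(10))    # P(X <= 10), approximately 0.875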

Chi-Square Distribution
Sum of squared standard normal variables, fundamental in hypothesis testing

The chi-square distribution \chi^2(n) arises as the distribution of the sum of squares of n independent standard normal random variables.

f(x) = \frac{1}{2^{n/2}\Gamma(n/2)} x^{n/2-1} e^{-x/2}, \quad x > 0

Parameter: n \geq 1 (degrees of freedom)

Definition

\chi^2(n) = \sum_{i=1}^n Z_i^2

where Z_i \sim N(0,1)

Mean & Variance

E[X] = n, \quad \text{Var}(X) = 2n

Additivity

\chi^2(n_1) + \chi^2(n_2) = \chi^2(n_1+n_2)

Relation to Gamma

\chi^2(n) = \Gamma(n/2, 1/2)
Applications:
  • Goodness-of-fit testing for categorical data
  • Testing independence in contingency tables
  • Confidence intervals for variance in normal populations
Example: Sample Variance Distribution

Problem:

A sample of n=10 observations from N(\mu, \sigma^2=25) has sample variance S^2. Find the distribution of (n-1)S^2/\sigma^2 and calculate P(S^2 > 35).

Solution:

  1. Key theorem: For normal samples, \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)
  2. Here: \frac{9S^2}{25} \sim \chi^2(9)
  3. Want P(S^2 > 35):
    P(S^2 > 35) = P\left(\frac{9S^2}{25} > \frac{9 \times 35}{25}\right) = P(\chi^2(9) > 12.6)
  4. From chi-square table: P(\chi^2(9) > 12.6) \approx 0.18
  5. About 18% chance sample variance exceeds 35

Key Insight:

Sample variance from normal populations follows a scaled chi-square distribution. This forms the basis for variance testing and confidence intervals.
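
A sketch of the sample-variance calculation (assuming scipy; the threshold 12.6 comes from the scaling above):

    from scipy.stats import chi2

    n, sigma2 = 10, 25
    threshold = (n - 1) * 35 / sigma2      # 12.6
    print(chi2.sf(threshold, df=n - 1))    # P(S^2 > 35), approximately 0.18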

Student's t-Distribution
For small sample inference when population variance is unknown

The t-distribution t(n) is used for inference about means when the population standard deviation is unknown and sample size is small.

f(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}

Parameter: n \geq 1 (degrees of freedom)

Definition

T = \frac{X}{\sqrt{K/n}}

where X \sim N(0,1) and K \sim \chi^2(n) are independent

Properties

Symmetric around 0

E[T] = 0 (if n \geq 2)

\text{Var}(T) = \frac{n}{n-2} (if n \geq 3)

Limit Behavior

t(n) \to N(0,1) \text{ as } n \to \infty

Heavier Tails

More probability in tails than normal

Applications:
  • Confidence intervals for population mean when \sigma is unknown
  • One-sample and two-sample t-tests
  • Regression coefficient significance testing
Example: Small Sample Confidence Interval

Problem:

A sample of n=9 measurements has mean \bar{x}=50 and sample standard deviation s=6. Construct a 95% confidence interval for the population mean, assuming normality.

Solution:

  1. Since \sigma is unknown, use t-distribution with n-1=8 df
  2. For 95% CI, find t_{0.025}(8) from t-table: t_{0.025}(8) = 2.306
  3. Confidence interval formula:
    \bar{x} \pm t_{\alpha/2}(n-1) \cdot \frac{s}{\sqrt{n}}
  4. Standard error: \text{SE} = s/\sqrt{n} = 6/\sqrt{9} = 2
  5. Margin of error: \text{ME} = 2.306 \times 2 = 4.612
  6. 95% CI: (50 - 4.612, 50 + 4.612) = (45.39, 54.61)

Key Insight:

Use t-distribution instead of normal when sample size is small (n < 30) and \sigma is unknown. The wider interval accounts for additional uncertainty from estimating \sigma.
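
The same confidence interval, computed in a short Python sketch (assuming scipy/numpy):

    import numpy as np
    from scipy.stats import t

    n, xbar, s = 9, 50.0, 6.0
    tcrit = t.ppf(0.975, df=n - 1)      # approximately 2.306
    me = tcrit * s / np.sqrt(n)         # margin of error, approximately 4.61
    print(xbar - me, xbar + me)         # roughly (45.39, 54.61)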

F-Distribution
Ratio of chi-square variables for comparing variances

The F-distribution F(m,n) is the ratio of two independent chi-square variables, each divided by its degrees of freedom, and is fundamental in ANOVA and variance testing.

F = \frac{K_1/m}{K_2/n} \quad \text{where } K_1 \sim \chi^2(m), K_2 \sim \chi^2(n)

Parameters: m, n \geq 1 (degrees of freedom)

Mean

E[F] = \frac{n}{n-2}

for n > 2

Reciprocal Property

\frac{1}{F} \sim F(n,m)

Quantile Relation

F_{1-\alpha}(m,n) = \frac{1}{F_\alpha(n,m)}

Connection to t

t^2(n) \sim F(1,n)
Applications:
  • Testing equality of two population variances
  • ANOVA F-tests for comparing multiple means
  • Regression model significance testing
Example: Comparing Two Variances

Problem:

Two independent samples from normal populations have sample variances s_1^2 = 45 (n_1=10) and s_2^2 = 20 (n_2=15). Test if the population variances are equal at \alpha=0.05.

Solution:

  1. Test statistic: F = \frac{s_1^2}{s_2^2} = \frac{45}{20} = 2.25
  2. Under H_0: \sigma_1^2 = \sigma_2^2, F \sim F(9, 14)
  3. Critical values for two-tailed test at \alpha=0.05:
  4. Upper: F_{0.025}(9,14) \approx 3.21
  5. Lower: F_{0.975}(9,14) = 1/F_{0.025}(14,9) \approx 1/3.80 \approx 0.26
  6. Decision: Since 0.26 < 2.25 < 3.21, fail to reject H_0
  7. Conclusion: Insufficient evidence that variances differ

Key Insight:

The F-test compares the ratio of sample variances to an F-distribution. Use the reciprocal property to find the lower critical value. For one-tailed tests, always put the larger variance in the numerator.
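
A sketch of the two-variance F-test (assuming scipy; ppf gives the quantiles used as critical values):

    from scipy.stats import f

    s1_sq, n1 = 45, 10
    s2_sq, n2 = 20, 15
    F = s1_sq / s2_sq                        # 2.25
    lower = f.ppf(0.025, n1 - 1, n2 - 1)     # approximately 0.26
    upper = f.ppf(0.975, n1 - 1, n2 - 1)     # approximately 3.21
    print("reject H0" if (F < lower or F > upper) else "fail to reject H0")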

Exponential Family Theory

Unified framework connecting many important distributions

General Form

A distribution belongs to the exponential family if its density can be written as:

f(x;\theta) = c(\theta) \exp\left\{\sum_{j=1}^k Q_j(\theta) T_j(x)\right\} h(x)

c(\theta)

Normalizing constant depending only on \theta

Q_j(\theta)

Natural parameter functions

T_j(x)

Sufficient statistics

h(x)

Base measure (independent of \theta)

Examples:

Normal N(\mu, \sigma^2):

c(\mu,\sigma^2) \exp\left\{\frac{\mu}{\sigma^2}x - \frac{1}{2\sigma^2}x^2\right\} \cdot \frac{1}{\sqrt{2\pi}}

T_1(x)=x, T_2(x)=x^2

Poisson P(\lambda):

e^{-\lambda} \exp\{\ln(\lambda) \cdot x\} \cdot \frac{1}{x!}

T(x)=x, natural parameter \eta=\ln(\lambda)

Key Properties:
  • Sufficient statistics have finite dimension
  • MLE has nice asymptotic properties
  • Conjugate priors exist for Bayesian inference
Example: Verifying Exponential Family

Problem:

Show that the binomial distribution B(n,p) belongs to the exponential family and identify its natural parameter and sufficient statistic.

Solution:

  1. Start with PMF: P(X=k) = \binom{n}{k}p^k(1-p)^{n-k}
  2. Rewrite using logarithms:
    P(X=k) = \binom{n}{k}(1-p)^n \exp\left\{k \ln\left(\frac{p}{1-p}\right)\right\}
  3. Identify components:

    c(p) = (1-p)^n

    Q(p) = \ln(p/(1-p)) (natural parameter)

    T(k) = k (sufficient statistic)

    h(k) = \binom{n}{k}

  4. Therefore B(n,p) is in the exponential family with k as sufficient statistic

Key Insight:

Converting to exponential family form reveals the sufficient statistic and natural parameter. The natural parameter \eta = \ln(p/(1-p)) is the log-odds.
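
A small numerical check that the exponential-family factorization reproduces the binomial PMF (assuming numpy/scipy; n and p are arbitrary illustrative values):

    import numpy as np
    from scipy.special import comb

    n, p = 20, 0.3
    k = np.arange(n + 1)

    eta = np.log(p / (1 - p))                                  # natural parameter (log-odds)
    pmf_expfam = comb(n, k) * (1 - p)**n * np.exp(eta * k)     # h(k) * c(p) * exp{Q(p) T(k)}
    pmf_direct = comb(n, k) * p**k * (1 - p)**(n - k)

    print(np.allclose(pmf_expfam, pmf_direct))                 # True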

Distribution Relationships

Understanding how distributions connect and derive from each other

Gamma Family
\Gamma(1, \lambda) = \text{Exp}(\lambda)
\Gamma(n/2, 1/2) = \chi^2(n)
Additivity: \Gamma(\alpha_1,\lambda) + \Gamma(\alpha_2,\lambda) = \Gamma(\alpha_1+\alpha_2,\lambda)
Normal Connections
Z = (X-\mu)/\sigma \sim N(0,1)
\sum_{i=1}^n Z_i^2 \sim \chi^2(n)
CLT: \bar{X} \to N(\mu, \sigma^2/n) for large n
t and F Origins
t(n) = \frac{N(0,1)}{\sqrt{\chi^2(n)/n}}
F(m,n) = \frac{\chi^2(m)/m}{\chi^2(n)/n}
t^2(n) = F(1,n)
Discrete Limits
Poisson Approx: B(n,p) \to P(np) as n \to \infty, p \to 0
Normal Approx: B(n,p) \to N(np, np(1-p)) for large n
P(\lambda) \to N(\lambda, \lambda) for large \lambda
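
These relationships can be spot-checked by simulation. The sketch below (assuming numpy/scipy; the seed and sample sizes are arbitrary) builds t(n) from its definition and compares a tail probability of t^2 with F(1, n):

    import numpy as np
    from scipy.stats import f

    rng = np.random.default_rng(0)
    n = 5

    z = rng.standard_normal(100_000)            # N(0,1) draws
    k = rng.chisquare(n, 100_000)               # chi-square(n) draws
    t_samples = z / np.sqrt(k / n)              # t(n) by definition

    print(np.mean(t_samples**2 > 4.0))          # empirical P(t^2 > 4)
    print(f.sf(4.0, 1, n))                      # F(1, n) tail; the two should be close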

Rigorous Theorem Proofs

Step-by-step mathematical derivations of fundamental distribution theorems

Proof: Poisson Limit Theorem
Binomial converges to Poisson under rare event conditions

Theorem Statement:

Let X_n \sim B(n, p_n) where n \to \infty, p_n \to 0, and np_n \to \lambda for some constant \lambda > 0. Then:

P(X_n = k) \to \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{as } n \to \infty

That is, B(n, p_n) converges in distribution to P(\lambda).

Proof:

  1. Step 1 (Start with Binomial PMF): For X_n \sim B(n, p_n):
    P(X_n = k) = \binom{n}{k} p_n^k (1-p_n)^{n-k}
  2. Step 2 (Expand Binomial Coefficient): Write out the combination:
    P(X_n = k) = \frac{n!}{k!(n-k)!} p_n^k (1-p_n)^{n-k}
    = \frac{n(n-1)(n-2)\cdots(n-k+1)}{k!} p_n^k (1-p_n)^{n-k}
  3. Step 3 (Substitute p_n = \lambda/n + o(1/n)): Since np_n \to \lambda, we have p_n \sim \lambda/n:
    P(X_n = k) = \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \frac{(np_n)^k}{k!} \cdot (1-p_n)^n \cdot (1-p_n)^{-k}
  4. Step 4 (Take Limit of Each Factor): As n \to \infty:
    \frac{n(n-1)\cdots(n-k+1)}{n^k} = \frac{n}{n} \cdot \frac{n-1}{n} \cdots \frac{n-k+1}{n} \to 1
    (Each factor approaches 1 for fixed k)
  5. Step 5 (Exponential Limit): For the (1-p_n)^n term:
    (1-p_n)^n = \left(1 - \frac{\lambda}{n} + o(1/n)\right)^n \to e^{-\lambda}
    Using the fundamental limit (1-x/n)^n \to e^{-x}.
  6. Step 6 (Remaining Term): The (1-p_n)^{-k} term:
    (1-p_n)^{-k} \to 1^{-k} = 1
    since p_n \to 0 and k is fixed.
  7. Step 7 (Parameter Convergence): Since np_n \to \lambda:
    (np_n)^k \to \lambda^k
  8. Step 8 (Combine All Limits): Putting everything together:
    P(X_n = k) \to 1 \cdot \frac{\lambda^k}{k!} \cdot e^{-\lambda} \cdot 1 = \frac{\lambda^k e^{-\lambda}}{k!} \quad \blacksquare

Practical Significance:

  • Provides approximation: B(n,p) \approx P(np) when n > 20, p < 0.05
  • Explains why Poisson models rare events in large populations
  • Justifies using simpler Poisson calculations instead of binomial
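
A quick numerical illustration of the approximation (assuming numpy/scipy; n and p chosen so that np = 4):

    import numpy as np
    from scipy.stats import binom, poisson

    n, p = 200, 0.02
    k = np.arange(11)
    diff = np.abs(binom.pmf(k, n, p) - poisson.pmf(k, n * p))
    print(diff.max())     # small, on the order of 1e-3
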
Proof: Chi-Square Distribution from Normal
Deriving chi-square as sum of squared standard normals using MGF

Theorem Statement:

Let Z_1, Z_2, \ldots, Z_n be independent N(0,1) random variables. Then:

X = \sum_{i=1}^n Z_i^2 \sim \chi^2(n)

where \chi^2(n) is the chi-square distribution with n degrees of freedom.

Proof:

  1. Step 1 (MGF of Single Squared Normal): First find the MGF of Y = Z^2 where Z \sim N(0,1):
    M_Y(t) = E[e^{tZ^2}] = \int_{-\infty}^{\infty} e^{tz^2} \cdot \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \, dz
  2. Step 2 (Combine Exponents): Merge the exponential terms:
    M_Y(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-z^2(1/2 - t)} \, dz
    This integral converges when 1/2 - t > 0, i.e., t < 1/2.
  3. Step 3 (Complete the Square): Rewrite as Gaussian integral:
    M_Y(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2/(1-2t)}} \, dz
    This is a normal density with variance \sigma^2 = 1/(1-2t), so:
    M_Y(t) = \frac{1}{\sqrt{2\pi}} \cdot \sqrt{2\pi} \cdot \frac{1}{\sqrt{1-2t}} = (1-2t)^{-1/2}
  4. Step 4 (MGF of Sum): For independent Z_1, \ldots, Z_n:
    M_X(t) = E\left[e^{t\sum Z_i^2}\right] = E\left[\prod_{i=1}^n e^{tZ_i^2}\right]
    By independence:
    M_X(t) = \prod_{i=1}^n E[e^{tZ_i^2}] = \prod_{i=1}^n (1-2t)^{-1/2}
  5. Step 5 (Simplify Product):
    M_X(t) = [(1-2t)^{-1/2}]^n = (1-2t)^{-n/2}
  6. Step 6 (Recognize Chi-Square MGF): The MGF (1-2t)^{-n/2} uniquely identifies the \chi^2(n) distribution.
  7. Step 7 (Verify with Gamma): Note that \chi^2(n) = \Gamma(n/2, 1/2), whose MGF is:
    M_{\Gamma}(t) = \left(1 - \frac{t}{1/2}\right)^{-n/2} = (1-2t)^{-n/2} \quad \checkmark
  8. Step 8 (Conclusion by MGF Uniqueness): Since MGFs uniquely determine distributions:
    \sum_{i=1}^n Z_i^2 \sim \chi^2(n) \quad \blacksquare

Key Implications:

  • Sample variance from normal data: (n-1)S^2/\sigma^2 \sim \chi^2(n-1)
  • Foundation for goodness-of-fit tests and contingency tables
  • Basis for deriving t and F distributions
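
A Monte Carlo sketch of the theorem (assuming numpy/scipy; the seed and sample size are arbitrary): sums of n squared standard normals should match \chi^2(n) in mean, variance, and distribution:

    import numpy as np
    from scipy.stats import chi2, kstest

    rng = np.random.default_rng(1)
    n = 4
    x = (rng.standard_normal((50_000, n)) ** 2).sum(axis=1)   # sums of n squared N(0,1)

    print(x.mean(), x.var())                  # close to n and 2n
    print(kstest(x, chi2(df=n).cdf).pvalue)   # typically not small: consistent with chi2(n)
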
Proof: Moment Generating Function Uniqueness
MGF uniquely determines the probability distribution

Theorem Statement:

If two random variables X and Y have moment generating functions M_X(t) and M_Y(t) that exist and are equal in an open interval containing 0, then X and Y have the same distribution.

M_X(t) = M_Y(t) \text{ for } |t| < \epsilon \quad \Rightarrow \quad F_X = F_Y

Proof (Sketch):

  1. Step 1 (MGF Defines All Moments): If M_X(t) exists in a neighborhood of 0, we can expand:
    M_X(t) = E[e^{tX}] = \sum_{k=0}^{\infty} \frac{t^k}{k!} E[X^k] = \sum_{k=0}^{\infty} \frac{t^k \mu_k}{k!}
    where \mu_k = E[X^k] are the moments.
  2. Step 2 (Extract Moments): By differentiating the MGF:
    M_X^{(k)}(0) = \frac{d^k M_X}{dt^k}\bigg|_{t=0} = E[X^k]
    Thus, the MGF encodes all moments.
  3. Step 3 (Moment Equality): If M_X(t) = M_Y(t) near 0:
    M_X^{(k)}(0) = M_Y^{(k)}(0) \quad \forall k \geq 0
    Therefore: E[X^k] = E[Y^k] for all k.
  4. Step 4 (Moment Sequence Determines Distribution): Under regularity conditions (Carleman's condition, under which the moments determine the distribution), if all moments match:
    \{E[X^k]\}_{k=0}^{\infty} = \{E[Y^k]\}_{k=0}^{\infty}
    then the distributions are identical.
  5. Step 5 (Analytic Uniqueness): More rigorously, the MGF is an analytic function. Two analytic functions equal on an open interval around 0 must be equal everywhere in their domain of analyticity.
  6. Step 6 (Inversion Formula): The distribution can be recovered from the MGF via Fourier/Laplace inversion:
    F_X(x) = \mathcal{L}^{-1}\{M_X(t)\}
    If MGFs are equal, inversions yield the same CDF.
  7. Step 7 (Conclusion): Therefore:
    M_X(t) = M_Y(t) \text{ in neighborhood of } 0 \quad \Rightarrow \quad P(X \leq x) = P(Y \leq x) \text{ for all } x \quad \blacksquare

Applications:

  • Proving distribution of sums: M_{X+Y}(t) = M_X(t)M_Y(t) for independent X, Y
  • Identifying distributions without deriving full PMF/PDF
  • Central Limit Theorem proof uses MGF convergence
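
The product rule for independent sums can be illustrated numerically. This is only a Monte Carlo sketch (assuming numpy; the distributions and the value of t are arbitrary choices for which both MGFs exist):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(scale=2.0, size=200_000)   # X ~ Exp(rate 0.5)
    y = rng.normal(1.0, 1.0, size=200_000)         # Y ~ N(1, 1), independent of X

    t = 0.2                                        # both MGFs exist here (t < 0.5)
    print(np.mean(np.exp(t * (x + y))))            # estimate of M_{X+Y}(t)
    print(np.mean(np.exp(t * x)) * np.mean(np.exp(t * y)))   # M_X(t) * M_Y(t); should agree closely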

Frequently Asked Questions

Common questions about probability distributions and their applications

When should I use binomial vs. Poisson distribution?

Use binomial when you have a fixed number of trials (n) with constant success probability (p). Use Poisson when counting rare events in a continuous interval with no upper limit. As a rule of thumb, if n > 20 and p < 0.05, Poisson approximates binomial well with \lambda = np.

Why is the normal distribution so important?

The normal distribution appears naturally due to the Central Limit Theorem: sample means from any distribution approach normality as sample size increases. It's mathematically tractable (closed-form formulas), symmetric, and completely determined by two parameters (\mu, \sigma^2). This makes it foundational for inference, hypothesis testing, and confidence intervals.

What's the difference between exponential and gamma distributions?

Exponential \text{Exp}(\lambda) models waiting time until the first event in a Poisson process. Gamma \Gamma(\alpha, \lambda) generalizes this to waiting time until the \alpha-th event. In fact, \text{Exp}(\lambda) = \Gamma(1, \lambda). Gamma has two parameters allowing more flexible shapes, while exponential has only rate \lambda.

When do I use t-distribution instead of normal?

Use t-distribution when: (1) sample size is small (typically n < 30), (2) population variance is unknown, and (3) you're estimating the population standard deviation from sample data. As degrees of freedom increase, t(n) \to N(0,1). For large samples, t and normal are nearly identical.

What is the memoryless property and which distributions have it?

The memoryless property states P(X > s+t \mid X > s) = P(X > t): the future doesn't depend on the past. Only exponential (continuous) and geometric (discrete) distributions have this property. It's ideal for modeling "wear-free" failures where components don't age. For systems that do wear out, use Weibull or gamma instead.

How do chi-square, t, and F distributions relate?

All derive from normal distributions. Chi-square: \chi^2(n) = \sum Z_i^2 (sum of squared normals). t-distribution: t(n) = Z/\sqrt{\chi^2(n)/n} (normal ÷ chi-square). F-distribution: F(m,n) = (\chi^2(m)/m)/(\chi^2(n)/n) (ratio of chi-squares). Also, t^2(n) = F(1,n). These relationships connect variance testing, mean testing, and ANOVA.

What is the 68-95-99.7 rule for normal distributions?

For normal N(\mu, \sigma^2): approximately 68% of data falls within \mu \pm \sigma, 95% within \mu \pm 2\sigma, and 99.7% within \mu \pm 3\sigma. This empirical rule helps quickly assess outliers and construct confidence intervals. Values beyond 3\sigma are rare (0.3% probability) and often investigated as potential anomalies.

What makes a distribution part of the exponential family?

A distribution belongs to the exponential family if its density can be written as f(x;\theta) = c(\theta)\exp\{\sum Q_j(\theta)T_j(x)\}h(x). Benefits include: sufficient statistics of finite dimension, nice MLE properties, and existence of conjugate priors for Bayesian inference. Common members: normal, exponential, gamma, chi-square, binomial, Poisson, and beta.

How do I choose the right distribution for my data?

Consider: (1) Data type: discrete (binomial, Poisson) vs. continuous (normal, exponential). (2) Support: bounded ([0,1] → beta) vs. unbounded (\mathbb{R} → normal). (3) Shape: symmetric (normal) vs. skewed (gamma, exponential). (4) Context: count data (Poisson), time-to-event (exponential), proportions (beta). Use goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) and Q-Q plots to validate.
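
A minimal sketch of that validation step (assuming numpy/scipy; the data here are simulated purely for illustration). Note that estimating the parameters from the same data makes the plain KS p-value optimistic:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    data = rng.normal(10, 2, size=200)                   # illustrative sample

    mu, sigma = data.mean(), data.std(ddof=1)
    print(stats.kstest(data, "norm", args=(mu, sigma)))  # KS test against the fitted normal

    (osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm")
    print(r)                                             # Q-Q correlation near 1 suggests a good fit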

What's the relationship between sample size and distribution choice?

Small samples (n < 30): Use exact distributions (t for means, F for variances, exact binomial). Large samples (n \geq 30): Central Limit Theorem allows normal approximations for many statistics. The quality of normal approximation depends on the parent distribution's shape: symmetric distributions need smaller n, heavily skewed distributions need larger n (sometimes n > 50).