Master probability distribution families and their applications in statistical inference
Core probability distributions essential for statistical modeling
The binomial distribution describes the number of successes in a fixed number of independent trials.
Parameters: $n$ (number of trials), $p$ (success probability)
Mean: $E[X] = np$
Variance: $\text{Var}(X) = np(1-p)$
MGF: $M_X(t) = (1 - p + pe^t)^n$
Distribution Family: Exponential family
Problem:
A factory produces items with a 5% defect rate. If we inspect 20 randomly selected items, what is the probability of finding exactly 2 defective items? What is the expected number of defects?
Solution:
With $X \sim \text{Bin}(20, 0.05)$: $P(X = 2) = \binom{20}{2}(0.05)^2(0.95)^{18} \approx 0.189$. Expected number of defects: $E[X] = np = 20 \times 0.05 = 1$.
Key Insight:
The binomial distribution is appropriate when we have a fixed number of independent trials with constant success probability. Use the binomial formula directly for small $n$, or the normal approximation for large $n$.
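The factory example can be verified numerically. Below is a minimal sketch using only Python's standard library; the `binom_pmf` helper is illustrative, not from the text:

```python
from math import comb

def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.05                 # 20 inspected items, 5% defect rate
prob_two = binom_pmf(2, n, p)   # exactly 2 defective items
expected = n * p                # binomial mean

print(f"P(X = 2) = {prob_two:.4f}")   # ≈ 0.1887
print(f"E[X] = {expected}")           # 1.0
```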
The Poisson distribution models the count of events occurring in a fixed time or space interval when events occur independently at a constant average rate.
Parameter: $\lambda$ (average rate per interval)
Mean: $E[X] = \lambda$
Variance: $\text{Var}(X) = \lambda$
Limit Property: $\text{Bin}(n, p) \to \text{Poisson}(\lambda)$ when $n \to \infty$, $p \to 0$ with $np \to \lambda$
Additive Property: $X_1 + X_2 \sim \text{Poisson}(\lambda_1 + \lambda_2)$ for independent Poisson RVs
Problem:
A call center receives an average of 4 calls per minute. Assuming calls arrive independently, what is the probability of receiving exactly 6 calls in the next minute? At most 2 calls?
Solution:
With $X \sim \text{Poisson}(4)$: $P(X = 6) = e^{-4} \cdot 4^6 / 6! \approx 0.104$. $P(X \le 2) = e^{-4}\left(1 + 4 + \frac{4^2}{2!}\right) = 13e^{-4} \approx 0.238$.
Key Insight:
Poisson is ideal for counting rare events in fixed intervals. The mean equals the variance ($E[X] = \text{Var}(X) = \lambda$), and we can sum individual probabilities for cumulative calculations.
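A quick numeric check of the call-center answers, again with only the standard library (`poisson_pmf` is an illustrative helper):

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)

lam = 4.0  # average of 4 calls per minute
p_six = poisson_pmf(6, lam)                                  # exactly 6 calls
p_at_most_two = sum(poisson_pmf(k, lam) for k in range(3))   # 0, 1, or 2 calls

print(f"P(X = 6)  = {p_six:.4f}")          # ≈ 0.1042
print(f"P(X <= 2) = {p_at_most_two:.4f}")  # ≈ 0.2381
```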
The normal (Gaussian) distribution is fundamental to statistics, appearing naturally in many phenomena due to the Central Limit Theorem.
Parameters: $\mu$ (mean), $\sigma$ (standard deviation)
Mean: $E[X] = \mu$
Variance: $\text{Var}(X) = \sigma^2$
Standard Normal: $Z = (X - \mu)/\sigma \sim N(0, 1)$
68-95-99.7 Rule:
68% within $\mu \pm \sigma$
95% within $\mu \pm 2\sigma$
99.7% within $\mu \pm 3\sigma$
Problem:
Test scores are normally distributed with mean $\mu$ and standard deviation $\sigma$. What percentage of students score above 90? What score separates the top 10% of students?
Solution:
Standardize: $P(X > 90) = P\left(Z > \frac{90 - \mu}{\sigma}\right) = 1 - \Phi\left(\frac{90 - \mu}{\sigma}\right)$. For the top 10%, the 90th percentile of $N(0, 1)$ is $z_{0.90} \approx 1.28$, so the cutoff score is $x = \mu + 1.28\sigma$.
Key Insight:
Always standardize normal distributions using $Z = (X - \mu)/\sigma$ to use standard normal tables. For percentiles, work backwards from $z$-scores to original units.
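The problem leaves the mean and standard deviation unspecified, so the sketch below plugs in assumed example values ($\mu = 75$, $\sigma = 10$) to show the mechanics; `phi` is an illustrative helper built on `math.erf`:

```python
from math import erf, sqrt

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 75.0, 10.0   # assumed example values, not from the problem
z = (90 - mu) / sigma    # standardize the score of 90
p_above = 1 - phi(z)     # fraction scoring above 90

# Top-10% cutoff: invert via z_{0.90} ≈ 1.2816, the 90th percentile of N(0,1)
cutoff = mu + 1.2816 * sigma

print(f"P(X > 90) = {p_above:.4f}")       # ≈ 0.0668 for these values
print(f"Top-10% cutoff ≈ {cutoff:.1f}")   # ≈ 87.8
```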
The exponential distribution models the time until an event occurs in a Poisson process, characterized by the unique memoryless property.
Parameter: $\lambda$ (rate parameter)
Mean: $E[X] = 1/\lambda$
Variance: $\text{Var}(X) = 1/\lambda^2$
Memoryless Property: $P(X > s + t \mid X > s) = P(X > t)$. Past doesn't affect future.
Relation to Gamma: $\text{Exp}(\lambda) = \text{Gamma}(1, \lambda)$; a sum of $k$ independent $\text{Exp}(\lambda)$ variables is $\text{Gamma}(k, \lambda)$
Problem:
Electronic components have lifetimes that follow an exponential distribution with rate $\lambda$ failures per hour. What is the probability a component lasts more than 1000 hours? If it has already lasted 500 hours, what's the probability it lasts another 500 hours?
Solution:
$P(X > 1000) = e^{-1000\lambda}$. By the memoryless property, $P(X > 1000 \mid X > 500) = P(X > 500) = e^{-500\lambda}$: the 500 hours already survived do not change the distribution of the remaining lifetime.
Key Insight:
The memoryless property means the component's past survival doesn't affect its future lifetime - a unique property of exponential distributions. This makes it ideal for modeling "wear-free" failures.
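The memoryless property can be seen numerically; the rate $\lambda = 0.001$ per hour below is an assumed example value (mean lifetime 1000 hours), not taken from the problem:

```python
from math import exp

lam = 0.001  # assumed rate: 0.001 failures per hour (mean life 1000 h)

def survival(t: float) -> float:
    """P(X > t) for X ~ Exp(lam)."""
    return exp(-lam * t)

p_1000 = survival(1000)                  # lasts more than 1000 hours
p_cond = survival(1500) / survival(500)  # P(X > 1500 | X > 500)

print(f"P(X > 1000)           = {p_1000:.4f}")  # ≈ 0.3679
print(f"P(X > 1500 | X > 500) = {p_cond:.4f}")  # ≈ 0.3679: memoryless
```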
Derived distributions essential for statistical inference and hypothesis testing
The gamma distribution models the sum of independent exponential random variables, widely used in Bayesian statistics and queuing theory.
Parameters: $\alpha$ (shape), $\beta$ (rate). $\Gamma(\alpha)$ is the gamma function.
Mean: $E[X] = \alpha/\beta$
Variance: $\text{Var}(X) = \alpha/\beta^2$
Additivity: $\text{Gamma}(\alpha_1, \beta) + \text{Gamma}(\alpha_2, \beta) \sim \text{Gamma}(\alpha_1 + \alpha_2, \beta)$ for independent summands
Special Cases: $\text{Gamma}(1, \lambda) = \text{Exp}(\lambda)$; $\text{Gamma}(n/2, 1/2) = \chi^2_n$
Problem:
Machine failures occur at rate $\lambda$ per day (times between failures are exponentially distributed). What is the expected time until the third failure? What's the probability the third failure occurs within 10 days?
Solution:
The time to the third failure is $T \sim \text{Gamma}(3, \lambda)$, so $E[T] = 3/\lambda$ days. For the probability, use the Poisson identity: $P(T \le 10) = P(N \ge 3)$, where $N \sim \text{Poisson}(10\lambda)$ counts failures in 10 days, giving $1 - e^{-10\lambda}\left(1 + 10\lambda + \frac{(10\lambda)^2}{2}\right)$.
Key Insight:
Gamma distribution models the sum of independent exponential waiting times. Use the additivity property and CDF to calculate probabilities for multiple events.
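For integer shape, the gamma CDF reduces to a Poisson tail sum, which makes the failure example easy to check; the rate 0.3 per day below is an assumed example value:

```python
from math import exp, factorial

def gamma_cdf_int(t: float, shape: int, rate: float) -> float:
    """P(T <= t) for T ~ Gamma(shape, rate) with integer shape (Erlang).
    Uses the identity P(T <= t) = P(N >= shape), N ~ Poisson(rate * t)."""
    lam = rate * t
    return 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(shape))

rate = 0.3                         # assumed: 0.3 failures per day
mean_time = 3 / rate               # expected time to the third failure
p_within_10 = gamma_cdf_int(10, 3, rate)

print(f"E[T] = {mean_time:.1f} days")     # 10.0
print(f"P(T <= 10) = {p_within_10:.4f}")  # ≈ 0.5768
```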
The chi-square distribution arises as the distribution of the sum of squares of independent standard normal random variables.
Parameter: $n$ (degrees of freedom)
Definition: $\chi^2_n = Z_1^2 + Z_2^2 + \cdots + Z_n^2$, where $Z_i \sim N(0, 1)$ independent
Mean & Variance: $E[X] = n$, $\text{Var}(X) = 2n$
Additivity: $\chi^2_m + \chi^2_n \sim \chi^2_{m+n}$ for independent terms
Relation to Gamma: $\chi^2_n = \text{Gamma}(n/2, 1/2)$
Problem:
A sample of $n$ observations from $N(\mu, \sigma^2)$ has sample variance $S^2$. Find the distribution of $(n-1)S^2/\sigma^2$ and explain how to calculate probabilities involving $S^2$.
Solution:
$(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$. Probabilities for $S^2$ follow by rescaling: $P(S^2 > c) = P\left(\chi^2_{n-1} > \frac{(n-1)c}{\sigma^2}\right)$.
Key Insight:
Sample variance from normal populations follows a scaled chi-square distribution. This forms the basis for variance testing and confidence intervals.
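The scaled-sample-variance claim can be checked by simulation with the standard library; the sample size and $\sigma$ below are assumed example values:

```python
import random
from statistics import mean, variance

random.seed(0)
n, sigma = 10, 2.0   # assumed example values
reps = 20000

# Scaled sample variances (n-1)S^2/sigma^2 from N(0, sigma^2) samples
scaled = []
for _ in range(reps):
    sample = [random.gauss(0, sigma) for _ in range(n)]
    scaled.append((n - 1) * variance(sample) / sigma**2)

# Chi-square with n-1 df has mean n-1 and variance 2(n-1)
print(f"empirical mean ≈ {mean(scaled):.2f} (theory: {n - 1})")
print(f"empirical var  ≈ {variance(scaled):.2f} (theory: {2 * (n - 1)})")
```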
The t-distribution is used for inference about means when the population standard deviation is unknown and sample size is small.
Parameter: $\nu$ (degrees of freedom)
Definition: $T = \frac{Z}{\sqrt{\chi^2_\nu / \nu}}$, where $Z \sim N(0, 1)$ and $\chi^2_\nu$ is independent of $Z$
Properties:
Symmetric around 0
$E[T] = 0$ (if $\nu > 1$)
$\text{Var}(T) = \frac{\nu}{\nu - 2}$ (if $\nu > 2$)
Limit Behavior: $t_\nu \to N(0, 1)$ as $\nu \to \infty$
Heavier Tails: More probability in tails than normal
Problem:
A sample of $n$ measurements has mean $\bar{x}$ and sample standard deviation $s$. Construct a 95% confidence interval for the population mean, assuming normality.
Solution:
The 95% confidence interval is $\bar{x} \pm t_{0.975,\, n-1} \cdot \frac{s}{\sqrt{n}}$, where $t_{0.975,\, n-1}$ is the 97.5th percentile of the t-distribution with $n - 1$ degrees of freedom.
Key Insight:
Use t-distribution instead of normal when sample size is small ($n < 30$) and $\sigma$ is unknown. The wider interval accounts for additional uncertainty from estimating $\sigma$.
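A sketch of the interval computation with assumed example data ($n = 25$, $\bar{x} = 50$, $s = 4$); the critical value $t_{0.975,\,24} = 2.064$ is taken from a standard t-table:

```python
from math import sqrt

# Assumed example data: n = 25 measurements, sample mean 50, sample sd 4
n, xbar, s = 25, 50.0, 4.0
t_crit = 2.064  # t_{0.975} with 24 degrees of freedom (from a t-table)

margin = t_crit * s / sqrt(n)      # half-width of the interval
ci = (xbar - margin, xbar + margin)
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")  # (48.349, 51.651)
```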
The F-distribution is the ratio of two independent chi-square variables, fundamental in ANOVA and variance testing.
Parameters: $d_1, d_2$ (degrees of freedom)
Mean: $E[X] = \frac{d_2}{d_2 - 2}$ for $d_2 > 2$
Reciprocal Property: $1/F_{d_1, d_2} \sim F_{d_2, d_1}$
Quantile Relation: $F_{1-\alpha}(d_1, d_2) = \frac{1}{F_{\alpha}(d_2, d_1)}$
Connection to t: $t_\nu^2 = F_{1, \nu}$
Problem:
Two independent samples from normal populations have sample variances $s_1^2$ (with $n_1$ observations) and $s_2^2$ (with $n_2$ observations). Test if the population variances are equal at significance level $\alpha$.
Solution:
Compute $F = s_1^2 / s_2^2$ with the larger variance in the numerator; under $H_0: \sigma_1^2 = \sigma_2^2$ this ratio follows $F_{n_1 - 1,\, n_2 - 1}$. Reject $H_0$ if $F$ exceeds the upper critical value $F_{1 - \alpha/2}(n_1 - 1, n_2 - 1)$.
Key Insight:
The F-test compares the variance ratio to the F-distribution. Use the reciprocal property to find lower critical values. Always put the larger variance in the numerator for one-tailed tests.
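The F construction and its mean can be checked by simulating chi-square ratios; the degrees of freedom below are assumed example values:

```python
import random
from statistics import mean

random.seed(1)

def chi2_draw(df: int) -> float:
    """One chi-square draw as a sum of squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

d1, d2, reps = 5, 12, 50000  # arbitrary example degrees of freedom
f_samples = sorted((chi2_draw(d1) / d1) / (chi2_draw(d2) / d2) for _ in range(reps))

# Theory: E[F] = d2 / (d2 - 2) for d2 > 2
print(f"empirical mean ≈ {mean(f_samples):.3f} (theory: {d2 / (d2 - 2):.3f})")

# Reciprocal property: 1/F(5,12) ~ F(12,5), so the 95th percentile of
# F(5,12) equals 1 / (5th percentile of F(12,5))
q95 = f_samples[int(0.95 * reps)]
recip = sorted(1 / x for x in f_samples)   # draws from F(12, 5)
q05_recip = recip[int(0.05 * reps)]
print(f"q95 of F(5,12) ≈ {q95:.3f}, 1/q05 of F(12,5) ≈ {1 / q05_recip:.3f}")
```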
Unified framework connecting many important distributions
A distribution belongs to the exponential family if its density can be written as:
$f(x \mid \theta) = h(x) \exp\left(\sum_i \eta_i(\theta)\, T_i(x) - A(\theta)\right)$
$A(\theta)$: Normalizing constant depending only on $\theta$
$\eta_i(\theta)$: Natural parameter functions
$T_i(x)$: Sufficient statistics
$h(x)$: Base measure (independent of $\theta$)
Normal $N(\mu, \sigma^2)$: $T(x) = (x, x^2)$, $\eta = \left(\frac{\mu}{\sigma^2}, -\frac{1}{2\sigma^2}\right)$
Poisson $(\lambda)$: $T(x) = x$, natural parameter $\eta = \log \lambda$
Problem:
Show that the binomial distribution belongs to the exponential family and identify its natural parameter and sufficient statistic.
Solution:
Write the pmf as $\binom{n}{x} p^x (1-p)^{n-x} = \binom{n}{x}(1-p)^n \exp\left(x \log \frac{p}{1-p}\right)$, which matches the exponential family form with:
$\eta = \log \frac{p}{1-p}$ (natural parameter)
$T(x) = x$ (sufficient statistic)
Key Insight:
Converting to exponential family form reveals the sufficient statistic and natural parameter. The natural parameter is the log-odds.
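The exponential-family rewriting of the binomial pmf can be verified numerically by comparing both forms term by term (parameter values below are arbitrary):

```python
from math import comb, exp, log

n, p = 10, 0.3
eta = log(p / (1 - p))   # natural parameter: the log-odds
A = -n * log(1 - p)      # log-normalizer A(theta)

for x in range(n + 1):
    pmf = comb(n, x) * p**x * (1 - p) ** (n - x)       # standard form
    ef_form = comb(n, x) * exp(eta * x - A)            # h(x) = C(n, x)
    assert abs(pmf - ef_form) < 1e-12

print("binomial pmf matches the exponential-family form")
```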
Understanding how distributions connect and derive from each other
Step-by-step mathematical derivations of fundamental distribution theorems
Theorem Statement:
Let $X_n \sim \text{Bin}(n, p_n)$ where $n \to \infty$, $p_n \to 0$, and $np_n \to \lambda$ for some constant $\lambda > 0$. Then:
$P(X_n = k) \to \frac{\lambda^k e^{-\lambda}}{k!}$ for every fixed $k$. That is, $X_n$ converges in distribution to $\text{Poisson}(\lambda)$.
Proof:
Take $p_n = \lambda/n$. Then $P(X_n = k) = \binom{n}{k} p_n^k (1 - p_n)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \frac{\lambda^k}{k!} \left(1 - \frac{\lambda}{n}\right)^{n-k}$. As $n \to \infty$, the first factor tends to 1 and $\left(1 - \frac{\lambda}{n}\right)^{n-k} \to e^{-\lambda}$, so $P(X_n = k) \to \frac{\lambda^k e^{-\lambda}}{k!}$.
Practical Significance:
This justifies approximating $\text{Bin}(n, p)$ by $\text{Poisson}(np)$ when $n$ is large and $p$ is small, which is often computationally simpler.
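The convergence can be watched numerically by holding $np = \lambda$ fixed while $n$ grows; $\lambda = 3$ and $k = 2$ below are arbitrary example values:

```python
from math import comb, exp, factorial

lam, k = 3.0, 2   # compare P(X = 2) with lambda = 3
poisson = exp(-lam) * lam**k / factorial(k)

for n in (10, 100, 1000, 10000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p) ** (n - k)
    print(f"n={n:>5}: Bin pmf = {binom:.6f}, |diff| = {abs(binom - poisson):.6f}")

print(f"Poisson pmf = {poisson:.6f}")  # 0.224042
```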
Theorem Statement:
Let $Z_1, Z_2, \ldots, Z_n$ be independent $N(0, 1)$ random variables. Then:
$Z_1^2 + Z_2^2 + \cdots + Z_n^2 \sim \chi^2_n$
where $\chi^2_n$ is the chi-square distribution with $n$ degrees of freedom.
Proof:
For a single $Z \sim N(0, 1)$, the MGF of $Z^2$ is $E[e^{tZ^2}] = (1 - 2t)^{-1/2}$ for $t < 1/2$. By independence, the MGF of the sum is the product $(1 - 2t)^{-n/2}$, which is the MGF of $\text{Gamma}(n/2, 1/2) = \chi^2_n$. Uniqueness of MGFs completes the proof.
Key Implications:
This result underlies the sampling distribution of the variance, $(n-1)S^2/\sigma^2 \sim \chi^2_{n-1}$, and hence chi-square variance tests and confidence intervals.
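A simulation check of the theorem with the standard library: for $n = 6$ the chi-square CDF has a closed Erlang form to compare against:

```python
import random
from math import exp

random.seed(2)
n, reps = 6, 40000

# Sums of squares of n independent standard normals
draws = [sum(random.gauss(0, 1) ** 2 for _ in range(n)) for _ in range(reps)]

# Theory: chi2_6 = Gamma(3, 1/2), so P(S <= t) = 1 - e^{-t/2}(1 + t/2 + (t/2)^2/2)
t = 6.0
theory = 1 - exp(-t / 2) * (1 + t / 2 + (t / 2) ** 2 / 2)
empirical = sum(d <= t for d in draws) / reps

print(f"P(chi2_6 <= 6): empirical ≈ {empirical:.4f}, theory = {theory:.4f}")
```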
Theorem Statement:
If two random variables $X$ and $Y$ have moment generating functions $M_X(t)$ and $M_Y(t)$ that exist and are equal in an open interval containing 0, then $X$ and $Y$ have the same distribution.
Proof (Sketch):
An MGF that exists in an open interval around 0 determines all moments of the distribution and extends to the characteristic function. By the inversion theorem, the characteristic function uniquely determines the distribution, so $M_X = M_Y$ on an open interval implies $X$ and $Y$ are identically distributed.
Applications:
Used to identify distributions of sums of independent random variables, e.g. proving that a sum of squared standard normals is chi-square, or that sums of independent Poisson (or gamma with common rate) variables stay in the same family.
Common questions about probability distributions and their applications
Use binomial when you have a fixed number of trials ($n$) with constant success probability ($p$). Use Poisson when counting rare events in a continuous interval with no upper limit. As a rule of thumb, if $n$ is large and $p$ is small (e.g. $n \ge 100$ and $np \le 10$), Poisson approximates binomial well with $\lambda = np$.
The normal distribution appears naturally due to the Central Limit Theorem: sample means from any distribution approach normality as sample size increases. It's mathematically tractable (closed-form formulas), symmetric, and completely determined by two parameters ($\mu$, $\sigma$). This makes it foundational for inference, hypothesis testing, and confidence intervals.
Exponential models waiting time until the first event in a Poisson process. Gamma generalizes this to waiting time until the $k$-th event. In fact, $\text{Gamma}(1, \lambda) = \text{Exp}(\lambda)$. Gamma has two parameters $(\alpha, \beta)$ allowing more flexible shapes, while exponential has only rate $\lambda$.
Use t-distribution when: (1) sample size is small (typically $n < 30$), (2) population variance is unknown, and (3) you're estimating the population standard deviation from sample data. As degrees of freedom increase, $t_\nu \to N(0, 1)$. For large samples, t and normal are nearly identical.
The memoryless property states $P(X > s + t \mid X > s) = P(X > t)$: the future doesn't depend on the past. Only exponential (continuous) and geometric (discrete) distributions have this property. It's ideal for modeling "wear-free" failures where components don't age. For systems that do wear out, use Weibull or gamma instead.
All derive from normal distributions. Chi-square: $\chi^2_n = Z_1^2 + \cdots + Z_n^2$ (sum of squared normals). t-distribution: $t_\nu = Z / \sqrt{\chi^2_\nu / \nu}$ (normal ÷ chi-square). F-distribution: $F_{d_1, d_2} = \frac{\chi^2_{d_1}/d_1}{\chi^2_{d_2}/d_2}$ (ratio of chi-squares). Also, $t_\nu^2 = F_{1, \nu}$. These relationships connect variance testing, mean testing, and ANOVA.
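These constructions can be traced directly in code: since $Z^2$ is a one-degree chi-square, squaring t draws reproduces $F_{1,\nu}$ draws; $\nu = 8$ below is an arbitrary example value:

```python
import random
from statistics import mean

random.seed(3)

def chi2(df: int) -> float:
    """One chi-square draw as a sum of squared standard normals."""
    return sum(random.gauss(0, 1) ** 2 for _ in range(df))

nu, reps = 8, 40000
# t_nu = Z / sqrt(chi2_nu / nu), so t_nu^2 = chi2_1 / (chi2_nu / nu) = F(1, nu)
t_sq = [(random.gauss(0, 1) ** 2) / (chi2(nu) / nu) for _ in range(reps)]

# Compare with the F(1, 8) mean: d2 / (d2 - 2) = 8/6
print(f"mean of t^2 ≈ {mean(t_sq):.3f} (F(1,8) mean: {8 / 6:.3f})")
```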
For normal $N(\mu, \sigma^2)$: approximately 68% of data falls within $\mu \pm \sigma$, 95% within $\mu \pm 2\sigma$, and 99.7% within $\mu \pm 3\sigma$. This empirical rule helps quickly assess outliers and construct confidence intervals. Values beyond $\mu \pm 3\sigma$ are rare (0.3% probability) and often investigated as potential anomalies.
A distribution belongs to the exponential family if its density can be written as $f(x \mid \theta) = h(x) \exp\left(\eta(\theta)^\top T(x) - A(\theta)\right)$. Benefits include: sufficient statistics of finite dimension, nice MLE properties, and existence of conjugate priors for Bayesian inference. Common members: normal, exponential, gamma, chi-square, binomial, Poisson, and beta.
Consider: (1) Data type: discrete (binomial, Poisson) vs. continuous (normal, exponential). (2) Support: bounded ($[0, 1]$ → beta) vs. unbounded ($(-\infty, \infty)$ → normal). (3) Shape: symmetric (normal) vs. skewed (gamma, exponential). (4) Context: count data (Poisson), time-to-event (exponential), proportions (beta). Use goodness-of-fit tests (chi-square, Kolmogorov-Smirnov) and Q-Q plots to validate.
Small samples ($n < 30$): Use exact distributions (t for means, F for variances, exact binomial). Large samples ($n \ge 30$): Central Limit Theorem allows normal approximations for many statistics. The quality of normal approximation depends on the parent distribution's shape: symmetric distributions need smaller $n$, heavily skewed distributions need larger $n$ (sometimes $n > 100$).