Master the foundational concepts of mathematical statistics from population theory to statistical inference
Understanding the core reasoning and methodology
Mathematical statistics is the science of reasoning and decision-making under uncertainty, focusing on inferring unknown population characteristics from observed sample data.
Infer population distribution from random sample
Uses probability theory as foundation, develops inference procedures with quantifiable reliability
The fundamental distinction lies in the direction of reasoning:
| Aspect | Probability Theory | Mathematical Statistics |
|---|---|---|
| Direction | Population → Sample (Forward) | Sample → Population (Inverse) |
| Known Information | Distribution F is known | Distribution F is unknown |
| Question Type | What samples will we get? | What is the population? |
| Example | Given $\mu$, $\sigma^2$, find $P(a < X \le b)$ | Given sample, estimate unknown $\mu$ and $\sigma^2$ |
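The contrast is easy to see in code. Below is a minimal sketch (using NumPy, with an arbitrary $N(5, 4)$ population chosen purely for illustration): the forward direction simulates samples from a fully known distribution, while the inverse direction pretends the parameters are unknown and recovers them from the observed sample alone.

```python
import numpy as np

rng = np.random.default_rng(42)

# Forward (probability): the population N(5, 2^2) is fully known,
# so we can predict how samples will behave.
mu, sigma = 5.0, 2.0
sample = rng.normal(mu, sigma, size=1_000)

# Inverse (statistics): pretend mu and sigma are unknown and
# recover them from the observed sample alone.
mu_hat = sample.mean()             # point estimate of mu
sigma_hat = sample.std(ddof=1)     # point estimate of sigma (Bessel-corrected)

print(f"true mu = {mu},    estimate = {mu_hat:.3f}")
print(f"true sigma = {sigma}, estimate = {sigma_hat:.3f}")
```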
John Graunt
First statistical analysis of mortality data (London Bills of Mortality)
Carl Friedrich Gauss
Developed least squares method and theory of errors (normal distribution)
Pierre-Simon Laplace
Central Limit Theorem and foundations of probability theory
Karl Pearson
Chi-square test, correlation theory, and method of moments
William Gosset (Student)
Student's t-distribution for small sample inference
Ronald Fisher
Maximum likelihood estimation, sufficiency, ANOVA, and experimental design
Jerzy Neyman & Egon Pearson
Hypothesis testing framework and power analysis
Harald Cramér
*Mathematical Methods of Statistics*, a comprehensive monograph
Understanding the fundamental concepts of statistical populations and random sampling
A statistical population is the complete collection of all individuals or units under study, characterized mathematically by its cumulative distribution function:
$F(x) = P(X \le x)$
where $X$ is a random variable representing the characteristic of interest.
Size
Can be finite (all students in a university) or infinite (all possible measurements)
Parameters
Population parameters ($\mu$, $\sigma^2$) are typically unknown constants we aim to estimate
Parametric Families
May belong to a parametric family $\{F(x;\theta) : \theta \in \Theta\}$, where $\theta$ is unknown
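As a small sketch of what "a family indexed by $\theta$" means in practice, using SciPy's normal family (the parameter values below are illustrative): each choice of $\theta = (\mu, \sigma)$ picks out one candidate population CDF.

```python
from scipy import stats

# The normal family {F(x; theta)} with theta = (mu, sigma): each theta
# selects one candidate population CDF from the family.
x = 4.0
for mu, sigma in [(0.0, 1.0), (5.0, 2.0), (10.0, 0.5)]:
    F = stats.norm(loc=mu, scale=sigma).cdf   # F( . ; mu, sigma)
    print(f"theta = ({mu}, {sigma}): F({x}) = {F(x):.4f}")
```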
Medical Studies
Population: All patients with a certain disease
Characteristic: Recovery time $T \sim \text{Exp}(\lambda)$ with unknown $\lambda$
Quality Control
Population: All products from production line
Characteristic: Defect indicator $X \sim \text{Bernoulli}(p)$ with unknown $p$
Education Research
Population: Test scores of all students
Characteristic: Score $X \sim N(\mu, \sigma^2)$ with unknown $\mu$, $\sigma^2$
Economics
Population: Annual incomes in a country
Characteristic: Income with unknown parameters
Environmental Science
Population: Air pollution levels at different times
Distribution often modeled non-parametrically due to complex patterns
A simple random sample is a collection of random variables
$X_1, X_2, \ldots, X_n$
that are independent and identically distributed (i.i.d.), each with the same distribution as the population.
1. Independence
Each observation doesn't affect others
2. Identical Distribution
All observations from same distribution
For i.i.d. random variables, the joint distribution factors as the product of the marginals: $F(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} F(x_i)$
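Because the joint density factors the same way, the joint log-density of an i.i.d. sample is simply the sum of the marginal log-densities. A minimal sketch with SciPy (standard normal marginals chosen for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.0, scale=1.0, size=5)   # i.i.d. N(0, 1) draws

# Joint density of an i.i.d. sample = product of marginal densities,
# so the joint log-density is the sum of the marginal log-densities.
log_marginals = stats.norm.logpdf(sample, loc=0.0, scale=1.0)
print("joint log-density:", log_marginals.sum())
print("joint density:    ", np.exp(log_marginals.sum()))
```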
Functions of sample data used to estimate population parameters
The sample mean is the average of the sample observations: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
Unbiasedness
Expected value equals the population mean: $E[\bar{X}] = \mu$
Variance
Variance decreases with sample size: $\mathrm{Var}(\bar{X}) = \sigma^2 / n$
The sample variance measures the spread of the sample data: $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2$
Why $n-1$? Bessel's correction makes $S^2$ an unbiased estimator: $E[S^2] = \sigma^2$ (see the simulation sketch below)
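A quick simulation makes both properties concrete: the sample mean is unbiased for $\mu$, dividing by $n-1$ gives an unbiased variance estimator, and dividing by $n$ underestimates $\sigma^2$ by the factor $(n-1)/n$. A sketch with NumPy (the parameter values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 0.0, 4.0, 10, 100_000

# reps independent samples of size n from N(mu, sigma2).
samples = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))

xbar = samples.mean(axis=1)
s2_bessel = samples.var(axis=1, ddof=1)   # divide by n - 1
s2_naive = samples.var(axis=1, ddof=0)    # divide by n

print(f"E[Xbar] ~ {xbar.mean():.4f}      (target mu = {mu})")
print(f"E[S^2]  ~ {s2_bessel.mean():.4f}  (target sigma^2 = {sigma2})")
print(f"naive   ~ {s2_naive.mean():.4f}  (biased: (n-1)/n * sigma^2 = {(n - 1) / n * sigma2:.4f})")
```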
The empirical CDF estimates the population distribution from sample data: $F_n(x) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{X_i \le x\}$
A step function that jumps by $1/n$ at each observed value
Glivenko-Cantelli Theorem
The empirical distribution converges uniformly to the true distribution almost surely: $\sup_{x}\left|F_n(x) - F(x)\right| \xrightarrow{\text{a.s.}} 0$ as $n \to \infty$
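A sketch of this convergence, approximating $\sup_x |F_n(x) - F(x)|$ on a grid for a standard normal population (the grid resolution and sample sizes are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
grid = np.linspace(-4.0, 4.0, 801)   # grid approximation to the sup over x

def ecdf(sample, x):
    """F_n(x): proportion of observations <= x, evaluated on a grid."""
    return np.mean(sample[:, None] <= x, axis=0)

for n in (10, 100, 10_000):
    sample = rng.normal(size=n)
    sup_dist = np.max(np.abs(ecdf(sample, grid) - stats.norm.cdf(grid)))
    print(f"n = {n:>6}: sup |F_n - F| ~ {sup_dist:.4f}")
```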
The mathematical pillars supporting statistical inference
For i.i.d. samples with finite mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}_n$ converges in probability to $\mu$.
First, calculate the expectation and variance of $\bar{X}_n$: $E[\bar{X}_n] = \mu$ and $\mathrm{Var}(\bar{X}_n) = \sigma^2/n$.
Chebyshev's inequality states $P(|Y - E[Y]| \ge \varepsilon) \le \mathrm{Var}(Y)/\varepsilon^2$. Apply this to $\bar{X}_n$: $P(|\bar{X}_n - \mu| \ge \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2}$.
As $n \to \infty$, the upper bound approaches zero: $\frac{\sigma^2}{n\varepsilon^2} \to 0$.
Therefore $P(|\bar{X}_n - \mu| \ge \varepsilon) \to 0$ for every $\varepsilon > 0$, i.e., $\bar{X}_n \xrightarrow{P} \mu$.
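The law is easy to watch numerically. A sketch with NumPy, using an Exponential(1) population (so $\mu = 1$) and tracking the running mean as $n$ grows:

```python
import numpy as np

rng = np.random.default_rng(3)

# Exponential(1) population: mu = 1 (and sigma^2 = 1, so the LLN applies).
draws = rng.exponential(scale=1.0, size=100_000)
running_mean = np.cumsum(draws) / np.arange(1, draws.size + 1)

for n in (10, 100, 1_000, 100_000):
    print(f"n = {n:>6}: Xbar_n = {running_mean[n - 1]:.4f}   (mu = 1)")
```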
For i.i.d. samples with mean $\mu$ and finite variance $\sigma^2$, the standardized sample mean converges in distribution to $N(0, 1)$: $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \xrightarrow{d} N(0, 1)$.
Let $Y_i = \frac{X_i - \mu}{\sigma}$. Then $E[Y_i] = 0$, $\mathrm{Var}(Y_i) = 1$. The standardized sum is $Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Y_i$.
The MGF of $Y_i$ near 0 is $M_Y(t) = 1 + \frac{t^2}{2} + o(t^2)$. The MGF of $Z_n$ is:
$M_{Z_n}(t) = \left[M_Y\!\left(\frac{t}{\sqrt{n}}\right)\right]^n$
Substitute the expansion:
$M_{Z_n}(t) = \left[1 + \frac{t^2}{2n} + o\!\left(\frac{1}{n}\right)\right]^n$
Using $\lim_{n \to \infty}\left(1 + \frac{a}{n}\right)^n = e^{a}$:
$\lim_{n \to \infty} M_{Z_n}(t) = e^{t^2/2}$
$e^{t^2/2}$ is the MGF of the standard normal distribution $N(0, 1)$. By the continuity theorem for MGFs, the distribution of $Z_n$ converges to $N(0, 1)$.
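A simulation shows the same convergence without MGFs: even for a skewed Exponential(1) population, the standardized sample means match standard normal tail probabilities. A sketch (the sample size and replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 50, 200_000

# Skewed Exponential(1) population: mu = sigma = 1, clearly non-normal.
samples = rng.exponential(scale=1.0, size=(reps, n))

# Standardized sample means: Z_n = (Xbar - mu) / (sigma / sqrt(n)).
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

# Compare simulated tail probabilities with exact N(0, 1) values.
for c, target in [(1.0, 0.1587), (1.645, 0.05), (1.96, 0.025)]:
    print(f"P(Z_n > {c}): simulated {np.mean(z > c):.4f}, N(0,1) gives {target}")
```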
Common questions about mathematical statistics fundamentals
Probability theory works forwards from known distributions to predict sample behavior, while mathematical statistics works backwards from observed samples to infer unknown population characteristics. Probability asks "what samples will we get?", while statistics asks "what is the population?".
Dividing by $n-1$ (Bessel's correction) makes the sample variance an unbiased estimator of the population variance: $E[S^2] = \sigma^2$. This correction accounts for the fact that we're using the sample mean rather than the true population mean, which introduces a slight underestimation that dividing by $n-1$ corrects.
I.i.d. stands for "independent and identically distributed". It means each observation comes from the same distribution and doesn't depend on other observations. This assumption is crucial because it allows us to apply powerful statistical theorems like the Law of Large Numbers and Central Limit Theorem.
A statistical population is the complete collection of all individuals or units under study, characterized by a distribution function $F(x)$. It can be finite (e.g., all students in a school) or infinite (e.g., all possible measurements of a physical quantity). The population distribution typically contains unknown parameters we want to estimate.
Sample size depends on several factors: desired precision, population variability, confidence level, and the specific inference goal. Generally, larger samples provide more precise estimates. Rules of thumb include $n \ge 30$ for CLT applications, but power analysis provides more rigorous sample size determination for specific tests.
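For instance, a power analysis for a two-sample t-test can be sketched with statsmodels (the effect size, $\alpha$, and power targets below are conventional defaults, not recommendations):

```python
from statsmodels.stats.power import TTestIndPower

# Sample size per group for a two-sample t-test: medium effect
# (Cohen's d = 0.5), alpha = 0.05, 80% power, two-sided.
n = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"required n per group ~ {n:.1f}")   # about 64
```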
The empirical distribution function $F_n(x)$ is constructed from sample data and represents the proportion of observations less than or equal to $x$. By the Glivenko-Cantelli theorem, it converges uniformly to the true distribution as sample size increases, providing a non-parametric estimate of the population distribution.
Yes! Many statistical methods don't require normality. Non-parametric methods use ranks or empirical distributions, the Central Limit Theorem makes sample means approximately normal for large $n$ regardless of the population distribution, and transformations can sometimes achieve approximate normality. However, interpretation and power may be affected.
A sufficient statistic captures all information in the sample relevant to estimating a parameter - knowing the statistic is as good as knowing the entire sample for inference purposes. For example, for normal data, the sample mean and variance together are sufficient for estimating $\mu$ and $\sigma^2$.
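This can be checked numerically: for a normal model, the log-likelihood computed from the raw data equals the one computed from $\left(\sum X_i, \sum X_i^2\right)$ alone. A sketch (the parameter values are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma = 3.0, 2.0
sample = rng.normal(mu, sigma, size=100)
n = sample.size

# Log-likelihood from the full sample.
ll_full = stats.norm.logpdf(sample, mu, sigma).sum()

# The same value from the sufficient statistics (sum, sum of squares):
# sum (x_i - mu)^2 = sum x_i^2 - 2 mu sum x_i + n mu^2.
s1, s2 = sample.sum(), np.square(sample).sum()
ll_suff = (-n / 2 * np.log(2 * np.pi * sigma**2)
           - (s2 - 2 * mu * s1 + n * mu**2) / (2 * sigma**2))

print(ll_full, ll_suff)   # equal up to floating-point error
```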
Mathematical statistics provides rigorous frameworks for: quality control in manufacturing, drug effectiveness evaluation in clinical trials, A/B testing in tech, risk assessment in finance, election polling, and any scenario requiring data-driven decisions with quantified uncertainty and reliability.
Order statistics are the sample values arranged in increasing order: $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. They're fundamental in non-parametric statistics, used for calculating quantiles, constructing confidence intervals, and in rank-based tests that don't assume a specific distribution.
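A minimal sketch with NumPy (sample size $n = 9$ chosen so the median is a single order statistic):

```python
import numpy as np

rng = np.random.default_rng(6)
sample = rng.normal(size=9)

# Order statistics: the sample sorted increasingly, X_(1) <= ... <= X_(9).
x = np.sort(sample)
print("minimum X_(1):", x[0])
print("median  X_(5):", x[4])            # middle order statistic for n = 9
print("maximum X_(9):", x[-1])
print("agrees with np.median:", np.isclose(x[4], np.median(sample)))
```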