
Mathematical Statistics Fundamentals

Master the foundational concepts of mathematical statistics from population theory to statistical inference

Learning Objectives
Master essential mathematical statistics concepts and analytical skills
  • Understand the fundamental logic and reasoning behind mathematical statistics
  • Distinguish clearly between probability theory and mathematical statistics
  • Master population and sample concepts with rigorous mathematical definitions
  • Learn empirical distribution functions and their convergence properties
  • Construct statistics and understand their role in parameter estimation
  • Apply order statistics in data analysis and inference
  • Understand sampling distributions and their practical applications

The Logic of Mathematical Statistics

Understanding the core reasoning and methodology

What is Mathematical Statistics?

Mathematical statistics is the science of reasoning and decision-making under uncertainty, focusing on inferring unknown population characteristics from observed sample data.

Core Transformation:
$$\text{Raw Data} \xrightarrow{\text{Statistics}} \text{Information} \xrightarrow{\text{Inference}} \text{Decisions}$$
Objective

Infer the population distribution $F(x)$ from a random sample $X_1, X_2, \ldots, X_n$

Methodology

Uses probability theory as its foundation to develop inference procedures with quantifiable reliability

Real-World Examples:
  • Quality Control: Estimate defect rate of entire production from sample inspection
  • Clinical Trials: Infer drug effectiveness for general population from trial participants
  • Election Polls: Predict election outcome from surveying small voter sample
  • A/B Testing: Determine which website design performs better from user data
  • Manufacturing: Monitor process quality using statistical control charts
Probability Theory vs Mathematical Statistics

The fundamental distinction lies in the direction of reasoning:

| Aspect | Probability Theory | Mathematical Statistics |
| --- | --- | --- |
| Direction | Population → Sample (forward) | Sample → Population (inverse) |
| Known Information | Distribution $F$ is known | Distribution $F$ is unknown |
| Question Type | What samples will we get? | What is the population? |
| Example | Given $\mu = 100$, $\sigma = 15$, find $P(\bar{X} > 105)$ | Given a sample, estimate the unknown $\mu$ and $\sigma$ |
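
To make the two directions concrete, here is a minimal Python sketch (assuming NumPy and SciPy; the normal population and the sample size $n = 25$ are illustrative choices, not from the text). The forward step computes $P(\bar{X} > 105)$ from known parameters; the inverse step estimates $\mu$ and $\sigma$ from observed data.

```python
import numpy as np
from scipy import stats

# Forward (probability theory): mu and sigma are known.
mu, sigma, n = 100, 15, 25               # n = 25 is an illustrative choice
se = sigma / np.sqrt(n)                  # standard error of the sample mean
p_forward = 1 - stats.norm.cdf(105, loc=mu, scale=se)
print(f"P(Xbar > 105) = {p_forward:.4f}")

# Inverse (mathematical statistics): only a sample is observed.
rng = np.random.default_rng(0)
sample = rng.normal(mu, sigma, size=n)   # in practice mu, sigma are unknown
mu_hat = sample.mean()                   # point estimate of mu
sigma_hat = sample.std(ddof=1)           # point estimate of sigma
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
```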
Historical Development
Key milestones in the evolution of mathematical statistics
1662

John Graunt

First statistical analysis of mortality data (London Bills of Mortality)

1809

Carl Friedrich Gauss

Developed least squares method and theory of errors (normal distribution)

1812

Pierre-Simon Laplace

Central Limit Theorem and foundations of probability theory

1900

Karl Pearson

Chi-square test, correlation theory, and method of moments

1908

William Gosset (Student)

Student's t-distribution for small sample inference

1920s-40s

Ronald Fisher

Maximum likelihood estimation, sufficiency, ANOVA, and experimental design

1933

Jerzy Neyman & Egon Pearson

Hypothesis testing framework and power analysis

1946

Harald Cramér

Published *Mathematical Methods of Statistics*, a comprehensive monograph

Population & Samples

Understanding the fundamental concepts of statistical populations and random sampling

Statistical Population

A statistical population is the complete collection of all individuals or units under study, characterized mathematically by its cumulative distribution function:

$$F(x) = P(X \leq x)$$

where $X$ is a random variable representing the characteristic of interest.

Key Characteristics:
  • Size

    Can be finite (all students in a university) or infinite (all possible measurements)

  • Parameters

    Population parameters ($\mu$, $\sigma^2$) are typically unknown constants we aim to estimate

  • Parametric Families

    May belong to a parametric family $F(x; \theta)$, where $\theta \in \Theta$ is unknown

Detailed Examples:

Medical Studies

Population: All patients with a certain disease

Characteristic: Recovery time $X \sim \text{Exp}(\lambda)$ with unknown $\lambda$

Quality Control

Population: All products from production line

Characteristic: Defect indicator $X \sim \text{Bernoulli}(p)$ with unknown $p$

Education Research

Population: Test scores of all students

Characteristic: Score $X \sim N(\mu, \sigma^2)$ with unknown $\mu$, $\sigma^2$

Economics

Population: Annual incomes in a country

Characteristic: Income $X \sim \text{LogNormal}(\mu, \sigma^2)$ with unknown parameters

Environmental Science

Population: Air pollution levels at different times

Distribution often modeled non-parametrically due to complex patterns

Simple Random Sample (i.i.d.)

A simple random sample is a collection of nn random variables:

$$X_1, X_2, \ldots, X_n$$

that are independent and identically distributed (i.i.d.), each with the same distribution as the population.

Two Essential Conditions:

1. Independence

$$X_1, X_2, \ldots, X_n \text{ are mutually independent}$$

No observation affects any other

2. Identical Distribution

$$X_i \sim F \quad \text{for all } i = 1, 2, \ldots, n$$

All observations come from the same distribution

Joint Distribution:

For i.i.d. random variables, the joint density factors into the product of the marginal densities:

$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i)$$
Practical Examples:
  • Drawing balls from an urn with replacement - each draw is independent
  • Polling randomly selected voters - each opinion is independent observation
  • Testing light bulbs from production - lifetimes are i.i.d. if quality is consistent
  • Measuring heights of randomly selected students - independent measurements
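
The factorization above is what makes likelihood-based inference tractable. A minimal sketch (assuming NumPy; the $\text{Exp}(\lambda)$ population, the rate, and the sample size are illustrative choices) that evaluates the joint density of an i.i.d. sample as the product of its marginals:

```python
import numpy as np

rng = np.random.default_rng(42)
lam, n = 0.5, 5                             # illustrative rate and sample size
sample = rng.exponential(1 / lam, size=n)   # i.i.d. draws from Exp(lambda)

# Marginal density of Exp(lambda): f(x) = lambda * exp(-lambda * x)
marginals = lam * np.exp(-lam * sample)

# Joint density of an i.i.d. sample = product of the marginals
joint_density = np.prod(marginals)
print(f"f(x1,...,x{n}) = {joint_density:.6g}")
```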

Sample Statistics & Empirical Distribution

Functions of sample data used to estimate population parameters

Sample Mean

The sample mean is the average of sample observations:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$

Unbiasedness

$$E[\bar{X}] = \mu$$

Expected value equals population mean

Variance

$$\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

Variance decreases with sample size
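
A quick Monte Carlo check of both properties (a sketch assuming NumPy; the uniform population, sample size, and replication count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 100_000                 # illustrative sample size and replications
mu, sigma2 = 0.5, 1 / 12              # mean and variance of Uniform(0, 1)

samples = rng.uniform(0, 1, size=(reps, n))
means = samples.mean(axis=1)          # one sample mean per replication

print(f"E[Xbar]   ~ {means.mean():.4f}   (theory: {mu})")
print(f"Var(Xbar) ~ {means.var():.6f} (theory: {sigma2 / n:.6f})")
```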

Sample Variance

Measures the spread of sample data:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$

Why $n-1$? Bessel's correction makes $S^2$ an unbiased estimator: $E[S^2] = \sigma^2$
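
To see the bias Bessel's correction removes, here is a short sketch (assuming NumPy; the normal population and replication count are illustrative): dividing by $n$ systematically underestimates $\sigma^2$, while dividing by $n-1$ does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma2 = 5, 200_000, 4.0      # small n makes the bias visible

samples = rng.normal(0, np.sqrt(sigma2), size=(reps, n))
var_n  = samples.var(axis=1, ddof=0)   # divide by n (biased)
var_n1 = samples.var(axis=1, ddof=1)   # divide by n-1 (Bessel's correction)

print(f"mean of S^2 with 1/n    : {var_n.mean():.3f}  (true sigma^2 = {sigma2})")
print(f"mean of S^2 with 1/(n-1): {var_n1.mean():.3f}")
```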

Empirical Distribution Function

The empirical CDF estimates the population distribution from sample data:

$$F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{X_i \leq x\}$$

A step function that jumps by $1/n$ at each observed value

Glivenko-Cantelli Theorem

$$\sup_x |F_n(x) - F(x)| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty$$

Empirical distribution converges uniformly to true distribution almost surely
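
The theorem can be watched numerically. A sketch (assuming NumPy and SciPy; the standard normal population and the grid of sample sizes are illustrative) that computes $\sup_x |F_n(x) - F(x)|$ for growing $n$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def sup_distance(sample):
    """Sup-norm distance between the ECDF of `sample` and the true N(0,1) CDF.

    F_n is a step function, so the supremum is attained at an order statistic:
    it suffices to check F_n just before and just after each jump.
    """
    x = np.sort(sample)
    n = len(x)
    cdf = stats.norm.cdf(x)
    upper = np.arange(1, n + 1) / n - cdf   # deviation just after each jump
    lower = cdf - np.arange(0, n) / n       # deviation just before each jump
    return max(upper.max(), lower.max())

for n in [10, 100, 1000, 10000]:
    d = sup_distance(rng.normal(size=n))
    print(f"n = {n:5d}: sup |F_n - F| = {d:.4f}")
```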

Fundamental Theorems

The mathematical pillars supporting statistical inference

Weak Law of Large Numbers (WLLN)
Foundation of Consistency

For i.i.d. samples with finite mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}_n$ converges in probability to $\mu$.

Mathematical Statement

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \geq \epsilon) = 0 \quad \text{for any } \epsilon > 0$$

Proof via Chebyshev's Inequality

1. Moments of Sample Mean

First, calculate the expectation and variance of $\bar{X}_n$:

$$E[\bar{X}_n] = \mu, \quad \text{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$

2. Apply Chebyshev's Inequality

Chebyshev's inequality states $P(|Y - E[Y]| \geq \epsilon) \leq \frac{\text{Var}(Y)}{\epsilon^2}$. Apply this to $Y = \bar{X}_n$:

$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$

3. Take the Limit

As $n \to \infty$, the upper bound approaches zero:

$$\lim_{n \to \infty} \frac{\sigma^2}{n\epsilon^2} = 0$$

Therefore $P(|\bar{X}_n - \mu| \geq \epsilon) \to 0$ for every $\epsilon > 0$, which is convergence in probability. $\blacksquare$
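
A numerical sketch of the bound (assuming NumPy; the Exp(1) population, $\epsilon = 0.2$, and the sample sizes are illustrative choices): the empirical probability $P(|\bar{X}_n - \mu| \geq \epsilon)$ stays below the Chebyshev bound $\sigma^2/(n\epsilon^2)$ and tends to zero.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, eps, reps = 1.0, 1.0, 0.2, 50_000   # Exp(1): mu = sigma^2 = 1

for n in [10, 100, 1000]:
    means = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    p_emp = np.mean(np.abs(means - mu) >= eps)   # empirical deviation probability
    bound = sigma2 / (n * eps * eps)             # Chebyshev bound (vacuous if > 1)
    print(f"n = {n:4d}: P(|Xbar - mu| >= {eps}) ~ {p_emp:.4f}, bound = {bound:.4f}")
```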

Central Limit Theorem (CLT)
Basis of Normal Approximation

For i.i.d. samples with mean $\mu$ and finite variance $\sigma^2$, the standardized sample mean converges in distribution to $N(0,1)$.

Mathematical Statement

$$\sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) \xrightarrow{d} N(0,1)$$

Proof via Moment Generating Functions

1. Standardize Variables

Let $Z_i = (X_i - \mu)/\sigma$. Then $E[Z_i] = 0$ and $\text{Var}(Z_i) = 1$. The standardized sum is $S_n^* = \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i$.

2. MGF Expansion

The MGF of $Z_i$ near 0 is $M_Z(t) = 1 + \frac{t^2}{2} + o(t^2)$. By independence, the MGF of $S_n^*$ factors:

$$M_{S_n^*}(t) = E\left[e^{t \sum Z_i / \sqrt{n}}\right] = \left( M_Z\left(\frac{t}{\sqrt{n}}\right) \right)^n$$

3. Substitute and Limit

Substitute the expansion:

$$\left( 1 + \frac{(t/\sqrt{n})^2}{2} + o\left(\frac{t^2}{n}\right) \right)^n = \left( 1 + \frac{t^2/2}{n} + \cdots \right)^n$$

Using $\lim_{n\to\infty} (1 + x/n)^n = e^x$:

$$\lim_{n \to \infty} M_{S_n^*}(t) = e^{t^2/2}$$

4. Conclusion

$e^{t^2/2}$ is the MGF of the standard normal distribution $N(0,1)$. By the continuity theorem for MGFs, $S_n^*$ converges in distribution to $N(0,1)$. $\blacksquare$
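
A simulation sketch of the theorem (assuming NumPy and SciPy; the skewed Exp(1) population and the sample sizes are illustrative choices): even though the population is far from normal, tail probabilities of the standardized sample mean approach their $N(0,1)$ values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, reps = 1.0, 1.0, 100_000      # Exp(1) population: skewed, mu = sigma = 1

for n in [2, 10, 50]:
    means = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (means - mu) / sigma    # standardized sample mean
    # Compare a tail probability against the standard normal
    p_emp = np.mean(z > 1.645)
    p_norm = 1 - stats.norm.cdf(1.645)
    print(f"n = {n:3d}: P(Z > 1.645) ~ {p_emp:.4f}  (normal: {p_norm:.4f})")
```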

Frequently Asked Questions

Common questions about mathematical statistics fundamentals

What is the difference between mathematical statistics and probability theory?

Probability theory works forwards from known distributions to predict sample behavior, while mathematical statistics works backwards from observed samples to infer unknown population characteristics. Probability asks "what samples will we get?", while statistics asks "what is the population?".

Why do we divide by n-1 instead of n when calculating sample variance?

Dividing by $n-1$ (Bessel's correction) makes the sample variance an unbiased estimator of the population variance. This correction accounts for the fact that we're using the sample mean rather than the true population mean, which introduces a slight underestimation that $n-1$ corrects.

What does i.i.d. mean and why is it important?

I.i.d. stands for "independent and identically distributed". It means each observation comes from the same distribution and doesn't depend on other observations. This assumption is crucial because it allows us to apply powerful statistical theorems like the Law of Large Numbers and Central Limit Theorem.

What is a statistical population?

A statistical population is the complete collection of all individuals or units under study, characterized by a distribution function $F(x)$. It can be finite (e.g., all students in a school) or infinite (e.g., all possible measurements of a physical quantity). The population distribution typically contains unknown parameters we want to estimate.

How large should my sample size be?

Sample size depends on several factors: desired precision, population variability, confidence level, and the specific inference goal. Generally, larger samples provide more precise estimates. Rules of thumb include $n \geq 30$ for CLT applications, but power analysis provides more rigorous sample size determination for specific tests.

What is the empirical distribution function?

The empirical distribution function $F_n(x)$ is constructed from sample data and represents the proportion of observations $\leq x$. By the Glivenko-Cantelli theorem, it converges uniformly to the true distribution $F(x)$ as sample size increases, providing a non-parametric estimate of the population distribution.

Can I use statistical methods if my data isn't normally distributed?

Yes! Many statistical methods don't require normality. Non-parametric methods use ranks or empirical distributions, the Central Limit Theorem makes sample means approximately normal for large $n$ regardless of population distribution, and transformations can sometimes achieve approximate normality. However, interpretation and power may be affected.

What is a sufficient statistic?

A sufficient statistic captures all information in the sample relevant to estimating a parameter - knowing the statistic is as good as knowing the entire sample for inference purposes. For example, for normal data, the sample mean and variance together are sufficient for estimating $\mu$ and $\sigma^2$.

How does mathematical statistics help in real-world decision making?

Mathematical statistics provides rigorous frameworks for: quality control in manufacturing, drug effectiveness evaluation in clinical trials, A/B testing in tech, risk assessment in finance, election polling, and any scenario requiring data-driven decisions with quantified uncertainty and reliability.

What are order statistics and when are they used?

Order statistics are the sample values arranged in increasing order: $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$. They're fundamental in non-parametric statistics, used for calculating quantiles, constructing confidence intervals, and in rank-based tests that don't assume a specific distribution.
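
A minimal sketch of order statistics in practice (assuming NumPy; the simulated sample and the quantile levels are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(50, 10, size=15)         # illustrative sample

order_stats = np.sort(x)                # X_(1) <= X_(2) <= ... <= X_(n)
print("minimum X_(1):", order_stats[0])
print("maximum X_(n):", order_stats[-1])
print("median:", np.median(x))          # middle order statistic (n odd)
print("quartiles:", np.quantile(x, [0.25, 0.75]))
```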