
Mathematical Statistics Fundamentals

Master the foundational concepts of mathematical statistics from population theory to statistical inference

Learning Objectives
Master essential mathematical statistics concepts and analytical skills
  • Understand the fundamental logic and reasoning behind mathematical statistics
  • Distinguish clearly between probability theory and mathematical statistics
  • Master population and sample concepts with rigorous mathematical definitions
  • Learn empirical distribution functions and their convergence properties
  • Construct statistics and understand their role in parameter estimation
  • Apply order statistics in data analysis and inference
  • Understand sampling distributions and their practical applications

The Logic of Mathematical Statistics

Understanding the core reasoning and methodology

What is Mathematical Statistics?

Mathematical statistics is the science of reasoning and decision-making under uncertainty, focusing on inferring unknown population characteristics from observed sample data.

Core Transformation:
$$\text{Raw Data} \xrightarrow{\text{Statistics}} \text{Information} \xrightarrow{\text{Inference}} \text{Decisions}$$
Objective

Infer the population distribution $F(x)$ from a random sample $X_1, X_2, \ldots, X_n$

Methodology

Uses probability theory as its foundation to develop inference procedures with quantifiable reliability

Real-World Examples:
  • Quality Control: Estimate defect rate of entire production from sample inspection
  • Clinical Trials: Infer drug effectiveness for general population from trial participants
  • Election Polls: Predict election outcome from surveying small voter sample
  • A/B Testing: Determine which website design performs better from user data
  • Manufacturing: Monitor process quality using statistical control charts
Probability Theory vs Mathematical Statistics

The fundamental distinction lies in the direction of reasoning:

| Aspect | Probability Theory | Mathematical Statistics |
| --- | --- | --- |
| Direction | Population → Sample (forward) | Sample → Population (inverse) |
| Known Information | Distribution $F$ is known | Distribution $F$ is unknown |
| Question Type | What samples will we get? | What is the population? |
| Example | Given $\mu = 100$, $\sigma = 15$, find $P(\bar{X} > 105)$ | Given a sample, estimate the unknown $\mu$ and $\sigma$ |
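
To make the two directions concrete, here is a minimal Python sketch (assuming NumPy and SciPy; the normal population and the sample size $n = 25$ are illustrative choices, not from the text). The forward step computes $P(\bar{X} > 105)$ from known parameters; the inverse step estimates $\mu$ and $\sigma$ from observed data.

```python
import numpy as np
from scipy import stats

# Forward (probability theory): mu and sigma are known.
mu, sigma, n = 100, 15, 25               # n = 25 is an illustrative choice
se = sigma / np.sqrt(n)                  # standard error of the sample mean
p_forward = 1 - stats.norm.cdf(105, loc=mu, scale=se)
print(f"P(Xbar > 105) = {p_forward:.4f}")

# Inverse (mathematical statistics): only a sample is observed.
rng = np.random.default_rng(0)
sample = rng.normal(mu, sigma, size=n)   # in practice mu, sigma are unknown
mu_hat = sample.mean()                   # point estimate of mu
sigma_hat = sample.std(ddof=1)           # point estimate of sigma
print(f"mu_hat = {mu_hat:.2f}, sigma_hat = {sigma_hat:.2f}")
```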
Historical Development
Key milestones in the evolution of mathematical statistics
1662

John Graunt

First statistical analysis of mortality data (London Bills of Mortality)

1809

Carl Friedrich Gauss

Developed least squares method and theory of errors (normal distribution)

1812

Pierre-Simon Laplace

Central Limit Theorem and foundations of probability theory

1900

Karl Pearson

Chi-square test, correlation theory, and method of moments

1908

William Gosset (Student)

Student's t-distribution for small sample inference

1920s-40s

Ronald Fisher

Maximum likelihood estimation, sufficiency, ANOVA, and experimental design

1933

Jerzy Neyman & Egon Pearson

Hypothesis testing framework and power analysis

1946

Harald Cramér

Published *Mathematical Methods of Statistics*, a comprehensive monograph

Population & Samples

Understanding the fundamental concepts of statistical populations and random sampling

Statistical Population

A statistical population is the complete collection of all individuals or units under study, characterized mathematically by its cumulative distribution function:

$$F(x) = P(X \leq x)$$

where $X$ is a random variable representing the characteristic of interest.

Key Characteristics:
  • Size

    Can be finite (all students in a university) or infinite (all possible measurements)

  • Parameters

    Population parameters ($\mu$, $\sigma^2$) are typically unknown constants we aim to estimate

  • Parametric Families

    May belong to a parametric family $F(x; \theta)$, where $\theta \in \Theta$ is unknown

Detailed Examples:

Medical Studies

Population: All patients with a certain disease

Characteristic: Recovery time $X \sim \text{Exp}(\lambda)$ with unknown $\lambda$

Quality Control

Population: All products from production line

Characteristic: Defect indicator $X \sim \text{Bernoulli}(p)$ with unknown $p$

Education Research

Population: Test scores of all students

Characteristic: Score $X \sim N(\mu, \sigma^2)$ with unknown $\mu$, $\sigma^2$

Economics

Population: Annual incomes in a country

Characteristic: Income $X \sim \text{LogNormal}(\mu, \sigma^2)$ with unknown parameters

Environmental Science

Population: Air pollution levels at different times

Distribution often modeled non-parametrically due to complex patterns

Simple Random Sample (i.i.d.)

A simple random sample is a collection of nn random variables:

$$X_1, X_2, \ldots, X_n$$

that are independent and identically distributed (i.i.d.), each with the same distribution as the population.

Two Essential Conditions:

1. Independence

$$X_1, X_2, \ldots, X_n \text{ are mutually independent}$$

No observation affects any other

2. Identical Distribution

$$X_i \sim F \quad \text{for all } i = 1, 2, \ldots, n$$

All observations come from the same distribution

Joint Distribution:

For i.i.d. random variables, the joint density factors into the product of the marginal densities:

$$f(x_1, x_2, \ldots, x_n) = \prod_{i=1}^n f(x_i)$$
Practical Examples:
  • Drawing balls from an urn with replacement - each draw is independent
  • Polling randomly selected voters - each opinion is independent observation
  • Testing light bulbs from production - lifetimes are i.i.d. if quality is consistent
  • Measuring heights of randomly selected students - independent measurements
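
The factorization above is what makes likelihood-based inference tractable. A minimal sketch (assuming NumPy; the $\text{Exp}(\lambda)$ population, the rate, and the sample size are illustrative choices) that evaluates the joint density of an i.i.d. sample as the product of its marginals:

```python
import numpy as np

rng = np.random.default_rng(42)
lam, n = 0.5, 5                             # illustrative rate and sample size
sample = rng.exponential(1 / lam, size=n)   # i.i.d. draws from Exp(lambda)

# Marginal density of Exp(lambda): f(x) = lambda * exp(-lambda * x)
marginals = lam * np.exp(-lam * sample)

# Joint density of an i.i.d. sample = product of the marginals
joint_density = np.prod(marginals)
print(f"f(x1,...,x{n}) = {joint_density:.6g}")
```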

Sample Statistics & Empirical Distribution

Functions of sample data used to estimate population parameters

Sample Mean

The sample mean is the average of sample observations:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$$

Unbiasedness

$$E[\bar{X}] = \mu$$

Expected value equals population mean

Variance

$$\text{Var}(\bar{X}) = \frac{\sigma^2}{n}$$

Variance decreases with sample size
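
A quick Monte Carlo check of both properties (a sketch assuming NumPy; the uniform population, sample size, and replication count are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 20, 100_000                 # illustrative sample size and replications
mu, sigma2 = 0.5, 1 / 12              # mean and variance of Uniform(0, 1)

samples = rng.uniform(0, 1, size=(reps, n))
means = samples.mean(axis=1)          # one sample mean per replication

print(f"E[Xbar]   ~ {means.mean():.4f}   (theory: {mu})")
print(f"Var(Xbar) ~ {means.var():.6f} (theory: {sigma2 / n:.6f})")
```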

Sample Variance

Measures the spread of sample data:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$$

Why $n-1$? Bessel's correction makes $S^2$ an unbiased estimator: $E[S^2] = \sigma^2$
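
To see the bias Bessel's correction removes, here is a short sketch (assuming NumPy; the normal population and replication count are illustrative): dividing by $n$ systematically underestimates $\sigma^2$, while dividing by $n-1$ does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, sigma2 = 5, 200_000, 4.0      # small n makes the bias visible

samples = rng.normal(0, np.sqrt(sigma2), size=(reps, n))
var_n  = samples.var(axis=1, ddof=0)   # divide by n (biased)
var_n1 = samples.var(axis=1, ddof=1)   # divide by n-1 (Bessel's correction)

print(f"mean of S^2 with 1/n    : {var_n.mean():.3f}  (true sigma^2 = {sigma2})")
print(f"mean of S^2 with 1/(n-1): {var_n1.mean():.3f}")
```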

Empirical Distribution Function

The empirical CDF estimates the population distribution from sample data:

$$F_n(x) = \frac{1}{n}\sum_{i=1}^n \mathbb{1}\{X_i \leq x\}$$

A step function that jumps by $1/n$ at each observed value

Glivenko-Cantelli Theorem

$$\sup_x |F_n(x) - F(x)| \xrightarrow{a.s.} 0 \quad \text{as } n \to \infty$$

Empirical distribution converges uniformly to true distribution almost surely
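
The theorem can be watched numerically. A sketch (assuming NumPy and SciPy; the standard normal population and the grid of sample sizes are illustrative) that computes $\sup_x |F_n(x) - F(x)|$ for growing $n$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

def sup_distance(sample):
    """Sup-norm distance between the ECDF of `sample` and the true N(0,1) CDF.

    F_n is a step function, so the supremum is attained at an order statistic:
    it suffices to check F_n just before and just after each jump.
    """
    x = np.sort(sample)
    n = len(x)
    cdf = stats.norm.cdf(x)
    upper = np.arange(1, n + 1) / n - cdf   # deviation just after each jump
    lower = cdf - np.arange(0, n) / n       # deviation just before each jump
    return max(upper.max(), lower.max())

for n in [10, 100, 1000, 10000]:
    d = sup_distance(rng.normal(size=n))
    print(f"n = {n:5d}: sup |F_n - F| = {d:.4f}")
```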

Fundamental Theorems

The mathematical pillars supporting statistical inference

Weak Law of Large Numbers (WLLN)
Foundation of Consistency

For i.i.d. samples with finite mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}_n$ converges in probability to $\mu$.

Mathematical Statement

$$\lim_{n \to \infty} P(|\bar{X}_n - \mu| \geq \epsilon) = 0 \quad \text{for any } \epsilon > 0$$

Proof via Chebyshev's Inequality

1. Moments of Sample Mean

First, calculate the expectation and variance of $\bar{X}_n$:

$$E[\bar{X}_n] = \mu, \quad \text{Var}(\bar{X}_n) = \frac{\sigma^2}{n}$$

2. Apply Chebyshev's Inequality

Chebyshev's inequality states $P(|Y - E[Y]| \geq \epsilon) \leq \frac{\text{Var}(Y)}{\epsilon^2}$. Apply this to $Y = \bar{X}_n$:

$$P(|\bar{X}_n - \mu| \geq \epsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\epsilon^2} = \frac{\sigma^2}{n\epsilon^2}$$

3. Take the Limit

As $n \to \infty$, the upper bound approaches zero:

$$\lim_{n \to \infty} \frac{\sigma^2}{n\epsilon^2} = 0$$

Therefore $P(|\bar{X}_n - \mu| \geq \epsilon) \to 0$ for every $\epsilon > 0$, which is convergence in probability. $\blacksquare$
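
A numerical sketch of the bound (assuming NumPy; the Exp(1) population, $\epsilon = 0.2$, and the sample sizes are illustrative choices): the empirical probability $P(|\bar{X}_n - \mu| \geq \epsilon)$ stays below the Chebyshev bound $\sigma^2/(n\epsilon^2)$ and tends to zero.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma2, eps, reps = 1.0, 1.0, 0.2, 50_000   # Exp(1): mu = sigma^2 = 1

for n in [10, 100, 1000]:
    means = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    p_emp = np.mean(np.abs(means - mu) >= eps)   # empirical deviation probability
    bound = sigma2 / (n * eps * eps)             # Chebyshev bound (vacuous if > 1)
    print(f"n = {n:4d}: P(|Xbar - mu| >= {eps}) ~ {p_emp:.4f}, bound = {bound:.4f}")
```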

Central Limit Theorem (CLT)
Basis of Normal Approximation

For i.i.d. samples with mean $\mu$ and finite variance $\sigma^2$, the standardized sample mean converges in distribution to $N(0,1)$.

Mathematical Statement

$$\sqrt{n}\left(\frac{\bar{X}_n - \mu}{\sigma}\right) \xrightarrow{d} N(0,1)$$

Proof via Moment Generating Functions

1. Standardize Variables

Let $Z_i = (X_i - \mu)/\sigma$. Then $E[Z_i] = 0$ and $\text{Var}(Z_i) = 1$. The standardized sum is $S_n^* = \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i$.

2. MGF Expansion

The MGF of $Z_i$ near 0 is $M_Z(t) = 1 + \frac{t^2}{2} + o(t^2)$. By independence, the MGF of $S_n^*$ factors:

$$M_{S_n^*}(t) = E\left[e^{t \sum Z_i / \sqrt{n}}\right] = \left( M_Z\left(\frac{t}{\sqrt{n}}\right) \right)^n$$

3. Substitute and Limit

Substitute the expansion:

$$\left( 1 + \frac{(t/\sqrt{n})^2}{2} + o\left(\frac{t^2}{n}\right) \right)^n = \left( 1 + \frac{t^2/2}{n} + \cdots \right)^n$$

Using $\lim_{n\to\infty} (1 + x/n)^n = e^x$:

$$\lim_{n \to \infty} M_{S_n^*}(t) = e^{t^2/2}$$

4. Conclusion

$e^{t^2/2}$ is the MGF of the standard normal distribution $N(0,1)$. By the continuity theorem for MGFs, $S_n^*$ converges in distribution to $N(0,1)$. $\blacksquare$
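
A simulation sketch of the theorem (assuming NumPy and SciPy; the skewed Exp(1) population and the sample sizes are illustrative choices): even though the population is far from normal, tail probabilities of the standardized sample mean approach their $N(0,1)$ values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, reps = 1.0, 1.0, 100_000      # Exp(1) population: skewed, mu = sigma = 1

for n in [2, 10, 50]:
    means = rng.exponential(mu, size=(reps, n)).mean(axis=1)
    z = np.sqrt(n) * (means - mu) / sigma    # standardized sample mean
    # Compare a tail probability against the standard normal
    p_emp = np.mean(z > 1.645)
    p_norm = 1 - stats.norm.cdf(1.645)
    print(f"n = {n:3d}: P(Z > 1.645) ~ {p_emp:.4f}  (normal: {p_norm:.4f})")
```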

Frequently Asked Questions

Common questions about mathematical statistics fundamentals

What is the difference between mathematical statistics and probability theory?

Probability theory works forwards from known distributions to predict sample behavior, while mathematical statistics works backwards from observed samples to infer unknown population characteristics. Probability asks "what samples will we get?", while statistics asks "what is the population?".

Why do we divide by n-1 instead of n when calculating sample variance?

Dividing by $n-1$ (Bessel's correction) makes the sample variance an unbiased estimator of the population variance. This correction accounts for the fact that we're using the sample mean rather than the true population mean, which introduces a slight underestimation that $n-1$ corrects.

What does i.i.d. mean and why is it important?

I.i.d. stands for "independent and identically distributed". It means each observation comes from the same distribution and doesn't depend on other observations. This assumption is crucial because it allows us to apply powerful statistical theorems like the Law of Large Numbers and Central Limit Theorem.

What is a statistical population?

A statistical population is the complete collection of all individuals or units under study, characterized by a distribution function $F(x)$. It can be finite (e.g., all students in a school) or infinite (e.g., all possible measurements of a physical quantity). The population distribution typically contains unknown parameters we want to estimate.

How large should my sample size be?

Sample size depends on several factors: desired precision, population variability, confidence level, and the specific inference goal. Generally, larger samples provide more precise estimates. Rules of thumb include $n \geq 30$ for CLT applications, but power analysis provides more rigorous sample size determination for specific tests.

What is the empirical distribution function?

The empirical distribution function $F_n(x)$ is constructed from sample data and represents the proportion of observations $\leq x$. By the Glivenko-Cantelli theorem, it converges uniformly to the true distribution $F(x)$ as sample size increases, providing a non-parametric estimate of the population distribution.

Can I use statistical methods if my data isn't normally distributed?

Yes! Many statistical methods don't require normality. Non-parametric methods use ranks or empirical distributions, the Central Limit Theorem makes sample means approximately normal for large $n$ regardless of population distribution, and transformations can sometimes achieve approximate normality. However, interpretation and power may be affected.

What is a sufficient statistic?

A sufficient statistic captures all information in the sample relevant to estimating a parameter - knowing the statistic is as good as knowing the entire sample for inference purposes. For example, for normal data, the sample mean and variance together are sufficient for estimating $\mu$ and $\sigma^2$.

How does mathematical statistics help in real-world decision making?

Mathematical statistics provides rigorous frameworks for: quality control in manufacturing, drug effectiveness evaluation in clinical trials, A/B testing in tech, risk assessment in finance, election polling, and any scenario requiring data-driven decisions with quantified uncertainty and reliability.

What are order statistics and when are they used?

Order statistics are the sample values arranged in increasing order: $X_{(1)} \leq X_{(2)} \leq \cdots \leq X_{(n)}$. They're fundamental in non-parametric statistics, used for calculating quantiles, constructing confidence intervals, and in rank-based tests that don't assume a specific distribution.
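
A minimal sketch of order statistics in practice (assuming NumPy; the simulated sample and the quantile levels are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(50, 10, size=15)         # illustrative sample

order_stats = np.sort(x)                # X_(1) <= X_(2) <= ... <= X_(n)
print("minimum X_(1):", order_stats[0])
print("maximum X_(n):", order_stats[-1])
print("median:", np.median(x))          # middle order statistic (n odd)
print("quartiles:", np.quantile(x, [0.25, 0.75]))
```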