
Bayesian Statistics Formulas

Comprehensive collection of Bayesian statistical formulas including Bayes' theorem, conjugate priors, posterior distributions, credible intervals, and computational methods.

Fundamental Bayes' Theorem
The mathematical foundation of Bayesian inference

Basic Bayes' Theorem

\pi(\theta|\tilde{x}) = \frac{p(\tilde{x}|\theta) \pi(\theta)}{p(\tilde{x})}

Posterior = (Likelihood × Prior) / Evidence

Key Components:
  • π(θ|x̃): Posterior distribution - updated beliefs about θ given data
  • p(x̃|θ): Likelihood function - probability of data given parameter
  • π(θ): Prior distribution - initial beliefs about θ
  • p(x̃): Marginal likelihood - normalizing constant
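
As a quick numerical illustration of the theorem with a discrete parameter, here is a minimal sketch using hypothetical diagnostic-test numbers (the prevalence, sensitivity, and specificity below are made up for illustration):

```python
# Hypothetical numbers: prevalence 1%, sensitivity 95%, specificity 90%
prior_disease = 0.01                    # pi(theta = "disease")
p_pos_given_disease = 0.95              # p(x = "+" | disease)
p_pos_given_healthy = 0.10              # p(x = "+" | healthy) = 1 - specificity

# Evidence p(x = "+"): sum of likelihood * prior over both hypotheses
evidence = (p_pos_given_disease * prior_disease
            + p_pos_given_healthy * (1 - prior_disease))

# Posterior = likelihood * prior / evidence
posterior_disease = p_pos_given_disease * prior_disease / evidence
print(f"P(disease | positive test) = {posterior_disease:.3f}")   # about 0.088
```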

Proportional Form

\pi(\theta|\tilde{x}) \propto p(\tilde{x}|\theta) \pi(\theta)

Commonly used form that drops the normalizing constant

Key Components:
  • Easier for calculation since denominator is constant for given data
  • Focus on how likelihood and prior combine
  • Normalize at the end if needed for probability statements

Marginal Likelihood

p(\tilde{x}) = \int_{\Theta} p(\tilde{x}|\theta) \pi(\theta) d\theta

Evidence obtained by integrating over all parameter values

Key Components:
  • Also called evidence or normalizing constant
  • Used for model comparison via Bayes factors
  • Often computationally challenging to evaluate
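
For a one-dimensional parameter the evidence can be evaluated directly by quadrature. A minimal sketch, assuming a Beta(2, 2) prior on a Binomial success probability and x = 7 successes in n = 10 trials (illustrative numbers):

```python
from scipy import stats
from scipy.integrate import quad

a, b = 2.0, 2.0        # Beta prior hyperparameters (illustrative)
n, x = 10, 7           # observed data: x successes in n trials

# p(x) = integral over [0, 1] of p(x | theta) * pi(theta) d(theta)
integrand = lambda theta: stats.binom.pmf(x, n, theta) * stats.beta.pdf(theta, a, b)
evidence, _ = quad(integrand, 0.0, 1.0)
print(f"marginal likelihood p(x) = {evidence:.6f}")

# Dividing by the evidence normalizes the proportional form of the posterior
posterior_pdf = lambda theta: integrand(theta) / evidence
print(f"check: posterior integrates to {quad(posterior_pdf, 0.0, 1.0)[0]:.4f}")
```
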
Conjugate Prior Families
Prior-likelihood combinations yielding closed-form posteriors

Beta-Binomial Conjugacy

\text{Beta}(a,b) + \text{Binomial}(n,x) \rightarrow \text{Beta}(a+x, b+n-x)

Beta prior is conjugate to Binomial likelihood

Key Components:
  • Prior: θ ~ Beta(a,b)
  • Likelihood: X ~ Binomial(n,θ)
  • Posterior: θ|X ~ Beta(a+x, b+n-x)
  • Interpretation: a = prior successes, b = prior failures
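
A minimal sketch of the update with scipy (hyperparameters and data are illustrative):

```python
from scipy import stats

a, b = 2.0, 2.0        # prior: theta ~ Beta(a, b)
n, x = 20, 14          # data: 14 successes in 20 trials

post = stats.beta(a + x, b + n - x)   # posterior: Beta(a + x, b + n - x)
print(f"posterior mean = {post.mean():.3f}")                  # (a + x) / (a + b + n)
print(f"95% equal-tail interval = {post.ppf([0.025, 0.975])}")
```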

Gamma-Poisson Conjugacy

\text{Gamma}(\alpha,\beta) + \text{Poisson}(\sum x_i) \rightarrow \text{Gamma}(\alpha+\sum x_i, \beta+m)

Gamma prior is conjugate to Poisson likelihood

Key Components:
  • Prior: λ ~ Gamma(α,β)
  • Likelihood: X_i ~ Poisson(λ) for i=1,...,m
  • Posterior: λ|X ~ Gamma(α+Σx_i, β+m)
  • Interpretation: α ≈ prior total event count, β ≈ prior number of observations
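
A minimal sketch of the update (prior hyperparameters and counts are illustrative; note that scipy parametrizes the Gamma distribution by scale = 1/rate):

```python
from scipy import stats

alpha, beta_rate = 2.0, 1.0          # prior: lambda ~ Gamma(alpha, rate = beta)
counts = [3, 5, 2, 4, 6]             # m = 5 Poisson observations (illustrative)
m, total = len(counts), sum(counts)

# Posterior: Gamma(alpha + sum(x_i), rate = beta + m)
post = stats.gamma(alpha + total, scale=1.0 / (beta_rate + m))
print(f"posterior mean of lambda = {post.mean():.3f}")   # (alpha + sum x) / (beta + m)
```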

Normal-Normal Conjugacy (Known Variance)

N(\mu_0, \tau^2) + N(\mu, \sigma^2) \rightarrow N(\mu_n, \tau_n^2)

Normal prior for mean with known variance

Key Components:
  • Prior: μ ~ N(μ₀, τ²)
  • Likelihood: X_i ~ N(μ, σ²) with known σ²
  • Posterior precision: τₙ⁻² = τ⁻² + nσ⁻²
  • Posterior mean: μₙ = (τ⁻²μ₀ + nσ⁻²x̄) / (τ⁻² + nσ⁻²)
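
A minimal sketch of the precision-weighted update (prior, known variance, and data are illustrative):

```python
import numpy as np

mu0, tau2 = 0.0, 4.0            # prior: mu ~ N(mu0, tau^2)
sigma2 = 1.0                    # known observation variance
x = np.array([1.2, 0.8, 1.5, 0.9, 1.1])   # illustrative data
n, xbar = len(x), x.mean()

# Precisions add; the posterior mean is a precision-weighted average
post_prec = 1.0 / tau2 + n / sigma2
post_var = 1.0 / post_prec
post_mean = post_var * (mu0 / tau2 + n * xbar / sigma2)
print(f"posterior: N({post_mean:.3f}, {post_var:.3f})")
```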

Inverse Gamma-Normal Conjugacy (Known Mean)

\text{InvGamma}(a,b) + N(\mu, \sigma^2) \rightarrow \text{InvGamma}(a+n/2, b+SS/2)

Inverse Gamma prior for variance with known mean

Key Components:
  • Prior: σ² ~ InvGamma(a,b)
  • Likelihood: X_i ~ N(μ, σ²) with known μ
  • SS = Σ(X_i - μ)² is sum of squared deviations
  • Posterior: σ²|X ~ InvGamma(a+n/2, b+SS/2)
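
A minimal sketch of the update (hyperparameters and data are illustrative; scipy's invgamma with shape a and scale b matches InvGamma(a, b)):

```python
import numpy as np
from scipy import stats

a, b = 3.0, 2.0                 # prior: sigma^2 ~ InvGamma(a, b)
mu = 0.0                        # known mean
x = np.array([0.5, -1.2, 0.3, 1.8, -0.7])   # illustrative data
n = len(x)
ss = np.sum((x - mu) ** 2)      # SS = sum of squared deviations from the known mean

post = stats.invgamma(a + n / 2.0, scale=b + ss / 2.0)
print(f"posterior mean of sigma^2 = {post.mean():.3f}")
```
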
Bayesian Point Estimation
Optimal estimators under different loss functions

Posterior Mean (Squared Loss)

\hat{\theta}_{\text{Bayes}} = E[\theta|\tilde{x}] = \int \theta \cdot \pi(\theta|\tilde{x}) d\theta

Minimizes expected squared loss

Key Components:
  • Optimal under L(θ,δ) = (θ-δ)²
  • Most commonly used Bayesian estimator
  • Weighted average of prior mean and sample information

Posterior Median (Absolute Loss)

\hat{\theta}_{\text{med}} = \text{median}\{\pi(\theta|\tilde{x})\}

Minimizes expected absolute loss

Key Components:
  • Optimal under L(θ,δ) = |θ-δ|
  • 50th percentile of posterior distribution
  • Robust to outliers in posterior

Posterior Mode (0-1 Loss)

\hat{\theta}_{\text{MAP}} = \arg\max_\theta \pi(\theta|\tilde{x})

Maximum A Posteriori (MAP) estimator

Key Components:
  • Optimal under L(θ,δ) = I(θ≠δ)
  • Most probable value given the data
  • May not exist or be unique
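
All three estimators can be read off a concrete posterior. A minimal sketch using a Beta(16, 8) posterior (e.g., the result of updating a Beta(2, 2) prior with 14 successes in 20 trials; numbers illustrative):

```python
from scipy import stats

a_post, b_post = 16.0, 8.0            # posterior Beta parameters (illustrative)
post = stats.beta(a_post, b_post)

post_mean = post.mean()               # minimizes expected squared loss
post_median = post.median()           # minimizes expected absolute loss
# MAP: mode of Beta(a, b) is (a - 1) / (a + b - 2) when a, b > 1;
# in general it can be found by maximizing the log-density numerically
post_mode = (a_post - 1) / (a_post + b_post - 2)

print(f"mean = {post_mean:.3f}, median = {post_median:.3f}, mode (MAP) = {post_mode:.3f}")
```
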
Credible Intervals
Bayesian interval estimation with probability interpretation

Equal-Tail Credible Interval

P(\theta_L \leq \theta \leq \theta_U | \tilde{x}) = 1-\alpha

α/2 probability in each tail

Key Components:
  • θ_L = posterior (α/2) quantile
  • θ_U = posterior (1-α/2) quantile
  • Direct probability interpretation
  • Easy to compute from posterior CDF
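
A minimal sketch: read the two quantiles off the posterior CDF (here an illustrative Beta(16, 8) posterior):

```python
from scipy import stats

alpha = 0.05
post = stats.beta(16.0, 8.0)     # illustrative posterior

lower, upper = post.ppf(alpha / 2), post.ppf(1 - alpha / 2)
print(f"{100 * (1 - alpha):.0f}% equal-tail credible interval: ({lower:.3f}, {upper:.3f})")
```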

Highest Posterior Density (HPD)

\text{HPD}_{1-\alpha} = \{\theta: \pi(\theta|\tilde{x}) \geq k\}

Shortest interval with given probability content

Key Components:
  • k chosen so P(θ ∈ HPD) = 1-α
  • Optimal for asymmetric posteriors
  • All points inside have higher density than points outside
  • May not be connected for multimodal posteriors
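
When no closed form is available, the HPD interval is commonly approximated from posterior draws as the shortest interval containing a 1-α fraction of the sorted samples. The sketch below assumes a unimodal posterior; the Beta(16, 8) target is illustrative:

```python
import numpy as np
from scipy import stats

def hpd_interval(samples, cred=0.95):
    """Shortest interval containing `cred` of the sorted samples (unimodal case)."""
    sorted_s = np.sort(samples)
    n_included = int(np.floor(cred * len(sorted_s)))
    widths = sorted_s[n_included:] - sorted_s[: len(sorted_s) - n_included]
    start = np.argmin(widths)
    return sorted_s[start], sorted_s[start + n_included]

rng = np.random.default_rng(0)
draws = stats.beta(16.0, 8.0).rvs(size=50_000, random_state=rng)   # illustrative posterior
print("95% HPD interval:", hpd_interval(draws, 0.95))
```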

Normal Approximation Intervals

\hat{\theta}_n \pm z_{\alpha/2} \sqrt{\text{Var}[\theta|\tilde{x}]}

Large-sample normal approximation

Key Components:
  • Valid when posterior is approximately normal
  • Uses posterior mean and variance
  • z_{α/2} is standard normal critical value
  • Asymptotically valid for many problems
Bayesian Prediction
Posterior predictive distributions for future observations

Posterior Predictive Distribution

p(z|\tilde{x}) = \int p(z|\theta) \pi(\theta|\tilde{x}) d\theta

Distribution of future observations given past data

Key Components:
  • z: future observation
  • θ: unknown parameter
  • Integrates over parameter uncertainty
  • Accounts for both aleatory and epistemic uncertainty
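
When the integral is intractable, the predictive distribution is easy to simulate by composition: draw θ from the posterior, then z from the likelihood at that θ. A minimal sketch for an illustrative Gamma(22, rate 6) posterior on a Poisson rate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
post = stats.gamma(22.0, scale=1.0 / 6.0)          # illustrative posterior for lambda

# Composition sampling: theta ~ posterior, then z ~ p(z | theta)
theta_draws = post.rvs(size=20_000, random_state=rng)
z_draws = rng.poisson(theta_draws)                 # one future count per posterior draw

print(f"predictive mean = {z_draws.mean():.2f}, predictive var = {z_draws.var():.2f}")
```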

Beta-Binomial Prediction

P(Z=k|\tilde{x}) = \binom{m}{k} \frac{B(k+a+x, m-k+b+n-x)}{B(a+x, b+n-x)}

Prediction for k successes in m future Bernoulli trials

Key Components:
  • B(·,·) is the Beta function
  • a+x, b+n-x are posterior Beta parameters
  • m is number of future trials
  • Expected successes: m(a+x)/(a+b+n)
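
A minimal sketch of this pmf, evaluated via log Beta functions for numerical stability (the posterior parameters 16 and 8 and the m = 5 future trials are illustrative):

```python
import numpy as np
from scipy.special import betaln, comb

def beta_binomial_pmf(k, m, a_post, b_post):
    """P(Z = k | data) for k future successes out of m trials under a Beta posterior."""
    return comb(m, k) * np.exp(betaln(k + a_post, m - k + b_post) - betaln(a_post, b_post))

a_post, b_post, m = 16.0, 8.0, 5          # posterior parameters a+x, b+n-x; future trials m
pmf = [beta_binomial_pmf(k, m, a_post, b_post) for k in range(m + 1)]
print("predictive pmf:", np.round(pmf, 3), "sums to", round(sum(pmf), 3))
print("expected successes:", m * a_post / (a_post + b_post))   # m(a+x)/(a+b+n)
```
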

Normal Predictive Distribution

Z|\tilde{x} \sim N(\mu_n, \tau_n^2 + \sigma^2)

Prediction for Normal model with Normal prior

Key Components:
  • μ_n is posterior mean
  • τ_n² is posterior variance (parameter uncertainty)
  • σ² is observation variance (aleatory uncertainty)
  • Total predictive variance = parameter uncertainty + observation noise
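
A minimal sketch of the predictive interval, using posterior values in the spirit of the Normal-Normal example above (all numbers illustrative):

```python
import numpy as np
from scipy import stats

mu_n, tau2_n = 1.05, 0.19      # posterior mean and variance (illustrative)
sigma2 = 1.0                   # known observation variance

# Predictive distribution widens the posterior by the observation noise
predictive = stats.norm(mu_n, np.sqrt(tau2_n + sigma2))
print("95% predictive interval:", predictive.ppf([0.025, 0.975]))
```
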
Model Comparison & Selection
Bayesian methods for comparing and selecting models

Bayes Factor

BF_{12} = \frac{p(\tilde{x}|M_1)}{p(\tilde{x}|M_2)} = \frac{\int p(\tilde{x}|\theta_1,M_1)\pi(\theta_1|M_1)d\theta_1}{\int p(\tilde{x}|\theta_2,M_2)\pi(\theta_2|M_2)d\theta_2}

Ratio of marginal likelihoods comparing two models

Key Components:
  • Evidence in favor of Model 1 vs Model 2
  • BF > 1: evidence for M₁, BF < 1: evidence for M₂
  • BF = 1: no evidence for either model
  • Interpretation scales: 3-10 (moderate), 10-100 (strong), >100 (decisive)
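
A minimal sketch comparing two Binomial models by their marginal likelihoods: M1 places a flat Beta(1, 1) prior on θ, M2 is the point null θ = 0.5 (data are illustrative):

```python
from scipy import stats
from scipy.integrate import quad

n, x = 20, 14                                   # illustrative data

# M1: theta ~ Beta(1, 1); marginal likelihood integrates theta out
m1_evidence, _ = quad(lambda t: stats.binom.pmf(x, n, t) * stats.beta.pdf(t, 1, 1), 0, 1)

# M2: point null theta = 0.5; marginal likelihood is the likelihood at 0.5
m2_evidence = stats.binom.pmf(x, n, 0.5)

bf_12 = m1_evidence / m2_evidence
print(f"BF_12 = {bf_12:.2f}")                   # > 1 favors M1, < 1 favors M2
```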

Posterior Model Probabilities

P(M_i|\tilde{x}) = \frac{p(\tilde{x}|M_i)P(M_i)}{\sum_j p(\tilde{x}|M_j)P(M_j)}

Bayesian model averaging weights

Key Components:
  • P(M_i) is prior model probability
  • p(x̃|M_i) is marginal likelihood for model i
  • Denominator normalizes to sum to 1
  • Can be used for model-averaged predictions

Deviance Information Criterion (DIC)

DIC = \bar{D} + p_D = D(\bar{\theta}) + 2p_D

Bayesian model selection criterion

Key Components:
  • D̄ = E[D(θ)] is posterior mean deviance
  • p_D is effective number of parameters
  • D(θ) = -2 log p(x̃|θ) is deviance
  • Lower DIC indicates better model fit
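
A minimal sketch of the DIC calculation for a known-variance Normal-mean model, using draws from the conjugate posterior in place of a real MCMC run; data, prior, and sample sizes are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.normal(1.0, 1.0, size=30)                  # illustrative data, known sigma = 1

# Stand-in for MCMC draws: the conjugate posterior of mu under a N(0, 1) prior
post_mean = x.mean() / (1 + 1.0 / len(x))          # n * xbar / (n + 1)
post_sd = np.sqrt(1.0 / (1.0 + len(x)))
mu_draws = rng.normal(post_mean, post_sd, size=5_000)

def deviance(mu):
    """D(mu) = -2 log p(x | mu) for the known-variance Normal likelihood."""
    return -2.0 * stats.norm.logpdf(x[:, None], loc=mu, scale=1.0).sum(axis=0)

d_bar = deviance(mu_draws).mean()                  # posterior mean deviance
d_at_mean = deviance(np.array([mu_draws.mean()]))[0]
p_d = d_bar - d_at_mean                            # effective number of parameters
dic = d_at_mean + 2 * p_d                          # equivalently d_bar + p_d
print(f"p_D = {p_d:.2f}, DIC = {dic:.2f}")
```
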
Computational Methods
Numerical methods for Bayesian computation

Metropolis-Hastings Algorithm

\alpha(\theta^{(t)}, \theta^*) = \min\left(1, \frac{\pi(\theta^*)q(\theta^{(t)}|\theta^*)}{\pi(\theta^{(t)})q(\theta^*|\theta^{(t)})}\right)

MCMC acceptance probability

Key Components:
  • θ* is proposed new state
  • θ^(t) is current state
  • q(·|·) is proposal distribution
  • Accept θ* with probability α, otherwise stay at θ^(t)
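
A minimal random-walk Metropolis sketch targeting an unnormalized Beta(16, 8) posterior; with a symmetric proposal the q terms cancel, so only the target ratio matters (the step size and iteration count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

def log_target(theta):
    """Unnormalized log posterior: Beta(16, 8) density up to a constant."""
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf
    return 15.0 * np.log(theta) + 7.0 * np.log(1.0 - theta)

n_iter, step = 20_000, 0.1
theta = 0.5
draws = np.empty(n_iter)
for t in range(n_iter):
    proposal = theta + step * rng.normal()          # symmetric proposal
    log_alpha = log_target(proposal) - log_target(theta)
    if np.log(rng.uniform()) < log_alpha:           # accept with probability min(1, ratio)
        theta = proposal
    draws[t] = theta

print(f"posterior mean ~ {draws[5_000:].mean():.3f}")   # true Beta(16, 8) mean is 2/3
```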

Gibbs Sampling Update

\theta_i^{(t+1)} \sim \pi(\theta_i | \theta_{-i}^{(t)}, \tilde{x})

Sample each parameter from its full conditional

Key Components:
  • θ_{-i} denotes all parameters except θ_i
  • Full conditional is often recognizable distribution
  • Effective for conjugate models
  • Special case of Metropolis-Hastings with acceptance probability 1
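
A minimal Gibbs sketch for a Normal model with unknown mean and variance, alternating between the two conditionally conjugate full conditionals (Normal for μ given σ², Inverse Gamma for σ² given μ); priors and data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(2.0, 1.5, size=50)                      # illustrative data
n, xbar = len(x), x.mean()

mu0, tau2 = 0.0, 100.0                                 # vague Normal prior on mu
a0, b0 = 2.0, 2.0                                      # InvGamma prior on sigma^2

n_iter = 5_000
mu, sigma2 = 0.0, 1.0
mu_draws, s2_draws = np.empty(n_iter), np.empty(n_iter)
for t in range(n_iter):
    # mu | sigma^2, x ~ Normal (known-variance conjugate update)
    prec = 1.0 / tau2 + n / sigma2
    mu = rng.normal((mu0 / tau2 + n * xbar / sigma2) / prec, np.sqrt(1.0 / prec))
    # sigma^2 | mu, x ~ InvGamma(a0 + n/2, b0 + SS/2), sampled as 1 / Gamma
    ss = np.sum((x - mu) ** 2)
    sigma2 = 1.0 / rng.gamma(a0 + n / 2.0, 1.0 / (b0 + ss / 2.0))
    mu_draws[t], s2_draws[t] = mu, sigma2

print(f"posterior means: mu ~ {mu_draws[1_000:].mean():.2f}, "
      f"sigma^2 ~ {s2_draws[1_000:].mean():.2f}")
```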

Variational Approximation

q^*(\theta) = \arg\min_{q \in \mathcal{Q}} KL(q(\theta) || \pi(\theta|\tilde{x}))

Approximate posterior via optimization

Key Components:
  • q(θ) is approximate posterior in family Q
  • KL divergence measures approximation quality
  • Mean-field assumption: q(θ) = Πq_i(θ_i)
  • Typically much faster than MCMC, but only approximate (mean-field fits often understate posterior variance)
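
As a toy one-dimensional illustration of the idea (not a mean-field scheme for a multiparameter model), the sketch below fits a Gaussian q(θ) = N(m, s²) to a Beta(16, 8) posterior by minimizing KL(q‖p) evaluated numerically on a grid; all settings are illustrative:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

post = stats.beta(16.0, 8.0)                        # target posterior (illustrative)
grid = np.linspace(1e-4, 1 - 1e-4, 2_000)           # quadrature grid on (0, 1)
dx = grid[1] - grid[0]

def kl_q_to_p(params):
    """KL(q || p) approximated as a Riemann sum of q * (log q - log p) on the grid."""
    m, log_s = params
    q = stats.norm.pdf(grid, m, np.exp(log_s))
    return np.sum(q * (np.log(q + 1e-300) - post.logpdf(grid))) * dx

result = minimize(kl_q_to_p, x0=np.array([0.5, np.log(0.2)]), method="Nelder-Mead")
m_opt, s_opt = result.x[0], np.exp(result.x[1])
print(f"variational fit: N({m_opt:.3f}, {s_opt:.3f}^2); exact posterior sd = {post.std():.3f}")
```
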
Usage Guidelines & Best Practices

When to Use Bayesian Methods:

  • Prior knowledge available: Incorporate expert opinions or historical data
  • Small sample sizes: Prior information helps with limited data
  • Sequential analysis: Update beliefs as new data arrives
  • Decision making: Need probability statements about parameters
  • Complex models: Hierarchical or multilevel structures

Implementation Tips:

  • Check convergence: Use R-hat, effective sample size diagnostics
  • Prior sensitivity: Try multiple reasonable priors
  • Posterior predictive checks: Validate model fit
  • Report uncertainty: Always include credible intervals
  • Computational efficiency: Use conjugate priors when possible