Lesson 5-1: Confidence Intervals & Estimation

What You Will Learn

Construct confidence intervals for means and proportions.
Compute standard error and margin of error.
Choose appropriate critical values and interpret confidence.
Explain how sampling distributions justify intervals.

Prerequisites

Mean, standard deviation, proportion basics.
Normal distribution intuition and z-scores.
Comfort with algebraic manipulation.

Point vs Interval Estimation

Point Estimate

A point estimate uses a single value to approximate a population parameter. For instance, the sample mean $\bar{x}$ estimates the population mean $\mu$ ; the sample proportion $\hat{p}$ estimates the population proportion $p$ .

Example: In a sample of n=100 devices, 52 are functioning after 1,300 hours. The point estimate of the survival proportion is $\hat{p}=0.52$ .

Interval Estimate

An interval estimate provides a range of plausible values for the parameter and is accompanied by a confidence level. It conveys uncertainty due to sampling variability. A common structure is

\text{estimate} \pm z^* \times \text{SE}

where $z^*$ depends on the desired confidence (e.g., 1.645 for 90%, 1.96 for 95%).
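The critical values quoted above can be recovered from the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper name `z_star` is an illustrative choice, not something defined in this lesson.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Two-sided critical value z* for a confidence level strictly between 0 and 1."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Leave (1 - confidence) / 2 in each tail of the standard normal curve.
    return NormalDist().inv_cdf((1.0 + confidence) / 2.0)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z* = {z_star(level):.3f}")
# 90%: z* = 1.645
# 95%: z* = 1.960
# 99%: z* = 2.576
```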

Standard Error and Margin of Error

Mean (Unknown Population SD)

\text{SE}_{\bar{x}} = \dfrac{s}{\sqrt{n}}

CI: \quad \bar{x} \pm z^* \cdot \dfrac{s}{\sqrt{n}}

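Valid when n is reasonably large, so that the sample mean is approximately normal by the Central Limit Theorem. Because s only estimates the population SD, small samples call for a t critical value with n - 1 degrees of freedom in place of $z^*$.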

Proportion

\text{SE}_{p} = \sqrt{ \dfrac{ \hat{p}(1-\hat{p}) }{ n } }

CI: \quad \hat{p} \pm z^* \cdot \sqrt{ \dfrac{ \hat{p}(1-\hat{p}) }{ n } }

Use when conditions for the normal approximation are met (e.g., the $n\hat{p}\ge10$ and $n(1-\hat{p})\ge10$ heuristics, checked with $\hat{p}$ because $p$ is unknown).

Margin of Error

E = z^* \times SE

Increasing sample size reduces SE, thereby reducing the margin of error for a fixed confidence level.
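To see how the margin of error shrinks with sample size, the sketch below evaluates E for a sample mean at several values of n. The standard deviation of 150 and the sample sizes are illustrative assumptions, and `margin_of_error` is a hypothetical helper name.

```python
import math


def margin_of_error(sd: float, n: int, z: float = 1.96) -> float:
    """Margin of error E = z * sd / sqrt(n) for a sample mean."""
    return z * sd / math.sqrt(n)


sd = 150.0  # illustrative standard deviation (assumption)
for n in (50, 200, 800):
    print(f"n = {n:4d}: E = {margin_of_error(sd, n):.2f}")
```

Because E is proportional to 1/sqrt(n), each fourfold increase in n cuts the margin of error in half.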

Example: Proportion Confidence Interval

Survey 1,000 voters: 520 support a proposal. Construct a 95% CI for the true support proportion.

$\hat{p}=0.52$ , $n=1000$ , $z^*=1.96$ .

SE=\sqrt{ \dfrac{0.52\cdot0.48}{1000} } \approx 0.0158

E=1.96\times 0.0158 \approx 0.031

CI: \quad 0.52 \pm 0.031 = [0.489, 0.551]

Interpretation: We are 95% confident the true support lies between 48.9% and 55.1%.
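The calculation above can be checked with a few lines of Python. This is a sketch of the normal-approximation interval only; the function name `proportion_ci` is an illustrative assumption.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    margin = z * se
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(520, 1000)
print(f"95% CI: [{low:.3f}, {high:.3f}]")  # 95% CI: [0.489, 0.551]
```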

Example: Mean Confidence Interval

Sample 50 devices: mean lifetime $\bar{x}=1300$ hours, sample SD $s=150$ hours. Construct a 95% CI for the true mean lifetime.

SE=\dfrac{150}{\sqrt{50}} \approx 21.21

E=1.96\times 21.21 \approx 41.57

CI: \quad 1300 \pm 41.57 = [1258.43, 1341.57]

Interpretation: We are 95% confident the population mean lifetime is between 1,258.43 and 1,341.57 hours.
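The same steps for the mean can be sketched in Python. The function name `mean_ci` is an illustrative assumption; the inputs are the values from the example.

```python
import math


def mean_ci(x_bar: float, s: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """z-based interval for a mean: x_bar +/- z * s / sqrt(n)."""
    se = s / math.sqrt(n)
    margin = z * se
    return x_bar - margin, x_bar + margin


low, high = mean_ci(1300.0, 150.0, 50)
print(f"95% CI: [{low:.1f}, {high:.1f}]")  # 95% CI: [1258.4, 1341.6]
```

Carrying full precision through the calculation gives a margin of about 41.58 rather than 41.57, a difference that comes only from rounding SE to 21.21 before multiplying.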

Interpretation Pitfalls

Point vs Interval Estimates

Point estimates summarize with a single best guess (e.g., sample mean).
Interval estimates quantify uncertainty via a range plus confidence level.

Standard Error (SE)

For a sample mean: SE = s / sqrt(n).
For a sample proportion: SE = sqrt(p̂(1 - p̂) / n).

Critical Values

Common: z* ≈ 1.96 for 95% confidence.
Higher confidence → larger z* → wider interval.

Common Misreadings

The 95% refers to the procedure over many samples, not a probability that this specific interval contains the parameter.
Higher confidence widens intervals; smaller samples widen intervals.
If conditions for normal approximation fail, a different method (or a larger n) is needed.
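The "procedure over many samples" reading can be illustrated by simulation: draw many samples from a population whose mean is known, build an interval from each, and count how often the interval covers that mean. The sketch below assumes a normal population with mean 100 and SD 15; these values, the seed, and the function name `coverage` are all illustrative assumptions.

```python
import math
import random
import statistics


def coverage(trials: int = 2000, n: int = 50, mu: float = 100.0,
             sigma: float = 15.0, z: float = 1.96, seed: int = 0) -> float:
    """Fraction of z-based intervals (built from s) that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        se = statistics.stdev(sample) / math.sqrt(n)
        if x_bar - z * se <= mu <= x_bar + z * se:
            hits += 1
    return hits / trials


print(f"Fraction of intervals covering mu: {coverage():.3f}")
```

The fraction should land near 0.95. Each individual interval either covers the true mean or misses it; the 95% describes the long-run hit rate of the method.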

Guided Practice

Practice 1: Population Mean

Sample size n=25, sample mean 47.3, sample std 8.2. Build 95% CI for μ.
Calculate SE = s/√n = 8.2/√25 = 1.64.
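Use z* = 1.96 for 95% confidence level (with n = 25 < 30, a t critical value with 24 degrees of freedom, t* ≈ 2.064, is the more cautious choice and gives a slightly wider interval).
CI: 47.3 ± 1.96(1.64) = 47.3 ± 3.21 = (44.09, 50.51).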

Practice 2: Population Proportion

Survey: 156 successes in 400 trials. Find 90% CI for p.
Calculate p̂ = 156/400 = 0.39, SE = √(0.39×0.61/400) = 0.0244.
Use z* = 1.645 for 90% confidence level.
CI: 0.39 ± 1.645(0.0244) = (0.35, 0.43).

Practice 3: Margin of Error

Given 99% CI for mean: (23.1, 28.9). Find margin of error.
ME = (28.9 - 23.1)/2 = 2.9.
Point estimate = (23.1 + 28.9)/2 = 26.0.
Interpret: 99% confident true mean is within 2.9 of 26.0.

Practice 4: Sample Size Planning

Want ME ≤ 3 for 95% CI of mean, estimate σ ≈ 15. Find needed n.
Formula: n = (z*σ/ME)² = (1.96×15/3)² = 96.04.
Round up: need n = 97 observations.
Check: larger sample gives smaller ME for same confidence.
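The sample size calculation in Practice 4 can be sketched as follows. The helper name `sample_size_for_mean` is an illustrative assumption; the inputs are the values from the exercise.

```python
import math


def sample_size_for_mean(sigma: float, margin: float, z: float = 1.96) -> int:
    """Smallest whole n with z * sigma / sqrt(n) <= margin."""
    return math.ceil((z * sigma / margin) ** 2)


n = sample_size_for_mean(sigma=15, margin=3)
print(n)  # 97
print(1.96 * 15 / math.sqrt(n) <= 3)  # True: the target margin is met
```

Rounding up rather than to the nearest integer matters here: with n = 96 the margin would still be slightly above 3.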

Practice 5: Confidence Level Effects

Same data: build 90%, 95%, 99% CIs and compare widths.
z* values: 1.645, 1.96, 2.576 respectively.
Higher confidence → larger z* → wider interval.
Trade-off: precision vs. confidence level.
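A short sketch of the width comparison, reusing the Practice 1 summary statistics as the "same data" (an assumption, since the exercise does not name a data set):

```python
import math
from statistics import NormalDist

x_bar, s, n = 47.3, 8.2, 25  # borrowed from Practice 1 (assumption)
se = s / math.sqrt(n)

widths = {}
for level in (0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf((1 + level) / 2)  # two-sided critical value
    widths[level] = 2 * z * se                 # full interval width
    print(f"{level:.0%}: z* = {z:.3f}, width = {widths[level]:.2f}")
```

The point estimate and SE stay the same at every level; only z* changes, so the width grows in direct proportion to z*.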

Practice 6: Interpretation Practice

Quality control: 95% CI for defect rate is (0.02, 0.08).
Correct interpretation: "95% confident true rate is between 2% and 8%".
Avoid: "95% chance true rate is in interval" (in the frequentist view the true rate is fixed, so a computed interval either contains it or does not).
Decide: Is defect rate acceptably low based on CI?

Practice 7: Assumptions Check

Before CI construction, verify: random sampling, independence.
For means: normality or large sample (common rule of thumb: the CLT approximation is adequate when n≥30).
For proportions: np̂≥10 and n(1-p̂)≥10.
When assumptions fail, consider alternative methods.

Practice 8: Real Context Application

Medical study: average recovery time with 95% CI (12.3, 18.7) days.
Interpretation: 95% confident true mean recovery is between 12.3 and 18.7 days.
Decision support: plan hospital resources based on interval.
Consider: does interval suggest treatment is effective?

Mini Cases

Case 1: Customer Satisfaction

Restaurant chain wants to estimate customer satisfaction rating (1-10 scale). Survey 200 customers, mean=7.8, std=1.5.

Build 95% CI for mean satisfaction: 7.8 ± 1.96(1.5/√200).
Check: random sampling, n=200 > 30 for CLT validity.
Decision: if CI excludes 7.0, satisfaction above benchmark.

Case 2: Market Research

Brand preference study: 342 of 800 consumers prefer new product. Estimate market share with 90% confidence.

p̂ = 342/800 = 0.4275, check np̂≥10 and n(1-p̂)≥10.
90% CI: 0.4275 ± 1.645√(0.4275×0.5725/800).
Marketing decision: launch if lower bound > 40%.
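Working Case 2 through to the decision shows why the lower bound, and not the point estimate, drives the launch rule. The sketch below uses only the numbers given in the case; the variable names are illustrative.

```python
import math

successes, n, z = 342, 800, 1.645  # 90% confidence
p_hat = successes / n

# Normal-approximation conditions from the case.
conditions_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - z * se, p_hat + z * se
launch = lower > 0.40  # decision rule stated in the case

print(f"p_hat = {p_hat:.4f}, 90% CI = ({lower:.4f}, {upper:.4f})")
print(f"conditions ok: {conditions_ok}, launch: {launch}")
```

The point estimate of 42.75% is above the 40% threshold, but the lower bound comes out just under it (about 39.9%), so the stated rule does not support a launch on this sample alone.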

Case 3: Quality Control

Manufacturing process: measure widget lengths. Sample 50 pieces, mean=12.34cm, std=0.28cm.

99% CI for true mean length: account for strict quality standards.
Assumption: normal distribution of lengths reasonable.
Action: if CI includes 12.30cm spec, process acceptable.

Case 4: Medical Research

Drug trial: 45 of 120 patients show improvement. Estimate treatment effectiveness rate.

95% CI for improvement rate p: verify np̂ and n(1-p̂) ≥ 10.
Compare with historical rate of 30% (placebo effect).
FDA approval: need strong evidence improvement > 30%.

Case 5: Educational Assessment

School district: standardized test scores from 150 students, mean=78.5, std=12.3.

90% CI for district mean score: policy implications.
Large sample: CLT ensures normal sampling distribution.
Budget allocation: an interval lying entirely above a benchmark score suggests stronger programs.

Case 6: Environmental Study

Water pollution monitoring: 28 of 75 tested sites exceed safety limits.

95% CI for proportion of contaminated sites.
Random sampling assumption critical for validity.
Regulatory action: if CI upper bound > 50%, investigate.

Case 7: Sports Analytics

Basketball player free throw percentage: 156 makes in 200 attempts this season.

95% CI for true free throw rate: contract negotiation context.
Independence assumption: each shot independent outcome.
Team decision: if CI lower bound > 75%, offer extension.

Case 8: Polling & Elections

Pre-election poll: 520 of 1000 likely voters support candidate A.

99% CI for voting percentage: high stakes require confidence.
Sampling method crucial: representative of voter population.
Campaign strategy: if CI includes 50%, race competitive.

Case 9: Financial Planning

Investment return analysis: portfolio average return 8.2% over 60 months, std=3.1%.

95% CI for expected annual return: retirement planning tool.
Assumption: returns approximately normal (CLT with n=60).
Decision: if CI lower bound > 6%, meets retirement goal.

Case 10: Technology Usage

App usage study: average daily screen time 127 minutes among 180 users, std=45 minutes.

90% CI for mean daily usage: product development insights.
Large sample justifies normal approximation to sampling distribution.
Feature design: if CI suggests > 2 hours, add usage controls.

Sample Size Determination

Mean (Known Target Margin)

n = \left( \dfrac{z^* \, s}{E} \right)^2

Choose n to achieve margin of error E at the chosen confidence level, using s from a pilot sample as a stand-in for σ; round n up.

Proportion (Conservative)

n = \dfrac{ (z^*)^2 \cdot 0.25 }{ E^2 }

Use p̂≈0.5 for the worst case if no prior estimate is available, since p̂(1-p̂) peaks at 0.25 there.
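A sketch of the conservative sample size calculation for a proportion. The helper name `sample_size_for_proportion` and the 0.03 target margin are illustrative assumptions; the optional `p_guess` argument shows how a prior estimate, when one exists, reduces the required n.

```python
import math


def sample_size_for_proportion(margin: float, z: float = 1.96,
                               p_guess: float = 0.5) -> int:
    """Smallest whole n with z * sqrt(p_guess * (1 - p_guess) / n) <= margin."""
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


print(sample_size_for_proportion(0.03))               # 1068 (worst case, p = 0.5)
print(sample_size_for_proportion(0.03, p_guess=0.3))  # smaller n with a prior guess
```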

Bootstrap Confidence Intervals (Concept)

When assumptions are doubtful, resample with replacement to approximate the sampling distribution and form CIs from percentiles.

Resample B times; compute statistic each time.
Take percentile bounds (e.g., 2.5% and 97.5% for 95%).
Check stability vs B and presence of strong skew/outliers.
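The three steps above can be sketched as a percentile bootstrap for the median. The data set is invented for illustration (right-skewed, with one large value), and the function name, default B, and seed are assumptions; this is one simple variant among several bootstrap interval methods.

```python
import random
import statistics


def bootstrap_percentile_ci(data, stat=statistics.median, b=5000,
                            confidence=0.95, seed=0):
    """Percentile bootstrap: resample with replacement b times, take percentile bounds."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(data)
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(b))
    alpha = 1.0 - confidence
    lo_idx = max(0, round(alpha / 2 * b))
    hi_idx = min(b - 1, round((1 - alpha / 2) * b) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical right-skewed sample (not from the lesson).
data = [12, 15, 15, 17, 18, 19, 21, 22, 24, 25, 27, 30, 34, 41, 58, 95]
low, high = bootstrap_percentile_ci(data)
print(f"sample median = {statistics.median(data)}, 95% bootstrap CI = ({low}, {high})")
```

Rerunning with a larger b, or a different seed, is a quick way to check the stability mentioned in the last step.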

Extended Case Studies

Case 1: Sample Size Determination

Parameter: population mean height. Goal: 95% CI with ME ≤ 2cm.
Method: z interval, estimate σ from pilot study or literature.
Calculate n = (z*σ/ME)², round up for integer sample size.

Case 2: Bootstrap Concepts

Parameter: median income. Goal: understand sampling distribution.
Method: bootstrap resampling (conceptual) when distribution unknown.
Interpretation: bootstrap CI provides robust alternative to normal theory.

Case 3: Two-Sample Thinking

Parameter: difference in means between two groups.
Method: conceptual extension of one-sample CI to difference.
Decision: if CI for difference excludes 0, significant difference.

Case 4: Power & Sample Size

Parameter: treatment effect size. Goal: adequate power for detection.
Method: relationship between CI width and hypothesis test power.
Decision: narrow CI suggests adequate power to detect meaningful effects.

Case 5: Interval-Test Relationship

Parameter: population proportion. Goal: test specific value.
Method: if 95% CI excludes p₀, reject H₀: p = p₀ at α = 0.05.
Interpretation: CI provides range of plausible parameter values.

Case 6: Robustness Assessment

Parameter: mean response time. Goal: assess assumption sensitivity.
Method: compare normal-based CI with robust alternatives.
Decision: if methods agree, conclusions are robust to assumptions.

Case 7: Cost-Benefit Analysis

Parameter: cost savings per unit. Goal: justify investment decision.
Method: CI for mean savings, consider sampling and measurement costs.
Decision: if CI lower bound > costs, investment is justified.

Case 8: Regulatory Compliance

Parameter: contamination rate. Goal: demonstrate compliance with standards.
Method: CI for proportion, account for regulatory burden of proof.
Decision: CI upper bound must be below regulatory limit.

Case 9: Meta-Analysis Concepts

Parameter: overall effect across studies. Goal: synthesize evidence.
Method: weighted combination of individual study CIs.
Interpretation: combined CI provides stronger evidence than individual studies.

Case 10: Prediction Intervals

Parameter: individual vs. mean prediction. Goal: forecast single observation.
Method: prediction interval wider than CI due to additional variability.
Decision: use appropriate interval type for forecasting vs. estimation.

Case 11: Sequential Analysis

Parameter: treatment effect. Goal: adaptive sample size based on interim results.
Method: monitor CI width and adjust n if precision insufficient.
Decision: stop early if CI achieves target precision or indicates futility.

Case 12: Equivalence Testing

Parameter: difference between treatments. Goal: establish equivalence.
Method: CI for difference, define equivalence margin in advance.
Decision: equivalence if entire CI falls within equivalence region.

Guided Practice Bank

Practice Bank 1: Mean CI

n=36, x̄=24.7, s=4.2. Check normality condition.
SE = 4.2/√36 = 0.7. Point estimate = 24.7.
95% CI: 24.7 ± 1.96(0.7) = (23.33, 26.07).
Larger n gives smaller SE; higher confidence gives larger ME.

Practice Bank 2: Proportion CI

247 successes in 600 trials. Check np̂≥10, n(1-p̂)≥10.
p̂ = 247/600 = 0.412, SE = √(0.412×0.588/600) = 0.0201.
90% CI: 0.412 ± 1.645(0.0201) = (0.379, 0.445).
Quadruple n to halve ME; 99% vs 90% increases width.

Practice Bank 3: Sample Size

Want ME≤0.03 for proportion with 95% confidence.
Use p̂=0.5 (worst case). SE formula gives ME = 1.96√(0.25/n).
Solve: 1.96√(0.25/n) ≤ 0.03 → n ≥ 1067.1 → need n=1068.
Conservative estimate ensures ME requirement met.

Practice Bank 4: Interpretation

95% CI for mean salary: ($47,200, $52,800). Check that sampling was random.
Point estimate = $50,000, ME = $2,800.
Interpretation: "95% confident true mean salary is $47,200-$52,800".
Not: "95% probability" - confidence refers to method, not parameter.

Practice Bank 5: Assumptions

Small sample (n=16) from unknown population. Check normality requirement.
If sample shows severe skewness, consider larger n or non-parametric methods.
Bootstrap CI concept: resample to estimate sampling distribution.
Robust methods less sensitive to distributional assumptions.

Practice Bank 6: Applications

Quality control: estimate defect rate in manufacturing process.
Random sample of 200 units, 13 defective. Use proportion CI.
CI for defect rate informs production decisions and standards.
Wide CI suggests need for larger sample for precision.

Practice Bank 7: Confidence Levels

Compare 90%, 95%, 99% CIs for same data. Verify calculations.
Critical values: z* = 1.645, 1.96, 2.576 respectively.
Trade-off: higher confidence means wider interval, less precision.
Choose confidence level based on consequences of being wrong.

Practice Bank 8: Real Data

Survey data: customer satisfaction scores (1-10 scale).
Check assumptions: random sampling, sufficient sample size for CLT.
Construct 95% CI and interpret for business decision-making.
Consider practical significance alongside statistical inference.

Practice Bank 9: Error Analysis

Identify common CI interpretation errors and corrections.
Error: "Parameter is random" vs. Correct: "Parameter is fixed; the interval varies from sample to sample".
Error: "Contains 95% of data" vs. Correct: "Estimates a parameter, not the spread of individual values".
Emphasize confidence in method, not in specific interval.

Practice Bank 10: Advanced Concepts

Connection between CIs and hypothesis tests: duality principle.
If 95% CI excludes H₀ value, reject at α=0.05 level.
CI provides more information: range of plausible parameter values.
Use CIs to assess practical vs. statistical significance.
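The duality can be checked numerically for a mean: a two-sided z test at α=0.05 rejects exactly when the hypothesized value falls outside the 95% z interval. The sketch below reuses the Practice Bank 1 summary statistics; the null values tried and the function names are illustrative assumptions.

```python
import math


def mean_ci(x_bar, s, n, z=1.96):
    """z-based interval for a mean."""
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


def z_test_rejects(x_bar, s, n, mu0, z=1.96):
    """Two-sided z test of H0: mu = mu0 at the level matching z."""
    z_stat = (x_bar - mu0) / (s / math.sqrt(n))
    return abs(z_stat) > z


x_bar, s, n = 24.7, 4.2, 36  # from Practice Bank 1
low, high = mean_ci(x_bar, s, n)
for mu0 in (23.0, 24.0, 26.0, 27.0):  # hypothetical null values
    outside = not (low <= mu0 <= high)
    print(f"mu0 = {mu0}: outside CI = {outside}, test rejects = {z_test_rejects(x_bar, s, n, mu0)}")
```

The two columns agree for every null value. For proportions the match is only approximate when the test uses p₀ in its standard error while the interval uses p̂.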

FAQ (Extended)

Q: Can CI cover impossible values?

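Yes: for proportions near 0 or 1, the normal-approximation interval can extend below 0 or above 1. Use adjusted methods (e.g., the Wilson score interval) and interpret carefully.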
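One widely used adjusted method is the Wilson score interval. The sketch below compares it with the normal-approximation interval on a hypothetical sample of 1 success in 20 trials; the function names and the data are illustrative assumptions.

```python
import math


def wald_ci(successes, n, z=1.96):
    """Normal-approximation interval; can fall outside [0, 1]."""
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval; always stays inside [0, 1]."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Hypothetical small sample with a proportion near 0.
print("Wald:  ", wald_ci(1, 20))
print("Wilson:", wilson_ci(1, 20))
```

With these inputs the normal-approximation lower bound is negative, an impossible value for a proportion, while the Wilson bounds remain between 0 and 1.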

Q: Do overlapping 95% CIs imply no difference?

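Not necessarily. Two 95% CIs can overlap even when the difference is statistically significant, so overlap does not strictly test equality; use a hypothesis test or a CI for the difference.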

Q: How to choose z*?

Map the desired confidence level C to the standard normal quantile at (1 + C)/2, e.g., the 97.5th percentile (1.96) for 95%.

Q: What if conditions fail?

Consider a transformation, a larger n, or nonparametric or bootstrap methods.