The Red Ink That Changed How I Think About Sample Size
I used a z-score on a dataset with 12 observations in my first stats project. My professor wrote "use t" in red ink. I didn't understand why until I saw how much the confidence interval changed: it widened by about 12%. Same data. Same confidence level. Completely different conclusion about whether the result was significant.
That's the thing about critical values. Pick the wrong distribution and your entire analysis shifts. Your confidence interval is either too narrow (overconfident) or too wide (useless). The z-score vs. t-score decision isn't a technicality. It's the fork in the road that determines whether your results hold up.
Most stats textbooks bury this choice in a footnote. But if you're running a hypothesis test or building a confidence interval with real data — clinical trial results, A/B test conversions, survey responses — this is the first question you need to answer.
The 30-Second Decision Tree: Z or T?
Two questions. That's all it takes.
Question 1: Do you know the population standard deviation (σ)? Not the sample standard deviation you calculated from your data — the true population parameter. If yes, use z. If no (and you almost never do), move to question 2.
Question 2: Is your sample size above 30? If yes, z works as an approximation because the t-distribution converges toward the normal distribution at larger sample sizes. If no, you must use t.
Z vs. T Decision Flowchart
The practical rule: if you're using sample data (s instead of σ) and n < 30, always use t.
Here's the honest truth: in the real world, you almost never know σ. You're working with sample data. So the real question is usually just about sample size — and even then, many statisticians argue you should default to t regardless. The t-distribution with large degrees of freedom is virtually identical to the z-distribution anyway.
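The two-question tree above fits in a few lines of code. Here's a minimal sketch, assuming SciPy is installed; `critical_value` is a hypothetical helper name, not a library function.

```python
from scipy import stats

def critical_value(confidence, n, sigma_known=False):
    """Two-tailed critical value following the decision tree:
    z only if the population sigma is truly known, otherwise t
    with n - 1 degrees of freedom (safe at any sample size)."""
    upper_tail = 1 - (1 - confidence) / 2
    if sigma_known:
        return stats.norm.ppf(upper_tail)     # z critical value
    return stats.t.ppf(upper_tail, df=n - 1)  # t critical value
```

Note that the t branch applies even when n ≥ 30: with large degrees of freedom it returns nearly the same number as z, so defaulting to t costs you nothing.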
Same Data, Two Formulas, Different Answers
This is where the choice actually bites you. Say you're testing whether a new tutoring method improves test scores. You collect data from 12 students. The sample mean score is 78, the hypothesized population mean is 72, and the sample standard deviation is 10.
Both formulas look almost identical. The z-score formula uses the population standard deviation:

z = (x̄ − μ) / (σ / √n) = (78 − 72) / (10 / √12) ≈ 2.078

The t-score formula swaps in the sample standard deviation:

t = (x̄ − μ) / (s / √n) = (78 − 72) / (10 / √12) ≈ 2.078
Wait — the numbers are the same? Yes. The test statistic itself is identical when σ and s happen to equal the same value. The difference isn't in the calculation. It's in the critical value you compare it against.
At a 95% confidence level, the z critical value is always 1.96. But the t critical value with 11 degrees of freedom (n - 1 = 12 - 1) is 2.201. That's 12% higher. Your test statistic of 2.078 clears the z-threshold but falls short of the t-threshold. One says significant, the other says not enough evidence.
Same data. Same math. Opposite conclusions. That red ink from my professor suddenly made a lot more sense.
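The tutoring example can be checked in a few lines. A sketch assuming SciPy; the inputs (78, 72, 10, 12) come straight from the example above.

```python
from math import sqrt
from scipy import stats

x_bar, mu_0, s, n = 78, 72, 10, 12
t_stat = (x_bar - mu_0) / (s / sqrt(n))  # same arithmetic either way
z_crit = stats.norm.ppf(0.975)           # 1.96 for 95%, two-tailed
t_crit = stats.t.ppf(0.975, df=n - 1)    # 2.201 with df = 11

significant_by_z = abs(t_stat) > z_crit  # clears the z threshold
significant_by_t = abs(t_stat) > t_crit  # falls short of the t threshold
```

The test statistic (about 2.078) is identical in both cases; only the threshold it gets compared to changes, and that's enough to flip the verdict.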
The Critical Value Table Your Textbook Should Have Led With
Critical values change based on your confidence level and — for the t-distribution — your degrees of freedom (df). Here's a side-by-side comparison showing how t-values shrink toward z-values as sample size grows.
| Confidence Level | Z Critical Value | T (df = 5) | T (df = 11) | T (df = 29) | T (df = 120) |
|---|---|---|---|---|---|
| 90% | 1.645 | 2.015 | 1.796 | 1.699 | 1.658 |
| 95% | 1.960 | 2.571 | 2.201 | 2.045 | 1.980 |
| 99% | 2.576 | 4.032 | 3.106 | 2.756 | 2.617 |
See the pattern? At df = 5 (a sample of 6 people), the 95% t critical value is 2.571 — a full 31% larger than the z-value of 1.96. By df = 120, the gap shrinks to about 1%. That's why the "n ≥ 30" rule of thumb exists: beyond 30 observations, the t-distribution is close enough to normal that the difference rarely changes your conclusion.
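The convergence in the table is easy to reproduce yourself. A sketch assuming SciPy:

```python
from scipy import stats

z_95 = stats.norm.ppf(0.975)      # 1.960 for 95%, two-tailed
for df in (5, 11, 29, 120):
    t_95 = stats.t.ppf(0.975, df)
    gap = (t_95 / z_95 - 1) * 100  # percent above the z value
    print(f"df={df:>3}: t={t_95:.3f} ({gap:.1f}% above z)")
```

The printed gap falls from roughly 31% at df = 5 to about 1% at df = 120, matching the table.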
But "close enough" isn't "identical." I've seen A/B tests at startups where the sample was 35 users per group and the team used z-scores because "it's over 30." Their confidence interval was slightly too narrow, and the "winning" variant turned out to be noise. Small margins matter when money is on the line.
Degrees of Freedom: The Penalty for Not Knowing σ
Degrees of freedom (df) is one of those terms that sounds more intimidating than it is. For a one-sample t-test, it's just df = n − 1. You have 20 data points? Your df is 19.
But why subtract 1? Because you used your data to estimate the standard deviation. That "costs" you one degree of freedom — you've consumed one piece of independent information. Think of it like a budget: you started with n independent observations, spent one to calculate s, and now you've got n - 1 left for your test.
William Sealy Gosset — a chemist at the Guinness brewery in Dublin — figured this out in 1908. He was working with tiny samples of barley yields (sometimes as few as 3 or 4 observations) and realized the normal distribution gave confidence intervals that were way too tight for small samples. His solution was the t-distribution, published under the pen name "Student" because Guinness didn't want competitors knowing they used statistics. Hence "Student's t-distribution."
A brewer. Not a mathematician. Sometimes the best insights come from people who actually need the answer, not people writing proofs.
How the Wrong Choice Inflates (or Deflates) Your Confidence Interval
Back to our tutoring example. Sample mean of 78, sample standard deviation of 10, n = 12. Let's build a 95% confidence interval both ways and see what happens.
Using z (incorrectly, since we don't know σ): 78 ± 1.96 × (10 / √12) = 78 ± 5.66, giving [72.34, 83.66]

Using t with df = 11 (correctly): 78 ± 2.201 × (10 / √12) = 78 ± 6.35, giving [71.65, 84.35]
The z-based interval is 11.32 points wide. The t-based interval is 12.70 points wide — about 12% wider. That extra width isn't sloppiness. It's honesty. The t-distribution accounts for the additional uncertainty that comes from estimating σ with a small sample. The z-distribution pretends that uncertainty doesn't exist.
| Approach | 95% Interval | Width | What It Means |
|---|---|---|---|
| Z-based (wrong for n = 12) | [72.34, 83.66] | 11.32 | Looks precise, but it's overconfident. The interval is too narrow to reflect the real uncertainty in your estimate. |
| T-based (correct for n = 12) | [71.65, 84.35] | 12.70 | Wider, but it honestly reflects the uncertainty from a small sample with an estimated σ. |
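Both intervals can be rebuilt from the summary statistics alone. A sketch assuming SciPy, with the values from the tutoring example:

```python
from math import sqrt
from scipy import stats

x_bar, s, n = 78, 10, 12
se = s / sqrt(n)                     # standard error of the mean

z_crit = stats.norm.ppf(0.975)       # 1.96
t_crit = stats.t.ppf(0.975, df=n - 1)  # 2.201

z_ci = (x_bar - z_crit * se, x_bar + z_crit * se)  # too narrow here
t_ci = (x_bar - t_crit * se, x_bar + t_crit * se)  # honest width
```

Only the critical value differs between the two lines, yet that one number is what widens the interval enough to change the conclusion.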
With 200 observations? The difference between the z and t critical values shrinks to a couple of hundredths. The choice barely matters. But with 12? It can flip your conclusion. And in fields like clinical research, where sample sizes of 10–20 are common in early-phase trials, this isn't academic: it's the difference between advancing a drug to the next phase or shelving it.
When Even Experienced Analysts Get This Wrong
I've reviewed dashboards at companies where the analytics team hard-coded z = 1.96 into every confidence interval calculation. Didn't matter if the sample was 15 users or 15,000. Same critical value. Nobody questioned it because the code "worked."
It worked in the sense that it produced a number. Whether that number meant anything for the small-sample segments — new markets, niche user cohorts, pilot programs — was a different story. Their p-values looked great on paper. The replication rate told a different story.
The fix isn't complicated. If you're estimating σ from your sample (which is almost always the case), use the t-distribution. Let the degrees of freedom do their job. As your sample grows, the t-value converges to z anyway — you lose nothing by defaulting to t, but you risk real errors by defaulting to z.
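That fix can live in one reusable function that works from raw data instead of a hard-coded 1.96. A sketch assuming SciPy; `t_interval` is a hypothetical helper name, not a library function.

```python
import statistics
from math import sqrt
from scipy import stats

def t_interval(data, confidence=0.95):
    """Confidence interval for the mean that always uses t.
    With large samples the t critical value converges to z,
    so this is safe at any sample size."""
    n = len(data)
    mean = statistics.fmean(data)
    se = statistics.stdev(data) / sqrt(n)  # s / sqrt(n)
    t_crit = stats.t.ppf((1 + confidence) / 2, df=n - 1)
    return mean - t_crit * se, mean + t_crit * se
```

Drop this in place of any hard-coded `1.96 * se` and the small-sample segments get honest intervals automatically, while the large-sample ones barely change.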
And if you're working with averages from skewed distributions, the choice of z vs. t is only half the battle. Make sure the average you're testing is the right one to begin with.
Frequently Asked Questions
What is a critical value in statistics?
A critical value is the threshold on a test statistic's distribution that separates the "reject" region from the "fail to reject" region in a hypothesis test. At a 95% confidence level (two-tailed), the z critical value is ±1.96 — meaning if your test statistic exceeds 1.96 or falls below -1.96, you reject the null hypothesis. For the t-distribution, the critical value depends on degrees of freedom and is always larger than z for small samples.
When should I use a z-score instead of a t-score?
Use a z-score only when you know the true population standard deviation (σ) — not an estimate from your sample. In practice, this is rare outside of standardized testing or manufacturing with known process parameters. If you're calculating standard deviation from your data (which is the typical case), use the t-distribution. The common "n ≥ 30" shortcut works because the t-distribution closely approximates the normal distribution at higher degrees of freedom, but defaulting to t is always the safer choice.
What are degrees of freedom and why do they matter?
Degrees of freedom (df) represent the number of independent values that can vary in your calculation. For a one-sample t-test, df = n - 1 because one "degree" is used up estimating the sample standard deviation. Lower df means heavier tails on the t-distribution, which produces larger critical values and wider confidence intervals. This is the t-distribution's way of penalizing you for the extra uncertainty that comes with small samples.