Calculate p-values for z-tests, t-tests, and chi-square tests with step-by-step solutions
Z-test: z = 2.5, two-tailed
T-test: t = 1.8, df = 15, two-tailed
Chi-square: χ² = 12.5, df = 6
Z-test: z = 3.0, one-tailed
T-test: t = -2.1, df = 25, one-tailed
Chi-square: χ² = 25.2, df = 10
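If SciPy is available, the six practice problems above can be checked directly. This is a minimal sketch using `scipy.stats` (the `sf` method is the survival function, 1 − CDF), not necessarily the backend any particular calculator uses.

```python
from scipy import stats

# Z-test: z = 2.5, two-tailed -> double the upper-tail area
p_z_two = 2 * stats.norm.sf(2.5)

# T-test: t = 1.8, df = 15, two-tailed
p_t_two = 2 * stats.t.sf(1.8, df=15)

# Chi-square: chi-sq = 12.5, df = 6 (upper-tailed by convention)
p_chi_a = stats.chi2.sf(12.5, df=6)

# Z-test: z = 3.0, one-tailed (upper tail)
p_z_one = stats.norm.sf(3.0)

# T-test: t = -2.1, df = 25, one-tailed (lower tail)
p_t_one = stats.t.cdf(-2.1, df=25)

# Chi-square: chi-sq = 25.2, df = 10
p_chi_b = stats.chi2.sf(25.2, df=10)

print(round(p_z_two, 4), round(p_t_two, 4), round(p_chi_a, 4))
print(round(p_z_one, 4), round(p_t_one, 4), round(p_chi_b, 4))
```

Note that the chi-square test is inherently one-sided: only large values of χ² indicate a poor fit, so its p-value is always the upper-tail area.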
The p-value is the probability of observing a test statistic as extreme as, or more extreme than, the observed value, assuming the null hypothesis is true.
Z-test: For large samples or known population variance
T-test: For small samples with unknown variance
Chi-square: For categorical data and goodness-of-fit tests
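The definition above can be made concrete with a quick simulation, using only the standard library: under H₀ a z statistic follows a standard normal distribution, so the two-tailed p-value is simply the long-run fraction of simulated statistics at least as extreme as the observed one.

```python
import random
from statistics import NormalDist

random.seed(42)

observed_z = 2.5      # the test statistic computed from the sample
n_sims = 100_000

# Simulate z statistics under H0 and count those at least as extreme.
extreme = sum(abs(random.gauss(0, 1)) >= observed_z for _ in range(n_sims))
p_simulated = extreme / n_sims

# Analytic two-tailed p-value for comparison.
p_exact = 2 * NormalDist().cdf(-observed_z)
print(p_simulated, round(p_exact, 4))
```

The simulated fraction converges to the analytic p-value (about 0.0124 for z = 2.5) as the number of simulations grows.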
Determine if a new medication performs significantly better than placebo by comparing treatment group outcomes using p-values.
Evaluate whether design changes increase conversion rates by testing if differences between versions A and B are statistically significant.
Test if product defect rates exceed acceptable thresholds or if production batches meet quality specifications.
Analyze poll data to determine if observed differences between demographic groups reflect true population differences or sampling variation.
Test if a trading algorithm generates returns significantly different from market benchmarks or if risk factors matter.
Confusing p-value with hypothesis probability
A p-value is NOT the probability that your hypothesis is true. It is the probability of observing data at least as extreme as yours IF the null hypothesis were true.
Using 0.05 as an absolute threshold
The 0.05 cutoff is an arbitrary convention. p = 0.051 is not fundamentally different from p = 0.049. Consider context and effect size.
Ignoring practical significance
With large samples, tiny meaningless effects can have p < 0.001. Always ask: is the effect size meaningful in practice?
P-hacking and multiple testing
Running many tests and only reporting significant ones inflates Type I error. Use corrections like Bonferroni for multiple comparisons.
Best Practice
Report p-values alongside effect sizes, confidence intervals, and sample sizes. Interpret results in context, not just by arbitrary thresholds.
| P-Value Range | Interpretation | Common Usage | Strength of Evidence |
|---|---|---|---|
| p < 0.001 | Highly significant | Medical trials, safety-critical research | Very strong evidence against H₀ |
| 0.001 ≤ p < 0.01 | Very significant | Experimental research, clinical studies | Strong evidence against H₀ |
| 0.01 ≤ p < 0.05 | Significant (conventional) | Most scientific research, standard threshold | Moderate evidence against H₀ |
| 0.05 ≤ p < 0.10 | Marginally significant | Exploratory research, preliminary findings | Weak evidence, suggestive trend |
| p ≥ 0.10 | Not significant | Fail to reject null hypothesis | Insufficient evidence against H₀ |
Important Note: These thresholds are conventions, not universal laws. Always consider your field's standards, sample size, effect size, and practical importance when interpreting p-values.
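The conventional labels in the table can be encoded as a small helper; this is just a direct transcription of the ranges above, with all the caveats in the note.

```python
def evidence_label(p: float) -> str:
    """Map a p-value to the conventional label in the table above."""
    if p < 0.001:
        return "highly significant"
    if p < 0.01:
        return "very significant"
    if p < 0.05:
        return "significant"
    if p < 0.10:
        return "marginally significant"
    return "not significant"

print(evidence_label(0.0004), "|", evidence_label(0.049), "|", evidence_label(0.051))
```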
Hypothesis testing involves two types of potential errors, each with different consequences and controlled by different parameters.
Definition: Rejecting H₀ when it's actually true
Probability: α (significance level, typically 0.05)
Example: Concluding a drug works when it doesn't
Control: Lowering α (use stricter threshold like 0.01)
Definition: Failing to reject H₀ when it's actually false
Probability: β (depends on sample size and effect size)
Example: Missing a real drug effect in clinical trial
Control: Increase sample size, or raise α (accepting more Type I risk)
Power is the probability of correctly rejecting H₀ when it is false. Higher power means a better ability to detect true effects; a common target is 0.80 (80%). Power increases with larger sample sizes and larger effect sizes.
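For a two-sided one-sample z-test, power has a closed form: it is the probability that the z statistic lands outside ±z_crit when the true mean is shifted by the effect size. A stdlib-only sketch (the function name and example parameters are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def ztest_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test.

    effect_size is Cohen's d: the true mean shift in standard-deviation units.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    # Probability the statistic falls outside +/- z_crit under the alternative
    return (NormalDist().cdf(-z_crit - shift)
            + 1 - NormalDist().cdf(z_crit - shift))

print(round(ztest_power(0.5, 30), 3))  # medium effect, n = 30
print(round(ztest_power(0.5, 60), 3))  # doubling n raises power
```

With d = 0.5 and n = 30 the power is roughly 0.78; doubling the sample size lifts it above 0.97, illustrating how power grows with n.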
| Decision | H₀ is True | H₀ is False |
|---|---|---|
| Reject H₀ | Type I Error (α) | Correct Decision (Power = 1-β) |
| Fail to Reject H₀ | Correct Decision (1-α) | Type II Error (β) |
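Both error rates in the table can be estimated by simulation: run the test many times with H₀ true to estimate α, and many times with H₀ false to estimate power (and hence β). A stdlib-only sketch for a one-sample z-test with known σ = 1 (parameter choices are illustrative):

```python
import random
from statistics import NormalDist

random.seed(0)
alpha = 0.05
n, n_sims = 25, 4000
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

def rejects_h0(mu_true: float) -> bool:
    """Draw n observations from N(mu_true, 1) and test H0: mu = 0."""
    sample_mean = sum(random.gauss(mu_true, 1) for _ in range(n)) / n
    z = sample_mean * n ** 0.5   # sigma = 1, so z = mean * sqrt(n)
    return abs(z) >= z_crit      # True -> reject H0

type1_rate = sum(rejects_h0(0.0) for _ in range(n_sims)) / n_sims  # H0 true
power_hat = sum(rejects_h0(0.5) for _ in range(n_sims)) / n_sims   # H0 false
print(type1_rate, power_hat, 1 - power_hat)  # alpha-hat, power, beta-hat
```

The estimated Type I rate should hover near the chosen α = 0.05, while power (here, for a true shift of 0.5 with n = 25) lands around 0.70, leaving β near 0.30.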
Free lessons on significance tests, p-values, and Type I/II errors.
Open-source college textbook covering null hypotheses, p-values, and test procedures.
The American Statistical Association's official guidance on proper use and interpretation of p-values.