The Question That Stumped Me in Stats Class
I was sitting in my first statistics course, staring at the standard deviation formula on the board. The professor had just written:

s = √( Σ(xᵢ − x̄)² / (n − 1) )
I raised my hand. "Why do we square the differences, then take the square root? Why not just take the absolute value of the differences and average those?"
The professor paused. "Good question. The short answer is that squaring has better mathematical properties. The long answer..." He trailed off and moved on.
That non-answer bothered me for years. Turns out, there are three very good reasons for the squaring — and none of them are "because the textbook says so."
Reason 1: Negative Differences Would Cancel Out
Say you have test scores: 70, 80, 90, 100, 110. The mean is 90. Subtract the mean from each score and you get:
70 - 90 = -20
80 - 90 = -10
90 - 90 = 0
100 - 90 = +10
110 - 90 = +20
Sum: -20 - 10 + 0 + 10 + 20 = 0
The negatives and positives cancel perfectly. The average deviation is always zero, which tells you nothing about spread. Squaring makes everything positive, so the deviations add up instead of canceling.
You could use absolute values instead — that is called Mean Absolute Deviation (MAD). It works, but it has worse mathematical properties for calculus and probability theory. More on that in a moment.
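Both points are easy to verify with the test scores above. A quick Python sketch using the standard library's statistics module (whose stdev uses the same n − 1 sample convention as the worked examples below):

```python
import statistics

scores = [70, 80, 90, 100, 110]
mean = statistics.mean(scores)  # 90

# Raw deviations cancel to exactly zero -- no information about spread.
deviations = [x - mean for x in scores]
print(sum(deviations))  # 0

# Mean Absolute Deviation: average distance from the mean.
mad = sum(abs(d) for d in deviations) / len(scores)
print(mad)  # 12.0

# Standard deviation: square, average, take the square root.
print(statistics.stdev(scores))  # ~15.81
```

Note that MAD (12.0) and standard deviation (~15.81) disagree even on the same data: squaring gives the ±20 scores more weight than the ±10 scores.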
Reason 2: Squaring Punishes Outliers Harder
Imagine two datasets with the same mean (50):
Dataset A: Tight Cluster
48, 49, 50, 51, 52
Deviations: -2, -1, 0, +1, +2
Squared: 4, 1, 0, 1, 4
Variance: 2.5
Std Dev: 1.58
Dataset B: Wide Spread
30, 40, 50, 60, 70
Deviations: -20, -10, 0, +10, +20
Squared: 400, 100, 0, 100, 400
Variance: 250
Std Dev: 15.81
Dataset B has 10× the deviation range, but squaring makes the variance 100× larger. This is a feature, not a bug. Outliers have disproportionate impact on standard deviation, which makes it sensitive to extreme values — exactly what you want when detecting anomalies or measuring risk.
In finance, this is why volatility (standard deviation of returns) spikes dramatically during market crashes. A few extreme days dominate the calculation because their squared deviations are massive.
Reason 3: Calculus and Probability Need It
Here is the real reason statisticians love squaring: it makes the math work beautifully with derivatives and probability distributions.
The normal distribution (bell curve) has this probability density function:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

Notice the (x − μ)² in the exponent? That is the squared deviation.
The normal distribution is built on squared deviations. You can build a probability distribution around absolute deviations instead (it exists, and it is called the Laplace distribution), but you pay for it: the density has a sharp corner at its peak, because absolute value is not differentiable at zero, and it is the normal distribution, not the Laplace, that sums of many independent variables converge to under the central limit theorem.
When you take derivatives to find maximum likelihood estimators or minimize loss functions in machine learning, squared terms give you clean, solvable equations. Absolute values give you piecewise functions that are harder to optimize.
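One concrete consequence of that difference: the value that minimizes the sum of squared deviations is the mean, while the value that minimizes the sum of absolute deviations is the median. A brute-force check with a hypothetical dataset (numbers chosen for illustration, not from the text):

```python
data = [1, 2, 3, 4, 100]  # skewed by one large value

def sum_squared_error(c):
    return sum((x - c) ** 2 for x in data)

def sum_absolute_error(c):
    return sum(abs(x - c) for x in data)

# Scan candidate centers on a fine grid from 0 to 100.
candidates = [i / 100 for i in range(0, 10001)]
best_squared = min(candidates, key=sum_squared_error)
best_absolute = min(candidates, key=sum_absolute_error)

print(best_squared)   # 22.0 -- the mean of the data
print(best_absolute)  # 3.0  -- the median of the data
```

Squared error drags the answer toward the outlier (the mean, 22); absolute error ignores how far the outlier is and lands on the median (3).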
Why Take the Square Root at the End?
If squaring is so great, why undo it with a square root? Because variance (the average of the squared deviations) is in squared units. If you are measuring test scores, variance is in "points squared", which makes no intuitive sense.
Taking the square root brings you back to the original units. A standard deviation of 15 points means "on average, scores deviate by about 15 points from the mean." That is interpretable. A variance of 225 points² is not.
Variance is mathematically convenient (it adds when you combine independent variables). Standard deviation is human-readable (it is in the same units as your data). You need both.
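The additivity claim can be checked exactly with a toy example I am adding here: two independent dice rolls, enumerated in full so no simulation is needed.

```python
from itertools import product

def pop_variance(values):
    """Population variance: mean of squared deviations."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

die = range(1, 7)
var_one = pop_variance(list(die))  # 35/12, about 2.92

# All 36 equally likely outcomes of two independent dice.
sums = [a + b for a, b in product(die, die)]
var_sum = pop_variance(sums)       # 35/6, about 5.83

# Variances add for independent variables: Var(X + Y) = Var(X) + Var(Y).
print(var_sum, 2 * var_one)

# Standard deviations do NOT add: sd(X + Y) = sqrt(2) * sd(X), not 2 * sd(X).
print(var_sum ** 0.5, 2 * (var_one ** 0.5))
```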
Frequently Asked Questions
What is the difference between variance and standard deviation?
Variance is the average of squared deviations. Standard deviation is the square root of variance. Variance is in squared units (e.g., dollars²), which is hard to interpret. Standard deviation is in the original units (e.g., dollars), which is intuitive. Mathematically, variance is more convenient; practically, standard deviation is more useful.
When should I use MAD instead of standard deviation?
Use MAD when you have extreme outliers that you want to downweight, or when your data is not normally distributed. Use standard deviation when you are working with normal distributions, need mathematical properties like additivity, or want outliers to have strong influence (e.g., risk measurement, quality control).
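To see the "strong influence" concretely, here is a sketch with one hypothetical contaminated dataset, measuring how much of each total the outlier contributes:

```python
import statistics

data = [10, 11, 12, 13, 100]  # one extreme outlier
m = statistics.mean(data)      # 29.2

abs_devs = [abs(x - m) for x in data]
sq_devs = [(x - m) ** 2 for x in data]

# Share of the total contributed by the outlier (the last element):
print(abs_devs[-1] / sum(abs_devs))  # ~0.50 of the MAD total
print(sq_devs[-1] / sum(sq_devs))    # ~0.80 of the variance total
```

Under absolute deviations the outlier is half the story; under squared deviations it is four fifths of it.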
Why is standard deviation used in finance?
Standard deviation measures volatility — how much returns fluctuate around the average. Squaring amplifies large swings (crashes and rallies), which is exactly what investors care about. A stock with 20% standard deviation is riskier than one with 10% SD, and the squaring makes that difference dramatic. It is also the foundation of the Sharpe ratio and modern portfolio theory.